Music to Zebra Finch Ears: Which Acoustic Cues Do Zebra Finches Attend to When Listening to Song?

Beth Vernaleo
Robert J. Dooling
University of Maryland
Neuroscience and Cognitive Science Program, Dept. of Psychology
College Park, MD 20742

Marjorie R. Leek
National Center for Rehabilitative Auditory Research
Portland VA Medical Center
Portland, OR 97239

Popular version of paper 3aABb3 presented at the 2010 159th Meeting in Baltimore, Maryland.

Birdsong has long served as a model for vocal development and communication. Zebra finches are a particularly interesting model because they sing just one song their entire lives, and they sing this song with little variation from rendition to rendition. Male zebra finches sing to other males to defend their territories and to females for mating displays. Thus, perception of song is important for species survival.

Songs consist of bursts of sound (syllables) separated by silence (intervals) (Figure 2). A motif is a specific ordering of syllables. Motifs range from 3 to 8 syllables long, and each syllable in the motif is unique. Zebra finch songs consist of several introductory notes, followed by a few renditions of the motif, sung in a very repetitive manner.

Figure 1. A male zebra finch

Figure 2. A zebra finch song motif.

If we take a closer look at a single song motif, we can see that each individual syllable is acoustically complex. Each syllable contains an amplitude envelope (loudness cues), spectral structure (pitch, timbre, or frequency cues), and temporal fine structure (millisecond phase cues), which together make every syllable in the set acoustically distinct. Are all of these cues equally important for song perception?

Previous work has shown that zebra finches are quite good at discriminating changes to their song motifs, specifically temporal reversals of single syllables (i.e., playing syllables backwards). Below is an example of this type of change (Figure 3). To find out which acoustic cues zebra finches are listening to when they make these discriminations, we tested discrimination of reversals of single syllables in a few types of modified song motifs, which were designed to isolate these acoustic cues.

Figure 3. An example of a temporal reversal of a single syllable within a song motif.

Figure 4. A magnification of the same syllable and its reversed version.
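
For readers who would like to try this manipulation on a digitized recording, a temporal reversal can be sketched in a few lines of Python. This is only an illustration and not the procedure used in the study: the function name reverse_syllable and the toy sample values are hypothetical, and a real motif would be a long list of audio samples with the syllable boundaries marked beforehand.

```python
def reverse_syllable(motif, start, end):
    """Return a copy of `motif` with the samples in [start, end) played backwards.

    `motif` is a list of audio samples; `start` and `end` are the (assumed)
    sample indices that bound the syllable to be reversed.
    """
    if not 0 <= start <= end <= len(motif):
        raise ValueError("syllable boundaries must lie inside the motif")
    return motif[:start] + motif[start:end][::-1] + motif[end:]


# Toy "motif" of ten samples: one syllable, a short silence, a second syllable.
# The hypothetical second syllable occupies indices 6 to 9.
motif = [0, 1, 2, 3, 0, 0, 4, 5, 6, 7]
target = reverse_syllable(motif, 6, 10)
print(target)
```

Only the chosen syllable changes. The rest of the motif, including the silent interval, is left untouched, and the overall duration stays the same.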

Birds were tested on discrimination tasks in which they had to discriminate between two sounds: the original song motif and a song motif with a temporally reversed syllable. The testing setup consisted of a cage with two LEDs (red and green) placed on the wall and a speaker overhead through which sounds were played to the bird. The original song motif was played as a repeating background, and the bird had to peck the left (red) observation LED while this motif was playing. At random times, a target stimulus (the changed motif) was played to the bird instead of the original motif. If the bird detected a change from the original motif, he was trained to peck the right (green) report LED in order to receive a food reward. If he did not detect a change, he was to peck the left LED. By recording which LEDs the bird pecked during the changed motifs, we could determine which changes the bird could detect, how quickly he could detect these changes, and which changes the bird seemed unable to detect. Below is a video showing the experimental setup and a bird completing this task.

Syllable reversals were tested in natural motifs, as well as in synthetic motifs from which certain acoustic cues were removed. One type of synthetic motif (noise motif) consisted of the motif amplitude envelope filled with random noise. This type of motif removes the spectral structure of individual syllables, so the main cue that remains is the amplitude envelope (short-term loudness cues). A second type of synthetic motif (Schroeder motif) consisted of Schroeder harmonic waveforms, in which only the temporal fine structure changed when the waveform was temporally reversed (i.e., there were no pitch or short-term loudness cues). The amplitude envelope and spectrum remained the same across time. However, the starting phase of each harmonic increased with frequency. Thus, there was an upward sweep of phase across frequencies that became a downward sweep when the waveform was temporally reversed. Examples of a syllable reversal in a natural motif and in both synthetic motifs are shown below. In each case, the last syllable (G) is reversed. Both the waveform of the motif and the spectrogram are shown.

Figure 5. A natural motif in which the last syllable is reversed in time.

Figure 6. A synthetic noise motif in which the last syllable is reversed in time.
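
One way to build a stimulus of this kind is sketched below in Python. It is a rough illustration under stated assumptions, not the method used in the study: the envelope is estimated here by rectifying the signal and smoothing it with a moving average, and the function names, window length, sampling rate, and toy tone burst are all hypothetical.

```python
import math
import random


def amplitude_envelope(signal, window):
    """Crude envelope estimate: moving average of the rectified signal."""
    rectified = [abs(x) for x in signal]
    half = window // 2
    envelope = []
    for i in range(len(rectified)):
        lo = max(0, i - half)
        hi = min(len(rectified), i + half + 1)
        envelope.append(sum(rectified[lo:hi]) / (hi - lo))
    return envelope


def noise_filled(signal, window=32, seed=0):
    """Keep the slow loudness contour of `signal`, replace the rest with noise."""
    rng = random.Random(seed)  # seeded so the example is repeatable
    envelope = amplitude_envelope(signal, window)
    return [e * rng.uniform(-1.0, 1.0) for e in envelope]


# Toy "syllable": a 2 kHz tone burst with a raised-cosine shape, then silence.
SAMPLE_RATE = 16000      # hypothetical sampling rate in Hz
BURST, GAP = 320, 160    # 20 ms of sound followed by 10 ms of silence
burst = [
    0.5 * (1 - math.cos(2 * math.pi * k / (BURST - 1)))
    * math.sin(2 * math.pi * 2000 * k / SAMPLE_RATE)
    for k in range(BURST)
]
toy_motif = burst + [0.0] * GAP
noise_motif = noise_filled(toy_motif)
```

The result rises and falls in loudness like the original burst and stays silent during the gap, but the 2 kHz tone itself is gone, which is the sense in which spectral structure is removed.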

Figure 7. A synthetic Schroeder motif in which the last syllable is reversed in time.
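
The property that makes Schroeder waveforms useful here can be checked numerically. The Python sketch below builds one period of an equal-amplitude harmonic complex using the commonly cited Schroeder phase rule, in which harmonic n of N starts at phase pi * n * (n + 1) / N. It then confirms that playing the period backwards leaves the amplitude of every harmonic unchanged while flipping the direction of the phase sweep. The number of harmonics, the period length, and the function names are assumptions chosen for illustration. They are not the stimulus parameters of the study.

```python
import cmath
import math


def schroeder_period(n_harmonics, samples_per_period, sign=+1):
    """One period of an equal-amplitude harmonic complex with Schroeder phases.

    Harmonic n starts at phase sign * pi * n * (n + 1) / n_harmonics.
    """
    period = []
    for k in range(samples_per_period):
        t = 2.0 * math.pi * k / samples_per_period
        period.append(sum(
            math.cos(n * t + sign * math.pi * n * (n + 1) / n_harmonics)
            for n in range(1, n_harmonics + 1)
        ))
    return period


def harmonic_magnitudes(period, n_harmonics):
    """Amplitude of each harmonic, computed with a direct DFT over one period."""
    size = len(period)
    magnitudes = []
    for n in range(1, n_harmonics + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * n * k / size)
                    for k, x in enumerate(period))
        magnitudes.append(2.0 * abs(coeff) / size)
    return magnitudes


# Hypothetical parameters: 20 harmonics and 160 samples per period
# (for example, a 100 Hz fundamental sampled at 16 kHz).
N_HARMONICS, PERIOD = 20, 160
forward = schroeder_period(N_HARMONICS, PERIOD, sign=+1)
reversed_wave = forward[::-1]                       # played backwards
opposite = schroeder_period(N_HARMONICS, PERIOD, sign=-1)

same_spectrum = all(
    abs(a - b) < 1e-9
    for a, b in zip(harmonic_magnitudes(forward, N_HARMONICS),
                    harmonic_magnitudes(reversed_wave, N_HARMONICS))
)
print(same_spectrum)
```

In this sketch the reversed waveform matches the opposite-sign Schroeder complex shifted by one sample, so the only thing that distinguishes the two versions is the direction of the phase sweep within each period.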

Birds are best at discriminating temporal reversals of syllables when all acoustic cues are present, as in the natural song motif (shown in blue). When spectral structure is removed but the amplitude envelope remains intact, birds show a decrease in performance but are still able to discriminate most syllable reversals. When only phase information is present, birds perform almost as well as they do on natural motifs. This is surprising, because phase changes take place over milliseconds, whereas envelope changes take place over much longer timescales.

This suggests that birds can use the multiple acoustic cues available to them for the task, but they perform best when all cues are available. This is reminiscent of human speech, in which there are multiple redundant cues available for speech recognition. However, one difference is that zebra finches are able to use either envelope or temporal fine structure cues alone to discriminate changes in syllables. Humans, on the other hand, primarily use envelope cues for speech recognition. Humans can easily understand sentences when the speech envelope is filled with random noise, but cannot understand speech when the envelope is removed.

This ability of zebra finches to use temporal fine structure suggests that it is an important cue for song perception. In human speech perception, temporal fine structure is necessary for understanding speech in noisy environments. Perhaps it serves a similar purpose for zebra finches. Zebra finches live in wooded areas, among many other birds that share similar songs. Perception of temporal fine structure in song may be necessary for singer identification and for transmission of song over distances.