
Auditory Scene Analysis

Dawn Senathi-Raja

The auditory system takes the mixture of sounds arriving from the natural environment, with its rapidly changing pitch percepts, and reconstructs a useful representation of reality. The nature of this processing has been described by Bregman as 'auditory scene analysis' (Bregman, 1990, 2002). The early Gestalt psychologists proposed that the brain groups auditory elements into configurations using simple rules of proximity, similarity, good continuation, and common fate (Wertheimer, 1923, cited in Deutsch, 1999; see Table 1.3 and Figure 1.7).

Table 1.3

Gestalt Grouping Principles Used in Auditory Scene Analysis

Principle          Description
Proximity          Sounds close together are grouped in preference to those spaced further apart.
Similarity         Sounds resembling one another are grouped together as likely arising from the same source.
Symmetry           Related sounds exhibit symmetrical auditory properties.
Good continuation  Sounds that continue each other are perceptually linked.
Common fate        Sounds that change together are likely to be connected.

Gestalt grouping principles (taken from Shepard & Levitin, 2002, p.512)
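As a loose illustration of the proximity rule, the sketch below groups tones whose onsets fall close together in time. This is an arbitrary simplification for exposition, not a model of auditory processing; the 50 ms threshold is an invented value, not one from the literature cited here.

```python
# Illustrative sketch only: the proximity principle as grouping of
# tone onsets that fall within an (arbitrary) temporal threshold.

def group_by_proximity(onsets_ms, threshold_ms=50):
    """Group sorted onset times: a tone joins the current group if it
    falls within threshold_ms of the previous tone, else starts a new group."""
    groups = []
    for t in sorted(onsets_ms):
        if groups and t - groups[-1][-1] <= threshold_ms:
            groups[-1].append(t)
        else:
            groups.append([t])
    return groups

# Tones clustered in time fall into one perceptual group; a distant
# tone starts another.
print(group_by_proximity([0, 30, 60, 400, 430]))
# -> [[0, 30, 60], [400, 430]]
```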

Gestalt principles facilitate the grouping of components in an auditory scene (Bregman, 1990, 2002). Distributed areas of the secondary auditory association cortices are thought to extract the Gestalt patterns perceived in sensory input (Patterson, Uppenkamp, Johnsrude, & Griffiths, 2002; refer to Figure 1.8). Gestalt principles are particularly relevant for degraded or ambiguous stimuli, as is often the case in our environment. In reconstructing sound events, perceptual grouping mechanisms allow linkages to be formed between some elements and inhibit linkages between others (Deutsch, 1999). For instance, in trying to listen to a single stream of events, such as the spoken word 'shoe', the sound energy produced by this auditory event is mixed with frequency components arising from other concurrent events in the environment, such as a violin playing or a car passing. This concept is illustrated in Figures 1.9 and 1.10. For the brain to build separate perceptual descriptions of sound-generating events, it must identify which combination of frequency components has arisen from a particular sound source (Bregman, 1990). Only by combining the right set of complex pitches, or frequency components, into the correct pattern can the identity of the signal be recognised.

Neuroanatomical locations of the secondary auditory association cortex from a lateral perspective (taken from Wilson, 1996, p.130). The secondary auditory association cortex corresponds to the middle, inferior, and superior temporal gyri, including their posterior sections.
A spectrogram of the word 'shoe' spoken in isolation (taken from Bregman, 2002, p.218).
A spectrogram of a mixture of sounds containing the word 'shoe' (taken from Bregman, 2002, p.219).

Spatial Sound Processing

Auditory scene analysis requires the processing of a sound object's location in space. Spatial properties of sound play a crucial role in combining the complex pitch patterns of a source into an organised percept (Bregman, 2002). Deficits in auditory scene analysis have been found to take the form of impaired detection of sound movement (Bisiach, Cornacchia, Sterzi, & Vallar, 1984; Griffiths et al., 1997; Pinek & Brouchon, 1992). Such deficits have been demonstrated in studies of right hemisphere lesions, implicating areas outside the primary auditory cortex, such as the insula (Griffiths, Bench, & Frackowiak, 1994) and the parietal cortex (Anderson, 1995).
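One spatial cue the auditory system exploits when locating a source is the interaural time difference (ITD): a sound off to one side reaches the nearer ear slightly earlier. The sketch below computes the ITD using the classic Woodworth approximation; the formula and the head-radius and speed-of-sound values are standard textbook figures, not taken from the sources cited in this section.

```python
import math

# Illustrative sketch of the interaural time difference (ITD), using the
# Woodworth approximation ITD = (r/c) * (sin(az) + az). The head radius
# (8.75 cm) and speed of sound (343 m/s) are typical textbook values.

def itd_seconds(azimuth_deg, head_radius_m=0.0875, speed_of_sound=343.0):
    az = math.radians(azimuth_deg)
    return (head_radius_m / speed_of_sound) * (math.sin(az) + az)

# A source straight ahead produces no delay; a source at 90 degrees
# produces a delay of roughly two-thirds of a millisecond.
print(f"{itd_seconds(0) * 1e3:.2f} ms, {itd_seconds(90) * 1e3:.2f} ms")
# -> 0.00 ms, 0.66 ms
```

Even such sub-millisecond delays are resolvable by the binaural system, which is why damage to areas that integrate these cues can impair detection of sound movement.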

Sequential Streaming

Auditory scene analysis relies on sequential streaming in order to track components of a single sound source (Bregman, 1990; Deutsch, 1975). Sequential streaming connects events that have arisen at different times from the same source into a perceptual stream of sound (see Figure 1.11). The Gestalt grouping rules govern this process through the continuity, proximity, and similarity of sequentially heard tones. Functional magnetic resonance imaging has shown that anterior auditory areas provide a mechanism for tracking pitch patterns from a single sound source (Warren, Uppenkamp, Patterson, & Griffiths, 2003).

Example of sequential streaming using six tones (taken from Bregman & Ahad, 1995, p.8). In the slow version, the alternation of high and low tones is clearly heard. In the fast version, two streams of sound are heard, one formed of the high tones and the other of the low tones.
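The alternating-tone demonstration described above can be reconstructed so the effect can be heard directly. The sketch below is a hypothetical re-creation, not the Bregman and Ahad stimulus itself: the tone frequencies, durations, and output filenames are illustrative choices.

```python
import math
import struct
import wave

# Hypothetical re-creation of an alternating high/low tone sequence.
# All parameter values here are illustrative, not taken from the source.

RATE = 44100  # samples per second

def tone(freq_hz, dur_s, amp=0.5):
    """Generate one sine tone as a list of float samples in [-amp, amp]."""
    n = int(RATE * dur_s)
    return [amp * math.sin(2 * math.pi * freq_hz * i / RATE) for i in range(n)]

def alternating_sequence(high=2000, low=400, tone_dur=0.1, cycles=6):
    """Concatenate alternating high and low tones: H L H L ..."""
    samples = []
    for _ in range(cycles):
        samples += tone(high, tone_dur) + tone(low, tone_dur)
    return samples

def write_wav(path, samples):
    """Write float samples as a 16-bit mono WAV file."""
    with wave.open(path, "w") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(RATE)
        w.writeframes(b"".join(struct.pack("<h", int(s * 32767)) for s in samples))

# Slow version (long tones): the up-down alternation is heard as one line.
write_wav("slow.wav", alternating_sequence(tone_dur=0.4))
# Fast version (short tones): the sequence splits into a high stream
# and a low stream, as described in the caption above.
write_wav("fast.wav", alternating_sequence(tone_dur=0.08))
```

Playing the two files back to back reproduces the qualitative contrast: the same physical alternation is heard either as a single stream or as two, depending only on tempo.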

Simultaneous Streaming

Auditory scene analysis requires the process of simultaneous streaming to determine which combination of components from several sound sources belong together. Simultaneous streaming takes acoustic inputs that occur at the same time, but at different places in the auditory scene, and treats them as properties of a single sound (see Figure 1.12). This process is more likely to occur when sound components follow Gestalt grouping rules of similarity, symmetry, common fate, and proximity (Bregman, 1990).

Example of simultaneous streaming using three tones (taken from Bregman & Ahad, 1995, p.48). When the frequency separation between tones A and B is large, A segregates from B, allowing B to fuse with C. When A and B are close in frequency, they fuse into a single pure-tone stream, overriding the tendency of B to fuse with C and relegating C to a second stream.
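The competition described in the caption above can be sketched as a simple frequency-proximity rule: tone B groups with whichever competing tone lies closer to it in frequency. This is a deliberately crude caricature of the demonstration, and the example frequencies are invented for illustration.

```python
# Illustrative sketch of the frequency-proximity competition in the
# three-tone demo: B fuses with whichever of A or C is nearer in frequency.
# The frequencies below are invented examples, not the original stimulus.

def fuses_with(freq_a, freq_b, freq_c):
    """Return which tone, 'A' or 'C', captures tone B by frequency proximity."""
    return "A" if abs(freq_a - freq_b) < abs(freq_c - freq_b) else "C"

# Large A-B separation: A segregates, so B fuses with C.
print(fuses_with(1800, 600, 500))  # -> C
# Small A-B separation: A captures B, relegating C to its own stream.
print(fuses_with(650, 600, 500))   # -> A
```

The point of the caricature is that grouping is competitive: the same B-C relationship yields fusion or segregation depending solely on where A sits.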
