Surrounded By Sound - A Sonic Revolution

Part of The Royal Society’s Summer Science Exhibition 2001



The exhibit begins to answer the question of how we can electronically manufacture an acoustic world that is indistinguishable from what we normally hear around us. It does so by examining cinema-style multi-speaker surround-sound, personalised sound over headphones, and the recorded characteristics of both real and virtual rooms.



Stereophonic sound reproduction was introduced in the 1960s and soon became established as the standard playback mechanism for essentially all types of recorded audio, an accolade it has held onto to this day. However, well before this, with the cinematic release of Disney’s Fantasia in 1940, composers and sound engineers began to develop techniques for a more immersive listening experience, electronically creating a virtual acoustic world using multiple speakers positioned around the audience. Multi-speaker techniques are now commonplace in cinema sound systems and are becoming more readily available in the home with the advent of home cinema systems and DVD. PC soundcards with surround-sound enhancements for games and other entertainment software are also commonly available. Surround-sound audio can also be produced using only standard headphones or stereo speakers, together with precise measurements obtained from the detailed physical characteristics of our head and ears.

Our ultimate goal is therefore to rise to the challenge of electronically manufacturing a complex three-dimensional acoustic world that is indistinguishable from what we normally hear around us. The potential result has applications in music composition and playback, art, architectural design, cinema, television and gaming entertainment, telecommunications and user interface design.

Our related research objectives can therefore be identified as:



The size, shape and dimensions of a room, together with the actual materials used in its construction, all have a critical part to play in the quality of any sound heard within it. Imagine a gun being fired inside a large room or hall. This short, sharp and very loud event causes a variation in localised air pressure that is transmitted through the air itself and spreads out through the room in every direction: a sound wave. This sound wave will travel unhindered until it reaches an object, where it will be partially absorbed and partially reflected. There may also be diffraction effects, where the sound bends round an object or passes through a gap (such as an open door into another room), resulting in further spreading of the original sound wave. Very quickly (within 100 ms) the sound from our gunshot has spread throughout the room, and been reflected, absorbed and diffracted according to the room’s physical properties. The resulting superposition of these complex wavefronts can cause acoustic pressure peaks and nulls at various points around the room, in what is known as constructive and destructive interference, respectively. Some of the wavefronts will travel repeated, regular reflection paths due to the geometry of the room, resulting in the dominance of particular interference patterns and the enhancement of particular frequencies at specific positions around the room. These frequencies are called the room modes.
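As a rough illustration of room modes, the sketch below uses the standard textbook formula for an idealised rectangular room with perfectly rigid walls. The room dimensions, the speed of sound and the function name `room_mode_frequency` are all assumptions chosen for the example; real rooms with absorbent surfaces and irregular shapes will deviate from these figures.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at about 20 degrees C (assumed value)

def room_mode_frequency(nx, ny, nz, lx, ly, lz, c=SPEED_OF_SOUND):
    """Frequency in Hz of mode (nx, ny, nz) in a rigid-walled box of lx by ly by lz metres."""
    return (c / 2.0) * math.sqrt((nx / lx) ** 2 + (ny / ly) ** 2 + (nz / lz) ** 2)

# Hypothetical room: 5 m long, 4 m wide, 3 m high.
dims = (5.0, 4.0, 3.0)
for mode in [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0)]:
    print(mode, round(room_mode_frequency(*mode, *dims), 1), "Hz")
```

The lowest modes come from the longest dimension, which is why large halls have their strongest resonances at very low frequencies.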

The characteristic build up of sound in a room as simulated using a digital waveguide mesh model. (left to right) (1) A short, sharp, impulsive sound fired into the larger of two rooms causes a circular wavefront to spread out from the sound source. (2) The sound wave is reflected from the walls and part of it passes through a gap into the smaller room. (3) In the larger room, interference effects are clearly visible; in the smaller room, the sound wave has spread out into an arc, demonstrating the effects of diffraction. (4) A short while after the initial event, the sound energy has spread out in a much more random and complex fashion.

This complex acoustic behaviour can be uniquely captured at a point within the room by using a single measurement called the Impulse Response. The gunshot sound input to the room ideally contains equal amounts of every audio frequency we are interested in, and by measuring (or listening to) this sound at another point in the room it is possible to examine how each frequency has been changed by its interactions with the room. The impulse response itself is not very interesting to listen to: it lasts anywhere between 0.1 and 10 seconds, depending on the size of the room and how reflective the surfaces are, and sounds like a click with a prolonged and decaying tail. This decaying part of the impulse response is due to the reverberation present in the room and is typically the characteristic "hanging on" quality of a sound that can be heard once the sound source itself has become silent. For instance, reverberation can be heard quite clearly in an empty church. However, it is possible to take this very boring sound and apply it to any other sound using digital signal processing. The result is that we can make any sound appear to be coming from inside any particular space, as long as we know its impulse response.
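The digital signal processing operation that applies an impulse response to another sound is convolution. The following is a minimal sketch of it in direct form, using a made-up five-sample impulse response rather than a measured one. Practical systems handle impulse responses many thousands of samples long and normally use faster FFT-based methods, which are not shown here.

```python
def convolve(signal, impulse_response):
    """Direct-form convolution: each input sample launches a scaled copy of the impulse response."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out

# Toy impulse response (invented values): direct sound followed by two quieter reflections.
room_ir = [1.0, 0.0, 0.5, 0.0, 0.25]

# Toy "dry" recording with no room sound on it.
dry = [1.0, -1.0, 0.5]

# The "wet" result is the dry sound as it would be heard in the toy room.
wet = convolve(dry, room_ir)
print(wet)
```

Feeding a single click (`[1.0]`) through `convolve` returns the impulse response itself, which is exactly what the gunshot measurement described above captures.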

A profile of the typical characteristic build up of sound in a room consisting of direct sound, early reflections and reverberation.

The impulse response of a room can be measured directly, although for most applications it is more practical to calculate an approximation using an acoustic model.

Modelling the acoustics of a room, using a digital waveguide mesh computer simulation. Notice the reflections at the boundaries and the diffraction and interference effects in the partitioned area, caused by gaps in the dividing wall. In the background, a room impulse response obtained from such a model.
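To give a flavour of how such a simulation works, here is a small sketch of a two-dimensional rectilinear mesh written in its equivalent finite-difference form, where each junction's next pressure is half the sum of its four neighbours minus its own pressure one step earlier. This is not the model used for the exhibit: the grid size, the number of steps and the simple zero-pressure treatment of the edges are assumptions made to keep the example short.

```python
def step(prev, curr):
    """Advance the mesh by one time step and return the new pressure grid."""
    rows, cols = len(curr), len(curr[0])
    nxt = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            # Junctions outside the grid are treated as zero pressure (assumed boundary).
            up = curr[i - 1][j] if i > 0 else 0.0
            down = curr[i + 1][j] if i < rows - 1 else 0.0
            left = curr[i][j - 1] if j > 0 else 0.0
            right = curr[i][j + 1] if j < cols - 1 else 0.0
            nxt[i][j] = 0.5 * (up + down + left + right) - prev[i][j]
    return nxt

def simulate(size=21, steps=8):
    """Inject a unit impulse at the centre and return the pressure grid after `steps` updates."""
    prev = [[0.0] * size for _ in range(size)]
    curr = [[0.0] * size for _ in range(size)]
    curr[size // 2][size // 2] = 1.0
    for _ in range(steps):
        prev, curr = curr, step(prev, curr)
    return curr

grid = simulate()
```

Recording the pressure at one junction over many time steps gives an approximate impulse response for that listening position.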


The wave propagation phenomena present in a room, as discussed above, are equally applicable on a smaller scale when considering the effect of the human head on sound perception. Differences in the arrival time and amplitude of a sound at each ear together with the diffraction of sound waves around the head, when combined with the minute reflections that occur due to the pinnae (the fleshy part of the outer ears), produce particular constructive and destructive interference patterns. These patterns alter the frequency content of the sound that reaches the ear-drums and this direction dependent information allows us to determine the direction of the original sound.
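The difference in arrival time between the two ears can be estimated with Woodworth's classic spherical-head approximation, sketched below. The head radius of 8.75 cm and the speed of sound are typical assumed values, and the formula ignores the pinnae and the amplitude differences mentioned above, so it illustrates only one of the cues.

```python
import math

HEAD_RADIUS = 0.0875    # metres, a commonly quoted average (assumed)
SPEED_OF_SOUND = 343.0  # m/s (assumed)

def woodworth_itd(azimuth_deg, head_radius=HEAD_RADIUS, c=SPEED_OF_SOUND):
    """Interaural time difference in seconds for a distant source.

    azimuth_deg is measured from straight ahead (0) to directly at one side (90).
    """
    if not 0.0 <= azimuth_deg <= 90.0:
        raise ValueError("this sketch only covers azimuths from 0 to 90 degrees")
    theta = math.radians(azimuth_deg)
    return (head_radius / c) * (theta + math.sin(theta))

for angle in (0, 30, 60, 90):
    print(angle, "degrees:", round(woodworth_itd(angle) * 1e6), "microseconds")
```

Even for a sound directly to one side, the difference is well under a millisecond, yet it is enough for the brain to use as a directional cue.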

The acoustic behaviour of the head is unique for each person and can be described by a set of Head-Related Impulse Responses, with a left/right ear pair for each sound source direction. Head-Related Impulse Responses last only about 5 ms, around 1000 times shorter than the impulse response of a typical concert hall, and again sound like a short, sharp click. As with the room-based impulse response, digital signal processing allows the characteristics of the Head-Related Impulse Responses to be applied to any audio source. It is then possible to place a sound at any position in a 3-D virtual space around the listener’s head, and reproduce it using only headphones (or stereo speakers). Further, if the room impulse response is measured at the entrance to the listener’s ears, rather than at a single point, then these two sets of frequency characteristics combine to impart both the environmental and directional acoustic properties of the space being measured.
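Placing a sound in a given direction then amounts to filtering the same mono signal through the left-ear and right-ear responses for that direction. The sketch below shows the idea with a tiny hand-made table, `HRIRS`, whose values are invented purely for illustration. Measured responses are a few hundred samples long and are specific to each listener.

```python
def convolve(signal, impulse_response):
    """Direct-form convolution of two sample lists."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out

# Hypothetical (left_ear, right_ear) response pairs keyed by azimuth in degrees,
# with positive angles to the listener's right. Values are made up.
HRIRS = {
    0: ([1.0, 0.3], [1.0, 0.3]),                   # straight ahead: both ears alike
    90: ([0.0, 0.0, 0.0, 0.5, 0.1], [1.0, 0.3]),   # hard right: left ear later and quieter
}

def render_binaural(mono, azimuth):
    """Return (left, right) headphone signals for a mono source at the given azimuth."""
    left_ir, right_ir = HRIRS[azimuth]
    return convolve(mono, left_ir), convolve(mono, right_ir)

left, right = render_binaural([1.0, 0.5], 90)
```

In this toy case the left-ear signal starts three samples late and at half the level, mimicking the time and amplitude differences for a source on the right.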

Aspects of 3-D sound: (Foreground) KEMAR mannequin head and torso, together with a speaker, used in the measurement of the head-related impulse response. (Mid) Three-dimensional frequency plot of measured head-related impulse responses varying with the direction of the sound in relation to the head. (Background) Three-dimensional model of the outer ear.


The word "stereophonic" is derived from Greek, and means "solid sound", referring to the construction of believable, solid, stable sound images, regardless of how many loudspeakers are used. It can be applied to surround-sound systems as well as to simple two-channel techniques: the original Dolby Surround system was called Dolby Stereo, even though it was a four-channel system. However, most people are used to thinking of stereo as having two channels.

When a single sound is played at equal levels from two stereo loudspeakers (or headphones), we perceive the sound as coming from the mid-point between them. This is because there is no difference in the time taken for the sound from each speaker to reach our ears. Making the sound from one of the speakers louder shifts the sound image towards the louder side, and we perceive the sound as moving either to the left or the right depending on which speaker is now the louder. This technique is termed panning.
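Panning can be sketched as a pair of gains applied to the same signal. The version below uses the constant-power (sine/cosine) pan law, which is one common choice and not necessarily the one used in any particular mixer. The function names and the pan range of -1 to +1 are assumptions for the example.

```python
import math

def constant_power_pan(pan):
    """Return (left_gain, right_gain) for pan in [-1.0, 1.0].

    -1.0 is hard left, 0.0 is centre, +1.0 is hard right.
    """
    if not -1.0 <= pan <= 1.0:
        raise ValueError("pan must lie between -1.0 and 1.0")
    angle = (pan + 1.0) * math.pi / 4.0  # maps pan to 0 .. pi/2
    return math.cos(angle), math.sin(angle)

def pan_mono(samples, pan):
    """Split a mono sample list into left and right channels at the given pan position."""
    left_gain, right_gain = constant_power_pan(pan)
    return [s * left_gain for s in samples], [s * right_gain for s in samples]

left, right = pan_mono([1.0, 0.5, -0.25], 0.5)  # image shifted towards the right
```

With this law the two gains are equal at the centre, and the sum of their squares stays at one, so the overall loudness remains roughly steady as the image moves.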

At a basic level, a similar technique can be applied to presenting sound images over an array of speakers distributed around the listener. For most cinema or domestic surround-sound systems there are different speakers for different roles in the control and reproduction of the acoustic environment. The front left and right speakers are used for the accompanying music and surround effects. Two speakers are located at the centre of the screen, one of which is used for dialogue, the other only for low-frequency sounds. There are also two speakers to the rear of the listeners that are used exclusively for surround-sound effects. Sounds can be panned around these speakers to produce various effects.

However, to produce a convincing effect, other factors related to the psychology of how we hear often have to be considered. For instance, a Doppler Shift effect can be used to accentuate the movement of a sound as it is panned between the speakers and hence moved around the listener. Doppler Shift is the change of pitch that we hear as a fast-moving object moves towards and then away from us. In the real world it is typically only noticeable with very fast-moving objects over large distances, such as the siren on a police car or the horn on a locomotive. However, we associate this characteristic effect with sound movement, so it will often be used subtly on other sounds to impart the required dramatic result.

Another common effect used in cinema surround-sound is to make an aircraft fly over the heads of the audience. Yet the speakers used to do this are at positions not much higher than our ears when we are seated; they are not placed above the audience in the ceiling of the cinema itself. The sound of the aircraft is actually panned around the speakers from front to back, but we associate the real-world sound of a flying aircraft with a source high above us, and so that is what we perceive.
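The size of the Doppler Shift described above can be estimated with the standard formula for a moving source and a stationary listener, sketched here. The siren frequency and vehicle speed are hypothetical figures, and the function `doppler_shift` is named for this example only.

```python
SPEED_OF_SOUND = 343.0  # m/s in air (assumed)

def doppler_shift(source_hz, approach_speed, c=SPEED_OF_SOUND):
    """Frequency heard by a stationary listener.

    approach_speed is in m/s: positive when the source moves towards the
    listener, negative when it moves away.
    """
    if abs(approach_speed) >= c:
        raise ValueError("this sketch only covers speeds below the speed of sound")
    return source_hz * c / (c - approach_speed)

# Hypothetical siren at 700 Hz on a vehicle travelling at 30 m/s (about 67 mph).
approaching = doppler_shift(700.0, 30.0)
receding = doppler_shift(700.0, -30.0)
print(round(approaching, 1), "Hz approaching,", round(receding, 1), "Hz receding")
```

The pitch is raised on approach and lowered on departure, and it is this drop as the source passes that sound designers imitate when panning a sound past the listener.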

Something to think about...
