Surrounded By Sound - A Sonic Revolution

Part of The Royal Society’s Summer Science Exhibition 2001

INTRODUCTION

The exhibit begins to answer the question of how we can electronically manufacture an acoustic world that is indistinguishable from what we normally hear around us. It does so by examining cinema-style multi-speaker surround-sound, personalised sound over headphones, and the recorded characteristics of both real and virtual rooms.

THE EXHIBIT - WHAT YOU WILL SEE AND HEAR

There will be an interactive, multi-speaker surround-sound installation that will immerse you in an artificially created acoustic world and give you control of the sound events that take place within it.
You will hear examples of 3-D sound over headphones, listen through the ears of a dummy head of the kind typically used for binaural monitoring and recording, and be able to examine a 3-D model of the human hearing system that helps to illustrate the underlying theories.
There will be examples of how the recorded characteristics of a real room affect the quality of the sounds heard within it, and you will be able to vary these characteristics to alter the result. The recorded characteristics of virtual, computer-modelled rooms will also be included and visualised using animation techniques: how do these virtual rooms compare with reality?
You will be challenged to step into the role of acoustician, architect, sound engineer, musician, spectator and others, and to see how all of these people are involved in decisions about sound and music in the world around us.

RESEARCH OBJECTIVES

Stereophonic sound reproduction became widespread in the 1960s and soon established itself as the standard playback format for essentially all types of recorded audio, an accolade it has held on to this day. Well before this, however, with the cinematic release of Disney’s Fantasia in 1940, composers and sound engineers had begun to develop techniques for a more immersive listening experience, electronically creating a virtual acoustic world using multiple speakers positioned around the audience. Multi-speaker techniques are now commonplace in cinema sound systems and are becoming more readily available in the home with the advent of home cinema systems and DVD. PC soundcards with surround-sound enhancements for games and other entertainment software are also widely available. Surround-sound audio can even be produced using only standard headphones or stereo speakers, together with precise measurements of the physical characteristics of the listener’s head and ears.

Our ultimate goal is therefore to rise to the challenge of electronically manufacturing a complex three-dimensional acoustic world that is indistinguishable from what we normally hear around us. The potential result has applications in music composition and playback, art, architectural design, cinema, television and gaming entertainment, telecommunications and user interface design.

Our related research objectives can therefore be identified as:

To develop tools for the composition, manipulation and control of surround-sound audio.
To increase the level of reality and immersion perceived by the listener.
To more accurately model the acoustic characteristics of rooms and halls, so allowing any arbitrary sound to be placed and heard within such a virtual space.
To develop accurate and reliable personalised sound: the recreation of 3-D audio or surround-sound effects over standard headphones or stereo speakers, using models of the head and ears.
To increase our understanding of the factors which affect our perception of sound localisation.

SCIENCE CONCEPTS

CLICK! PART 1 – THE IMPULSE RESPONSE OF A ROOM

The size, shape and dimensions of a room, together with the materials used in its construction, all play a critical part in the quality of any sound heard within it. Imagine a gun being fired inside a large room or hall. This short, sharp and very loud event causes a variation in local air pressure that is transmitted through the air itself and spreads out through the room in every direction: a sound wave. This sound wave travels unhindered until it reaches an object, where it is partially absorbed and partially reflected. There may also be diffraction effects, where the sound bends round an object or passes through a gap (such as an open door into another room), resulting in further spreading of the original sound wave. Very quickly (within 100ms) the sound from our gunshot has spread throughout the room, and has been reflected, absorbed and diffracted according to the room’s physical properties. The superposition of these complex wavefronts can cause acoustic pressure peaks and nulls at various points around the room, known respectively as constructive and destructive interference. Because of the room’s geometry, some of the wavefronts travel repeated, regular reflection paths; this results in the dominance of particular interference patterns and the enhancement of particular frequencies at specific positions around the room. These frequencies are called the room modes.
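For an idealised, empty rectangular room with perfectly rigid walls, the room modes described above have a well-known closed form: f = (c/2)·√((nx/Lx)² + (ny/Ly)² + (nz/Lz)²), where Lx, Ly and Lz are the room dimensions and nx, ny, nz are whole numbers, not all zero. The short Python sketch below lists the lowest few modes. The speed of sound (343 m/s) and the example room size are assumptions chosen for illustration, and real rooms with soft or irregular surfaces will depart from these idealised values.

```python
import itertools
import math

SPEED_OF_SOUND = 343.0  # m/s in air at roughly room temperature (assumed)


def room_mode_frequency(nx, ny, nz, lx, ly, lz, c=SPEED_OF_SOUND):
    """Frequency in Hz of mode (nx, ny, nz) in an idealised rigid-walled
    rectangular room measuring lx x ly x lz metres."""
    return (c / 2.0) * math.sqrt((nx / lx) ** 2 + (ny / ly) ** 2 + (nz / lz) ** 2)


def lowest_modes(lx, ly, lz, count=10, max_order=4):
    """Return the `count` lowest (frequency, (nx, ny, nz)) pairs, searching
    every index combination up to `max_order` in each direction."""
    modes = []
    for nx, ny, nz in itertools.product(range(max_order + 1), repeat=3):
        if (nx, ny, nz) == (0, 0, 0):
            continue  # the all-zero index is not a mode
        modes.append((room_mode_frequency(nx, ny, nz, lx, ly, lz), (nx, ny, nz)))
    modes.sort()
    return modes[:count]


if __name__ == "__main__":
    # Hypothetical 5 m x 4 m x 2.5 m living room
    for freq, index in lowest_modes(5.0, 4.0, 2.5, count=5):
        print(f"{index}: {freq:.1f} Hz")
```

The lowest mode always belongs to the longest dimension, which is why large rooms have room modes reaching further down into the bass.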

The characteristic build up of sound in a room as simulated using a digital waveguide mesh model. (left to right) (1) A short, sharp, impulsive sound fired into the larger of two rooms causes a circular wavefront to spread out from the sound source. (2) The sound wave is reflected from the walls and part of it passes through a gap into the smaller room. (3) In the larger room, interference effects are clearly visible; in the smaller room, the sound wave has spread out into an arc, demonstrating the effects of diffraction. (4) A short while after the initial event, the sound energy has spread out in a much more random and complex fashion.

This complex acoustic behaviour can be captured at a point within the room by a single measurement called the Impulse Response. The gunshot sound input to the room ideally contains equal amounts of every audio frequency we are interested in, and by measuring (or listening to) this sound at another point in the room it is possible to examine how each frequency has been changed by its interactions with the room. The impulse response itself is not very interesting to listen to: it lasts anywhere from 0.1 to 10 seconds, depending on the size of the room and how reflective its surfaces are, and sounds like a click with a prolonged, decaying tail. This decaying part of the impulse response is due to the reverberation present in the room, and gives the characteristic "hanging on" quality of a sound that can still be heard once the sound source itself has fallen silent. For instance, reverberation can be heard quite clearly in an empty church. However, using a digital signal processing operation called convolution, it is possible to take this rather boring sound and apply it to any other sound. The result is that we can make any sound appear to be coming from inside a particular space, as long as we know that space’s impulse response.
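Applying an impulse response to another sound is the operation known as convolution: each sample of the "dry" input triggers a scaled, delayed copy of the impulse response, and all the copies are summed, y[n] = Σₖ x[k]·h[n−k]. The Python sketch below is a direct, unoptimised implementation using a made-up six-sample impulse response. Real room impulse responses run to tens of thousands of samples or more, and practical systems usually perform the convolution with the fast Fourier transform for speed.

```python
def convolve(dry, impulse_response):
    """Direct-form discrete convolution: y[n] = sum over k of dry[k] * h[n - k].

    The output is len(dry) + len(impulse_response) - 1 samples long, so the
    reverberant tail carries on after the dry sound itself has stopped."""
    if not dry or not impulse_response:
        return []
    out = [0.0] * (len(dry) + len(impulse_response) - 1)
    for k, x in enumerate(dry):
        for m, h in enumerate(impulse_response):
            out[k + m] += x * h  # each input sample adds a scaled, delayed copy of h
    return out


if __name__ == "__main__":
    # Toy, made-up impulse response: direct sound plus two weaker, later reflections
    h = [1.0, 0.0, 0.0, 0.5, 0.0, 0.25]
    print(convolve([1.0], h))        # a single click gives back the impulse response itself
    print(convolve([1.0, -1.0], h))  # any other sound picks up the same "room"
```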

A profile of the typical characteristic build up of sound in a room consisting of direct sound, early reflections and reverberation.
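The three regions in this profile can be imitated with a toy impulse response: a single spike for the direct sound, a handful of discrete early reflections, and a dense tail of random values whose level falls by 60 dB over a chosen reverberation time (RT60). The Python sketch below is illustrative only; the sample rate, reflection times, gains and RT60 are made-up values, not measurements of any real room.

```python
import random


def synthetic_impulse_response(rt60=1.5, sample_rate=8000,
                               early=((0.012, 0.6), (0.019, 0.45), (0.027, 0.35)),
                               tail_start=0.04, seed=1):
    """Toy room impulse response (all parameter values are assumptions):
    - direct sound: a spike of 1.0 at t = 0
    - early reflections: (delay in seconds, gain) pairs from `early`
    - reverberation: random noise from `tail_start` onwards whose amplitude
      envelope falls by 60 dB (a factor of 1000) over `rt60` seconds."""
    rng = random.Random(seed)  # seeded so the "room" is repeatable
    n = round(rt60 * sample_rate)
    h = [0.0] * n
    h[0] = 1.0  # direct sound
    for delay, gain in early:
        h[round(delay * sample_rate)] += gain  # discrete early reflections
    for i in range(round(tail_start * sample_rate), n):
        t = i / sample_rate
        envelope = 10.0 ** (-3.0 * t / rt60)  # exactly -60 dB when t == rt60
        h[i] += 0.3 * envelope * rng.uniform(-1.0, 1.0)
    return h


if __name__ == "__main__":
    h = synthetic_impulse_response()
    print(len(h), "samples; first early reflection at sample", h.index(0.6))
```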

The impulse response of a room can be measured directly, although for most applications it is more practical to calculate an approximation using an acoustic model.
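One of the simplest acoustic models does not predict the full impulse response at all, only the length of its reverberant tail. Sabine’s classic formula estimates the reverberation time (the time for sound to decay by 60 dB) as RT60 ≈ 0.161·V/A, where V is the room volume in cubic metres and A is the total absorption: the sum of each surface area multiplied by its absorption coefficient (0 for a perfect reflector, 1 for a perfect absorber). The Python sketch below assumes a hypothetical 12 m³ room with 31.6 m² of surfaces and illustrative absorption coefficients; the formula works best for fairly reverberant rooms with evenly spread absorption.

```python
def sabine_rt60(volume_m3, surfaces):
    """Sabine's estimate of reverberation time in seconds: RT60 = 0.161 * V / A.

    volume_m3: room volume in cubic metres.
    surfaces:  list of (area_m2, absorption_coefficient) pairs; their
               products are summed to give the total absorption A."""
    absorption = sum(area * alpha for area, alpha in surfaces)
    if absorption <= 0:
        raise ValueError("total absorption must be positive")
    return 0.161 * volume_m3 / absorption


if __name__ == "__main__":
    volume = 2.5 * 2.0 * 2.4   # hypothetical small room, 12 m^3
    surface_area = 31.6        # walls, floor and ceiling in m^2
    hard = sabine_rt60(volume, [(surface_area, 0.02)])  # assumed value for glazed tiles
    soft = sabine_rt60(volume, [(surface_area, 0.30)])  # assumed value for soft furnishings
    print(f"hard surfaces: {hard:.2f} s, soft furnishings: {soft:.2f} s")
```

Because A sits in the denominator, doubling the absorption halves the predicted reverberation time.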

Modelling the acoustics of a room, using a digital waveguide mesh computer simulation. Notice the reflections at the boundaries and the diffraction and interference effects in the partitioned area, caused by gaps in the dividing wall. In the background, a room impulse response obtained from such a model.
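The digital waveguide mesh in these figures can be sketched compactly. In a two-dimensional rectilinear mesh each junction connects to its four neighbours, and the scattering at a junction can be written in an equivalent finite-difference form, p(n+1) = ½·Σ p_neighbour(n) − p(n−1). The Python sketch below assumes a small grid and a partition with a single gap, loosely echoing the two-room figure. The outer walls and the partition are simply held at zero pressure; that boundary choice is a simplification made for brevity, not a model of any real wall material, and the grid size, source, receiver and gap positions are all hypothetical.

```python
def simulate_mesh(width=40, height=30, steps=60, source=(10, 15),
                  receiver=(30, 15), wall_x=20, gap=(13, 17)):
    """2-D rectilinear digital waveguide mesh in finite-difference form:
    p[n+1] = 0.5 * (sum of the 4 neighbours at n) - p[n-1].

    Returns the pressure at `receiver` for each time step, i.e. a crude
    impulse response. The outer edge and the partition at column `wall_x`
    are held at zero pressure, except for rows gap[0] <= y < gap[1]."""
    def blocked(x, y):
        on_edge = x == 0 or y == 0 or x == width - 1 or y == height - 1
        in_wall = x == wall_x and not (gap[0] <= y < gap[1])
        return on_edge or in_wall

    prev = [[0.0] * height for _ in range(width)]
    curr = [[0.0] * height for _ in range(width)]
    curr[source[0]][source[1]] = 1.0  # impulsive excitation, the "gunshot"
    response = []
    for _ in range(steps):
        nxt = [[0.0] * height for _ in range(width)]
        for x in range(width):
            for y in range(height):
                if blocked(x, y):
                    continue  # boundary and wall junctions stay at 0.0
                neighbours = (curr[x - 1][y] + curr[x + 1][y] +
                              curr[x][y - 1] + curr[x][y + 1])
                nxt[x][y] = 0.5 * neighbours - prev[x][y]
        prev, curr = curr, nxt
        response.append(curr[receiver[0]][receiver[1]])
    return response


if __name__ == "__main__":
    with_gap = simulate_mesh()
    closed = simulate_mesh(gap=(0, 0))  # no gap: the second room stays silent
    print(max(map(abs, with_gap)), max(map(abs, closed)))
```

Sound can only move one junction per time step in this scheme, so the receiver, 20 junctions from the source, hears nothing for the first 19 steps; with the gap closed it hears nothing at all.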

CLICK! PART 2 – THE IMPULSE RESPONSE OF THE EAR

The wave propagation phenomena present in a room, as discussed above, apply equally on a smaller scale when considering the effect of the human head on sound perception. Differences in the arrival time and amplitude of a sound at each ear, together with the diffraction of sound waves around the head and the minute reflections caused by the pinnae (the fleshy part of the outer ears), produce particular constructive and destructive interference patterns. These patterns alter the frequency content of the sound that reaches the eardrums, and this direction-dependent information allows us to determine the direction of the original sound.
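The arrival-time difference between the two ears can be estimated with a simple geometric model that treats the head as a rigid sphere. Woodworth’s approximation gives the interaural time difference (ITD) for a distant source as ITD = (a/c)(θ + sin θ), where a is the head radius, c the speed of sound and θ the azimuth in radians. The Python sketch below assumes a head radius of 8.75 cm and c = 343 m/s; both are typical textbook values, not measurements from this exhibit.

```python
import math

HEAD_RADIUS = 0.0875    # metres; a commonly quoted average (assumed)
SPEED_OF_SOUND = 343.0  # m/s (assumed)


def woodworth_itd(azimuth_deg, a=HEAD_RADIUS, c=SPEED_OF_SOUND):
    """Interaural time difference in seconds from Woodworth's rigid-sphere
    approximation, ITD = (a / c) * (theta + sin(theta)).

    azimuth_deg: 0 = straight ahead, +/-90 = fully to one side. Only the
    front hemisphere (-90..90 degrees) is handled; the sign of the result
    indicates which ear the sound reaches first."""
    if not -90.0 <= azimuth_deg <= 90.0:
        raise ValueError("azimuth must be between -90 and 90 degrees")
    theta = math.radians(azimuth_deg)
    return (a / c) * (theta + math.sin(theta))


if __name__ == "__main__":
    for az in (0, 30, 60, 90):
        print(f"{az:3d} degrees: {woodworth_itd(az) * 1e6:.0f} microseconds")
```

With these assumptions the largest ITD, for a source directly to one side, comes out at about 0.66 ms. A source behind the listener at the mirror-image angle would give the same ITD, which is one reason the direction-dependent filtering of the pinnae matters.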

The acoustic behaviour of the head is unique to each person and can be described by a set of Head-Related Impulse Responses, with a left/right ear pair for each sound source direction. Head-Related Impulse Responses last only about 5ms, around 1000 times shorter than the impulse response of a typical concert hall, and again sound like a short, sharp click. As with the room-based impulse response, digital signal processing allows the characteristics of the Head-Related Impulse Responses to be applied to any audio source. It is then possible to place a sound at any position in a 3-D virtual space around the listener’s head and reproduce it using only headphones (or stereo speakers). Further, if the room impulse response is measured at the entrance to the listener’s ears, rather than at a single point in the room, then these two sets of frequency characteristics combine to impart both the environmental and the directional acoustic properties of the space being measured.
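Applying a pair of Head-Related Impulse Responses to a sound is again a convolution, performed once for each ear. Measured HRIRs (for example from a KEMAR mannequin) are not reproduced here, so the Python sketch below substitutes a deliberately crude, hypothetical pair: each ear receives a single delayed and scaled click, with the delay taken from Woodworth’s spherical-head approximation and an assumed level drop at the far ear. The sample rate, head radius and `far_ear_gain` are assumptions. The sketch captures only timing and level differences and ignores the pinna filtering that real HRIRs contain, so it shows the structure of binaural rendering rather than a usable spatialiser.

```python
import math

SAMPLE_RATE = 44100  # Hz (assumed)


def convolve(x, h):
    """Direct-form discrete convolution."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def toy_hrir_pair(azimuth_deg, head_radius=0.0875, c=343.0, far_ear_gain=0.5):
    """Hypothetical stand-in for a measured (left, right) HRIR pair.
    Positive azimuth = source on the listener's right. The far ear gets a
    delayed, quieter click; the near ear gets an undelayed unit click."""
    theta = math.radians(max(-90.0, min(90.0, azimuth_deg)))
    itd = (head_radius / c) * (abs(theta) + math.sin(abs(theta)))  # Woodworth
    lag = round(itd * SAMPLE_RATE)                                  # in samples
    far = 1.0 - (1.0 - far_ear_gain) * abs(math.sin(theta))  # 1.0 ahead, 0.5 at the side
    near_ir = [1.0]
    far_ir = [0.0] * lag + [far]
    return (far_ir, near_ir) if theta > 0 else (near_ir, far_ir)


def render_binaural(mono, azimuth_deg):
    """Place a mono signal at `azimuth_deg` by convolving it with the left
    and right impulse responses, giving a two-channel headphone signal."""
    h_left, h_right = toy_hrir_pair(azimuth_deg)
    return convolve(mono, h_left), convolve(mono, h_right)
```

Swapping `toy_hrir_pair` for a measured HRIR pair, one per direction, would leave `render_binaural` unchanged; only the impulse responses carry the direction.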

Aspects of 3-D sound: (Foreground) KEMAR mannequin head and torso, together with a speaker, used in the measurement of the head-related impulse response. (Mid) Three-dimensional frequency plot of measured head-related impulse responses varying with the direction of the sound in relation to the head. (Background) Three-dimensional model of the outer ear.

SURROUND-SOUND

The word "stereophonic" is derived from Greek and means "solid sound", referring to the construction of believable, solid, stable sound images, regardless of how many loudspeakers are used. It can therefore be applied to surround-sound systems as well as to simple two-channel techniques: the original Dolby Surround system was called Dolby Stereo, even though it was a four-channel system. However, most people are used to thinking of stereo as having two channels.

When a single sound is played at equal levels from two stereo loudspeakers (or headphones), we perceive the sound as coming from the mid-point between them. This is because the sound from the two speakers reaches our ears with no difference in timing or level to favour either side. Making the sound from one of the speakers louder shifts the sound image towards that side, so we perceive the sound as moving to the left or the right depending on which speaker is now louder. This technique is termed panning.
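The text does not say exactly how the two speaker levels are chosen as a sound is panned. One common choice, assumed here rather than taken from the exhibit, is the constant-power (sine/cosine) pan law, which keeps the sum of the squared gains equal to 1 so that overall loudness stays roughly constant as the image moves. A minimal Python sketch, with hypothetical function names:

```python
import math


def constant_power_pan(position):
    """Return (left_gain, right_gain) for a pan position from -1.0 (hard
    left) through 0.0 (centre) to +1.0 (hard right), using the sine/cosine
    pan law so that left**2 + right**2 == 1 at every position."""
    if not -1.0 <= position <= 1.0:
        raise ValueError("position must be between -1 and 1")
    angle = (position + 1.0) * math.pi / 4.0  # maps -1..1 onto 0..pi/2
    return math.cos(angle), math.sin(angle)


def pan(mono, position):
    """Split a mono signal into a (left, right) pair at the given position."""
    g_left, g_right = constant_power_pan(position)
    return [s * g_left for s in mono], [s * g_right for s in mono]


if __name__ == "__main__":
    for p in (-1.0, -0.5, 0.0, 0.5, 1.0):
        gl, gr = constant_power_pan(p)
        print(f"position {p:+.1f}: left {gl:.3f}, right {gr:.3f}")
```

At the centre each gain is about 0.707 (roughly -3 dB) rather than the 0.5 of a simple linear crossfade, which avoids the dip in loudness a linear crossfade tends to produce in the middle.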

At a basic level, a similar technique can be applied to presenting sound images over an array of speakers distributed around the listener. In most cinema or domestic surround-sound systems, different speakers play different roles in the control and reproduction of the acoustic environment. The front left and right speakers are used for the accompanying music and surround effects. Two speakers are located at the centre of the screen: one is used for dialogue and the other only for low-frequency sounds. There are also two speakers to the rear of the listeners that are used exclusively for surround-sound effects. Sounds can be panned around these speakers to produce various effects.

However, producing a convincing effect often means taking into account other factors related to the psychology of how we hear. For instance, a Doppler Shift effect can be used to accentuate the movement of a sound as it is panned between the speakers and hence moved around the listener. Doppler Shift is the change of pitch that we hear as a moving object approaches and then recedes from us. In the real world it is typically only noticeable with fast-moving objects over large distances, such as a siren on a police car or the horn on a locomotive. However, because we associate this characteristic effect with sound movement, it is often applied subtly to other sounds to achieve the required dramatic result.

Another common effect in cinema surround-sound is to make an aircraft fly over the heads of the audience, yet the speakers used to do this are positioned not much higher than our ears when we are seated; they are not placed above the audience in the ceiling of the cinema itself. The sound of the aircraft is actually panned around the speakers from front to back, but because we associate the real-world sound of a flying aircraft with a source high above us, that is what we perceive.
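For a source moving directly towards or away from a stationary listener, the heard frequency is f′ = f·c/(c − v), where v is the source’s speed towards the listener (negative when receding) and c is the speed of sound. The Python sketch below applies this to a source driving past the listener in a straight line, using only the component of its velocity along the line to the listener. The speed of sound, the example speeds and distances, and the helper names are assumptions for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air (assumed)


def doppler_shift(source_freq, radial_speed, c=SPEED_OF_SOUND):
    """Frequency heard by a stationary listener from a moving source.
    radial_speed > 0: source approaching; < 0: source receding.
    Moving-source formula f' = f * c / (c - v)."""
    if radial_speed >= c:
        raise ValueError("source must move slower than sound")
    return source_freq * c / (c - radial_speed)


def pass_by_frequencies(source_freq, speed, closest_distance, positions):
    """Heard frequency at each along-track position x (m) of a source moving
    in a straight line at `speed` m/s, passing `closest_distance` m from the
    listener at x = 0 (negative x = still approaching). The delay between
    emission and arrival of the sound is ignored for simplicity."""
    result = []
    for x in positions:
        r = math.hypot(x, closest_distance)
        radial = speed * (-x) / r  # velocity component towards the listener
        result.append(doppler_shift(source_freq, radial))
    return result


if __name__ == "__main__":
    # Hypothetical 1 kHz siren at 30 m/s, passing 10 m from the listener
    track = [-100.0, -20.0, 0.0, 20.0, 100.0]
    for x, f in zip(track, pass_by_frequencies(1000.0, 30.0, 10.0, track)):
        print(f"x = {x:+6.1f} m: {f:7.1f} Hz")
```

The pitch falls most quickly as the source passes the point of closest approach, which is the characteristic "neeee-owww" a sound designer imitates when sweeping a sound between speakers.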

Something to think about...

Next time you enter an interesting room, do your own impulse response test: clap your hands loudly, once, and listen carefully. You will hear the impulse response of the room as your handclap reverberates and quickly dies away. If a room is big enough for a sound wave to travel a long path undisturbed before it reflects from a surface, you might well hear a distinct echo as well as the reverberant sound. Given that we can perceive an echo if it arrives at our ears at least 100ms after we hear the original sound, what is the shortest possible length a room could be for us to hear a distinct echo?
Why do people like singing in the shower? Think about the physical properties of a typical bathroom: what effect will they have on a person’s voice? Compare these properties with those of a living room that has many soft furnishings. Think about why choirs are generally heard (for acoustic reasons) in churches or concert halls. Most pop music vocal recordings take place in small rooms or booths, yet the results we hear on a CD have a very different acoustic characteristic. What do they sound like? Why do you think this is?
Although we use the differences in timing and amplitude of a sound at each of our ears (together with the position-dependent frequency characteristics) to work out where it is coming from, what two other physical mechanisms do we use to precisely locate a sound source?
Our ears have a very different shape from the front and the back. Why do you think this might be? You can accentuate this difference by cupping a hand round each of your ears. Try this and listen carefully to the sounds around you – how do they change as you move your head around?
Next time you go to the cinema listen carefully to the sound you hear around you. What can you hear from each set of speakers? How good is the surround-sound effect? How realistic do you think it is?
Why is there only one low-frequency speaker at the centre of the screen? Why aren’t there others placed around the audience as with the standard surround speakers? Why is the other centre speaker used for dialogue? Why not use the front left and right speakers on either side of the screen?

Bibliography and Links

Acoustics and Psychoacoustics, David M. Howard and James Angus, Focal Press, Butterworth-Heinemann, Oxford, 2000.

Spatial Sound, Francis Rumsey, Focal Press, Butterworth-Heinemann, Oxford, 2001.

The University of York Music Technology Homepage http://www.elec.york.ac.uk/mustech/

Web Resource for Acoustics and Psychoacoustics book: http://www-users.york.ac.uk/~dmh8/AcPsych/acpsyc.htm

Surround Pro Magazine: http://www.surroundpro.com/

Audio and 3-D sound links: http://www.wareing.dircon.co.uk/3daudio.htm

3-D Sound Information and Resources: http://www.3dsound.com/

Dolby – Surround-sound developers: http://www.dolby.com/

Ambisonic Surround Sound: http://www.york.ac.uk/inst/mustech/3d_audio/ambison.htm

Digital Signal Processing: A Tutorial http://www.dsptutor.freeuk.com/index.htm
