The Dynamically Variable Digital Waveguide Mesh
Damian Murphy, Simon Shelley, Sten Ternström and David Howard
This companion webpage contains sound examples in support of the paper "The Dynamically Variable Digital Waveguide Mesh" by the above authors, presented at the 19th International Congress on Acoustics, Madrid, 2-7 September 2007.
Abstract
The digital waveguide mesh (DWM) is a multi-dimensional numerical simulation technique based on the definition of a regular spatial sampling grid for a particular problem domain. This is generally a vibrating object capable of supporting acoustic wave propagation, the result being sound output in response to a given excitation. To date, the output from most DWM-based simulations has been the static system impulse response for given initial and boundary conditions. This method is often applied to room acoustics modelling problems, where impulse responses generated offline for computationally large or complex systems can be rendered in real time using convolution-based reverberation. More recently, work has explored how the DWM might be extended to allow dynamic variation and the possibility of real-time interactive sound synthesis. This paper introduces the basic DWM model and shows how the associated algorithms can be extended to allow dynamic changes as part of the simulation. Example applications that make use of this new dynamic DWM are explored, including the synthesis of simple sound objects and the more complex problem of articulatory speech and singing synthesis based on a multi-dimensional simulation of the vocal tract.
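As background to the examples that follow, the rectilinear 2D DWM can be stated compactly in its equivalent finite-difference form: p[n+1] = 0.5 x (sum of the four neighbouring junction pressures at step n) - p[n-1]. The following is a minimal illustrative sketch, not the code used for the paper; grid size, excitation and pick-up positions are arbitrary.

    import numpy as np

    def dwm_impulse_response(nx=50, ny=50, steps=2000, src=(25, 25), pickup=(10, 10)):
        # Rectilinear 2D DWM in its finite-difference form, with fixed
        # (p = 0) boundaries. Returns the impulse response observed at
        # 'pickup' for an impulse injected at 'src'.
        p_prev = np.zeros((nx, ny))
        p_curr = np.zeros((nx, ny))
        p_curr[src] = 1.0                            # impulse excitation
        response = np.empty(steps)
        for n in range(steps):
            neighbours = (np.roll(p_curr, 1, 0) + np.roll(p_curr, -1, 0) +
                          np.roll(p_curr, 1, 1) + np.roll(p_curr, -1, 1))
            p_next = 0.5 * neighbours - p_prev
            p_next[0, :] = p_next[-1, :] = 0.0       # fixed outer boundaries
            p_next[:, 0] = p_next[:, -1] = 0.0
            p_prev, p_curr = p_curr, p_next
            response[n] = p_curr[pickup]
        return response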
Examples

The following examples are presented from the results section of the above paper, and demonstrate how the dynamic DWM might be used to synthesize sound. This approach was first used in our 2D DWM VocalTract work. A good overview is given on the VocalTract page of this site, which includes links to Jack Mullen's thesis, as well as a version of the software used to generate the vocal tract synthesis audio examples referred to in this paper, which will be available for download from these pages shortly.

1. 2D Dynamic Vocal Tract
This example, based on Jack Mullen's original work in this area and generated using the VocalTract system introduced above, is newly presented for this paper. The spectrogram below shows a smooth interpolation between area function data for the /u/ ('food') and /ʌ/ ('but') vowels, under noise source excitation, highlighting the resulting change in formant patterns.

Fig 1. Spectrogram of a smooth interpolation between area function data for the /u/ ('food') and /ʌ/ ('but') vowels, under noise source excitation.
The sound of this simple dynamic articulation of the 2D vocal tract is available here. Note that the change is smooth and natural-sounding. More complex articulation requires more complete control of the vocal tract model itself.
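As an illustrative sketch of this kind of dynamic articulation (not the VocalTract implementation itself), the interpolation between two area functions can be as simple as a linear cross-fade, with each frame mapped to mesh impedances via Z proportional to 1/A for a tube section. The file names in the usage comment are hypothetical.

    import numpy as np

    def interpolate_area(a_start, a_end, n_frames):
        # Linearly cross-fade one vocal tract area function A(x) into another.
        # Each yielded frame would be converted to a mesh impedance map
        # before running the next block of DWM time-steps.
        for k in range(n_frames):
            alpha = k / max(1, n_frames - 1)
            yield (1.0 - alpha) * a_start + alpha * a_end

    # Hypothetical usage: glide from /u/ to /ʌ/ over 200 control frames.
    # a_u, a_uh = np.loadtxt("u_area.txt"), np.loadtxt("uh_area.txt")
    # for area in interpolate_area(a_u, a_uh, 200):
    #     z_map = 1.0 / np.maximum(area, 1e-6)   # impedance map for the mesh
    #     ...                                    # run the next DWM time-steps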
2. 2D Dynamic Shrinking Membrane
In this example a rectangular membrane of size A (2.86m x 3.3m) is simulated using a 6-port DWM. Over 80,000 time-steps this membrane is smoothly reduced to one of size B (1.54m x 1.33m) by increasing the impedance of the mesh from the outer edge inwards using a linearly varying impedance map. Note that in the screenshots below the z-axis denotes increasing impedance.
Fig 2. Mesh dimensions are altered by linearly increasing the impedance from the outer edge inwards.
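A minimal sketch of how such a linearly varying impedance map might be computed per time-step follows; the geometry helper and the maximum impedance value are assumptions for illustration, not the paper's parameters.

    import numpy as np

    def shrink_impedance_map(nx, ny, border, progress, z_base=1.0, z_max=100.0):
        # Impedance map that 'shrinks' a rectangular mesh from the outside in.
        # border:   number of junctions between outline A and outline B.
        # progress: 0.0 (full size A) .. 1.0 (reduced size B).
        ix = np.minimum(np.arange(nx), np.arange(nx)[::-1])[:, None]
        iy = np.minimum(np.arange(ny), np.arange(ny)[::-1])[None, :]
        d = np.minimum(ix, iy).astype(float)      # distance from the outer edge
        reach = progress * border                 # extent of the high-Z band
        ramp = np.clip((reach - d) / border, 0.0, 1.0)
        return z_base + (z_max - z_base) * ramp   # linear ramp, edge inwards

    # e.g. over the 80,000 time-steps of the example:
    # z = shrink_impedance_map(nx, ny, border, progress=n / 80000.0)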
The resulting smooth change in modal frequencies is highlighted in the following spectrogram.
Fig 3. Spectrogram demonstrating a smooth transition in resonant modes from mesh size A to mesh size B.
Sound examples for this simulation follow in Table 1 below. Note that with an impulse-like excitation the result is a broad-band, noise-like impulse response. This can be heard in Membrane_IR (Raw). A low-pass filtered version is also included, in which the change is more clearly evident. Both of these impulse responses are then used to process a simple 'dry' drum loop source.
Table 1. Sound examples: Membrane_IR (Raw), Membrane_IR (low-pass filtered), and the 'dry' drum loop processed with each impulse response.
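Applying a captured impulse response to a dry source is a single convolution, as sketched below. Note that convolving with the IR of a time-varying mesh is itself an approximation: true time-varying filtering would require re-running the mesh with the source as excitation.

    import numpy as np
    from scipy.signal import fftconvolve

    def process_with_ir(dry, ir):
        # Convolve a 'dry' source with a mesh impulse response and
        # peak-normalise the result. Both arrays are 1-D and assumed
        # to share the same sample rate.
        wet = fftconvolve(dry, ir)
        return wet / np.max(np.abs(wet))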
3. 2D Membrane Deformation
In the following example a 2D membrane is deformed by slowly adding and then removing regions of high impedance over the course of a simulation. The impedance map applied is based on a variation of the raised cosine impedance map used in the vocal tract model.

Fig 4. Regions of high impedance slowly added and then removed over the course of a simulation.

The resulting spectrogram is shown in Fig. 5 below. For the first 20,000 samples the result is that of a normal square mesh; then the regions or columns of high impedance slowly appear, reaching a maximum at 60,000 samples. They start to disappear at 80,000 samples, such that by 120,000 samples they have completely disappeared, returning the mesh to its initial shape.
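A sketch of how such a deformation might be built: a raised-cosine impedance bump scaled by a depth envelope following the schedule just described. The bump shape and impedance values are illustrative assumptions, not the paper's parameters.

    import numpy as np

    def deformation_depth(n):
        # Envelope for the deformation over the simulation, in samples:
        # flat mesh until 20,000; regions grow to full depth at 60,000;
        # they recede from 80,000 and are gone by 120,000.
        if n < 20000:
            return 0.0
        if n < 60000:
            return (n - 20000) / 40000.0
        if n < 80000:
            return 1.0
        if n < 120000:
            return 1.0 - (n - 80000) / 40000.0
        return 0.0

    def raised_cosine_column(nx, centre, width, depth, z_base=1.0, z_max=100.0):
        # One high-impedance 'column': a raised-cosine bump in the impedance
        # map, of the kind used in the vocal tract model.
        x = np.arange(nx)
        bump = 0.5 * (1.0 + np.cos(np.pi * np.clip((x - centre) / width, -1.0, 1.0)))
        return z_base + depth * (z_max - z_base) * bump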
Fig 5. Spectrogram of the audio output from a square 2D mesh that has deformations smoothly applied to it and then removed.
Sound examples for this simulation follow in Table 2 below. Note that, as before, with an impulse-like excitation the result is a broad-band, noise-like impulse response, and this can be heard in Membrane_IR (Raw). A low-pass filtered version is also included, in which the change is more clearly evident. These impulse responses are then used to process a guitar and an orchestral audio sample.
Table 2. Sound examples: Membrane_IR (Raw), Membrane_IR (low-pass filtered), and the guitar and orchestral samples processed with the impulse responses.
4. Articulatory Vocal Tract Speech/Singing Synthesis
Given the results presented in Example 1 above, the question arises of how a multi-parametric vocal synthesis system based on a 2D DWM might be better articulated to give more natural speech output. Hence the VocalTract system has been further adapted to import A(x) data as a series of text files, with dynamic interpolation from one file to the next allowing more complex articulation than the diphthong synthesis presented in Example 1. The area function information is generated by the APEX Speech Articulation system [1], as shown in Fig. 6 below:

Fig 6. The APEX system front end.
APEX is a tool that can be used to synthesize sound and generate articulatory voice-related parameters, based on the positioning of the lips, tongue tip, tongue body, jaw opening and larynx height, all mapped from X-ray data. These parameters include vocal tract cross-sectional area function data A(x), as shown in Fig. 7:
Fig 7. A(x) data generated by the APEX system and used to drive the vocal tract model.
The phrase "A Boy I Adore" is synthesized as a series of nine vocal
tract profiles, /a/ /b/-/O:/-/i/ /a:/-/i/ /a/-/d/-/o/, with a vowel
transition time of 250ms. The results of which are shown in Fig. 8. Fig 8. Articulating
the 2-D dynamic DWM vocal tract using the APEX system. The phrase
"A Boy I Adore" is synthesized from nine vocal tract profiles equally spaced in time.
"A Boy I Adore" is synthesized from nine vocal tract profiles equally spaced in time.
The changes in formant pattern from vowel to vowel are clearly evident, with the tract constrictions for /b/ and /d/ being particularly noticeable, demonstrating the potential for high-level synthesis using a dynamically varying DWM. The audio output from this simulation is available here.
References
[1] J. Stark, C. Ericsdotter, P. Branderud, J. Sundberg, H-J. Lundberg and J. Lander, "The APEX model as a tool in the specification of speaker specific articulatory behaviour", Proc. XIVth ICPhS, San Francisco (1999).

dtm, August 2007