Vocal Tract Modelling with the 2D Digital Waveguide Mesh

Jack Mullen

The following details some of the results of Jack's PhD project at the University of York.
The project began in October 2002 with the final thesis submitted in April 2006.



Abstract

Acoustical physical modelling synthesis uses mathematical algorithms to describe a real-world sound production process or propagational environment. Digital waveguides can be used to form a 1D model of the vocal tract, simplistically represented as a series of cylindrical tubes of varying radius along a straight axis. This 1D signal propagating element can also be extended to create a digital waveguide mesh (DWM), giving acoustical synthesis of a higher dimensional structure, such as a 2D plate or 3D space.

The work contained in this thesis is an investigation into the effects of increased dimensionality in the 1D waveguide vocal tract paradigm. A 2D DWM is configured as a model of the tract, such that shape characteristics are set within the width of the mesh. Wave propagation and reflection is simulated along the tract from the glottis to the lips, as well as across it, between the two inner walls, thereby removing plane-wave limitations inherent in the 1D model. The 2D tract is found to give accurate formant synthesis, producing vowels that give a good match to real-world targets. However, problems associated with high sampling frequency limitations and discontinuous dynamic operation are identified. Movements readily occurring in speech, such as diphthongs, are not easily accommodated by the static mesh structure.

A novel alternative approach is also presented which maintains a rectangular mesh, but maps the changing tract shapes onto the waveguide impedances. This allows for stable dynamic manipulation of the modelled space. Furthermore, sampling frequency limitations are removed, such that real-time operation and interaction with the 2D tract model is achieved.



Vocal Tract Modelling using the 1D Digital Waveguide

Currently, the most widely used speech synthesis methods concatenate pre-recorded samples. Such techniques can generate artificial speech that approaches perceived realism. Because the method is sample-based, synthesis is limited to the content and vocal identity of the original recordings.

Articulatory vocal tract models attempt to simulate the behaviour of the speech apparatus itself, rather than just the sound it produces. Such a model is called a physical model because it uses real-world data about the vocal tract to recreate it virtually, and hence to synthesise the sounds that would be observed from the original speaker. The human speech apparatus is a highly complex system. A full physical model of the vocal tract, including glottis, jaws, lips and teeth, would result in computationally intensive simulations. As such, previous articulatory vocal tract models have been based around simplified 1D acoustical wave propagation definitions.

Figure 1 shows the typical form of a chain of digital waveguides used to simulate wave propagation in a distributed system. Left-going (p_l) and right-going (p_r) travelling-wave pressure signals are temporally sampled at a discrete time interval T (or, equivalently, spatially sampled at a discrete distance interval d) and separated by delay units, indicated by z^-1.

Fig. 1 A chain of digital waveguides
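
As a rough illustration of the two-rail structure in Figure 1, the following Python sketch advances the right-going and left-going pressure samples by one delay unit per time step. The function names, the number of sections and the end reflection coefficients are illustrative assumptions, not values taken from the project.

```python
def step_waveguide(p_right, p_left, source=0.0, r_left=1.0, r_right=-1.0):
    """Advance a two-rail chain of delay units by one sample period T.

    p_right[i], p_left[i]: travelling-wave pressure samples at spatial index i.
    source: sample injected into the right-going rail at the left end.
    r_left, r_right: hypothetical reflection coefficients at the two ends.
    Returns the new (p_right, p_left) lists.
    """
    # Right-going rail: each sample moves one position to the right (z^-1).
    new_right = [source + r_left * p_left[0]] + p_right[:-1]
    # Left-going rail: each sample moves one position to the left (z^-1).
    new_left = p_left[1:] + [r_right * p_right[-1]]
    return new_right, new_left


def physical_pressure(p_right, p_left):
    """The observable pressure at each point is the sum of both rails."""
    return [a + b for a, b in zip(p_right, p_left)]
```

An impulse injected at the left end of a four-section chain reaches the right end after four steps and, with r_right = -1, returns inverted on the left-going rail.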

Figure 2 illustrates how the vocal tract shape details contained in an area function are spatially sampled as a series of adjoining cylindrical tubes. Each of the acoustic tubes is represented with a digital waveguide, where the tube cross-sectional area is set within the waveguide impedance.

Fig. 2 From /i/ vowel area function to piecewise cylinder analogy
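
The step from tube areas to waveguide impedances can be sketched as below. This assumes the usual acoustic tube relation Z = rho * c / A and the standard pressure reflection coefficient between adjoining sections; the constants and the area values in the usage note are illustrative placeholders, not measured /i/ vowel data.

```python
RHO = 1.2    # assumed air density in kg/m^3
C = 343.0    # assumed speed of sound in m/s


def tube_impedances(areas_cm2):
    """Characteristic impedance of each cylindrical section, Z = rho * c / A."""
    return [RHO * C / (a * 1e-4) for a in areas_cm2]


def reflection_coefficients(impedances):
    """Pressure reflection coefficient at each boundary between sections.

    For a wave travelling from section 1 into section 2:
    r = (Z2 - Z1) / (Z2 + Z1), which equals (A1 - A2) / (A1 + A2).
    """
    return [(z2 - z1) / (z2 + z1)
            for z1, z2 in zip(impedances, impedances[1:])]
```

For example, reflection_coefficients(tube_impedances([1.0, 3.0])) gives a single coefficient of about -0.5: a widening tube reflects part of the pressure wave with inverted sign, while equal adjoining areas give no reflection.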


Vocal Modelling using the 2D Digital Waveguide Mesh

The main aim of this project is to investigate the effects that increased dimensionality has on the vocal tract model. Figure 3 demonstrates how the 1D digital waveguide can be extended to form a rectilinear digital waveguide mesh (DWM).

Fig. 3 Rectilinear topology: (a) the 4-port junction and (b) arbitrary shape mesh
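
A minimal sketch of the scattering at one junction of such a mesh follows. It uses the standard lossless junction equations for digital waveguides, which are assumed here rather than quoted from the text above; the function name is hypothetical.

```python
def scatter(incoming, impedances):
    """Lossless scattering at one N-port waveguide junction.

    incoming: pressure arriving from each connected waveguide.
    impedances: impedance of each connected waveguide.
    Returns (junction_pressure, outgoing), where outgoing[i] is the
    pressure sent back along waveguide i.
    """
    admittance_sum = sum(1.0 / z for z in impedances)
    weighted_in = sum(p / z for p, z in zip(incoming, impedances))
    p_junction = 2.0 * weighted_in / admittance_sum
    outgoing = [p_junction - p for p in incoming]
    return p_junction, outgoing
```

With four equal impedances, as in the 4-port junction of Figure 3(a), a unit pressure arriving on one port gives a junction pressure of 0.5, with -0.5 reflected back along that port and 0.5 transmitted along each of the other three.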

An acoustical physical model of the vocal tract has been constructed using a 2D DWM to model the tube between the glottis and the lips. As demonstrated in Figure 4, area function data is converted into mesh width (diagram taken from [1]). Results so far have shown that similar formant patterns to those generated with the 1D model can be achieved. The 2D model also presents a linear bandwidth response to changes in the additional boundary reflection parameter.

Fig. 4 From /i/ vowel area function to widthwise mapped 2D DWM vocal tract model
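
Fitting a width profile onto a rectilinear mesh means rounding each width to a whole number of junctions, which can be sketched as follows. The sketch assumes the commonly quoted relation fs = c * sqrt(2) / d between sampling frequency and junction spacing for a 2D rectilinear mesh, a speed of sound of 343 m/s, and hypothetical width values. The conversion from area to width is not given here, so the code starts from widths.

```python
import math

C = 343.0  # assumed speed of sound in m/s


def mesh_spacing(fs, c=C):
    """Junction spacing d of a 2D rectilinear mesh run at sampling frequency fs."""
    return c * math.sqrt(2.0) / fs


def junctions_across(widths_m, fs, c=C):
    """Round each tract width to a whole number of junctions (at least one)."""
    d = mesh_spacing(fs, c)
    return [max(1, round(w / d)) for w in widths_m]
```

At 44.1 kHz the spacing is about 1.1 cm, so a hypothetical 0.5 cm constriction collapses to a single junction, whereas at 176.4 kHz it spans two. This is one way to see why a width-mapped mesh pushes towards high sampling frequencies, and why changing the shape means adding or removing junctions.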

The disadvantage of this model is that the DWM has a static structure, so the air cavity contained within it cannot be dynamically manipulated without introducing undesirable waveform discontinuities into the system.

A novel alternative method of applying the vocal tract area function to the model has been developed. Figure 5 shows how the area function is mapped onto the impedance of the waveguides of a rectangular mesh (diagram taken from [1]). This acts to impart the effects of the tract shape onto the mesh such that accurate formant synthesis is achieved. The structure retains its shape, whilst changes to the modelled space are made through the impedances. This allows for stable, dynamic manipulation of the model in real time and facilitates synthesis of vowel diphthongs and plosive articulations.

Fig. 5 From /i/ vowel area function to impedance mapped 2D DWM vocal tract model
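
The idea of changing impedances rather than mesh shape can be sketched as below. This is a simplified illustration only: it assumes impedance inversely proportional to area, the same impedance across the full width of each column, and linear interpolation between two area functions. None of these details are stated above, and all names and values are hypothetical.

```python
def interpolate_areas(a_start, a_end, t):
    """Blend two area functions; t = 0 gives a_start, t = 1 gives a_end."""
    return [(1.0 - t) * a + t * b for a, b in zip(a_start, a_end)]


def impedance_map(areas, n_across, rho_c=1.0):
    """Impedance for every junction of a fixed rectangular mesh.

    One column per area sample along the tract, n_across junctions per
    column. Assumes Z = rho_c / A, uniform across each column.
    """
    return [[rho_c / a for _ in range(n_across)] for a in areas]
```

Calling impedance_map(interpolate_areas(a, b, t), n_across) for t stepping from 0 to 1 yields a sequence of impedance maps of identical dimensions, so a glide between two vowel shapes never adds or removes junctions.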



Audio Examples

Application of the LF glottal waveform results in production of the following synthesised vowels.

'bead' 'bard' 'booed' 'book' 'but'
'ball' 'bat' 'bed' 'bird' 'bit'
Tab. 1 2D Width Mapped Vowels

'bead' 'bard' 'booed'
'ball' 'bat' 'bed'
Tab. 2 2D Impedance Mapped Vowels


Finally, applying some vibrato and pitch change for increased naturalness...

'bead' 'bard' 'booed' 'book'
'but' 'bed' 'bird' 'bit'
Tab. 3 2D Width Mapped Vowels With Vibrato

'bead' 'bard' 'booed' 'book'
'but' 'bed' 'bird' 'bit'
Tab. 4 2D Impedance Mapped Vowels With Vibrato
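
The vibrato and pitch change mentioned above could be applied to the fundamental frequency of the glottal source roughly as follows. This is a generic sketch with a sinusoidal vibrato and a linear pitch glide; the function name and all parameter values are assumptions, and it does not reproduce the settings used for the audio examples.

```python
import math


def f0_contour(f_start, f_end, depth, vib_rate, fs, duration):
    """Fundamental frequency per sample, with a linear glide and vibrato.

    f_start, f_end: pitch at the start and end of the note, in Hz.
    depth: vibrato depth as a fraction of the pitch (0.05 = 5 percent).
    vib_rate: vibrato rate in Hz.
    fs, duration: sampling frequency in Hz and note length in seconds.
    """
    n_samples = int(round(fs * duration))
    contour = []
    for n in range(n_samples):
        t = n / fs
        glide = f_start + (f_end - f_start) * t / duration
        vibrato = 1.0 + depth * math.sin(2.0 * math.pi * vib_rate * t)
        contour.append(glide * vibrato)
    return contour
```

The resulting contour would then set the period of each glottal pulse before the excitation is fed into the tract model.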

Probably the most natural vowel sound I've made with the width mapped mesh - 'bard'

And lastly, here's an example of the real-time software output, created with slides between various vowels and some plosive articulation generated by moving the sliders - output



Download

The software generated for this project can be downloaded for use here

  • Unzip the file and double click VocalModel.exe.
  • Select either a 1D or 2D model and the required sampling frequency on the bottom left-hand side, then click Run/Stop on the bottom right.
  • Toggle the Glottis and Noise buttons on the left hand side to alter the excitation manner. 
  • You may then select which vowel area function is used in the model from one of the buttons at the top.
  • The area function can then be adjusted at any point along the tract using the sliders.

Disclaimer: The software has been constructed to run on a standard PC with Windows. It is a testbed for the ideas generated in the research project. It has not been fully tested and no guarantees can be made on its robustness or safety. Downloaders do so at their own risk.



Publications and Presentations from the Project:

Mullen, J., "Physical Modelling of the Vocal Tract with the 2D Digital Waveguide Mesh", A thesis submitted in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Electronics, University of York, April 2006. Download - 3Mb.

Mullen, J., Howard, D. M., and Murphy, D. T., "Real-Time Dynamic Articulations in the 2D Waveguide Mesh Vocal Tract Model", IEEE Transactions on Audio, Speech and Language Processing, In Press, 2007.

Mullen, J., Howard, D. M., and Murphy, D. T., "Waveguide Physical Modeling of Vocal Tract Acoustics: Flexible Formant Bandwidth Control From Increased Model Dimensionality", IEEE Transactions on Audio, Speech and Language Processing, vol. 14, no. 3, pp. 964-971, 2006.

Mullen, J., "Multidimensional Waveguide Physical Modeling of the Vocal Tract", Presented at A STINT on Voice Research, York, UK, in association with The British Voice Association and KTH University, Stockholm, Sweden.

Mullen, J., Howard, D. M., and Murphy, D. T., "Acoustical Simulations of the Human Vocal Tract using the 1D and 2D Digital Waveguide Software Model", Proceedings of the 7th International Conference on Digital Audio Effects (DAFX-04), pp. 311-314, Naples, Italy, Oct. 5-8, 2004.

Mullen, J., Murphy, D. T., and Howard, D. M., "Digital Waveguide Mesh Modelling of the Vocal Tract Acoustics", Proceedings of the 2003 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pp. 163-168, New Paltz, NY, USA, Oct. 19-22, 2003.



Additional Contact Details:

Jack Mullen
e: jackmullen[@]postmaster.co.uk

This work has been conducted under the supervision of:

Prof. David M. Howard
e: dh[@]ohm.york.ac.uk
w: http://www-users.york.ac.uk/~dmh8

and

Dr. Damian T. Murphy
who is maintaining this website and can be contacted above.
dtm, August 2007