Vocal Tract Modelling

This page gives an overview of the projects we have run at York on physical modelling of the vocal tract, building on the original PhD work of Jack Mullen, with supporting web pages, examples and downloadable content.

The 2-D Digital Waveguide Mesh Vocal Tract

A real-time dynamic simulation of the vocal tract, implemented using a 2-D digital waveguide mesh, offering a comparison with, and an improvement over, the more traditional 1-D Kelly-Lochbaum model.
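As a rough illustration of how the two models relate, the sketch below implements a generic lossless scattering junction for pressure waves. With two ports it reduces to the junction between two sections of a 1-D Kelly-Lochbaum tube, and with four ports it is a junction in a 2-D rectilinear mesh. This is a minimal sketch of the standard textbook formulation, not the code used in the publications below; the function name `scatter` and all numerical values are assumptions chosen for illustration.

```python
def scatter(incoming, admittances):
    """Lossless N-port scattering junction for pressure waves.

    incoming:    pressure waves arriving at the junction, one per port
    admittances: admittance of each connected waveguide (same order)
    Returns (junction_pressure, outgoing_waves).
    """
    total_admittance = sum(admittances)
    # Junction pressure: twice the admittance-weighted mean of the incoming waves.
    weighted = sum(y * p for y, p in zip(admittances, incoming))
    p_junction = 2.0 * weighted / total_admittance
    # Each outgoing wave is the junction pressure minus that port's incoming wave.
    outgoing = [p_junction - p for p in incoming]
    return p_junction, outgoing


# 1-D case: two tube sections with illustrative areas A1 and A2.
# Admittance is proportional to cross-sectional area, so the areas
# can stand in for the admittances here.
A1, A2 = 3.0, 1.0
_, out_1d = scatter([1.0, 0.0], [A1, A2])
reflected, transmitted = out_1d  # (A1 - A2)/(A1 + A2) and 2*A1/(A1 + A2)

# 2-D case: a rectilinear mesh junction with four equal-admittance neighbours.
p_j, out_2d = scatter([1.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0])
```

In the two-port case the reflected wave equals the familiar Kelly-Lochbaum reflection coefficient for pressure, (A1 - A2)/(A1 + A2). The four-port case shows how a single incoming wave is spread across all neighbours, which is what lets the 2-D mesh represent propagation across the width of the tract as well as along it.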

Key Publications:


Mullen, J., Howard, D.M., and Murphy, D.T., "Real-Time Dynamic Articulations in the 2D Waveguide Mesh Vocal Tract Model", IEEE Transactions on Audio, Speech and Language Processing, vol. 15, no. 2, pp. 577-585, 2007, [DOI].

Mullen, J., Howard, D.M., and Murphy, D.T., "Waveguide Physical Modeling of Vocal Tract Acoustics: Flexible Formant Bandwidth Control From Increased Model Dimensionality", IEEE Transactions on Audio, Speech and Language Processing, vol. 14, no. 3, pp. 964-971, 2006, [DOI].

The Dynamic Digital Waveguide Mesh

An implementation of the digital waveguide mesh that enables dynamic variation, based on Mullen et al. 2007, as listed above, and now used for other applications, including a first attempt at articulatory vocal tract synthesis.

Key Publications:


Murphy, D.T., Shelley, S., and Ternström, S., "The Dynamically Varying Digital Waveguide Mesh", Proc. of the 19th Int. Congress on Acoustics, Madrid, Spain, September 2-7, 2007 [Invited Paper].

Murphy, D.T., Kelloniemi, A., Mullen, J., and Shelley, S., "Acoustic Modeling using the Digital Waveguide Mesh", IEEE Signal Processing Magazine, vol. 24, no. 2, pp. 55-66, March 2007 [Invited Paper], [DOI].

3-D Vocal Tract Models based on MRI

The 2-D digital waveguide mesh vocal tract was developed into a comparable 3-D model as part of Matt Speed's PhD work. Initially this implementation was tested using 3-D acrylic tube models, and then using 3-D geometries obtained from vocal tract MRI measurements of professional singers. The results are verified using acoustic measurements/recordings obtained under comparable conditions.
 

Key Publications:


Speed, M., Murphy, D.T., Howard, D.M., "Modeling the Vocal Tract Transfer Function using a 3D Digital Waveguide Mesh", IEEE Transactions on Audio, Speech, and Language Processing, vol. 22, no. 2, pp. 453-464, Feb. 2014, [DOI].

Speed, M., Murphy, D.T., Howard, D.M., "Three-Dimensional Digital Waveguide Mesh Simulation of Cylindrical Vocal Tract Analogs", IEEE Transactions on Audio, Speech and Language Processing, vol. 21, no. 2, pp. 449-455, Feb. 2013, [DOI].

Articulatory Vocal Tract Synthesis in SuperCollider

2-D vocal tract articulation was first attempted in Murphy, Shelley and Ternström (2007), as listed above, where we used the APEX system to generate cross-sectional area functions as input to our 2-D dynamic digital waveguide mesh model of the vocal tract. APEX has since been updated for implementation in SuperCollider, and in the paper listed below it is used to control a traditional 1-D Kelly-Lochbaum tube model. The goal is to implement our 2-D model in SuperCollider, as this framework proves to be a useful control and synthesis paradigm.
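To make the link between an area function and a 1-D tube model concrete, the sketch below converts a list of cross-sectional areas into the reflection coefficients at the boundaries between adjacent tube sections. It is written in Python for illustration only and is not taken from the SuperCollider implementation; the function name `reflection_coefficients` and the area values are assumptions, and the sign convention shown is the one for pressure waves (volume-velocity formulations use the opposite sign).

```python
def reflection_coefficients(areas):
    """Pressure-wave reflection coefficient at each boundary between
    adjacent tube sections, for a wave travelling from section i to i+1."""
    if any(a <= 0 for a in areas):
        raise ValueError("cross-sectional areas must be positive")
    # One coefficient per pair of neighbouring sections.
    return [(a - b) / (a + b) for a, b in zip(areas, areas[1:])]


# Hypothetical area function in cm^2, ordered from glottis to lips.
area_function = [2.0, 2.0, 1.0, 3.0]
k = reflection_coefficients(area_function)
```

A boundary between two sections of equal area gives a coefficient of zero, so no energy is reflected there. A constriction gives a positive coefficient and an expansion a negative one. Updating the area function over time and recomputing these coefficients is one simple way to drive such a model from an articulatory controller.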

Key Publications:


Murphy, D. T., Mátyás, J., and Ternström, S., "Articulatory vocal tract synthesis in Supercollider", Proc. of the 18th Int. Conference on Digital Audio Effects (DAFx-15), pp. 307-313, Trondheim, Norway, Nov. 30-Dec. 3, 2015.

Supporting SuperCollider scripts and source code are available here.

3-D Dynamic Vocal Tract Models based on MRI

A development of the work of Speed et al., which explored 3-D static vocal tract models from MRI data, and Mullen et al., which developed a 2-D dynamic vocal tract model. This research project explores a 3-D dynamic vocal tract model as the next logical step. The approach to working with the MRI data and developing the vocal tract models is outlined in the main paper below (with accompanying data) and compared against these previous methods, including benchmarking, in part, against a detailed FEM model. Methods for articulatory control of such models have also been investigated.

Key Publications:


Gully, A. J., Daffern, H., and Murphy, D. T., “Diphthong Synthesis Using the Dynamic 3D Digital Waveguide Mesh”, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 26, no. 2, pp. 243-255, Feb. 2018. [DOI]. Supporting Materials for this article:
    Dataset and MATLAB Scripts [DOI].

Gully, A. J., Yoshimura, T., Murphy, D. T., Hashimoto, K., Nankaku, Y., and Tokuda, K., “Articulatory Text-to-Speech Synthesis Using the Digital Waveguide Mesh Driven by a Deep Neural Network”, Proc. of InterSpeech 2017, pp. 234-238, Stockholm, Sweden, Aug. 20-24, 2017. [DOI].