Vocal Tract Modelling
This page gives an overview of the projects we have run at York on
physical modelling of the vocal tract, building on the original PhD
work of Jack Mullen, with supporting web pages, examples and
downloadable content.
The 2-D Digital Waveguide Mesh Vocal Tract
A real-time dynamic simulation of the vocal tract implemented using a
2-D digital waveguide mesh, which can be compared with, and improves
on, the more traditional 1-D Kelly-Lochbaum model.
Key Publications:
Mullen, J., Howard, D.M., and Murphy, D.T., "Real-Time Dynamic Articulations in the 2D Waveguide Mesh Vocal Tract Model", IEEE Transactions on Audio, Speech and Language Processing, vol. 15, no. 2, pp. 577-585, 2007, [DOI].
Mullen, J., Howard, D.M., and Murphy, D.T., "Waveguide Physical Modeling of Vocal Tract Acoustics: Flexible Formant Bandwidth Control From Increased Model Dimensionality", IEEE Transactions on Audio, Speech and Language Processing, vol. 14, no. 3, pp. 964-971, 2006, [DOI].
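To make the idea of a 2-D digital waveguide mesh concrete, the following is a minimal sketch of one time step of a uniform rectilinear mesh, written in its equivalent finite-difference form. It is not the implementation from the papers above: it assumes equal impedance at every junction and zero-pressure edges, so it has none of the impedance mapping or dynamic articulation of the vocal tract model. The function names `mesh_step` and `simulate` are hypothetical.

```python
import numpy as np


def mesh_step(p_now, p_prev):
    """Advance a uniform 2-D rectilinear waveguide mesh by one sample.

    Each interior junction takes half the sum of its four neighbours at
    the current step, minus its own pressure at the previous step.
    Edge junctions are held at zero pressure (an assumption made here
    purely for simplicity).
    """
    p_next = np.zeros_like(p_now)
    neighbours = (
        p_now[:-2, 1:-1] + p_now[2:, 1:-1]    # junctions above and below
        + p_now[1:-1, :-2] + p_now[1:-1, 2:]  # junctions left and right
    )
    p_next[1:-1, 1:-1] = 0.5 * neighbours - p_prev[1:-1, 1:-1]
    return p_next


def simulate(shape=(9, 9), steps=3):
    """Excite the centre junction with a unit impulse and run the mesh."""
    p_prev = np.zeros(shape)
    p_now = np.zeros(shape)
    p_now[shape[0] // 2, shape[1] // 2] = 1.0
    for _ in range(steps):
        p_now, p_prev = mesh_step(p_now, p_prev), p_now
    return p_now


if __name__ == "__main__":
    print(simulate(steps=1))
```

After one step the impulse has left the centre junction and spread equally to its four neighbours, which is the 2-D counterpart of a wave travelling along a 1-D Kelly-Lochbaum delay line.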
The Dynamic Digital Waveguide Mesh
An implementation of the digital waveguide mesh that enables dynamic
variation, based on Mullen et al. 2007, as listed above, and now used
for other applications, including a first attempt at articulatory
vocal tract synthesis.
Key Publications:
Murphy, D.T., Shelley, S., and Ternström, S., "The Dynamically Varying Digital Waveguide Mesh", Proc. of the 19th Int. Congress on Acoustics, Madrid, Spain, September 2-7, 2007 [Invited Paper].
Murphy, D.T., Kelloniemi, A., Mullen, J., and Shelley, S., "Acoustic Modeling using the Digital Waveguide Mesh", IEEE Signal Processing Magazine, vol. 24, no. 2, pp. 55-66, March 2007 [Invited Paper], [DOI].
3-D Vocal Tract Models based on MRI
The 2-D digital waveguide mesh vocal tract was developed into a
comparable 3-D model as part of Matt Speed's PhD work. Initially this
implementation was tested using 3-D acrylic tube models, and then
using 3-D geometries obtained from vocal tract MRI measurements of
professional singers. The results are verified using acoustic
measurements/recordings obtained under comparable conditions.
Key Publications:
Speed, M., Murphy, D.T., Howard, D.M., "Modeling the Vocal Tract Transfer Function using a 3D Digital Waveguide Mesh", IEEE Transactions on Audio, Speech, and Language Processing, vol. 22, no. 2, pp. 453-464, Feb. 2014, [DOI].
Speed, M., Murphy, D.T., Howard, D.M., "Three-Dimensional Digital Waveguide Mesh Simulation of Cylindrical Vocal Tract Analogs", IEEE Transactions on Audio, Speech and Language Processing, vol. 21, no. 2, pp. 449-455, Feb. 2013, [DOI].
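One practical consequence of moving from 2-D to 3-D is the link between mesh resolution and sample rate. As a rough sketch, assuming a rectilinear mesh topology (the text above does not say which topology was used), the update rate of a D-dimensional mesh with junction spacing d is fs = c * sqrt(D) / d. The function name `mesh_sample_rate` and the default speed of sound of 350 m/s are illustrative assumptions, not values taken from the papers.

```python
import math


def mesh_sample_rate(spacing_m, dims=3, c=350.0):
    """Sample rate (Hz) of a rectilinear waveguide mesh.

    spacing_m: distance between adjacent junctions, in metres.
    dims:      number of spatial dimensions of the mesh (1, 2 or 3).
    c:         assumed speed of sound in m/s (illustrative default).
    """
    if spacing_m <= 0:
        raise ValueError("spacing_m must be positive")
    if dims not in (1, 2, 3):
        raise ValueError("dims must be 1, 2 or 3")
    return c * math.sqrt(dims) / spacing_m


if __name__ == "__main__":
    for dims in (1, 2, 3):
        print(dims, round(mesh_sample_rate(0.005, dims=dims)))
```

For a fixed junction spacing, the required sample rate grows with the square root of the dimensionality, on top of the much larger number of junctions that a 3-D geometry needs.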
Articulatory Vocal Tract Synthesis in SuperCollider
2-D vocal tract articulation was first attempted in Murphy, Shelley
and Ternström (2007), as listed above, where we used the APEX system
to generate cross-sectional area function information for input into
our 2-D dynamic digital waveguide mesh model of the vocal tract. APEX
has since been updated for implementation in SuperCollider and, in the
paper listed below, is used to control a traditional 1-D
Kelly-Lochbaum tube model. The goal is to implement our 2-D model in
SuperCollider, as this framework has proved useful for both control
and synthesis.
Key Publications:
Murphy, D. T., Mátyás, J., and Ternström, S., "Articulatory vocal tract synthesis in Supercollider", Proc. of the 18th Int. Conference on Digital Audio Effects (DAFx-15), pp. 307-313, Trondheim, Norway, Nov. 30-Dec. 3, 2015.
Supporting SuperCollider scripts and source code available here.
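As an illustration of how a cross-sectional area function drives a 1-D Kelly-Lochbaum tube model, the sketch below computes the reflection coefficient at each junction between adjacent tube sections. It is written in Python rather than SuperCollider and is not taken from the supporting scripts. The function name `reflection_coefficients` is hypothetical, and the sign convention (pressure waves, r = (A_k - A_k+1) / (A_k + A_k+1)) is an assumption; a volume-velocity formulation flips the sign.

```python
def reflection_coefficients(areas):
    """Reflection coefficients between adjacent tube sections.

    areas: cross-sectional areas ordered along the tube (any consistent
           unit, e.g. cm^2). Every area must be positive.
    Returns one coefficient per junction, so len(areas) - 1 values.
    """
    if any(a <= 0 for a in areas):
        raise ValueError("all areas must be positive")
    return [(a0 - a1) / (a0 + a1) for a0, a1 in zip(areas, areas[1:])]


if __name__ == "__main__":
    # Hypothetical three-section tube: two equal sections, then a widening.
    print(reflection_coefficients([1.0, 1.0, 3.0]))
```

Equal adjacent areas give a coefficient of zero, meaning the wave passes through unchanged, while a change in area reflects part of the wave. Updating the area function over time is what turns a static tube into an articulated one.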
3-D Dynamic Vocal Tract Models based on MRI
This project builds on the work of Speed et al., which explored static
3-D vocal tract models derived from MRI data, and of Mullen et al.,
which developed a dynamic 2-D vocal tract model. The next logical
step, a dynamic 3-D vocal tract model, is explored here. The approach
to working with the MRI data and developing the vocal tract models is
outlined in the main paper below (with accompanying data), where it is
compared against these previous methods and benchmarked, in part,
against a detailed FEM model. Articulatory control methods for such
models have also been investigated.
Key Publications:
Gully, A. J., Daffern, H., and Murphy, D. T., "Diphthong Synthesis Using the Dynamic 3D Digital Waveguide Mesh", IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 26, no. 2, pp. 243-255, Feb. 2018, [DOI].
Supporting Materials for this article: Dataset and MATLAB Scripts [DOI].
Gully, A. J., Yoshimura, T., Murphy, D. T., Hashimoto, K., Nankaku, Y., and Tokuda, K., "Articulatory Text-to-Speech Synthesis Using the Digital Waveguide Mesh Driven by a Deep Neural Network", Proc. of InterSpeech 2017, pp. 234-238, Stockholm, Sweden, Aug. 20-24, 2017, [DOI].