Project Publications
Conference Papers

1998

"ProSynth: An Integrated Prosodic Approach to Device-Independent, Natural-Sounding Speech Synthesis."

ABSTRACT
This paper outlines ProSynth, an approach to speech synthesis which takes a rich linguistic structure as central to the generation of natural-sounding speech. We start from the assumption that the speech signal is informationally rich, and that this acoustic richness reflects linguistic structural richness and underlies the percept of naturalness. Naturalness achieved by structural richness produces a perceptually robust signal intelligible in adverse listening conditions. ProSynth uses syntactic and phonological parses to model the fine acoustic-phonetic detail of real speech, segmentally, temporally and intonationally.
[International Conference Speech and Language Processing (1998)]
Download PDF

"Hybrid approach to high-quality formant synthesis using HLsyn"

ABSTRACT
Procsy is a hybrid method of automatically producing natural-sounding formant-based synthetic speech from an existing speech signal by using copy-synthesis and estimated articulatory trajectories as input to the HLsyn synthesizer (Sensimetrics Corporation). The purpose is to allow controlled manipulation of selected acoustic parameters. Parameters for HLsyn are derived from labelled speech files in two ways. Broadly, vowels and approximants are copy-synthesized from the acoustic signal, while obstruents and nasals are synthesized by rule: articulatory trajectories and constriction areas are estimated from the segment label and duration, and converted into HL parameter values. HLsyn combines information from both sources to calculate parameter values for a Klatt-type synthesizer. Strengths of the method are (i) simple HLsyn input captures acoustically complex obstruents, and (ii) HLsyn parameters automatically produce complex acoustic properties that accompany consonantal closures, especially at segment boundaries.
These properties are hard to synthesize and thus typically absent in formant TTS, yet they provide some of the systematic variability we hypothesize contributes to robust, natural-sounding synthesis. Potential applications are discussed.
[3rd ESCA/COCOSDA International Workshop on Speech Synthesis (1998)]
Download PDF

1999

"Synthesizing Systematic Variation At Boundaries Between Vowels and Obstruents"

ABSTRACT
This work assesses whether natural-sounding excitation near segment boundaries enhances the intelligibility of formant synthesis. Excitation type at fricative-vowel (FV) and vowel-fricative (VF) boundaries and durations of voicing in voiced stop closures are described for one male speaker of British English. Most VF boundaries have mixed aperiodic and periodic excitation, whereas most FV boundaries change abruptly from aperiodic to periodic excitation. Syllable stress, vowel height, and final/non-final position within the phrase influenced the incidence and duration of mixed excitation. Voicing in stop closures varied in well-understood ways. Synthesized phrases proved more intelligible in noise when excitation at fricative boundaries and in voiced stop closures was structurally appropriate. Implications for formant synthesis are discussed.

[...] come from the detailed waveshape. The waveform amplitude envelope provides useful perceptual information to listeners. The waveshape at segment boundaries seems especially likely to contribute perceptual coherence, and hence more natural-sounding and robust synthetic speech. For example, the abruptness of voicing offset reliably differentiates between voiced and voiceless fricatives at vowel-fricative boundaries, and in the vicinity of the boundaries between obstruents and non-obstruent articulations there are often regions of mixed periodic and aperiodic excitation as the vocal tract opens or closes through a critical range of constriction areas.
Acoustic patterns of these types may contribute to perceptual coherence by enhancing stream segregation, as for nonspeech sounds.
[XIVth International Congress of Phonetic Sciences (1999)]
Download PDF

"Intonation modelling in ProSynth: an integrated prosodic approach to speech synthesis"

ABSTRACT
Intonation modelling in ProSynth involves mapping the defining characteristics of an F0 contour on to the constituents of a hierarchical prosodic structure, which constitutes our core linguistic representation. The paper describes the use of a labelled speech database exemplifying selected structures to create a template for a particular pitch pattern in a given context, and the observed systematic structural effects on the alignment and shape of that template. The research confirms the importance of structural domains in determining systematic variation in pitch accent realization. Implemented in XML, our structure integrates intonational, temporal and segmental information to determine coherent parameter values for synthesis.
[XIVth International Congress of Phonetic Sciences (1999)]
Download PDF
Also see Poster PDF.

"Temporal Interpretation in ProSynth, a prosodic speech synthesis system"

ABSTRACT
ProSynth is an approach to speech synthesis which takes a rich linguistic structure as central to the generation of natural-sounding speech. This paper outlines the model of temporal interpretation employed in ProSynth in generating polysyllabic utterances, and the phonological structures used to drive the synthesis. We start from the assumption that the speech signal is informationally rich, and that this acoustic richness reflects linguistic structural richness. The primary timing unit is the syllable, situated within a prosodic hierarchy. Two mechanisms are used for timing: (1) syllables are joined by overlaying one on another; (2) syllables are temporally compressed to produce the correct rhythmical effects.
[XIVth International Congress of Phonetic Sciences (1999)]
Download PDF

"Representation and processing of linguistic structures for an all-prosodic synthesis system using XML"

ABSTRACT
The ProSynth speech synthesis project aims to re-implement and extend the YorkTalk all-prosodic synthesis system in an open manner, preserving its most appealing theoretical aspects. A significant novel aspect of the architecture of ProSynth is the use of the extensible mark-up language (XML) as a computational formalism for the representation of hierarchical linguistic structures. The facilities provided by XML match closely the requirements to represent the phonological features of an utterance in a metrical prosodic structure, namely: nodes described by attribute-value pairs forming strict hierarchies. The XML formalism also leads to an elegant and efficient method for representing declarative phonological contexts under which phonetic interpretation is performed.
[EuroSpeech 99, Budapest]
Download PDF

Book Chapters

1999

"PROCSY, a hybrid approach to high quality formant synthesis using HLsyn"

ABSTRACT
Procsy is a hybrid method of automatically producing natural-sounding formant-based synthetic speech from an existing speech signal by using copy-synthesis and estimated articulatory trajectories as input to the HLsyn™ synthesizer. The purpose is to allow controlled manipulation of selected acoustic parameters. Parameters for HLsyn are derived from prosodically parsed and labelled speech files in two ways. Broadly, vowels and approximants are copy-synthesized from the acoustic signal, while obstruents and nasals are synthesized by rule: articulatory trajectories and constriction areas are estimated from the segment label and duration, together with attributes such as syllable stress where relevant, and converted into HL parameter values. HLsyn combines information from both sources to calculate parameter values for a Klatt-type synthesizer. Strengths of the method are (i) simple HLsyn input captures acoustically complex obstruents, and (ii) HLsyn parameters automatically produce complex acoustic properties that accompany consonantal closures, especially at segment boundaries. These properties are hard to synthesize and thus typically absent in formant TTS, yet they provide some of the systematic variability we hypothesize contributes to robust, natural-sounding synthesis. Potential applications are discussed.
[Proceedings of 3rd ESCA Workshop on Speech Synthesis]
Download PDF
DO NOT QUOTE WITHOUT PERMISSION

2000

ProSynth: An Integrated Prosodic Approach to Device-Independent, Natural-Sounding Speech Synthesis.
Ogden, R., Hawkins, S., House, J., Huckvale, M., Local, J., Carter, P., Dankovicova, J., and Heid, S.
Computer Speech and Language 14, 177-210.

Project Reports

1999

Report on perceptual evaluation

ABSTRACT
This report summarises the work to date on the perceptual evaluation of ProSynth. Four preliminary perceptual experiments establish that the principles being implemented increase the intelligibility and/or naturalness of synthetic speech. Experiments 1 and 2 tested the intelligibility in noise of segmental detail and timing respectively. Experiments 3 and 4 tested aspects of intonation: Experiment 3 assessed naturalness, or more properly, neutralness; Experiment 4 measured reaction time to answer questions about read stories.
Download HTML

Report on resonance effect research

ABSTRACT
This paper describes work on resonance effects due to liquids (long-domain coarticulation) at the Phonetics Laboratory, University of Cambridge. The work is part of the ProSynth project funded by EPSRC Grant GR/L53069. Previous research shows that when an English utterance contains an /r/ or /l/, certain acoustic correlates of that segment appear in other segments, not just in the same or adjacent syllables, but sometimes several syllables earlier. The intelligibility of synthetic speech increases significantly when these "liquid resonance effects" (distributed acoustic properties due to /r/ and /l/) are included. This paper presents work aimed at determining factors that influence the spread of acoustic properties that reflect liquid resonance effects. The work establishes that resonance effects can spread through stressed syllables in some but not all circumstances. The frequency of F4 generally shows larger and more stable effects than lower formants. A number of other structural and segmental influences have been identified, and are still under analysis.
Download HTML

Report on phonological structure and temporal modelling

ABSTRACT
This report describes some of the work conducted at York as part of the ProSynth project. York's primary role has been to specify the phonological grammar and to develop and implement a temporal model for synthesis.
Download HTML

Use of XML in ProSynth Project

ABSTRACT
This page gives information about the use of XML in the ProSynth project: it motivates the use of XML in the project and gives links to internal and external resources.
Download HTML
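
As a rough illustration of the idea described in the XML-related entries above (nodes carrying attribute-value pairs arranged in a strict hierarchy, with timing derived from that structure), the following Python sketch may help. It is not ProSynth's actual markup or timing model: the element names (`utt`, `foot`, `syl`), the attributes (`strength`, `dur`), the durations, and the compression rule and factor are all hypothetical, chosen only to show the general shape of the approach.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: element and attribute names are illustrative
# assumptions, not the ProSynth DTD. Durations are in milliseconds.
UTTERANCE = """
<utt>
  <foot>
    <syl strength="strong" dur="200"/>
    <syl strength="weak" dur="120"/>
  </foot>
  <foot>
    <syl strength="strong" dur="180"/>
  </foot>
</utt>
"""


def compress_weak_syllables(root, factor):
    """Scale the duration of every weak syllable by `factor`.

    This rule is an assumed stand-in for temporal compression; the
    real model is described in the papers listed above.
    """
    for syl in root.iter("syl"):
        if syl.get("strength") == "weak":
            new_dur = round(float(syl.get("dur")) * factor)
            syl.set("dur", str(new_dur))


def total_duration(root):
    """Sum the durations of all syllables in the hierarchy."""
    return sum(int(syl.get("dur")) for syl in root.iter("syl"))


root = ET.fromstring(UTTERANCE)
compress_weak_syllables(root, 0.75)
durations = [int(syl.get("dur")) for syl in root.iter("syl")]
print(durations, total_duration(root))
```

Because each node holds its own attributes, a rule such as the one above can select nodes by context (here, by `strength`) and rewrite their values without touching the rest of the tree.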
Last changed: 27 Feb 2000