Phonetic and interactional features of attitude in everyday conversation

Overview and objectives

The purpose of the research is to advance understanding of the ways speakers and listeners use clusters of phonetic parameters in shaping and interpreting talk in natural conversation. Specifically, we will investigate how participants engaged in everyday conversation encode and decode attitude in their interactions.

The proposed research will focus on two kinds of sequence which occur in everyday conversation: those where participants display a particular attitude (e.g. in responding to good and bad news) and those where participants claim a particular attitude (e.g. via some explicit lexical formulation). The research will provide an account of

  1. the linguistic-phonetic and sequential-interactional features involved in displays and claims of attitude;
  2. the interactional uses of displays and claims of attitude.
One distinctive aspect of the research is that all parts of the analysis (linguistic-phonetic and interactional) will be grounded firmly in the observable behaviour of ordinary people in everyday conversation.

Background

Attitude is widely acknowledged as making an important contribution to the meanings which can be attributed to utterances.1 Linguists have a long-standing interest in the expression of attitude and their analyses regularly appeal to speaker attitude in determining the meaning of utterances. For instance, in intonation studies there is a continuing tradition of employing lay attitudinal categories (e.g. ``challenging'', ``surprised'', ``sad'', ``involved'', ``uncertain'') in trying to account for the distribution and meaning of intonation contours (Cruttenden 1997; Schubiger 1958; Pierrehumbert and Hirschberg 1990; Ladd 1986). Within pragmatics, too, claims about particular pragmatic practices and stylistic effects (e.g. epistemic markers, facticity, irony, politeness, reported speech, sarcasm) and the intended force of utterances are routinely linked to speaker attitude (Mey 1993; Sperber and Wilson 1986; Leech 1983; Blakemore 1992).

The contribution of attitude to meaning is particularly evident where lexically identical utterances have different meanings and those differences in meaning are claimed to be the result of the phonetic design of those utterances indexing different attitudes. In vernacular terms, these different attitudes are indexed through the production of lexically identical utterances with a different ``tone of voice''.

To date, most of the systematic work on the phonetic correlates of attitude has come from linguistic and social-psychological research into affectual states (Tolkmitt et al. 1988; Davitz 1964; Banse and Scherer 1996; Cowie et al. 2000; Cowie and Cornelius 2003; Roach et al. 1998; Douglas-Cowie et al. 2003 and Scherer 2003 provide comprehensive reviews of previous work, thematic issues and challenges). Although this work has afforded a number of insights into the phonetic correlates of speaker attitude and the affectual aspects of speech, it has relied on a variety of experimental methodologies. The most prominent of these has been the use of actors to simulate extreme or archetypal attitudes and emotions (e.g. Engberg et al. 1997; Banse and Scherer 1996) in non-interactional circumstances.

Importantly, the categories mobilised in dealing with data produced in experimental contexts are not warranted in, or excavated from, the behaviour of participants in everyday conversation: ``the most natural, the most frequent, and the most widespread occurrences of spoken language'' (Abercrombie 1965: 3). Even in those cases where attention has been directed at corpora of naturally occurring speech (see e.g. Roach et al. 1998; Douglas-Cowie et al. 2003), researchers have relied on external lay raters to identify or characterise the attitudinal or affectual content, rather than investigating directly the behaviour of the participants engaged in those interactions or the interactional ends to which the expression of attitude might be put. As a result, it is not at all clear whether the findings of such research can be legitimately or usefully applied to the everyday conversational talk of ordinary people (Batliner et al. 2000; Douglas-Cowie et al. 2003).

There are three novel features of the proposed research and its outcomes which distinguish it from investigations into attitude to date:

  1. It will only deal with data drawn from naturally-occurring everyday conversations.
  2. It will employ techniques of linguistic and interactional analysis which allow us to arrive at analytic categories inductively from the behaviour of participants engaged in everyday conversation.
  3. Arising from 2., it will be legitimate to view the specification of linguistic-phonetic and sequential-interactional properties arising from the analysis as having a reality for the participants and forming a part of their core linguistic and communicative competences.

The proposed research is timely given the recent upsurge of interest in the speech synthesis, recognition and speech understanding communities in describing the phonetic correlates of attitude (see e.g. the special volume of Speech Communication, 2003). A good deal of recent research has been dedicated to the development of new databases. The methodology employed in the proposed research will allow us to work in new ways on existing databases. Both the methodology and the analytic results which will arise from it will be of interest to those working in the domains of pragmatics, phonetics, conversation analysis, communication research, speech synthesis, discourse modelling, discursive psychology, and attitude/emotion research. In addition to informing subsequent analyses in those areas, it will also allow us to re-assess the accuracy of past claims concerning the relationship between attitude and linguistic meaning in the light of the observed behaviour of participants engaged in everyday conversation.

Main research focus

In order to further understanding of participants' encoding and decoding of attitude in spontaneous, everyday conversation we aim to provide an integrated phonetic and sequential-interactional account of two different types of sequence, both of which occur in everyday talk-in-interaction. They are

  1. sequences in which participants display a particular attitude in response to good and bad news tellings;
  2. sequences in which participants claim a particular attitude by means of an explicit lexical formulation.
The reasons for selecting these two kinds of sequences are:
  1. They both show orientations by the participants to attitude, albeit in different ways. Individually, they provide a rich environment in which to study participants' encoding and decoding of attitude, where attitude plainly has a relevance for the participants.
  2. They will facilitate comparative analysis of the interactional and linguistic designs of displaying a particular attitude on the one hand and claiming it on the other.
  3. They are readily identifiable and occur frequently in everyday talk. A preliminary data search of a subset of the corpora described in the Data section below has yielded 63 instances of the target phenomena in 5 hours of recordings.

Displayed attitude sequences

Speakers engaged in everyday conversation may display a current attitude. For instance, participants may display attitude via non-lexical (i.e. phonetic and sequential) means.

The first focus of the research will be non-lexical manifestations of attitude in relatively short sequences (i.e. a few turns) of conversation in which good or bad news is told. We refer to these here as sequences involving attitude (mis-)matchings - cases where talk is produced to be, and treated by the participants as, matched (or in some cases mis-matched) with respect to attitudinal aspects of the immediately prior talk. Cases of attitude matching might include the occurrence of a ``sympathetic'' response to the telling of bad news, or a ``joyous'' response to the telling of good news (some analysis of good and bad news is provided by Freese and Maynard 1998 and Maynard 1997).

To take a concrete example: a preliminary data search indicates that ``wow'' can be produced as a response to both good and bad news tellings. Moreover, ``wow'' can receipt both kinds of tellings unproblematically for both participants. That it is unproblematic for the participants is evidenced in part by the news-teller continuing with the next part of the story immediately after ``wow'', irrespective of whether the news is good or bad. As ``wow'' is deployed in response to tellings with different polarities, it might be expected that an attitudinally matched response to good news would index something like ``joy'' while an attitudinally matched response to bad news might be ``sympathy''. Indeed, we find the phonetics of ``wow'' to be different. One case of ``wow'' which receipts good news

Another case of ``wow'' which receipts bad news

The different deployment of phonetic resources in these cases suggests that in order to be attitudinally matched to the prior talk the phonetic features have to be different. A further case of ``wow'' as a response to a news telling supports this initial hypothesis: in response to a good news telling, a ``wow'' is produced which engenders a sequence in which the news-teller, rather than incrementing the story as in the above, treats the ``wow'' response as problematic by ``upgrading'' her telling with a reformulation and, shortly after, bringing her skepticism to the surface by saying ``are you being serious or sarcastic''. Most notably, the phonetic details of this case distinguish it from those described above. This attitudinally mis-matched ``wow''

So here, analogous with phonetics performing attitude matching as described above, it would seem that non-lexical properties (i.e. phonetic design and temporal placement) lead to a treatment of an utterance as attitudinally mis-matched.

This necessarily brief description demonstrates how non-lexical responses which are treated as attitudinally (mis-)matched with respect to the prior talk are produced with different phonetic and sequential design features. Taking this observation as a starting point for analysis, the research will examine a range of non-lexical responses to good and bad news tellings and document the particular linguistic, sequential, phonetic design features which differentiate responses to good news on the one hand, and bad news on the other.

Claimed attitude sequences

The second focus of research will be an investigation of sequences in which speakers claim a particular attitude through their lexical choices (Edwards 1999). Speakers engaged in everyday conversation (and other forms of talk-in-interaction, such as visits to doctors and counsellors) may claim a current attitude through explicit lexical formulations of attitude. Lexical formulations may offer a self-attribution of attitude (e.g. ``I'm so tired'', ``I feel so sad''), or an other-attribution of attitude (e.g. ``you sound happy'', ``you sound a bit preoccupied'').

Investigation of these sequences will focus on

  1. whether there are particular phonetic and/or sequential design features in a speaker's talk which favour the subsequent occurrence of lexical formulations of attitude;
  2. whether lexical formulations of attitude are a systematically deployed interactional resource i.e. whether they are used recurrently in the service of some particular interactional goal.
Preliminary data exploration suggests that there is no simple mapping between the phonetic design of a speaker's talk and a co-participant's attribution of a current attitude: a speaker may make an other-attribution of attitude (e.g. sounding ``happy'' or ``bored'') in the absence of any particularly noticeable phonetic design features. Moreover, a co-participant may not remark on audibly present features of talk from another which might lead to assumptions about the talker being physically unwell (e.g. noticeably strained breathing, aggressively disturbed phonation). This suggests that claimed attitude sequences may be a resource which speakers can use to undertake some interactional work other than the simple attribution of attitude. For instance, in a number of cases of other-attribution of attitude, talk on the topic of the attribution-recipient's state immediately follows, irrespective of whether this involves the acceptance or refutation of the other-attribution.


Data

Data for the research will be taken from existing digitised corpora of everyday conversation housed at the University of York, large parts of which are transcribed. These corpora were collected under the terms of BSA ethical guidelines: where appropriate, subject consent was obtained (including consent to use recordings for research/education purposes other than those specified for the research for which they were originally collected).

These audio recordings of everyday conversation include:

  1. telephone calls made between friends and family members (12 hours), recorded around Britain over the last 20 years;
  2. face-to-face interactions between speakers of Newcastle and Derby English (10 hours) recorded as part of an ESRC-funded project (R000234892: ``Phonological variation and change in contemporary British English'');
  3. face-to-face interactions between friends (8 hours), which took place, unobserved, in a recording studio at the University of York: acoustically, the data is therefore of a very high standard, and suitable for a range of acoustic phonetic investigations.
We also have access to a corpus of American English telephone calls recorded as part of the Callhome corpus, distributed by the Linguistic Data Consortium, University of Pennsylvania (60 hours). These recordings are particularly amenable to detailed analysis as the speech of each participant involved in the conversation was recorded on a separate channel, making it possible to recover and analyse talk which occurred in overlap. We may also have access to video recordings of health consultations (40 hours). These data were collected in a research project funded by the Department of Health: Health in Partnership Programme, on the topic of patient participation, and subsequently reanalysed in an ESRC-funded project (R000223791: ``Effective consultations with patients: a comparative multidisciplinary study'') on which Local was a coapplicant. We are currently in the process of applying for ethics approval to continue using these recordings.
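The advantage of separate-channel recording for recovering overlapping talk can be illustrated with a minimal sketch. It assumes, purely for illustration, that each speaker's talk has already been segmented into (start, end) intervals in seconds; the function name and the example timings are hypothetical and are not drawn from the corpora.

```python
from typing import List, Tuple

# One stretch of talk on a single channel: (start_s, end_s).
# This interval representation is an assumption made for the sketch.
Interval = Tuple[float, float]


def overlaps(chan_a: List[Interval], chan_b: List[Interval]) -> List[Interval]:
    """Return the stretches during which both channels contain talk.

    Each input list must be sorted by start time and must not contain
    intervals that overlap one another within the same channel.
    """
    result: List[Interval] = []
    i = j = 0
    while i < len(chan_a) and j < len(chan_b):
        start = max(chan_a[i][0], chan_b[j][0])
        end = min(chan_a[i][1], chan_b[j][1])
        if start < end:  # the two intervals share some time
            result.append((start, end))
        # Move past whichever interval finishes first.
        if chan_a[i][1] <= chan_b[j][1]:
            i += 1
        else:
            j += 1
    return result


# Invented timings for two speakers, one per channel.
speaker_a = [(0.0, 2.5), (4.0, 6.0)]
speaker_b = [(2.0, 3.0), (5.5, 7.0)]
print(overlaps(speaker_a, speaker_b))
```

With a single-channel recording the same stretches would be acoustically mixed, so each speaker's contribution could not be inspected independently.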

Methodology

The first part of the research strategy will involve the identification of all instances of the target phenomena (i.e. responses to good and bad news tellings and explicit lexical formulations of attitude) in the recordings. These data sets will then be subjected to detailed analysis of interactional and linguistic-phonetic aspects of the sequences, with both strands of the analysis taking place side-by-side (cf French and Local 1983; Local 2004).
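For the transcribed parts of the corpora, a first pass at locating candidate instances might be sketched as below. This is only an illustration under stated assumptions: the one-turn-per-line transcript format, the search patterns and the category labels are all hypothetical, and a lexical search of this kind can only propose candidates. Every hit would still need to be checked against the recording, and displays of attitude that are not tied to a particular word would have to be found by listening.

```python
import re
from typing import List, Tuple

# Hypothetical search patterns, loosely based on the examples in the text
# (``wow'', ``you sound happy'', ``I'm so tired'', ``I feel so sad'').
PATTERNS = {
    "news_receipt": re.compile(r"\bwow\b", re.IGNORECASE),
    "other_attribution": re.compile(r"\byou sound\b", re.IGNORECASE),
    "self_attribution": re.compile(r"\bI(?:'m| am| feel) so\b", re.IGNORECASE),
}


def find_candidates(lines: List[str]) -> List[Tuple[int, str, str]]:
    """Return (line number, category label, line text) for every match."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((number, label, line))
    return hits


# Invented transcript fragment, one turn per line.
transcript = [
    "A: I got the job",
    "B: wow",
    "A: you sound happy",
    "B: I'm so tired",
]
for hit in find_candidates(transcript):
    print(hit)
```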

Analysis of the sequential-interactional organisation of turns-at-talk will be grounded in the principles of Conversation Analysis (for an overview, see Drew 1994). Conversation Analysis (CA) sets out to document the procedures participants employ to construct and make intelligible their talk, and the events that occur within it. Because participants in conversation display their analysis of prior talk through their subsequent actions, the sequential organization of conversation provides rigorous, empirical ways of understanding how participants themselves make sense of the talk they are engaged in.

The methodology of CA is therefore a particularly useful one to apply to the study of attitude in conversation: the analysis of talk on a turn-by-turn basis requires the analyst to inspect one turn for what insights it gives into the current speaker's displayed understanding, and treatment, of what preceded it. In the case of attitude (mis-)matchings, this will involve detailed inspection of at least (i) the news telling, (ii) the response to the news telling and (iii) the talk following the response. In this way, a picture can be constructed of how participants themselves orient to displays of attitude in conversation. In the case of lexical formulations of attitude, analysis will involve detailed inspection of (i) the talk which preceded the lexical formulation, (ii) the formulation itself, and (iii) the talk which follows it. Moreover, the CA methodology requires that analysts do not dismiss, a priori, details of any kind, be they lexical, syntactic, phonetic, or sequential (see the papers in Ochs et al. 1996; Couper-Kuhlen and Selting 1996 for exemplification of this approach). Given that features of lexical, sequential, and phonetic organisation are implicated in the phenomena to be investigated, the CA framework is an appropriate one in which to ground the study of attitude.

Phonetic analysis will employ a range of parametric auditory and acoustic techniques to examine the fine organisational detail of the talk produced (a similar combination of techniques is employed by Walker 2004; Curl et al. 2004; Local 2003; Local and Walker To appear; Local 2004; Docherty et al. 2002; Local 1996). Many investigations into the encoding of attitude in speech have focussed primarily on prosodic (e.g. intonational) features. Although this may have intuitive appeal, previous work on the linguistic analysis of everyday talk conducted by the applicant has demonstrated that participants in talk systematically manipulate clusters of general phonetic parameters (Abercrombie 1965) - encompassing rhythm, tempo, loudness, pitch, voice quality, and independent articulatory parameters - in order to structure their contributions to interaction. The phonetic analysis will therefore place no a priori constraints on which phonetic parameters will be studied. For all interactional sequences investigated, close inspection will be made of features of articulation (consonants and vowels), voice quality, loudness, pitch (contour and span), rhythmic organisation and speaking rate, and the nature and role of silences will be examined.
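As an illustration of how one of these parameters might be quantified, the sketch below expresses pitch span in semitones, a common logarithmic measure that allows spans to be compared across speakers with different pitch ranges. It is a sketch under assumptions rather than the procedure the project will necessarily adopt: the function name is hypothetical, the f0 values are invented, and unvoiced frames are assumed to be coded as 0 Hz.

```python
import math
from typing import Sequence


def pitch_span_semitones(f0_hz: Sequence[float]) -> float:
    """Return the span between the lowest and highest f0 value in semitones.

    Frames with a value of 0 (assumed here to mark unvoiced stretches)
    are ignored. The span is 12 * log2(f0_max / f0_min).
    """
    voiced = [f for f in f0_hz if f > 0]
    if not voiced:
        raise ValueError("no voiced frames in the f0 track")
    return 12.0 * math.log2(max(voiced) / min(voiced))


# Invented f0 track in Hz: a rise from 110 Hz to 220 Hz is one octave.
track = [0.0, 110.0, 165.0, 220.0, 0.0]
print(round(pitch_span_semitones(track), 2))
```

Raw minimum and maximum values are sensitive to pitch-tracking errors, so in practice any such measure would need to be checked against auditory analysis of the same stretch of talk.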

The analysis (both interactional and phonetic) will be primarily qualitative; quantitative techniques will be employed where this is warranted by the findings being presented, and where the details of individual cases facilitate quantified comparison (Docherty et al. 2002; Local 1996). Quantitative phonetic analysis will involve the use of the PRAAT speech analysis software for detailed hand-labelling of acoustic events.

Research background and previous work by the applicant

The Department of Language and Linguistic Science at the University of York has an international reputation for the phonetic analysis of data derived from talk-in-interaction. The Department has a large group of investigators actively engaged in research which combines analysis of phonetics and interaction in the ways that we have described, the group consisting of investigators at both postgraduate and faculty level. The Department also has working links - through jointly funded research grants (RES-000-23-0035: ``Affiliation and disaffiliation in interaction''), joint supervision of doctoral candidates, and inter-departmental seminars - with the Department of Sociology at the University of York, which has a long-standing international reputation for research in the Conversation Analysis framework by key practitioners including Professor Paul Drew and Dr Tony Wootton.

Local has extensive experience in the parametric phonetic analysis of data derived from talk-in-interaction, using acoustic and impressionistic phonetic techniques, and working within both qualitative and quantitative paradigms. Much of Local's work has combined analysis of phonetic details and their interactional consequences. These research interests are reflected in (i) Local's published output (see e.g. Local 1986, 1992, 1996, 2003, 2004; Kelly and Local 1989; Local and Kelly 1986; French and Local 1983; Local and Walker To appear) and (ii) previous funded research projects on which Local has been an applicant or coapplicant (R000223791, R000223534).

Research strategy

The funding being sought would last one year. It is anticipated that the stages of the research will be organised as follows:
 

  Months   Action
  ------   ------
  1-3      Systematic collection of instances of the phenomena from the data corpora
  4-9      Analysis of relevant sequences
  7-9      Preliminary drafting of results
  10-12    Writing up and dissemination

Outcomes

The research will yield:

  1. An integrated linguistic-phonetic and sequential-interactional account of attitude, grounded in the behaviour of participants engaged in everyday conversation;
  2. A qualitative analysis which would be usable as a robust basis for future quantitative analyses and models. Obvious applications would be the use of the categories which emerge from the analysis in coding large databases and in perceptual testing of speech stimuli; another would be the inclusion of the identified linguistic and sequential properties in dialogue modelling systems;
  3. A clearer understanding of the (phonetic and sequential) features which contribute to the meaning of utterances and the ways in which speakers and listeners manipulate fine phonetic detail and phonetic variability in producing and interpreting the moment-to-moment flow of everyday conversation.

Dissemination

The findings of the proposed research will be disseminated in two main ways:

  1. Presentation at national and international conferences, including the Meeting of the Linguistic Association of Great Britain (Cambridge, UK, 2005) and the International Pragmatics Association (Riva del Garda, Italy, 2005).
  2. Refereed journal articles, including a submission to Journal of Pragmatics focussing on the interactional aspects; and a submission to either Speech Communication or Journal of the International Phonetic Association focussing on the phonetic aspects.
It is envisaged that findings will also be presented in seminars and invited talks.

Footnotes

1. We use ``attitude'' here as a cover term for constructs which have been referred to elsewhere as ``attitude'', ``emotion'', ``affect'' and ``stance''.

Bibliography

Abercrombie, D. (1965).
Conversation and spoken prose.
In Studies in Phonetics and Linguistics, pp. 1-9. London: Oxford University Press.
Banse, R. and K. Scherer (1996).
Acoustic profiles in vocal emotion expression.
Journal of Personality & Social Psychology 70(3), 614-636.
Batliner, A., K. Fischer, R. Huber, J. Spilker, and E. Noeth (2000).
Desperately seeking emotions or: actors, wizards and human beings.
In Proceedings of ISCA ITRW on Speech and Emotion: Developing a Conceptual Framework, pp. 195-200.
Blakemore, D. (1992).
Understanding Utterances.
Oxford: Blackwell.
Couper-Kuhlen, E. and M. Selting (Eds.) (1996).
Prosody in Conversation: Interactional Studies.
Cambridge: Cambridge University Press.
Cowie, R. and R. Cornelius (2003).
Describing the emotional states that are expressed in speech.
Speech Communication 40, 5-32.
Cowie, R., E. Douglas-Cowie, S. Savvidou, E. McMahon, M. Sawey, and M. Schröder (2000).
`FEELTRACE': an instrument for recording perceived emotion in real time.
In Proceedings of ISCA ITRW on Speech and Emotion: Developing a Conceptual Framework, pp. 19-24.
Cruttenden, A. (1997).
Intonation (2nd ed.).
Cambridge: Cambridge University Press.
Curl, T. S., J. Local, and G. Walker (2004).
Repetition and the prosody-pragmatics interface.
York Papers in Linguistics Series 2 1, 29-63.
Davitz, J. R. (1964).
Auditory correlates of vocal expression of emotional feeling.
In J. R. Davitz (Ed.), The communication of emotional meaning, pp. 101-112. New York: McGraw Hill.
Docherty, G., P. Foulkes, B. Dodd, and L. Milroy (2002).
The emergence of structured variation in the speech of Tyneside infants.
Final Report on ESRC Project R000237417.
Douglas-Cowie, E., N. Campbell, R. Cowie, and P. Roach (2003).
Emotional speech: towards a new generation of databases.
Speech Communication 40, 33-60.
Drew, P. (1994).
Conversation Analysis.
In Encyclopedia of Language and Linguistics, Volume 2, pp. 749-754. Oxford: Pergamon Press.
Edwards, D. (1999).
Emotion discourse.
Culture & Psychology 5(3), 271-291.
Engberg, I., A. Hansen, O. Andersen, and P. Dalsgaard (1997).
Design, recording and verification of a Danish emotional speech database.
In Proceedings of Eurospeech '97, pp. 1695-1698.
Freese, J. and D. W. Maynard (1998).
Prosodic features of bad news and good news in conversation.
Language in Society 27, 195-219.
French, P. and J. Local (1983).
Turn competitive incomings.
Journal of Pragmatics 7, 701-715.
Kelly, J. and J. Local (1989).
On the use of general phonetic techniques in handling conversational material.
In D. Roger and P. Bull (Eds.), Conversation: An Interdisciplinary Perspective, pp. 197-212. Clevedon: Multilingual Matters.
Ladd, D. R. (1986).
Intonational phrasing: The case for recursive prosodic structure.
Phonology Yearbook 3, 311-340.
Leech, G. (1983).
Principles of Pragmatics.
London: Longman.
Local, J. (1986).
Patterns and problems in a study of Tyneside intonation.
In C. Johns-Lewis (Ed.), Intonation in Discourse, pp. 181-198. London: Croom Helm.
Local, J. (1992).
Continuing and restarting.
In P. Auer and A. di Luzio (Eds.), The Contextualization of Language, pp. 273-296. Amsterdam: John Benjamins.
Local, J. (1996).
Some aspects of news receipts in everyday conversation.
In E. Couper-Kuhlen and M. Selting (Eds.), Prosody in Conversation, pp. 177-230. Cambridge: Cambridge University Press.
Local, J. (2003).
Variable domains and variable relevance: Interpreting phonetic exponents.
Journal of Phonetics 31, 321-339.
Local, J. (2004).
Getting back to prior talk: and-uh(m) as a back-connecting device.
In E. Couper-Kuhlen and C. E. Ford (Eds.), Sound Patterns in Interaction. Amsterdam: John Benjamins.
To appear.
Local, J. and J. Kelly (1986).
Projection and `silences': Notes on phonetic and conversational structure.
Human Studies 9, 185-204.
Local, J. and G. Walker (To appear).
Abrupt-joins as a resource for the production of multi-unit, multi-action turns.
Journal of Pragmatics.
Maynard, D. W. (1997).
The news delivery sequence: Bad news and good news in conversational interaction.
Research on Language and Social Interaction 30(2), 93-130.
Mey, J. (1993).
Pragmatics: an Introduction.
Oxford: Blackwell.
Ochs, E., E. A. Schegloff, and S. A. Thompson (Eds.) (1996).
Interaction and Grammar.
Cambridge: Cambridge University Press.
Pierrehumbert, J. and J. Hirschberg (1990).
The meaning of intonational contours in the interpretation of discourse.
In P. Cohen, J. Morgan, and M. Pollack (Eds.), Intentions in Communication, pp. 271-311. Cambridge, Mass: MIT Press.
Roach, P., R. Stibbard, J. Osborne, S. Arnfield, and J. Setter (1998).
Transcription of prosodic and paralinguistic features of emotional speech.
Journal of the International Phonetic Association 28, 83-94.
Scherer, K. (2003).
Vocal communication of emotion: A review of research paradigms.
Speech Communication 40, 227-256.
Schubiger, M. (1958).
English Intonation: its Form and Function.
Tübingen: Niemeyer.
Sperber, D. and D. Wilson (1986).
Relevance: Communication and Cognition.
Oxford: Blackwell.
Tolkmitt, F., G. Bergmann, T. Goldbeck, and K. Scherer (1988).
Experimental studies on vocal communication.
In K. Scherer (Ed.), Facets of Emotion: Recent Research, pp. 119-138. Hillsdale, NJ: Erlbaum.
Walker, G. (2004).
On some interactional and phonetic properties of increments to turns in talk-in-interaction.
In E. Couper-Kuhlen and C. E. Ford (Eds.), Sound Patterns in Interaction. Amsterdam: John Benjamins.
To appear.
