Prosodies in the management of turn-taking

I would like to begin this presentation by thanking you, and in particular Jan Svennevig, for inviting me to this meeting: it is a great pleasure for me to take part, to meet colleagues from the Nordic countries and finally to have the chance to visit Norway!

I feel I am here a little under false pretences, and so I will start my presentation by introducing myself a little to you so that you can see where I have come from and what kinds of research questions I am interested in. Then I will go on and tell you something about the work I have been doing in the past year or two, and explain what I think it contributes to linguistic theory.

I think I’ve been invited because of the word ‘prosodic’. In phonology this word is understood to mean a wide range of things, but in particular intonation. It also of course refers to other components of speech like speed (or tempo), loudness, rhythmicality, voice quality and so on. A rather ungenerous description that isn’t too far from the truth would be to say that prosody refers to any significant aspect of speech which there isn’t a letter of the alphabet for. Instead, prosodic aspects of speech are usually transcribed in our writing system with punctuation.

It’s a bit ironic, but this ‘non-alphabetic’ understanding of prosody is also one with a long and rather noble history in modern linguistics. In 1934, two Danish linguists, Louis Hjelmslev and Hans-Jørgen Uldall, presented a paper at the International Phonetics Congress in London, where they argued that some aspects of phonology were prosodic; and the example they gave was the Greek diacritics for ‘rough’ and ‘smooth’ breathings, that is, the presence or absence of /h/ at the start of vowel-initial words. Since this is a contrast only possible at the start of a word, they argued that it can be treated as a property of the word as a whole and is therefore prosodic: when you hear /h/, you know you are at the start of a word. This conference was attended by J R Firth, who became the first professor of linguistics in Britain just after World War II; in his famous paper Sounds and Prosodies, he used exactly the same example, and made the same point. Firth also treated glottal stop in German prosodically, because it marks off vowel-initial morphemes. For Firth and his co-workers, this approach became a way of working that set them apart from phonemic phonology. They started not from minimal pairs, but from grammatical patterns, taking a more holistic approach to meaning in language; and for them, unlike the phonemicists, meaning was deeply embedded in their analysis, not separate from it. Their approach to conducting analysis came to be known as Prosodic Analysis. Their work was very influential in Britain even in the late 1960s. At York we have a special interest in this work, and it has influenced everyone who has worked on phonetics and phonology at York over the past two decades.

Among our collection of Firthian material at York, we have some lecture notes taken by Elizabeth Anderson, who was to become Elizabeth (Betsy) Uldall, wife of Hans-Jørgen. She attended lectures given by Firth in 1938-39. Among her notes are some extraordinarily modern insights (presumably Firth’s) which chime in closely with many of our interests and methodology here at this workshop. In a section on Pareto, phonetics is described as “the study of social actions”, which is full of abstract categories. Phonetics is described as containing a complex of information, the result of “interacting forces” which are “mutually influential” and can’t be separated out. In here we see what was to become a principle of Firthian linguistics: the notion of ‘context of situation’, in other words the idea that language is deeply embedded in the occasions of its use. The social and interactional context in which it is used cannot, according to Firth, be separated from its structure and form. So we need to work out what is meant by ‘context of situation’. Interactional linguistics, with its anthropological and sociological basis, provides us with a way into this.

In another comment in the notes, we read “you’ve got to find the values of the elements you analyse out. To be interpreted from the point of view of the person whose values they are and who is performing the action”. Does this sound familiar to you? This is after all how conversation analysts go about establishing categories: by establishing the relevance of the categories to the participants themselves, rather than relying on the analyst’s (or speaker’s) intuitions, as in much of linguistics.

So, there are two strands of thought that bring me to the work I want to present to you, and both of them have historical connections with linguistics in Britain and, come to that, Scandinavia.

Firstly, I want to pursue the idea that ‘prosody’ means something more than intonation. I want to take the Firthian line that any phonetic parameter is potentially prosodic: what makes something prosodic is not its extent in phonetics, but its domain of contrast; and as a Firthian, I see it as essential to work out not just what the form is, but what the function is too.

Secondly, I want to pursue the conversation analytic methodology of showing that the categories established can be shown to have relevance to participants in interaction, and aren’t just linguists’ constructs. Because the data we’re working with isn’t elicited specially, but is ordinary talk of the kind we all engage in every day, we have to accept that we can’t do with it the same kinds of things that we can do with laboratory data. But what we probably can do is to see what it means to talk about the function of linguistic categories in a way that isn’t just impressionistic or based on our ‘feeling’ about what is going on.

So, let me say what the data is that I will talk about today.

I’ve chosen two things that you can think of as ‘prosodic’ and I will look at them from an interactional linguistic point of view. Nearly all the material is from Finnish. The first bit of analysis is an exploration of the use of creak in managing turn transition in Finnish. I want to demonstrate (rather than intuit) that creak is a marker of turn-finality in Finnish. The second bit of analysis is much more speculative at the moment and it is probably tied in with more delicate levels of interactional and sequential organisation which I haven’t yet managed to unpick. This is material where the speakers are exhibiting all sorts of prosodic control which we generally don’t think of as typical of speech, but more typical of singing or chanting. It’s not exactly ‘stylised’ speech; but when you hear the material, you’ll see how extraordinary it is.

Linguists often treat ‘unscripted’ data as more messy than lab data; and it is, if you look at it as an aggregate. But a lot of conversational talk exhibits local cohesion and often quite remarkable levels of phonetic control. The qualitative approach to speech which interactional linguistics offers is a useful way in for phoneticians, because it helps us to see what things could plausibly be grouped together, and what couldn’t. It also helps us to see what the function of a particular phenomenon is, not just its form.

Creak section
Analysis of a corpus of Finnish talk-in-interaction reveals that places where it is relevant for turn transition to occur are signalled using a range of phonetic resources, including intonation, tempo, duration and voice quality. This section of my talk concentrates on one such parameter, voice quality. Overwhelmingly, creak is used turn-finally, although other non-modal forms of phonation are used as well (such as breathiness, voicelessness, and whisper); and in certain interactional circumstances, other phonetic resources can be used too, such as stylised pitch contours. The ‘default’ case though is creak (cf. Iivonen 1998). This section will show that in Finnish, creak is one resource used for turn-yielding activities.

The data in this paper are taken from a radio phone-in programme broadcast on Finnish national radio and recorded in May 2000. Listeners call in and ask for a piece of folk music to be played. There are two presenters, who encourage the callers to talk about why they have chosen that piece, and they usually develop the conversation so as to inform the listeners about the musicians or the music. The presenters (one male, one female) take it in turns to take a call. Although each call has a similar overall structure, the content varies widely.

There are eleven calls in the corpus, and nine of them have been transcribed. Some calls were eliminated, because the caller’s speech was not clear enough. The duration of the material analysed is approximately 23 minutes in total.

The data are transcribed according to standard conventions set out at the end of the paper. The transcription is essentially a form of modified orthography which captures some prosodic features of spontaneous talk. In the transcriptions, P stands for the main presenter for the call, P2 for the other presenter, C for the caller.

The onset of turn-final creak in spontaneous talk correlates closely with phonological structure (Table 1). In 68% of all cases, creak begins after a voiceless obstruent, that is, one of /p t k s/; if there is another voiceless obstruent during the creaky stretch, whispery or voiceless phonation may be initiated after it. In 69% of all cases, creak starts in a syllable other than the first one within a word. (Finnish words always bear main stress on the first syllable.) In total, 86% of final creaky stretches start after a voiceless obstruent and/or outside the first syllable of the word.

Table 1. Placement of creak turn-finally followed by change of speaker.

Turns marked with final creak (n = 82).

                                 Not in first syllable   In first syllable
                                 of word                 of word
After voiceless obstruent        42 (51%)                14 (17%)
Not after voiceless obstruent    15 (18%)                11 (13%)
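
As a quick check on the figures in Table 1, here is a minimal Python sketch that recomputes the marginal percentages from the four cell counts. The counts are taken from the table; the dictionary keys and the helper name `share` are my own labels for illustration, not part of the analysis. Computed from the raw counts, the marginals come out at 68.3%, 69.5% and 86.6%, so the whole-number figures quoted in the text (68%, 69%, 86%) agree to within a point; the last two match what you get by adding up the rounded cell percentages.

```python
# Cell counts copied from Table 1 (n = 82 turns with final creak).
# Keys are (onset context, syllable position); the labels are illustrative.
counts = {
    ("after_voiceless", "non_initial"): 42,
    ("after_voiceless", "initial"): 14,
    ("other", "non_initial"): 15,
    ("other", "initial"): 11,
}


def share(predicate):
    """Percentage of all turns whose table cell satisfies `predicate`."""
    total = sum(counts.values())
    hits = sum(n for cell, n in counts.items() if predicate(cell))
    return 100 * hits / total


after_voiceless = share(lambda cell: cell[0] == "after_voiceless")
non_initial = share(lambda cell: cell[1] == "non_initial")
either = share(
    lambda cell: cell[0] == "after_voiceless" or cell[1] == "non_initial"
)

print(f"after voiceless obstruent: {after_voiceless:.1f}%")
print(f"outside first syllable:    {non_initial:.1f}%")
print(f"either or both:            {either:.1f}%")
```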

Extract 1.
Extract 1 below provides two examples of creaky stretches leading to turn transition. It is taken from the point in the call where C explains what her request is: a piece entitled ‘The farm machines’ day off’. The turn at l. 21 ends creaky. The creak is initiated outside the first syllable and after a voiceless obstruent. Note that although the turn ends with ‘and’ (and thus might naïvely be understood as necessarily projecting more talk to come), it is treated as complete by P, who comes in at l. 22. The turn at l. 23 achieves two things. It moves the conversation to dealing with C’s request, and the word sitte makes it clear, by connecting to the content of the prior turn at l. 20-21, why that material is relevant to her choice of record. So l. 23 shows orientation to prosodic detail and is pragmatically well placed. The turn at lines 23-24 is syntactically, pragmatically and prosodically complete. In this line, creak is followed by whisper. This turn also leads to turn transition, with C’s turn at l. 26.

Maajussin tytär 2/20-26
20 C ?oon:´ ?oon kyllä hh `maajussin `tyt:ärenä
be-1SG be-1SG certainly peasant-GEN daughter-ESS
I was I was of course

{C} {C--------------------}{f}
21 → `kirj{a}mmellises(*) `synt{yny ja kasvanu ^j}{a}=
literal-ADV be born-PPC and grow-PPC and
literally born and brought up as a peasant’s daughter and

22 P =.hhh

23 no `kerrotko sitte `kaikile `kuulijoille että
PRT tell-2SG-QCLI then all-PL-ALL listener-PL-ALL COMP
well why don’t you tell all the listeners then what

24 {C} {C-}{W-}
→ mikä `tää sun `t{o}iv{e:}{on}
what this 2SG-GEN wish is
your request is

25 ?? .hh

26 C {C-}
nii se on `semmone kun maatalous`koneet `muist{aa}kseni
PRT it is such as farm-machine-PL remember-INF1-TRA-1SGPOS
yeah, it’s something with farm machines as far as I remember

Thus turn transition is managed at a number of levels, one of these being phonetic and relating to voice quality. Creak is one of the recurrent properties of turn-finality. Participants in interaction can be shown to orient to creak in this way, as this and other examples in this paper show.

Extract 2.
Extract 2 is taken from the start of a call. Relevant actions at this stage of the call include (i) checking the caller’s name (l. 1), (ii) checking that they are indeed connected (l. 3), (iii) exchanging greetings (l. 4-5).

Äijö 1/1-5
1 P {C-}
onkos meillä nyt Liisa Johanss{on}
is=QCLI=CLI 1PL-ADE now name name
do we now have Liisa Johansson

2 C kyllä on [h
certainly is
you certainly do

3 P {C------}
on the line

4 {C--}
`tervet:uloa `muk{aan}.
welcome-PART with
welcome to the programme

5 C {H-}
thank you

In l. 1, P initiates a creaky stretch at a place which completes one of the actions of the opening sequence, checking the caller’s name. The creak occurs after a voiceless obstruent, and outside the first syllable. C orients to this creaky stretch, and treats it as marking relevant turn transition. At l. 2, she offers a reply to the question begun but not syntactically completed at l. 1. P comes back at l. 3, with a creaky stretch. Here the creak continues that initiated in l. 1, and completes the sentence started at l. 1. Creak also marks the pragmatic completion of the sentence; this would be a relevant place for C to come in, but she has already done this. P’s talk proceeds immediately with a new TCU in l. 4. This TCU initiates the next action in the opening sequence of the call, exchanging greetings and welcoming the caller. The completion of this turn is marked again with creak at a prime site, outside the first syllable, and after a voiceless obstruent. In l. 5, it can be seen that C orients to this completion (which is syntactic, pragmatic and prosodic), and comes in with her response to the greeting. Her turn is marked with final voicelessness, and the next speaker is P; so the voicelessness and outbreath at l. 5 can be interpreted as turn-yielding.

Turn-final creak without a change of speaker.
Creak does not always lead to a change of speaker. In all such cases, there are other properties of the talk after the creaky stretch which demonstrate orientation to a TRP which has been ‘retracted’. For example, the voice quality may change to modal, accompanied by a rising pitch; or there may be an abrupt change in tempo, as in Extract 3.

Extract 3.
Pelimanni poika 1/14-16.
14 P {C-}
mikäs siihen liitt{yy}.
what it-ILL connect-3SG
what is connected to that ((choice))

15 → {all------ ------{C,p-----------}
{mitä:[p|] (.) mitä t{u[lee mieleen}].
what what come-3SG mind-ILL
what what comes to mind

16 C [no sii]hen- siihen liittyy
PRT it-ILL it-ILL connect-3SG
well it’s- it’s connected with…

This extract comes from the point where C is invited to explain her request. In l. 14, there is creak placed outside the first syllable and after a voiceless obstruent, the prime site for turn-final creak to be initiated. The turn is syntactically, pragmatically and prosodically complete, and turn transition could be expected to occur. However, this turn is immediately followed by another turn in l. 15 in which the talk is faster: 3.94 feet/second as compared to 3.58 feet/second in the previous turn. (Finnish is said to be stress-timed: Iivonen 1998; cf. Wiik 1991.) This is an example of ‘rush-through’ (Schegloff 1982, 1998: 241). Line 15 therefore shows P orienting to the possible turn transition in his own talk, and back-tracking on it. P does a self-repair in the middle of this continuation, which is at a place of what Schegloff calls ‘maximal syntactic control’. The speed of this material is consistent with P attempting to retain his turn having reached a possible TRP marked by creak. The reformulation is itself performed with creak, which prosodically marks l. 15 as being a TRP. C responds in overlap with the creaky stretch of P’s talk. It is at this point, where creak has started, that turn transition becomes relevant. Schegloff (1996:85) and Wells & MacFarlane (1998) show that incoming talk in overlap in English is treated as non-competitive when placed after the phonetic markers of the ending of the turn have begun.
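
To make the tempo comparison concrete, here is a small Python sketch of how a rate in feet per second and the relative speed-up between two turns might be worked out. The two rates (3.58 and 3.94 feet/second) are the ones quoted above; the function names, and the foot count and duration in the last line, are invented for illustration and are not measurements from the recording.

```python
def feet_per_second(n_feet: int, duration_s: float) -> float:
    """Articulation rate as number of rhythmic feet divided by duration."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return n_feet / duration_s


def relative_change(rate_before: float, rate_after: float) -> float:
    """Proportional change from one rate to the next (0.10 means 10% faster)."""
    return (rate_after - rate_before) / rate_before


# Rates quoted in the text for l. 14 and l. 15 of Extract 3.
speedup = relative_change(3.58, 3.94)
print(f"l. 15 is {speedup:.1%} faster than l. 14")

# Hypothetical input, only to show how a rate would be derived:
# 4 feet produced in 1.25 seconds.
print(feet_per_second(4, 1.25))
```

On the quoted figures the rush-through turn is about a tenth faster than the turn before it.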

In summary, then, creak is one of the resources that speakers can use in Finnish to mark turn-finality. Speakers and listeners can be shown to orient to turn-final creak.

But creak is one of a system of resources that is available to speakers in the management of their turn-taking. There are other kinds of prosodic patterning which they also deploy.

Other kinds of prosodic patterning.
This section of my talk is more speculative. What I want to do here is to show you some examples from the same set of Finnish data, where what we find is that uptake is not achieved where it might have been, that is, something seems to have ‘gone wrong’ in some sense. What we get next is a bundle of phonetic events which are extremely well controlled and have some properties which we might want to describe as ‘poetic’ or ‘musical’. Just for the fun of it, I’ll show you an example from English which is kind of like the Finnish material in terms of its phonetic organisation, though I’m not sure its sequential placement is like that of the Finnish examples.

Jefferson (REF) noted that there are many ‘poetic’ properties of conversational speech. Among those I’m going to draw attention to are:

By ‘rhythmicity’, I mean the percept that there is a regular rhythm that the speaker sets up. This isn’t a straightforward thing, because our perception of rhythm is highly dependent on other factors too; but some of the examples in my collection are really highly regular.

The first extract is taken from the closing of a call. Closings of calls in this data contain the following actions (not necessarily all in this order):

• The presenter names the record and artist; this can sometimes include checking that this is the record to be played (eg. “shall we put the record on now?”)
• The presenter and the caller exchange greetings: good evening-good evening
• The presenter and the caller say good-bye, and the caller hangs up.

One of the common patterns for the rhythmical stretches I’m talking about is that they occur in places where something has gone wrong: usually, a first pair part has not received a preferred response.

In this extract, the presenter makes his final greetings, and thus promotes the action of closing the call. But he gets the number of people in the caller’s family wrong. Instead of producing a preferred response, she comes in and tells him how many there actually are. He acknowledges his error; but this potentially opens up a new topic for discussion, and the presenter moves to ensuring that a new topic does not get opened. He tells the caller that there isn’t time to talk about it, and then produces a stretch which has some interesting rhythmical and musical properties.

In line 9, the caller comes in and receipts P’s prior turn with the particle joo, which as Marja-Leena Sorjonen has shown, claims compliance with a directive. So we can understand line 10 as the caller demonstrating compliance with the presenter’s wish to end the call.

Let’s listen to it. Concentrate on the last few lines.

Notice how fast the talk in lines 6 and 9 is. Let’s return to the phonetics of line 9. It’s quiet; and it ends very fast. Normally, we expect endings to be slower: there’s lots of phonetic evidence of ‘utterance-final lengthening’, and it’s one of the factors that conversation analysts comment on when judging whether a turn has reached possible completion. This complete sentence though is not a complete turn. Instead, the presenter goes on and initiates the final action of a phone call, which is to say good-bye. This is done in a rhythmical way, with a rhythm established half way through line 9: keitä ne muut on hei. A sequence of three beats is enough to produce a rhythm. We can hear the caller’s orientation to this rhythm: her incoming no hei comes in exactly on beat with the rhythm established by the presenter. So, as the caller demonstrated compliance lexically in line 10, here she demonstrates compliance rhythmically in line 13.
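
One way to make ‘coming in on beat’ checkable is to project the next beat from the beats already produced and ask how close the incoming talk falls to it. The Python sketch below does that under stated assumptions: the beat times are invented, the function names are mine, and the tolerance of 20% of the beat interval is an arbitrary illustrative threshold, not a perceptual claim.

```python
from statistics import mean


def beat_interval(beats):
    """Mean interval between successive beat onsets (in seconds)."""
    if len(beats) < 3:
        raise ValueError("need at least three beats to establish a rhythm")
    return mean(b - a for a, b in zip(beats, beats[1:]))


def projected_next_beat(beats):
    """Time at which the next beat is due if the rhythm continues."""
    return beats[-1] + beat_interval(beats)


def is_on_beat(onset, beats, tolerance=0.2):
    """True if `onset` falls within `tolerance` (a fraction of the beat
    interval; an assumed threshold) of the projected next beat."""
    period = beat_interval(beats)
    return abs(onset - projected_next_beat(beats)) <= tolerance * period


# Hypothetical beat onsets (seconds) for a three-beat stretch by one speaker.
presenter_beats = [0.00, 0.42, 0.85]

print(projected_next_beat(presenter_beats))
print(is_on_beat(1.30, presenter_beats))  # incoming talk near the projected beat
print(is_on_beat(1.50, presenter_beats))  # incoming talk well after it
```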

But she does it intonationally too. Notice how level the presenter’s intonation is in line 9. When he gets to hei, his pitch steps up. It doesn’t glide up, it steps up. This gives the overall impression of the words being sung. Again, the caller uses a similar technique. Of course, final good byes are often sung.

What’s interesting here is that the talk leading into the good-bye is also sung; the presenter is kind of gearing up.

So one of the functions that rhythmical stretches seem to serve is in handling when something goes wrong. Let’s look at another example of a similar situation.

In this call, the caller has asked for a piece to be played by a Bulgarian women’s choir. He doesn’t care which track they play, because he wants to record whatever they play, and as long as they don’t play one he’s already got, he’ll be satisfied. During this call, the presenter has a very hard time getting the caller to say anything: the calls have a didactic purpose, and they include conversation about the musicians or the place they come from. The caller’s reticence obviously causes problems for the presenters. In this call, the main presenter gets the other presenter talking. I’ll give you the prior lines in translation to save time, then we’ll focus on the interesting bits.

What’s happening here is that the second presenter is starting to open out the topic so that the caller is given an opportunity to come in. Her talk in line 67 is an explicit offer to the caller to encourage him: if you’re interested. There is a half-second silence after this. The caller does not even produce a minimal response. In line 69, the presenter produces an interrogative. By choosing an obvious first pair part, she almost forces the caller to provide some kind of response, which he does. This is again followed by a long silence. So where the presenter is offering a place for the caller to elaborate, he is not taking up the opportunity.

The presenter’s response to this in line 72 is the bit we’re interested in. Notice that it’s got lots of threes in it: not just three old ladies that sing, but three repetitions of the word mahtava, powerful. And there’s repetition not just of the words, but of the pitch at which those words are said.

This stretch isn’t isochronous. But it does sound very rhythmical. Here we see, I think, a trading relation. While we don’t have absolute repetition of rhythm, we do have repetition of pitch and amplitude; and we have obvious lexical repetition as well. Repetition is a good way for speakers to produce the percept of regularity, because our perceptual system prefers to group things together.
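
The trading relation can be sketched numerically: one measure for how irregular the timing is, and another for how closely the pitch is repeated. The Python below is only an illustration under assumptions of my own. The coefficient of variation of the inter-beat intervals stands in for (lack of) isochrony, the spread in semitones across the repeated words stands in for pitch repetition, and all the numbers are invented rather than measured from line 72-73.

```python
import math
from statistics import mean, pstdev


def interval_cv(beats):
    """Coefficient of variation of inter-beat intervals.
    0.0 means perfectly isochronous; larger values mean less regular timing."""
    intervals = [b - a for a, b in zip(beats, beats[1:])]
    return pstdev(intervals) / mean(intervals)


def semitone_spread(f0_values_hz):
    """Distance in semitones between the highest and lowest pitch value.
    A small spread means the pitch is closely repeated."""
    return 12 * math.log2(max(f0_values_hz) / min(f0_values_hz))


# Hypothetical onsets (seconds) of the three tokens of one repeated word,
# and hypothetical pitch values (Hz) on each token.
token_onsets = [0.0, 0.4, 1.0, 1.3]
token_pitch = [210.0, 212.0, 208.0]

print(f"timing irregularity (CV): {interval_cv(token_onsets):.2f}")
print(f"pitch spread: {semitone_spread(token_pitch):.2f} semitones")
```

With values like these, the timing is clearly uneven while the pitch stays within a fraction of a semitone, which is the kind of pattern the trading relation describes.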

59 P have you ever heard them live
60 C no, just on the radio
61 P I think that these Bulgarian women singers have influenced
62 Finnish singers’ way of thinking, haven’t they Pia
63 P2 right, I have myself in one or two singing groups where we’ve
64 tried sometimes successfully, sometimes unsuccessfully this
65 Bulgarian singing style
66 but of these Bulgarian groups
67 I mean, if you’re interested then my definite favourites are
68 among others Trio Bulgarka

69 {C-}{W--}
onks sulle tuttu täm{mö}{nen}h
be-3SG-QCLI 2SG-ALL known this-kind-of
have you heard of them

70 C ei o
NEG be
no I haven’t

71 (0.8)

72 P2 {C---}
siin on kolme mahtavaa t{ätii} jotka laulaa
3SG-INE be-3SG three powerful-PAR aunt-PAR which-PL sing-3SG
there’s three powerful old ladies that sing

73 {C}{tense-------------}
mahtavalla /äänellä maht{a}{via sovituksia}
powerful-ADE voice-ADE powerful-PL-PAR arrangement-PL-PAR
in a powerful voice powerful pieces

74 C [mmm]

75 P2 [.hh] ja sitte toinen on bulgarkan junior kvartet
and then another be-3SG name-GEN name name
and then another is Bulgarka Junior Quartet

((4 more lines about this group))
C how do you say this mystery choil- choir in Finnish

Finally, an extract from English. This one is a piece of American data taken in the sixties. It’s a “claim to fame”. The recording was made just after the assassination of Kennedy. The speaker is telling her friend about how she saw Kennedy’s body being put on a plane on television; and she’s realised that the place where they put his body on the plane was also the place where she set off for her holiday in Hawaii.

1 A Jackie looked up
2 hey that was the same spot we took off for Honolulu
3 (0.4)
4 where they put him on
5 (0.6)
6 at that chartered pla[ce
7 B [oh really?

The claim to fame is made in line 2. It makes a response from B relevant, but it doesn’t come. There’s a 0.4s gap at line 3. In line 4, there is an increment. Increments are often used as retries, designed to provide another place for the other speaker to come in and make some response. This one doesn’t succeed either. There’s another gap, and then at line 6, another increment. This time, B does come in.

But notice how rhythmically this talk is produced. The end of line 2 is very fast: in fact, the talk finishes just before the next beat, so that the effect of the fast delivery is to provide a next-beat which is in the clear. This is an optimal site for B to come in with her response. Line 4 is rhythmical, too. This one has a beat on every word. And as a result of this, you can notice something unusual about the phonetics of it. The put him stretch is produced with a stop and an /h/, not with a tap, ie. not as /puRIm/ but as /put hIm/. The next increment in line 6 is also produced rhythmically. And this time, B comes in, on beat.

So, here are three instances where there is a problem of uptake, and in each case, the speaker uses rhythm as a way of dealing with the problem. As Betty Couper-Kuhlen has shown, speakers can generate rhythm in order to handle turn transition. The incoming speaker can align their talk with the rhythm that’s been set up by the prior speaker.

What I’ve shown is six short extracts that demonstrate two resources in the management of turn transition. You will have noticed that what I have not really talked about is intonation. I’ve done this because I wanted to show you that intonation is just one prosodic resource that speakers can use; and there are others.

In the case of the rhythmic pieces, there are clearly other things going on that are not just rhythmic. There is obvious control of intonation in these stretches; there can be repetition of lexical items; and in other examples that I haven’t had time to play, there are other poetic resources too. To isolate just one of these aspects would be to detract from what probably matters most: that these chunks of talk are produced as wholes, and as such they produce a Gestalt.

What does this data have to say to phonological theory?

One thing is that while we can talk about the phonetic aspects of speech separately from all others, from an analytic point of view, this isn’t desirable. In the end, we have to tie in our phonological observations with the other observations that we can make: those on the lexical, syntactic, semantic, and interactional levels. We need what Firth called a ‘congruent level’ analysis, that is, one where all the levels of statement can be related to one another.

There’s lots more work to do, and we can talk in the discussion time about what other kinds of work could be done, but here’s what we at York have been working on:
• how the phonetics of individual words varies depending on interactional factors, eg. “and-uhm”, “oh”
• how increments of different kinds (remember we saw some in extract 6) are related to their hosts phonetically
• how collaborative completions are produced

As David Abercrombie pointed out, most phonetics is actually the phonetics of spoken prose: phoneticians and phonologists have typically concentrated on lexical meaning. This has consequences for what count as viable minimal pairs, and for the frequent reliance on the word as a basic unit of phonology. Spontaneous, interactional data raises problems for standard phonological theories. This is not because the phonetic and phonological resources are disorderly; rather, phoneticians and laboratory phonologists have used as their primary data speech in highly simplified contexts, typically without any form of interaction. One of the aims of traditional methodology is to decontextualise language from the occasions of its use. As Jørgen Rischel claimed in 1992, this has led to phonology being based on “very exaggerated idealisations of speech and exaggerated idealisations about the power of rule machinery as the format in which to take care of variation”. One of the major findings of interactional linguistics is that language is moulded by the occasions of its use.

One finding of the material I have presented is that some phonetic resources, while not lexically contrastive, do have an interactional function. I’d like to suggest that one of the next jobs of linguistic phoneticians should be to start to make sense of real talk; we have to observe patterns of variability, but we will also need to work out what linguistic and interactional mechanisms drive that variability. The interaction of conversation analysts and linguists at meetings like this one is one step to achieving this, and I’m grateful to have been asked to come. Thank you.