Overview

Plotting static formant measures taken from the midpoint of vowels sometimes obscures a lot of important and interesting information about the dynamic nature of formants, vowel targets, and co-articulation with adjacent segments.

To combat this, we can instead plot the entire vowel trajectory, assuming you have the data of course. Luckily, once we have a force-aligned TextGrid we can extract this kind of information relatively easily using Praat scripts.

In this part of the workshop, we’ll be working with an existing dataset of dynamic formant measurements, and we’ll be covering more complex manipulations required to get formant trajectory data into a workable and plottable format.


1 Installing and loading packages

The good news is that we don’t need any new packages for working with formant trajectories - all you need is the tidyverse, which you should have already installed and loaded in Part 1 of this workshop.
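If you’re starting from a fresh R session, it just needs loading again:

```r
# install.packages("tidyverse")  # only needed if you haven't installed it already
library(tidyverse)
```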


2 Loading in data

The dataset we’ll be working with for this part of the workshop comes from an elicitation task conducted with two speakers: one from Manchester and one from Blackburn. Crucially, this time it contains dynamic formant measurements taken across the entire portion of each vowel rather than a single midpoint. It can be downloaded here: workshop_traj.csv.
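As a reminder of how to read it in (this sketch assumes the file is saved in your current working directory, and stores the data in a dataframe called traj, the name we’ll use for the rest of this section):

```r
# read in the dynamic formant data (adjust the path if you saved the file elsewhere)
traj <- read_csv("workshop_traj.csv")
```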

Let’s take a look at the structure of the dataframe:
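One quick way to do this is to list the column names:

```r
names(traj)
```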

##  [1] "token_id"       "speaker"        "time"           "vowel"         
##  [5] "word"           "duration"       "F1_mid"         "F2_mid"        
##  [9] "F3_mid"         "F1_avg_all"     "F2_avg_all"     "F3_avg_all"    
## [13] "F1_avg_third_2" "F2_avg_third_2" "F3_avg_third_2" "F1_avg_third_1"
## [17] "F2_avg_third_1" "F3_avg_third_1" "F1_avg_third_3" "F2_avg_third_3"
## [21] "F3_avg_third_3" "F1_05"          "F1_10"          "F1_15"         
## [25] "F1_20"          "F1_25"          "F1_30"          "F1_35"         
## [29] "F1_40"          "F1_45"          "F1_50"          "F1_55"         
## [33] "F1_60"          "F1_65"          "F1_70"          "F1_75"         
## [37] "F1_80"          "F1_85"          "F1_90"          "F1_95"         
## [41] "F2_05"          "F2_10"          "F2_15"          "F2_20"         
## [45] "F2_25"          "F2_30"          "F2_35"          "F2_40"         
## [49] "F2_45"          "F2_50"          "F2_55"          "F2_60"         
## [53] "F2_65"          "F2_70"          "F2_75"          "F2_80"         
## [57] "F2_85"          "F2_90"          "F2_95"          "F3_05"         
## [61] "F3_10"          "F3_15"          "F3_20"          "F3_25"         
## [65] "F3_30"          "F3_35"          "F3_40"          "F3_45"         
## [69] "F3_50"          "F3_55"          "F3_60"          "F3_65"         
## [73] "F3_70"          "F3_75"          "F3_80"          "F3_85"         
## [77] "F3_90"          "F3_95"

You’ll notice immediately that we have a lot of columns! This is unavoidable when working with dynamic formant data because, depending on the time resolution, a single vowel token will be represented by many different formant values.


3 Data wrangling

There’s actually a lot of information included in this dataframe that we don’t necessarily need here, so let’s make it a bit cleaner by keeping only the columns we need: the token-level metadata plus the F1 and F2 trajectory measurements.
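A sketch along these lines will do the job (the particular metadata columns kept here are illustrative; the F1 and F2 trajectory columns are contiguous in the dataframe, so we can grab them all with a single range):

```r
traj <- traj %>%
  select(token_id, speaker, vowel, word, duration,
         F1_05:F2_95)  # all of the F1 and F2 trajectory columns
```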

Now let’s recode our vowel column to something more user-friendly. Let’s first establish what vowels we actually have data for:
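A quick way to check is to look at the unique values in the column:

```r
unique(traj$vowel)
```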

## [1] "E"  "i:" "eI"

Let’s transform these values from their current X-SAMPA form into more user-friendly lexical sets à la Wells. We can do this using a combination of mutate() to change the vowel column, and case_when() to change it using various conditions. You should be familiar with case_when() from Part 1, but just as a reminder the following code basically says:

“when you have ‘eI’ in the vowel column, change it to ‘FACE’, when you have ‘E’, change it to ‘DRESS’, and when you have ‘i:’, change it to ‘FLEECE’”
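In code, that looks like this:

```r
traj <- traj %>%
  mutate(vowel = case_when(
    vowel == "eI" ~ "FACE",
    vowel == "E"  ~ "DRESS",
    vowel == "i:" ~ "FLEECE"
  ))
```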

Since we’re going to focus on the FACE diphthong in this part of the workshop, let’s have a look at the words that contain this vowel:
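One way to do this is to filter for FACE tokens and pull out the unique words:

```r
traj %>%
  filter(vowel == "FACE") %>%
  pull(word) %>%
  unique()
```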

##  [1] "ate"      "sleigh"   "wade"     "freight"  "eight"    "weigh"   
##  [7] "sleighed" "wait"     "aid"      "slay"     "weighed"  "weight"  
## [13] "slayed"   "fray"     "slate"

3.1 Reshaping data

Another thing we need to do before plotting these vowel trajectories is to make the data ‘tidy’ (read about this here). This means having:

  • each variable as a column
  • one observation per row

In our case, it involves moving from a wide data format (i.e. with lots of columns) into a long format (i.e. with lots of rows instead!). In other words, rather than having one vowel token per row, with each formant at each time point measured in its own column, we want just one column with all of our formant values, and another column telling us which formant, and which time point, each value corresponds to.

To demonstrate with a non-linguistic example, the following dataset is in ‘wide’ format because there are multiple observations per row (it’s the number of words spoken per episode by members of the Stark family in season 1 of Game of Thrones).

## # A tibble: 6 x 11
## # Groups:   character [6]
##   character  ep_1  ep_2  ep_3  ep_4  ep_5  ep_6  ep_7  ep_8  ep_9 ep_10
##   <chr>     <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 arya         23   144   150   106   148    89     0    84    44    53
## 2 bran         71     0    40    33   128    51     0     0     0   132
## 3 catelyn     415   242   179    70    91    34     0    66   281   128
## 4 ned         428   297   558   501   654   371   440    60   150     0
## 5 robb         80   124    70    55     0   158     0   111   218    45
## 6 sansa       101   130    26    99    36   169     0   268     9    43

To this:

## # A tibble: 60 x 3
## # Groups:   character [6]
##    character episode spoken_words
##    <chr>     <chr>          <dbl>
##  1 arya      ep_1              23
##  2 bran      ep_1              71
##  3 catelyn   ep_1             415
##  4 ned       ep_1             428
##  5 robb      ep_1              80
##  6 sansa     ep_1             101
##  7 arya      ep_2             144
##  8 bran      ep_2               0
##  9 catelyn   ep_2             242
## 10 ned       ep_2             297
## # … with 50 more rows
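Assuming the wide version of this table is stored in a dataframe called got (a made-up name for this example), the reshaping is a single call to gather():

```r
got_long <- got %>%
  gather(key = "episode", value = "spoken_words", ep_1:ep_10)
```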

To do this, we can use gather() to - as the name suggests - gather all of these columns together and split them into just two columns: one called value, which is the actual formant measurement, and one called measure_type, which tells us what that value corresponds to.

But that’s not all. At the moment, our measure_type column conflates two things that we really need to separate: the formant number (i.e. F1 or F2) and the time point in the vowel (i.e. 5%, 10% etc.). We can split this into two separate columns using the separate() function.

Let’s combine these two functions into one big pipe chain, and save it to a new dataframe called traj.tidy (the arrange() command in the final line of code is simply to order the rows by the token_id column, and is completely optional).
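A sketch of that pipe chain might look like this (F1_05:F2_95 covers all of the F1 and F2 trajectory columns, and the "_" separator splits e.g. F1_05 into F1 and 05):

```r
traj.tidy <- traj %>%
  gather(key = "measure_type", value = "value", F1_05:F2_95) %>%
  separate(measure_type, into = c("formant", "interval"), sep = "_") %>%
  arrange(token_id)  # optional: order the rows by token
```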

Ok, let’s have a look at a selection of columns from our new tidy dataset to make sure it’s worked:
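For example:

```r
traj.tidy %>%
  select(speaker, vowel, word, formant, interval, value) %>%
  head()
```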

## # A tibble: 6 x 6
##   speaker vowel word  formant interval value
##   <chr>   <chr> <chr> <chr>   <chr>    <dbl>
## 1 BillyG  DRESS fed   F1      05        895.
## 2 BillyG  DRESS fed   F1      10        501.
## 3 BillyG  DRESS fed   F1      15        539.
## 4 BillyG  DRESS fed   F1      20        578.
## 5 BillyG  DRESS fed   F1      25        560.
## 6 BillyG  DRESS fed   F1      30        554.

Perfect! Notice how all of the formant values are now contained in one single column - i.e. each vowel contains multiple observations, and each of these observations exists on its own row. If you run dim(traj) and dim(traj.tidy) (or just look at the dataframes in the Environment tab on the right of the RStudio window), you’ll notice that we’ve gone from having 162 rows and 45 columns to 6156 rows and 10 columns!
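That is:

```r
dim(traj)       # 162 rows, 45 columns
dim(traj.tidy)  # 6156 rows, 10 columns
```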


4 Data visualisation

Needless to say, we’ll need to change our plotting methods now that we’re dealing with dynamic formant trajectories rather than static single-point measures. A lot of the ggplot syntax remains the same, but we won’t be using geom_point() anymore because we won’t be plotting 2-dimensional scatterplots of F1 and F2.

We actually have two options at this point. We can plot our F1 and F2 trajectories using either:

  • geom_path(), which joins up the raw formant measurements for each token in order, or
  • geom_smooth(), which fits a smoothed line through those measurements instead.

Let’s start off by plotting the trajectories for just the ‘FACE’ vowel using geom_path(). Note a few important things in the code below: the measurement interval goes on the x-axis (converted to a number), the formant value goes on the y-axis, and each token gets its own group so that every repetition is drawn as a separate line.
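A minimal sketch of this plot, assuming we also colour by formant and facet by speaker, might look something like the following:

```r
traj.tidy %>%
  filter(vowel == "FACE") %>%
  mutate(interval = as.numeric(interval)) %>%  # "05", "10", ... become numbers for the x-axis
  ggplot(aes(x = interval, y = value,
             colour = formant,
             group = interaction(token_id, formant))) +  # one line per formant per token
  geom_path() +
  facet_wrap(~ speaker)
```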

Now let’s try it with a smooth instead. In the code below, all we’ve done is replaced geom_path() with geom_smooth(). Note that we’ve also specified se = FALSE - this is to stop R plotting confidence intervals around the smoothed lines, which would look pretty messy when we have so many individual lines.
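Keeping everything else the same as the sketch above:

```r
traj.tidy %>%
  filter(vowel == "FACE") %>%
  mutate(interval = as.numeric(interval)) %>%
  ggplot(aes(x = interval, y = value,
             colour = formant,
             group = interaction(token_id, formant))) +
  geom_smooth(se = FALSE) +  # se = FALSE suppresses the confidence intervals
  facet_wrap(~ speaker)
```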

If we didn’t want a separate smooth for each individual token, we could change the group parameter to formant on its own. This means that ggplot will now plot each formant as its own smooth, but aggregated over all the relevant tokens produced by each speaker:
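Again as a sketch:

```r
traj.tidy %>%
  filter(vowel == "FACE") %>%
  mutate(interval = as.numeric(interval)) %>%
  ggplot(aes(x = interval, y = value, colour = formant, group = formant)) +
  geom_smooth(se = FALSE) +  # one aggregated smooth per formant
  facet_wrap(~ speaker)
```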

Exercise

Try plotting the trajectories for the words wheat and wet in the following way:

  • use a geom_smooth() to plot a single set of F1 and F2 trajectories for each word (i.e. we don’t want individual trajectories for each repetition of the word)
  • also include a geom_point() layer so that we can see the actual formant values overlaid on the smoothed trajectory (hint: you might want to decrease the size and opacity of these points otherwise they’ll obscure the smooths)
  • include two facet terms: speaker and word. So far we’ve only been specifying one facet term, but it’s straightforward to include two - you can use facet_grid(), separating your two faceting variables with ~

What do the results show? Do these vowels show any kind of formant movement during the production of these words, and if so, why might this be the case?


🤔Stuck? Solution here


5 Case study

You might have noticed that, for the FACE lexical set at least, our dataset contains a number of homonyms (or near-homonyms) such as eight and ate. Rumour has it that some speakers of northern British English varieties actually have a distinction between these words (and by rumour I of course mean scientific evidence).

Let’s take a look:
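One way to do this (the exact plot layout used here may differ) is to filter for the two words and facet by both speaker and word:

```r
traj.tidy %>%
  filter(word %in% c("eight", "ate")) %>%
  mutate(interval = as.numeric(interval)) %>%
  ggplot(aes(x = interval, y = value, colour = formant, group = formant)) +
  geom_smooth(se = FALSE) +
  facet_grid(speaker ~ word)
```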

Well, would you look at that! The words look pretty similar for BillyG, but WendyJ definitely shows a difference: we see a much greater increase in F2, and decrease in F1, in eight relative to ate.

Exercise

Now try plotting the other minimal pairs in this dataset to see if the same pattern emerges:

  • are there any other such minimal pairs? Look at wait and weight
  • how does this interact with the voicing of the following segment? Compare wait, weight, wade, and weighed
  • what’s interesting about weigh?
  • what about words beginning with [sl]?

🤔Stuck? Solution here