Overview
Plotting static formant measures taken from the midpoint of vowels sometimes obscures a lot of important and interesting information about the dynamic nature of formants, vowel targets, and co-articulation with adjacent segments.
To combat this, we can instead plot the entire vowel trajectory, assuming you have the data, of course. Luckily, once we have a force-aligned TextGrid we can extract this kind of information relatively easily using Praat scripts.
In this part of the workshop, we’ll be working with an existing dataset of dynamic formant measurements, and we’ll be covering more complex manipulations required to get formant trajectory data into a workable and plottable format.
1 Installing and loading packages
The good news is that we don’t need any new packages for working with formant trajectories - all you need is the tidyverse, which you should have already installed and loaded in Part 1 of this workshop.
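Just as a reminder, loading it looks like this (the install.packages() line is only needed if you’ve never installed it before):
# install.packages("tidyverse")   # only run this once, if you haven't installed it before
library(tidyverse)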
2 Loading in data
The dataset we’ll be working with for this part of the workshop comes from an elicitation task conducted with two speakers: one from Manchester and one from Blackburn. Crucially, this time it contains dynamic formant measurements taken across the entire portion of each vowel rather than a single midpoint. It can be downloaded here: workshop_traj.csv.
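Assuming you’ve saved the file to your working directory, you can read it in with read_csv() (adjust the file path if you’ve put it somewhere else):
traj <- read_csv("workshop_traj.csv")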
Let’s take a look at the structure of the dataframe:
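We can print the column names using names():
names(traj)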
## [1] "token_id" "speaker" "time" "vowel"
## [5] "word" "duration" "F1_mid" "F2_mid"
## [9] "F3_mid" "F1_avg_all" "F2_avg_all" "F3_avg_all"
## [13] "F1_avg_third_2" "F2_avg_third_2" "F3_avg_third_2" "F1_avg_third_1"
## [17] "F2_avg_third_1" "F3_avg_third_1" "F1_avg_third_3" "F2_avg_third_3"
## [21] "F3_avg_third_3" "F1_05" "F1_10" "F1_15"
## [25] "F1_20" "F1_25" "F1_30" "F1_35"
## [29] "F1_40" "F1_45" "F1_50" "F1_55"
## [33] "F1_60" "F1_65" "F1_70" "F1_75"
## [37] "F1_80" "F1_85" "F1_90" "F1_95"
## [41] "F2_05" "F2_10" "F2_15" "F2_20"
## [45] "F2_25" "F2_30" "F2_35" "F2_40"
## [49] "F2_45" "F2_50" "F2_55" "F2_60"
## [53] "F2_65" "F2_70" "F2_75" "F2_80"
## [57] "F2_85" "F2_90" "F2_95" "F3_05"
## [61] "F3_10" "F3_15" "F3_20" "F3_25"
## [65] "F3_30" "F3_35" "F3_40" "F3_45"
## [69] "F3_50" "F3_55" "F3_60" "F3_65"
## [73] "F3_70" "F3_75" "F3_80" "F3_85"
## [77] "F3_90" "F3_95"
You’ll notice immediately that we have a lot of columns! This is unavoidable when working with dynamic formant data because, depending on the time resolution, a single vowel token will be represented by many different formant values.
3 Data wrangling
There’s actually a lot of information included in this dataframe that we don’t necessarily need here, so let’s make it a bit cleaner by keeping only the following columns:
- token_id (unique identifier for each vowel token)
- speaker (name of the speaker)
- vowel (vowel category, transcribed in X-SAMPA)
- word (word label)
- duration (duration of vowel; in ms)
- F1_avg_all (average F1 across whole vowel)
- F2_avg_all (average F2 across whole vowel)
- F1_05 (F1 at 5% of vowel)
- F1_10 (F1 at 10% of vowel)
- and so on, all the way to F2_95 (the : notation in the code below basically means “take all of the columns between, and including, these two”)
traj <- traj %>%
select(token_id, speaker, vowel, word, duration, F1_avg_all, F2_avg_all, F1_05:F2_95)
Now let’s recode our vowel column to something more user-friendly. Let’s first establish what vowels we actually have data for:
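One quick way to check is with unique():
unique(traj$vowel)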
## [1] "E" "i:" "eI"
Let’s transform these values from their current X-SAMPA form into more user-friendly lexical sets à la Wells. We can do this using a combination of mutate() to change the vowel column, and case_when() to change it using various conditions. You should be familiar with case_when() from Part 1, but just as a reminder the following code basically says:
“when you have ‘eI’ in the vowel column, change it to ‘FACE’, when you have ‘E’, change it to ‘DRESS’, and when you have ‘i:’, change it to ‘FLEECE’”
traj <- traj %>%
mutate(vowel = case_when(
vowel == 'eI' ~ 'FACE',
vowel == 'E' ~ 'DRESS',
vowel == 'i:' ~ 'FLEECE'
))
Since we’re going to focus on the FACE diphthong in this part of the workshop, let’s have a look at the words that contain this vowel:
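One way to list them is to filter for FACE tokens and pull out the unique word labels:
traj %>%
  filter(vowel == 'FACE') %>%
  pull(word) %>%
  unique()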
## [1] "ate" "sleigh" "wade" "freight" "eight" "weigh"
## [7] "sleighed" "wait" "aid" "slay" "weighed" "weight"
## [13] "slayed" "fray" "slate"
3.1 Reshaping data
Another thing we need to do before plotting these vowel trajectories is to make the data ‘tidy’ (read about this here). This means having:
- each variable as a column
- one observation per row
In our case, it involves moving from a wide data format (i.e. with lots of columns) into a long format (i.e. with lots of rows instead!). In other words, rather than having one vowel token per row, with each formant at each time point measured in its own column, we want just one column with all of our formant values, and another column telling us which formant, and which time point, each value corresponds to.
To demonstrate with a non-linguistic example, the following dataset is in ‘wide’ format because there are multiple observations per row (it’s the number of words spoken per episode by members of the Stark family in season 1 of Game of Thrones):
## # A tibble: 6 x 11
## # Groups: character [6]
## character ep_1 ep_2 ep_3 ep_4 ep_5 ep_6 ep_7 ep_8 ep_9 ep_10
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 arya 23 144 150 106 148 89 0 84 44 53
## 2 bran 71 0 40 33 128 51 0 0 0 132
## 3 catelyn 415 242 179 70 91 34 0 66 281 128
## 4 ned 428 297 558 501 654 371 440 60 150 0
## 5 robb 80 124 70 55 0 158 0 111 218 45
## 6 sansa 101 130 26 99 36 169 0 268 9 43
We want to reshape it into this ‘long’ format:
## # A tibble: 60 x 3
## # Groups: character [6]
## character episode spoken_words
## <chr> <chr> <dbl>
## 1 arya ep_1 23
## 2 bran ep_1 71
## 3 catelyn ep_1 415
## 4 ned ep_1 428
## 5 robb ep_1 80
## 6 sansa ep_1 101
## 7 arya ep_2 144
## 8 bran ep_2 0
## 9 catelyn ep_2 242
## 10 ned ep_2 297
## # … with 50 more rows
To do this, we can use gather() to - as the name suggests - gather all of these columns together and split them into just two columns: one called value, which is the actual formant measurement, and one called measure_type, which tells us what that value corresponds to.
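To make this concrete with the Game of Thrones example, the reshape from wide to long would look something like this (got_wide is just a made-up name for the wide dataframe above):
# got_wide is a hypothetical name for the wide Game of Thrones dataframe
got_wide %>%
  gather("episode", "spoken_words", ep_1:ep_10)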
But that’s not all. At the moment, our measure_type column conflates two things that we really need to separate: the formant number (i.e. F1 or F2) and the time point in the vowel (i.e. 5%, 10% etc.). We can split this into two separate columns using the separate() function.
Let’s combine these two functions into one big pipe chain, and save it to a new dataframe called traj.tidy (the arrange() command in the final line of code is simply to order the rows by the token_id column, and is completely optional):
traj.tidy <- traj %>%
gather("measure_type", "value", F1_05:F2_95) %>%
separate("measure_type", into=c("formant", "interval")) %>%
arrange(token_id)
Ok, let’s have a look at a selection of columns from our new tidy dataset to make sure it’s worked:
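Something along these lines will produce the preview below - we select a handful of columns and take the first six rows with head():
traj.tidy %>%
  select(speaker, vowel, word, formant, interval, value) %>%
  head()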
## # A tibble: 6 x 6
## speaker vowel word formant interval value
## <chr> <chr> <chr> <chr> <chr> <dbl>
## 1 BillyG DRESS fed F1 05 895.
## 2 BillyG DRESS fed F1 10 501.
## 3 BillyG DRESS fed F1 15 539.
## 4 BillyG DRESS fed F1 20 578.
## 5 BillyG DRESS fed F1 25 560.
## 6 BillyG DRESS fed F1 30 554.
Perfect! Notice how all of the formant values are now contained in one single column - i.e. each vowel contains multiple observations, and each of these observations exists on its own row. If you run dim(traj) and dim(traj.tidy) (or just look at the dataframes in the Environment tab on the right of the RStudio window), you’ll notice that we’ve gone from having 162 rows and 45 columns to 6156 rows and 10 columns!
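For reference:
dim(traj)        # 162 rows, 45 columns
dim(traj.tidy)   # 6156 rows, 10 columns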
4 Data visualisation
Needless to say, we’ll need to change our plotting methods now that we’re dealing with dynamic formant trajectories rather than static single-point measures. A lot of the ggplot syntax remains the same, but we won’t be using geom_point() anymore because we won’t be plotting 2-dimensional scatterplots of F1 and F2.
We actually have two options at this point. We can plot our F1 and F2 trajectories using either:
- geom_path(), which will draw a line between our individual time points, kind of like a dot-to-dot
- geom_smooth(), which is like the above but, well, smoothed (how do they come up with these inventive names?!)
Let’s start off by plotting the trajectories for just the ‘FACE’ vowel using geom_path(). Note a few important things in the code below:
- we’ve got interval (i.e. the time point) on the x-axis, and the formant value on the y-axis - this means our plots will be laid out similarly to Praat’s formant tracker that gets overlaid on the spectrogram
- since we’re plotting lots of vowel tokens at once, we colour-code each formant trajectory using the unique identifier token_id
- we also need to specify a group argument, for the first time. This is because R needs to know which points ‘go together’ to constitute a single trajectory. It’s not enough to just set group to token_id (try it and see what happens!) so we have to include an interaction() term between token_id and formant
- the rest is straightforward: we use geom_path() as our geom type, include a facet_wrap() term to separate our two speakers into two plots, and we also remove the legend using theme() because the colour-coding is only there to make the plot look jazzy (the legend is also huge - if you don’t believe me, run the code below without that last line)
traj.tidy %>%
filter(vowel == 'FACE') %>%
ggplot(aes(x = interval, y = value, colour = token_id, group = interaction(token_id, formant))) +
geom_path() +
facet_wrap(~speaker) +
theme(legend.position = 'none')
Now let’s try it with a smooth instead. In the code below, we’ve replaced geom_path() with geom_smooth() (we’ve also swapped facet_wrap() for facet_grid(), which lays out the two speaker panels in essentially the same way). Note that we’ve also specified se = FALSE - this is to stop R plotting confidence intervals around the smoothed lines, which would look pretty messy when we have so many individual lines.
traj.tidy %>%
filter(vowel == 'FACE') %>%
ggplot(aes(x = interval, y = value, colour = token_id, group = interaction(token_id, formant))) +
geom_smooth(se = FALSE) +
facet_grid(~speaker) +
theme(legend.position = 'none')
If we didn’t want a separate smooth for each individual token, we could change the group parameter to formant on its own. This means that ggplot will now plot each formant as its own smooth, aggregated over all the relevant tokens produced by each speaker:
traj.tidy %>%
filter(vowel == 'FACE') %>%
ggplot(aes(x = interval, y = value, group = formant)) +
geom_smooth() +
facet_grid(~speaker)
Exercise
Try plotting the trajectories for the words wheat and wet in the following way:
- use a geom_smooth() to plot a single set of F1 and F2 trajectories for each word (i.e. we don’t want individual trajectories for each repetition of the word)
- also include a geom_point() layer so that we can see the actual formant values overlaid on the smoothed trajectory (hint: you might want to decrease the size and opacity of these points otherwise they’ll obscure the smooths)
- include two facet terms: speaker and word. So far we’ve only been specifying one facet term, but it’s straightforward to include two - you can use facet_grid(), separating your two faceting variables with ~
What do the results show? Do these vowels show any kind of formant movement during the production of these words, and if so, why might this be the case?
🤔Stuck? Solution here
5 Case study
You might have noticed that, for the FACE lexical set at least, our dataset contains a number of homonyms (or near-homonyms) such as eight and ate. Rumour has it that some speakers of northern British English varieties actually have a distinction between these words (and by rumour I of course mean scientific evidence).
Let’s take a look:
traj.tidy %>%
filter(vowel == 'FACE' & word %in% c('ate', 'eight')) %>%
ggplot(aes(x=interval, y=value, group=formant)) +
geom_smooth() +
facet_grid(speaker~word)
Well would you look at that! The words look pretty similar for BillyG, but WendyJ definitely shows a difference: we see a much greater increase in F2, and decrease in F1, in eight relative to ate.
Exercise
Now try plotting the other minimal pairs in this dataset to see if the same pattern emerges:
- are there any other such minimal pairs? Look at wait and weight
- how does this interact with the voicing of the following segment? Compare wait, weight, wade, and weighed
- what’s interesting about weigh?
- what about words beginning with [sl]?
🤔Stuck? Solution here