Overview
In this section you’ll find answers to all of the exercises from Part 1 and Part 2 of the workshop. No peeking unless you really need help!
1 Part 1: Vowel formants
1.1 Section 3
Exercise
Using the tools we’ve covered so far, make a new variable categorising the values in the pre_seg column into either coronal, velar, or other based on their place of articulation.
If you’re not familiar with the coding scheme used in the pre_seg column, where sounds have been transcribed in ARPAbet, you can find IPA translations here (look in the 2-letter columns).
vowels <- vowels %>%
  mutate(pre_type = case_when(
    # dental, alveolar and postalveolar consonants all count as coronal
    pre_seg %in% c('TH', 'DH', 'S', 'Z', 'N', 'T', 'D', 'R', 'L', 'SH', 'ZH', 'CH', 'JH') ~ 'coronal',
    pre_seg %in% c('K', 'G', 'NG') ~ 'velar',
    TRUE ~ 'other'
  ))
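To see how case_when() works through its conditions, here is a minimal sketch on a hypothetical toy tibble (the names toy, coronal_segs and velar_segs are illustrative, and the segment lists are deliberately shortened). Each row is checked against the conditions from top to bottom, the first matching condition decides the label, and TRUE acts as a catch-all for anything left over.

```r
library(dplyr)

# Hypothetical toy data: a few ARPAbet codes standing in for the pre_seg column
toy <- tibble(pre_seg = c("T", "K", "B", "SH", "NG", "M"))

# Shortened, illustrative segment lists (not the full sets used in the answer)
coronal_segs <- c("T", "D", "S", "Z", "SH")
velar_segs   <- c("K", "G", "NG")

toy <- toy %>%
  mutate(pre_type = case_when(
    pre_seg %in% coronal_segs ~ "coronal",
    pre_seg %in% velar_segs   ~ "velar",
    TRUE                      ~ "other"  # catch-all for anything not matched above
  ))

# Quick check of how many tokens ended up in each category
count(toy, pre_type)
```

Storing the segment sets in named vectors like this is just one design option; it makes the lists easier to inspect and extend than inline c() calls.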
1.2 Section 4
Exercise
Calculate the following:
- average duration of monophthongs and diphthongs for men and women separately
- average F1 of the STRUT vowel for old and young speakers in Manchester and Blackburn separately
## # A tibble: 4 x 3
## # Groups: sex [2]
## sex type duration.avg
## <chr> <chr> <dbl>
## 1 F diphthong 0.136
## 2 F monophthong 0.0977
## 3 M diphthong 0.109
## 4 M monophthong 0.0740
vowels %>%
  filter(lexset == 'STRUT') %>%
  group_by(location, age.group) %>%
  summarise(F1.avg = mean(F1))
## # A tibble: 4 x 3
## # Groups: location [2]
## location age.group F1.avg
## <chr> <chr> <dbl>
## 1 Blackburn older 463.
## 2 Blackburn younger 564.
## 3 Manchester older 503.
## 4 Manchester younger 632.
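The "Groups: sex" and "Groups: location" lines in the output appear because summarise() by default drops only the last grouping variable. A minimal sketch on hypothetical toy data (the duration column name is an assumption made for illustration) shows the same group_by() plus summarise() pattern, using the .groups argument (dplyr 1.0.0 and later) to return an ungrouped result instead.

```r
library(dplyr)

# Hypothetical toy data standing in for the workshop's vowels tibble;
# the duration column name is assumed for illustration
toy <- tibble(
  sex      = c("F", "F", "F", "M", "M", "M"),
  type     = c("monophthong", "monophthong", "diphthong",
               "monophthong", "diphthong", "diphthong"),
  duration = c(0.10, 0.08, 0.14, 0.07, 0.11, 0.13)
)

# One row per sex x type combination; .groups = "drop" removes all grouping,
# so later verbs are not silently applied per sex
toy_avgs <- toy %>%
  group_by(sex, type) %>%
  summarise(duration.avg = mean(duration), .groups = "drop")

toy_avgs
```

Dropping the grouping is a choice rather than a requirement; leaving it in place is harmless as long as you remember it is there before running further mutate() or summarise() steps.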
1.3 Section 5
Exercise
‘lexset’ isn’t a particularly reader-friendly title for our legend, so let’s change it. To do this, you need to add a scale_colour_discrete() layer.
To get an idea of what arguments you can specify for a particular command, you can check its help page by typing the command’s name preceded by a ? in the console, e.g. ?scale_colour_discrete
vowels.mon %>%
  ggplot(aes(x = F2, y = F1, colour = lexset)) +
  geom_point() +
  scale_x_reverse() +
  scale_y_reverse() +
  scale_colour_discrete(name = "Vowel")
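Here is the same idea as a self-contained sketch on a few hypothetical formant values (toy and p are illustrative names). The name argument of scale_colour_discrete() sets the legend title; if the title is all you want to change, labs(colour = "Vowel") is a shorter alternative that leaves the default colour scale in place.

```r
library(ggplot2)

# Hypothetical toy formant values
toy <- data.frame(
  F1     = c(300, 320, 650, 700),
  F2     = c(2200, 2100, 1200, 1300),
  lexset = c("FLEECE", "FLEECE", "STRUT", "STRUT")
)

p <- ggplot(toy, aes(x = F2, y = F1, colour = lexset)) +
  geom_point() +
  scale_x_reverse() +
  scale_y_reverse() +
  scale_colour_discrete(name = "Vowel")  # legend title for the colour mapping

p
```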
1.4 Section 5.2
Exercise
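By default, stat_ellipse() uses level = 0.95, which draws an ellipse expected to enclose roughly 95% of the tokens in each group, assuming they follow a multivariate t-distribution (the default type = "t"). Because it describes the spread of the individual tokens, it is not a confidence interval for the mean, even though both use the same 95% figure.
Try changing this to a lower value, such as 68% (or even 10%!), to see how this influences the plot. Don’t forget you can check the help page by running ?stat_ellipse.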
vowels.mon %>%
  ggplot(aes(x = F2, y = F1, colour = lexset)) +
  stat_ellipse(aes(fill = lexset), geom = 'polygon', alpha = 0.3, level = 0.68) +
  geom_label(data = vowel.avgs, aes(x = F2.avg, y = F1.avg, label = lexset)) +
  scale_x_reverse() +
  scale_y_reverse() +
  theme(legend.position = 'none')
vowels.mon %>%
  ggplot(aes(x = F2, y = F1, colour = lexset)) +
  stat_ellipse(aes(fill = lexset), geom = 'polygon', alpha = 0.3, level = 0.1) +
  geom_label(data = vowel.avgs, aes(x = F2.avg, y = F1.avg, label = lexset)) +
  scale_x_reverse() +
  scale_y_reverse() +
  theme(legend.position = 'none')
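To make "contains X% of the data" concrete, the sketch below simulates hypothetical bivariate-normal tokens and counts how many fall inside an ellipse at a given level. It follows my reading of how ggplot2 builds the type = "norm" ellipse (sample covariance from cov.wt() and a squared radius of 2 × qf(level, 2, n − 1)); a point lies inside when its squared Mahalanobis distance is within that radius. The default type = "t" uses a robust covariance estimate instead, so its ellipses will differ slightly. The names f1, f2, share_inside and props are illustrative.

```r
# Simulated bivariate-normal "tokens" (hypothetical data, not the workshop's)
set.seed(1)
n  <- 2000
f2 <- rnorm(n, mean = 1500, sd = 150)
f1 <- 0.3 * (f2 - 1500) + rnorm(n, mean = 500, sd = 50)
xy <- cbind(f2, f1)

# Share of points inside a normal-theory ellipse drawn at the given level
share_inside <- function(xy, level) {
  v <- stats::cov.wt(xy)                          # sample mean and covariance
  radius_sq <- 2 * qf(level, 2, nrow(xy) - 1)     # squared ellipse radius
  d_sq <- mahalanobis(xy, center = v$center, cov = v$cov)
  mean(d_sq <= radius_sq)
}

props <- sapply(c(0.95, 0.68, 0.1), share_inside, xy = xy)
props
```

Each proportion should land close to its level, which is why lowering level shrinks the ellipses in the plots above.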
1.5 Section 7
Exercise
Now that we’ve covered all of the key tools in analysing and plotting vowel formant data in R, let’s try a little case study exploring GOOSE-fronting:
- plot the distribution of only FLEECE and GOOSE tokens (including averages!) for each speaker to establish the degree of overlap between these categories
- plot just the F2 of GOOSE by date of birth or age group to establish whether we have evidence of apparent-time change; you might want to try a boxplot for this (hint: it’s geom_boxplot())
- make a plot of all GOOSE tokens colour-coded by whether or not the following segment is /l/; you might want to create a new column for this using case_when(). What do the results suggest?
- make a plot of all GOOSE tokens colour-coded by whether the preceding segment is coronal, velar or other; you should have already made this column in Section 3.2 earlier. Does this preceding segmental environment also have an effect on the realisation of GOOSE?
vowels.avgs <- vowels.mon %>%
  filter(lexset %in% c('FLEECE', 'GOOSE')) %>%
  group_by(lexset, speaker) %>%
  summarise(F1.avg = mean(F1.norm), F2.avg = mean(F2.norm))
vowels.mon %>%
  filter(lexset %in% c('FLEECE', 'GOOSE')) %>%
  ggplot(aes(x = F2.norm, y = F1.norm, colour = lexset)) +
  stat_ellipse(aes(fill = lexset), geom = 'polygon', alpha = 0.5) +
  geom_label(data = vowels.avgs, aes(x = F2.avg, y = F1.avg, label = lexset)) +
  scale_x_reverse(name = "F2 (normalised)") +
  scale_y_reverse(name = "F1 (normalised)") +
  facet_wrap(~speaker) +
  theme(legend.position = 'none')
vowels.mon %>%
  filter(lexset == 'GOOSE') %>%
  ggplot(aes(x = age.group, y = F2.norm)) +
  geom_boxplot() +
  scale_x_discrete(name = "Age group") +
  scale_y_continuous(name = "F2 (normalised)")
vowels.mon %>%
  mutate(dob = as.character(dob)) %>%
  filter(lexset == 'GOOSE') %>%
  ggplot(aes(x = dob, y = F2.norm)) +
  geom_boxplot() +
  scale_x_discrete(name = "Date of birth") +
  scale_y_continuous(name = "F2 (normalised)")
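The as.character(dob) step matters for the boxplot. A minimal sketch on hypothetical GOOSE values (toy, p_numeric and p_discrete are illustrative names) compares the two cases: with a numeric x and no group aesthetic, ggplot2 puts every row into a single group and draws one box (and warns about a continuous x aesthetic), whereas a character x gives each birth year its own box.

```r
library(ggplot2)

# Hypothetical GOOSE F2 values for speakers born in three years
toy <- data.frame(
  dob     = rep(c(1950, 1975, 2000), each = 4),
  F2.norm = c(0.1, 0.2, 0.0, 0.1, 0.4, 0.5, 0.3, 0.4, 0.9, 1.0, 0.8, 0.9)
)

# Numeric x: all rows fall into one group, so only one box is computed
p_numeric <- ggplot(toy, aes(x = dob, y = F2.norm)) + geom_boxplot()

# Character x: each year is a separate category with its own box
p_discrete <- ggplot(toy, aes(x = as.character(dob), y = F2.norm)) + geom_boxplot()

# layer_data() returns one row per computed box
n_boxes_numeric  <- nrow(suppressWarnings(layer_data(p_numeric)))
n_boxes_discrete <- nrow(layer_data(p_discrete))
```

An alternative is to keep dob numeric and add aes(group = dob), which preserves the spacing between years on the x-axis.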
vowels.mon %>%
  filter(lexset == 'GOOSE') %>%
  mutate(fol_type = case_when(
    fol_seg == 'L' ~ 'L',
    TRUE ~ 'other'
  )) %>%
  ggplot(aes(x = F2.norm, y = F1.norm, colour = fol_type, fill = fol_type)) +
  geom_point() +
  stat_ellipse(geom = 'polygon', alpha = 0.2) +
  scale_x_reverse(name = "F2 (normalised)") +
  scale_y_reverse(name = "F1 (normalised)")
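For a two-way split like fol_type, dplyr’s if_else() is a shorter alternative to case_when(), but the two treat missing values differently. The sketch below uses a hypothetical vector that includes an NA (whether the real fol_seg column contains any is not shown here): case_when() treats an NA condition as FALSE, so the token falls through to 'other', while if_else() returns NA.

```r
library(dplyr)

# Hypothetical following segments, including a missing value
fol_seg <- c("L", "T", NA, "L")

# NA == "L" is NA, which case_when() treats as FALSE, so TRUE ~ "other" applies
via_case_when <- case_when(
  fol_seg == "L" ~ "L",
  TRUE ~ "other"
)

# if_else() propagates the missing condition as NA
via_if_else <- if_else(fol_seg == "L", "L", "other")
```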
vowels.mon %>%
  filter(lexset == 'GOOSE') %>%
  ggplot(aes(x = F2.norm, y = F1.norm, colour = pre_type, fill = pre_type)) +
  geom_point() +
  stat_ellipse(geom = 'polygon', alpha = 0.2) +
  scale_x_reverse(name = "F2 (normalised)") +
  scale_y_reverse(name = "F1 (normalised)")
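Before reading too much into the ellipses, it can help to check how many GOOSE tokens fall into each pre_type category. As far as I can tell from the ggplot2 source, stat_ellipse() skips groups with fewer than four points and only prints a message ("Too few points to calculate an ellipse"), so a sparse category can quietly lose its ellipse. The sketch uses hypothetical counts, and the min_n threshold is an assumption to adjust if your ggplot2 version behaves differently.

```r
library(dplyr)

# Hypothetical GOOSE tokens with a pre_type column like the one built in Section 3
goose <- tibble(
  pre_type = c(rep("coronal", 5), rep("velar", 2), rep("other", 3))
)

# Assumption: based on ggplot2's stat_ellipse() source, groups with fewer than
# four points get no ellipse; adjust min_n if your version behaves differently
min_n <- 4

token_counts <- goose %>% count(pre_type)          # tokens per category
too_small    <- token_counts %>% filter(n < min_n) # categories likely to lose their ellipse

token_counts
too_small
```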
2 Part 2: Formant trajectories
2.1 Section 4
Exercise
Try plotting the trajectories for the words wheat and wet in the following way:
- use geom_smooth() to plot a single set of F1 and F2 trajectories for each word (i.e. we don’t want individual trajectories for each repetition of the word)
- also include a geom_point() layer so that we can see the actual formant values overlaid on the smoothed trajectory (hint: you might want to decrease the size and opacity of these points, otherwise they’ll obscure the smooths)
- include two facet terms: speaker and word. So far we’ve only been specifying one facet term, but it’s straightforward to include two: use facet_grid(), separating your two faceting variables with ~
What do the results show? Do these vowels show any kind of formant movement during the production of these words, and if so, why might this be the case?
traj.tidy %>%
  filter(word %in% c('wheat', 'wet')) %>%
  ggplot(aes(x = interval, y = value, group = formant)) +
  geom_smooth() +
  geom_point(alpha = 0.5, size = 0.5) +
  facet_grid(speaker~word)
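facet_grid(rows ~ cols) draws one panel for every combination of the two variables, and within each panel group = formant gives one line per formant. The sketch below checks this on hypothetical long-format trajectories (toy, built, n_panels and lines_per_panel are illustrative names); geom_line() stands in for geom_smooth() to keep the example minimal.

```r
library(ggplot2)

# Hypothetical long-format trajectories: one row per speaker x word x formant x interval
toy <- expand.grid(
  speaker  = c("S1", "S2"),
  word     = c("wheat", "wet"),
  formant  = c("F1", "F2"),
  interval = 1:5,
  stringsAsFactors = FALSE
)
toy$value <- ifelse(toy$formant == "F1", 400, 2000) + 10 * toy$interval

p <- ggplot(toy, aes(x = interval, y = value, group = formant)) +
  geom_line() +
  facet_grid(speaker ~ word)  # rows = speaker, columns = word

# Inspect the built layer: which panel each row went to, and its group
built <- layer_data(p)
n_panels <- length(unique(built$PANEL))
lines_per_panel <- tapply(built$group, built$PANEL, function(g) length(unique(g)))
```

With two speakers and two words this gives a 2 × 2 grid, each panel holding an F1 and an F2 line.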
2.2 Section 5
Exercise
Now try plotting the other minimal pairs in this dataset to see if the same pattern emerges:
- are there any other such minimal pairs? Look at wait and weight
- how does this interact with the voicing of the following segment? Compare wait, weight, wade, and weighed
- what’s interesting about weigh?
- what about words beginning with [sl]?
traj.tidy %>%
  filter(vowel == 'FACE' & word %in% c('wait', 'weight')) %>%
  ggplot(aes(x = interval, y = value, group = interaction(formant, token_id), colour = token_id)) +
  geom_point(pch = 4) +
  geom_smooth(se = FALSE, span = 2) +
  facet_grid(speaker~word) +
  theme(legend.position = 'none')
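Unlike the Section 4 plot, these plots use group = interaction(formant, token_id), so each repetition of a word gets its own F1 and its own F2 smooth rather than being merged into one. A minimal base R sketch with hypothetical token IDs shows what interaction() produces: one level per formant × token combination. Swapping the argument order, as in interaction(token_id, formant), changes the level labels but not which rows are grouped together.

```r
# Hypothetical token IDs and formant labels in long format
formant  <- c("F1", "F2", "F1", "F2", "F1", "F2")
token_id <- c("t1", "t1", "t2", "t2", "t3", "t3")

# One grouping level per formant x token combination
grp <- interaction(formant, token_id, drop = TRUE)

levels(grp)
```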
traj.tidy %>%
  filter(vowel == 'FACE' & word %in% c('wait', 'weight', 'wade', 'weighed')) %>%
  ggplot(aes(x = interval, y = value, group = interaction(formant, token_id), colour = token_id)) +
  geom_point(pch = 4) +
  geom_smooth(se = FALSE, span = 2) +
  facet_grid(speaker~word) +
  theme(legend.position = 'none')
traj.tidy %>%
  filter(vowel == 'FACE' & word == 'weigh') %>%
  ggplot(aes(x = interval, y = value, group = interaction(formant, token_id), colour = token_id)) +
  geom_point(pch = 4) +
  geom_smooth(se = FALSE, span = 2) +
  facet_grid(speaker~.) +
  theme(legend.position = 'none')
traj.tidy %>%
  filter(vowel == 'FACE' & word %in% c('sleigh', 'sleighed', 'slay', 'slayed', 'slate')) %>%
  ggplot(aes(x = interval, y = value, group = interaction(token_id, formant), colour = token_id)) +
  geom_point(pch = 4) +
  geom_smooth(se = FALSE) +
  facet_grid(speaker~word) +
  theme(legend.position = 'none')