Overview

In this section you’ll find answers to all of the exercises from Part 1 and Part 2 of the workshop. No peeking unless you really need help!

1 Part 1: Vowel formants

1.1 Section 3

Exercise

Using the tools we’ve covered so far, make a new variable categorising the values in the pre_seg column into either coronal, velar, or other based on their place of articulation.

If you’re not familiar with the coding scheme used in the pre_seg column, where sounds have been transcribed in ARPAbet, you can find IPA translations here (look in the 2-letter columns).

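vowels <- vowels %>%
  mutate(pre_type = case_when(
    # coronal: alveolar, postalveolar and dental consonants
    pre_seg %in% c('S', 'Z', 'T', 'D', 'N', 'L', 'R', 'SH', 'ZH', 'CH', 'JH', 'TH', 'DH') ~ 'coronal',
    # velar: velar stops and nasal
    pre_seg %in% c('K', 'G', 'NG') ~ 'velar',
    # everything else, including missing values
    TRUE ~ 'other'
  ))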
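Because case_when() sends every code that isn't listed to 'other', it's worth checking which codes actually end up there. Below is a minimal sketch on a few made-up ARPAbet codes (not the workshop data), assuming dplyr is loaded as in the workshop. The coronal set here includes the postalveolar ZH and the dentals TH and DH.

```r
library(dplyr)

# Made-up preceding segments for illustration only (not the workshop data)
toy <- tibble(pre_seg = c('T', 'K', 'P', 'SH', 'NG', 'B', 'TH', 'Y'))

toy <- toy %>%
  mutate(pre_type = case_when(
    pre_seg %in% c('S', 'Z', 'T', 'D', 'N', 'L', 'R', 'SH', 'ZH', 'CH', 'JH', 'TH', 'DH') ~ 'coronal',
    pre_seg %in% c('K', 'G', 'NG') ~ 'velar',
    TRUE ~ 'other'
  ))

# List the distinct codes that fell through to the catch-all category
others <- toy %>%
  filter(pre_type == 'other') %>%
  distinct(pre_seg)

others
```

If a coronal or velar code turns up in this list, it was missed by the conditions above.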

1.2 Section 4

Exercise

Calculate the following:

average duration of monophthongs and diphthongs for men and women separately
average F1 of the STRUT vowel for old and young speakers in Manchester and Blackburn separately

vowels %>% 
  group_by(sex, type) %>% 
  summarise(duration.avg = mean(duration))

## # A tibble: 4 x 3
## # Groups:   sex [2]
##   sex   type        duration.avg
##   <chr> <chr>              <dbl>
## 1 F     diphthong         0.136 
## 2 F     monophthong       0.0977
## 3 M     diphthong         0.109 
## 4 M     monophthong       0.0740

vowels %>% 
  filter(lexset == 'STRUT') %>%
  group_by(location, age.group) %>% 
  summarise(F1.avg = mean(F1))

## # A tibble: 4 x 3
## # Groups:   location [2]
##   location   age.group F1.avg
##   <chr>      <chr>      <dbl>
## 1 Blackburn  older       463.
## 2 Blackburn  younger     564.
## 3 Manchester older       503.
## 4 Manchester younger     632.
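The "# Groups: ..." line in the output above appears because summarise() drops only the last grouping variable, so the result is still grouped by the first one. A minimal sketch on made-up durations (not the workshop data), assuming dplyr 1.0 or later: passing .groups = 'drop' returns an ungrouped tibble, and n() records how many tokens sit behind each mean.

```r
library(dplyr)

# Made-up durations in seconds, for illustration only
toy <- tibble(
  sex      = c('F', 'F', 'F', 'M', 'M', 'M'),
  type     = c('monophthong', 'monophthong', 'diphthong',
               'monophthong', 'diphthong', 'diphthong'),
  duration = c(0.10, 0.08, 0.14, 0.07, 0.11, 0.09)
)

avgs <- toy %>%
  group_by(sex, type) %>%
  summarise(duration.avg = mean(duration),  # mean for each sex/type cell
            n = n(),                        # number of tokens behind each mean
            .groups = 'drop')               # return an ungrouped tibble

avgs
```

Reporting n alongside each mean makes it easier to spot averages that rest on only a handful of tokens.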

1.3 Section 5

Exercise

‘lexset’ isn’t a particularly reader-friendly title for our legend, so let’s change it. To do this, add a scale_colour_discrete() layer to the plot.

To see which arguments a command accepts, check its help page by typing its name preceded by a ? in the console, e.g. ?scale_colour_discrete

vowels.mon %>%
  ggplot(aes(x = F2, y = F1, colour = lexset)) +
  geom_point() +
  scale_x_reverse() +
  scale_y_reverse() +
  scale_colour_discrete(name="Vowel")
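An equivalent approach is labs(), which sets an aesthetic's title without specifying the scale type. This is handy in the later plots, where lexset is mapped to both colour and fill. If both aesthetics get the same title, ggplot2 merges them into one legend instead of drawing two. Below is a minimal sketch on made-up formant values (not the workshop data), assuming ggplot2 is loaded.

```r
library(ggplot2)

# Made-up formant values for illustration only
toy <- data.frame(
  F1     = c(300, 320, 650, 670),
  F2     = c(2200, 2250, 1200, 1250),
  lexset = c('FLEECE', 'FLEECE', 'STRUT', 'STRUT')
)

p <- ggplot(toy, aes(x = F2, y = F1, colour = lexset, fill = lexset)) +
  geom_point(shape = 21) +  # shape 21 uses both colour (outline) and fill
  scale_x_reverse() +
  scale_y_reverse() +
  # same title for both aesthetics, so they share a single legend
  labs(colour = 'Vowel', fill = 'Vowel')

p
```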

1.4 Section 5.2

Exercise

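By default, stat_ellipse() draws an ellipse expected to contain 95% of the data for that particular distribution. This describes the spread of the individual tokens, so unlike a 95% confidence interval around the mean, it does not get narrower as you add more data.

Try changing this to a lower value, such as 68% (or even 10%!), to see how this influences the plot. The level argument takes a proportion, so 68% is level = 0.68. Don’t forget you can check the help page by running ?stat_ellipse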

vowels.mon %>%
  ggplot(aes(x = F2, y = F1, colour = lexset)) +
  stat_ellipse(aes(fill = lexset), geom='polygon', alpha = 0.3, level = 0.68) +
  geom_label(data = vowel.avgs, aes(x = F2.avg, y = F1.avg, label = lexset)) +
  scale_x_reverse() +
  scale_y_reverse() +
  theme(legend.position = 'none')

vowels.mon %>%
  ggplot(aes(x = F2, y = F1, colour = lexset)) +
  stat_ellipse(aes(fill = lexset), geom='polygon', alpha = 0.3, level = 0.1) +
  geom_label(data = vowel.avgs, aes(x = F2.avg, y = F1.avg, label = lexset)) +
  scale_x_reverse() +
  scale_y_reverse() +
  theme(legend.position = 'none')
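To get a feel for how much the ellipses shrink, we can approximate the tokens as bivariate normal, which is what stat_ellipse(type = 'norm') assumes. The default type = 't' assumes a multivariate t distribution and gives slightly different sizes. Under that approximation, the ellipse containing a proportion p of the data has a radius of sqrt(qchisq(p, df = 2)) along each principal axis, measured in standard-deviation units. This is a minimal base-R sketch of that approximation, not ggplot2's exact calculation.

```r
# Ellipse size for each level, assuming bivariate normal data.
# With 2 degrees of freedom, qchisq(p, 2) equals -2 * log(1 - p).
lvls <- c(0.95, 0.68, 0.10)
radius <- sqrt(qchisq(lvls, df = 2))  # radius in SD units along each axis
relative_size <- radius / radius[1]   # width relative to the default 95% ellipse

data.frame(level = lvls,
           radius = round(radius, 2),
           relative_size = round(relative_size, 2))
```

Under this approximation, the 10% ellipse is less than a fifth as wide as the 95% one, which is why it collapses to a small patch around each vowel's centre.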

1.5 Section 7

Exercise

Now that we’ve covered all of the key tools in analysing and plotting vowel formant data in R, let’s try a little case study exploring GOOSE-fronting:

plot the distribution of only FLEECE and GOOSE tokens (including averages!) for each speaker to establish the degree of overlap between these categories
plot just the F2 of GOOSE by date of birth or age group to establish if we have evidence of apparent-time change - you might want to try a boxplot for this (hint: it’s geom_boxplot())
make a plot of all GOOSE tokens colour-coded by whether or not the following segment is /l/ - you might want to create a new column for this using case_when(). What do the results suggest?
make a plot of all GOOSE tokens colour-coded by whether the preceding segment is coronal or velar - you should already have made this column in Section 3.2 earlier. Does this preceding segmental environment also have an effect on the realisation of GOOSE?

vowels.avgs <- vowels.mon %>%
  filter(lexset %in% c('FLEECE', 'GOOSE')) %>%
  group_by(lexset, speaker) %>%
  summarise(F1.avg = mean(F1.norm), F2.avg = mean(F2.norm))

vowels.mon %>%
  filter(lexset %in% c('FLEECE', 'GOOSE')) %>%
  ggplot(aes(x = F2.norm, y = F1.norm, colour = lexset)) +
  stat_ellipse(aes(fill = lexset), geom = 'polygon', alpha = 0.5) +
  geom_label(data = vowels.avgs, aes(x = F2.avg, y = F1.avg, label = lexset)) +
  scale_x_reverse(name = "F2 (normalised)") +
  scale_y_reverse(name = "F1 (normalised)") +
  facet_wrap(~speaker) +
  theme(legend.position = 'none')
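To complement the faceted plot, you can summarise each speaker with a single number: the Euclidean distance between their FLEECE and GOOSE means in normalised F1/F2 space. A smaller distance suggests a more fronted GOOSE. This measures only the distance between the means and ignores the spread of each cloud, so read it alongside the ellipses. Below is a minimal sketch on made-up values for two hypothetical speakers, assuming dplyr is loaded.

```r
library(dplyr)

# Made-up normalised formant values for two hypothetical speakers
toy <- tibble(
  speaker = rep(c('spk1', 'spk2'), each = 4),
  lexset  = rep(c('FLEECE', 'FLEECE', 'GOOSE', 'GOOSE'), times = 2),
  F1.norm = c(-1.2, -1.0, -1.1, -0.9,  -1.2, -1.0, -0.8, -0.6),
  F2.norm = c( 1.5,  1.7,  1.2,  1.4,   1.5,  1.7, -0.2,  0.0)
)

dists <- toy %>%
  group_by(speaker, lexset) %>%
  summarise(F1.avg = mean(F1.norm), F2.avg = mean(F2.norm),
            .groups = 'drop_last') %>%
  # still grouped by speaker: one FLEECE row and one GOOSE row each
  summarise(distance = sqrt(diff(F1.avg)^2 + diff(F2.avg)^2),
            .groups = 'drop')

dists
```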

vowels.mon %>%
  filter(lexset == 'GOOSE') %>%
  ggplot(aes(x = age.group, y = F2.norm)) +
  geom_boxplot() +
  scale_x_discrete(name = "Age group") +
  scale_y_continuous(name = "F2 (normalised)")

vowels.mon %>%
  mutate(dob = as.character(dob)) %>%
  filter(lexset == 'GOOSE') %>%
  ggplot(aes(x = dob, y = F2.norm)) +
  geom_boxplot() +
  scale_x_discrete(name = "Date of birth") +
  scale_y_continuous(name = "F2 (normalised)")
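The dob column is converted before plotting because geom_boxplot() draws one box per group. With a numeric x, ggplot2 treats the axis as continuous and pools all tokens into a single box unless you also set group. Using factor(dob) works just as well. as.character() also sorts correctly here because every year has four digits. Below is a minimal sketch on made-up GOOSE tokens (not the workshop data), assuming ggplot2 is loaded. It uses layer_data() to inspect the computed box statistics.

```r
library(ggplot2)

# Made-up GOOSE tokens from speakers born in three years (illustration only)
toy <- data.frame(
  dob     = rep(c(1950, 1975, 1995), each = 4),
  F2.norm = c(-0.5, -0.3, -0.4, -0.6,  0.1, 0.2, 0.0, 0.3,  0.8, 1.0, 0.9, 0.7)
)

# Discrete x: one box per year of birth
p <- ggplot(toy, aes(x = factor(dob), y = F2.norm)) +
  geom_boxplot() +
  scale_x_discrete(name = 'Date of birth') +
  scale_y_continuous(name = 'F2 (normalised)')

# layer_data() returns the computed statistics, one row per box
box_stats <- layer_data(p)
box_stats[, c('group', 'middle')]
```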

vowels.mon %>%
  filter(lexset == 'GOOSE') %>%
  mutate(fol_type = case_when(
    fol_seg == 'L' ~ 'L',
    TRUE ~ 'other')) %>%
  ggplot(aes(x = F2.norm, y = F1.norm, colour = fol_type, fill = fol_type)) +
  geom_point() +
  stat_ellipse(geom = 'polygon', alpha = 0.2) +
  scale_x_reverse(name = "F2 (normalised)") +
  scale_y_reverse(name = "F1 (normalised)")
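The same two-way split could be written with if_else(). The two functions differ only in how missing fol_seg values are handled: case_when() treats an NA condition as FALSE, so those tokens fall through to 'other', while if_else() returns NA. Which behaviour you want depends on whether a missing following segment should count as 'other'. Below is a minimal sketch on a made-up vector, assuming dplyr is loaded.

```r
library(dplyr)

# Made-up following segments, including one missing value
fol_seg <- c('L', 'T', NA, 'L')

cw <- case_when(fol_seg == 'L' ~ 'L', TRUE ~ 'other')  # NA condition falls through to TRUE
ie <- if_else(fol_seg == 'L', 'L', 'other')            # NA condition gives NA

data.frame(fol_seg, cw, ie)
```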

vowels.mon %>%
  filter(lexset == 'GOOSE') %>%
  ggplot(aes(x = F2.norm, y = F1.norm, colour = pre_type, fill = pre_type)) +
  geom_point() +
  stat_ellipse(geom = 'polygon', alpha = 0.2) +
  scale_x_reverse(name = "F2 (normalised)") +
  scale_y_reverse(name = "F1 (normalised)")

2 Part 2: Formant trajectories

2.1 Section 4

Exercise

Try plotting the trajectories for the words wheat and wet in the following way:

use a geom_smooth() to plot a single set of F1 and F2 trajectories for each word (i.e. we don’t want individual trajectories for each repetition of the word)
also include a geom_point() layer so that we can see the actual formant values overlaid on the smoothed trajectory (hint: you might want to decrease the size and opacity of these points otherwise they’ll obscure the smooths)
include two facet terms: speaker and word. So far we’ve only been specifying one facet term, but it’s straightforward to include two - you can use facet_grid(), separating your two faceting variables with ~

What do the results show? Do these vowels show any kind of formant movement during the production of these words, and if so, why might this be the case?

traj.tidy %>%
  filter(word %in% c('wheat', 'wet')) %>%
  ggplot(aes(x = interval, y = value, group = formant)) +
  geom_smooth() +
  geom_point(alpha = 0.5, size = 0.5) +
  facet_grid(speaker~word)
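Setting group = formant is what gives a single pair of trajectories per panel. Every repetition of a word is pooled, and geom_smooth() fits one curve for F1 and one for F2. Below is a minimal sketch on made-up long-format data with two tokens, assuming ggplot2 is loaded. It uses a linear smoother only to keep the toy example free of loess warnings, and counts the smooths with layer_data().

```r
library(ggplot2)

# Made-up long-format trajectories: 2 tokens x 2 formants x 5 time points
toy <- expand.grid(interval = 1:5,
                   formant  = c('F1', 'F2'),
                   token_id = c('t1', 't2'))
toy$value <- ifelse(toy$formant == 'F1', 400, 2000) +
  10 * toy$interval +
  ifelse(toy$token_id == 't2', 50, 0)

# group = formant pools both tokens: one smooth per formant
p_formant <- ggplot(toy, aes(x = interval, y = value, group = formant)) +
  geom_smooth(method = 'lm', formula = y ~ x, se = FALSE)

# Number of separate smooths drawn
length(unique(layer_data(p_formant)$group))
```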

2.2 Section 5

Exercise

Now try plotting the other minimal pairs in this dataset to see if the same pattern emerges:

are there any other such minimal pairs? Look at wait and weight
how does this interact with the voicing of the following segment? Compare wait, weight, wade, and weighed
what’s interesting about weigh?
what about words beginning with [sl]?

traj.tidy %>%
  filter(vowel == 'FACE' & word %in% c('wait', 'weight')) %>%
  ggplot(aes(x = interval, y = value, group = interaction(formant, token_id), colour = token_id)) +
  geom_point(pch = 4) +
  geom_smooth(se=FALSE, span = 2) +
  facet_grid(speaker~word) +
  theme(legend.position = 'none')
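Here, unlike in the previous exercise, each repetition gets its own F1 and F2 curve. interaction() combines formant and token_id into a single grouping factor with one level per formant/token combination. (The span = 2 argument is passed to the default loess smoother, and larger spans give smoother curves.) Below is a minimal base-R sketch of what interaction() produces, using made-up identifiers.

```r
# Made-up identifiers: two formants measured on two tokens
formant  <- c('F1', 'F2', 'F1', 'F2')
token_id <- c('t1', 't1', 't2', 't2')

grp <- interaction(formant, token_id)  # one level per formant/token combination
grp
nlevels(grp)
```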

traj.tidy %>%
  filter(vowel == 'FACE' & word %in% c('wait', 'weight', 'wade', 'weighed')) %>%
  ggplot(aes(x = interval, y = value, group = interaction(formant, token_id), colour = token_id)) +
  geom_point(pch = 4) +
  geom_smooth(se=FALSE, span = 2) +
  facet_grid(speaker~word) +
  theme(legend.position = 'none')

traj.tidy %>%
  filter(vowel == 'FACE' & word == 'weigh') %>%
  ggplot(aes(x = interval, y = value, group = interaction(formant, token_id), colour = token_id)) +
  geom_point(pch = 4) +
  geom_smooth(se=FALSE, span = 2) +
  facet_grid(speaker~.) +
  theme(legend.position = 'none')

traj.tidy %>%
  filter(vowel == 'FACE' & word %in% c('sleigh', 'sleighed', 'slay', 'slayed', 'slate')) %>%
  ggplot(aes(x = interval, y = value, group = interaction(token_id, formant), colour = token_id)) +
  geom_point(pch = 4) +
  geom_smooth(se = FALSE) +
  facet_grid(speaker~word) +
  theme(legend.position = 'none')

Working with sociophonetic data in R

Solutions to exercises

George Bailey
