Acoustics Assignment

A collection of words and digit strings was spoken by CJ and recorded on a PC equipped with a sound card. The words consisted of five repetitions of each monophthongal vowel in an hVd context, to permit a description of the speaker's vowel space. The digit strings were chosen for a future exercise.

The speech data were presented and captured using Steve Cassidy's Emu Capture program. This program presented the words and digit strings one at a time and recorded each utterance into a separate .WAV file. The word list was a text file containing one utterance on each line.

The twelve vowels O, @:, I, A, a:, o:, U, i:, u:, E, V, and @ were recorded five times, in a different random order each time. With two repetitions of 20 digit strings, this produced 100 .WAV files (60 vowel tokens and 40 digit strings), which are summarised in Table 1. These files can be downloaded from this self-extracting archive.
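The randomised prompt order described above can be sketched in R as follows. This is only an illustration, not the procedure actually used: the vowel labels stand in for the hVd words (which are not listed here), and the seed and file name are hypothetical.

```r
# Vowel labels stand in for the hVd prompt words, which are not listed in the text.
vowels <- c("O", "@:", "I", "A", "a:", "o:", "U", "i:", "u:", "E", "V", "@")
n.reps <- 5

set.seed(1)  # hypothetical seed, used only to make the sketch reproducible

# One independent shuffle of the twelve vowels for each repetition block.
prompts <- unlist(lapply(seq_len(n.reps), function(i) sample(vowels)))

# Write one prompt per line, the layout described for the word list.
list.file <- tempfile(fileext = ".txt")  # hypothetical file name
writeLines(prompts, list.file)

# 60 vowel tokens plus two repetitions of 20 digit strings.
n.files <- length(prompts) + 2 * 20
```

Shuffling within each block, rather than shuffling all 60 prompts together, keeps every vowel present exactly once per repetition.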

An Emu template was created to allow the speech data to be accessed as a database by the Emu labeller. The Emu labeller was used to annotate each recording. For the hVd words, the boundaries of the phonemes were marked, along with the location of the vowel targets. In the hierarchy view, the phoneme and target levels were linked back to a word and speaker level. For the digit strings, only the word boundaries were marked. The Emu annotation files are also stored in the self-extracting archive.

For the hVd words, formants F1 and F2 were measured at the vowel target using the Emu labeller's frequency readout, with the cursor positioned at the mid-points of the F1 and F2 bands. These values were entered into Table 1, along with the vowel labels.

The F1 and F2 values, and the vowel labels, were then copied from the table and loaded into vectors in the R environment. These vectors were used to produce ellipse plots of the vowel space by the following R commands:-

> F1 <- c(600,500,400,840,700,460,440,340,340,600,640,480,
> F2 <- c(1000,1420,2240,1580,1100, 800,1100,2680,1300,2020,1400,1540,
           860,2620,1360,1540,1400,1020,1460,2060,1280,1160,2220, 960,
          1460,1160,2580,2220,1060,2020,1000, 920,1520,1320,1520,1160,
          1140,1440,1420,2540, 840,1980,2200,1140,1000,1320,1180,1600,
          1160,1620,1000,1000,2000,2460,2180,1520,1420, 880,1180,1320)
> formants <- cbind(F1,F2)
> labels <-c("O" ,"@:","I" ,"A" ,"a:","o:","U","i:","u:","E" ,"V" ,"@" ,
             "o:","i:","@:","A" ,"V" ,"U" ,"@","E" ,"u:","a:","I" ,"O" ,
             "@:","a:","i:","I" ,"U" ,"E" ,"O","o:","A" ,"V" ,"@" ,"u:",
             "a:","@" ,"@:","i:","o:","E" ,"I","U" ,"O" ,"V" ,"u:","A" ,
             "u:","A" ,"U" ,"O" ,"E" ,"i:","I","@" ,"@:","o:","a:","V")
> library(emu)
> eplot(formants, labels, formant=T, dopoints=T, col=F)
> title("Speaker CJ Vowel Data Points")
> eplot(formants, labels, formant=T, centroid=T, col=F)
> title("Speaker CJ Vowel Centroids")
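The axis orientation that eplot produces with formant=T (F2 decreasing to the right, F1 decreasing upwards) can be approximated with base graphics alone. The sketch below is an assumption-laden illustration rather than a replacement for eplot: it plots only the first repetition of each vowel, using the first twelve values of the vectors above, and draws labels instead of ellipses.

```r
# First repetition only (one token per vowel), copied from the vectors above.
F1 <- c(600, 500, 400, 840, 700, 460, 440, 340, 340, 600, 640, 480)
F2 <- c(1000, 1420, 2240, 1580, 1100, 800, 1100, 2680, 1300, 2020, 1400, 1540)
labels <- c("O", "@:", "I", "A", "a:", "o:", "U", "i:", "u:", "E", "V", "@")

# Reverse both axes so that high front vowels appear at the top left,
# matching the usual vowel-quadrilateral orientation.
plot(F2, F1, type = "n",
     xlim = rev(range(F2)), ylim = rev(range(F1)),
     xlab = "F2 (Hz)", ylab = "F1 (Hz)")
text(F2, F1, labels)
title("Speaker CJ, first repetition")
```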

The data points for each vowel are quite tightly grouped, indicating a reasonable degree of consistency between different repetitions.
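One way to put a number on this consistency is to compute the per-vowel mean and standard deviation of a formant. The sketch below uses base R only and, to keep the example short, summarises F2 alone; the F2 and labels vectors are copied from the commands above.

```r
F2 <- c(1000, 1420, 2240, 1580, 1100,  800, 1100, 2680, 1300, 2020, 1400, 1540,
         860, 2620, 1360, 1540, 1400, 1020, 1460, 2060, 1280, 1160, 2220,  960,
        1460, 1160, 2580, 2220, 1060, 2020, 1000,  920, 1520, 1320, 1520, 1160,
        1140, 1440, 1420, 2540,  840, 1980, 2200, 1140, 1000, 1320, 1180, 1600,
        1160, 1620, 1000, 1000, 2000, 2460, 2180, 1520, 1420,  880, 1180, 1320)
labels <- c("O",  "@:", "I",  "A", "a:", "o:", "U", "i:", "u:", "E",  "V",  "@",
            "o:", "i:", "@:", "A", "V",  "U",  "@", "E",  "u:", "a:", "I",  "O",
            "@:", "a:", "i:", "I", "U",  "E",  "O", "o:", "A",  "V",  "@",  "u:",
            "a:", "@",  "@:", "i:", "o:", "E", "I", "U",  "O",  "V",  "u:", "A",
            "u:", "A",  "U",  "O", "E",  "i:", "I", "@",  "@:", "o:", "a:", "V")

# Group the F2 values by vowel label and summarise each group.
f2.n    <- tapply(F2, labels, length)  # tokens per vowel (should be 5)
f2.mean <- tapply(F2, labels, mean)    # centroid position along F2
f2.sd   <- tapply(F2, labels, sd)      # spread across repetitions

print(round(cbind(n = f2.n, mean = f2.mean, sd = f2.sd), 1))
```

A small standard deviation relative to the distance between neighbouring vowel means would support the impression given by the ellipse plots.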

For comparison, two sets of vowel data, for Australian English and Southern British English male speakers, were loaded from the english database on the Harrington and Cassidy CDROM. Ellipse plots were first produced with the frequency scales aligned:-

> # sbe.fm and aus.fm are assumed variable names for the formant tracks
> sbe.segs <- emu.query("english", "sbe:m:w*", "Phonetic=vowel")
> sbe.fm   <- emu.track(sbe.segs,  "fm", cut=0.5)
> aus.segs <- emu.query("english", "aus:m:*",  "Phonetic=vowel")
> aus.fm   <- emu.track(aus.segs,  "fm", cut=0.5)
> par(mfrow=c(1,3))
> eplot(sbe.fm[,1:2], label(sbe.segs), centroid=T, formant=T, col=F, xlim=c(-2700,-600), ylim=c(-900, -200))
> title("sbe")
> eplot(formants,     labels,          centroid=T, formant=T, col=F, xlim=c(-2700,-600), ylim=c(-900, -200))
> title("cj")
> eplot(aus.fm[,1:2], label(aus.segs), centroid=T, formant=T, col=F, xlim=c(-2700,-600), ylim=c(-900, -200))
> title("aus")

Comparison of these plots was hindered to some extent by the pitch differences in the speakers' voices. In the next ellipse plots, the frequency scales were adjusted to align the front/back and high/low vowels of each data set more closely:-

> # sbe.fm and aus.fm (assumed names) hold the emu.track() results for sbe.segs and aus.segs
> eplot(sbe.fm[,1:2], label(sbe.segs), centroid=T, formant=T, col=F)
> title("sbe")
> eplot(formants,     labels,          centroid=T, formant=T, col=F, xlim=c(-2700,-700), ylim=c(-900, -350))
> title("cj")
> eplot(aus.fm[,1:2], label(aus.segs), centroid=T, formant=T, col=F, xlim=c(-2300,-600), ylim=c(-700, -300))
> title("aus")
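The xlim and ylim values in these commands are negative because they are given on negated frequency axes. As a rough sketch of how such limits might be derived from the data rather than chosen by eye, the hypothetical helper below (neg.limits, not part of the emu library) pads a formant's range and negates it. It is shown with CJ's first-repetition values only, so the results differ slightly from the limits used above.

```r
# Hypothetical helper: padded, negated axis limits for one formant (Hz).
# The larger frequency comes first because the axis is negated.
neg.limits <- function(x, pad = 50) {
  c(-(max(x) + pad), -(min(x) - pad))
}

# CJ's first repetition, copied from the F1 and F2 vectors listed earlier.
F1.rep1 <- c(600, 500, 400, 840, 700, 460, 440, 340, 340, 600, 640, 480)
F2.rep1 <- c(1000, 1420, 2240, 1580, 1100, 800, 1100, 2680, 1300, 2020, 1400, 1540)

xlim.cj <- neg.limits(F2.rep1)  # limits for the F2 (horizontal) axis
ylim.cj <- neg.limits(F1.rep1)  # limits for the F1 (vertical) axis
```

Computing the limits separately for each speaker scales every vowel space to fill its own panel, which is the effect the hand-picked limits aim for.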

It is evident that CJ's vowel space corresponds much more closely to the Southern British than to the Australian vowel space. (CJ was born near Manchester, England, but has lived in Australia for 28 years.) The relative positions of i:, E, A, a:, o:, and @ are quite similar in the SBE and CJ data. For CJ, u: and U are further back, and O is lower. This probably reflects CJ's Northern British accent. In comparison with SBE and CJ, the AUS data shows marked differences, particularly in the positions of the u: and a: vowels.