5 Forced alignment in emuR
5.1 Objective and preliminaries
The objective of this chapter is to show how to go from a directory with .wav files and simple .txt files containing orthographic transcriptions to a phonetically annotated and force-aligned Emu database.
The assumption is that you already have an R project called ipsR and that it contains the directories emu_databases and testsample. If this is not the case, please go back and follow the preliminaries chapter.
Start up R in the project you are using for this course and load the following packages:
library(tidyverse)
library(emuR)
library(wrassp)
In R, store the path to the directory testsample as sourceDir in the following way:
sourceDir <- "./testsample"
And also store in R the path to emu_databases as targetDir:
targetDir <- "./emu_databases"
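Since both paths are relative to the project directory, it can be worth confirming that R actually finds them before going further. The following is a minimal sketch using only base R; the name dirs_found is chosen here for illustration and is not part of the course material.

```r
# Relative paths as defined in this chapter.
sourceDir <- "./testsample"
targetDir <- "./emu_databases"

# dir.exists() returns one TRUE/FALSE value per path.
dirs_found <- dir.exists(c(sourceDir, targetDir))
names(dirs_found) <- c(sourceDir, targetDir)
dirs_found
```

If either value is FALSE, the working directory is probably not the project directory; getwd() shows where R is currently looking.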
5.2 Converting a text collection into an Emu database
The directory ./testsample/german on your computer contains .wav files and .txt files. Define the path to this directory in R and check that you can see these files with the list.files() function:
path.german <- file.path(sourceDir, "german")
list.files(path.german)
[1] "K01BE001.txt" "K01BE001.wav" "K01BE002.txt" "K01BE002.wav"
The above is an example of a text collection because it contains matching .wav and .txt files in the same directory such that, for each .wav file, the .txt file contains the corresponding orthography. We can see that this is true by using the function read_file() to read the contents of these .txt files:
read_file(file.path(path.german, 'K01BE001.txt'))
[1] "heute ist schönes Frühlingswetter"
read_file(file.path(path.german, 'K01BE002.txt'))
[1] "die Sonne lacht"
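For larger collections, reading files one by one is impractical, so the pairing of .wav and .txt files can also be checked programmatically. The sketch below uses only base R and builds a small throwaway directory so that it runs without the course data. The helper find_unmatched_wavs() is hypothetical and not part of emuR.

```r
# Build a throwaway text collection in a fresh temporary directory.
demo_dir <- tempfile("txt_collection_demo_")
dir.create(demo_dir)
# b.txt is left out on purpose, so b.wav has no transcription.
invisible(file.create(file.path(demo_dir, c("a.wav", "a.txt", "b.wav"))))

# Hypothetical helper (not an emuR function): returns the base names of
# .wav files that have no .txt file with the same base name.
find_unmatched_wavs <- function(dir) {
  wav_names <- tools::file_path_sans_ext(list.files(dir, pattern = "\\.wav$"))
  txt_names <- tools::file_path_sans_ext(list.files(dir, pattern = "\\.txt$"))
  setdiff(wav_names, txt_names)
}

unmatched <- find_unmatched_wavs(demo_dir)
unmatched
```

Calling find_unmatched_wavs(path.german) on a well-formed text collection should return an empty character vector.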
The command convert_txtCollection() is used to convert a text collection into an Emu database. Below we make an Emu database called ger2, which we’ll store in targetDir:
convert_txtCollection(dbName = "ger2",
sourceDir = path.german,
targetDir = targetDir,
verbose=FALSE)
Load the database into R with load_emuDB():
ger2_DB <- load_emuDB(file.path(targetDir, "ger2_emuDB"), verbose=FALSE)
summary(ger2_DB)
── Summary of emuDB ────────────────────────────────────────────────────────────
Name: ger2
UUID: 6f52b7d1-7228-45c1-9d9c-58357a66345f
Directory: C:\Users\rasmu\surfdrive\emuintro\emuintro\emu_databases\ger2_emuDB
Session count: 1
Bundle count: 2
Annotation item count: 2
Label count: 4
Link count: 0
── Database configuration ──────────────────────────────────────────────────────
── SSFF track definitions ──
data frame with 0 columns and 0 rows
── Level definitions ──
name type nrOfAttrDefs attrDefNames
bundle ITEM 2 bundle; transcription;
── Link definitions ──
data frame with 0 columns and 0 rows
serve() the database and have a look at it:
serve(ger2_DB, useViewer = FALSE)
If you switch to hierarchy view, you should see that the words in the .txt files are a single item in the attribute tier of bundle with the name transcription, as shown in the figure below:
It is evident when query()ing the database that the words are stored in this way, as shown below. Note that we need to include the argument calcTimes=FALSE here, because the annotation level transcription is of type ITEM and is not linked to a time-based level (i.e. a SEGMENT or an EVENT level).
query(ger2_DB, "transcription =~ .*", calcTimes=FALSE)
5.3 Forced alignment
We are now going to run the Munich Automatic Segmentation (MAUS) pipeline over the database. We do this with the function runBASwebservice_all(), which combines a number of online processing tools. Obligatory arguments are transcriptionAttributeDefinitionName, which is the name of the existing attribute that holds the orthographic transcription (here transcription), and language, which in this case we set to deu-DE. We also set the argument runMINNI to FALSE; MINNI can potentially be used to force-align data which has no annotations at all. Note that runBASwebservice_all() can only be used if you have an active internet connection.
runBASwebservice_all(ger2_DB,
transcriptionAttributeDefinitionName = "transcription",
language = "deu-DE",
runMINNI = FALSE,
verbose = FALSE)
language setting in MAUS
There are quite a lot of languages available in MAUS. As of this writing, you can force-align Afrikaans, Albanian, Arabic, Basque, Catalan, Dutch, English, Estonian, Finnish, French, Georgian, German, Hungarian, Icelandic, Italian, Japanese, Luxembourgish, Maltese, Min Nan, Norwegian, Persian, Polish, Portuguese, Romanian, Russian, Spanish, Swedish, and Thai. There’s also a language independent mode which expects files to be phonetically transcribed in X-SAMPA, and a special mode for Australian aboriginal languages. Additionally, many of these languages have modes for multiple different dialects. More information can be found in the MAUS help files and in the MAUS web interface.
Let’s have a look at the database summary again:
summary(ger2_DB)
── Summary of emuDB ────────────────────────────────────────────────────────────
Name: ger2
UUID: 6f52b7d1-7228-45c1-9d9c-58357a66345f
Directory: C:\Users\rasmu\surfdrive\emuintro\emuintro\emu_databases\ger2_emuDB
Session count: 1
Bundle count: 2
Annotation item count: 58
Label count: 74
Link count: 52
── Database configuration ──────────────────────────────────────────────────────
── SSFF track definitions ──
data frame with 0 columns and 0 rows
── Level definitions ──
name type nrOfAttrDefs attrDefNames
bundle ITEM 2 bundle; transcription;
ORT ITEM 3 ORT; KAN; KAS;
MAU SEGMENT 1 MAU;
MAS ITEM 1 MAS;
── Link definitions ──
type superlevelName sublevelName
ONE_TO_MANY bundle ORT
ONE_TO_MANY ORT MAS
ONE_TO_MANY MAS MAU
Note that multiple extra levels and attributes have been created (ORT, KAN, KAS, MAU, and MAS), as well as links between them.
Let’s serve() the database.
serve(ger2_DB, useViewer=FALSE)
We immediately see that phone-level annotations have been added in the MAU level. Have a look at the hierarchy view and try to identify the levels, links, and attributes.
This shows that the phone-level annotations are linked to syllable-level annotations in the MAS level and word-level orthographic annotations in the ORT level. The ORT level further has the attributes KAN and KAS. These contain canonical representations of the word, i.e. phonetic annotations corresponding to the canonical pronunciations of these words. The MAS level is chunked into syllables.
Given this complex information, more complex queries are now also possible. Let’s say we want to find the word-initial MAU segments of all polysyllabic words:
mau.s <- query(ger2_DB,
"[[MAU =~.* & Start(ORT, MAU)=1] ^ Num(ORT, MAS) > 1]")
mau.s
This data frame can then be passed to requery_hier() so we can see the labels in the ORT level associated with these words, like so:
requery_hier(ger2_DB, mau.s, "ORT")
This may all seem rather opaque, but we’ll go into much more detail with how the querying language works in Chapter 7.
5.4 Forced alignment: Albanian
Next we’ll try out forced alignment for a different language (Albanian) and we will show how forced alignment can be done from a canonical phonemic transcription instead of from text.
5.4.1 From a text collection
First we’ll use a text collection like we saw for German previously. This text collection is in our sourceDir in a folder called albanian:
path.albanian <- file.path(sourceDir, "albanian")
Next we’ll convert the text collection into an Emu database using convert_txtCollection() as above.
convert_txtCollection(dbName = "alb",
sourceDir = path.albanian,
targetDir = targetDir,
verbose=FALSE)
alb_DB <- load_emuDB(file.path(targetDir, "alb_emuDB"), verbose=FALSE)
summary(alb_DB)
── Summary of emuDB ────────────────────────────────────────────────────────────
Name: alb
UUID: b631536e-dda0-433c-a57a-4a6a668fe1e8
Directory: C:\Users\rasmu\surfdrive\emuintro\emuintro\emu_databases\alb_emuDB
Session count: 1
Bundle count: 4
Annotation item count: 4
Label count: 8
Link count: 0
── Database configuration ──────────────────────────────────────────────────────
── SSFF track definitions ──
data frame with 0 columns and 0 rows
── Level definitions ──
name type nrOfAttrDefs attrDefNames
bundle ITEM 2 bundle; transcription;
── Link definitions ──
data frame with 0 columns and 0 rows
Have a look at the database, switch to hierarchy view, and verify that the words have been located at bundle -> transcription as for the German database above.
serve(alb_DB, useViewer = F)
Now run MAUS, just as before. The language code for Albanian is sqi-AL. Note that this will take longer than for German, possibly a couple of minutes.
runBASwebservice_all(alb_DB,
transcriptionAttributeDefinitionName = "transcription",
language = "sqi-AL",
runMINNI = F,
verbose=FALSE)
summary(alb_DB)
── Summary of emuDB ────────────────────────────────────────────────────────────
Name: alb
UUID: b631536e-dda0-433c-a57a-4a6a668fe1e8
Directory: C:\Users\rasmu\surfdrive\emuintro\emuintro\emu_databases\alb_emuDB
Session count: 1
Bundle count: 4
Annotation item count: 138
Label count: 176
Link count: 125
── Database configuration ──────────────────────────────────────────────────────
── SSFF track definitions ──
data frame with 0 columns and 0 rows
── Level definitions ──
name type nrOfAttrDefs attrDefNames
bundle ITEM 2 bundle; transcription;
ORT ITEM 3 ORT; KAN; KAS;
MAU SEGMENT 1 MAU;
MAS ITEM 1 MAS;
── Link definitions ──
type superlevelName sublevelName
ONE_TO_MANY bundle ORT
ONE_TO_MANY ORT MAS
ONE_TO_MANY MAS MAU
Look at the database and verify that the same kind of information has been automatically derived, as for the German database earlier.
serve(alb_DB, useViewer = F)
5.4.2 From a canonical representation
MAUS also allows an automatic segmentation to be derived directly from the canonical level that we saw in the KAN attribute above. This can be useful when the canonical representation provided by MAUS deviates considerably from what was actually said. For one of the words in 0001BF_1syll_1, the canonical representation has J E when what was actually said was closer to n J E.
First switch in hierarchy view from ORT → KAN and then change the node J E of the ORT:KAN level to n J E for file 0001BF_1syll_1 in the manner of Figure 5.1, as we also saw in Chapter 3.
In order to run MAUS on this more appropriate pronunciation, make the change shown in Figure 5.1, and don’t forget to save the annotation after editing. Now MAUS can be run directly on this canonical level using the runBASwebservice_maus() function. Here we again pass the language. We also pass the name of the existing attribute with the canonical representation, KAN, to the argument canoAttributeDefinitionName, and the name of the new force-aligned level to the argument mausAttributeDefinitionName. We call the new level MAU2 to differentiate it from the already created MAU tier.
runBASwebservice_maus(alb_DB,
canoAttributeDefinitionName = "KAN",
mausAttributeDefinitionName = "MAU2",
language = "sqi-AL",
verbose=FALSE)
Inspect the database again. There should now be a new tier MAU2.
summary(alb_DB)
── Summary of emuDB ────────────────────────────────────────────────────────────
Name: alb
UUID: b631536e-dda0-433c-a57a-4a6a668fe1e8
Directory: C:\Users\rasmu\surfdrive\emuintro\emuintro\emu_databases\alb_emuDB
Session count: 1
Bundle count: 4
Annotation item count: 219
Label count: 257
Link count: 197
── Database configuration ──────────────────────────────────────────────────────
── SSFF track definitions ──
data frame with 0 columns and 0 rows
── Level definitions ──
name type nrOfAttrDefs attrDefNames
bundle ITEM 2 bundle; transcription;
ORT ITEM 3 ORT; KAN; KAS;
MAU SEGMENT 1 MAU;
MAS ITEM 1 MAS;
MAU2 SEGMENT 1 MAU2;
── Link definitions ──
type superlevelName sublevelName
ONE_TO_MANY bundle ORT
ONE_TO_MANY ORT MAS
ONE_TO_MANY MAS MAU
ONE_TO_MANY ORT MAU2
If you have reason to suspect that the canonical representation will differ from what is actually said in the recording, you can use the function runBASwebservice_g2pForPronunciation() as the first step. This way, you will not have to run MAUS twice, as runBASwebservice_g2pForPronunciation() only generates the canonical representations without doing forced alignment. If you have a lot of data, this could speed up the process.
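The step-wise route could look roughly like the sketch below. It rests on assumptions not stated in this chapter: that, as the emuR documentation describes, runBASwebservice_g2pForPronunciation() expects a word-tokenized level, which runBASwebservice_g2pForTokenization() creates from the transcription attribute, and that the argument names match your installed emuR version. The wrapper name run_g2p_steps is hypothetical. Like the other BAS web service functions, it needs an internet connection.

```r
# Sketch only: requires the emuR package and an internet connection.
# `db` is assumed to be a handle returned by load_emuDB() for a database
# created with convert_txtCollection(), i.e. with a "transcription" attribute.
run_g2p_steps <- function(db, language) {
  # Step 1 (assumed prerequisite): split the transcription into word
  # tokens on a new ORT level.
  emuR::runBASwebservice_g2pForTokenization(
    db,
    transcriptionAttributeDefinitionName = "transcription",
    language = language,
    orthoAttributeDefinitionName = "ORT",
    verbose = FALSE
  )
  # Step 2: derive canonical pronunciations (KAN) from the ORT tokens,
  # without doing any forced alignment.
  emuR::runBASwebservice_g2pForPronunciation(
    db,
    orthoAttributeDefinitionName = "ORT",
    language = language,
    canoAttributeDefinitionName = "KAN",
    verbose = FALSE
  )
  invisible(db)
}
```

After a call such as run_g2p_steps(alb_DB, "sqi-AL") on a freshly converted database, the KAN labels can be corrected by hand and runBASwebservice_maus() run once, as in Section 5.4.2.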
5.5 Functions introduced in this chapter
read_file(): reads the content of a .txt file into R
convert_txtCollection(): converts pairs of .wav and .txt files into an Emu database
runBASwebservice_all(): performs all the steps needed to get from an Emu database with orthographic transcriptions that aren’t time-aligned to a multi-level force-aligned phonetic annotation
runBASwebservice_maus(): performs forced alignment on an Emu database which already has (non-time-aligned) canonical phonetic annotations
runBASwebservice_g2pForPronunciation(): performs grapheme-to-phoneme (G2P) conversion of an orthographic transcription to a canonical phonetic transcription on an Emu database