library(tidyverse)
library(emuR)
library(wrassp)
5 Forced alignment in emuR
5.1 Objective and preliminaries
The objective of this chapter is to show how to go from a directory with .wav
files that are orthographically transcribed in simple .txt
files to a phonetically annotated and force-aligned Emu database.
The assumption is that you already have an R project called ipsR
and that it contains the directories emu_databases
and testsample
. If this is not the case, please go back and follow the preliminaries chapter.
Start up R in the project you are using for this tutorial and load the usual packages:
In R, store the path to the directory testsample
as sourceDir
in the following way:
<- "./testsample" sourceDir
And also store in R the path to emu_databases
as targetDir
:
<- "./emu_databases" targetDir
5.2 Converting a text collection into an Emu database
The directory ./testsample/german
on your computer contains .wav
files and .txt
files. Define the path to this directory in R and check that you can see these files with the list.files()
function:
<- file.path(sourceDir, "german")
path.german list.files(path.german)
[1] "K01BE001.txt" "K01BE001.wav" "K01BE002.txt" "K01BE002.wav"
The above is an example of what we’ll call a text collection because it contains matching .wav
and .txt
files in the same directory, and each .wav
file, the .txt
file contains the corresponding orthography. We can see that this is true by using the function read_file()
to read the content of these .txt
files:
read_file(file.path(path.german, 'K01BE001.txt'))
[1] "heute ist schönes Frühlingswetter"
read_file(file.path(path.german, 'K01BE002.txt'))
[1] "die Sonne lacht"
The command convert_txtCollection()
is used to convert a text collection into an Emu database. Below we make an Emu database called ger2
, which we’ll store in targetDir
:
convert_txtCollection(dbName = "ger2",
sourceDir = path.german,
targetDir = targetDir,
verbose=FALSE)
You can now load the database into R with load_emuDB()
:
<- load_emuDB(file.path(targetDir, "ger2_emuDB"), verbose=FALSE)
ger2_DB summary(ger2_DB)
── Summary of emuDB ────────────────────────────────────────────────────────────
Name: ger2
UUID: 5b818e3b-1d82-4471-985f-2ea5e5c8e2e4
Directory: C:\Users\rasmu\surfdrive\emuintro\emuintro\emu_databases\ger2_emuDB
Session count: 1
Bundle count: 2
Annotation item count: 2
Label count: 4
Link count: 0
── Database configuration ──────────────────────────────────────────────────────
── SSFF track definitions ──
dataramme med 0 kolonner og 0 rækker
── Level definitions ──
name type nrOfAttrDefs attrDefNames
bundle ITEM 2 bundle; transcription;
── Link definitions ──
dataramme med 0 kolonner og 0 rækker
Try to serve()
the database and have a look at it:
serve(ger2_DB, useViewer = F)
If you switch to hierarchy view, you should see that the words in the .txt
files are a single item in the attribute
level of bundle
with the name transcription
, as shown in the image below:
If we query()
the database to find all annotations at the transcription
level, it’s clear that the words are all stored in this way. Note that we need to include the argument calcTimes = FALSE
here, because the annotation level transcription
is of type ITEM
and has no time information.
query(ger2_DB, "transcription =~ .*", calcTimes = FALSE)
5.3 Forced alignment
We are now going to run the Munich Automatic Segmentation (MAUS) pipeline over the database. We do this with the function runBASwebservice_all()
, which combines a number of online processing tools. The obligatory arguments of this function are transcriptionAttributeDefinitionName()
which will be the name of the newly created annotation level, and language
, which in this case we set to deu-DE
. We also set the argument runMINNI
to FALSE
; this is potentially used to forced-align data which has no annotations at all by first running the MINNI automatic speech recognition (ASR) routine. Note that runBASwebservice_all()
can only be used if you have an active internet connection.
runBASwebservice_all(ger2_DB,
transcriptionAttributeDefinitionName = "transcription",
language = "deu-DE",
runMINNI = FALSE,
verbose = FALSE)
language
setting in MAUS
There are quite a lot of languages available in MAUS. As of this writing, you can force-align Afrikaans, Albanian, Arabic, Basque, Catalan, Dutch, English, Estonian, Finnish, French, Georgian, German, Hungarian, Icelandic, Italian, Japanese, Luxembourgish, Maltese, Min Nan, Norwegian, Persian, Polish, Portuguese, Romanian, Russian, Spanish, Swedish, and Thai. There’s also a language independent mode which expects files to be phonetically transcribed in X-SAMPA, and a special mode for Australian aboriginal languages. Additionally, many of these languages have modes for multiple different dialects. More information can be found in the MAUS help files and in the MAUS web interface.
Let’s have a look at the database summary again:
summary(ger2_DB)
── Summary of emuDB ────────────────────────────────────────────────────────────
Name: ger2
UUID: 5b818e3b-1d82-4471-985f-2ea5e5c8e2e4
Directory: C:\Users\rasmu\surfdrive\emuintro\emuintro\emu_databases\ger2_emuDB
Session count: 1
Bundle count: 2
Annotation item count: 58
Label count: 74
Link count: 52
── Database configuration ──────────────────────────────────────────────────────
── SSFF track definitions ──
dataramme med 0 kolonner og 0 rækker
── Level definitions ──
name type nrOfAttrDefs attrDefNames
bundle ITEM 2 bundle; transcription;
ORT ITEM 3 ORT; KAN; KAS;
MAU SEGMENT 1 MAU;
MAS ITEM 1 MAS;
── Link definitions ──
type superlevelName sublevelName
ONE_TO_MANY bundle ORT
ONE_TO_MANY ORT MAS
ONE_TO_MANY MAS MAU
Note that multiple extra annotation levels and attributes have been created (ORT
, KAN
, KAS
, MAU
, and MAS
), as well as links between them.
Let’s serve()
the database.
serve(ger2_DB, useViewer = FALSE)
We immediately see that phone-level annotations have been added in the MAU
level. Have a look at the hierarchy view and try to identify the levels, links, and attributes.
This shows that the phone-level annotations are linked to syllable-level annotations in the MAS
level and word-level orthographic annotations in the ORT
level. The ORT
level further has the attributes KAN
and KAS
. These contain canonical representations of the word, i.e. phonetic annotations corresponding to the canonical pronunciations of these words. The MAS
level is chunked into syllables.
Given this complex information, more complex queries are now also possible. We can find the word-initial MAU
segments of all polysyllabic words like this:
<- query(ger2_DB,
mau.s "[[MAU =~.* & Start(ORT, MAU)=1] ^ Num(ORT, MAS) > 1]")
#deselect the db_uuid column for readability
%>% select(-db_uuid) mau.s
This data frame can then be passed to requery_hier()
so we can see the labels in the ORT
level associated with these words, like so:
requery_hier(ger2_DB, mau.s, "ORT") %>% select(-db_uuid)
This may all seem rather opaque, but we’ll go into much more detail with how the querying language works in Chapter 7.
5.4 Forced alignment: Albanian
Next we’ll try out forced alignment for a different language (Albanian) and we will show how forced alignment can be done from a canonical phonemic transcription instead of from text.
5.4.1 From a text collection
First we’ll use a text collection like we saw for German previously. This text collection is in our sourceDir
in a folder called albanian
:
<- file.path(sourceDir, "albanian") path.albanian
First we’ll convert the text collection into an Emu database using convert_txtCollection()
as above.
convert_txtCollection(dbName = "alb",
sourceDir = path.albanian,
targetDir = targetDir,
verbose=FALSE)
<- load_emuDB(file.path(targetDir, "alb_emuDB"), verbose = FALSE)
alb_DB summary(alb_DB)
── Summary of emuDB ────────────────────────────────────────────────────────────
Name: alb
UUID: 38874968-8ad6-48cf-8992-5fa6098e30d7
Directory: C:\Users\rasmu\surfdrive\emuintro\emuintro\emu_databases\alb_emuDB
Session count: 1
Bundle count: 4
Annotation item count: 4
Label count: 8
Link count: 0
── Database configuration ──────────────────────────────────────────────────────
── SSFF track definitions ──
dataramme med 0 kolonner og 0 rækker
── Level definitions ──
name type nrOfAttrDefs attrDefNames
bundle ITEM 2 bundle; transcription;
── Link definitions ──
dataramme med 0 kolonner og 0 rækker
Have a look at the database, switch to hierarchy view, and verify that the words have been located at bundle -> transcription
as for the German database above.
serve(alb_DB, useViewer = F)
Now run MAUS, just as before. The language code for Albanian is sqi-AL
. Note that this will take longer than for German, possibly a couple of minutes.
runBASwebservice_all(alb_DB,
transcriptionAttributeDefinitionName = "transcription",
language = "sqi-AL",
runMINNI = FALSE,
verbose = FALSE)
summary(alb_DB)
── Summary of emuDB ────────────────────────────────────────────────────────────
Name: alb
UUID: 38874968-8ad6-48cf-8992-5fa6098e30d7
Directory: C:\Users\rasmu\surfdrive\emuintro\emuintro\emu_databases\alb_emuDB
Session count: 1
Bundle count: 4
Annotation item count: 138
Label count: 176
Link count: 125
── Database configuration ──────────────────────────────────────────────────────
── SSFF track definitions ──
dataramme med 0 kolonner og 0 rækker
── Level definitions ──
name type nrOfAttrDefs attrDefNames
bundle ITEM 2 bundle; transcription;
ORT ITEM 3 ORT; KAN; KAS;
MAU SEGMENT 1 MAU;
MAS ITEM 1 MAS;
── Link definitions ──
type superlevelName sublevelName
ONE_TO_MANY bundle ORT
ONE_TO_MANY ORT MAS
ONE_TO_MANY MAS MAU
Look at the database and verify that the same kind of information has been automatically derived, as for the German database earlier.
serve(alb_DB, useViewer = FALSE)
5.4.2 From a canonical representation
MAUS also allows to segment sound files based on the canonical level that we saw in the KAN
attribute above. This can be useful when the canonical representation provided by MAUS deviates considerably from what was actually said. For one of the words in 0001BF_1syll_1
, the canonical representation is J E
but what was actually said is closer to n J E
.
To see how this works, switch the hierarchy view in the webApp
from ORT
→ KAN
and then change the node J E
of the ORT:KAN
level to n J E
for file 0001BF_1syll_1
, as in ?fig-albhier. Remember to save the annotation after editing.
We can now run MAUSdirectly on this canonical level using the runBASwebservice_maus()
function. Here we again pass the language, and we pass the name of the existing annotation level with a canonical representation KAN
to the argument canoAttributeDefinitionName
. We also specify the newly created force-aligned level mausAttributeDefinitionName
, which we call MAU2
in order to differentiate it MAU
level that we already created.
runBASwebservice_maus(alb_DB,
canoAttributeDefinitionName = "KAN",
mausAttributeDefinitionName = "MAU2",
language = "sqi-AL",
verbose = FALSE)
Inspect the database again. There should now be a new annotation level called MAU2
.
summary(alb_DB)
── Summary of emuDB ────────────────────────────────────────────────────────────
Name: alb
UUID: 38874968-8ad6-48cf-8992-5fa6098e30d7
Directory: C:\Users\rasmu\surfdrive\emuintro\emuintro\emu_databases\alb_emuDB
Session count: 1
Bundle count: 4
Annotation item count: 219
Label count: 257
Link count: 197
── Database configuration ──────────────────────────────────────────────────────
── SSFF track definitions ──
dataramme med 0 kolonner og 0 rækker
── Level definitions ──
name type nrOfAttrDefs attrDefNames
bundle ITEM 2 bundle; transcription;
ORT ITEM 3 ORT; KAN; KAS;
MAU SEGMENT 1 MAU;
MAS ITEM 1 MAS;
MAU2 SEGMENT 1 MAU2;
── Link definitions ──
type superlevelName sublevelName
ONE_TO_MANY bundle ORT
ONE_TO_MANY ORT MAS
ONE_TO_MANY MAS MAU
ONE_TO_MANY ORT MAU2
If you have reason to suspect that the canonical representation will differ from what is actually said in the recording, you can use the function runBASwebservice_g2pForPronunciation()
as the first step. This will only generate the canonical reprsentations without doing forced alignment, and this can then be corrected before running the MAUS forced alignment routine. If you have much more data, this may speed up your process.
5.5 Functions introduced in this chapter
read_file()
: reads the content of a.txt
file into Rconvert_txtCollection()
: converts pairs of.wav
and.txt
files into an Emu database.run_BASwebservice_all()
: performs all the steps needed to get from an Emu database with orthographical transcription that aren’t time-aligned to a multi-level force-aligned phonetic annotationrun_BASwebservice_maus()
: performs forced alignment on an Emu database which already has (non-time-aligned) canonical phonetic annotationsrun_BASwebservice_g2pForPronunciation()
: performs grapheme-to-phoneme (G2P) conversion of an orthographical transcription to a canonical phonetic transcription on an Emu database