Academic Commons

Presentations (Communicative Events)

Bootstrapping Phonetic Lexicons for New Languages

Maskey, Sameer R.; Black, Alan W.; Tomokiyo, Laura M.

Although phonetic lexicons are critical for many speech applications, building one for a new language can take significant time and effort. We present a bootstrapping algorithm for building phonetic lexicons for new languages. Our method relies on a large amount of unlabeled text, a small set of 'seed words' with their phonetic transcriptions, and a native speaker proficient enough to verify the generated pronunciations. The method first automatically builds Letter-to-Sound (LTS) rules from a small set of the most frequently occurring words in a large corpus of the given language. These LTS rules are retrained as new words are added to the lexicon in an Active Learning step. This procedure is repeated until we have a lexicon that can predict the pronunciation of any word in the target language with the desired accuracy. We tested our approach on three languages: English, German, and Nepali.
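The bootstrapping loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not say how the LTS rules are trained, so a toy letter-in-context count table stands in for them. Pronunciations are assumed to be pre-aligned one phone per letter. The `verify` callback simulates the native speaker, and the stopping rule (the share of proposed pronunciations accepted unchanged in a round) is an assumption. The names `train_lts`, `predict`, and `bootstrap` and the toy word list are all hypothetical.

```python
from collections import Counter, defaultdict


def train_lts(lexicon, window=1):
    """Toy letter-to-sound (LTS) model: phone counts per letter-in-context.

    Assumption: every pronunciation is already aligned one phone per letter.
    A real system would align letters to phones first and would likely use
    a stronger learner than this count table.
    """
    table = defaultdict(Counter)
    for word, phones in lexicon.items():
        padded = "#" * window + word + "#" * window
        for i, phone in enumerate(phones):
            table[padded[i:i + 2 * window + 1]][phone] += 1  # letter with context
            table[word[i]][phone] += 1                       # back-off: letter alone
    return table


def predict(table, word, window=1):
    """Predict one phone per letter, backing off from context to the bare letter."""
    padded = "#" * window + word + "#" * window
    phones = []
    for i, letter in enumerate(word):
        counts = table.get(padded[i:i + 2 * window + 1]) or table.get(letter)
        # Unseen letter: use the letter itself as a placeholder phone.
        phones.append(counts.most_common(1)[0][0] if counts else letter)
    return phones


def bootstrap(corpus_words, seed, verify, batch=2, target=0.9, window=1):
    """Grow a lexicon from the most frequent words, retraining LTS rules each round.

    `verify(word, guess)` stands in for the native speaker and returns the
    correct pronunciation. Round accuracy (an assumed stopping criterion) is
    the share of guesses accepted unchanged; stop at `target` or when no
    unlabeled words remain.
    """
    ranked = [w for w, _ in Counter(corpus_words).most_common()]
    lexicon = dict(seed)
    history = []
    while True:
        model = train_lts(lexicon, window)
        todo = [w for w in ranked if w not in lexicon][:batch]
        if not todo:
            break
        accepted = 0
        for word in todo:
            guess = predict(model, word, window)
            lexicon[word] = verify(word, guess)  # speaker accepts or corrects
            accepted += lexicon[word] == guess
        history.append(accepted / len(todo))
        if history[-1] >= target:
            break
    return lexicon, train_lts(lexicon, window), history


# Usage with a hypothetical toy corpus and "gold" pronunciations.
gold = {
    "cat": ["k", "ae", "t"],
    "bat": ["b", "ae", "t"],
    "city": ["s", "ih", "t", "iy"],
    "tab": ["t", "ae", "b"],
}
corpus = "cat cat cat bat bat city city tab".split()
lexicon, model, history = bootstrap(
    corpus, {"cat": gold["cat"]}, lambda w, guess: gold[w], batch=2, target=1.0
)
print(history)                # [0.5, 1.0]
print(predict(model, "cab"))  # ['k', 'ae', 'b']
```

In the first round the model, trained only on "cat", mispronounces "city" (it maps "c" to "k"), so the simulated speaker corrects it and that round scores 0.5. After retraining, every proposal in the next round is accepted, which meets the target. Processing words in frequency order mirrors the abstract's choice of seeding from the most commonly occurring words.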


More About This Work

Academic Units
Computer Science
Publisher
Interspeech 2004
Published Here
May 31, 2013