2004 Presentations (Communicative Events)
Bootstrapping Phonetic Lexicons for New Languages
Although phonetic lexicons are critical for many speech applications, building one for a new language can take a significant amount of time and effort. We present a bootstrapping algorithm to build phonetic lexicons for new languages. Our method relies on a large amount of unlabeled text, a small set of "seed words" with their phonetic transcriptions, and a native speaker who can check the generated pronunciations of words. The method proceeds by automatically building Letter-to-Sound (LTS) rules from a small set of the most commonly occurring words in a large corpus of a given language. These LTS rules are retrained as new words are added to the lexicon in an Active Learning step. This procedure is repeated until we have a lexicon that can predict the pronunciation of any word in the target language with the desired accuracy. We tested our approach on three languages: English, German and Nepali.
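The loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the LTS "rules" here are a toy per-letter majority mapping that assumes one phone per letter, the native speaker is simulated by a hypothetical `oracle` function that returns the correct transcription, and the stopping test (accuracy on the most recent batch) is an assumption. All function names, parameters, and the tiny data set are invented for illustration.

```python
from collections import Counter, defaultdict


def train_lts(lexicon):
    """Toy stand-in for LTS rule training: map each letter to its most
    frequent phone. Assumes one phone per letter in every transcription."""
    counts = defaultdict(Counter)
    for word, phones in lexicon.items():
        for letter, phone in zip(word, phones):
            counts[letter][phone] += 1
    return {letter: c.most_common(1)[0][0] for letter, c in counts.items()}


def predict(rules, word):
    """Predict a pronunciation; unseen letters are marked with '?'."""
    return [rules.get(letter, "?") for letter in word]


def bootstrap(corpus_words, seed_lexicon, oracle,
              batch_size=2, target_accuracy=0.9, max_rounds=20):
    """Grow a lexicon from seed words, retraining the rules each round."""
    lexicon = dict(seed_lexicon)
    # Most frequent corpus words are presented to the speaker first.
    by_freq = [w for w, _ in Counter(corpus_words).most_common()]
    for _ in range(max_rounds):
        rules = train_lts(lexicon)
        batch = [w for w in by_freq if w not in lexicon][:batch_size]
        if not batch:
            break
        correct = 0
        for word in batch:
            guess = predict(rules, word)
            truth = oracle(word, guess)  # speaker accepts or corrects the guess
            correct += guess == truth
            lexicon[word] = truth
        # Assumed stopping rule: the latest batch was predicted well enough.
        if correct / len(batch) >= target_accuracy:
            break
    return lexicon, train_lts(lexicon)


# Invented example data; the phone symbols are placeholders.
gold = {"ab": ["A", "B"], "ba": ["B", "A"],
        "abc": ["A", "B", "K"], "cab": ["K", "A", "B"]}
corpus = ["ab", "ab", "ab", "ba", "ba", "abc", "cab"]
lexicon, rules = bootstrap(corpus, {"ab": gold["ab"]},
                           oracle=lambda word, guess: gold[word])
print(predict(rules, "cba"))
```

In this sketch the speaker's effort is spent on the most frequent words first, and each verified word immediately becomes training data for the next round of rules, which is the general shape of the procedure the abstract outlines.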
Files
- maskey_al_04a.pdf (application/pdf, 49.3 KB)
More About This Work
- Academic Units: Computer Science
- Publisher: Interspeech 2004
- Published Here: May 31, 2013