Academic Commons

Presentations (Communicative Events)

The automatic induction of concatenative units from machine readable dictionaries and corpora for speech synthesis

Tzoukermann, Evelyne; Klavans, Judith L.

The purpose of this research is to determine the best method for deciding on an optimal set of concatenative units for concatenative speech synthesis. Of the two main approaches to speech synthesis, segmental synthesis and rule-based synthesis, the former relies heavily on the successful choice of concatenative units. Segmental synthesis consists of concatenating segmental units (diphones, triphones, etc.); rule-based synthesis consists of computing control parameters from pre-established rules. Deciding on the set of diphones is quite straightforward: it suffices to take the phoneme inventory of a language and combine each phoneme with every phoneme in that inventory. For example, taking the approximately 35 French phonemes, 1225 phonemic pairs (35 x 35) constitute the complete and exhaustive starting diphone inventory. On the other hand, deciding on the set of triphones, quadriphones and larger units raises difficult questions about the nature of phonemes in a given language, such as: (1) stability vs. instability in a coarticulatory environment, (2) size of the overall inventory, and (3) frequency of each unit in the language, in combination with factors (1) and (2).
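The diphone arithmetic above can be checked with a minimal sketch. The function name `diphone_inventory` and the placeholder symbols `p0` to `p34` are hypothetical, introduced only for illustration; they are not the actual French phoneme set and are not taken from the work itself.

```python
from itertools import product

def diphone_inventory(phonemes):
    """Return every ordered pair of phonemes, including a phoneme paired with itself."""
    return list(product(phonemes, repeat=2))

# Hypothetical placeholder symbols standing in for roughly 35 French phonemes.
phonemes = [f"p{i}" for i in range(35)]
inventory = diphone_inventory(phonemes)
print(len(inventory))  # 35 * 35 ordered pairs
```

The same construction with `repeat=3` would enumerate all candidate triphones (35 x 35 x 35 = 42875 for this inventory size), which illustrates why larger units call for selection criteria such as frequency rather than exhaustive listing.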
We report on experiments with four different databases and compare the resources by their n-gram frequency output. The first two databases consist of pronunciation field information from two dictionaries: the Encyclopedic Robert French dictionary, with 85,000 headwords, and the smaller Collins Gem, containing 15,000 words. For comparison, we use two text corpora, the Hansard (about 2.5 million words) and the smaller Tubach and Boe corpus (80,000 words); both corpora were processed by a set of grapheme-to-phoneme rules. A frequency extraction program was applied to all four resources to extract trigram phonemic frequencies; this serves as a basis for comparison between dictionary-derived and corpus-derived frequencies.
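The abstract does not describe how the frequency extraction program works, so the following is only a sketch of one plausible approach, under stated assumptions: each entry is already a list of phoneme symbols, trigrams are counted within an entry without boundary padding, and resources are compared by relative frequency. The function names and the toy transcriptions are hypothetical.

```python
from collections import Counter

def trigram_counts(pronunciations):
    """Count phoneme trigrams within each entry (assumption: no boundary padding)."""
    counts = Counter()
    for phones in pronunciations:
        for i in range(len(phones) - 2):
            counts[tuple(phones[i:i + 3])] += 1
    return counts

def relative_frequencies(counts):
    """Normalize raw counts so resources of different sizes can be compared."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {trigram: n / total for trigram, n in counts.items()}

# Toy data with hypothetical transcriptions, not drawn from the actual resources.
dictionary_entries = [["p", "a", "R", "l", "e"], ["p", "a", "R", "i"]]
corpus_tokens = [["p", "a", "R", "l", "e"], ["p", "a", "R", "l", "e"], ["l", "e"]]

dict_freq = relative_frequencies(trigram_counts(dictionary_entries))
corpus_freq = relative_frequencies(trigram_counts(corpus_tokens))

# List each trigram with its relative frequency in both resources.
for trigram in sorted(set(dict_freq) | set(corpus_freq)):
    print(trigram, dict_freq.get(trigram, 0.0), corpus_freq.get(trigram, 0.0))
```

In a dictionary each headword is counted once, whereas in a corpus a word contributes once per occurrence, so the same trigram can rank quite differently in the two kinds of resource.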



More About This Work

Academic Units
Computer Science
Proceedings of the Acoustical Society of America (ASA)
Published Here
April 24, 2013