Mixed Distance Measures for Optimizing Concatenative Vocabularies for Speech Synthesis: A Thesis Proposal

Polish, Nathaniel

Synthesized speech from text-to-speech systems is generally produced by concatenating small units of speech. The concatenation process can be complex, involving smoothing and context-dependent adjustments to the speech. The overall quality of the speech produced depends in large part on the quality of the elements used for concatenation. Selection and evaluation of these elements have been done entirely by hand. The proposed work addresses the process by which these concatenative elements are created from a natural voice and optimized. The optimization uses distance measures that exploit detailed information on the structure of the speech signals.



Computer Science
Department of Computer Science, Columbia University
Columbia University Computer Science Technical Reports, CUCS-310-87
December 7, 2011