Academic Commons

Presentations (Communicative Events)

Domain Word Translation by Space-Frequency Analysis of Context Length Histograms

Fung, Pascale

We report a new statistical feature relating the two words of a bilingual word pair in a non-parallel English-Chinese corpus. We find that the lengths of the context segments of a word are closely correlated with those of its translation, even when the corpus is non-parallel, i.e., consists of monolingual texts that are not translations of each other. The context segment length histogram of a word has a characteristic pattern that corresponds to the histogram of its translation. If a word appears most frequently in long segments, its translation is also most likely to occur in long segments. One way to match these histograms is to first extract their salient shape characteristics by space-frequency analysis and then match them against each other using dynamic time warping. The matching results can be combined with other statistical features to bootstrap a word or term translation algorithm from non-parallel corpora.
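
The histogram matching described in the abstract can be illustrated with a minimal sketch. It is not the author's implementation. It assumes that a "context segment" is a pre-split list of tokens and that its length is its token count; the abstract does not define either. The space-frequency analysis step is omitted, so the sketch applies a textbook dynamic time warping recurrence directly to the raw normalized histograms. The names `segment_length_histogram`, `dtw_distance`, and `max_len` are hypothetical.

```python
from collections import Counter


def segment_length_histogram(segments, word, max_len):
    """Normalized histogram of the lengths of segments that contain `word`.

    Assumption: each segment is a list of tokens and its length is the
    token count. Lengths above `max_len` are clipped into the last bin.
    """
    counts = Counter(min(len(seg), max_len) for seg in segments if word in seg)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * max_len
    return [counts.get(n, 0) / total for n in range(1, max_len + 1)]


def dtw_distance(a, b):
    """Classic dynamic time warping cost between two numeric sequences.

    Uses absolute difference as the local cost and allows the three
    standard steps (insertion, deletion, match).
    """
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Toy data: hypothetical, already segmented and tokenized texts.
english_segments = [["the", "bank", "rate"], ["bank"], ["a", "long", "bank", "note", "here"]]
chinese_segments = [["银行", "利率", "高"], ["银行"], ["这", "是", "银行", "的", "票据"]]

h_en = segment_length_histogram(english_segments, "bank", max_len=6)
h_zh = segment_length_histogram(chinese_segments, "银行", max_len=6)
score = dtw_distance(h_en, h_zh)  # lower means more similar histogram shapes
```

A lower score indicates more similar histogram shapes. Under this reading, candidate translations of a word would be ranked by their score against that word's histogram, and the score would be one feature among several rather than a decision on its own.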

More About This Work

Academic Units
Computer Science
Publisher
ICASSP96: International Conference on Acoustics, Speech, and Signal Processing
Published Here
April 26, 2013