Academic Commons

Newsblaster Russian-English Clustering Performance Analysis

Leftin, Lawrence J.

The Natural Language Group is developing a multi-language version of Columbia Newsblaster, a program that generates summaries of news articles collected from web sites. Newsblaster currently processes articles in Arabic, Japanese, Portuguese, Spanish, and Russian, as well as English. This report outlines the Russian-language processing software, focusing on machine translation and document clustering. An analysis of the Russian-English clustering results indicates encouraging performance both across languages (inter-language) and within a single language (intra-language).

More About This Work

Academic Units
Computer Science
Series
Columbia University Computer Science Technical Reports, 41
Published Here
August 26, 2009