Newsblaster Russian-English Clustering Performance Analysis

Leftin, Lawrence J.

The Natural Language Group is developing a multi-language version of Columbia Newsblaster, a program that generates summaries of news articles collected from web sites. Newsblaster currently processes articles in Arabic, Japanese, Portuguese, Spanish, and Russian, as well as English. This report outlines the Russian language processing software, focusing on machine translation and document clustering. Analysis of the Russian-English clustering results indicates encouraging inter-language and intra-language performance.




More About This Work

Academic Units
Computer Science
Columbia University Computer Science Technical Reports, 41
Published Here
August 26, 2009