2003 Reports
A Platform for Multilingual News Summarization
We have developed a multilingual version of Columbia Newsblaster as a testbed for multilingual multi-document summarization. The system collects, clusters, and summarizes news documents from sources all over the world daily. It crawls news sites in many different countries, written in different languages, extracts the news text from the HTML pages, uses a variety of methods to translate the documents for clustering and summarization, and produces an English summary for each cluster. The system is robust, running daily over real-world data. The multilingual version of Columbia Newsblaster provides a platform for testing different strategies for multilingual document clustering, and approaches for multilingual multi-document summarization.
Subjects
Files
-
cucs-014-03.pdf application/pdf 585 KB Download File
More About This Work
- Academic Units
- Computer Science
- Publisher
- Department of Computer Science, Columbia University
- Series
- Columbia University Computer Science Technical Reports, CUCS-014-03
- Published Here
- April 26, 2011