2001 Reports
Combining Visual Layout and Lexical Cohesion Features for Text Segmentation
We propose integrating lexical cohesion features with layout recognition features to build a composite framework. We apply supervised machine learning to this composite feature set to derive discourse structure at the topic level. We demonstrate a system based on this principle and assess its performance with both an intrinsic evaluation and the task of genre classification.
Files
- cucs-002-01.pdf application/pdf 187 KB
More About This Work
- Academic Units: Computer Science
- Publisher: Department of Computer Science, Columbia University
- Series: Columbia University Computer Science Technical Reports, CUCS-002-01
- Published Here: April 22, 2011