Combining Visual Layout and Lexical Cohesion Features for Text Segmentation

Kan, Min-Yen

We propose integrating features from lexical cohesion with elements from layout recognition to build a composite framework. We use supervised machine learning on this composite feature set to derive discourse structure on the topic level. We demonstrate a system based on this principle and use both an intrinsic evaluation as well as the task of genre classification to assess its performance.



More About This Work

Academic Units
Computer Science
Department of Computer Science, Columbia University
Columbia University Computer Science Technical Reports, CUCS-002-01
Published Here
April 22, 2011