2010 Presentations (Communicative Events)
“Got You!”: Automatic Vandalism Detection in Wikipedia with Web-based Shallow Syntactic-Semantic Modeling
Discriminating vandalism edits from non-vandalism edits in Wikipedia is a challenging task, as ill-intentioned edits can include a variety of content and be expressed in many different forms and styles. Previous studies are limited to rule-based methods and learning based on lexical features, and lack linguistic analysis. In this paper, we propose a novel Web-based shallow syntactic-semantic modeling method, which uses Web search results as a resource and trains topic-specific n-tag and syntactic n-gram language models to detect vandalism. By combining basic task-specific and lexical features, we have achieved high F-measures using logistic boosting and logistic model trees classifiers, surpassing the results reported by major Wikipedia vandalism detection systems.
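To make the idea of an n-tag language model concrete, the following is a minimal sketch and not the authors' implementation. It assumes the topic-related text has already been part-of-speech tagged (the tag sequences below are hypothetical, in Penn Treebank style), uses bigrams over tags, and applies add-one smoothing. The model order, the smoothing scheme, and the function names are all assumptions made for illustration. The resulting score is the kind of value that could serve as one feature for a downstream classifier.

```python
import math
from collections import Counter

def train_tag_bigram_model(tag_sequences):
    """Count tag bigrams over POS-tag sequences (hypothetical training data)."""
    context_counts = Counter()   # how often each tag appears as a left context
    bigram_counts = Counter()    # how often each (previous, current) pair appears
    vocab = set()
    for tags in tag_sequences:
        padded = ["<s>"] + list(tags) + ["</s>"]
        vocab.update(padded)
        for prev, cur in zip(padded, padded[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return context_counts, bigram_counts, vocab

def avg_log_prob(tags, model):
    """Average per-transition log-probability with add-one smoothing (an assumption)."""
    context_counts, bigram_counts, vocab = model
    padded = ["<s>"] + list(tags) + ["</s>"]
    v = len(vocab)
    total = 0.0
    for prev, cur in zip(padded, padded[1:]):
        p = (bigram_counts[(prev, cur)] + 1) / (context_counts[prev] + v)
        total += math.log(p)
    return total / (len(padded) - 1)

# Hypothetical tag sequences standing in for tagged, topic-related Web text.
topic_tags = [
    ["DT", "NN", "VBZ", "JJ"],
    ["DT", "JJ", "NN", "VBZ", "RB"],
    ["NNP", "VBZ", "DT", "NN"],
]
model = train_tag_bigram_model(topic_tags)

regular_edit = ["DT", "NN", "VBZ", "DT", "NN"]   # tag pattern seen in training
unusual_edit = ["UH", "UH", "UH", "UH"]          # tag pattern never seen in training

regular_score = avg_log_prob(regular_edit, model)
unusual_score = avg_log_prob(unusual_edit, model)
```

Under these assumptions, an edit whose tag sequence resembles the topic text receives a higher average log-probability than one with an unfamiliar tag pattern, so the score separates the two hypothetical edits above.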
Files
- C10-1129.pdf application/pdf 669 KB
More About This Work
- Academic Units
- Computer Science
- Published Here
- April 29, 2013