Title | Corpus Cleanup of Mistaken Agreement Using Word Sense Disambiguation |
---|---|
Authors | Yu, Liang-chih; Wu, Chung-hsien; Yeh, Jui-feng; Hovy, Eduard |
Journal | International Journal of Computational Linguistics & Chinese Language Processing |
Volume/Issue | 13:4, 2008.12 [ROC 97.12] |
Pages | pp. 405-419 |
Classification No. | 312.13 |
Keywords | Corpus cleanup; Word sense disambiguation; Semantic analysis; Entropy |
Language | English |
Abstract | Word sense annotated corpora are useful resources for many text mining applications. Such corpora are only useful if their annotations are consistent. Most large-scale annotation efforts take special measures to reconcile inter-annotator disagreement. To date, however, nobody has investigated how to automatically determine exemplars in which the annotators agree but are wrong. In this paper, we use OntoNotes, a large-scale corpus of semantic annotations, including word senses, predicate-argument structure, ontology linking, and coreference. To determine the mistaken agreements in word sense annotation, we employ word sense disambiguation (WSD) to select a set of suspicious candidates for human evaluation. Experiments are conducted from three aspects (precision, cost-effectiveness ratio, and entropy) to examine the performance of WSD. The experimental results show that WSD is most effective in identifying erroneous annotations for highly-ambiguous words, while a baseline is better for other cases. The two methods can be combined to improve the cleanup process. This procedure allows us to find approximately 2% of the remaining erroneous agreements in the OntoNotes corpus. A similar procedure can be easily defined to check other annotated corpora. |
The abstract shown in this system is taken primarily from the abstract of the journal article itself.