Title | Reliable and Cost-Effective Pos-Tagging |
---|---|
Authors | Tsai, Yu-fang; Chen, Keh-jiann |
Journal | International Journal of Computational Linguistics & Chinese Language Processing |
Volume/Issue | 9:1, 2004.02 [ROC year 93.02] |
Pages | 83-95 |
Classification no. | 312.13 |
Keywords | Part-of-speech tagging; Corpus; Reliability; Ambiguous resolution |
Language | English |
English abstract | To achieve fast, high-quality part-of-speech (PoS) tagging, algorithms should achieve high accuracy and require less manual proofreading. This study pursues these goals by defining a new criterion of tagging reliability, the estimated final accuracy of the tagging under a fixed amount of proofreading, which is used to judge how cost-effective a tagging algorithm is. We also propose a new tagging algorithm, called the context-rule model, to achieve cost-effective tagging. The context-rule model utilizes broad context information to improve tagging accuracy. In experiments, we compared the tagging accuracy and reliability of the context-rule model, the Markov bi-gram model, and the word-dependent Markov bi-gram model. The results showed that the context-rule model outperformed both Markov models. In terms of tagging accuracy, the context-rule model reduced the number of errors 20% more than the two Markov models did. For the most cost-effective tagging algorithm to achieve 99% tagging accuracy, it was estimated that, on average, 20% of the samples of ambiguous words needed to be rechecked. We also compared the tradeoff between the amount of proofreading needed and the final accuracy for the different algorithms. It turns out that the algorithm with the highest accuracy may not always be the most reliable one. |
The Chinese and English abstracts in this system are taken from the published content of each article.
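
The reliability criterion described in the abstract (final accuracy under a fixed amount of proofreading) can be illustrated with a minimal sketch. The abstract does not say how the paper estimates reliability, so everything here is an assumption made for illustration: each tagged token carries a hypothetical confidence score, the proofreader rechecks the least-confident tokens first, and every rechecked token ends up correct. The function name `final_accuracy` and the toy data are likewise invented for this example.

```python
def final_accuracy(confidences, correct, proofread_fraction):
    """Accuracy after a human rechecks the least-confident fraction of tokens.

    confidences: hypothetical per-token confidence scores from a tagger.
    correct: whether the tagger's tag for each token was right.
    proofread_fraction: share of tokens the proofreader can afford to recheck.
    """
    n = len(confidences)
    if n == 0:
        raise ValueError("need at least one tagged token")
    # Number of tokens that get rechecked (rounded down).
    k = int(n * proofread_fraction)
    # Token indices ordered from least to most confident.
    order = sorted(range(n), key=lambda i: confidences[i])
    rechecked = set(order[:k])
    # Assumption: a rechecked token always ends up correct.
    right = sum(1 for i in range(n) if i in rechecked or correct[i])
    return right / n


# Toy data. Tagger A makes one error, but with high confidence.
a_conf = [0.9, 0.5, 0.8, 0.6, 0.95]
a_ok = [True, True, True, True, False]
# Tagger B makes two errors, both with low confidence.
b_conf = [0.9, 0.3, 0.8, 0.4, 0.95]
b_ok = [True, False, True, False, True]

for name, conf, ok in [("A", a_conf, a_ok), ("B", b_conf, b_ok)]:
    raw = final_accuracy(conf, ok, 0.0)
    checked = final_accuracy(conf, ok, 0.4)
    print(f"tagger {name}: raw={raw:.2f}, after 40% proofreading={checked:.2f}")
```

In this toy data, tagger A has the higher raw accuracy, but its only error is never rechecked, while tagger B's low-confidence errors are all caught. This mirrors the abstract's closing remark that the most accurate algorithm is not necessarily the most reliable one.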