Title | The Properties and Further Applications of Chinese Frequent Strings |
---|---|
Authors | Lin, Yih-jeng; Yu, Ming-shing |
Journal | International Journal of Computational Linguistics & Chinese Language Processing |
Volume (Issue) | Vol. 9, No. 1 (February 2004; ROC 93.02) |
Pages | 113-128 |
Classification No. | 312.13 |
Keywords | Chinese frequent strings; Unknown words; Chinese toneless phoneme-to-character; Chinese spelling error correction; Language model |
Language | English |
English Abstract | This paper reveals several important properties of Chinese frequent strings (CFSs) and their applications in Chinese natural language processing (NLP). We previously proposed a method for extracting Chinese frequent strings that contain unknown words from a Chinese corpus [Lin and Yu 2001]. We found that CFSs contain many 4-character strings, 3-word strings, and longer n-grams; with a traditional language model (LM), such information can be derived only from an extremely large corpus. In contrast, by using CFSs we achieve high precision and efficiency with a small training corpus on two tasks: Chinese toneless phoneme-to-character conversion and Chinese spelling error correction. The accuracy rate was 92.86% for Chinese toneless phoneme-to-character conversion and 87.32% for Chinese spelling error correction. We also attempted to assign syntactic categories to CFSs; using the highest-level syntactic categories, this achieved an accuracy rate of 88.53% in outside testing. |
The Chinese and English abstracts in this system are taken from the published content of each article.