Title | Lexical Analysis for Chinese: Difficulties and Possible Solutions = 中文詞彙分析的困難問題及可能的解決方法 (Difficult Problems in Chinese Lexical Analysis and Possible Solutions) |
---|---|
Author | 陳克健 (Keh-Jiann Chen) |
Journal | 中國工程學刊 (Journal of the Chinese Institute of Engineers) |
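Volume/Issue | Vol. 22, No. 5, September 1999 (ROC year 88) |
Pages | 561-571 |
Special issue | Chinese Speech and Language Processing |
Classification no. | 312.23 |
Keywords | Lexical analysis; Word segmentation (automatic word segmentation); Unknown word identification |
Language | English |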
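Chinese abstract (translated) | Chinese sentences are made up of characters, with no spaces between words. The smallest meaningful unit in language processing, however, is the word, so lexical analysis is the first step in Chinese language processing. A typical segmentation process consults a lexicon and uses its words, together with their part-of-speech and sense information, as the basis for segmentation. Two difficult problems arise in this process: segmentation ambiguity and unknown words. This paper proposes statistical and rule-based methods for resolving segmentation ambiguity, and discusses how off-line automatic new-word extraction and on-line automatic new-word identification can complement each other to solve the unknown-word problem. It also proposes strategies for system design and describes the data and knowledge required. |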
English abstract | Chinese sentences are composed of strings of characters, with no blanks to mark word boundaries. However, the basic unit for sentence processing is the word, the smallest meaningful, freely used unit of any natural language. Lexical analysis is therefore the first step in processing Chinese sentences. A lexicon is usually used during lexical analysis to match words and to provide their syntactic and semantic information. Two problems arise in the word-matching process: segmentation ambiguity and the occurrence of unknown words. In this paper, the advantages and disadvantages of both statistical and rule-based methods for resolving segmentation ambiguities are discussed. For unknown word identification, off-line word extraction methods and on-line unknown word identification strategies are surveyed; the two approaches complement each other in solving the problem. The strategies and knowledge sources needed to implement a practical system are also discussed. |
In this system, the Chinese and English abstracts are taken from the published articles themselves.
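
The abstract's lexicon-matching step, and how segmentation ambiguity surfaces during it, can be illustrated with a classic baseline. The following is a minimal sketch, not the author's method: it assumes a toy lexicon and substitutes the standard forward and backward maximum-matching algorithms, flagging a string as possibly ambiguous when the two segmentations disagree. The function names `forward_max_match`, `backward_max_match`, and `segment` are hypothetical.

```python
# Minimal sketch of lexicon-based Chinese word segmentation (toy lexicon).
# Forward/backward maximum matching is a swapped-in classic baseline, not
# necessarily the method of the paper. A disagreement between the two
# results is used here only as a signal of possible segmentation ambiguity.


def forward_max_match(text: str, lexicon: set[str], max_len: int) -> list[str]:
    """Scan left to right, taking the longest lexicon word at each position."""
    words, i = [], 0
    while i < len(text):
        # Try the longest candidate first; a single character always matches
        # as a fallback (e.g. an unknown word fragment).
        for j in range(min(len(text), i + max_len), i, -1):
            if j - i == 1 or text[i:j] in lexicon:
                words.append(text[i:j])
                i = j
                break
    return words


def backward_max_match(text: str, lexicon: set[str], max_len: int) -> list[str]:
    """Scan right to left, taking the longest lexicon word ending at each position."""
    words, j = [], len(text)
    while j > 0:
        # Smallest start index first means the longest candidate is tried first.
        for i in range(max(0, j - max_len), j):
            if j - i == 1 or text[i:j] in lexicon:
                words.append(text[i:j])
                j = i
                break
    return words[::-1]  # restore left-to-right order


def segment(text: str, lexicon: set[str]) -> tuple[list[str], list[str], bool]:
    """Return both segmentations and whether they disagree."""
    max_len = max(map(len, lexicon), default=1)
    fwd = forward_max_match(text, lexicon, max_len)
    bwd = backward_max_match(text, lexicon, max_len)
    return fwd, bwd, fwd != bwd


if __name__ == "__main__":
    toy_lexicon = {"研究", "研究生", "生命", "起源"}
    print(segment("研究生命起源", toy_lexicon))
```

For 研究生命起源 ("research on the origin of life"), forward matching greedily takes 研究生 ("graduate student") and leaves 命 stranded, while backward matching yields 研究 / 生命 / 起源 ("study / life / origin"). The disagreement marks the span as one that needs the kind of statistical or rule-based disambiguation the paper discusses; greedy matching alone cannot tell which reading is correct.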