Related literature
- Design and Evaluation of Approaches to Automatic Chinese Text Categorization
- 自動化文件分類在資訊服務上的應用
- A Comparative Study on Term Selection for Chinese Text Categorization
- 終端使用者詞彙選用與概念一致性之研究--以臺灣大學學生使用PsycLIT光碟資料庫為例
- 網頁文件分類相關技術之研究
- 終端使用者之辭彙選擇與心智模型研究
- 應用主題地圖於知識整理
- 以三階層文件分類技術探索顧客的需求
- 以語意為基礎之網路犯罪資訊搜尋研究
- 基於文件分類技術之資訊追蹤系統
Title | Design and Evaluation of Approaches to Automatic Chinese Text Categorization |
---|---|
Authors | Tsay, Jyh-jong; Wang, Jing-doo |
Journal | International Journal of Computational Linguistics & Chinese Language Processing |
Volume/issue | 5:2, 2000.08 [ROC 89.08] |
Pages | 43-58 |
Classification no. | 028.7 |
Keywords | 詞彙選擇 (term selection); 文件分類 (document classification); Term clustering; Term selection; Text categorization |
Language | English |
English abstract | In this paper, we propose and evaluate approaches to categorizing Chinese texts, which consist of term extraction, term selection, term clustering and text classification. We propose a scalable approach which uses frequency counts to identify left and right boundaries of possibly significant terms. We used the combination of term selection and term clustering to reduce the dimension of the vector space to a practical level. While the huge number of possible Chinese terms makes most of the machine learning algorithms impractical, results obtained in an experiment on a CNA news collection show that the dimension could be dramatically reduced to 1200 while approximately the same level of classification accuracy was maintained using our approach. We also studied and compared the performance of three well-known classifiers, the Rocchio linear classifier, naive Bayes probabilistic classifier and k-nearest neighbors (kNN) classifier, when they were applied to categorize Chinese texts. Overall, kNN achieved the best accuracy, about 78.3%, but required large amounts of computation time and memory when used to classify new texts. Rocchio was very time and memory efficient, and achieved a high level of accuracy, about 75.4%. In practical implementation, Rocchio may be a good choice. |
Chinese and English abstracts in this system are taken from the published content of each article.
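The abstract describes Rocchio as a time- and memory-efficient linear classifier. The sketch below illustrates that general idea in its simplest prototype form: each category is summarized by the average of its length-normalized term vectors, and a new text is assigned to the category whose prototype is most similar by cosine. It is not the authors' implementation. It assumes term extraction has already produced token lists, it omits the negative-example term of the full Rocchio formula as well as the paper's term selection and clustering steps, and the function names and toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tf_vector(tokens):
    """Return a length-normalized term-frequency vector as a dict."""
    counts = Counter(tokens)
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {t: c / norm for t, c in counts.items()} if norm else {}


def train_centroids(labeled_docs):
    """Average the document vectors of each category into one prototype."""
    sums = defaultdict(lambda: defaultdict(float))
    sizes = Counter()
    for tokens, label in labeled_docs:
        sizes[label] += 1
        for term, weight in tf_vector(tokens).items():
            sums[label][term] += weight
    return {
        label: {t: w / sizes[label] for t, w in vec.items()}
        for label, vec in sums.items()
    }


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(tokens, centroids):
    """Assign the category whose prototype is most similar to the text."""
    vec = tf_vector(tokens)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Hypothetical toy corpus: (already-extracted terms, category label).
training = [
    (["stock", "rise", "invest"], "finance"),
    (["invest", "fund", "stock"], "finance"),
    (["game", "champion", "pitcher"], "sports"),
    (["pitcher", "game", "score"], "sports"),
]
centroids = train_centroids(training)
print(classify(["stock", "fund"], centroids))
print(classify(["pitcher", "champion"], centroids))
```

Classification needs only one similarity computation per category, which is consistent with the efficiency the abstract attributes to Rocchio. kNN, by contrast, compares a new text against every stored training document.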