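Related literature
- A Study of Automatic Chinese Document Classification Based on Fuzzy Theory and Genetic Algorithms (以模糊理論和遺傳演算法為基礎的中文文件自動分類之研究)
- A Self-Adaptive Dynamic Scheduling System: Applications of Constraint Scheduling, Fuzzy Theory, and Genetic Algorithms (自我調適的動態排程系統--限制排程、模糊理論和遺傳演算法的應用)
- A Study of an Intelligent Wealth Management Decision Model: Applying Artificial Intelligence Methods (智慧型財富管理決策模式之研究--應用人工智慧方法)
- Applying a Hybrid Soft Computing System to Automating the Professional Review of National Health Insurance Medical Claims (混合型軟式計算系統於健保醫療費用專業審查自動化之應用)
- Optimizing Reservoir Flood-Control Operation with Fuzzy Theory Combined with Genetic Algorithms (以模糊理論結合遺傳演算法應用於水庫防洪操作之最佳化研究)
- Applying Fuzzy Genetic Algorithms to the Seismic Performance Design of Reinforced Concrete Bridge Piers (模糊遺傳演算法在鋼筋混凝土橋墩耐震性能設計之運用)
- An Automatic Recognition System for Handwritten Digits in Faxes (傳真手寫數字自動辨識系統)
- Modeling and Scheduling for a Flexible Manufacturing System Using Petri Net and Genetic Algorithm
- Applying Fuzzy Theory to Planning the Administrative Support Needed to Meet Student Needs (模糊理論應用於規劃滿足學生需求所需之行政配合)
- Using Genetic Algorithms to Optimize the Sediment-Sluicing Operation Rule Curves of Dapu Reservoir (應用遺傳演算法優選大埔水庫排砂操作規線)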
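Title | 以模糊理論和遺傳演算法為基礎的中文文件自動分類之研究 (A Study of Automatic Chinese Document Classification Based on Fuzzy Theory and Genetic Algorithms) |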
---|---|
Authors | 侯永昌; 楊雪花 |
Journal | 模糊系統學刊 |
Publication date | 1998-08 |
Volume/issue | Vol. 4, No. 1 (1998.08; ROC year 87.08) |
Pages | pp. 45-57 |
Classification no. | 312.1 |
Language | Chinese |
Keywords | Fuzzy theory; Genetic algorithm; Automatic document classification; Bigrams |
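Chinese abstract (translated) | In recent years information has grown at a remarkable rate, especially with the spread of the Internet and the mass circulation of articles. Classifying this information in advance would speed up retrieval and improve retrieval accuracy. Because manual classification can no longer keep pace with the rate at which information is produced, this paper proposes an automatic classification method for Chinese documents. First, function words and the characters and bigrams that occur only once in an article are removed to obtain preliminary short phrases, and terms appearing in the title are given extra weight to improve classification accuracy. A genetic algorithm is then used to find the optimal threshold, which is used to select the important keywords in each article; term frequency, number of documents, concentration, and breadth serve as the main term-selection criteria. Finally, the term counts in each category's lexicon are converted into standard weights by a formula. Each test article is likewise segmented automatically to produce a document vector, the inner product of each category's standard weights with that vector is computed, and the article is assigned to the category with the larger inner product. Besides traditional single-category classification, the study also uses fuzzy theory to derive a method for multiple (overlapping) classification. Experimental results show that the proposed method reaches an average recall of 81% and a precision of 76% in automatic article classification, demonstrating the strength of this approach for automatic classification of Chinese documents. |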
English abstract | Information has been growing rapidly in recent years, especially with the popularization of the Internet and the mass circulation of articles. Classifying information in advance makes retrieval faster and more accurate, but manual classification lags far behind the growth of new information. This paper therefore proposes an automatic method for classifying Chinese articles. First, preliminary short phrases are extracted from each article by removing stop words, numbers, and words and bigrams that appear only once. Next, the weight of words and phrases in the title is increased to achieve higher accuracy. A genetic algorithm is used to find the best threshold, which is then used to pick out the key words in each article. Term frequency, entropy, distribution, and the α-cut threshold are the main criteria for choosing phrases. Finally, the standard weights of the chosen words are computed. Each test article is also segmented into words automatically to create a document vector; the inner product of each category's standard term weights with that document vector is computed, and the article is assigned to the category with the larger inner product. In addition to traditional single-label classification, fuzzy theory is used to derive a multiple-classification method. In the experiments on automatic article classification, the method reaches an average recall of 81 percent and an accuracy of 75 percent, which is sufficient to demonstrate its superiority for the automatic classification of Chinese articles. |
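The abstracts describe the classification step only at a high level: build a document vector for each article, take its inner product with each category's standard term weights, and assign the category with the largest value. The Python sketch below illustrates that step under stated assumptions. Overlapping character bigrams stand in for the paper's word segmentation, the stop-character list `STOP_CHARS` and the title boost `TITLE_WEIGHT` are hypothetical, and the category weights are toy values, because the abstracts do not give the standard-weight formula. The multi-label branch (dividing scores by the top score to get membership degrees, then keeping categories above an α-cut) is one common fuzzy approach and is not necessarily the authors' method.

```python
import re
from collections import Counter

# Hypothetical function-word list; the paper's actual stop list is not given.
STOP_CHARS = "的了是在和也與及"
# Assumed boost factor for terms that also appear in the title.
TITLE_WEIGHT = 2.0
# Split on punctuation/whitespace (\W), digits (\d), and stop characters.
SPLIT_RE = re.compile(r"[\W\d" + STOP_CHARS + r"]+")


def bigrams(text):
    """Return overlapping character bigrams within runs of non-stop characters."""
    grams = []
    for run in SPLIT_RE.split(text):
        grams.extend(run[i:i + 2] for i in range(len(run) - 1))
    return grams


def document_vector(title, body):
    """Count bigrams in title and body, drop those seen only once, boost title terms."""
    title_terms = bigrams(title)
    counts = Counter(bigrams(body))
    counts.update(title_terms)
    vec = {t: float(c) for t, c in counts.items() if c > 1}
    for t in set(title_terms):
        if t in vec:
            vec[t] *= TITLE_WEIGHT
    return vec


def inner_product(weights, vec):
    """Dot product of one category's term weights with a document vector."""
    return sum(w * vec.get(t, 0.0) for t, w in weights.items())


def classify(vec, category_weights, alpha=0.8):
    """Return (single best category, fuzzy multi-label list via an alpha-cut).

    Scores divided by the top score act as membership degrees in [0, 1];
    this normalization and the default alpha are assumptions for illustration.
    """
    scores = {c: inner_product(w, vec) for c, w in category_weights.items()}
    best = max(scores, key=scores.get)
    top = scores[best]
    if top <= 0:
        return None, []  # the article shares no weighted term with any category
    members = sorted(c for c, s in scores.items() if s / top >= alpha)
    return best, members


if __name__ == "__main__":
    # Toy category weights, not taken from the paper.
    cw = {"sports": {"籃球": 1.0, "比賽": 0.8}, "finance": {"股市": 1.0, "投資": 0.9}}
    vec = document_vector("籃球比賽", "今天籃球比賽很精彩，籃球比賽結束。")
    print(classify(vec, cw))  # ('sports', ['sports'])
```

Normalizing by the top score keeps the α-cut independent of document length; a lower `alpha` admits more overlapping categories, trading precision for recall.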
The abstract information in this system is taken mainly from the abstracts published with the journal articles.
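The abstracts state that a genetic algorithm searches for the best keyword-selection threshold but do not give its encoding, operators, or fitness function. The following is a minimal sketch of a standard binary-coded GA for a single threshold in [0, 1], assuming an 8-bit encoding, tournament selection, one-point crossover, bit-flip mutation, and elitism; `ga_threshold`, `decode`, and their parameters are hypothetical names. In a real setting the `fitness` callback would score a threshold by, for example, classification precision or recall on held-out articles; a toy quadratic stands in here.

```python
import random

BITS = 8  # assumed encoding: 256 evenly spaced thresholds in [0, 1]


def decode(bits):
    """Map a list of 0/1 genes to a threshold in [0, 1]."""
    return int("".join(map(str, bits)), 2) / (2 ** BITS - 1)


def ga_threshold(fitness, pop_size=30, generations=60, p_mut=0.05, seed=0):
    """Search for the threshold that maximizes fitness(threshold) with a simple GA."""
    rng = random.Random(seed)

    def score(ind):
        return fitness(decode(ind))

    def tournament(pop):
        # Pick the fitter of two randomly chosen individuals.
        return max(rng.sample(pop, 2), key=score)

    pop = [[rng.randint(0, 1) for _ in range(BITS)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=score, reverse=True)
        next_pop = [ranked[0][:], ranked[1][:]]  # elitism: carry over the two best
        while len(next_pop) < pop_size:
            a, b = tournament(ranked), tournament(ranked)
            cut = rng.randint(1, BITS - 1)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation with probability p_mut per gene.
            child = [1 - g if rng.random() < p_mut else g for g in child]
            next_pop.append(child)
        pop = next_pop
    return decode(max(pop, key=score))


if __name__ == "__main__":
    # Toy fitness peaking at 0.6; a real run would instead score keyword
    # selection at this threshold on held-out articles.
    print(round(ga_threshold(lambda t: -(t - 0.6) ** 2), 3))
```

Elitism guarantees the best threshold found so far is never lost between generations, and the fixed `seed` makes runs reproducible when comparing fitness functions.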