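Related literature
- A Modified Heuristic Method to Construct the Binary Decision Tree of Nominal Attributes
- Analysis of Traffic Accident Characteristics at Flashing-Signal Intersections: The Case of Tainan City
- Integrating Spatial and Remote Sensing Analysis for the Identification of Illegal Waste Dumping Sites
- The Critical Deciding Factors of Auditor Choice Decision: An Application of the Decision Tree Technique
- Decision Tree Construction under Cost Considerations
- A Study on Building Patient Classification Models with C4.5: The Case of Diabetes Patients at Two Hospitals
- A Study on Using a Two-Stage Classification Technique to Discover Potential Small and Medium Enterprise Borrowers
- Using a Concept Learning Model to Analyze the Rules for Distinguishing Chinese and Western Chair Design Styles
- Using Decision Tree Classification to Build a Mechanism for Verifying and Integrating Large Volumes of Enterprise Data in a Data Warehouse
- A Knowledge Discovery-Based Methodology for Knowledge Organization and Its Research Analysis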
Title | 一種改良的啟發式方法以建構名目屬性之二元決策樹 = A Modified Heuristic Method to Construct the Binary Decision Tree of Nominal Attributes |
---|---|
Authors | 葉榮懋; 施武榮; 徐芳玲 | Journal | 資訊管理學報 (Journal of Information Management) |
Volume/issue | 17:1, 2010.01 [ROC 99.01] |
Pages | 157-176 |
Classification no. | 312.13 |
Keywords | Decision tree; Data mining; Classification; Heuristic method; Principal component analysis |
Language | Chinese |
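Chinese abstract (translated) | Information technology advances rapidly, and the scale at which data is stored and processed differs considerably from the past. How to extract useful information from very large volumes of data for decision makers has long been a central concern in data mining. Because decision trees are easy to compute and produce clear rules, they have become one of the most commonly used classification techniques in data mining. However, when the data volume is large and a nominal attribute has many distinct values, letting every value form its own branch produces too many branches, which makes the extracted rules too complex to interpret and sharply reduces processing efficiency. This paper develops a method for simplifying decision trees that performs a binary split on the nominal attributes in a database, dividing the data into two branches to reduce excessive and unnecessary branches. The study uses the first principal component from principal component analysis, which accounts for most of the variation, and takes the mean of that component's standardized component scores as the basis for the binary split of attribute values, eliminating excess attribute-value branches so that the explicit knowledge in the decision tree is easy to interpret. Finally, four data sets from the UCI database are used as test samples; the results show that the proposed method performs well in both decision tree compactness and classification accuracy. |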
English abstract | The ability to extract useful information from a large-scale database to aid decision-making is critical in data mining. Classification is an important problem in data mining and has been studied extensively as a possible solution to the knowledge acquisition problem. The decision tree has become one of the most commonly used techniques for classifying data because the algorithm for generating a decision tree can be easily implemented. However, when the nominal attributes at a node of a tree have too many distinct values, the branches of the tree become numerous and complicated. As a result, the effectiveness of data processing in a large data set may be compromised. This paper proposes a heuristic method to simplify the decision tree by splitting the nominal attributes into two branches. We adopt principal component analysis to present an algorithm for finding a good partition strategy in order to reduce unnecessary branches of a decision tree. Since the first principal component can represent most of the variance, the first component scores of each attribute are used as the thresholds for splitting examples. The decision tree can be simplified to a binary tree so that the explicit knowledge of a tree can be easily extracted. We also compare against other heuristic methods and give an analysis of experimental results on four UCI data sets. |
The abstract information in this system is based primarily on the abstract published with the journal article.
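The binary split described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how a nominal attribute is turned into numeric input for principal component analysis, so this sketch assumes each attribute value is described by the class distribution of the records that carry it. That encoding, the function name `binary_split_by_first_pc`, and the tie rule (scores equal to the threshold go to the first group) are assumptions made for illustration and are not taken from the paper. Only the last steps follow the abstract directly: take the first principal component, standardize its scores, and split the attribute values at the mean of the standardized scores.

```python
import numpy as np


def binary_split_by_first_pc(values, labels):
    """Partition the distinct values of one nominal attribute into two groups."""
    values = np.asarray(values)
    labels = np.asarray(labels)
    distinct = np.unique(values)
    classes = np.unique(labels)

    # Assumption (not from the paper): describe each attribute value by the
    # share of each class among the records that have that value.
    profile = np.array([
        [np.mean(labels[values == v] == c) for c in classes]
        for v in distinct
    ])

    # PCA via SVD of the column-centered profile matrix; vt[0] is the
    # direction of the first principal component.
    centered = profile - profile.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt[0]

    # Standardize the first-component scores, guarding against zero spread.
    std = scores.std()
    z = (scores - scores.mean()) / std if std > 0 else np.zeros_like(scores)

    # Split the attribute values at the mean of the standardized scores.
    threshold = z.mean()
    first = {str(v) for v, s in zip(distinct, z) if s <= threshold}
    second = {str(v) for v, s in zip(distinct, z) if s > threshold}
    return first, second
```

The sign of a principal component is arbitrary, so which group comes back first carries no meaning; only the partition itself does. In a tree builder, a routine like this would run at each node for each nominal attribute, and the two groups would become the two branches.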