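Related Literature
- Comparing Data Mining Methods for Predicting Prognostic Factors and Survivability of Non-small Cell Lung Cancer Patients
- Combining Environmental Context Extraction and Analysis of Users' Daily Electricity-Use Routines for Power Consumption Pattern Inference and Demand Forecasting
- Broken Repayment Agreements after Default Negotiation among Unsecured Consumer-Finance Customers: An Application of Data Mining Techniques
- A Study of Applying Data Mining Techniques to Improve Internal Control and Internal Auditing in Defense Finance
- Applying Data Mining to Build a Churn Prediction Model for Labor Retirement Reserve Fund Accounts
- Prediction and Comparison of Data Mining Algorithms for Sentencing in Military Corruption Cases
- Churn Prediction Based on the Analysis of Customers' Preferences and Social Behavior on a Big Data Platform
- A Study of Using Two-Stage Classification Techniques to Identify Potential SME Borrowers
- Using Decision Tree Classification to Build a Mechanism for Verifying and Integrating Large Volumes of Enterprise Data in a Data Warehouse
- Evaluation of Data Exploration Methods on Medical Databases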
Title | Comparing Data Mining Methods for Predicting Prognostic Factors and Survivability of Non-small Cell Lung Cancer Patients (Chinese title: 使用比較資料探勘演算法預測非小細胞肺癌患者預後因子、存活情形及其效能) |
---|---|
Authors | 陳麗帆; 周繡玲; 周雨青; 楊燦; 簡戊鑑; 白璐; 孫建安; 曾知浩; 朱基銘 |
Journal | 腫瘤護理雜誌 (Journal of Oncology Nursing) |
Volume:Issue | 9:2, December 2009 (ROC year 98.12) |
Pages | 33-47 |
Classification no. | 415.468 |
Keywords | Lung cancer; Decision tree; Artificial neural network; Logistic regression; Data mining; Cancer registry data; Survivability |
Language | Chinese |
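Chinese abstract (translated) | This study used three data mining algorithms, decision tree (DT), artificial neural network (ANN), and logistic regression (LR) models, to investigate prognostic factors for non-small cell lung cancer (NSCLC) and the factors that affect model performance. The subjects were 131,257 patients diagnosed with NSCLC in the US cancer registry database (Surveillance, Epidemiology, and End Results, SEER), divided by cause of death into those who died of lung cancer (N = 123,972) and those who died of metastatic cancer (N = 7,285); owing to space limits, this paper discusses only the patients who died of lung cancer. The models were evaluated by accuracy (ACC), area under the ROC curve (AUC), and external generalization, using 10-fold cross-validation. Combining the prognostic-variable rankings of the three models for 1-, 3-, and 5-year survival, the top three prognostic factors for NSCLC patients who died of lung cancer were type of surgery, clinical stage group, and extent of tumor spread. ANN had the best predictive power and LR the best external generalization. A sample of at least 3,500 patients is recommended: the LR model was the most sensitive to small samples, while DT could not grow a tree when the information provided was insufficient. For the hybrid model, a higher test-set ACC for the decision tree was accompanied by a higher test-set AUC for the hybrid model. The study therefore recommends ANN for predictive power and LR for external generalization, a sample size above 3,500 when using LR, and a hybrid model when DT predicts well. |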
English abstract | The purpose of this study was to investigate the effectiveness of decision tree (DT), artificial neural network (ANN), and logistic regression (LR) models for predicting prognostic factors and survivability of patients with non-small cell lung cancer (NSCLC). Study samples were patients diagnosed with NSCLC in the United States between 1973 and 2004, drawn from the SEER (Surveillance, Epidemiology, and End Results) databank. The dataset consists of 131,257 patients, 123,972 of whom died of lung cancer and 7,285 of whom died of metastasis within five years. Because of the page limit, we report results only for those who died of lung cancer. Model performance was evaluated in terms of accuracy (ACC), area under the ROC curve (AUC), and external generalization (ΔACC, ΔAUC). Ten-fold cross-validation was used to obtain unbiased estimates of these performance measures. Synthesizing the DT, ANN, and LR models, the top three prognostic factors for 1-, 3-, and 5-year survivability of patients who died of lung cancer are surgery type, clinical stage, and extent of cancer spread. The top three prognostic factors for patients who died of metastasis are surgery type, clinical stage, and the number of examined lymph nodes. The ANN model had the highest ACC, while LR had the lowest. A decision tree for 5-year NSCLC survivability could not be constructed because of inadequate information. Sample size significantly affects the performance of LR. Because LR generalizes stably with more than 3,500 samples, a sample size of at least 3,500 patients is recommended. The hybrid model can improve performance (AUC and ACC) when DT performs better than ANN or LR. |
The abstract information in this system is taken primarily from the journal article's own abstract.
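The abstract names its evaluation protocol (10-fold cross-validation scored by ACC and AUC, plus an external-generalization gap) without describing how it was computed. The following is a minimal, stdlib-only Python sketch of such a protocol under loudly stated assumptions: it uses hypothetical synthetic data rather than SEER records, fits only a from-scratch logistic regression as a stand-in (no DT, ANN, or hybrid model), and assumes ΔACC and ΔAUC mean training-fold score minus held-out-fold score, a definition the abstract does not give. All function names are hypothetical; this is not the authors' code.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w, x):
    """Probability of the positive class; w[-1] is the intercept."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + w[-1])


def fit_logistic(X, y, lr=0.1, epochs=200):
    """Logistic regression fitted by stochastic gradient descent on log-loss.
    Stand-in model only: the paper's DT, ANN and hybrid model are not reproduced."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = predict(w, xi) - yi  # d(log-loss)/d(logit)
            for j, xj in enumerate(xi):
                w[j] -= lr * err * xj
            w[-1] -= lr * err
    return w


def accuracy(y_true, y_prob, threshold=0.5):
    """ACC: share of cases whose thresholded prediction equals the label."""
    hits = sum((p >= threshold) == bool(t) for t, p in zip(y_true, y_prob))
    return hits / len(y_true)


def auc(y_true, y_prob):
    """AUC: chance a random positive is scored above a random negative (ties = 1/2)."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs both classes present")
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def k_fold_indices(n, k=10, seed=0):
    """Shuffle 0..n-1 and deal the indices into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(X, y, k=10, seed=0):
    """Mean held-out ACC/AUC over k folds, plus train-minus-test gaps."""
    per_fold = []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        w = fit_logistic([X[i] for i in train_idx], [y[i] for i in train_idx])
        tr_y = [y[i] for i in train_idx]
        te_y = [y[i] for i in test_idx]
        tr_p = [predict(w, X[i]) for i in train_idx]
        te_p = [predict(w, X[i]) for i in test_idx]
        te_acc, te_auc = accuracy(te_y, te_p), auc(te_y, te_p)
        per_fold.append({
            "acc": te_acc,
            "auc": te_auc,
            # ASSUMPTION: delta = training-fold score minus held-out score;
            # the abstract does not define ΔACC / ΔAUC.
            "delta_acc": accuracy(tr_y, tr_p) - te_acc,
            "delta_auc": auc(tr_y, tr_p) - te_auc,
        })
    return {key: sum(f[key] for f in per_fold) / len(per_fold) for key in per_fold[0]}


def make_synthetic(n=200, seed=1):
    """Hypothetical two-feature data with a known logistic signal (not SEER data)."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
        X.append(x)
        y.append(1 if rng.random() < sigmoid(2.0 * x[0] - 1.0 * x[1]) else 0)
    return X, y


if __name__ == "__main__":
    X, y = make_synthetic()
    print(cross_validate(X, y, k=10))
```

Re-running `cross_validate` on random subsamples of different sizes is one way to probe the kind of sample-size sensitivity the authors report for LR; their 3,500-patient threshold comes from the SEER analysis and should not be expected to carry over to this toy data.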