Source record
Title | 部落格本文自動萃取機制 = An Automatic Blog Text Extraction Mechanism |
---|---|
Authors | 洪智力; 林政輝 |
Journal | 電子商務研究 |
Volume:issue | 8:4, December 2010 (ROC calendar 99.12) |
Pages | 457-472 |
Classification no. | 312.13 |
Keywords | Blog text; Information extraction; Text mining; Document object model |
Language | Chinese |
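Chinese abstract (translated) | As blogs develop rapidly, they carry more and more information of reference value, and mining the text content of blogs has become an important branch of web mining research. To read a blog's text content automatically, one must correctly identify the web-page tag that holds the main text. This study proposes the "relative proportion of web-page tag text" method to find the tag most likely to contain the main text; the technique applies the concept of the Document Object Model (DOM) and uses a web crawler to extract blog main text automatically. Experiments show that the proposed automatic blog text extraction mechanism correctly filters out noise and locates the main-text tag. |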
English abstract | In the era of blogs, more and more useful information is shared on blogs. Mining blog text has become one of the important and novel research directions in the field of web mining. An automatic blog text mining system must locate, effectively and efficiently, the tags that hold the main content of the blog text. This research uses the relative proportion of text to tags to find the most probable tag for the main blog text. More specifically, we apply the DOM (Document Object Model) concept through a Java crawler to analyze the relationship between text and tags. According to our experiments, our automatic blog text extraction mechanism extracts the main text of blogs effectively and efficiently. |
The Chinese and English abstracts in this system are taken from the respective published articles.
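The abstracts name the idea (compare how much of a page's text a tag holds with how much markup it uses) but not the exact formula. The following is a hypothetical sketch in Python, using only the standard-library `html.parser`, of one way such a text-versus-tag proportion could pick the main-text element. The score `text share - tag share`, the names `TreeBuilder`, `annotate`, `extract_main_text`, the `SKIP`/`VOID` sets and the sample page are all assumptions for illustration, not the authors' method; crawling (done with a Java crawler in the paper) is left out.

```python
from html.parser import HTMLParser

# Elements whose text is not visible body content (an assumption of this sketch).
SKIP = {"head", "script", "style", "noscript"}
# Void elements have no end tag, so they are never pushed onto the open-element stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class Node:
    """A minimal stand-in for a DOM element: a tag plus its ordered content."""
    def __init__(self, tag, attrs=()):
        self.tag = tag
        self.attrs = dict(attrs)
        self.content = []  # ordered mix of child Nodes and raw text strings
        self.chars = 0     # non-whitespace characters in this subtree
        self.tags = 0      # elements in this subtree, counting this one


class TreeBuilder(HTMLParser):
    """Builds a rough DOM-like tree from possibly messy HTML."""
    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs)
        self.stack[-1].content.append(node)
        if tag not in VOID:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close the nearest open element with this name; stray end tags are ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                return

    def handle_data(self, data):
        # Drop text that sits inside <head>, <script>, <style> or <noscript>.
        if not any(n.tag in SKIP for n in self.stack):
            self.stack[-1].content.append(data)


def annotate(node):
    """Post-order pass that fills in node.chars and node.tags for every subtree."""
    chars, tags = 0, (0 if node.tag == "#root" else 1)
    for item in node.content:
        if isinstance(item, Node):
            c, t = annotate(item)
            chars, tags = chars + c, tags + t
        else:
            chars += len("".join(item.split()))  # whitespace-free length also suits CJK
    node.chars, node.tags = chars, tags
    return chars, tags


def walk(node):
    """Yield every element below node in document order."""
    for item in node.content:
        if isinstance(item, Node):
            yield item
            yield from walk(item)


def text_of(node):
    """Concatenate the subtree's text, normalising whitespace."""
    parts = [text_of(i) if isinstance(i, Node) else i for i in node.content]
    return " ".join(" ".join(parts).split())


def extract_main_text(html):
    """Return (node, text) for the element with the best text-vs-tag score."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    total_chars, total_tags = annotate(builder.root)
    if total_chars == 0 or total_tags == 0:
        return None, ""
    # Hypothetical score: share of the page's text minus share of the page's tags.
    best = max(walk(builder.root),
               key=lambda n: n.chars / total_chars - n.tags / total_tags)
    return best, text_of(best)


page = """<html><head><title>My blog</title><script>var x = 1;</script></head><body>
<div id="nav"><a href="/">Home</a> <a href="/about">About</a> <a href="/tags">Tags</a></div>
<div id="post"><h2>Trip notes</h2>
<p>We spent three days walking the old harbour district and eating far too much.</p>
<p>The second morning we took the early ferry out to the island lighthouse.</p></div>
<div id="footer"><a href="/rss">RSS</a> | <a href="/contact">Contact</a></div>
</body></html>"""

node, text = extract_main_text(page)
print(node.tag, node.attrs.get("id"))  # div post
```

Subtracting the tag share stops `<body>` from winning (it holds all the text but also most of the tags), while link-heavy navigation and footer blocks score below zero. This toy tree does not auto-close `<p>` the way a browser does, and a long comment thread or sidebar could outscore a short post, so real blog pages would need further filtering.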