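Related literature
- Classifying Pornographic Web Pages Using a Chi-Square Based Statistics Method (運用以卡方為基礎的統計方法於色情網頁分類之研究)
- Approximate Percentage Points of the Chi-Square Distribution (卡方分配的近似百分點)
- Developing Alexander-Govern Trimmed Mean Test for One-way Fixed Effects ANOVA Model under Variance Heterogeneity and Non-normality
- Estimating the Standard Deviation for Correlated Data
- Chi-Square Distribution Tests of Insurance Phenomena (保險現象的卡方分配檢定)
- Testing the Defective Rate under Simultaneous Control of Type I and Type II Errors (在型Ⅰ及型Ⅱ誤差同時控制下之不良率的檢定)
- Interval Estimation of the Defective Rate in Small Samples (在小樣本下不良率的區間估計)
- Some Computational Issues Concerning the RC Model in Two-Way Contingency Tables (在二維列聯表中有關於RC模式的一些計算問題)
- On Higher-Order Negative Moments of the Noncentral Chi-Square Distribution (論非心卡方分配之高級負動差)
- Improved Design of Quantile-Based Control Charts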
Title | 運用以卡方為基礎的統計方法於色情網頁分類之研究 = Classifying Pornographic Web Pages Using a Chi-Square Based Statistics Method |
---|---|
Authors | 李龍豪; 陸承志; Lee, Lung-hao; Luh, Cheng-jye |
Journal | 資訊管理學報 (Journal of Information Management) |
Publication date | 2007-04 |
Volume and issue | 14:2, 2007.04 [ROC year 96.04] |
Pages | 225-246 |
Classification number | 448.6 |
Language | Chinese (chi) |
Keywords | 網路內容分類; 色情黑名單; 不當資訊過濾; 卡方分配; Web content rating; Pornographic black list; Inappropriate web content filtering; Chi-square distribution |
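Chinese abstract (English translation) | With the spread of the Internet, information propagates very quickly, and the web is filled with content of uneven quality. Inappropriate material such as pornographic fiction, images, and abusive language keeps growing. In the absence of a sound mechanism for managing web content, users need only enter relevant keywords into a search engine and follow the hyperlinks in the results to reach such sites, so managing web content has become an urgent issue. This study focuses on the pornographic category of inappropriate material and proposes collecting a blacklist by classifying pornographic web pages. For the textual part of a site's content, the method derives the porn tendency of each individual word and uses the chi-square distribution to compute an indicator value, which assigns the page to one of three classes: Porn, Unsure, or Non-Porn. The URLs of pages in the Porn class form the blacklist, which can serve as the basis for filtering online pornography. The study implements a system for Chinese- and English-language web pages, and the experimental results show that the proposed method achieves high precision and a fairly low false positive rate. |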
English abstract | With the rapid growth of Internet usage, inappropriate materials (e.g., pornography, drugs, violence) have flooded the Web. The open character of the Web allows users to access almost any type of such inappropriate material, with various negative effects on users, particularly children. Thus, a mechanism for web content rating and filtering is a worthy and pressing issue. This study proposes a chi-square based statistics method for classifying pornographic materials. Given a web page, its textual content is first split into a list of tokens, along with a porn tendency weight for each token. The proposed method then calculates an indicator value (I-value) for the web page by combining the tokens' porn tendency weights through properties of the chi-square distribution. The resulting I-value is used to classify the given web page into one of three categories: Porn, Unsure, and Non-Porn. The web pages in the Porn category are finally collected into a black list. Currently, the proposed method can classify English and Chinese web pages. Experimental results indicate that the proposed method can detect pornographic web content at a superior precision rate along with a very low false positive rate. |
The abstract information in this system is based primarily on the abstract of the journal article itself.
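
The abstract says only that per-token porn tendency weights are combined "through properties of chi-square distribution" into an I-value, which then sorts a page into Porn, Unsure, or Non-Porn. It does not give the formula. The following minimal sketch substitutes a well-known technique of that shape, Fisher's method for combining probabilities as used in statistical spam filtering, and should not be read as the authors' actual computation. The function names, the clamping range, and the 0.1/0.9 thresholds are all hypothetical.

```python
import math


def chi2_sf_even(x, dof):
    """P(X > x) for a chi-square variable with an even number of degrees of freedom.

    For dof = 2n the survival function has the closed form
    exp(-x/2) * sum_{k=0}^{n-1} (x/2)^k / k!.
    """
    if dof <= 0 or dof % 2 != 0:
        raise ValueError("dof must be a positive even integer")
    half = x / 2.0
    term = math.exp(-half)
    total = term
    for k in range(1, dof // 2):
        term *= half / k
        total += term
    return min(total, 1.0)


def indicator_value(weights):
    """Combine token weights in [0, 1] into a single score in [0, 1].

    Assumption: each weight acts like a probability that its token is
    pornographic. The combination below is Fisher's method, not
    necessarily the formula used in the paper.
    """
    if not weights:
        return 0.5  # no evidence either way
    # Clamp to avoid log(0); the 0.01/0.99 bounds are an arbitrary choice.
    clamped = [min(max(w, 0.01), 0.99) for w in weights]
    dof = 2 * len(clamped)
    porn_stat = -2.0 * sum(math.log(1.0 - w) for w in clamped)
    clean_stat = -2.0 * sum(math.log(w) for w in clamped)
    porn_evidence = 1.0 - chi2_sf_even(porn_stat, dof)
    clean_evidence = 1.0 - chi2_sf_even(clean_stat, dof)
    return (1.0 + porn_evidence - clean_evidence) / 2.0


def classify(i_value, lower=0.1, upper=0.9):
    """Map an I-value to one of the three categories (hypothetical cutoffs)."""
    if i_value >= upper:
        return "Porn"
    if i_value <= lower:
        return "Non-Porn"
    return "Unsure"


if __name__ == "__main__":
    for page_weights in ([0.99] * 5, [0.5] * 4, [0.01] * 5):
        score = indicator_value(page_weights)
        print(page_weights, round(score, 4), classify(score))
```

Under these assumptions, pages whose tokens lean strongly one way land near 0 or 1, while mixed or neutral evidence stays near 0.5 and falls into the Unsure band, which is the kind of three-way split the abstract describes.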