Title | Document Identification Reassignment for Inverted File Compression |
---|---|
Authors | 謝萬雲; 陳添福; 鍾崇斌; 單智君 | Journal | 中華民國資訊學會通訊 |
Volume/Issue | 3:3 2000.09 [ROC 89.09] |
Pages | pp. 23-37 |
Special issue | Internet and Distributed Systems |
Classification no. | 028.7 |
Keywords | 資訊檢索系統; Information retrieval system; IRS; Document identifications; Document IDs |
Language | English |
English abstract | The inverted file is the most popular indexing mechanism for speeding up document search in an Information Retrieval System (IRS). The size of the inverted file is usually enormous. Traditionally, the d-gap technique is applied to an inverted file to replace document identifications (document IDs) with smaller numbers, which can then be compressed efficiently. However, large gap values may make the compression rate worse than expected. In this paper we propose a document ID reassignment algorithm that exploits the cluster property to reduce the gap values. In addition, we propose an improved notation to make up for the shortcoming of the d-gap technique. We show that the inverted file compression rate can be improved by 16 to 23 over the pure d-gap technique. |
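
The d-gap idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's reassignment algorithm: the function names, the two sample posting lists, and the use of Elias gamma coding to count bits are all assumptions made for illustration. It only shows why document IDs that sit close together (as they might after a cluster-aware reassignment) produce smaller gaps and therefore fewer bits.

```python
def to_dgaps(doc_ids):
    """Convert a sorted posting list of document IDs into d-gaps.

    The first gap is the first ID itself; each later gap is the
    difference from the preceding ID.
    """
    gaps = []
    previous = 0
    for doc_id in doc_ids:
        gaps.append(doc_id - previous)
        previous = doc_id
    return gaps


def from_dgaps(gaps):
    """Rebuild the original document IDs by summing the gaps."""
    doc_ids = []
    total = 0
    for gap in gaps:
        total += gap
        doc_ids.append(total)
    return doc_ids


def gamma_bits(gaps):
    """Bits needed under Elias gamma coding (an assumed coder, not
    necessarily the one used in the paper): 2*floor(log2(x)) + 1 per gap."""
    return sum(2 * (gap.bit_length() - 1) + 1 for gap in gaps)


# Hypothetical posting lists for one term, before and after an
# imagined reassignment that places similar documents at nearby IDs.
scattered_ids = [3, 40, 75, 120]
clustered_ids = [3, 4, 5, 7]

scattered_gaps = to_dgaps(scattered_ids)
clustered_gaps = to_dgaps(clustered_ids)

print(scattered_gaps, gamma_bits(scattered_gaps))
print(clustered_gaps, gamma_bits(clustered_gaps))
```

Smaller gaps need fewer bits under any code that favors small integers, which is the effect the abstract attributes to reassigning document IDs along cluster boundaries.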
The Chinese and English abstracts in this system are taken from the published content of each article.