| Title | A Study on Applying Short-String Indexing to Chinese-English Full-Text Retrieval (應用短字串索引在中英文全文資料檢索之研究) |
|---|---|
| Author | 陳偉星; | Journal | 大葉學報 |
| Volume/Issue | 1:1 1992.12 [ROC year 81.12] |
| Pages | pp. 161-173 |
| Classification no. | 028.7 |
| Keywords | Full text retrieval; String matching; Term indexing; Term vector; 2-gram; String vector; Similarity value; N-gram indexing; N-gram vector; Hashing; Computation of similarity; |
| Language | Chinese |
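| Chinese abstract | Traditional databases have difficulty extracting information from documents and other unstructured data, so full-text retrieval systems play a very important role in modern office automation. Because Chinese and English differ greatly in structure, semantics, and grammar, applying automatic term indexing to full-text retrieval of Chinese or mixed Chinese-English data is very difficult. This paper studies how to use 2-grams to generate an index for a Chinese-English full-text retrieval system. The 2-grams of mixed Chinese-English text fall into three groups (Chinese, English, and numerals) whose lengths differ; we study how to fix their length so that primary keys can be built in the index file. Finally, we discuss how to compute the similarity between a query term and the document data. This study applies 2-gram indexing to build a full-text retrieval system for book titles in the library information system of Da-Yeh Institute of Technology (大葉工學院). |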
| English abstract | It is difficult to retrieve text-based or bibliographic information from most traditional databases. Several full-text retrieval techniques have been proposed, but all of them deal with English text. We propose an N-gram indexing system for retrieving Chinese text-based information. In text-based information retrieval, we do not insist on a complete match between query and document terms before a document is retrieved. Instead, retrieval of an item may depend on a sufficient degree of coincidence between the sets of identifiers attached to queries and documents, as produced by some approximate or partial matching method. Based on the 2-gram indexing system, we propose methods for calculating this similarity. Finally, we apply the technique to our library information system so that it can retrieve text-based information such as book titles and journal abstracts. |
The Chinese and English abstracts in this system are taken from the published content of each article.
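
The abstracts describe indexing by 2-grams and ranking by partial match rather than exact match. The record does not say which similarity formula the paper uses, so the following is only a minimal sketch under stated assumptions: character-level 2-grams over Unicode text, whitespace ignored, case folded, and the Dice coefficient as a stand-in similarity measure. The function names `bigrams` and `dice_similarity` are hypothetical. Python strings are Unicode, so the fixed-length key problem for Chinese, English, and numeric 2-grams mentioned in the abstract does not arise here and is not addressed.

```python
from collections import Counter


def bigrams(text: str) -> Counter:
    """Count overlapping character 2-grams, ignoring whitespace and case.

    Assumption: one Unicode code point is one "character", whether it is
    a Chinese character, a Latin letter, or a digit.
    """
    chars = [c.lower() for c in text if not c.isspace()]
    return Counter(a + b for a, b in zip(chars, chars[1:]))


def dice_similarity(query: str, document: str) -> float:
    """Dice coefficient over 2-gram multisets (an assumed measure,
    not necessarily the one proposed in the paper)."""
    q, d = bigrams(query), bigrams(document)
    total = sum(q.values()) + sum(d.values())
    if total == 0:
        return 0.0
    # Counter intersection keeps the minimum count of each shared 2-gram.
    shared = sum((q & d).values())
    return 2 * shared / total


if __name__ == "__main__":
    # Rank a few made-up titles against a query by partial 2-gram overlap.
    titles = ["全文資料檢索", "N-gram 索引", "資料庫系統"]
    query = "全文檢索"
    for title in sorted(titles, key=lambda t: dice_similarity(query, t), reverse=True):
        print(title, round(dice_similarity(query, title), 3))
```

Because the score depends only on shared 2-grams, a title can be retrieved even when the query does not appear in it as an exact substring, which is the partial-matching behaviour the English abstract describes.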