頁籤選單縮合
Title | White Page Construction from Web Pages for Finding People on the Internet |
---|---|
Authors | Chen, Hsin-hsi; Bian, Guo-wei |
Journal | International Journal of Computational Linguistics & Chinese Language Processing |
Volume:Issue | 3:1, February 1998 (ROC year 87.02) |
Pages | pp. 75-100 |
Classification No. | 312.2 |
Keywords | Proper name identification; Information extraction; White pages; World Wide Web |
Language | English |
English Abstract | This paper proposes a method to extract proper names and their associated information from web pages for Internet/Intranet users automatically. The information extracted from World Wide Web documents includes proper nouns, E-mail addresses and home page URLs. Natural language processing techniques are employed to identify and classify proper nouns, which are usually unknown words. The information (i.e., home pages' URLs or e-mail addresses) for those proper nouns appearing in the anchor parts can be easily extracted using the associated anchor tags. For those proper nouns in the non-anchor part of a web page, different kinds of clues, such as the spelling method, adjacency principle and HTML tags, are used to relate proper nouns to their corresponding E-mail addresses and/or URLs. Based on the semantics of content and HTML tags, the extracted information is more accurate than the results obtained using traditional search engines. The results can be used to construct white pages for Internet/Intranet users or to build databases for finding people and organizations on the Internet. Such searching services are very useful for human communication and dissemination of information. |
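
The anchor-part and non-anchor-part extraction described in the abstract can be illustrated with a toy sketch. This is not the authors' method. The paper uses natural language processing to identify and classify proper nouns, and it combines several clues (spelling method, adjacency principle, HTML tags). The sketch below makes these simplifying assumptions: a capitalized-word regex stands in for proper-name identification, and "adjacency" is reduced to pairing each plain-text e-mail address with the nearest preceding name. The names `NAME_RE`, `EMAIL_RE`, `AnchorCollector` and `extract_entries` are hypothetical.

```python
import re
from html.parser import HTMLParser

# Hypothetical stand-in for proper-name identification: two or more
# capitalized words joined by a space or hyphen (e.g. "Mary Smith").
NAME_RE = re.compile(r"\b[A-Z][a-z]+(?:[ -][A-Z][a-z]+)+\b")
# Simplified e-mail pattern; not a full RFC 5322 grammar.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


class AnchorCollector(HTMLParser):
    """Collect (anchor_text, href) pairs and the text outside <a> elements."""

    def __init__(self):
        super().__init__()
        self.anchors = []   # (anchor text, href) for each <a href=...>
        self.outside = []   # text chunks that are not inside an anchor
        self._href = None   # href of the anchor currently open, if any
        self._buf = []      # text collected for the currently open anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.anchors.append(("".join(self._buf).strip(), self._href))
            self._href = None

    def handle_data(self, data):
        if self._href is not None:
            self._buf.append(data)
        else:
            self.outside.append(data)


def extract_entries(html):
    """Return (name, kind, address) tuples, where kind is "email" or "url"."""
    parser = AnchorCollector()
    parser.feed(html)
    parser.close()
    entries = []
    # Anchor part: when the anchor text is a name, its href is the address.
    for text, href in parser.anchors:
        if NAME_RE.fullmatch(text):
            if href.lower().startswith("mailto:"):
                entries.append((text, "email", href[len("mailto:"):]))
            elif href.lower().startswith(("http://", "https://")):
                entries.append((text, "url", href))
    # Non-anchor part: naive adjacency, where each e-mail address is
    # paired with the closest name that appears before it.
    plain = " ".join(parser.outside)
    for m in EMAIL_RE.finditer(plain):
        names = NAME_RE.findall(plain[:m.start()])
        if names:
            entries.append((names[-1], "email", m.group()))
    return entries
```

For example, a page containing `<a href="mailto:hhchen@example.edu">Hsin Chen</a>` yields the pair directly from the anchor tag, while the plain text `Contact: Mary Smith, msmith@example.org` is resolved only through the adjacency heuristic. A regex like `NAME_RE` misses lowercase or hyphenated given names such as "Hsin-hsi" and misfires on capitalized sentence openers. These are the unknown-word problems the paper addresses with linguistic processing instead.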

The Chinese and English abstracts in this system are taken from the published articles.