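Related literature (Chinese titles translated)
- Historical Corpora for Synchronic and Diachronic Linguistics Studies
- Applications of Corpora in Interpreting and Translation Teaching and Research
- The Place of the Eastern Han, Wei, Jin, and Northern and Southern Dynasties in the History of Chinese Grammar
- From Quantitative to Qualitative Studies: Developments in Chinese Computational and Corpus Linguistics
- An Office Service Agent Applying Document Mining to Corpora
- A Chinese-English Bilingual Retrieval System for Translations of Near-Synonymous Sentences
- 「族」 (zu) in Chinese Corpora: A Quasi-Affix on the Path to Grammaticalization
- An HMM-Based Automatic Part-of-Speech Tagging System for Chinese
- POI Extraction: A Study of Business Name Recognition and Address Matching
- Corpus-driven Creation of a Reliable Learner's Vocabulary for Classical Chinese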
Title | Historical Corpora for Synchronic and Diachronic Linguistics Studies (Chinese title: 建構一個以共時與歷時語言研究為導向的歷史語料庫) |
---|---|
Authors | 魏培泉; 譚樸森; 劉承慧; 黃居仁; 孫朝奮 |
Journal | International Journal of Computational Linguistics & Chinese Language Processing |
Volume/issue | Vol. 2, No. 1 (February 1997; ROC year 86) |
Pages | 131-145 |
Classification no. | 028.7 |
Keywords | Corpus; Lexical database; Part-of-speech; Mark-up; Tagging; Retrieval; Old Chinese; Middle Chinese; Early Mandarin Chinese |
Language | Chinese |
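Chinese abstract (translated) | The Academia Sinica Ancient Chinese Corpus was built for research on the ancient Chinese language. It contains a large number of electronic texts suitable for grammatical and lexical research on ancient Chinese. It also includes a multi-function program for searching, computing statistics on, and finding collocations of the words in those texts. Following the stages of grammatical development, the corpus is divided into three subcorpora: Old Chinese, Middle Chinese, and Early Mandarin Chinese. This division should make both synchronic and diachronic research on ancient Chinese considerably easier. A substantial number of texts in the Old Chinese subcorpus have already been classified and marked up according to source text, author, genre, and similar attributes. Many of these texts have also been word-segmented, and several of the segmented classics have been tagged for part of speech. The results of this segmentation and tagging now form the basis of our Old Chinese lexical database. |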
English abstract | The Academia Sinica Ancient Chinese Corpus is designed for linguistic research. The corpus contains ancient texts selected for their usefulness in grammatical and lexical studies. It also provides a multi-function program with keyword searching, statistics, and collocation functions. The corpus is divided into three subcorpora according to stages of grammatical development, so that both synchronic and diachronic studies can be performed on it. Their current sizes are as follows: (A) Old Chinese subcorpus (pre-Qin to pre-Han): 5,128,068 characters; (B) Middle Chinese subcorpus (Late Han to the Six Dynasties): 8,101,662 characters; (C) Early Mandarin Chinese subcorpus (Tang to Qing): 4,406,381 characters. A large portion of the Old Chinese subcorpus (4,497,051 characters) has been classified and marked up according to source book, author, text genre, and similar attributes. A substantial part of the same subcorpus (520,794 characters) has also been segmented into words, which have in turn been tagged for part of speech. The results of these two tasks form the basis of our Old Chinese Lexical Database. |
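The abstract mentions keyword searching, statistics, and collocation functions over word-segmented text but does not say how they are implemented. The following Python sketch is a rough illustration of one common way such functions can work on a whitespace-segmented token list. It produces a keyword-in-context (KWIC) listing and counts collocates within a fixed window around each hit. The function names `kwic` and `collocates`, the window sizes, and the toy token list are hypothetical. The token list adapts two Analects lines and assumes one character per word. None of this describes the Academia Sinica program itself.

```python
from collections import Counter

def kwic(tokens, keyword, width=2):
    """Return (left context, keyword, right context) for every hit of keyword."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, tok, right))
    return hits

def collocates(tokens, keyword, window=2):
    """Count tokens occurring within +/- window positions of each keyword hit."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == keyword:
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # skip the keyword occurrence itself
                    counts[tokens[j]] += 1
    return counts

# Toy segmented text (hypothetical; one character per word for simplicity).
tokens = "子 曰 學 而 時 習 之 子 曰 溫 故 而 知 新".split()

if __name__ == "__main__":
    for left, kw, right in kwic(tokens, "而"):
        print(f"{left:>6} [{kw}] {right}")
    print(collocates(tokens, "曰", window=1).most_common())
```

A real system would also rank collocates with an association measure such as mutual information rather than raw counts. That ranking step is omitted here.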
The Chinese and English abstracts in this system are taken from the published articles.