Title | Aligning Parallel Bilingual Corpora Statistically with Punctuation Criteria |
---|---|
Authors | Chuang, Thomas C.; Yeh, Kevin C. |
Journal | International Journal of Computational Linguistics & Chinese Language Processing |
Volume/Issue | 10:1, March 2005 |
Pages | 95-122 |
Classification No. | 802.77 |
Keywords | Sentence alignment; Cognate alignment; Machine translation |
Language | English |
Abstract (English) | We present a new approach to aligning sentences in bilingual parallel corpora based on punctuation, especially for English and Chinese. Although the length-based approach produces high accuracy rates of sentence alignment for clean parallel corpora written in two Western languages, such as French-English or German-English, it does not work as well for parallel corpora that are noisy or written in two disparate languages such as Chinese-English. It is possible to use cognates on top of the length-based approach to increase the alignment accuracy. However, cognates do not exist between two disparate languages, which limits the applicability of the cognate-based approach. In this paper, we examine the feasibility of exploiting the statistically ordered matching of punctuation marks in two languages to achieve high-accuracy sentence alignment. We have experimented with an implementation of the proposed method on parallel corpora, the Chinese-English Sinorama Magazine Corpus and Scientific American Magazine articles, with satisfactory results. Compared with the length-based method, the proposed method exhibits better precision rates based on our experimental results. Highly promising improvement was observed when both the punctuation-based and length-based methods were adopted within a common statistical framework. We also demonstrate that the method can be applied to other language pairs, such as English-Japanese, with minimal additional effort. |
The Chinese and English abstract information in this system is taken from the published content of each article.