Title | Using the Word-Based Triphone Model for Continuous-Speech Recognition (利用以字詞為基礎之三音素模型於連續語音辨識) |
---|---|
Authors | Chiu, Chuang-chien (邱創乾); Huang, Shyh-jong (黃世中) |
Journal | 逢甲學報 |
Publication date | 1997-12 |
Volume/Issue | 32 (1997.12 [ROC year 86.12]) |
Pages | 161-174 |
Classification number | 312.23 |
Language | eng |
Keywords | Continuous-speech recognition; Triphone model; TIMIT continuous-speech database |
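Chinese abstract | The clear difference between continuous speech and isolated-word speech is that continuous speech exhibits transitional-sound (coarticulation) effects, so the speech unit used in recognition must reflect these effects. Common speech units include the word model, the monophone model, and the context-dependent phone model. A good speech unit should satisfy three basic properties at once: sensitivity to transitional-sound effects; trainability, meaning the recognition models are easy to train; and sharability, meaning the speech units have commonality with one another. However, the three speech units listed above often cannot satisfy all three properties simultaneously. The purpose of this paper is to propose a word-based triphone model, which is shown to balance these three properties better than the common speech units mentioned above, with a top-15 recognition rate of 80%. It is worth noting that the speech data used in this study are taken from the TIMIT continuous-speech database. |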
English abstract | In modern continuous-speech recognition tasks, the three most commonly used speech units are the phone, the word, and the context-dependent phone. An excellent speech unit should have the following properties: sharability, trainability, and sensitivity. However, none of the commonly used speech units mentioned above satisfies all three properties. The goal of this research is to propose a word-based triphone model as the unit of speech, which shows marked improvements in sharability, trainability, and sensitivity. A word-based triphone model has six modes. The word-based triphone model is compared with the three commonly used units on the TIMIT database. Three speech features (LPC, cepstrum, and mel-cepstrum) are also tested separately. The top-15 recognition rate reaches 80%. Finally, we discuss the recognition rates of the six modes of the word-based triphone model. Results obtained with the cepstrum and mel-cepstrum features are better than those obtained with the LPC features. |
The abstract information in this system is based primarily on the abstract of the journal article itself.