Title | 基於非均勻加權決策規則之語音活動檢測 = Voice Activity Detection Based on Nonuniform Weighted Decision Rules |
---|---|
Authors | 簡福榮; 張民昌 |
Journal | 臺北科技大學學報 (Journal of National Taipei University of Technology) |
Volume (issue) | 37(1), March 2004 (ROC year 93) |
Pages | 37-50 |
Classification no. | 312.23 |
Keywords | Speech coder; Voice activity detection (VAD); Active voice; Inactive voice; Steepest descent algorithm |
Language | Chinese |
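Chinese abstract (translated) | In speech coding, a voice activity detection (VAD) preprocessor can be added to decide whether the current input frame is active voice (conversational speech) or inactive voice (silent pauses or background noise during a conversation). Active-voice frames are compressed further by the speech coding system; inactive-voice frames are handled by a simple silence compression algorithm that needs far fewer speech parameters to simulate the background noise. Statistics show that pauses account for more than 40 %, and sometimes as much as 60 %, of total conversation time, so introducing VAD not only saves bit rate and transmission bandwidth but also reduces the computation needed to encode speech. This paper uses five speech feature parameters, namely energy, zero-crossing rate, the sum of the vocal tract cross-sectional area parameters, pitch period, and spectral distortion, to build six decision rules that classify a frame as active or inactive voice. The proposed nonuniform weighted decision rules VAD method (NWDR-VAD) gives each of the six rules its own optimal weight, sums the weighted results, and compares the sum with a final threshold. Compared with the VAD of the GSM HR coder (GSM HR-VAD) and the VAD of the G.729 coder (G.729-VAD), NWDR-VAD has the lowest frame misclassification rate: 12.544 % on the training data, versus 21.298 % for GSM HR-VAD and 17.204 % for G.729-VAD, and 14.971 % on the test data, versus 23.033 % for GSM HR-VAD and 18.332 % for G.729-VAD. |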
English abstract | Voice activity detection (VAD) is usually included in the preprocessor of a speech encoder to determine whether the incoming signal is speech. If it is, a normal speech codec encodes the speech segment; otherwise, far fewer parameters are needed to mimic the background noise at the decoder. Statistics on conversation show that silence between talk spurts occupies more than 40 %, and sometimes as much as 60 %, of the time, so substantial bit rate and bandwidth can be saved. This paper uses five speech parameters to classify incoming signal segments as active voice (speech-like segments) or inactive voice (non-speech-like segments): the segmental energy, the zero-crossing rate, the sum of the vocal tract areas, the pitch period, and the spectral distortion. The final VAD decision is based on a nonuniformly weighted sum of six decision rules, so the approach is called the nonuniform weighted decision rules VAD method (NWDR-VAD). NWDR-VAD is compared with the VAD methods of the well-known half-rate GSM (GSM HR-VAD) and G.729 (G.729-VAD) speech coders, and the experiments show that it outperforms both. On the training data, the VAD error rate is 12.544 % for NWDR-VAD, versus 21.298 % for GSM HR-VAD and 17.204 % for G.729-VAD. On the outside test data, the VAD error rates are 14.971 %, 23.033 %, and 18.332 % for NWDR-VAD, GSM HR-VAD, and G.729-VAD, respectively. |
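The decision stage described in the abstracts (six rules, each with its own weight, summed and compared with a final threshold) can be sketched as below. This is a minimal, hypothetical illustration, not the authors' implementation: the paper's rule definitions, optimal weights, and threshold are not given in this record, so `nwdr_decide`, the assumption that each rule outputs 0 or 1, and the example weights are all assumptions. Two of the five named features, short-time energy and zero-crossing rate, are computed with common textbook definitions that may differ from the paper's.

```python
import numpy as np


def short_time_energy_db(frame, eps: float = 1e-10) -> float:
    """Mean-square energy of one frame in dB (textbook definition, assumed)."""
    x = np.asarray(frame, dtype=float)
    return float(10.0 * np.log10(np.mean(x ** 2) + eps))


def zero_crossing_rate(frame) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    negative = np.signbit(np.asarray(frame, dtype=float))
    return float(np.mean(negative[1:] != negative[:-1]))


def nwdr_decide(rule_votes, weights, threshold: float) -> bool:
    """Nonuniformly weighted vote over the decision rules (hypothetical sketch).

    rule_votes: one 0/1 output per rule (1 = rule says "active voice"); assumed binary.
    weights:    one weight per rule (illustrative values, not the paper's).
    Returns True (active voice) when the weighted sum reaches the threshold.
    """
    votes = np.asarray(rule_votes, dtype=float)
    w = np.asarray(weights, dtype=float)
    if votes.shape != w.shape:
        raise ValueError("need exactly one weight per decision rule")
    return float(np.dot(w, votes)) >= threshold


if __name__ == "__main__":
    # Synthetic 20 ms frame at 8 kHz (framing parameters are assumptions).
    fs = 8000
    t = np.arange(160) / fs
    frame = 0.5 * np.sin(2 * np.pi * 200 * t)
    print("energy (dB):", short_time_energy_db(frame))
    print("ZCR:", zero_crossing_rate(frame))

    # Six hypothetical rule outputs and weights.
    votes = [1, 1, 0, 1, 0, 1]
    weights = [0.30, 0.20, 0.10, 0.20, 0.10, 0.10]
    print("active voice:", nwdr_decide(votes, weights, threshold=0.5))
```

Making the weights nonuniform lets more reliable rules count for more than weaker ones; with equal weights the rule reduces to counting how many rules vote "active".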
The Chinese and English abstracts in this system are taken from the published articles.
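Both abstracts report results as a frame misclassification rate. Assuming this is the share of frames whose VAD label differs from a reference labeling (the labeling protocol is not described in this record), the metric and the relative improvement implied by the reported figures can be computed as follows; the function names are hypothetical.

```python
def frame_error_rate(predicted, reference) -> float:
    """Percentage of frames whose VAD label (1 = active, 0 = inactive) is wrong."""
    if len(predicted) != len(reference):
        raise ValueError("label sequences must have the same length")
    if len(reference) == 0:
        raise ValueError("need at least one frame")
    errors = sum(1 for p, r in zip(predicted, reference) if p != r)
    return 100.0 * errors / len(reference)


def relative_reduction(baseline_rate: float, proposed_rate: float) -> float:
    """Relative error reduction of the proposed method versus a baseline, in percent."""
    return 100.0 * (baseline_rate - proposed_rate) / baseline_rate


if __name__ == "__main__":
    # Error rates reported in the abstract for the outside test data (%).
    reported = {"NWDR-VAD": 14.971, "GSM HR-VAD": 23.033, "G.729-VAD": 18.332}
    for name in ("GSM HR-VAD", "G.729-VAD"):
        gain = relative_reduction(reported[name], reported["NWDR-VAD"])
        print(f"NWDR-VAD vs {name}: {gain:.1f} % fewer misclassified frames")
```

On the test data this works out to roughly 35 % and 18 % fewer misclassified frames than GSM HR-VAD and G.729-VAD, respectively.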