Abstract
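This paper studies two audio-visual fusion methods for bimodal speech recognition systems: early integration and late integration. In the late-integration method it also introduces a composite model that accounts for both the synchrony and the asynchrony between the audio and visual streams. Simulation experiments show that, in acoustically noisy environments, late integration yields fairly satisfactory recognition results, and that a model accounting for audio-visual synchrony and asynchrony can effectively improve the recognition rate.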
Researchers have become increasingly interested in raising speech recognition rates in noisy acoustic environments. Human speech recognition is far more accurate than machine speech recognition because human hearing is aided by vision, and a certain degree of asynchrony naturally exists between the auditory and visual cues. Researchers, including the authors, have therefore been studying audio-visual fusion and model asynchrony as ways to raise the recognition rate. This paper reports progress in this area. Section 1 uses Fig.1 to discuss two methods of fusing the audio and visual streams: early integration and late integration. Section 2 introduces the four model topologies in Fig.2, which reflect the bimodal (audio-visual) fusion in human speech perception, and models asynchrony with Product HMMs (Hidden Markov Models). Fig.2(d) presents a simplified Product HMM topology that adopts a stream state tying scheme to obtain robust parameter estimates while restricting the asynchrony to a single state of the phoneme HMM. Section 3 reports detailed speech recognition experiments for six systems, including early- and late-integration systems, on the AVTC (Audio Visual Telephone Conversations) corpus under various simulated SNR (signal-to-noise ratio) conditions. The recognition rates shown in Fig.3 indicate that late integration performs better than early integration in noisy acoustic conditions. Fig.3 also shows that the state-asynchronous system outperforms the state-synchronous system when SNR > 12 dB, whereas the state-synchronous system performs better when SNR < 12 dB. We believe these findings can help raise speech recognition rates in noisy acoustic environments.
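The Product HMM topology with limited asynchrony can be sketched in a few lines. This is a minimal illustration under assumptions the abstract does not state: each stream uses an N-state left-to-right phoneme HMM, "restricting the asynchrony to a single state" is read as letting the audio and visual state indices differ by at most one, and the composite emission score is a weighted sum of the two stream log-likelihoods with a hypothetical audio weight. Stream state tying appears here as the audio score depending only on the audio index and the visual score only on the visual index. All function names are hypothetical. Setting max_async=0 reduces the model to a state-synchronous one.

```python
import itertools


def product_hmm_states(n_states, max_async=1):
    """Composite states (a, v) of a product HMM built from two left-to-right
    phoneme HMMs with n_states each. Assumption: the audio index a and the
    visual index v may differ by at most max_async states."""
    return [(a, v) for a, v in itertools.product(range(n_states), repeat=2)
            if abs(a - v) <= max_async]


def allowed_transitions(states):
    """Left-to-right transitions: each stream either stays in its state or
    advances by one, and the target must itself be a valid composite state."""
    valid = set(states)
    trans = {}
    for a, v in states:
        targets = []
        for da, dv in ((0, 0), (0, 1), (1, 0), (1, 1)):
            nxt = (a + da, v + dv)
            if nxt in valid:
                targets.append(nxt)
        trans[(a, v)] = targets
    return trans


def composite_log_likelihood(log_b_audio, log_b_visual, state, audio_weight):
    """Hypothetical late-integration score of one composite state: a weighted
    sum of the audio and visual stream log-likelihoods. Because of stream
    state tying, the audio term depends only on a and the visual term only on v."""
    a, v = state
    return audio_weight * log_b_audio[a] + (1.0 - audio_weight) * log_b_visual[v]


if __name__ == "__main__":
    async_states = product_hmm_states(3)               # 7 composite states
    sync_states = product_hmm_states(3, max_async=0)   # [(0, 0), (1, 1), (2, 2)]
    print(len(async_states), sync_states)
    print(allowed_transitions(async_states)[(0, 1)])   # [(0, 1), (1, 1), (1, 2)]
```

With three states per stream, the asynchronous topology has 7 composite states versus 3 for the synchronous one; the audio weight is purely illustrative and in practice would be tuned, for example to the SNR.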
Source
Journal of Northwestern Polytechnical University (《西北工业大学学报》)
Indexed in: EI, CAS, CSCD, Peking University Core Journals (北大核心)
2004, No. 2, pp. 171-175 (5 pages)
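Funding
Science and technology cooperation project between the Ministry of Science and Technology of China and the Flemish Region of Belgium (grant ref. 国科外字 19990209)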