Abstract
Human speech perception is inherently multimodal. Building on this, the paper attempts to develop a continuous speech recognition system for noisy environments based on combined audio and visual features. For visual feature extraction, it introduces an eigenmouth-based method. Recognition experiments show that this method yields a higher recognition rate than the traditional DCT and DWT methods, and that the eigenmouth-based audio-visual continuous speech recognition system is highly robust to noise.
Source
《计算机工程与应用》
CSCD
Peking University Core Journal (北大核心)
2003, No. 16, pp. 3-5, 35 (4 pages)
Computer Engineering and Applications
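Funding
Supported by the international science and technology cooperation project "Machine Vision and Speech Technology for the Real World" between the Ministry of Science and Technology of China and the Flemish Region of Belgium (No. 国科外字19990209号)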