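Abstract
A lip contour extraction method based on MASM is proposed for audio-visual speech recognition. The method needs only a small amount of training data to extract a large number of lip contours accurately. A smoothing correction method for lip contours is also introduced, which uses the continuous variation of the mouth shape to correct erroneous contours. Experiments show that the contour extraction accuracy of this method is 20 percentage points higher than that of the conventional ASM model; the lip contour feature is then introduced into audio-visual speech recognition.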
In audio-visual speech recognition and lipreading, the widely used Active Shape Model (ASM) for lip contour extraction lacks robustness and cannot extract exact lip contours, because the mouth shape varies widely during speech. We present a more robust model, the Multiple Active Shape Model (MASM). The model classifies mouth shapes into a closed-mouth set, a half-opened-mouth set, and a round-mouth set. An independent ASM is built for each set from a very small amount of training data. The MASM contour extraction algorithm automatically selects the most accurate lip contour from the multiple shape-search procedures. Exploiting the continuous change of the mouth shape over time, a method for smoothing lip contours is also presented to correct contour extraction errors. Experimental results on the AVCONDIG database show that the extraction accuracy achieved by the MASM is 13% higher than that of the conventional ASM. Combining the MASM with the contour-smoothing method yields a further 7% improvement in accuracy. When the exact lip contour feature is fused with the audio MFCC (Mel Frequency Cepstral Coefficients) feature, the average word recognition rates on the connected-digit speech recognition task considered increase considerably under noisy acoustic conditions.
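The selection and smoothing steps described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say how the "most accurate" contour is chosen or how erroneous contours are corrected. Here each per-class ASM search is assumed to return a hypothetical fit cost (lower is better), and a three-frame temporal median filter stands in for the paper's smoothing method. The names `select_best_contour` and `smooth_contours` are invented for this example.

```python
from statistics import median


def select_best_contour(candidates):
    """Pick the contour with the lowest fit cost.

    candidates: list of (label, contour, cost) tuples, one per shape-class
    ASM search (e.g. closed, half-opened, round). The cost is an assumed
    fit-quality score; the abstract does not define the actual criterion.
    """
    label, contour, _ = min(candidates, key=lambda c: c[2])
    return label, contour


def smooth_contours(frames):
    """Apply a three-frame temporal median filter to a contour sequence.

    frames: list of contours, each a list of (x, y) landmark tuples with the
    same number of landmarks. The first and last frames are left unchanged.
    This is a stand-in technique, not the correction method of the paper.
    """
    smoothed = [list(f) for f in frames]
    for t in range(1, len(frames) - 1):
        window = frames[t - 1:t + 2]
        # zip(*window) groups the same landmark across the three frames.
        smoothed[t] = [
            (median(p[0] for p in pts), median(p[1] for p in pts))
            for pts in zip(*window)
        ]
    return smoothed


# Usage: three candidate fits for one frame, then a sequence with one outlier.
closed = [(0, 0), (10, 0)]
half_open = [(0, 1), (10, 1)]
rounded = [(2, 3), (8, 3)]
best = select_best_contour(
    [("closed", closed, 3.2), ("half_open", half_open, 1.1), ("round", rounded, 2.5)]
)

sequence = [
    [(0, 0), (10, 0)],
    [(0, 50), (10, 50)],  # a mis-extracted contour in the middle frame
    [(0, 2), (10, 2)],
]
corrected = smooth_contours(sequence)
```

A median over neighbouring frames relies on the same property the abstract mentions, namely that the mouth shape changes continuously, so an isolated jump in one frame is treated as an extraction error.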
Source
《西北工业大学学报》
EI
CAS
CSCD
PKU Core (Peking University Core Journals)
2004, No. 5, pp. 674-678 (5 pages)
Journal of Northwestern Polytechnical University
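Funding
Supported by the International Science and Technology Cooperation Project between the Ministry of Science and Technology of China and the Flemish Region of Belgium (国科外 No. 19990209)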
Keywords
speech recognition, audio-visual speech recognition, ASM (Active Shape Model), MASM (Multiple Active Shape Model), lip contour extraction