Abstract
Emotion recognition is an important foundation of intelligent human-computer interaction; it spans computer science, linguistics, psychology, and other research fields, and is a research hotspot in pattern recognition and image processing. Based on the Boosting framework, two effective multimodal emotion recognition methods fusing vision and speech are proposed. The first method takes the coupled hidden Markov model (coupled HMM, CHMM) as the model-level fusion technique for the audio and video streams and trains it with an improved expectation-maximization algorithm that concentrates on hard-to-recognize (i.e., more informative) samples; applying the AdaBoost framework to the coupled-HMM training process yields the AdaBoost-CHMM ensemble classifier. The second method constructs a multi-layer Boosted HMM (MBHMM) classifier in which the data streams of three modalities (facial expression, shoulder movement, and speech) are each assigned to one layer of the classifier; during training, the current layer's ensemble classifier focuses on the samples that the previous layer's ensemble classifier found hard to recognize, fully exploiting the complementarity among the modalities' feature data. Experimental results verify the effectiveness of both methods.
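The core mechanism shared by both proposed methods is AdaBoost-style reweighting: after each round, misclassified ("hard") samples gain weight so the next weak classifier concentrates on them. The sketch below illustrates that reweighting loop only; the paper's weak classifiers are coupled HMMs trained on audio-visual streams, whereas here simple threshold stumps on scalar data stand in for them, and all data, function names, and parameters are hypothetical.

```python
import math

def stump(threshold):
    """Hypothetical weak classifier: predict +1 if x > threshold, else -1.
    In the paper this role is played by a coupled HMM over audio/video."""
    return lambda x: 1 if x > threshold else -1

def adaboost(xs, ys, weak_learners, rounds):
    """Minimal AdaBoost sketch: build an ensemble of (alpha, classifier)
    pairs, reweighting samples each round toward the hard ones."""
    n = len(xs)
    w = [1.0 / n] * n                      # start with uniform sample weights
    ensemble = []
    for _ in range(rounds):
        # pick the weak learner with the lowest weighted error
        best, best_err = None, float("inf")
        for h in weak_learners:
            err = sum(wi for wi, x, y in zip(w, xs, ys) if h(x) != y)
            if err < best_err:
                best, best_err = h, err
        if best_err >= 0.5:                # no better than chance: stop
            break
        eps = max(best_err, 1e-10)         # avoid log/division by zero
        alpha = 0.5 * math.log((1 - eps) / eps)   # classifier weight
        ensemble.append((alpha, best))
        # upweight misclassified (hard) samples, downweight the rest
        w = [wi * math.exp(-alpha * y * best(x))
             for wi, x, y in zip(w, xs, ys)]
        z = sum(w)
        w = [wi / z for wi in w]           # renormalize to a distribution
    return ensemble

def predict(ensemble, x):
    """Ensemble prediction: sign of the alpha-weighted vote."""
    score = sum(alpha * h(x) for alpha, h in ensemble)
    return 1 if score >= 0 else -1
```

The same loop structure underlies the layered MBHMM idea: each layer's ensemble is trained on weights inherited from the previous layer, so later modalities concentrate on samples earlier modalities could not resolve.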
Source
Modern Electronics Technique (《现代电子技术》)
Peking University Core Journal (北大核心)
2017, No. 23, pp. 59-63 (5 pages)
Funding
Supported by the Sichuan Province Excellent Engineer Quality Engineering Project for the Software Engineering Major (11100-14Z00327)