Abstract: Since the concept of "Big Data" was first introduced in Nature in 2008, it has been widely applied in fields such as business, healthcare, national defense, education, transportation, and security. With the maturity of artificial intelligence technology, big data analysis techniques tailored to various fields have made significant progress, but they still face many challenges in terms of data quality, algorithms, and computing power.
Funding: Supported by the National Natural Science Foundation of China (Nos. 61873269, 61425017, 61332017 and 61831022) and the National Key Research & Development Plan of China (No. 2017YFB1002804).
Abstract: In multimodal human-computer dialog, non-verbal channels such as facial expression, posture and gesture, combined with spoken information, are also important in the procedure of dialog. Nowadays, in spite of the high performance of single-channel user behavior computing, it is still a great challenge to understand users' intentions accurately from their multimodal behaviors. One reason for this challenge is that multimodal information fusion still needs to be improved in theories, methodologies and practical systems. This paper presents a review of data fusion methods in multimodal human-computer dialog. We first introduce the cognitive assumption of single-channel processing, and then discuss its implementation methods in human-computer dialog; for the task of multimodal information fusion, several computing models are presented after we introduce the principle description of multiple data fusion. Finally, some practical examples of multimodal information fusion methods are introduced, and the possible and important breakthroughs of data fusion methods in future multimodal human-computer interaction applications are discussed.
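To make the fusion strategies such a review covers more concrete, here is a minimal decision-level (late) fusion sketch in PyTorch: each channel is classified independently, and the per-channel posteriors are combined with learnable reliability weights. The three channels, their feature dimensions and the weighting scheme are illustrative assumptions, not the specific models surveyed in the paper.

```python
import torch
import torch.nn as nn

class LateFusionClassifier(nn.Module):
    """Decision-level fusion sketch: classify each channel independently,
    then combine the per-channel posteriors with learnable weights."""

    def __init__(self, speech_dim, face_dim, gesture_dim, n_classes):
        super().__init__()
        self.heads = nn.ModuleList([
            nn.Linear(speech_dim, n_classes),   # speech channel (assumed dims)
            nn.Linear(face_dim, n_classes),     # facial-expression channel
            nn.Linear(gesture_dim, n_classes),  # gesture channel
        ])
        # One learnable reliability weight per channel.
        self.channel_logits = nn.Parameter(torch.zeros(3))

    def forward(self, speech, face, gesture):
        w = torch.softmax(self.channel_logits, dim=0)
        posteriors = [torch.softmax(head(x), dim=-1)
                      for head, x in zip(self.heads, (speech, face, gesture))]
        return sum(wi * p for wi, p in zip(w, posteriors))  # fused posterior
```

A feature-level (early) fusion variant would instead concatenate the channel features before a single classifier; the late-fusion form above degrades more gracefully when one channel is missing or noisy.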
Funding: Supported by the National Natural Science Foundation of China (Nos. 61425017 and 61773379) and the National Key Research & Development Plan of China (No. 2017YFB1002804).
Abstract: As a major component of speech signal processing, speech emotion recognition has become increasingly essential to understanding human communication. Benefitting from deep learning, many researchers have proposed various unsupervised models to extract effective emotional features and supervised models to train emotion recognition systems. In this paper, we utilize semi-supervised ladder networks for speech emotion recognition. The model is trained by minimizing the supervised loss and an auxiliary unsupervised cost function. The addition of the unsupervised auxiliary task provides powerful discriminative representations of the input features, and also acts as a regularizer for the supervised emotion recognition task. We also compare the ladder network with other classical autoencoder structures. The experiments were conducted on the interactive emotional dyadic motion capture (IEMOCAP) database, and the results reveal that the proposed methods achieve superior performance with a small amount of labelled data and outperform the other methods.
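As a rough illustration of how a supervised loss and an unsupervised auxiliary cost can be minimized jointly, the following PyTorch sketch uses a single denoising-reconstruction path as a simplified stand-in for the full ladder network (a real ladder network adds lateral connections and per-layer reconstruction costs). The feature dimension (e.g., a 384-dimensional acoustic feature vector), noise level, class count and loss weighting are all assumptions for illustration, not the paper's configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SemiSupervisedNet(nn.Module):
    """A shared encoder feeds both a classifier (supervised path) and a
    decoder that reconstructs the clean input from a noisy copy
    (unsupervised auxiliary path)."""

    def __init__(self, in_dim=384, hidden=256, n_classes=4, noise_std=0.3):
        super().__init__()
        self.noise_std = noise_std
        self.encoder = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.classifier = nn.Linear(hidden, n_classes)
        self.decoder = nn.Linear(hidden, in_dim)

    def forward(self, x):
        # Corrupt the input; the decoder must undo the corruption.
        z = self.encoder(x + self.noise_std * torch.randn_like(x))
        return self.classifier(z), self.decoder(z)

def combined_loss(model, x_lab, y_lab, x_unlab, unsup_weight=1.0):
    logits, recon_lab = model(x_lab)
    _, recon_unlab = model(x_unlab)  # unlabelled batch: no label term
    sup = F.cross_entropy(logits, y_lab)
    unsup = F.mse_loss(recon_lab, x_lab) + F.mse_loss(recon_unlab, x_unlab)
    return sup + unsup_weight * unsup
```

The reconstruction term lets the large pool of unlabelled speech shape the shared representation, which is the mechanism behind the good low-label performance reported in the abstract.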
Funding: Supported by the National Key Research & Development Plan of China (No. 2017YFB1002804), the National Natural Science Foundation of China (Nos. 61425017, 61773379, 61332017, 61603390 and 61771472), and the Major Program for the National Social Science Fund of China (No. 13&ZD189).
Abstract: Facial emotion recognition is an essential and important aspect of the field of human-machine interaction. Past research on facial emotion recognition has focused on the laboratory environment. However, it faces many challenges in real-world conditions, e.g., illumination changes, large pose variations, and partial or full occlusions. Those challenges lead to different face areas having different degrees of sharpness and completeness. Inspired by this fact, we focus on the authenticity of predictions generated by different <emotion, region> pairs. For example, if only the mouth area is available and the emotion classifier predicts happiness, how should the authenticity of that prediction be judged? This problem can be converted into the contribution of different face areas to different emotions. In this paper, we divide the whole face into six areas: the nose area, mouth area, eyes area, nose-to-mouth area, nose-to-eyes area and mouth-to-eyes area. To obtain more convincing results, our experiments are conducted on three different databases: facial expression recognition+ (FER+), the real-world affective faces database (RAF-DB) and the expression in-the-wild (ExpW) dataset. Convincing results are established through analysis of the classification accuracy, the confusion matrix and the class activation map (CAM). To sum up, the contributions of this paper lie in two areas: 1) we visualize the face areas attended to in emotion recognition; 2) we analyze the contribution of different face areas to different emotions in real-world conditions through experimental analysis. Our findings can be combined with findings in psychology to promote the understanding of emotional expressions.
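The class activation map analysis mentioned above can be summarized in a few lines. The NumPy sketch below implements the standard CAM computation (Zhou et al., 2016), which assumes a network whose last convolutional layer is followed by global average pooling and a linear classifier; the array shapes are illustrative, not taken from the paper.

```python
import numpy as np

def class_activation_map(feature_maps, fc_weights, class_idx):
    """Standard CAM: weight the last conv layer's feature maps by the
    classifier weights of the target emotion class.

    feature_maps: (C, H, W) activations from the last conv layer
    fc_weights:   (n_classes, C) final linear-layer weights; the network
                  is assumed to use global average pooling before it
    """
    cam = np.tensordot(fc_weights[class_idx], feature_maps, axes=1)  # (H, W)
    cam = np.maximum(cam, 0.0)            # keep only positive evidence
    return cam / (cam.max() + 1e-8)       # normalise to [0, 1]
```

Upsampling the normalized map to the input resolution and overlaying it on the face image shows which of the six areas drives the prediction for a given emotion.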
Abstract: Correction to: Semi-supervised Ladder Networks for Speech Emotion Recognition. DOI: 10.1007/s11633-019-1175-x. Authors: Jian-Hua Tao, Jian Huang, Ya Li, Zheng Lian, Ming-Yue Niu. The article Semi-supervised Ladder Networks for Speech Emotion Recognition, written by Jian-Hua Tao, Jian Huang, Ya Li, Zheng Lian and Ming-Yue Niu, was originally published in vol. 16, no. 4 of International Journal of Automation and Computing without Open Access.