Abstract
Facial expression recognition (FER) from video sequences has two main characteristics: its spatio-temporal nature and its saliency. In recent years, many researchers have used deep learning methods such as convolutional neural networks (CNNs), recurrent neural networks (RNNs), and 3D CNNs to handle the spatio-temporal characteristics of the problem. The saliency of facial expressions, however, is often overlooked. Attention mechanisms, as they have developed within deep networks, have proven effective at handling saliency in a variety of tasks. This paper applies a spatio-temporal attention mechanism to FER so that the deep network focuses more on the salient parts of the spatio-temporal features. Specifically, a spatial attention module is embedded in the CNN so that the spatial features focus more on the regions that matter for expression recognition, and a temporal attention module is applied to the output of the gated recurrent units (GRU) at each step of the sequence so that the temporal features focus more on informative frames. Experiments on the RECOLA emotion database, comparing results with and without attention, show that the proposed deep spatio-temporal attention network significantly improves FER performance over a general deep model.
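The two attention steps named in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the abstract does not specify how the attention scores are computed, so this sketch assumes a simple dot-product score followed by a softmax. The vectors `w` and `v` are hypothetical stand-ins for learned parameters, and the nested lists stand in for CNN feature maps and GRU outputs.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def spatial_attention(feature_map, w):
    """Reweight every location of an H x W x C feature map.

    Assumption: one score per location (dot product with the hypothetical
    vector w), a softmax over all H*W locations, then each location's
    channels are scaled by its weight.
    """
    flat = [cell for row in feature_map for cell in row]
    weights = softmax([dot(cell, w) for cell in flat])
    width = len(feature_map[0])
    attended = [
        [[weights[i * width + j] * x for x in cell] for j, cell in enumerate(row)]
        for i, row in enumerate(feature_map)
    ]
    return attended, weights


def temporal_attention(hidden_states, v):
    """Pool T per-frame hidden states (e.g. GRU outputs) into one vector.

    Assumption: one score per frame (dot product with the hypothetical
    vector v), a softmax over frames, then a weighted sum of the states.
    """
    weights = softmax([dot(h, v) for h in hidden_states])
    dim = len(hidden_states[0])
    pooled = [
        sum(wt * h[d] for wt, h in zip(weights, hidden_states)) for d in range(dim)
    ]
    return pooled, weights


if __name__ == "__main__":
    # Toy 2 x 2 feature map with 2 channels; values are made up.
    fmap = [[[0.0, 0.0], [1.0, 1.0]], [[0.0, 0.0], [3.0, 3.0]]]
    _, spatial_weights = spatial_attention(fmap, [1.0, 1.0])
    print("spatial weights:", spatial_weights)

    # Three made-up per-frame hidden states of size 2.
    states = [[0.1, 0.0], [2.0, 1.0], [0.2, 0.1]]
    pooled, frame_weights = temporal_attention(states, [1.0, 1.0])
    print("frame weights:", frame_weights)
    print("pooled feature:", pooled)
```

In both functions the softmax weights sum to one, so the location or frame with the highest score dominates the result. That is the sense in which attention makes the features "focus" on salient regions or informative frames.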
Authors
冯晓毅
黄东
崔少星
王坤伟
FENG Xiaoyi; HUANG Dong; CUI Shaoxing; WANG Kunwei (School of Electronics and Information, Northwestern Polytechnical University, Xi'an 710072, China; School of Automation and Information Engineering, Xi'an University of Technology, Xi'an 710048, China)
Source
《西北大学学报(自然科学版)》
CAS
CSCD
PKU Core (Peking University Core Journals)
2020, No. 3, pp. 319-327 (9 pages)
Journal of Northwest University(Natural Science Edition)
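Funding
National Natural Science Foundation of China (61702419)
Shaanxi Provincial Science and Technology Program (2020GY-050, 2018ZDXM-GY-186)
Natural Science Basic Research Program of Shaanxi Province (2018JQ6090)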
Keywords
deep learning (深度学习)
spatial-temporal method (空时方法)
attention mechanism (注意力机制)
facial expression recognition (面部表情识别)