
采用时空注意力机制的人脸微表情识别

Spatiotemporal attention network for microexpression recognition
Abstract

Objective: Microexpression is a kind of spontaneous facial muscle movement that can reveal the genuine emotions a person tries to conceal, and it has potential applications in security, suspect interrogation, and psychological testing. Compared with macroexpressions, the lower intensity and shorter duration of microexpressions make them harder to recognize. Traditional methods can be divided into facial image-based and optical flow-based approaches. Facial image-based methods build feature vectors from spatiotemporal partition blocks, where the partition parameters are treated as hyperparameters shared by every sample in the dataset; recognition performance may suffer when the same spatiotemporal division is applied to samples that would require different partitions. Optical flow-based methods are widely used for microexpression recognition; although they are fairly robust to illumination changes, they treat facial features in all regions as equally important and ignore the fact that microexpressions appear only in localized regions. The attention mechanism, which has been adopted in many fields such as natural language processing and computer vision, can focus on the salient regions of an object and give them additional weight. Owing to its strong performance in recognition tasks, we apply the attention mechanism to microexpression recognition and propose a spatiotemporal attention network (STANet) to alleviate the low recognition accuracy caused by the small movement amplitude and short duration of microexpressions.

Method: STANet consists of a spatial attention module (SAM) and a temporal attention module (TAM). The SAM concentrates the model's attention on the regions where the microexpression is more intense, while the TAM assigns larger weights to the more discriminative frames, that is, those with larger changes. Inspired by the fully convolutional network (FCN) proposed for semantic segmentation, we design a spatial attention branch (SAB), a top-down and bottom-up structure that is the crucial component of the SAM. In the downsampling path, convolutional layers and nonlinear transformations, followed by max pooling, extract salient microexpression features; max pooling reduces the resolution and enlarges the receptive field of the feature map. In the upsampling path, bilinear interpolation gradually restores the feature map to its original size, and skip connections retain detailed information that would otherwise be lost. A sigmoid function is applied after the last layer to normalize the SAB output to [0, 1]. Furthermore, we propose a temporal attention branch (TAB) to focus on the more discriminative frames of the microexpression sequence, which are crucial for microexpression recognition.
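The paper's implementation is not reproduced in this record. As a rough illustration of the top-down and bottom-up SAB structure described above, the following PyTorch sketch combines convolution and max pooling for downsampling, bilinear interpolation with skip connections for upsampling, and a sigmoid-normalized attention map; the channel count and the number of stages are assumptions, since the abstract does not specify them.

```python
# Minimal sketch of a top-down/bottom-up spatial attention branch (SAB).
# Channel counts and the number of down/upsampling stages are illustrative
# assumptions, not values taken from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialAttentionBranch(nn.Module):
    def __init__(self, channels: int = 64):
        super().__init__()
        self.down1 = nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU())
        self.down2 = nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU())
        self.up1 = nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU())
        self.out_conv = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Bottom-up path: conv + max pooling reduces resolution and enlarges the receptive field.
        d1 = self.down1(x)
        p1 = F.max_pool2d(d1, 2)
        d2 = self.down2(p1)
        p2 = F.max_pool2d(d2, 2)
        # Top-down path: bilinear upsampling gradually restores the original resolution;
        # skip connections reinject detail lost during downsampling.
        u1 = F.interpolate(p2, size=d2.shape[-2:], mode="bilinear", align_corners=False) + d2
        u1 = self.up1(u1)
        u2 = F.interpolate(u1, size=d1.shape[-2:], mode="bilinear", align_corners=False) + d1
        # Sigmoid normalizes the attention map to [0, 1].
        attn = torch.sigmoid(self.out_conv(u2))
        # The attention map spatially reweights the input features.
        return x * attn
```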
Experiments are conducted on the Chinese Academy of Sciences microexpression (CASME), Chinese Academy of Sciences microexpression II (CASME II), and spontaneous microexpression database-high speed camera (SMIC-HS) datasets, which contain 171, 246, and 164 samples, respectively. Corner-crop and rescaling augmentations with scaling factors of 0.9, 1.0, and 1.1 are used on CASME and CASME II to avoid overfitting, and corner-crop and horizontal-flip augmentations are applied to SMIC-HS. Because samples contain different numbers of frames, each sequence is linearly interpolated to 20 frames and resized to 192 × 192 pixels, and FlowNet 2.0 is then used to obtain the optical flow sequence of each sample. Training uses the Adam optimizer with a learning rate of 1E-5; the weight decay coefficient is set to 1E-4 and the coefficient λ of the L1 regularization term to 1E-8. The number of training iterations is 60, 30, and 100 for CASME, CASME II, and SMIC-HS, respectively.

Result: We compare our model with eight state-of-the-art methods, covering both facial image-based and optical flow-based approaches, on the three public microexpression datasets CASME, CASME II, and SMIC-HS. Leave-one-subject-out (LOSO) cross-validation is adopted because the number of samples is small, and classification accuracy is used as the performance measure. The results show that our model achieves the best performance on CASME and CASME II: its classification accuracy on CASME is 1.78% higher than that of Sparse MDMO (sparse main directional mean optical flow), the second-best model, and its accuracy on CASME II is 1.90% higher than that of the histogram of image gradient orientation (HIGO) method. On SMIC-HS, the classification accuracy of our model reaches 68.90%. Ablation studies on the CASME dataset verify the effectiveness of the SAM and TAM and show that fusing them significantly improves recognition accuracy.

Conclusion: STANet is proposed in this study for microexpression recognition. The SAM emphasizes the salient regions of a microexpression by placing additional weight on them, and the TAM learns large weights for the clips with large variation within a sequence. Experiments on the three public microexpression datasets show that, compared with eight other state-of-the-art methods, STANet achieves the highest recognition accuracy on CASME and CASME II and delivers satisfactory performance on SMIC-HS.
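The abstract does not specify how the TAM turns frame-level features into weights. The sketch below shows one plausible temporal-attention scheme over the 20 interpolated frames, using a small scoring network and a softmax; the scoring architecture and feature dimension are assumptions for illustration, not the authors' implementation.

```python
# Illustrative temporal attention over per-frame features, in the spirit of the
# TAM described above. The scoring MLP and softmax normalization are assumed.
import torch
import torch.nn as nn

class TemporalAttentionBranch(nn.Module):
    def __init__(self, feat_dim: int = 128):
        super().__init__()
        self.score = nn.Sequential(
            nn.Linear(feat_dim, feat_dim // 2),
            nn.ReLU(),
            nn.Linear(feat_dim // 2, 1),
        )

    def forward(self, frame_feats: torch.Tensor) -> torch.Tensor:
        # frame_feats: (batch, T, feat_dim), e.g. T = 20 after interpolating each sample to 20 frames.
        weights = torch.softmax(self.score(frame_feats), dim=1)   # (batch, T, 1); larger for more discriminative frames
        return (weights * frame_feats).sum(dim=1)                 # weighted temporal aggregation -> (batch, feat_dim)
```

In the full model, such per-frame weights would be combined with the spatially attended features before classification; the exact fusion used by STANet is not described in this record.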
Authors: 李国豪 (Li Guohao), 袁一帆 (Yuan Yifan), 贲晛烨 (Ben Xianye), 张军平 (Zhang Junping) — School of Computer Science, Fudan University, Shanghai Key Laboratory of Intelligent Information Processing, Shanghai 200438, China; School of Information Science and Engineering, Shandong University, Qingdao 266237, China
Source: Journal of Image and Graphics (《中国图象图形学报》; CSCD, Peking University Core Journals), 2020, No. 11, pp. 2380-2390 (11 pages)
Funding: National Natural Science Foundation of China (61673118, 61971468); Shanghai Municipal Science and Technology Major Project (2018SHZDZX01); Zhangjiang Lab
Keywords: microexpression recognition; classification; facial feature; deep learning; attention mechanism; spatiotemporal attention