Abstract
In real human-computer interaction scenarios, dynamic human behaviors (head turning, walking, etc.) and unstable light sources prevent facial detail features from being extracted effectively, which lowers the accuracy of facial expression recognition. To address this problem, an optimized model combining salient feature filtering with the Vision Transformer (ViT) is proposed. A weighted-sum illumination normalization method first balances the brightness of the original image, and a convolutional neural network then extracts facial features; a salient feature filtering module aggregates local-global context information of the face; a multi-layer Transformer encoder strengthens the associations among facial features; finally, a Softmax function predicts the facial expression. Experimental results show that the proposed network achieves good performance on the RAF-DB, FERPlus and AffectNet datasets.
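As a rough illustration only, the following PyTorch sketch mirrors the pipeline summarized in the abstract: weighted-sum illumination normalization, a CNN feature extractor, a salient feature filtering module, a multi-layer Transformer encoder, and a Softmax classifier. It is not the authors' implementation; the normalization formula, the channel-attention form chosen for the filtering module, and all names and hyperparameters (FERModel, SalientFeatureFilter, dim=256, four encoder layers, etc.) are assumptions made for illustration.

```python
# Minimal sketch of the pipeline described in the abstract (an assumption for
# illustration, not the authors' released code): illumination normalization ->
# CNN feature extraction -> salient feature filtering -> Transformer encoder ->
# Softmax classification.
import torch
import torch.nn as nn
import torch.nn.functional as F


def weighted_sum_illumination_normalization(img, alpha=0.5):
    """Assumed form of the weighted-sum normalization: blend the input with a
    version of itself shifted toward mid brightness (images in [0, 1])."""
    mean = img.mean(dim=(-2, -1), keepdim=True)      # per-channel mean brightness
    equalized = (img - mean + 0.5).clamp(0.0, 1.0)   # recenter brightness at 0.5
    return alpha * img + (1.0 - alpha) * equalized


class SalientFeatureFilter(nn.Module):
    """Illustrative stand-in for the salient feature filtering module: channel
    attention that re-weights CNN feature maps using global context."""
    def __init__(self, channels):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // 4), nn.ReLU(),
            nn.Linear(channels // 4, channels), nn.Sigmoid())

    def forward(self, x):                  # x: (B, C, H, W)
        w = self.fc(x.mean(dim=(-2, -1)))  # global pooling -> per-channel weights
        return x * w[:, :, None, None]


class FERModel(nn.Module):
    def __init__(self, num_classes=7, dim=256, num_layers=4):
        super().__init__()
        self.cnn = nn.Sequential(          # small placeholder CNN backbone
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, dim, 3, stride=2, padding=1), nn.ReLU())
        self.filter = SalientFeatureFilter(dim)
        self.pool = nn.AdaptiveAvgPool2d(14)          # fix the token grid to 14x14
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, img):                           # img: (B, 3, H, W) in [0, 1]
        x = weighted_sum_illumination_normalization(img)
        f = self.pool(self.filter(self.cnn(x)))       # (B, dim, 14, 14)
        tokens = f.flatten(2).transpose(1, 2)         # (B, 196, dim) patch tokens
        out = self.encoder(tokens).mean(dim=1)        # pooled sequence feature
        return F.softmax(self.head(out), dim=-1)      # expression probabilities
```

For instance, FERModel()(torch.rand(1, 3, 224, 224)) yields a (1, 7) probability vector over seven expression classes (as in RAF-DB's basic set); adapting the sketch to FERPlus or AffectNet would only change num_classes.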
Authors
封红旗
黄伟铠
张登辉
FENG Hongqi; HUANG Weikai; ZHANG Denghui (School of Computer Science and Artificial Intelligence, Changzhou University, Changzhou, Jiangsu 213100, China; College of Information Technology, Zhejiang Shuren University, Hangzhou 310000, China)
Source
《计算机工程与应用》
CSCD
Peking University Core Journal
2023, No. 22, pp. 136-143 (8 pages)
Computer Engineering and Applications
Funding
Zhejiang Provincial Public Welfare Technology Research Program (LGF21F020024)
Young Academic Team Project of Zhejiang Shuren University.