Abstract
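Objective Micro-expressions are unconscious facial movements produced in response to external information and stimuli. They are important evidence for judging a subject's emotions and behavior and are widely applied in fields such as public security, business negotiation, and psychological counseling. Unlike ordinary expressions, micro-expressions are difficult to classify and spot. To address this, a dual-branch optical flow spotting network (DFSN) based on optical flow windows and a micro-expression classification network that uses apex-frame optical flow information are proposed to recognize micro-expressions in video. Method In the spotting task, facial images are first extracted, the size and position of the optical flow window are chosen, and the facial optical flow is computed and preprocessed. The result is then fed into the dual-branch network for two classifications: whether a micro-expression is present and, given that one is present, which phase it is in. The two loss functions are combined to suppress overfitting. Finally, a micro-expression intensity curve is drawn, and the position of its peak is taken as the apex frame. In the classification task, the onset frame of the video and the apex frame obtained by the spotting network define the optical flow window, Eulerian motion magnification (EMM) is applied to amplify the micro-expression, and the apex-frame optical flow information is then used to classify the video. Result The spotting network was evaluated with leave-one-subject-out cross-validation on the CASME II (Chinese Academy of Sciences Micro-expression Database II) and CASME datasets. Compared with the best current spotting methods, the network obtained the lowest NMAE (normalized mean absolute error) on CASME II, 0.1017, a 9% improvement over the Optical flow+UPC method. On CASME the NMAE was 0.1378, the second-best result on that dataset. Using the apex frames obtained by the spotting network, the classification network reached 89.79% accuracy on CASME II and 66.06% on CASME. Using the apex frames annotated in the datasets, it reached 91.83% on CASME II and 76.96% on CASME. Conclusion The proposed spotting network effectively locates the apex frame of a micro-expression in video and thereby helps the subsequent network classify it, and the classification network effectively distinguishes micro-expression videos of different categories.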
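The combination of the two loss functions can be illustrated with a minimal sketch. It assumes each branch outputs class probabilities, that both branches are trained with cross-entropy, and that the phase loss only counts when a micro-expression is present. The label encodings and the `phase_weight` factor are hypothetical and are not taken from the paper.

```python
import math


def cross_entropy(probs, label):
    """Negative log-likelihood of the true class for one sample."""
    return -math.log(max(probs[label], 1e-12))


def dual_branch_loss(presence_probs, presence_label,
                     phase_probs, phase_label, phase_weight=1.0):
    """Add the presence loss and, when an expression is present, the phase loss.

    presence_label: 0 = no micro-expression, 1 = micro-expression (assumed encoding)
    phase_label:    0 = rising, 1 = falling (assumed encoding); ignored when absent
    phase_weight:   hypothetical balancing factor, not a value from the paper
    """
    loss = cross_entropy(presence_probs, presence_label)
    if presence_label == 1:
        loss += phase_weight * cross_entropy(phase_probs, phase_label)
    return loss


# Window without a micro-expression: only the presence branch contributes.
loss_absent = dual_branch_loss([0.5, 0.5], 0, [0.9, 0.1], 0)
# Window with a rising micro-expression: both branches contribute.
loss_present = dual_branch_loss([0.5, 0.5], 1, [0.9, 0.1], 0)
```

Masking the phase term for windows without a micro-expression is one plausible reading of "classification given that a micro-expression is present"; other weightings are possible.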
Objective Micro-expressions are unconscious facial actions that people make in response to external information and stimuli. They provide important evidence for judging people's emotions and thoughts and are widely used in social security, business negotiation, and psychological counseling. Unlike ordinary macro-expressions, micro-expressions are short in duration, low in intensity, and fast-changing, which makes them harder to recognize and spot. Before the emergence of deep learning, researchers mostly relied on traditional hand-crafted methods, which extract features with manually designed descriptors and complex parameter tuning. Some of these algorithms, such as local binary pattern-three orthogonal planes (LBP-TOP) and main directional mean optical flow (MDMO), achieve competitive results. However, they mostly extract only shallow features, and improving their accuracy is difficult. With the development of machine learning in computer vision, deep learning-based approaches have become the mainstream in micro-expression research. These approaches generally use convolutional neural networks to extract and classify image or video features, and their strong feature extraction and learning capability has markedly improved recognition accuracy. Nevertheless, spotting and classifying micro-expressions remain difficult because the expressions are subtle and effective features are hard to extract. This paper therefore proposes a dual-branch optical flow spotting network based on optical flow windows to address these problems.
Method First, the size of the optical flow window is selected in accordance with the number of video frames, and three frames at each end of the window are taken to stabilize the optical flow intensity. The Dlib library is used to detect faces, and the Farneback method is used to extract facial optical flow features, after which the optical flow image is preprocessed and resized to 224 × 224 pixels. The result is then fed into the dual-branch network for two classifications: whether a micro-expression is present, and whether it is in its rising or falling phase. Both classifications should be judged from the same features. Therefore, a shared network backbone is used, and the two branches then process the features with different emphases. Combining the two loss functions suppresses overfitting and improves network performance. Finally, the micro-expression state in each video window is obtained by sliding the window, and the intensity curve is drawn. Because micro-expressions differ in duration, multiple windows are used for spotting, and the highest point among them is taken as the apex frame. The classification network differs from the spotting network in two respects. First, the front end of the window is the second to fourth frames of the video, and the back end uses the micro-expression part of the video. Second, Eulerian motion magnification is used to process the video. This method amplifies facial motion and increases expression intensity but destroys some optical flow features, so it is not used in the spotting network. When classifying videos, the apex frame from the spotting network is taken as the center, and the five surrounding positions are selected as the input of the classification network.
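The sliding-window spotting step can be sketched as follows. This is a minimal illustration rather than the authors' implementation. It assumes a hypothetical `score_fn(start, end)` that returns the network's micro-expression intensity in [0, 1] for a window, that each score is assigned to the window's centre frame, and that "multiple windows" means several window sizes. Reading "five surrounding positions" as five consecutive frames centred on the apex is also an assumption.

```python
def intensity_curve(num_frames, window, score_fn):
    """Slide a window over the clip and store each score at the window's centre frame.

    Frames that are never a window centre keep the default intensity 0.0.
    """
    curve = [0.0] * num_frames
    for start in range(num_frames - window + 1):
        end = start + window - 1
        curve[(start + end) // 2] = score_fn(start, end)
    return curve


def spot_apex(num_frames, window_sizes, score_fn):
    """Return the frame with the highest intensity over all window sizes."""
    best_frame, best_score = 0, float("-inf")
    for window in window_sizes:
        curve = intensity_curve(num_frames, window, score_fn)
        for frame, score in enumerate(curve):
            if score > best_score:
                best_frame, best_score = frame, score
    return best_frame


def frames_around_apex(apex, num_frames, count=5):
    """Pick `count` consecutive frame indices centred on the apex, clamped to the clip."""
    half = count // 2
    start = min(max(apex - half, 0), max(num_frames - count, 0))
    return list(range(start, min(start + count, num_frames)))


def toy_score(start, end):
    """Stand-in for the network: intensity peaks when the window is centred on frame 12."""
    centre = (start + end) // 2
    return 1.0 / (1 + abs(centre - 12))


apex = spot_apex(num_frames=30, window_sizes=[5, 9], score_fn=toy_score)
selected = frames_around_apex(apex, num_frames=30)
```

In practice `toy_score` would be replaced by the spotting network's output for the optical flow computed between the two ends of the window.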
The classification network uses a simple structure and still obtains good results, which demonstrates the importance of apex frame spotting. Result The micro-expression spotting network was evaluated with leave-one-subject-out cross-validation, the most commonly used validation method in current micro-expression recognition research, on the Chinese Academy of Sciences Micro-expression Database II (CASME II) and the Chinese Academy of Sciences Micro-expression Database (CASME). On CASME II, the network obtains the lowest normalized mean absolute error (NMAE), 0.1017, which is 9% lower than that of the current best spotting method. On CASME, the NMAE is 0.1378, currently the second lowest. Using this spotting network, the classification network achieved 89.79% accuracy over three categories (positive, negative, and surprise) on CASME II and 66.06% accuracy over four categories (disgust, tense, repression, and surprise) on CASME. Using the apex frames annotated in the datasets, the classification network achieved 91.83% and 76.96% accuracy on CASME II and CASME, respectively. Conclusion The proposed micro-expression spotting network can effectively locate the apex frame in a video and extract its micro-expression information. Experimental evaluation shows that the spotting network performs well. The classification results show that extracting effective micro-expression information such as the apex frame significantly helps the network classify micro-expressions. Overall, the proposed spotting network can substantially improve the accuracy of micro-expression recognition.
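The evaluation protocol named in the Result section can be sketched in a few lines. The NMAE below assumes the common definition in which each clip's absolute apex error is divided by that clip's length in frames before averaging; the abstract does not state the exact normalization, so treat it as an assumption. The subject and clip identifiers are made up for illustration.

```python
from collections import defaultdict


def nmae(predicted, ground_truth, clip_lengths):
    """Mean of |predicted apex - annotated apex| / clip length (assumed normalization)."""
    errors = [abs(p - g) / n
              for p, g, n in zip(predicted, ground_truth, clip_lengths)]
    return sum(errors) / len(errors)


def leave_one_subject_out(samples):
    """Yield (held_out_subject, train_clips, test_clips), holding each subject out once.

    samples: list of (subject_id, clip_id) pairs.
    """
    by_subject = defaultdict(list)
    for subject, clip in samples:
        by_subject[subject].append(clip)
    for held_out in sorted(by_subject):
        test = by_subject[held_out]
        train = [clip
                 for subject, clips in by_subject.items() if subject != held_out
                 for clip in clips]
        yield held_out, train, test


# Hypothetical toy data: two clips, apex errors of 2 and 0 frames.
score = nmae(predicted=[10, 20], ground_truth=[12, 20], clip_lengths=[20, 40])
folds = list(leave_one_subject_out([("s1", "a"), ("s1", "b"), ("s2", "c")]))
```

Holding out every clip of one subject at a time keeps the same face from appearing in both training and test data, which is why this protocol is preferred for small micro-expression datasets.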
Authors
李博凯
吴从中
项柏杨
臧怀娟
任永生
詹曙
Li Bokai; Wu Congzhong; Xiang Baiyang; Zang Huaijuan; Ren Yongsheng; Zhan Shu (Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei 230601, China; School of Computer and Information, Hefei University of Technology, Hefei 230601, China; School of Metallurgy and Energy Engineering, Kunming University of Science and Technology, Kunming 650093, China)
Source
《中国图象图形学报》
CSCD
PKU Core (北大核心)
2024, Issue 5, pp. 1447-1459 (13 pages)
Journal of Image and Graphics
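Funding
National Natural Science Foundation of China (52104303)
Education Department of Anhui Province, University Synergy Innovation Program of Anhui Province (GXXT-2022-041)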
Keywords
micro-expression spotting
affective computing
apex frame
micro-expression classification
image recognition
deep learning