To address the short duration, low intensity, and difficult detection of micro-expressions (MEs), the global and local features of ME video frames are extracted by combining spatial feature extraction with temporal feature extraction. Building on the traditional convolutional neural network (CNN) and long short-term memory (LSTM), a recognition method is proposed that combines a global identification attention network (GIA), a block identification attention network (BIA), and a bi-directional long short-term memory network (Bi-LSTM). In the BIA, each ME video frame is cropped, and training is carried out under three settings: 24 identification blocks (IBs), 10 IBs, and uncropped frames. To alleviate overfitting during training, we first extract the basic features of the preprocessed sequence through a transfer learning layer, and then extract the global and local spatial features of its output through the GIA layer and the BIA layer, respectively. In the BIA layer, the input data are cropped into local feature vectors with attention weights to extract the local features of the ME frames; in the GIA layer, the global features of the ME frames are extracted. Finally, after the global and local feature vectors are fused, the ME time-series information is extracted by the Bi-LSTM. The experimental results show that using IBs can significantly improve the model’s ability to extract subtle facial features, and that the model works best when 10 IBs are used.
Funding: Supported by the National Natural Science Foundation of Hunan Province, China (Grant Nos. 2021JJ50058, 2022JJ50051), the Open Platform Innovation Foundation of Hunan Provincial Education Department (Grant No. 20K046), and the Scientific Research Fund of Hunan Provincial Education Department, China (Grant Nos. 21A0350, 21C0439, 19A133).
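The block-cropping and fusion steps described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration, not the authors' implementation: the grid layouts (2 x 5 for 10 IBs, 4 x 6 for 24 IBs), the toy per-block feature (mean and maximum intensity), the use of the block maximum as a stand-in for a learned attention score, and all function names. The transfer learning layer and the Bi-LSTM stage are not modelled; the sketch covers one frame only.

```python
import math


def crop_into_blocks(frame, rows, cols):
    """Split a 2D frame (list of lists) into rows * cols equal-sized blocks."""
    height, width = len(frame), len(frame[0])
    bh, bw = height // rows, width // cols
    blocks = []
    for r in range(rows):
        for c in range(cols):
            block = [row[c * bw:(c + 1) * bw] for row in frame[r * bh:(r + 1) * bh]]
            blocks.append(block)
    return blocks


def block_feature(block):
    """Toy feature vector [mean, max]; a real model would use CNN features."""
    values = [v for row in block for v in row]
    return [sum(values) / len(values), max(values)]


def softmax(scores):
    """Numerically stable softmax, used here as the attention normalisation."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def local_feature(frame, rows, cols):
    """Attention-weighted sum of per-block features (hypothetical scoring)."""
    feats = [block_feature(b) for b in crop_into_blocks(frame, rows, cols)]
    # Assumption: the block maximum stands in for a learned attention score.
    weights = softmax([f[1] for f in feats])
    dim = len(feats[0])
    return [sum(w * f[i] for w, f in zip(weights, feats)) for i in range(dim)]


def fused_feature(frame, rows, cols):
    """Concatenate the global (whole-frame) and local (block) features."""
    global_feat = block_feature(frame)
    return global_feat + local_feature(frame, rows, cols)


# Synthetic 8 x 30 "frame" with values in [0, 1); divisible by both grids.
frame = [[((r * 30 + c) % 7) / 7.0 for c in range(30)] for r in range(8)]

ten_blocks = crop_into_blocks(frame, 2, 5)          # assumed layout for 10 IBs
twenty_four_blocks = crop_into_blocks(frame, 4, 6)  # assumed layout for 24 IBs
fused = fused_feature(frame, 2, 5)
```

In a full pipeline, one such fused vector per frame would form the sequence passed to a bi-directional recurrent layer; that stage is omitted here because the abstract gives no details of its configuration.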