Abstract
Facial expression recognition (FER) remains challenging due to the small inter-class discrepancy in facial expression data. Given the importance of crucial facial regions for FER, many existing studies exploit prior information from annotated facial crucial points to improve recognition performance. However, manually annotating such points is complicated and time-consuming, especially for large collections of in-the-wild expression images. Motivated by this, this paper proposes a local non-local joint network that adaptively enhances crucial facial regions during feature learning for FER. The proposed method consists of two parts built on facial local and non-local information: an ensemble of multiple local networks extracts local features corresponding to multiple facial local regions, while a non-local attention network estimates the significance of each local region. In particular, the attention weights produced by the non-local network are fed back into the local part, enabling interactive feedback between global and local facial information. As training proceeds, the non-local weights of the local regions are updated gradually, so that more crucial regions receive higher weights. In addition, U-Net is employed to extract features that integrate deep semantic information with low-level detail information of expression images. Experimental results show that the proposed method achieves more competitive performance than several state-of-the-art methods on five benchmark datasets.
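The local/non-local interaction summarized above can be sketched as follows. This is a minimal illustrative example, not the authors' exact architecture: the module names, the number of regions, and the channel sizes are assumptions made for demonstration.

```python
# Minimal PyTorch sketch of the local / non-local interaction: local branches
# extract per-region features, a non-local attention module scores each region,
# and the scores reweight the local features. All sizes are illustrative.
import torch
import torch.nn as nn


class LocalBranch(nn.Module):
    """Extracts a feature vector from one facial local region (hypothetical layout)."""
    def __init__(self, in_ch=64, out_ch=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
        )

    def forward(self, x):
        return self.conv(x).flatten(1)  # (B, out_ch)


class NonLocalAttention(nn.Module):
    """Produces one attention weight per local region from the global feature map."""
    def __init__(self, in_ch=64, num_regions=4):
        super().__init__()
        self.score = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(in_ch, num_regions),
            nn.Softmax(dim=1),
        )

    def forward(self, feat_map):
        return self.score(feat_map)  # (B, num_regions)


class LocalNonLocalBlock(nn.Module):
    """Weights each local branch's feature by the non-local attention score."""
    def __init__(self, in_ch=64, out_ch=128, num_regions=4):
        super().__init__()
        self.locals = nn.ModuleList(LocalBranch(in_ch, out_ch) for _ in range(num_regions))
        self.attn = NonLocalAttention(in_ch, num_regions)

    def forward(self, feat_map, region_maps):
        # region_maps: feature maps cropped around the facial local regions
        w = self.attn(feat_map)                                    # (B, R)
        local_feats = torch.stack(
            [branch(r) for branch, r in zip(self.locals, region_maps)], dim=1
        )                                                          # (B, R, C)
        return (w.unsqueeze(-1) * local_feats).sum(dim=1)          # (B, C)


if __name__ == "__main__":
    block = LocalNonLocalBlock()
    global_map = torch.randn(2, 64, 28, 28)
    regions = [torch.randn(2, 64, 7, 7) for _ in range(4)]
    print(block(global_map, regions).shape)  # torch.Size([2, 128])
```

Because the attention weights are learned from the global (non-local) feature map, regions that contribute more to the expression gradually receive higher weights, which is the feedback behavior the abstract describes.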