Fine-grained recognition of ships in remote sensing images is crucial to safeguarding maritime rights and interests and maintaining national security. With the emergence of massive high-resolution multi-modality images, using multi-modality images for fine-grained recognition has become a promising technology. Fine-grained recognition of multi-modality images imposes higher requirements on dataset samples, and the key problem is how to extract and fuse the complementary features of multi-modality images to obtain more discriminative fused features. Attention mechanisms help a model pinpoint the key information in an image, yielding a significant improvement in performance. This paper first presents a dataset for fine-grained recognition of ships based on visible and near-infrared multi-modality remote sensing images, named the Dataset for Multimodal Fine-grained Recognition of Ships (DMFGRS). It includes 1,635 pairs of visible and near-infrared remote sensing images divided into 20 categories, collated from digital orthophoto models provided by commercial remote sensing satellites. DMFGRS provides two annotation formats, as well as segmentation mask images corresponding to the ship targets. A Multimodal Information Cross-Enhancement Network (MICE-Net) that fuses features of visible and near-infrared remote sensing images is then proposed. In the network, a dual-branch feature extraction and fusion module is designed to obtain more expressive features. The Feature Cross Enhancement Module (FCEM) achieves fusion enhancement of the two modal features by making channel attention and spatial attention act cross-modally on the feature maps. A benchmark is established by evaluating state-of-the-art object recognition algorithms on DMFGRS. In experiments on DMFGRS, MICE-Net reaches a precision, recall, mAP0.5, and mAP0.5:0.95 of 87%, 77.1%, 83.8%, and 63.9%, respectively. Extensive experiments demonstrate that the proposed MICE-Net performs strongly on DMFGRS. Built on the lightweight YOLO network, the model generalizes well and thus has good potential for application in real-life scenarios.
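A hedged sketch of one plausible reading of the FCEM described above: channel attention computed from one modality reweights the other, and likewise for spatial attention. The exact FCEM wiring is not given in the abstract, so the module names, the multiplicative application, and the fusion-by-addition choice here are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())
    def forward(self, x):                      # x: (B, C, H, W)
        return self.fc(x).unsqueeze(-1).unsqueeze(-1)  # per-channel weights in (0, 1)

class SpatialAttention(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)
    def forward(self, x):
        # Pool across channels, then predict a per-pixel weight map.
        s = torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)
        return torch.sigmoid(self.conv(s))     # (B, 1, H, W)

class CrossEnhanceFusion(nn.Module):
    """Cross-apply attention between visible (v) and near-infrared (n) features."""
    def __init__(self, channels):
        super().__init__()
        self.ca_v, self.ca_n = ChannelAttention(channels), ChannelAttention(channels)
        self.sa_v, self.sa_n = SpatialAttention(), SpatialAttention()
    def forward(self, v, n):
        v_enh = v * self.ca_n(n) * self.sa_n(n)   # NIR attention reweights visible
        n_enh = n * self.ca_v(v) * self.sa_v(v)   # visible attention reweights NIR
        return v_enh + n_enh                      # fused feature map

v = torch.randn(2, 64, 32, 32)   # visible-branch features
n = torch.randn(2, 64, 32, 32)   # near-infrared-branch features
print(CrossEnhanceFusion(64)(v, n).shape)        # torch.Size([2, 64, 32, 32])
```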
The effectiveness of facial expression recognition (FER) algorithms hinges on the model's quality and the availability of a substantial amount of labeled expression data. However, labeling large datasets demands significant human, time, and financial resources. Although active learning methods have mitigated the dependency on extensive labeled data, a cold-start problem persists in small to medium-sized expression recognition datasets: the initial labeled data often fails to represent the full spectrum of facial expression characteristics. This paper introduces an active learning approach that integrates uncertainty estimation, aiming to improve the precision of facial expression recognition regardless of dataset scale. The method has two primary phases. First, the model undergoes self-supervised pre-training using contrastive learning and uncertainty estimation to bolster its feature extraction capabilities. Second, the model is fine-tuned using the prior knowledge obtained from the pre-training phase to significantly improve recognition accuracy. In the pre-training phase, the model employs contrastive learning to extract fundamental feature representations from the complete unlabeled dataset. These features are then weighted through a self-attention mechanism with rank regularization. Subsequently, data from the low-weighted set is relabeled to further refine the model's feature extraction ability. The pre-trained model is then utilized in active learning to select and label information-rich samples more efficiently. Experimental results demonstrate that the proposed method significantly outperforms existing approaches, improving recognition accuracy by 5.09% and 3.82% over the best existing active learning methods, Margin and Least Confidence, respectively, and by 1.61% over the conventional segmented active learning method.
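An illustrative sketch of the uncertainty-driven sample selection step used in active learning loops like the one described above. Margin and least-confidence scores follow their standard definitions; how the paper combines them with its contrastive pre-training is not specified here, so this shows only the selection step on hypothetical model probabilities.

```python
import numpy as np

def least_confidence(probs):
    # Higher score = model is less sure of its top prediction.
    return 1.0 - probs.max(axis=1)

def margin_uncertainty(probs):
    # Small gap between the top-2 class probabilities = ambiguous sample.
    part = np.sort(probs, axis=1)
    return -(part[:, -1] - part[:, -2])   # negate so larger = more uncertain

def select_for_labeling(probs, budget, score_fn=margin_uncertainty):
    """Return indices of the `budget` most informative unlabeled samples."""
    scores = score_fn(probs)
    return np.argsort(scores)[-budget:]

rng = np.random.default_rng(0)
logits = rng.normal(size=(1000, 7))                    # 7 expression classes
probs = np.exp(logits) / np.exp(logits).sum(1, keepdims=True)
print(select_for_labeling(probs, budget=16))           # 16 samples to annotate
```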
Convolutional neural networks struggle to handle changes in angle and twists in the direction of images, which affects their ability to recognize patterns based on internal feature levels. In contrast, CapsNet overcomes these limitations by vectorizing information through increased directionality and magnitude, ensuring that spatial information is not overlooked. This study therefore proposes a novel expression recognition technique called CAPSULE-VGG, which combines the strengths of CapsNet and convolutional neural networks. By refining and integrating features extracted by a convolutional neural network before introducing them into CapsNet, our model enhances facial recognition capabilities. Compared to traditional neural network models, our approach offers faster training, quicker convergence, and higher accuracy as training stabilizes. Experimental results demonstrate that our method achieves recognition rates of 74.14% on the FER2013 expression dataset and 99.85% on the CK+ expression dataset. By contrasting these findings with those obtained using conventional expression recognition techniques and incorporating CapsNet's advantages, we effectively address issues associated with convolutional neural networks while increasing expression recognition accuracy.
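A minimal sketch of the coupling that CAPSULE-VGG builds on: CNN feature maps feeding a primary-capsule layer, with the CapsNet "squash" nonlinearity so that vector magnitude encodes presence probability. The VGG backbone and the routing mechanism are omitted; shapes and capsule sizes below are illustrative assumptions.

```python
import torch
import torch.nn as nn

def squash(s, dim=-1, eps=1e-8):
    # Shrinks short vectors toward 0 and long vectors toward unit length.
    sq = (s * s).sum(dim=dim, keepdim=True)
    return (sq / (1.0 + sq)) * s / torch.sqrt(sq + eps)

class PrimaryCapsules(nn.Module):
    def __init__(self, in_ch=256, caps_dim=8, n_maps=32):
        super().__init__()
        self.caps_dim = caps_dim
        self.conv = nn.Conv2d(in_ch, n_maps * caps_dim, kernel_size=9, stride=2)
    def forward(self, x):                       # x: refined CNN feature maps
        u = self.conv(x)                        # (B, n_maps*caps_dim, H', W')
        u = u.permute(0, 2, 3, 1).reshape(x.size(0), -1, self.caps_dim)
        return squash(u)                        # (B, n_capsules, caps_dim)

feats = torch.randn(2, 256, 20, 20)            # e.g., features from a VGG stage
caps = PrimaryCapsules()(feats)
print(caps.shape)                               # torch.Size([2, 1152, 8])
```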
In smart classrooms, conducting multi-face expression recognition with existing hardware to assess students' group emotions can give educators more comprehensive and intuitive analysis of classroom effects, thereby continuously promoting the improvement of teaching quality. However, most existing multi-face expression recognition methods adopt a multi-stage approach, with an overall complex process, poor real-time performance, and insufficient generalization ability. In addition, existing facial expression datasets mostly contain single-face images of low quality and limited specificity, which also restricts this line of research. This paper aims to propose an end-to-end, high-performance multi-face expression recognition model suitable for smart classrooms, construct a high-quality multi-face expression dataset to support algorithm research, and apply the model to group emotion assessment to expand its application value. To this end, we propose an end-to-end multi-face expression recognition algorithm model for smart classrooms (E2E-MFERC). To provide high-quality, highly targeted data support for model research, we constructed a multi-face expression dataset in real classrooms (MFED), containing 2,385 images and a total of 18,712 expression labels collected from smart classrooms. E2E-MFERC introduces Re-parameterization visual geometry group (RepVGG) blocks and symmetric positive definite convolution (SPD-Conv) modules to enhance representational capability; combines them with a cross stage partial network fusion module optimized by an attention mechanism (C2f_Attention) to strengthen the extraction of key information; adopts asymptotic feature pyramid network (AFPN) feature fusion tailored to classroom scenes and optimizes the head prediction output sizes; and thereby achieves high-performance end-to-end multi-face expression detection. Finally, we apply the model to smart classroom group emotion assessment and provide design references for classroom effect evaluation metrics. Experiments on MFED show that the mAP and F1-score of E2E-MFERC on classroom evaluation data reach 83.6% and 0.77, respectively, improving the mAP of same-scale You Only Look Once version 5 (YOLOv5) and You Only Look Once version 8 (YOLOv8) by 6.8% and 2.5%, and the F1-score by 0.06 and 0.04, respectively. E2E-MFERC has clear advantages in both detection speed and accuracy, can meet the practical needs of real-time multi-face expression analysis in classrooms, and serves teaching effect assessment well.
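A sketch of the SPD-Conv idea mentioned above: a space-to-depth rearrangement followed by a stride-1 convolution, used in place of strided convolution so that fine detail (such as small faces) is not discarded by downsampling. Channel sizes here are illustrative, and this is the generic published module rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn

class SPDConv(nn.Module):
    def __init__(self, in_ch, out_ch, scale=2):
        super().__init__()
        self.scale = scale
        # After space-to-depth, the channel count grows by scale**2.
        self.conv = nn.Conv2d(in_ch * scale * scale, out_ch, 3, stride=1, padding=1)
    def forward(self, x):
        s = self.scale
        b, c, h, w = x.shape
        # Rearrange each s x s spatial block into the channel dimension.
        x = x.reshape(b, c, h // s, s, w // s, s)
        x = x.permute(0, 1, 3, 5, 2, 4).reshape(b, c * s * s, h // s, w // s)
        return self.conv(x)

x = torch.randn(1, 64, 80, 80)
print(SPDConv(64, 128)(x).shape)   # torch.Size([1, 128, 40, 40]): downsampled without stride
```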
Continuous emotion recognition predicts emotion states from affective information, focusing on the continuous variation of emotion. Fusion of electroencephalography (EEG) and facial expression videos has been used in this field, but current research has limitations such as hand-engineered features and simplistic integration approaches. Hence, a new continuous emotion recognition model based on the fusion of EEG and facial expression videos, named the residual multimodal Transformer (RMMT), is proposed. Firstly, ResNet50 and a temporal convolutional network (TCN) are utilised to extract spatiotemporal features from videos, and a TCN is also applied to the computed EEG frequency power to acquire spatiotemporal features of the EEG. Then, a multimodal Transformer is used to fuse the spatiotemporal features from the two modalities. Furthermore, a residual connection is introduced to fuse shallow features with deep features, which experiments verify to be effective for continuous emotion recognition. Inspired by knowledge distillation, the authors incorporate a feature-level loss into the loss function to further enhance network performance. Experimental results show that the RMMT outperforms other methods on the MAHNOB-HCI dataset. Ablation studies on the residual connection and the loss function in the RMMT demonstrate that both are functional.
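A hedged sketch of the fusion stage of an RMMT-style model: spatiotemporal feature sequences from the two modalities are fused by a Transformer encoder, and a residual connection re-injects the shallow (pre-fusion) features into the deep output. Dimensions, layer counts, and the token-concatenation scheme are assumptions; the ResNet50/TCN backbones are stubbed with tensors.

```python
import torch
import torch.nn as nn

class ResidualMultimodalFusion(nn.Module):
    def __init__(self, d_model=128, nhead=4, num_layers=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.head = nn.Linear(d_model, 1)    # continuous emotion value (e.g., valence)

    def forward(self, video_feats, eeg_feats):
        shallow = torch.cat([video_feats, eeg_feats], dim=1)  # (B, Tv+Te, D)
        deep = self.encoder(shallow)
        fused = deep + shallow               # residual: shallow + deep features
        return self.head(fused.mean(dim=1)) # pool over time, predict emotion

video_feats = torch.randn(4, 30, 128)  # e.g., ResNet50+TCN output per frame
eeg_feats = torch.randn(4, 30, 128)    # e.g., TCN output on EEG band power
print(ResidualMultimodalFusion()(video_feats, eeg_feats).shape)  # torch.Size([4, 1])
```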
Peptidoglycan recognition proteins (PGRPs) are a family of pattern recognition receptors (PRRs) of the immune system, which bind and hydrolyze bacterial peptidoglycan. Here, a long type PGRP (PGRP-L) was first cloned in the lower vertebrate species Xenopus tropicalis (Xt). The XtPGRP-L possessed a conserved genomic structure with five exons and four introns. Alignment and phylogenetic analysis indicated that XtPGRP-L might be a type of amidase-like PGRP. A 3-D model showed that XtPGRP-L possesses a conserved structure compared with the Drosophila PGRP-Lb. During embryonic development, XtPGRP-L was not expressed until the 72 h tadpole stage. In adult tissues, it was strongly expressed in the liver, lung, intestine, and stomach. Furthermore, after LPS stimulation, the expression of XtPGRP-L was up-regulated significantly in the liver, intestine, and spleen, indicating that XtPGRP-L may play an important role in the innate immunity of Xenopus tropicalis.
A novel fuzzy linear discriminant analysis method based on canonical correlation analysis (fuzzy-LDA/CCA) is presented and applied to facial expression recognition. The fuzzy method is used to evaluate the degree of class membership to which each training sample belongs. CCA is then used to establish the relationship between each facial image and the corresponding class membership vector, and the class membership vector of a test image is estimated using this relationship. Moreover, the fuzzy-LDA/CCA method is generalized to nonlinear discriminant analysis problems via the kernel method. The performance of the proposed method is demonstrated using real data.
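An illustrative sketch of the CCA step described above: relate image features to class-membership vectors on training data, then estimate the membership vector of a test image from the learned correlation. The fuzzy membership computation and the kernel extension are omitted; the data here is synthetic.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
n_train, n_feat, n_class = 200, 64, 7
X = rng.normal(size=(n_train, n_feat))             # facial image features
Y = rng.dirichlet(np.ones(n_class), size=n_train)  # fuzzy class-membership vectors

# Fit CCA between features and membership vectors on the training set.
cca = CCA(n_components=n_class - 1)
cca.fit(X, Y)

x_test = rng.normal(size=(1, n_feat))
y_est = cca.predict(x_test)                        # estimated membership vector
print("predicted class:", y_est.argmax())
```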
Facial expression recognition is a hot topic in computer vision, but it remains challenging due to the feature inconsistency caused by person-specific characteristics of facial expressions. To address this challenge, and inspired by the recent success of the deep identity network (DeepID-Net) for face identification, this paper proposes a novel deep learning based framework for recognising human expressions from facial images. Compared to existing deep learning methods, our proposed framework, which is based on multi-scale global images and local facial patches, achieves significantly better performance on facial expression recognition. Finally, we verify the effectiveness of our proposed framework through experiments on the public benchmarking datasets JAFFE and extended Cohn-Kanade (CK+).
A facial expression emotion recognition based human-robot interaction (FEER-HRI) system is proposed, for which a four-layer system framework is designed. The FEER-HRI system enables robots not only to recognize human emotions, but also to generate facial expressions that adapt to human emotions. A facial emotion recognition method based on 2D-Gabor features, the uniform local binary pattern (LBP) operator, and a multiclass extreme learning machine (ELM) classifier is presented and applied to real-time facial expression recognition for robots. Facial expressions of robots are represented by simple cartoon symbols and displayed on an LED screen equipped in the robots, which can be easily understood by humans. Four scenarios, i.e., guiding, entertainment, home service, and scene simulation, are performed in the human-robot interaction experiment, in which smooth communication is realized by facial expression recognition of humans and facial expression generation of robots within 2 seconds. As prospective applications, the FEER-HRI system can be applied in home service, smart homes, safe driving, and so on.
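A minimal sketch of the multiclass extreme learning machine (ELM) classifier stage named above: random hidden weights are fixed, and only the output weights are solved in closed form by a least-squares fit. The 2D-Gabor and LBP feature extraction that would feed it is stubbed with random features.

```python
import numpy as np

class ELM:
    def __init__(self, n_hidden=500, seed=0):
        self.n_hidden, self.rng = n_hidden, np.random.default_rng(seed)

    def _hidden(self, X):
        return np.tanh(X @ self.W + self.b)        # random nonlinear projection

    def fit(self, X, y, n_classes):
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        T = np.eye(n_classes)[y]                   # one-hot targets
        H = self._hidden(X)
        self.beta = np.linalg.pinv(H) @ T          # closed-form output weights
        return self

    def predict(self, X):
        return (self._hidden(X) @ self.beta).argmax(axis=1)

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 128))                    # stand-in for Gabor+LBP features
y = rng.integers(0, 7, size=300)                   # 7 expression classes
elm = ELM().fit(X, y, n_classes=7)
print("train accuracy:", (elm.predict(X) == y).mean())
```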
Functional magnetic resonance imaging was used during emotion recognition to identify changes in functional brain activation in 21 first-episode, treatment-naive major depressive disorder patients before and after antidepressant treatment. Following escitalopram oxalate treatment, patients exhibited decreased activation in the bilateral precentral gyrus, bilateral middle frontal gyrus, left middle temporal gyrus, bilateral postcentral gyrus, left cingulate, and right parahippocampal gyrus, and increased activation in the right superior frontal gyrus, bilateral superior parietal lobule, and left occipital gyrus during sad facial expression recognition. After antidepressant treatment, patients also exhibited decreased activation in the bilateral middle frontal gyrus, bilateral cingulate, and right parahippocampal gyrus, and increased activation in the right inferior frontal gyrus, left fusiform gyrus, and right precuneus during happy facial expression recognition. These findings indicate that the limbic-cortical network might be a key target region for antidepressant treatment in major depressive disorder.
In this paper, a novel method based on the dual-tree complex wavelet transform (DT-CWT) and rotation invariant local binary patterns (LBP) for facial expression recognition is proposed. The quarter sample shift (Q-shift) DT-CWT can provide a group delay of 1/4 of a sample period and satisfy the usual 2-band filter bank constraints of no aliasing and perfect reconstruction. To resolve illumination variation in expression verification, the low-frequency coefficients produced by the DT-CWT are set to zero, the high-frequency coefficients are used to reconstruct the image, and the basic LBP histogram is mapped onto the reconstructed image by means of histogram specification. LBP is capable of encoding texture and shape information of the preprocessed images. The histograms built from multi-scale rotation invariant LBPs are combined to serve as the feature for recognition. Template matching is adopted to classify facial expressions for its simplicity. The experimental results show that the proposed approach performs well in both efficiency and accuracy.
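A sketch of the multi-scale rotation-invariant LBP histogram feature described above, using scikit-image. The DT-CWT preprocessing (zeroing low-frequency coefficients and reconstructing) and the histogram specification are omitted; a synthetic image stands in for the reconstructed face, and the radii are illustrative.

```python
import numpy as np
from skimage.feature import local_binary_pattern

def multiscale_ri_lbp_hist(image, radii=(1, 2, 3)):
    feats = []
    for r in radii:
        p = 8 * r                                  # sampling points per radius
        # 'uniform' gives rotation-invariant uniform patterns with p + 2 bins.
        lbp = local_binary_pattern(image, P=p, R=r, method="uniform")
        hist, _ = np.histogram(lbp, bins=p + 2, range=(0, p + 2), density=True)
        feats.append(hist)
    return np.concatenate(feats)                   # combined multi-scale feature

rng = np.random.default_rng(0)
image = (rng.random((64, 64)) * 255).astype(np.uint8)  # stand-in for the face
feature = multiscale_ri_lbp_hist(image)
print(feature.shape)                               # (10 + 18 + 26,) = (54,)
```

For classification, template matching as in the abstract amounts to comparing such histograms against per-class templates, e.g., with a chi-square or histogram-intersection distance.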
To address the complex model structures and excessive training parameters of facial expression recognition algorithms, we propose a residual network structure with a multi-headed channel attention (MCA) module. A transfer learning algorithm is used to pre-train the convolutional layer parameters and mitigate the overfitting caused by an insufficient number of training samples. The designed MCA module is integrated into the ResNet18 backbone network. The attention mechanism highlights important information and suppresses irrelevant information by assigning different coefficients or weights, and the multi-head structure focuses more on the local features of the images, which improves the efficiency of facial expression recognition. Experimental results demonstrate that the proposed model achieves excellent recognition results on the FER2013, CK+, and JAFFE datasets, with accuracy rates of 72.7%, 98.8%, and 93.33%, respectively.
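A hedged sketch of a multi-headed channel attention block consistent with the description above: channels are split into heads, and each head computes its own squeeze-and-excitation style weights, so different heads can specialize on different channel groups. The head count, reduction ratio, and the SE-style formulation are assumptions, not the paper's exact MCA design.

```python
import torch
import torch.nn as nn

class MultiHeadChannelAttention(nn.Module):
    def __init__(self, channels, heads=4, reduction=4):
        super().__init__()
        assert channels % heads == 0
        self.heads, d = heads, channels // heads
        self.fc = nn.Sequential(
            nn.Linear(d, d // reduction), nn.ReLU(),
            nn.Linear(d // reduction, d), nn.Sigmoid())

    def forward(self, x):                      # x: (B, C, H, W)
        b, c, h, w = x.shape
        g = x.mean(dim=(2, 3)).reshape(b, self.heads, c // self.heads)  # squeeze
        w_ = self.fc(g).reshape(b, c, 1, 1)    # per-head excitation weights
        return x * w_                          # reweighted feature map

x = torch.randn(2, 64, 14, 14)                 # e.g., a ResNet18 stage output
print(MultiHeadChannelAttention(64)(x).shape)  # torch.Size([2, 64, 14, 14])
```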
A new method to extract person-independent expression features based on higher-order singular value decomposition (HOSVD) is proposed for facial expression recognition. Based on the assumption that similar persons have similar facial expression appearance and shape, a person-similarity weighted expression feature is proposed to estimate the expression features of test persons. As a result, the estimated expression feature can reduce the influence of individual identity caused by insufficient training data, and hence becomes less person-dependent. The proposed method is tested on the Cohn-Kanade facial expression database and the Japanese female facial expression (JAFFE) database. Person-independent experimental results show the superiority of the proposed method over existing methods.
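A sketch of the HOSVD computation underlying the person-independent feature described above: a (person x expression x feature) data tensor is unfolded along each mode, and an SVD of each unfolding gives the mode factors. The person-similarity weighting scheme itself is not reproduced, and the tensor sizes are toy values.

```python
import numpy as np

def unfold(T, mode):
    # Move `mode` to the front, then flatten the remaining modes.
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def hosvd(T):
    # Mode factors: left singular vectors of each mode unfolding.
    factors = [np.linalg.svd(unfold(T, m), full_matrices=False)[0]
               for m in range(T.ndim)]
    core = T
    for m, U in enumerate(factors):            # core = T x_m U_m^T for each mode
        core = np.moveaxis(np.tensordot(U.T, np.moveaxis(core, m, 0), axes=1), 0, m)
    return core, factors

# persons x expressions x image features (toy sizes)
T = np.random.default_rng(0).normal(size=(10, 6, 64))
core, (U_person, U_expr, U_feat) = hosvd(T)
print(core.shape, U_person.shape, U_expr.shape, U_feat.shape)
```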
In computer vision, emotion recognition using facial expression images is considered an important research issue. Deep learning advances in recent years have aided in attaining improved results on this issue. According to recent studies, multiple facial expressions may be included in facial photographs representing a particular type of emotion, and it is feasible and useful to convert face photos into collections of visual words and carry out global expression recognition. The main contribution of this paper is a facial expression recognition model (FERM) based on an optimized Support Vector Machine (SVM). To test the performance of the proposed model (FERM), AffectNet is used. AffectNet was built by searching three major search engines with 1,250 emotion-related keywords in six different languages, collecting over 1,000,000 facial photos online. FERM is composed of three main phases: (i) the data preparation phase, (ii) applying grid search for optimization, and (iii) the categorization phase. Linear discriminant analysis (LDA) is used to categorize the data into eight labels (neutral, happy, sad, surprised, fear, disgust, angry, and contempt). Owing to LDA, the performance of categorization via SVM is markedly enhanced. Grid search is used to find the optimal values for the SVM hyperparameters (C and gamma). The proposed optimized SVM algorithm achieves an accuracy of 99% and a 98% F1 score.
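A sketch of the pipeline stages named above: LDA projects the features toward the eight expression labels, and a grid search tunes the SVM's C and gamma. Synthetic data stands in for the AffectNet features, and the grid values are typical defaults, not the paper's.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 100))    # stand-in facial features
y = rng.integers(0, 8, size=400)   # 8 labels: neutral ... contempt

pipe = Pipeline([
    ("lda", LinearDiscriminantAnalysis(n_components=7)),  # at most n_classes - 1
    ("svm", SVC()),
])
grid = GridSearchCV(
    pipe,
    param_grid={"svm__C": [0.1, 1, 10, 100], "svm__gamma": ["scale", 0.01, 0.001]},
    cv=3,
)
grid.fit(X, y)
print(grid.best_params_)           # tuned C and gamma for the SVM stage
```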
One of the characteristics of Autism Spectrum Disorder (ASD) is social disorder. The specificity of face and expression recognition in people with ASD is gathering attention as a factor in this social disorder. This study examined hemodynamic activity in the prefrontal cortex using near-infrared spectroscopy (NIRS) while people with ASD performed an expression recognition task. The subjects were twenty males (18-22 years old) with ASD and without intellectual disabilities. Forty-five healthy males matched for age and sex were included as a control group. In both groups, the degree of autistic tendencies was evaluated using the Autism-Spectrum Quotient (AQ). Using eight standard emotional expressions of Japanese people, two expression recognition tasks were set. NIRS was used to measure prefrontal cortex blood mobilization during the expression-processing process. The AQ was significantly higher in the ASD group, while the rate of overall correct expression responses was significantly lower and negatively correlated with the AQ (ρ = −0.40, p < 0.001). In the automatic expression-processing task, no activation in the prefrontal cortex was found in either the ASD or the control group. In the conscious expression-processing task, activation of the left and right lateral prefrontal cortex was weaker in the ASD group than in the control group. Unlike in the control group, mild activation of the posterior prefrontal cortex was found in the ASD group. The expression-processing process of the ASD group was thus found to differ from that of the control group, and NIRS was effective in detecting a brain function disorder in people with ASD during expression processing.
Mining more discriminative temporal features to enrich temporal context representation is considered the key to fine-grained action recognition. Previous action recognition methods use a fixed spatiotemporal window to learn local video representations, but fail to capture complex motion patterns due to their limited receptive field. To solve these problems, this paper proposes a lightweight Temporal Pyramid Excitation (TPE) module to capture short-, medium-, and long-term temporal context. In this method, the Temporal Pyramid (TP) module effectively expands the temporal receptive field of the network through multi-temporal kernel decomposition without significantly increasing the computational cost. In addition, the Multi Excitation module emphasizes temporal importance to enhance temporal feature representation learning. TPE can be integrated into ResNet50 to build a compact video learning framework, TPENet. Extensive validation experiments on several challenging benchmark datasets (Something-Something V1, Something-Something V2, UCF-101, and HMDB51) demonstrate that our method achieves a preferable balance between computation and accuracy.
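A hedged sketch of one possible reading of the temporal-pyramid idea above: parallel depthwise 1D temporal convolutions with short, medium, and long kernels widen the temporal receptive field cheaply, and their outputs are summed. The excitation branch and the ResNet integration are omitted, and the kernel sizes and summation are illustrative assumptions rather than the TPE module's published form.

```python
import torch
import torch.nn as nn

class TemporalPyramid(nn.Module):
    def __init__(self, channels, kernels=(3, 5, 7)):
        super().__init__()
        # Depthwise temporal convs keep the cost low (groups=channels).
        self.branches = nn.ModuleList(
            nn.Conv1d(channels, channels, k, padding=k // 2, groups=channels)
            for k in kernels)

    def forward(self, x):                 # x: (B, C, T) per-frame features
        # Sum short/medium/long-range temporal context onto the input.
        return x + sum(b(x) for b in self.branches)

x = torch.randn(8, 256, 16)              # 16 frames of 256-dim features
print(TemporalPyramid(256)(x).shape)     # torch.Size([8, 256, 16])
```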
AIM: To conduct a systematic literature review of the influence of gender on the recognition of facial expressions of the six basic emotions. METHODS: We made a systematic search with the search terms (face OR facial) AND (processing OR recognition OR perception) AND (emotional OR emotion) AND (gender OR sex) in the PubMed, PsycINFO, LILACS, and SciELO electronic databases for articles assessing outcomes related to response accuracy and latency and emotional intensity. Article selection was performed according to parameters set by COCHRANE. The reference lists of the articles found through the database search were checked for additional references of interest. RESULTS: With respect to accuracy, women tend to perform better than men when all emotions are considered as a set. Regarding specific emotions, there seem to be no gender-related differences in the recognition of happiness, whereas results are quite heterogeneous for the remaining emotions, especially sadness, anger, and disgust. Fewer articles dealt with the parameters of response latency and emotional intensity, which hinders the generalization of their findings, especially in the face of their methodological differences. CONCLUSION: The analysis of the studies conducted to date does not allow for definite conclusions concerning the role of the observer's gender in the recognition of facial emotion, mostly because of the absence of standardized methods of investigation.
Facial micro-expressions are short and imperceptible expressions that involuntarily reveal the true emotions a person may be attempting to suppress, hide, disguise, or conceal. Such expressions can reflect a person's real emotions and have a wide range of applications in public safety and clinical diagnosis. The analysis of facial micro-expressions in video sequences through computer vision is still relatively recent. In this research, a comprehensive review of the databases and methods used for micro-expression spotting and recognition is conducted, and advanced technologies in this area are summarized. In addition, we discuss challenges that remain unresolved, alongside future work to be completed in the field of micro-expression analysis.
The fine-grained ship image recognition task aims to identify various classes of ships. However, small inter-class differences, large intra-class differences, and a lack of training samples make the task difficult. Therefore, to enhance the accuracy of fine-grained ship image recognition, we design a fine-grained ship image recognition network based on a bilinear convolutional neural network (BCNN) with Inception and additive margin Softmax (AM-Softmax). This network improves the BCNN in two respects. Firstly, introducing Inception branches into the BCNN helps enhance the ability to extract comprehensive features from ships. Secondly, by adding margin values to the decision boundary, the AM-Softmax function better extends inter-class differences and reduces intra-class differences. In addition, as there are few publicly available datasets for fine-grained ship image recognition, we construct a Ship-43 dataset containing 47,300 ship images belonging to 43 categories. Experimental results on the constructed Ship-43 dataset demonstrate that our method can effectively improve the accuracy of ship image recognition, which is 4.08% higher than the BCNN model. Moreover, comparison results on three other public fine-grained datasets (CUB, Cars, and Aircraft) further validate the effectiveness of the proposed method.
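A sketch of the AM-Softmax loss named above: logits are cosine similarities between L2-normalized features and class weights, a margin m is subtracted from the target-class cosine, and a scale s sharpens the distribution. The s and m values below are common defaults, not necessarily the paper's.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AMSoftmaxLoss(nn.Module):
    def __init__(self, feat_dim, n_classes, s=30.0, m=0.35):
        super().__init__()
        self.s, self.m = s, m
        self.weight = nn.Parameter(torch.randn(n_classes, feat_dim))

    def forward(self, feats, labels):
        # Cosine logits: normalized features against normalized class weights.
        cos = F.linear(F.normalize(feats), F.normalize(self.weight))
        onehot = F.one_hot(labels, cos.size(1)).float()
        logits = self.s * (cos - self.m * onehot)   # margin only on the true class
        return F.cross_entropy(logits, labels)

feats = torch.randn(16, 512)                  # e.g., pooled BCNN features
labels = torch.randint(0, 43, (16,))          # 43 ship categories
print(AMSoftmaxLoss(512, 43)(feats, labels))
```

The subtracted margin forces each sample's cosine to its own class to exceed the others by at least m, which is what widens inter-class gaps and tightens intra-class clusters.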
A deep fusion model is proposed for a facial expression-based human-computer interaction system. Initially, image preprocessing, i.e., extraction of the facial region from the input image, is performed. Thereafter, more discriminative and distinctive deep learning features are extracted from the facial regions. To prevent overfitting, in-depth features of the facial images are extracted and assigned to the proposed convolutional neural network (CNN) models, and the various CNN models are then trained. Finally, the outputs of the CNN models are fused to obtain the final decision over the seven basic classes of facial expressions, i.e., fear, disgust, anger, surprise, sadness, happiness, and neutral. For experimental purposes, three benchmark datasets, i.e., SFEW, CK+, and KDEF, are utilized. The performance of the proposed system is compared with some state-of-the-art methods on each dataset. Extensive performance analysis reveals that the proposed system outperforms the competitive methods in terms of various performance metrics. Finally, the proposed deep fusion model is used to control a music player using the recognized emotions of the users.
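A sketch of one common form of the decision-level fusion step described above: the softmax outputs of several independently trained CNNs are averaged to produce the final seven-class expression decision. Equal weighting is an assumption; the paper's fusion rule may differ.

```python
import torch
import torch.nn as nn

def fuse_predictions(models, x):
    """Average class probabilities across models, return per-sample class ids."""
    with torch.no_grad():
        probs = torch.stack([m(x).softmax(dim=1) for m in models]).mean(dim=0)
    return probs.argmax(dim=1)

# Stand-in CNNs; in practice these would be the trained models.
models = [nn.Sequential(nn.Flatten(), nn.Linear(3 * 48 * 48, 7)) for _ in range(3)]
x = torch.randn(4, 3, 48, 48)               # batch of cropped face regions
print(fuse_predictions(models, x))          # tensor of predicted expression ids
```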