Due to the lack of long-range association and spatial location information,fine details and accurate boundaries of complex clothing images cannot always be obtained by using the existing deep learning-based methods.Th...Due to the lack of long-range association and spatial location information,fine details and accurate boundaries of complex clothing images cannot always be obtained by using the existing deep learning-based methods.This paper presents a convolutional structure with multi-scale fusion to optimize the step of clothing feature extraction and a self-attention module to capture long-range association information.The structure enables the self-attention mechanism to directly participate in the process of information exchange through the down-scaling projection operation of the multi-scale framework.In addition,the improved self-attention module introduces the extraction of 2-dimensional relative position information to make up for its lack of ability to extract spatial position features from clothing images.The experimental results based on the colorful fashion parsing dataset(CFPD)show that the proposed network structure achieves 53.68%mean intersection over union(mIoU)and has better performance on the clothing parsing task.展开更多
Fault diagnosis is important for maintaining the safety and effectiveness of chemical process.Considering the multivariate,nonlinear,and dynamic characteristic of chemical process,many time-series-based data-driven fa...Fault diagnosis is important for maintaining the safety and effectiveness of chemical process.Considering the multivariate,nonlinear,and dynamic characteristic of chemical process,many time-series-based data-driven fault diagnosis methods have been developed in recent years.However,the existing methods have the problem of long-term dependency and are difficult to train due to the sequential way of training.To overcome these problems,a novel fault diagnosis method based on time-series and the hierarchical multihead self-attention(HMSAN)is proposed for chemical process.First,a sliding window strategy is adopted to construct the normalized time-series dataset.Second,the HMSAN is developed to extract the time-relevant features from the time-series process data.It improves the basic self-attention model in both width and depth.With the multihead structure,the HMSAN can pay attention to different aspects of the complicated chemical process and obtain the global dynamic features.However,the multiple heads in parallel lead to redundant information,which cannot improve the diagnosis performance.With the hierarchical structure,the redundant information is reduced and the deep local time-related features are further extracted.Besides,a novel many-to-one training strategy is introduced for HMSAN to simplify the training procedure and capture the long-term dependency.Finally,the effectiveness of the proposed method is demonstrated by two chemical cases.The experimental results show that the proposed method achieves a great performance on time-series industrial data and outperforms the state-of-the-art approaches.展开更多
False data injection attack(FDIA)can affect the state estimation of the power grid by tampering with the measured value of the power grid data,and then destroying the stable operation of the smart grid.Existing work u...False data injection attack(FDIA)can affect the state estimation of the power grid by tampering with the measured value of the power grid data,and then destroying the stable operation of the smart grid.Existing work usually trains a detection model by fusing the data-driven features from diverse power data streams.Data-driven features,however,cannot effectively capture the differences between noisy data and attack samples.As a result,slight noise disturbances in the power grid may cause a large number of false detections for FDIA attacks.To address this problem,this paper designs a deep collaborative self-attention network to achieve robust FDIA detection,in which the spatio-temporal features of cascaded FDIA attacks are fully integrated.Firstly,a high-order Chebyshev polynomials-based graph convolution module is designed to effectively aggregate the spatio information between grid nodes,and the spatial self-attention mechanism is involved to dynamically assign attention weights to each node,which guides the network to pay more attention to the node information that is conducive to FDIA detection.Furthermore,the bi-directional Long Short-Term Memory(LSTM)network is introduced to conduct time series modeling and long-term dependence analysis for power grid data and utilizes the temporal selfattention mechanism to describe the time correlation of data and assign different weights to different time steps.Our designed deep collaborative network can effectively mine subtle perturbations from spatiotemporal feature information,efficiently distinguish power grid noise from FDIA attacks,and adapt to diverse attack intensities.Extensive experiments demonstrate that our method can obtain an efficient detection performance over actual load data from New York Independent System Operator(NYISO)in IEEE 14,IEEE 39,and IEEE 118 bus systems,and outperforms state-of-the-art FDIA detection schemes in terms of detection accuracy and robustness.展开更多
With the development of short video industry,video and bullet screen have become important ways to spread public opinions.Public attitudes can be timely obtained through emotional analysis on bullet screen,which can a...With the development of short video industry,video and bullet screen have become important ways to spread public opinions.Public attitudes can be timely obtained through emotional analysis on bullet screen,which can also reduce difficulties in management of online public opinions.A convolutional neural network model based on multi-head attention is proposed to solve the problem of how to effectively model relations among words and identify key words in emotion classification tasks with short text contents and lack of complete context information.Firstly,encode word positions so that order information of input sequences can be used by the model.Secondly,use a multi-head attention mechanism to obtain semantic expressions in different subspaces,effectively capture internal relevance and enhance dependent relationships among words,as well as highlight emotional weights of key emotional words.Then a dilated convolution is used to increase the receptive field and extract more features.On this basis,the above multi-attention mechanism is combined with a convolutional neural network to model and analyze the seven emotional categories of bullet screens.Testing from perspectives of model and dataset,experimental results can validate effectiveness of our approach.Finally,emotions of bullet screens are visualized to provide data supports for hot event controls and other fields.展开更多
Keyphrase greatly provides summarized and valuable information.This information can help us not only understand text semantics,but also organize and retrieve text content effectively.The task of automatically generati...Keyphrase greatly provides summarized and valuable information.This information can help us not only understand text semantics,but also organize and retrieve text content effectively.The task of automatically generating it has received considerable attention in recent decades.From the previous studies,we can see many workable solutions for obtaining keyphrases.One method is to divide the content to be summarized into multiple blocks of text,then we rank and select the most important content.The disadvantage of this method is that it cannot identify keyphrase that does not include in the text,let alone get the real semantic meaning hidden in the text.Another approach uses recurrent neural networks to generate keyphrases from the semantic aspects of the text,but the inherently sequential nature precludes parallelization within training examples,and distances have limitations on context dependencies.Previous works have demonstrated the benefits of the self-attention mechanism,which can learn global text dependency features and can be parallelized.Inspired by the above observation,we propose a keyphrase generation model,which is based entirely on the self-attention mechanism.It is an encoder-decoder model that can make up the above disadvantage effectively.In addition,we also consider the semantic similarity between keyphrases,and add semantic similarity processing module into the model.This proposed model,which is demonstrated by empirical analysis on five datasets,can achieve competitive performance compared to baseline methods.展开更多
Due to the rapid evolution of Advanced Persistent Threats(APTs)attacks,the emergence of new and rare attack samples,and even those never seen before,make it challenging for traditional rule-based detection methods to ...Due to the rapid evolution of Advanced Persistent Threats(APTs)attacks,the emergence of new and rare attack samples,and even those never seen before,make it challenging for traditional rule-based detection methods to extract universal rules for effective detection.With the progress in techniques such as transfer learning and meta-learning,few-shot network attack detection has progressed.However,challenges in few-shot network attack detection arise from the inability of time sequence flow features to adapt to the fixed length input requirement of deep learning,difficulties in capturing rich information from original flow in the case of insufficient samples,and the challenge of high-level abstract representation.To address these challenges,a few-shot network attack detection based on NFHP(Network Flow Holographic Picture)-RN(ResNet)is proposed.Specifically,leveraging inherent properties of images such as translation invariance,rotation invariance,scale invariance,and illumination invariance,network attack traffic features and contextual relationships are intuitively represented in NFHP.In addition,an improved RN network model is employed for high-level abstract feature extraction,ensuring that the extracted high-level abstract features maintain the detailed characteristics of the original traffic behavior,regardless of changes in background traffic.Finally,a meta-learning model based on the self-attention mechanism is constructed,achieving the detection of novel APT few-shot network attacks through the empirical generalization of high-level abstract feature representations of known-class network attack behaviors.Experimental results demonstrate that the proposed method can learn high-level abstract features of network attacks across different traffic detail granularities.Comparedwith state-of-the-artmethods,it achieves favorable accuracy,precision,recall,and F1 scores for the identification of unknown-class network attacks through cross-validation onmultiple datasets.展开更多
Nowadays,with the rapid development of industrial Internet technology,on the one hand,advanced industrial control systems(ICS)have improved industrial production efficiency.However,there are more and more cyber-attack...Nowadays,with the rapid development of industrial Internet technology,on the one hand,advanced industrial control systems(ICS)have improved industrial production efficiency.However,there are more and more cyber-attacks targeting industrial control systems.To ensure the security of industrial networks,intrusion detection systems have been widely used in industrial control systems,and deep neural networks have always been an effective method for identifying cyber attacks.Current intrusion detection methods still suffer from low accuracy and a high false alarm rate.Therefore,it is important to build a more efficient intrusion detection model.This paper proposes a hybrid deep learning intrusion detection method based on convolutional neural networks and bidirectional long short-term memory neural networks(CNN-BiLSTM).To address the issue of imbalanced data within the dataset and improve the model’s detection capabilities,the Synthetic Minority Over-sampling Technique-Edited Nearest Neighbors(SMOTE-ENN)algorithm is applied in the preprocessing phase.This algorithm is employed to generate synthetic instances for the minority class,simultaneously mitigating the impact of noise in the majority class.This approach aims to create a more equitable distribution of classes,thereby enhancing the model’s ability to effectively identify patterns in both minority and majority classes.In the experimental phase,the detection performance of the method is verified using two data sets.Experimental results show that the accuracy rate on the CICIDS-2017 data set reaches 97.7%.On the natural gas pipeline dataset collected by Lan Turnipseed from Mississippi State University in the United States,the accuracy rate also reaches 85.5%.展开更多
Due to their robust learning and expression ability for complex features,the deep learning(DL)model plays a vital role in bearing fault diagnosis.However,since there are fewer labeled samples in fault diagnosis,the de...Due to their robust learning and expression ability for complex features,the deep learning(DL)model plays a vital role in bearing fault diagnosis.However,since there are fewer labeled samples in fault diagnosis,the depth of DL models in fault diagnosis is generally shallower than that of DL models in other fields,which limits the diagnostic performance.To solve this problem,a novel transfer residual Swin Transformer(RST)is proposed for rolling bearings in this paper.RST has 24 residual self-attention layers,which use the hierarchical design and the shifted window-based residual self-attention.Combined with transfer learning techniques,the transfer RST model uses pre-trained parameters from ImageNet.A new end-to-end method for fault diagnosis based on deep transfer RST is proposed.Firstly,wavelet transform transforms the vibration signal into a wavelet time-frequency diagram.The signal’s time-frequency domain representation can be represented simultaneously.Secondly,the wavelet time-frequency diagram is the input of the RST model to obtain the fault type.Finally,our method is verified on public and self-built datasets.Experimental results show the superior performance of our method by comparing it with a shallow neural network.展开更多
Emotional electroencephalography(EEG)signals are a primary means of recording emotional brain activity.Currently,the most effective methods for analyzing emotional EEG signals involve feature engineering and neural ne...Emotional electroencephalography(EEG)signals are a primary means of recording emotional brain activity.Currently,the most effective methods for analyzing emotional EEG signals involve feature engineering and neural networks.However,neural networks possess a strong ability for automatic feature extraction.Is it possible to discard feature engineering and directly employ neural networks for end-to-end recognition?Based on the characteristics of EEG signals,this paper proposes an end-to-end feature extraction and classification method for a dynamic self-attention network(DySAT).The study reveals significant differences in brain activity patterns associated with different emotions across various experimenters and time periods.The results of this experiment can provide insights into the reasons behind these differences.展开更多
The haze weather environment leads to the deterioration of the visual effect of the image,and it is difficult to carry out the work of the advanced vision task.Therefore,dehazing the haze image is an important step be...The haze weather environment leads to the deterioration of the visual effect of the image,and it is difficult to carry out the work of the advanced vision task.Therefore,dehazing the haze image is an important step before the execution of the advanced vision task.Traditional dehazing algorithms achieve image dehazing by improving image brightness and contrast or constructing artificial priors such as color attenuation priors and dark channel priors.However,the effect is unstable when dealing with complex scenes.In the method based on convolutional neural network,the image dehazing network of the encoding and decoding structure does not consider the difference before and after the dehazing image,and the image spatial information is lost in the encoding stage.In order to overcome these problems,this paper proposes a novel end-to-end two-stream convolutional neural network for single-image dehazing.The network model is composed of a spatial information feature stream and a highlevel semantic feature stream.The spatial information feature stream retains the detailed information of the dehazing image,and the high-level semantic feature stream extracts the multi-scale structural features of the dehazing image.A spatial information auxiliary module is designed and placed between the feature streams.This module uses the attention mechanism to construct a unified expression of different types of information and realizes the gradual restoration of the clear image with the semantic information auxiliary spatial information in the dehazing network.A parallel residual twicing module is proposed,which performs dehazing on the difference information of features at different stages to improve the model’s ability to discriminate haze images.The peak signal-to-noise ratio(PSNR)and structural similarity are used to quantitatively evaluate the similarity between the dehazing results of each algorithm and the original image.The structure similarity and PSNR of the method in this paper reached 0.852 and 17.557dB on the HazeRD dataset,which were higher than existing comparison algorithms.On the SOTS dataset,the indicators are 0.955 and 27.348dB,which are sub-optimal results.In experiments with real haze images,this method can also achieve excellent visual restoration effects.The experimental results show that the model proposed in this paper can restore desired visual effects without fog images,and it also has good generalization performance in real haze scenes.展开更多
Circular RNAs(circRNAs)are RNAs with closed circular structure involved in many biological processes by key interactions with RNA binding proteins(RBPs).Existing methods for predicting these interactions have limitati...Circular RNAs(circRNAs)are RNAs with closed circular structure involved in many biological processes by key interactions with RNA binding proteins(RBPs).Existing methods for predicting these interactions have limitations in feature learning.In view of this,we propose a method named circ2CBA,which uses only sequence information of circRNAs to predict circRNA-RBP binding sites.We have constructed a data set which includes eight sub-datasets.First,circ2CBA encodes circRNA sequences using the one-hot method.Next,a two-layer convolutional neural network(CNN)is used to initially extract the features.After CNN,circ2CBA uses a layer of bidirectional long and short-term memory network(BiLSTM)and the self-attention mechanism to learn the features.The AUC value of circ2CBA reaches 0.8987.Comparison of circ2CBA with other three methods on our data set and an ablation experiment confirm that circ2CBA is an effective method to predict the binding sites between circRNAs and RBPs.展开更多
LIDAR point cloud-based 3D object detection aims to sense the surrounding environment by anchoring objects with the Bounding Box(BBox).However,under the three-dimensional space of autonomous driving scenes,the previou...LIDAR point cloud-based 3D object detection aims to sense the surrounding environment by anchoring objects with the Bounding Box(BBox).However,under the three-dimensional space of autonomous driving scenes,the previous object detection methods,due to the pre-processing of the original LIDAR point cloud into voxels or pillars,lose the coordinate information of the original point cloud,slow detection speed,and gain inaccurate bounding box positioning.To address the issues above,this study proposes a new two-stage network structure to extract point cloud features directly by PointNet++,which effectively preserves the original point cloud coordinate information.To improve the detection accuracy,a shell-based modeling method is proposed.It roughly determines which spherical shell the coordinates belong to.Then,the results are refined to ground truth,thereby narrowing the localization range and improving the detection accuracy.To improve the recall of 3D object detection with bounding boxes,this paper designs a self-attention module for 3D object detection with a skip connection structure.Some of these features are highlighted by weighting them on the feature dimensions.After training,it makes the feature weights that are favorable for object detection get larger.Thus,the extracted features are more adapted to the object detection task.Extensive comparison experiments and ablation experiments conducted on the KITTI dataset verify the effectiveness of our proposed method in improving recall and precision.展开更多
The production data in the industrialfield have the characteristics of multimodality,high dimensionality and large correlation differences between attributes.Existing data prediction methods cannot effectively capture ...The production data in the industrialfield have the characteristics of multimodality,high dimensionality and large correlation differences between attributes.Existing data prediction methods cannot effectively capture time series and modal features,which leads to prediction hysteresis and poor prediction stabil-ity.Aiming at the above problems,this paper proposes a time-series and modal fea-tureenhancementmethodbasedonadual-stageself-attentionmechanism(DATT),and a time series prediction method based on a gated feedforward recurrent unit(GFRU).On this basis,the DATT-GFRU neural network with a gated feedforward recurrent neural network and dual-stage self-attention mechanism is designed and implemented.Experiments show that the prediction effect of the neural network prediction model based on DATT is significantly improved.Compared with the traditional prediction model,the DATT-GFRU neural network has a smaller aver-age error of model prediction results,stable prediction performance,and strong generalization ability on the three datasets with different numbers of attributes and different training sample sizes.展开更多
To fully make use of information from different representation subspaces,a multi-head attention-based long short-term memory(LSTM)model is proposed in this study for speech emotion recognition(SER).The proposed model ...To fully make use of information from different representation subspaces,a multi-head attention-based long short-term memory(LSTM)model is proposed in this study for speech emotion recognition(SER).The proposed model uses frame-level features and takes the temporal information of emotion speech as the input of the LSTM layer.Here,a multi-head time-dimension attention(MHTA)layer was employed to linearly project the output of the LSTM layer into different subspaces for the reduced-dimension context vectors.To provide relative vital information from other dimensions,the output of MHTA,the output of feature-dimension attention,and the last time-step output of LSTM were utilized to form multiple context vectors as the input of the fully connected layer.To improve the performance of multiple vectors,feature-dimension attention was employed for the all-time output of the first LSTM layer.The proposed model was evaluated on the eNTERFACE and GEMEP corpora,respectively.The results indicate that the proposed model outperforms LSTM by 14.6%and 10.5%for eNTERFACE and GEMEP,respectively,proving the effectiveness of the proposed model in SER tasks.展开更多
Pedestrian attribute recognition is often considered as a multi-label image classification task. In order to make full use of attribute-related location information, a saliency guided self-attention network(SGSA-Net) ...Pedestrian attribute recognition is often considered as a multi-label image classification task. In order to make full use of attribute-related location information, a saliency guided self-attention network(SGSA-Net) was proposed to weakly supervise attribute localization, without annotations of attribute-related regions. Saliency priors were integrated into the spatial attention module(SAM). Meanwhile, channel-wise attention and spatial attention were introduced into the network. Moreover, a weighted binary cross-entropy loss(WCEL) function was employed to handle the imbalance of training data. Extensive experiments on richly annotated pedestrian(RAP) and pedestrian attribute(PETA) datasets demonstrated that SGSA-Net outperformed other state-of-the-art methods.展开更多
Purpose-Clothing patterns play a dominant role in costume design and have become an important link in the perception of costume art.Conventional clothing patterns design relies on experienced designers.Although the qu...Purpose-Clothing patterns play a dominant role in costume design and have become an important link in the perception of costume art.Conventional clothing patterns design relies on experienced designers.Although the quality of clothing patterns is very high on conventional design,the input time and output amount ratio is relative low for conventional design.In order to break through the bottleneck of conventional clothing patterns design,this paper proposes a novel way based on generative adversarial network(GAN)model for automatic clothing patterns generation,which not only reduces the dependence of experienced designer,but also improve the input-output ratio.Design/methodology/approach-In view of the fact that clothing patterns have high requirements for global artistic perception and local texture details,this paper improves the conventional GAN model from two aspects:a multi-scales discriminators strategy is introduced to deal with the local texture details;and the selfattention mechanism is introduced to improve the global artistic perception.Therefore,the improved GAN called multi-scales self-attention improved generative adversarial network(MS-SA-GAN)model,which is used for high resolution clothing patterns generation.Findings-To verify the feasibility and effectiveness of the proposed MS-SA-GAN model,a crawler is designed to acquire standard clothing patterns dataset from Baidu pictures,and a comparative experiment is conducted on our designed clothing patterns dataset.In experiments,we have adjusted different parameters of the proposed MS-SA-GAN model,and compared the global artistic perception and local texture details of the generated clothing patterns.Originality/value-Experimental results have shown that the clothing patterns generated by the proposed MS-SA-GANmodel are superior to the conventional algorithms in some local texture detail indexes.In addition,a group of clothing design professionals is invited to evaluate the global artistic perception through a valencearousal scale.The scale results have shown that the proposed MS-SA-GAN model achieves a better global art perception.展开更多
With rapid economic development,the per capita ownership of automobiles in our country has begun to rise year by year.More researchers have paid attention to using scientific methods to solve traffic flow problems.Tra...With rapid economic development,the per capita ownership of automobiles in our country has begun to rise year by year.More researchers have paid attention to using scientific methods to solve traffic flow problems.Traffic flow prediction is not simply affected by the number of vehicles,but also contains various complex factors,such as time,road conditions,and people flow.However,the existing methods ignore the complexity of road conditions and the correlation between individual nodes,which leads to the poor performance.In this study,a deep learning model SAMGCN is proposed to effectively capture the correlation between individual nodes to improve the performance of traffic flow prediction.First,the theory of spatiotemporal decoupling is used to divide each time of each node into finer particles.Second,multimodule fusion is used to mine the potential periodic relationships in the data.Finally,GRU is used to obtain the potential time relationship of the three modules.Extensive experiments were conducted on two traffic flow datasets,PeMS04 and PeMS08 in the Caltrans Performance Measurement System to prove the validity of the proposed model.展开更多
The game of Tibetan Go faces the scarcity of expert knowledge and research literature.Therefore,we study the zero learning model of Tibetan Go under limited computing power resources and propose a novel scaleinvariant...The game of Tibetan Go faces the scarcity of expert knowledge and research literature.Therefore,we study the zero learning model of Tibetan Go under limited computing power resources and propose a novel scaleinvariant U-Net style two-headed output lightweight network TibetanGoTinyNet.The lightweight convolutional neural networks and capsule structure are applied to the encoder and decoder of TibetanGoTinyNet to reduce computational burden and achieve better feature extraction results.Several autonomous self-attention mechanisms are integrated into TibetanGoTinyNet to capture the Tibetan Go board’s spatial and global information and select important channels.The training data are generated entirely from self-play games.TibetanGoTinyNet achieves 62%–78%winning rate against other four U-Net style models including Res-UNet,Res-UNet Attention,Ghost-UNet,and Ghost Capsule-UNet.It also achieves 75%winning rate in the ablation experiments on the attention mechanism with embedded positional information.The model saves about 33%of the training time with 45%–50%winning rate for different Monte–Carlo tree search(MCTS)simulation counts when migrated from 9×9 to 11×11 boards.Code for our model is available at https://github.com/paulzyy/TibetanGoTinyNet.展开更多
Methanol-to-olefins,as a promising non-oil pathway for the synthesis of light olefins,has been successfully industrialized.The accurate prediction of process variables can yield significant benefits for advanced proce...Methanol-to-olefins,as a promising non-oil pathway for the synthesis of light olefins,has been successfully industrialized.The accurate prediction of process variables can yield significant benefits for advanced process control and optimization.The challenge of this task is underscored by the failure of traditional methods in capturing the complex characteristics of industrial processes,such as high nonlinearities,dynamics,and data distribution shift caused by diverse operating conditions.In this paper,we propose a novel hybrid spatial-temporal deep learning prediction model to address these issues.Firstly,a unique data normalization technique called reversible instance normalization is employed to solve the problem of different data distributions.Subsequently,convolutional neural network integrated with the self-attention mechanism are utilized to extract the temporal patterns.Meanwhile,a multi-graph convolutional network is leveraged to model the spatial interactions.Afterward,the extracted temporal and spatial features are fused as input into a fully connected neural network to complete the prediction.Finally,the outputs are denormalized to obtain the ultimate results.The monitoring results of the dynamic trends of process variables in an actual industrial methanol-to-olefins process demonstrate that our model not only achieves superior prediction performance but also can reveal complex spatial-temporal relationships using the learned attention matrices and adjacency matrices,making the model more interpretable.Lastly,this model is deployed onto an end-to-end Industrial Internet Platform,which achieves effective practical results.展开更多
Recently, facial-expression recognition (FER)has primarily focused on images in the wild, includingfactors such as face occlusion and image blurring, ratherthan laboratory images. Complex field environmentshave introd...Recently, facial-expression recognition (FER)has primarily focused on images in the wild, includingfactors such as face occlusion and image blurring, ratherthan laboratory images. Complex field environmentshave introduced new challenges to FER. To addressthese challenges, this study proposes a cross-fusion dualattention network. The network comprises three parts:(1) a cross-fusion grouped dual-attention mechanism torefine local features and obtain global information;(2) aproposed C2 activation function construction method,which is a piecewise cubic polynomial with threedegrees of freedom, requiring less computation withimproved flexibility and recognition abilities, whichcan better address slow running speeds and neuroninactivation problems;and (3) a closed-loop operationbetween the self-attention distillation process andresidual connections to suppress redundant informationand improve the generalization ability of the model.The recognition accuracies on the RAF-DB, FERPlus,and AffectNet datasets were 92.78%, 92.02%, and63.58%, respectively. Experiments show that this modelcan provide more effective solutions for FER tasks.展开更多
文摘Due to the lack of long-range association and spatial location information,fine details and accurate boundaries of complex clothing images cannot always be obtained by using the existing deep learning-based methods.This paper presents a convolutional structure with multi-scale fusion to optimize the step of clothing feature extraction and a self-attention module to capture long-range association information.The structure enables the self-attention mechanism to directly participate in the process of information exchange through the down-scaling projection operation of the multi-scale framework.In addition,the improved self-attention module introduces the extraction of 2-dimensional relative position information to make up for its lack of ability to extract spatial position features from clothing images.The experimental results based on the colorful fashion parsing dataset(CFPD)show that the proposed network structure achieves 53.68%mean intersection over union(mIoU)and has better performance on the clothing parsing task.
基金supported by the National Natural Science Foundation of China(62073140,62073141)the Shanghai Rising-Star Program(21QA1401800).
文摘Fault diagnosis is important for maintaining the safety and effectiveness of chemical process.Considering the multivariate,nonlinear,and dynamic characteristic of chemical process,many time-series-based data-driven fault diagnosis methods have been developed in recent years.However,the existing methods have the problem of long-term dependency and are difficult to train due to the sequential way of training.To overcome these problems,a novel fault diagnosis method based on time-series and the hierarchical multihead self-attention(HMSAN)is proposed for chemical process.First,a sliding window strategy is adopted to construct the normalized time-series dataset.Second,the HMSAN is developed to extract the time-relevant features from the time-series process data.It improves the basic self-attention model in both width and depth.With the multihead structure,the HMSAN can pay attention to different aspects of the complicated chemical process and obtain the global dynamic features.However,the multiple heads in parallel lead to redundant information,which cannot improve the diagnosis performance.With the hierarchical structure,the redundant information is reduced and the deep local time-related features are further extracted.Besides,a novel many-to-one training strategy is introduced for HMSAN to simplify the training procedure and capture the long-term dependency.Finally,the effectiveness of the proposed method is demonstrated by two chemical cases.The experimental results show that the proposed method achieves a great performance on time-series industrial data and outperforms the state-of-the-art approaches.
基金supported in part by the Research Fund of Guangxi Key Lab of Multi-Source Information Mining&Security(MIMS21-M-02).
文摘False data injection attack(FDIA)can affect the state estimation of the power grid by tampering with the measured value of the power grid data,and then destroying the stable operation of the smart grid.Existing work usually trains a detection model by fusing the data-driven features from diverse power data streams.Data-driven features,however,cannot effectively capture the differences between noisy data and attack samples.As a result,slight noise disturbances in the power grid may cause a large number of false detections for FDIA attacks.To address this problem,this paper designs a deep collaborative self-attention network to achieve robust FDIA detection,in which the spatio-temporal features of cascaded FDIA attacks are fully integrated.Firstly,a high-order Chebyshev polynomials-based graph convolution module is designed to effectively aggregate the spatio information between grid nodes,and the spatial self-attention mechanism is involved to dynamically assign attention weights to each node,which guides the network to pay more attention to the node information that is conducive to FDIA detection.Furthermore,the bi-directional Long Short-Term Memory(LSTM)network is introduced to conduct time series modeling and long-term dependence analysis for power grid data and utilizes the temporal selfattention mechanism to describe the time correlation of data and assign different weights to different time steps.Our designed deep collaborative network can effectively mine subtle perturbations from spatiotemporal feature information,efficiently distinguish power grid noise from FDIA attacks,and adapt to diverse attack intensities.Extensive experiments demonstrate that our method can obtain an efficient detection performance over actual load data from New York Independent System Operator(NYISO)in IEEE 14,IEEE 39,and IEEE 118 bus systems,and outperforms state-of-the-art FDIA detection schemes in terms of detection accuracy and robustness.
基金National Natural Science Foundation of China(No.61562057)Gansu Science and Technology Plan Project(No.18JR3RA104)。
文摘With the development of short video industry,video and bullet screen have become important ways to spread public opinions.Public attitudes can be timely obtained through emotional analysis on bullet screen,which can also reduce difficulties in management of online public opinions.A convolutional neural network model based on multi-head attention is proposed to solve the problem of how to effectively model relations among words and identify key words in emotion classification tasks with short text contents and lack of complete context information.Firstly,encode word positions so that order information of input sequences can be used by the model.Secondly,use a multi-head attention mechanism to obtain semantic expressions in different subspaces,effectively capture internal relevance and enhance dependent relationships among words,as well as highlight emotional weights of key emotional words.Then a dilated convolution is used to increase the receptive field and extract more features.On this basis,the above multi-attention mechanism is combined with a convolutional neural network to model and analyze the seven emotional categories of bullet screens.Testing from perspectives of model and dataset,experimental results can validate effectiveness of our approach.Finally,emotions of bullet screens are visualized to provide data supports for hot event controls and other fields.
文摘Keyphrase greatly provides summarized and valuable information.This information can help us not only understand text semantics,but also organize and retrieve text content effectively.The task of automatically generating it has received considerable attention in recent decades.From the previous studies,we can see many workable solutions for obtaining keyphrases.One method is to divide the content to be summarized into multiple blocks of text,then we rank and select the most important content.The disadvantage of this method is that it cannot identify keyphrase that does not include in the text,let alone get the real semantic meaning hidden in the text.Another approach uses recurrent neural networks to generate keyphrases from the semantic aspects of the text,but the inherently sequential nature precludes parallelization within training examples,and distances have limitations on context dependencies.Previous works have demonstrated the benefits of the self-attention mechanism,which can learn global text dependency features and can be parallelized.Inspired by the above observation,we propose a keyphrase generation model,which is based entirely on the self-attention mechanism.It is an encoder-decoder model that can make up the above disadvantage effectively.In addition,we also consider the semantic similarity between keyphrases,and add semantic similarity processing module into the model.This proposed model,which is demonstrated by empirical analysis on five datasets,can achieve competitive performance compared to baseline methods.
基金supported by the National Natural Science Foundation of China(Nos.U19A208162202320)+2 种基金the Fundamental Research Funds for the Central Universities(No.SCU2023D008)the Science and Engineering Connotation Development Project of Sichuan University(No.2020SCUNG129)the Key Laboratory of Data Protection and Intelligent Management(Sichuan University),Ministry of Education.
文摘Due to the rapid evolution of Advanced Persistent Threats(APTs)attacks,the emergence of new and rare attack samples,and even those never seen before,make it challenging for traditional rule-based detection methods to extract universal rules for effective detection.With the progress in techniques such as transfer learning and meta-learning,few-shot network attack detection has progressed.However,challenges in few-shot network attack detection arise from the inability of time sequence flow features to adapt to the fixed length input requirement of deep learning,difficulties in capturing rich information from original flow in the case of insufficient samples,and the challenge of high-level abstract representation.To address these challenges,a few-shot network attack detection based on NFHP(Network Flow Holographic Picture)-RN(ResNet)is proposed.Specifically,leveraging inherent properties of images such as translation invariance,rotation invariance,scale invariance,and illumination invariance,network attack traffic features and contextual relationships are intuitively represented in NFHP.In addition,an improved RN network model is employed for high-level abstract feature extraction,ensuring that the extracted high-level abstract features maintain the detailed characteristics of the original traffic behavior,regardless of changes in background traffic.Finally,a meta-learning model based on the self-attention mechanism is constructed,achieving the detection of novel APT few-shot network attacks through the empirical generalization of high-level abstract feature representations of known-class network attack behaviors.Experimental results demonstrate that the proposed method can learn high-level abstract features of network attacks across different traffic detail granularities.Comparedwith state-of-the-artmethods,it achieves favorable accuracy,precision,recall,and F1 scores for the identification of unknown-class network attacks through cross-validation onmultiple datasets.
基金support from the Liaoning Province Nature Fund Project(No.2022-MS-291)the Scientific Research Project of Liaoning Province Education Department(LJKMZ20220781,LJKMZ20220783,LJKQZ20222457,JYTMS20231488).
文摘Nowadays,with the rapid development of industrial Internet technology,on the one hand,advanced industrial control systems(ICS)have improved industrial production efficiency.However,there are more and more cyber-attacks targeting industrial control systems.To ensure the security of industrial networks,intrusion detection systems have been widely used in industrial control systems,and deep neural networks have always been an effective method for identifying cyber attacks.Current intrusion detection methods still suffer from low accuracy and a high false alarm rate.Therefore,it is important to build a more efficient intrusion detection model.This paper proposes a hybrid deep learning intrusion detection method based on convolutional neural networks and bidirectional long short-term memory neural networks(CNN-BiLSTM).To address the issue of imbalanced data within the dataset and improve the model’s detection capabilities,the Synthetic Minority Over-sampling Technique-Edited Nearest Neighbors(SMOTE-ENN)algorithm is applied in the preprocessing phase.This algorithm is employed to generate synthetic instances for the minority class,simultaneously mitigating the impact of noise in the majority class.This approach aims to create a more equitable distribution of classes,thereby enhancing the model’s ability to effectively identify patterns in both minority and majority classes.In the experimental phase,the detection performance of the method is verified using two data sets.Experimental results show that the accuracy rate on the CICIDS-2017 data set reaches 97.7%.On the natural gas pipeline dataset collected by Lan Turnipseed from Mississippi State University in the United States,the accuracy rate also reaches 85.5%.
基金supported in part by the National Natural Science Foundation of China(General Program)under Grants 62073193 and 61873333in part by the National Key Research and Development Project(General Program)under Grant 2020YFE0204900in part by the Key Research and Development Plan of Shandong Province(General Program)under Grant 2021CXGC010204.
文摘Due to their robust learning and expression ability for complex features,the deep learning(DL)model plays a vital role in bearing fault diagnosis.However,since there are fewer labeled samples in fault diagnosis,the depth of DL models in fault diagnosis is generally shallower than that of DL models in other fields,which limits the diagnostic performance.To solve this problem,a novel transfer residual Swin Transformer(RST)is proposed for rolling bearings in this paper.RST has 24 residual self-attention layers,which use the hierarchical design and the shifted window-based residual self-attention.Combined with transfer learning techniques,the transfer RST model uses pre-trained parameters from ImageNet.A new end-to-end method for fault diagnosis based on deep transfer RST is proposed.Firstly,wavelet transform transforms the vibration signal into a wavelet time-frequency diagram.The signal’s time-frequency domain representation can be represented simultaneously.Secondly,the wavelet time-frequency diagram is the input of the RST model to obtain the fault type.Finally,our method is verified on public and self-built datasets.Experimental results show the superior performance of our method by comparing it with a shallow neural network.
文摘Emotional electroencephalography(EEG)signals are a primary means of recording emotional brain activity.Currently,the most effective methods for analyzing emotional EEG signals involve feature engineering and neural networks.However,neural networks possess a strong ability for automatic feature extraction.Is it possible to discard feature engineering and directly employ neural networks for end-to-end recognition?Based on the characteristics of EEG signals,this paper proposes an end-to-end feature extraction and classification method for a dynamic self-attention network(DySAT).The study reveals significant differences in brain activity patterns associated with different emotions across various experimenters and time periods.The results of this experiment can provide insights into the reasons behind these differences.
基金supported by the National Natural Science Foundationof China under Grant No. 61803061, 61906026Innovation research groupof universities in Chongqing+4 种基金the Chongqing Natural Science Foundationunder Grant cstc2020jcyj-msxmX0577, cstc2020jcyj-msxmX0634“Chengdu-Chongqing Economic Circle” innovation funding of Chongqing Municipal Education Commission KJCXZD2020028the Science andTechnology Research Program of Chongqing Municipal Education Commission grants KJQN202000602Ministry of Education China MobileResearch Fund (MCM 20180404)Special key project of Chongqingtechnology innovation and application development: cstc2019jscxzdztzx0068.
文摘The haze weather environment leads to the deterioration of the visual effect of the image,and it is difficult to carry out the work of the advanced vision task.Therefore,dehazing the haze image is an important step before the execution of the advanced vision task.Traditional dehazing algorithms achieve image dehazing by improving image brightness and contrast or constructing artificial priors such as color attenuation priors and dark channel priors.However,the effect is unstable when dealing with complex scenes.In the method based on convolutional neural network,the image dehazing network of the encoding and decoding structure does not consider the difference before and after the dehazing image,and the image spatial information is lost in the encoding stage.In order to overcome these problems,this paper proposes a novel end-to-end two-stream convolutional neural network for single-image dehazing.The network model is composed of a spatial information feature stream and a highlevel semantic feature stream.The spatial information feature stream retains the detailed information of the dehazing image,and the high-level semantic feature stream extracts the multi-scale structural features of the dehazing image.A spatial information auxiliary module is designed and placed between the feature streams.This module uses the attention mechanism to construct a unified expression of different types of information and realizes the gradual restoration of the clear image with the semantic information auxiliary spatial information in the dehazing network.A parallel residual twicing module is proposed,which performs dehazing on the difference information of features at different stages to improve the model’s ability to discriminate haze images.The peak signal-to-noise ratio(PSNR)and structural similarity are used to quantitatively evaluate the similarity between the dehazing results of each algorithm and the original image.The structure similarity and PSNR of the method in this paper reached 0.852 and 17.557dB on the HazeRD dataset,which were higher than existing comparison algorithms.On the SOTS dataset,the indicators are 0.955 and 27.348dB,which are sub-optimal results.In experiments with real haze images,this method can also achieve excellent visual restoration effects.The experimental results show that the model proposed in this paper can restore desired visual effects without fog images,and it also has good generalization performance in real haze scenes.
基金supported by the National Natural Science Foundation of China(Grant Nos.61972451,61902230)the Fundamental Research Funds for the Central Universities,Shaanxi Normal University(GK202103091)。
文摘Circular RNAs(circRNAs)are RNAs with closed circular structure involved in many biological processes by key interactions with RNA binding proteins(RBPs).Existing methods for predicting these interactions have limitations in feature learning.In view of this,we propose a method named circ2CBA,which uses only sequence information of circRNAs to predict circRNA-RBP binding sites.We have constructed a data set which includes eight sub-datasets.First,circ2CBA encodes circRNA sequences using the one-hot method.Next,a two-layer convolutional neural network(CNN)is used to initially extract the features.After CNN,circ2CBA uses a layer of bidirectional long and short-term memory network(BiLSTM)and the self-attention mechanism to learn the features.The AUC value of circ2CBA reaches 0.8987.Comparison of circ2CBA with other three methods on our data set and an ablation experiment confirm that circ2CBA is an effective method to predict the binding sites between circRNAs and RBPs.
基金This work was supported,in part,by the National Nature Science Foundation of China under grant numbers 62272236in part,by the Natural Science Foundation of Jiangsu Province under grant numbers BK20201136,BK20191401in part,by the Priority Academic Program Development of Jiangsu Higher Education Institutions(PAPD)fund.
文摘LIDAR point cloud-based 3D object detection aims to sense the surrounding environment by anchoring objects with the Bounding Box(BBox).However,under the three-dimensional space of autonomous driving scenes,the previous object detection methods,due to the pre-processing of the original LIDAR point cloud into voxels or pillars,lose the coordinate information of the original point cloud,slow detection speed,and gain inaccurate bounding box positioning.To address the issues above,this study proposes a new two-stage network structure to extract point cloud features directly by PointNet++,which effectively preserves the original point cloud coordinate information.To improve the detection accuracy,a shell-based modeling method is proposed.It roughly determines which spherical shell the coordinates belong to.Then,the results are refined to ground truth,thereby narrowing the localization range and improving the detection accuracy.To improve the recall of 3D object detection with bounding boxes,this paper designs a self-attention module for 3D object detection with a skip connection structure.Some of these features are highlighted by weighting them on the feature dimensions.After training,it makes the feature weights that are favorable for object detection get larger.Thus,the extracted features are more adapted to the object detection task.Extensive comparison experiments and ablation experiments conducted on the KITTI dataset verify the effectiveness of our proposed method in improving recall and precision.
基金This work is financially supported by:The National Key R&D Program of China(No.2020YFB1712600)The Fundamental Research Funds for Central University(No.3072022QBZ0601)+1 种基金The National Natural Science Foundation of China(No.62272126)The National Natural Science Foundation of China(No.61872104).
文摘The production data in the industrialfield have the characteristics of multimodality,high dimensionality and large correlation differences between attributes.Existing data prediction methods cannot effectively capture time series and modal features,which leads to prediction hysteresis and poor prediction stabil-ity.Aiming at the above problems,this paper proposes a time-series and modal fea-tureenhancementmethodbasedonadual-stageself-attentionmechanism(DATT),and a time series prediction method based on a gated feedforward recurrent unit(GFRU).On this basis,the DATT-GFRU neural network with a gated feedforward recurrent neural network and dual-stage self-attention mechanism is designed and implemented.Experiments show that the prediction effect of the neural network prediction model based on DATT is significantly improved.Compared with the traditional prediction model,the DATT-GFRU neural network has a smaller aver-age error of model prediction results,stable prediction performance,and strong generalization ability on the three datasets with different numbers of attributes and different training sample sizes.
基金The National Natural Science Foundation of China(No.61571106,61633013,61673108,81871444).
文摘To fully make use of information from different representation subspaces,a multi-head attention-based long short-term memory(LSTM)model is proposed in this study for speech emotion recognition(SER).The proposed model uses frame-level features and takes the temporal information of emotion speech as the input of the LSTM layer.Here,a multi-head time-dimension attention(MHTA)layer was employed to linearly project the output of the LSTM layer into different subspaces for the reduced-dimension context vectors.To provide relative vital information from other dimensions,the output of MHTA,the output of feature-dimension attention,and the last time-step output of LSTM were utilized to form multiple context vectors as the input of the fully connected layer.To improve the performance of multiple vectors,feature-dimension attention was employed for the all-time output of the first LSTM layer.The proposed model was evaluated on the eNTERFACE and GEMEP corpora,respectively.The results indicate that the proposed model outperforms LSTM by 14.6%and 10.5%for eNTERFACE and GEMEP,respectively,proving the effectiveness of the proposed model in SER tasks.
基金supported by the National Natural Science Foundation of China (41874173)。
文摘Pedestrian attribute recognition is often considered as a multi-label image classification task. In order to make full use of attribute-related location information, a saliency guided self-attention network(SGSA-Net) was proposed to weakly supervise attribute localization, without annotations of attribute-related regions. Saliency priors were integrated into the spatial attention module(SAM). Meanwhile, channel-wise attention and spatial attention were introduced into the network. Moreover, a weighted binary cross-entropy loss(WCEL) function was employed to handle the imbalance of training data. Extensive experiments on richly annotated pedestrian(RAP) and pedestrian attribute(PETA) datasets demonstrated that SGSA-Net outperformed other state-of-the-art methods.
基金This paper is supported by university fund project of Hubei Institute of Fine Arts,named“The construction of blended teaching mode based on flipped classroom-Taking the Course of“Fashion Painting Illustration”as an Example.”(No.202028)。
文摘Purpose-Clothing patterns play a dominant role in costume design and have become an important link in the perception of costume art.Conventional clothing patterns design relies on experienced designers.Although the quality of clothing patterns is very high on conventional design,the input time and output amount ratio is relative low for conventional design.In order to break through the bottleneck of conventional clothing patterns design,this paper proposes a novel way based on generative adversarial network(GAN)model for automatic clothing patterns generation,which not only reduces the dependence of experienced designer,but also improve the input-output ratio.Design/methodology/approach-In view of the fact that clothing patterns have high requirements for global artistic perception and local texture details,this paper improves the conventional GAN model from two aspects:a multi-scales discriminators strategy is introduced to deal with the local texture details;and the selfattention mechanism is introduced to improve the global artistic perception.Therefore,the improved GAN called multi-scales self-attention improved generative adversarial network(MS-SA-GAN)model,which is used for high resolution clothing patterns generation.Findings-To verify the feasibility and effectiveness of the proposed MS-SA-GAN model,a crawler is designed to acquire standard clothing patterns dataset from Baidu pictures,and a comparative experiment is conducted on our designed clothing patterns dataset.In experiments,we have adjusted different parameters of the proposed MS-SA-GAN model,and compared the global artistic perception and local texture details of the generated clothing patterns.Originality/value-Experimental results have shown that the clothing patterns generated by the proposed MS-SA-GANmodel are superior to the conventional algorithms in some local texture detail indexes.In addition,a group of clothing design professionals is invited to evaluate the global artistic perception through a valencearousal scale.The scale results have shown that the proposed MS-SA-GAN model achieves a better global art perception.
基金supported by the National Key R&D Program of China under Grant No.2020YFB1710200the National Natural Science Foundation of China under Grant No.61872105 and No.62072136.
文摘With rapid economic development,the per capita ownership of automobiles in our country has begun to rise year by year.More researchers have paid attention to using scientific methods to solve traffic flow problems.Traffic flow prediction is not simply affected by the number of vehicles,but also contains various complex factors,such as time,road conditions,and people flow.However,the existing methods ignore the complexity of road conditions and the correlation between individual nodes,which leads to the poor performance.In this study,a deep learning model SAMGCN is proposed to effectively capture the correlation between individual nodes to improve the performance of traffic flow prediction.First,the theory of spatiotemporal decoupling is used to divide each time of each node into finer particles.Second,multimodule fusion is used to mine the potential periodic relationships in the data.Finally,GRU is used to obtain the potential time relationship of the three modules.Extensive experiments were conducted on two traffic flow datasets,PeMS04 and PeMS08 in the Caltrans Performance Measurement System to prove the validity of the proposed model.
基金the National Natural Science Foundation of China(Nos.62276285 and 62236011)the Major Projects of Social Science Fundation of China(No.20&ZD279)。
文摘The game of Tibetan Go faces the scarcity of expert knowledge and research literature.Therefore,we study the zero learning model of Tibetan Go under limited computing power resources and propose a novel scaleinvariant U-Net style two-headed output lightweight network TibetanGoTinyNet.The lightweight convolutional neural networks and capsule structure are applied to the encoder and decoder of TibetanGoTinyNet to reduce computational burden and achieve better feature extraction results.Several autonomous self-attention mechanisms are integrated into TibetanGoTinyNet to capture the Tibetan Go board’s spatial and global information and select important channels.The training data are generated entirely from self-play games.TibetanGoTinyNet achieves 62%–78%winning rate against other four U-Net style models including Res-UNet,Res-UNet Attention,Ghost-UNet,and Ghost Capsule-UNet.It also achieves 75%winning rate in the ablation experiments on the attention mechanism with embedded positional information.The model saves about 33%of the training time with 45%–50%winning rate for different Monte–Carlo tree search(MCTS)simulation counts when migrated from 9×9 to 11×11 boards.Code for our model is available at https://github.com/paulzyy/TibetanGoTinyNet.
基金the National Natural Science Foundation of China(Grant No.21991093)the Strategic Priority Research Program of Chinese Academy of Sciences(Grant No.XDA29050200)+1 种基金the Dalian Institute of Chemical Physics(DICP I202135)the Energy Science and Technology Revolution Project(Grant No.E2010412).
文摘Methanol-to-olefins,as a promising non-oil pathway for the synthesis of light olefins,has been successfully industrialized.The accurate prediction of process variables can yield significant benefits for advanced process control and optimization.The challenge of this task is underscored by the failure of traditional methods in capturing the complex characteristics of industrial processes,such as high nonlinearities,dynamics,and data distribution shift caused by diverse operating conditions.In this paper,we propose a novel hybrid spatial-temporal deep learning prediction model to address these issues.Firstly,a unique data normalization technique called reversible instance normalization is employed to solve the problem of different data distributions.Subsequently,convolutional neural network integrated with the self-attention mechanism are utilized to extract the temporal patterns.Meanwhile,a multi-graph convolutional network is leveraged to model the spatial interactions.Afterward,the extracted temporal and spatial features are fused as input into a fully connected neural network to complete the prediction.Finally,the outputs are denormalized to obtain the ultimate results.The monitoring results of the dynamic trends of process variables in an actual industrial methanol-to-olefins process demonstrate that our model not only achieves superior prediction performance but also can reveal complex spatial-temporal relationships using the learned attention matrices and adjacency matrices,making the model more interpretable.Lastly,this model is deployed onto an end-to-end Industrial Internet Platform,which achieves effective practical results.
基金supported in part by the National Natural Science Foundation of China under Grant Nos.62272281 and 62007017the Special Funds for Taishan Scholars Project under Grant No.tsqn202306274Youth Innovation Technology Project of the Higher School in Shandong Province under Grant No.2019KJN042.
文摘Recently, facial-expression recognition (FER)has primarily focused on images in the wild, includingfactors such as face occlusion and image blurring, ratherthan laboratory images. Complex field environmentshave introduced new challenges to FER. To addressthese challenges, this study proposes a cross-fusion dualattention network. The network comprises three parts:(1) a cross-fusion grouped dual-attention mechanism torefine local features and obtain global information;(2) aproposed C2 activation function construction method,which is a piecewise cubic polynomial with threedegrees of freedom, requiring less computation withimproved flexibility and recognition abilities, whichcan better address slow running speeds and neuroninactivation problems;and (3) a closed-loop operationbetween the self-attention distillation process andresidual connections to suppress redundant informationand improve the generalization ability of the model.The recognition accuracies on the RAF-DB, FERPlus,and AffectNet datasets were 92.78%, 92.02%, and63.58%, respectively. Experiments show that this modelcan provide more effective solutions for FER tasks.