Numerous works have shown that existing neighbor-averaging graph neural networks (GNNs) cannot efficiently capture structural features, and many others show that injecting structure, distance, position, or spatial features can significantly improve GNN performance. However, injecting high-level structure and distance into GNNs is an intuitive but unexplored idea. This work addresses the issue and proposes a scheme that enhances graph attention networks (GATs) by encoding distance and hop-wise structure statistics. First, hop-wise structure and distributional distance information are extracted from several hop-wise ego-nets of every target node. Second, the derived structure information, distance information, and intrinsic features are encoded into the same vector space and added together to form the initial embedding vectors. Third, the embedding vectors are fed into GATs, such as GAT and the adaptive graph diffusion network (AGDN), to obtain soft labels. Fourth, the soft labels are fed into correct and smooth (C&S) for label propagation, yielding the final predictions. Experiments show that the distance and hop-wise structure encoding enhanced graph attention networks (DHSEGATs) achieve competitive results.
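The first step of the scheme above (extracting hop-wise information from the ego-nets of a target node) can be illustrated with a minimal sketch. The abstract does not say which statistics are computed, so the node and edge counts per hop used here are an assumption, as are the function names and the toy graph.

```python
from collections import deque

def hop_distances(adj, source, max_hops):
    """Breadth-first search: hop count from `source` to every node
    reachable within `max_hops` hops."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand beyond the largest ego-net
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist

def hopwise_stats(adj, source, max_hops):
    """For k = 1..max_hops, return (node count, edge count) of the k-hop ego-net.
    These two statistics are illustrative choices, not the paper's."""
    dist = hop_distances(adj, source, max_hops)
    stats = []
    for k in range(1, max_hops + 1):
        nodes = {n for n, d in dist.items() if d <= k}
        # Each undirected edge is seen from both endpoints, hence the // 2.
        edges = sum(1 for u in nodes for v in adj[u] if v in nodes) // 2
        stats.append((len(nodes), edges))
    return stats

# Toy undirected graph: triangle 0-1-2 with a pendant node 3 attached to 2.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
stats = hopwise_stats(adj, 0, 2)
```

The hop distances returned by `hop_distances` are also the raw material for a distance encoding; how the paper turns either quantity into embedding vectors is not described in the abstract.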
Automatic text summarization (ATS) plays a significant role in natural language processing (NLP). Abstractive summarization produces summaries by identifying and compressing the most important information in a document. However, only a few comprehensively evaluated abstractive summarization models work well for specific types of reports, because such reports are unstructured and written in colloquial language. In particular, Chinese complaint reports, generated by urban complainants and collected by government employees, describe problems residents face in daily life, and the reported problems require a speedy response. Automatic summarization tasks have therefore been developed for these reports. However, as with traditional summarization models, the generated summaries still suffer from problems of informativeness and conciseness. To address these issues and generate suitably informative and less redundant summaries, a topic-based abstractive summarization method is proposed to obtain global and local features. Additionally, a heterogeneous graph of the original document is constructed using word-level and topic-level features. Experiments and analyses on public review datasets (Yelp and Amazon) and our constructed dataset (Chinese complaint reports) show that the proposed framework effectively improves the performance of the abstractive summarization model for Chinese complaint reports.
Referring expression comprehension is the task of locating the image region described by a natural language expression, which refers to the properties of the region or its relationships with other regions. Most previous work handles this problem by selecting the most relevant regions from a set of candidate regions; these methods are inefficient when the set contains many candidates. Inspired by the recent success of deep learning methods in image captioning, this paper proposes a framework that understands referring expressions through multiple steps of reasoning. We present a model for referring expression comprehension that selects the most relevant region directly from the image. The core of the model is a recurrent attention network, which can be seen as an extension of the Memory Network. The proposed model is capable of improving its results through multiple computational hops. We evaluate it on two referring expression datasets: Visual Genome and Flickr30k Entities. The experimental results demonstrate that the proposed model outperforms previous state-of-the-art methods in both accuracy and efficiency. We also conduct an ablation experiment, which shows that the model's performance does not keep improving as the number of attention layers increases.
Social robot accounts controlled by artificial intelligence or humans are active in social networks and have a negative impact on network security and social life. Existing social robot detection methods based on graph neural networks struggle with the large number of social network nodes and the complexity of their relationships, which make it difficult to accurately describe the differences between the topological relations of nodes and result in low detection accuracy. This paper proposes a social robot detection method that uses an improved neural network. First, social relationship subgraphs are constructed from the user's social network to disentangle intricate social relationships effectively. Then, a linear modulated graph attention residual network model is devised to extract the node and network topology features of the social relationship subgraph, thereby generating comprehensive subgraph features; the model's feature-wise linear modulation module helps it learn the differences between nodes. Next, user text content and behavioral gene sequences are extracted to construct social behavioral features, which are combined with the social relationship subgraph features. Finally, social robots are identified more accurately by combining user behavioral and relationship features. In experiments on the publicly available datasets TwiBot-20 and Cresci-15, the proposed method achieves detection accuracies of 86.73% and 97.86%, respectively. Compared with existing mainstream approaches, its accuracy is 2.2% and 1.35% higher on the two datasets. The results show that the proposed method can effectively detect social robots and help maintain a healthy social network ecosystem.
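Feature-wise linear modulation, mentioned in the abstract above, applies a per-channel scale and shift to a feature vector. The sketch below shows only that core operation. In a real model the scale `gamma` and shift `beta` would be produced by a learned conditioning network; here they are fixed, hypothetical values, and how the paper's model generates them is not stated in the abstract.

```python
def film(features, gamma, beta):
    """Feature-wise linear modulation: scale and shift each feature channel."""
    if not (len(features) == len(gamma) == len(beta)):
        raise ValueError("features, gamma and beta must have the same length")
    return [g * x + b for x, g, b in zip(features, gamma, beta)]

node_features = [1.0, 2.0, 3.0]
gamma = [2.0, 0.0, 1.0]   # hypothetical per-channel scales
beta = [0.5, 1.0, -1.0]   # hypothetical per-channel shifts
modulated = film(node_features, gamma, beta)
```

A scale of zero suppresses a channel entirely, which is one way such a module can emphasize or mute features that distinguish one node from another.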
Bone age assessment (BAA) helps doctors determine how a child's bones grow and develop in clinical medicine. Traditional BAA methods rely on clinician expertise, which makes predictions time-consuming and results inaccurate. Most deep learning-based BAA methods feed extracted key points of the images into the network, which requires additional annotations; this is costly and subjective. To address these problems, we propose a multi-scale attentional densely connected network (MSADCN). MSADCN constructs a multi-scale dense connectivity mechanism that can avoid overfitting, obtain local features effectively, and prevent vanishing gradients even with limited training data. First, MSADCN builds multi-scale structures into the densely connected network to extract fine-grained features at different scales. Then, coordinate attention is embedded to focus on critical features and automatically locate the regions of interest (ROI) without additional annotation. In addition, to improve the model's generalization, transfer learning is applied: the proposed MSADCN is trained on the public dataset IMDB-WIKI, and the resulting pre-trained weights are then applied to the Radiological Society of North America (RSNA) dataset. Finally, label distribution learning (LDL) and expectation regression are introduced to exploit the correlation between hand bone images of different ages, which yields stable age estimates. Extensive experiments confirm that our model converges more efficiently and obtains a mean absolute error (MAE) of 4.64 months, outperforming some state-of-the-art BAA methods.
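Label distribution learning with expectation regression, as named in the abstract above, can be sketched as follows: a scalar age label is spread into a discrete distribution over candidate ages, and the age estimate is read back as the mean of a distribution. The Gaussian shape, the width `sigma`, and the 0 to 240 month range are assumptions made for illustration; the abstract does not specify them.

```python
import math

def gaussian_label_distribution(true_age, ages, sigma):
    """Turn a scalar age into a normalised discrete distribution over `ages`."""
    weights = [math.exp(-((a - true_age) ** 2) / (2 * sigma ** 2)) for a in ages]
    total = sum(weights)
    return [w / total for w in weights]

def expected_age(probs, ages):
    """Expectation regression: the age estimate is the mean of the distribution."""
    return sum(p * a for p, a in zip(probs, ages))

ages = list(range(0, 241))  # candidate bone ages in months (assumed range)
target = gaussian_label_distribution(120, ages, sigma=6.0)
estimate = expected_age(target, ages)
```

In training, a network's softmax output would take the place of `target` in `expected_age`. Neighboring ages receive non-zero probability, which is how the approach exploits the similarity between images of nearby ages.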
The fluctuation of wind power affects the operating safety and power consumption of the electric power grid and restricts the large-scale grid connection of wind power. Wind power forecasting therefore plays a key role in improving the safety and economic benefits of the power grid. This paper proposes a wind power prediction method based on a convolutional graph attention deep neural network with multi-wind farm data. Based on the graph attention network and the attention mechanism, the method extracts spatial-temporal characteristics from the data of multiple wind farms. Then, combined with a deep neural network, a convolutional graph attention deep neural network model is constructed. Finally, the model is trained with the quantile regression loss function to achieve deterministic and probabilistic wind power prediction based on multi-wind farm spatial-temporal data. A wind power dataset from the U.S. is used to demonstrate the efficacy of the proposed model. Compared with the selected baseline methods, the proposed model achieves the best prediction performance. The point prediction errors, i.e., root mean square error (RMSE) and normalized mean absolute percentage error (NMAPE), are 0.304 MW and 1.177%, respectively, and the comprehensive performance of probabilistic prediction, i.e., the continuously ranked probability score (CRPS), is 0.580. These results indicate the value of multi-wind farm data and of the spatial-temporal feature extraction module.
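The quantile regression loss mentioned above is commonly written as the pinball loss, sketched below in plain Python. This is the standard textbook form; the exact variant, the quantile levels, and the sample values used here are assumptions rather than details taken from the paper.

```python
def pinball_loss(y_true, y_pred, q):
    """Mean quantile (pinball) loss for quantile level q in (0, 1)."""
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    total = 0.0
    for y, y_hat in zip(y_true, y_pred):
        error = y - y_hat
        # Under-prediction is weighted by q, over-prediction by (1 - q).
        total += max(q * error, (q - 1.0) * error)
    return total / len(y_true)

# Hypothetical wind power values in MW.
under = pinball_loss([10.0], [8.0], q=0.9)   # prediction too low
over = pinball_loss([10.0], [12.0], q=0.9)   # prediction too high
```

At `q = 0.9` an under-prediction costs nine times as much as an over-prediction of the same size, so minimizing the loss pushes the output toward the 90th percentile. Training one output per quantile level gives a probabilistic forecast, and `q = 0.5` reduces to half the mean absolute error, which serves as a point forecast.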
We present a novel approach for predicting crystal material properties that is distinct from computationally complex and expensive density functional theory (DFT)-based calculations. Instead, we use an attention-based graph neural network that yields high-accuracy predictions. Our approach employs two attention mechanisms for message passing on the crystal graphs, which enable the model to selectively attend to pertinent atoms and their local environments, thereby improving performance. We conduct comprehensive experiments to validate our approach, and they demonstrate that our method surpasses existing methods in predictive accuracy. Our results suggest that deep learning, particularly attention-based networks, holds significant promise for predicting crystal material properties, with implications for material discovery and refined intelligent systems.
In recent years, convolutional neural networks (CNNs) for single image super-resolution (SISR) have become more and more complex, and improving SISR performance has become more challenging. In contrast, reference image guided super-resolution (RefSR) is an effective strategy for boosting super-resolution (SR) performance. In RefSR, the introduced high-resolution (HR) references can facilitate the high-frequency residual prediction process. To the best of our knowledge, existing CNN-based RefSR methods treat the features from the references and the low-resolution (LR) input equally by simply concatenating them. However, the HR references and the LR inputs contribute differently to the final SR results. We therefore propose a progressive channel attention network (PCANet) for RefSR. This paper makes two technical contributions. First, we propose a novel channel attention module (CAM), which estimates the channel weighting parameter by a weighted average of the spatial features instead of global averaging. Second, considering that the residual prediction process can be improved when the LR input is enriched with more details, we perform super-resolution progressively, which takes advantage of the reference images at multiple scales. Extensive quantitative and qualitative evaluations on three benchmark datasets, which represent three typical scenarios for RefSR, demonstrate that our method is superior to state-of-the-art SISR and RefSR methods in terms of peak signal-to-noise ratio (PSNR) and structural similarity (SSIM).
Continuous sign language recognition (CSLR) is challenging due to the complexity of video backgrounds, hand gesture variability, and temporal modeling difficulties. This work proposes a CSLR method based on a spatial-temporal graph attention network to focus on essential features of video series. The method considers local details of sign language movements by taking joint and bone information as inputs and constructing a spatial-temporal graph to reflect inter-frame relevance and physical connections between nodes. A graph-based multi-head attention mechanism is used with adjacency matrix calculation for better local-feature exploration, and short-term motion correlation is modeled by a temporal convolutional network. We adopt a bidirectional long short-term memory (BLSTM) network to learn long-term dependence and connectionist temporal classification to align the word-level sequences. The proposed method achieves competitive results: a word error rate of 1.59% on the Chinese Sign Language dataset and a mean Jaccard Index of 65.78% on the ChaLearn LAP Continuous Gesture Dataset.
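The word error rate reported above is conventionally the word-level edit distance between the recognized sequence and the reference, divided by the reference length. The sketch below implements that standard definition; whitespace tokenization and the example sentences are assumptions, and the paper's own evaluation script may differ in detail.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / len(reference),
    computed with the Levenshtein distance over word tokens."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)

wer = word_error_rate("i want to eat", "i want eat")  # one deleted word
```

Because insertions count as errors, the rate can exceed 100% when the hypothesis is much longer than the reference.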
Accurate detection of pipeline leakage is essential to maintain the safety of pipeline transportation. Recently, deep learning (DL) has emerged as a promising tool for pipeline leakage detection (PLD). However, most existing DL methods have difficulty identifying leakage types well because of the complex temporal dynamics of pipeline data. In addition, the initial parameters of the detection model are generally selected at random, which may lead to unstable recognition performance. For this reason, this paper presents a hybrid DL framework, referred to as the parameter-optimized recurrent attention network (PRAN), to improve the accuracy of PLD. First, a parameter-optimized long short-term memory (LSTM) network is introduced to extract effective and robust features; it uses a particle swarm optimization (PSO) algorithm with a cross-entropy fitness function to search for globally optimal parameters. With this framework, the representation learning capability of the model is improved and convergence is accelerated. Moreover, an anomaly-attention mechanism (AM) is proposed to discover class-discriminative information by weighting the hidden states, which amplifies the discrepancy between normal and abnormal samples and further improves the accuracy of PLD. As a result, the proposed PRAN not only optimizes the network parameters adaptively but also enlarges the contribution of the normal-abnormal discrepancy, thereby overcoming the drawbacks of instability and poor generalization. Finally, the experimental results demonstrate the effectiveness and superiority of the proposed PRAN for PLD.
Tumour segmentation in medical images (especially 3D tumour segmentation) is highly challenging due to the possible similarity between tumours and adjacent tissues, the occurrence of multiple tumours, and variable tumour shapes and sizes. Popular deep learning-based segmentation algorithms generally rely on the convolutional neural network (CNN) and the Transformer. The former cannot extract global image features effectively, while the latter lacks inductive bias and involves complicated computation for 3D volume data. Existing hybrid CNN-Transformer networks provide only limited performance improvement, or even poorer segmentation performance than a pure CNN. To address these issues, a short-term and long-term memory self-attention network is proposed. First, a distinctive self-attention block uses the Transformer to explore the correlation among the region features at different levels extracted by the CNN. Then, the memory structure filters and combines this information to exclude similar regions and detect multiple tumours. Finally, multi-layer reconstruction blocks predict the tumour boundaries. Experimental results demonstrate that our method outperforms other methods in both subjective visual and quantitative evaluation. Compared with the most competitive method, the proposed method achieves a Dice score of 82.4% vs. 76.6% and a 95% Hausdorff distance (HD95) of 10.66 vs. 11.54 mm on KiTS19, as well as a Dice score of 80.2% vs. 78.4% and an HD95 of 9.632 vs. 12.17 mm on LiTS.
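The Dice score quoted above measures the overlap between a predicted mask and the ground-truth mask. A minimal sketch of the standard definition on flattened binary masks follows; the toy masks and the handling of two empty masks are assumptions, not details from the paper.

```python
def dice_score(pred, target):
    """Dice = 2|P ∩ T| / (|P| + |T|) for flat binary masks (sequences of 0/1)."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    size_sum = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Convention (an assumption): two empty masks count as a perfect match.
    return 1.0 if size_sum == 0 else 2.0 * intersection / size_sum

pred = [1, 1, 0, 0, 1, 0]     # hypothetical predicted tumour voxels
target = [1, 0, 0, 1, 1, 0]   # hypothetical ground-truth voxels
score = dice_score(pred, target)
```

Here two of the three predicted voxels are correct and two of the three true voxels are found, giving a Dice score of 2/3.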
In computer vision, object recognition and image categorization have proven to be difficult challenges. They have nevertheless produced solutions to a wide range of difficult problems in a variety of fields. Convolutional neural networks (CNNs) have recently been identified as the most widely proposed deep learning (DL) algorithms in the literature. CNNs have delivered state-of-the-art results, particularly in image classification, speech recognition, and video processing. However, training a CNN demands a large amount of data, which is in short supply, especially in the medical field, and as a result the training process takes longer. To address these difficulties, this paper describes an attention-aware CNN architecture for classifying chest X-ray images to diagnose pneumonia. Attention modules provide attention-aware features to the attention network, and the attention-aware features of the various modules change as the layers become deeper. Using a bottom-up top-down feedforward structure, the feedforward and feedback attention processes are integrated into a single feedforward process inside each attention module. In the present work, a deep neural network (DNN) is combined with an attention mechanism to predict pneumonia from chest X-ray images. To produce attention-aware features, the proposed network was built by merging channel and spatial attention modules into the DNN architecture. We evaluated this network on a publicly available Kaggle chest X-ray dataset, and extensive testing was carried out to validate the proposed model. In the experiments, we attained an accuracy of 95.47% and an F-score of 0.92, indicating that the proposed model outperformed the baseline models.
The natural language to SQL (NL2SQL) task is an emerging research area that aims to transform natural language input over a given database into an SQL query. Earlier approaches processed the input into a heterogeneous graph. However, previous models failed to distinguish the types of multi-hop connections in the heterogeneous graph and therefore tended to ignore crucial semantic path information. To this end, a two-layer attention network is presented to focus on essential neighbor nodes and mine informative semantic paths for feature encoding. Weighted edges are introduced for schema linking to connect nodes with semantic similarity. In the decoding phase, a rule-based pruning strategy is used to refine the generated SQL queries. The experimental results show that the approach learns a good encoding representation and decodes it into results with practical meaning.
Accurate pancreas segmentation is critical for the diagnosis and management of diseases of the pancreas. Precisely delineating the pancreas is challenging because of its high variation in volume, shape, and location. In recent years, coarse-to-fine methods have been widely used to alleviate the class imbalance issue and improve pancreas segmentation accuracy. However, cascaded methods can be computationally intensive, and the refined results depend significantly on the performance of the coarse segmentation. To balance segmentation accuracy and computational efficiency, we propose a Discriminative Feature Attention Network for pancreas segmentation that effectively highlights pancreas features and improves segmentation accuracy without explicit pancreas localization. The final segmentation is obtained by applying a simple yet effective post-processing step. Two experiments, on the public NIH pancreas CT dataset and the abdominal BTCV multi-organ dataset, are conducted separately to show the effectiveness of our method for 2D pancreas segmentation. On the NIH dataset, we obtained an average Dice Similarity Coefficient (DSC) of 82.82 ± 6.09%, an average Jaccard Index (JI) of 71.13 ± 8.30%, and an average Symmetric Average Surface Distance (ASD) of 1.69 ± 0.83 mm. Compared with existing deep learning-based pancreas segmentation methods, our method achieves the best average DSC and JI values.
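For readers comparing the two overlap metrics reported above, the standard definitions and the relation between them are given here, with P the predicted mask and G the ground-truth mask (symbols introduced for illustration only):

```latex
\mathrm{DSC}(P, G) = \frac{2\,|P \cap G|}{|P| + |G|}, \qquad
\mathrm{JI}(P, G) = \frac{|P \cap G|}{|P \cup G|}, \qquad
\mathrm{JI} = \frac{\mathrm{DSC}}{2 - \mathrm{DSC}}
```

The last identity follows from |P ∪ G| = |P| + |G| − |P ∩ G| and holds for each case individually. It need not hold exactly for averages over cases: 0.8282 / (2 − 0.8282) ≈ 0.707, slightly below the reported average JI of 71.13%. That direction is expected, because the mapping from DSC to JI is convex, so the mean of the per-case JI values is at least the JI computed from the mean DSC.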
PM2.5 concentration prediction is of great significance to environmental protection and human health, and accurate prediction of PM2.5 concentration has become an important research task. However, PM2.5 pollutants can spread through the earth's atmosphere, so that different cities influence one another. To effectively capture the air pollution relationships between cities, this paper proposes a novel spatiotemporal model for PM2.5 concentration prediction, named GAT-GRU, that combines a graph attention network (GAT) and a gated recurrent unit (GRU). Specifically, GAT learns the spatial dependence of PM2.5 concentration data across cities, and GRU extracts the temporal dependence of the long-term data series. The proposed model integrates the learned spatial and temporal dependencies to capture long-term, complex spatiotemporal features. Because air pollution is related to a city's meteorological conditions, knowledge acquired from meteorological data is used in the model to enhance PM2.5 prediction performance; the input of the GAT-GRU model thus consists of PM2.5 concentration data and meteorological data. To verify the effectiveness of the proposed GAT-GRU model, experiments are conducted on real-world datasets against other baselines. Experimental results show that our model achieves excellent performance in PM2.5 concentration prediction.
Traffic flow prediction is an important part of the intelligent transportation system. Accurate multi-step traffic flow prediction plays an important role in improving the operational efficiency of the traffic network. Since traffic flow data exhibit complex spatio-temporal correlation and non-linearity, existing prediction methods mainly combine a Graph Convolutional Network (GCN) with a recurrent neural network. This combination strategy performs well in traffic prediction tasks. However, multi-step prediction error accumulates as the number of predicted steps grows. Some scholars use multiple sampling sequences to achieve more accurate predictions, but this demands more capable hardware and multiplies the training time. Considering the spatio-temporal correlation of traffic flow and the influence of external factors, we propose an Attention Based Spatio-Temporal Graph Convolutional Network considering External Factors (ABSTGCN-EF) for multi-step traffic flow prediction. The model treats traffic flow as diffusion on a directed graph and extracts its spatial characteristics through GCN. We add time-slot attention to the encoder-decoder to form an Attention Encoder Network (AEN) that handles temporal correlation. The attention vector is used as a competitive choice to capture the correlation between predicted states and historical states. We consider the impact of three external factors (daytime, weekdays, and traffic accident markers) on the traffic flow prediction task. Experiments on two public datasets show that considering external factors is worthwhile. The prediction performance of our ABSTGCN-EF model is 7.2%–8.7% higher than that of the state-of-the-art baselines.
With the Jiuhong Modern Agriculture Demonstration Park of Heilongjiang Province as the base for rice disease image acquisition, a total of 841 images of four different diseases (rice blast, stripe leaf blight, red blight, and bacterial brown spot) were obtained. In this study, an interleaved attention neural network (IANN) was proposed for recognizing rice disease images. An interleaved group convolutions (IGC) network was introduced to reduce the number of convolutional parameters while enabling information interaction between channels. Based on the convolutional block attention module (CBAM), attention was applied to the output features of the primary group convolution within the cross-group convolution to improve the classification performance of the deep learning model. The results showed that the classification accuracy of IANN was 96.14%, which was 4.72% higher than that of the classical convolutional neural network (CNN). This study offers a new idea for the efficient training of neural networks with small samples and provides a reference for the image recognition and diagnosis of rice and other crop diseases.
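The parameter saving that group convolutions provide, mentioned in the abstract above, can be checked with simple arithmetic. The sketch below counts the weights of a standard convolution and of a pair of group convolutions arranged in the interleaved style (a spatial convolution with few groups followed by a 1×1 convolution with many groups). The channel count, kernel size, and group numbers are hypothetical and are not taken from the paper.

```python
def conv_params(c_in, c_out, kernel, groups=1):
    """Weight count of a 2D convolution (bias ignored)."""
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    # Each group maps c_in/groups input channels to c_out/groups output channels.
    return (c_in // groups) * (c_out // groups) * kernel * kernel * groups

standard = conv_params(64, 64, 3)                # ordinary 3x3 convolution
primary = conv_params(64, 64, 3, groups=2)       # 3x3 group convolution
secondary = conv_params(64, 64, 1, groups=32)    # 1x1 group convolution
interleaved_total = primary + secondary
```

With these assumed sizes the pair uses 18,560 weights against 36,864 for the standard layer, roughly half. The channel permutation between the two group convolutions, which is what lets information flow across groups, adds no parameters.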
Lung nodules vary greatly in structural properties such as shape and surface texture. Their spatial properties also vary: nodules can be found attached to lung walls or blood vessels within complex, non-homogeneous lung structures. Moreover, nodules are small at an early stage of development. This poses a serious challenge to developing a computer-aided diagnosis (CAD) system with better false positive reduction. Hence, to reduce the false positives per scan and to deal with these challenges, this paper proposes a set of three diverse 3D attention-based CNN architectures (3D ACNN) whose predictions on given low-dose volumetric computed tomography (CT) scans are fused to achieve more effective and reliable results. An attention mechanism is employed to weigh nodule-specific features more heavily and irrelevant features less. Using this attention-based mechanism in the CNN, unlike traditional methods, gave a significant gain in classification performance. Contextual dependencies are also taken into account by giving three patches of different sizes surrounding the nodule as input to the ACNN architectures. The system is trained and validated on the publicly available LUNA16 dataset with 10-fold cross-validation, achieving a competition performance metric (CPM) score of 0.931. The experimental results demonstrate that neither a single patch nor a single architecture used in a one-to-one fashion, as adopted in earlier methods, achieves better performance, which indicates the necessity of fusing different multi-patch architectures. Although the proposed system is mainly designed for pulmonary nodule detection, it can be easily extended to classification tasks on other 3D medical diagnostic computed tomography images with large variation and uncertainty in classification.
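The CPM score cited above is, in the LUNA16 challenge, the average sensitivity at seven operating points: 1/8, 1/4, 1/2, 1, 2, 4, and 8 false positives per scan. The sketch below computes that average and also shows one simple way to fuse the outputs of several models. The sensitivity values are made up for illustration, and plain averaging is an assumed fusion rule, since the abstract does not say how the three architectures are combined.

```python
LUNA16_FP_RATES = (0.125, 0.25, 0.5, 1, 2, 4, 8)  # false positives per scan

def cpm_score(sensitivities):
    """Average sensitivity over the seven LUNA16 false-positive rates."""
    if len(sensitivities) != len(LUNA16_FP_RATES):
        raise ValueError("expected one sensitivity per false-positive rate")
    return sum(sensitivities) / len(sensitivities)

def fuse(probabilities):
    """Fuse per-model nodule probabilities by averaging (assumed fusion rule)."""
    return sum(probabilities) / len(probabilities)

# Hypothetical sensitivities read off a FROC curve at the seven rates.
cpm = cpm_score([0.80, 0.85, 0.90, 0.95, 0.96, 0.97, 0.98])
fused = fuse([0.90, 0.60, 0.75])  # outputs of three hypothetical models
```

Because the seven operating points are spaced by factors of two, the score rewards detectors that stay sensitive even at very low false-positive rates.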
Facial Expression Recognition (FER) has been an interesting area of research wherever there is human-computer interaction. Human psychology, emotions, and behaviors can be analyzed in FER. Classifiers used in FER perform well on normal faces but have been found to be limited on occluded faces. Recently, Deep Learning Techniques (DLT) have gained popularity in real-world applications, including the recognition of human emotions. The human face reflects emotional states and human intentions, and an expression is the most natural and powerful way of communicating non-verbally. Systems that enable communication between humans and machines are termed Human Machine Interaction (HMI) systems. FER can improve HMI systems because human expressions convey useful information to an observer. This paper proposes a FER scheme called EECNN (Enhanced Convolution Neural Network with Attention mechanism) to recognize seven types of human emotions, with satisfactory experimental results. The proposed EECNN achieved 89.8% accuracy in classifying the images.
Funding: Supported by the National Natural Science Foundation of China (52274205) and the Project of the Education Department of Liaoning Province (LJKZ0338).
Abstract: Automatic text summarization (ATS) plays a significant role in Natural Language Processing (NLP). Abstractive summarization produces summaries by identifying and compressing the most important information in a document. However, relatively few comprehensively evaluated abstractive summarization models work well for specific types of reports, because such reports are unstructured and written in colloquial language. In particular, Chinese complaint reports, written by urban complainants and collected by government employees, describe the problems residents face in daily life, and the reported problems require a speedy response. Automatic summarization tasks for these reports have therefore been developed. However, as with traditional summarization models, the generated summaries still suffer from problems of informativeness and conciseness. To address these issues and generate suitably informative and less redundant summaries, a topic-based abstractive summarization method is proposed to obtain global and local features. Additionally, a heterogeneous graph of the original document is constructed using word-level and topic-level features. Experiments and analyses on public review datasets (Yelp and Amazon) and our constructed dataset (Chinese complaint reports) show that the proposed framework effectively improves the performance of the abstractive summarization model for Chinese complaint reports.
Funding: This work was supported in part by audio-visual new media laboratory operation and maintenance of the Academy of Broadcasting Science (Grant No. 200304) and in part by the National Key Research and Development Program of China (Grant No. 2019YFB1406201).
Abstract: Referring expression comprehension is the task of locating the image region described by a natural language expression, which refers to the properties of the region or its relationships with other regions. Most previous work handles this problem by selecting the most relevant regions from a set of candidate regions; when the set contains many candidate regions, these methods are inefficient. Inspired by the recent success of deep learning methods in image captioning, in this paper we propose a framework that understands referring expressions through multiple steps of reasoning. We present a model for referring expression comprehension that selects the most relevant region directly from the image. The core of our model is a recurrent attention network, which can be seen as an extension of the Memory Network. The proposed model is capable of improving its results over multiple computational hops. We evaluate the proposed model on two referring expression datasets: Visual Genome and Flickr30k Entities. The experimental results demonstrate that the proposed model outperforms previous state-of-the-art methods in both accuracy and efficiency. We also conduct an ablation experiment showing that the performance of the model does not keep improving as the number of attention layers increases.
Funding: This work was supported in part by the National Natural Science Foundation of China under Grants 62273272, 62303375 and 61873277, in part by the Key Research and Development Program of Shaanxi Province under Grant 2023-YBGY-243, in part by the Natural Science Foundation of Shaanxi Province under Grants 2022JQ-606 and 2020-JQ758, in part by the Research Plan of Department of Education of Shaanxi Province under Grant 21JK0752, and in part by the Youth Innovation Team of Shaanxi Universities.
Abstract: Social robot accounts controlled by artificial intelligence or humans are active in social networks and negatively affect network security and social life. Existing social robot detection methods based on graph neural networks struggle with the large number of social network nodes and their complex relationships, which makes it difficult to accurately describe the differences between the topological relations of nodes and results in low detection accuracy. This paper proposes a social robot detection method that uses an improved neural network. First, social relationship subgraphs are constructed from the user's social network to disentangle intricate social relationships effectively. Then, a linear modulated graph attention residual network model is devised to extract the node and network topology features of the social relationship subgraph, thereby generating comprehensive social relationship subgraph features; the feature-wise linear modulation module of the model helps it learn the differences between nodes. Next, user text content and behavioral gene sequences are extracted to construct social behavioral features, which are combined with the social relationship subgraph features. Finally, social robots can be identified more accurately by combining user behavioral and relationship features. In experiments on the publicly available datasets TwiBot-20 and Cresci-15, the proposed method achieves detection accuracies of 86.73% and 97.86%, respectively. Compared with existing mainstream approaches, the accuracy of the proposed method is 2.2% and 1.35% higher on the two datasets. The results show that the proposed method can effectively detect social robots and help maintain a healthy social network ecosystem.
Funding: This research is partially supported by a grant from the National Natural Science Foundation of China (No. 72071019), a grant from the Natural Science Foundation of Chongqing (No. cstc2021jcyj-msxmX0185), and a grant from the Chongqing Graduate Education and Teaching Reform Research Project (No. yjg193096).
Abstract: Bone age assessment (BAA) helps doctors determine how a child's bones grow and develop in clinical medicine. Traditional BAA methods rely on clinician expertise, leading to time-consuming predictions and inaccurate results. Most deep learning-based BAA methods feed the extracted critical points of images into the network by providing additional annotations. This operation is costly and subjective. To address these problems, we propose a multi-scale attentional densely connected network (MSADCN) in this paper. MSADCN constructs a multi-scale dense connectivity mechanism, which can avoid overfitting, obtain local features effectively, and prevent gradient vanishing even with limited training data. First, MSADCN designs multi-scale structures in the densely connected network to extract fine-grained features at different scales. Then, coordinate attention is embedded to focus on critical features and automatically locate the regions of interest (ROI) without additional annotation. In addition, to improve the model's generalization, transfer learning is applied: the proposed MSADCN is pre-trained on the public dataset IMDB-WIKI, and the resulting pre-trained weights are loaded for training on the Radiological Society of North America (RSNA) dataset. Finally, label distribution learning (LDL) and expectation regression techniques are introduced into our model to exploit the correlation between hand bone images of different ages, which yields stable age estimates. Extensive experiments confirm that our model converges more efficiently and obtains a mean absolute error (MAE) of 4.64 months, outperforming some state-of-the-art BAA methods.
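The label distribution learning and expectation regression step can be illustrated with a small sketch. It assumes, hypothetically, that the training target is a Gaussian distribution over discrete monthly age bins centred on the true age, and that the predicted age is the expectation of the network's softmax output over those bins. The bin range, `sigma`, and all function names are illustrative and are not taken from the paper.

```python
import math


def gaussian_label_distribution(true_age, bins, sigma=6.0):
    """Soft target: a normalized Gaussian over age bins (assumed form)."""
    weights = [math.exp(-((b - true_age) ** 2) / (2 * sigma ** 2)) for b in bins]
    total = sum(weights)
    return [w / total for w in weights]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def expected_age(probs, bins):
    """Expectation regression: probability-weighted mean of the age bins."""
    return sum(p * b for p, b in zip(probs, bins))


# Hypothetical setup: one bin per month from 0 to 228 months.
bins = list(range(0, 229))
target = gaussian_label_distribution(120, bins)
print(expected_age(target, bins))
```

Because neighbouring ages receive non-zero probability in the target, images of similar ages share supervision signal, which is one common motivation for this kind of target.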
Funding: Supported by the Science and Technology Project of State Grid Corporation of China (4000-202122070A-0-0-00).
Abstract: The fluctuation of wind power affects the operating safety and power consumption of the electric power grid and restricts the large-scale grid connection of wind power. Wind power forecasting therefore plays a key role in improving the safety and economic benefits of the power grid. This paper proposes a wind power prediction method based on a convolutional graph attention deep neural network with multi-wind farm data. Based on the graph attention network and attention mechanism, the method extracts spatial-temporal characteristics from the data of multiple wind farms. Then, combined with a deep neural network, a convolutional graph attention deep neural network model is constructed. Finally, the model is trained with the quantile regression loss function to achieve deterministic and probabilistic wind power prediction based on multi-wind farm spatial-temporal data. A wind power dataset from the U.S. is taken as an example to demonstrate the efficacy of the proposed model. Compared with the selected baseline methods, the proposed model achieves the best prediction performance. The point prediction errors (i.e., root mean square error (RMSE) and normalized mean absolute percentage error (NMAPE)) are 0.304 MW and 1.177%, respectively, and the comprehensive performance of probabilistic prediction (i.e., continuous ranked probability score (CRPS)) is 0.580. These results demonstrate the significance of multi-wind farm data and of the spatial-temporal feature extraction module.
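The quantile regression loss mentioned above is commonly implemented as the pinball loss. The sketch below shows that standard formulation in plain Python; the function name and the sample values are illustrative, and the abstract does not state which exact variant or quantile levels the authors use.

```python
def pinball_loss(y_true, y_pred, q):
    """Mean pinball (quantile) loss for quantile level q in (0, 1).

    Under-prediction (y > pred) is weighted by q and over-prediction
    by (1 - q), so minimizing it drives pred toward the q-th quantile.
    """
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for y, p in zip(y_true, y_pred):
        diff = y - p
        total += max(q * diff, (q - 1.0) * diff)
    return total / len(y_true)


# Hypothetical wind power values in MW, for illustration only.
observed = [10.0, 12.0, 9.0]
predicted_q90 = [11.0, 12.5, 10.0]
print(pinball_loss(observed, predicted_q90, 0.9))
```

Training one output per quantile level (for example 0.1, 0.5, and 0.9) gives both a point forecast from the median and an interval for the probabilistic forecast.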
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 61972016 and 62032016) and the Beijing Nova Program (Grant No. 20220484106).
Abstract: We present a novel approach for predicting crystal material properties that is distinct from computationally complex and expensive density functional theory (DFT)-based calculations. Instead, we use an attention-based graph neural network that yields high-accuracy predictions. Our approach employs two attention mechanisms for message passing on the crystal graphs, which enable the model to selectively attend to pertinent atoms and their local environments, thereby improving performance. We conduct comprehensive experiments to validate our approach, which demonstrate that our method surpasses existing methods in predictive accuracy. Our results suggest that deep learning, particularly attention-based networks, holds significant promise for predicting crystal material properties, with implications for material discovery and refined intelligent systems.
Funding: This work was supported in part by the National Natural Science Foundation of China under Grant Nos. 61672378, 61771339, and 61520106002.
Abstract: In recent years, convolutional neural networks (CNNs) for single image super-resolution (SISR) have become more and more complex, and it has become more challenging to improve SISR performance. In contrast, reference image guided super-resolution (RefSR) is an effective strategy for boosting super-resolution (SR) performance. In RefSR, the introduced high-resolution (HR) references can facilitate the high-frequency residual prediction process. To the best of our knowledge, existing CNN-based RefSR methods treat the features from the references and the low-resolution (LR) input equally by simply concatenating them. However, the HR references and the LR inputs contribute differently to the final SR results. Therefore, we propose a progressive channel attention network (PCANet) for RefSR. This paper makes two technical contributions. First, we propose a novel channel attention module (CAM), which estimates the channel weighting parameter from a weighted average of the spatial features instead of a global average. Second, considering that the residual prediction process can be improved when the LR input is enriched with more details, we perform super-resolution progressively, which can take advantage of the reference images at multiple scales. Extensive quantitative and qualitative evaluations on three benchmark datasets, which represent three typical scenarios for RefSR, demonstrate that our method is superior to state-of-the-art SISR and RefSR methods in terms of PSNR (Peak Signal-to-Noise Ratio) and SSIM (Structural Similarity).
Funding: Supported by the Key Research & Development Plan Project of Shandong Province, China (No. 2017GGX10127).
Abstract: Continuous sign language recognition (CSLR) is challenging due to the complexity of video backgrounds, hand gesture variability, and temporal modeling difficulties. This work proposes a CSLR method based on a spatial-temporal graph attention network to focus on the essential features of video series. The method considers local details of sign language movements by taking joint and bone information as inputs and constructing a spatial-temporal graph that reflects inter-frame relevance and physical connections between nodes. A graph-based multi-head attention mechanism is used with adjacency matrix calculation for better local-feature exploration, and short-term motion correlation is modeled with a temporal convolutional network. We adopt BLSTM to learn long-term dependence and connectionist temporal classification to align the word-level sequences. The proposed method achieves competitive results in terms of word error rate (1.59%) on the Chinese Sign Language dataset and mean Jaccard Index (65.78%) on the ChaLearn LAP Continuous Gesture Dataset.
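For reference, word error rate is conventionally computed as the word-level edit distance between the recognized sequence and the reference, divided by the reference length. The sketch below implements that standard definition; the function name and the example glosses are illustrative, and the abstract does not specify the authors' exact evaluation script.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length.

    Uses the Levenshtein distance over whitespace-separated words,
    keeping only two rows of the dynamic-programming table.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    prev = list(range(len(hyp) + 1))  # distance from empty reference prefix
    for i, r in enumerate(ref, 1):
        curr = [i]  # deleting all i reference words so far
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical gloss sequences, for illustration only.
print(word_error_rate("I want drink water", "I want water"))
```

Note that WER can exceed 1.0 when the hypothesis contains many insertions, since the denominator is the reference length only.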
Funding: This work was supported in part by the National Natural Science Foundation of China (U21A2019, 61873058), the Hainan Province Science and Technology Special Fund of China (ZDYF2022SHFZ105), and the Alexander von Humboldt Foundation of Germany.
Abstract: Accurate detection of pipeline leakage is essential to maintain the safety of pipeline transportation. Recently, deep learning (DL) has emerged as a promising tool for pipeline leakage detection (PLD). However, most existing DL methods have difficulty achieving good performance in identifying leakage types due to the complex time dynamics of pipeline data. In addition, the initial parameter selection in the detection model is generally random, which may lead to unstable recognition performance. For this reason, a hybrid DL framework referred to as the parameter-optimized recurrent attention network (PRAN) is presented in this paper to improve the accuracy of PLD. First, a parameter-optimized long short-term memory (LSTM) network is introduced to extract effective and robust features; it exploits a particle swarm optimization (PSO) algorithm with a cross-entropy fitness function to search for globally optimal parameters. With this framework, the representation learning capability of the model is improved and convergence is accelerated. Moreover, an anomaly-attention mechanism (AM) is proposed to discover class-discriminative information by weighting the hidden states, which amplifies the distinguishable normal-abnormal discrepancy and further improves the accuracy of PLD. Consequently, the proposed PRAN not only implements adaptive optimization of the network parameters but also enlarges the contribution of the normal-abnormal discrepancy, thereby overcoming the drawbacks of instability and poor generalization. Finally, the experimental results demonstrate the effectiveness and superiority of the proposed PRAN for PLD.
Funding: Supported by the National Key Research and Development Program of China under Grant No. 2018YFE0206900, the National Natural Science Foundation of China under Grant No. 61871440, and the CAAI-Huawei MindSpore Open Fund.
Abstract: Tumour segmentation in medical images (especially 3D tumour segmentation) is highly challenging due to the possible similarity between tumours and adjacent tissues, the occurrence of multiple tumours, and variable tumour shapes and sizes. Popular deep learning-based segmentation algorithms generally rely on the convolutional neural network (CNN) and the Transformer. The former cannot extract global image features effectively, while the latter lacks inductive bias and involves complicated computation for 3D volume data. Existing hybrid CNN-Transformer networks provide only limited performance improvement, or even poorer segmentation performance than a pure CNN. To address these issues, a short-term and long-term memory self-attention network is proposed. First, a distinctive self-attention block uses the Transformer to explore the correlation among the region features at different levels extracted by the CNN. Then, the memory structure filters and combines this information to exclude similar regions and detect multiple tumours. Finally, the multi-layer reconstruction blocks predict the tumour boundaries. Experimental results demonstrate that our method outperforms other methods in terms of subjective visual and quantitative evaluation. Compared with the most competitive method, the proposed method achieves Dice (82.4% vs. 76.6%) and 95% Hausdorff distance (HD95) (10.66 vs. 11.54 mm) on KiTS19, as well as Dice (80.2% vs. 78.4%) and HD95 (9.632 vs. 12.17 mm) on LiTS.
Abstract: In computer vision, object recognition and image categorization have proven to be difficult challenges. They have, nevertheless, generated responses to a wide range of difficult issues from a variety of fields. Convolutional Neural Networks (CNNs) have recently been identified as the most widely proposed deep learning (DL) algorithms in the literature. CNNs have unquestionably delivered cutting-edge achievements, particularly in image classification, speech recognition, and video processing. However, training a CNN demands a large amount of data, which is in short supply, especially in the medical field, and as a result the training process takes longer. In this paper, we describe an attention-aware CNN architecture for classifying chest X-ray images to diagnose pneumonia in order to address these difficulties. Attention Modules provide attention-aware properties to the Attention Network. The attention-aware features of the various modules change as the layers become deeper. Using a bottom-up top-down feedforward structure, the feedforward and feedback attention processes are integrated into a single feedforward process inside each attention module. In the present work, a deep neural network (DNN) is combined with an attention mechanism to predict pneumonia from chest X-ray images. To produce attention-aware features, the proposed network was built by merging channel and spatial attention modules into the DNN architecture. With this network, we worked on a publicly available Kaggle chest X-ray dataset. Extensive testing was carried out to validate the proposed model. In the experimental results, we attained an accuracy of 95.47% and an F-score of 0.92, indicating that the proposed model outperformed the baseline models.
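For readers comparing the two reported metrics, the sketch below shows how accuracy and the F-score (taken here to be F1, which is an assumption since the abstract does not name the variant) are derived from binary confusion-matrix counts. The function name and the counts are hypothetical and are not the paper's results.

```python
def binary_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall, f1) from confusion counts."""
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("at least one sample is required")
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return accuracy, precision, recall, f1


# Hypothetical counts for a pneumonia-vs-normal classifier.
acc, prec, rec, f1 = binary_metrics(tp=8, fp=2, fn=2, tn=8)
print(acc, prec, rec, f1)
```

Accuracy and F1 can diverge on imbalanced data, which is why medical imaging studies often report both.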
Abstract: The natural language to SQL (NL2SQL) task is an emerging research area that aims to transform a natural language question over a given database into an SQL query. Earlier approaches process the input into a heterogeneous graph. However, previous models fail to distinguish the types of multi-hop connections in the heterogeneous graph and therefore tend to ignore crucial semantic path information. To this end, a two-layer attention network is presented to focus on essential neighbor nodes and mine informative semantic paths for feature encoding. A weighted edge is introduced for schema linking to connect nodes with semantic similarity. In the decoding phase, a rule-based pruning strategy is used to refine the generated SQL queries. The experimental results show that the approach learns a good encoding representation and decodes it to generate results with practical meaning.
Funding: Supported by the Ph.D. Research Startup Project of Minnan Normal University (KJ2021020), the National Natural Science Foundation of China (12090020 and 12090025), and the Zhejiang Provincial Natural Science Foundation of China (LSD19H180005).
Abstract: Accurate pancreas segmentation is critical for the diagnosis and management of diseases of the pancreas. It is challenging to delineate the pancreas precisely because of its high variation in volume, shape and location. In recent years, coarse-to-fine methods have been widely used to alleviate the class imbalance issue and improve pancreas segmentation accuracy. However, cascaded methods can be computationally intensive, and the refined results depend heavily on the performance of the coarse segmentation. To balance segmentation accuracy and computational efficiency, we propose a Discriminative Feature Attention Network for pancreas segmentation, which effectively highlights pancreas features and improves segmentation accuracy without explicit pancreas localization. The final segmentation is obtained by applying a simple yet effective post-processing step. Two experiments, on the public NIH pancreas CT dataset and the abdominal BTCV multi-organ dataset, are conducted separately to show the effectiveness of our method for 2D pancreas segmentation. We obtained an average Dice Similarity Coefficient (DSC) of 82.82 ± 6.09%, an average Jaccard Index (JI) of 71.13 ± 8.30% and an average Symmetric Average Surface Distance (ASD) of 1.69 ± 0.83 mm on the NIH dataset. Compared with existing deep learning-based pancreas segmentation methods, our experimental results achieve the best average DSC and JI values.
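The two overlap metrics reported above have standard definitions that are easy to state in code. The sketch below computes the Dice Similarity Coefficient and the Jaccard Index for flattened binary masks; the function names, the toy masks, and the convention of returning 1.0 when both masks are empty are illustrative assumptions, not details from the paper.

```python
def dice_and_jaccard(pred, truth):
    """Return (DSC, JI) for two equal-length binary masks (0/1 values).

    DSC = 2|A ∩ B| / (|A| + |B|),  JI = |A ∩ B| / |A ∪ B|.
    """
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    pred_size = sum(1 for p in pred if p)
    truth_size = sum(1 for t in truth if t)
    union = pred_size + truth_size - intersection
    if union == 0:
        # Both masks empty: treated as perfect agreement (assumed convention).
        return 1.0, 1.0
    dice = 2.0 * intersection / (pred_size + truth_size)
    jaccard = intersection / union
    return dice, jaccard


# Toy 2x2 masks flattened row by row, for illustration only.
predicted_mask = [1, 1, 0, 0]
ground_truth = [1, 0, 1, 0]
print(dice_and_jaccard(predicted_mask, ground_truth))
```

For a single pair of masks the two metrics are related by JI = DSC / (2 - DSC); averages over many cases, as reported in abstracts, need not satisfy this identity exactly.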
Funding: The research project is partially supported by the National Natural Science Foundation of China under Grant Nos. 62072015, U19B2039, and U1811463, and by the National Key R&D Program of China (2018YFB1600903).
Abstract: PM2.5 concentration prediction is of great significance to environmental protection and human health, and achieving accurate prediction of PM2.5 concentration has become an important research task. However, PM2.5 pollutants can spread through the Earth's atmosphere, causing mutual influence between different cities. To effectively capture the air pollution relationship between cities, this paper proposes a novel spatiotemporal model combining a graph attention neural network (GAT) and a gated recurrent unit (GRU), named GAT-GRU, for PM2.5 concentration prediction. Specifically, GAT is used to learn the spatial dependence of PM2.5 concentration data across cities, and GRU is used to extract the temporal dependence of the long-term data series. The proposed model integrates the learned spatio-temporal dependencies to capture long-term complex spatio-temporal features. Considering that air pollution is related to a city's meteorological conditions, knowledge acquired from meteorological data is used in the model to enhance PM2.5 prediction performance. The input of the GAT-GRU model consists of PM2.5 concentration data and meteorological data. To verify the effectiveness of the proposed GAT-GRU prediction model, this paper designs experiments on real-world datasets and compares the model with other baselines. Experimental results show that our model achieves excellent performance in PM2.5 concentration prediction.
Funding: Supported by the National Natural Science Foundation of China (NSFC) under Grant Nos. 61462042 and 61966018.
Abstract: Traffic flow prediction is an important part of the intelligent transportation system. Accurate multi-step traffic flow prediction plays an important role in improving the operational efficiency of the traffic network. Since traffic flow data has complex spatio-temporal correlation and non-linearity, existing prediction methods mainly combine a Graph Convolutional Network (GCN) with a recurrent neural network. This combination strategy performs well in traffic prediction tasks. However, multi-step prediction error accumulates as the number of predicted steps grows. Some scholars use multiple sampling sequences to achieve more accurate prediction results, but this requires powerful hardware and multiplies the training time. Considering the spatio-temporal correlation of traffic flow and the influence of external factors, we propose an Attention Based Spatio-Temporal Graph Convolutional Network considering External Factors (ABSTGCN-EF) for multi-step traffic flow prediction. The model treats traffic flow as diffusion on a digraph and extracts the spatial characteristics of traffic flow through GCN. We add time-slot attention to the encoder-decoder to form an Attention Encoder Network (AEN) that handles temporal correlation. The attention vector is used as a competitive choice to draw the correlation between predicted states and historical states. We consider the impact of three external factors (daytime, weekdays, and traffic accident markers) on the traffic flow prediction task. Experiments on two public datasets show that it makes sense to consider external factors. The prediction performance of our ABSTGCN-EF model is 7.2%–8.7% higher than that of the state-of-the-art baselines.
Funding: Supported by the Heilongjiang Provincial Key Research and Development Program Guidance Project (GZ20210103).
Abstract: Using the Jiuhong Modern Agriculture Demonstration Park of Heilongjiang Province as the base for rice disease image acquisition, a total of 841 images of four different diseases (rice blast, stripe leaf blight, red blight and bacterial brown spot) were obtained. In this study, an interleaved attention neural network (IANN) was proposed to recognize rice disease images, and an interleaved group convolutions (IGC) network was introduced to reduce the number of convolutional parameters while enabling information interaction between channels. Based on the convolutional block attention module (CBAM), attention was applied to the output features of the primary group convolution in the cross-group convolution to improve the classification performance of the deep learning model. The results showed that the classification accuracy of IANN was 96.14%, which was 4.72% higher than that of the classical convolutional neural network (CNN). This study offers a new idea for the efficient training of neural networks with small samples and provides a reference for the image recognition and diagnosis of rice and other crop diseases.
Abstract: Lung nodules vary widely in structural properties such as shape and surface texture. Their spatial properties also vary: they can be found attached to lung walls or blood vessels in complex, non-homogeneous lung structures. Moreover, nodules are small at their early stage of development. This poses a serious challenge for developing a computer-aided diagnosis (CAD) system with better false positive reduction. To reduce the false positives per scan and to deal with these challenges, this paper proposes a set of three diverse 3D attention-based CNN architectures (3D ACNN) whose predictions on low-dose volumetric computed tomography (CT) scans are fused to achieve more effective and reliable results. An attention mechanism is employed to give more weight to nodule-specific features and less weight to irrelevant features. Using this attention-based mechanism in the CNN, unlike traditional methods, yielded a significant gain in classification performance. Contextual dependencies are also taken into account by giving three patches of different sizes surrounding the nodule as input to the ACNN architectures. The system is trained and validated on the publicly available LUNA16 dataset with 10-fold cross-validation, achieving a competition performance metric (CPM) score of 0.931. The experimental results demonstrate that a single patch or a single architecture used in a one-to-one fashion, as adopted in earlier methods, cannot achieve better performance, which indicates the necessity of fusing different multi-patch architectures. Although the proposed system is mainly designed for pulmonary nodule detection, it can be easily extended to classification tasks on other 3D medical diagnostic computed tomography images with large variation and uncertainty in classification.
Abstract: Facial Expression Recognition (FER) has been an interesting area of research wherever there is human-computer interaction. Human psychology, emotions and behaviors can be analyzed in FER. Classifiers used in FER perform very well on normal faces but have been found to be constrained on occluded faces. Recently, Deep Learning Techniques (DLT) have gained popularity in applications to real-world problems, including the recognition of human emotions. The human face reflects emotional states and human intentions. An expression is the most natural and powerful way of communicating non-verbally. Systems that enable communication between humans and machines are termed Human Machine Interaction (HMI) systems. FER can improve HMI systems, as human expressions convey useful information to an observer. This paper proposes a FER scheme called EECNN (Enhanced Convolution Neural Network with Attention mechanism) to recognize seven types of human emotions, with satisfying results in its experiments. The proposed EECNN achieved 89.8% accuracy in classifying the images.