Most existing network representation learning algorithms learn from network structure alone. However, network structure is only one view of a network and cannot fully reflect all of its characteristics. In fact, network vertices usually carry rich text information, which can be used to learn text-enhanced network representations. Meanwhile, the Matrix-Forest Index (MFI) has shown high effectiveness and stability in link prediction tasks compared with other link prediction algorithms. Neither MFI nor Inductive Matrix Completion (IMC) has been well integrated into the algorithmic frameworks of typical representation learning methods. Therefore, we propose a novel semi-supervised algorithm, tri-party deep network representation learning using inductive matrix completion (TDNR). Based on the inductive matrix completion algorithm, TDNR incorporates text features, the link certainty degrees of existing edges, and the future link probabilities of non-existing edges into network representations. Experimental results demonstrate that TDNR outperforms other baselines on three real-world datasets. Visualizations of TDNR show that the proposed algorithm is more discriminative than other unsupervised approaches.
Prior studies have demonstrated that deep learning-based approaches can enhance source code vulnerability detection by training neural networks to learn vulnerability patterns in code representations. However, due to limitations in code representation and neural network design, the validity and practicality of these models still need to be improved. Additionally, due to differences between programming languages, most methods lack cross-language detection generality. To address these issues, we analyze the shortcomings of previous code representations and neural networks. We propose a novel hierarchical code representation that combines Concrete Syntax Trees (CST) with Program Dependence Graphs (PDG). Furthermore, we introduce a Tree-Graph-Gated-Attention (TGGA) network based on gated recurrent units and attention mechanisms to build a Hierarchical Code Representation learning-based Vulnerability Detection (HCRVD) system. This system enables cross-language vulnerability detection at the function level. The experiments show that HCRVD surpasses many competitors in vulnerability detection capability. It benefits from the hierarchical code representation learning method and outperforms the baseline in cross-language vulnerability detection by 9.772% and 11.819% on the C/C++ and Java datasets, respectively. Moreover, HCRVD has some ability to detect vulnerabilities in unknown programming languages and is useful in real open-source projects. HCRVD shows good validity, generality, and practicality.
Homogeneity analysis of a multi-airport system can provide important decision-making support for route layout and cooperative operation. Existing research seldom analyzes the homogeneity of a multi-airport system from the perspective of route network analysis, and the attribute information of airport nodes is not appropriately integrated into the airport route network. To solve this problem, a multi-airport system homogeneity analysis method based on airport attribute network representation learning is proposed. First, the route network of a multi-airport system with attribute information is constructed: if there are flights between two airports, an edge is added between them, and regional attribute information is added to each airport node. Second, the airport attributes and the airport network vector are represented separately. They are then embedded into a unified airport representation vector space by the network representation learning method, yielding an airport vector that integrates the airport attributes and the airport network characteristics. By calculating the similarity of the airport vectors, the degree of homogeneity between airports and the homogeneity of the multi-airport system can be computed. Experimental results on the Beijing-Tianjin-Hebei multi-airport system show that, compared with other existing algorithms, the homogeneity analysis method based on attributed network representation learning produces results more consistent with the current situation of the Beijing-Tianjin-Hebei multi-airport system.
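The similarity step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not name the similarity measure or how pairwise values are combined, so cosine similarity and a simple mean over all airport pairs are assumptions, as are the function names and the toy vectors.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def system_homogeneity(vectors):
    """Mean pairwise cosine similarity over all airports (assumed aggregate)."""
    pairs = list(combinations(sorted(vectors), 2))
    if not pairs:
        return 0.0
    total = sum(cosine_similarity(vectors[a], vectors[b]) for a, b in pairs)
    return total / len(pairs)


# Hypothetical airport embeddings; real vectors would come from the
# attributed network representation learning step.
airport_vectors = {
    "A": [1.0, 0.0],
    "B": [1.0, 0.0],
    "C": [0.0, 1.0],
}
homogeneity = system_homogeneity(airport_vectors)
```

Here airports A and B are identical in embedding space while C is orthogonal to both, so only one of the three pairs contributes to the mean.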
Traditional malware research focuses mainly on recognition and detection, without attending to propagation trends or predicting which nodes will be infected next. The complexity of network structure, the diversity of network nodes, and the sparsity of data all make propagation difficult to predict. This paper proposes a malware propagation prediction model based on representation learning and Graph Convolutional Networks (GCN) to address these problems. First, to solve the inaccuracy of infection intensity calculation caused by the sparsity of node interaction behavior data in the malware propagation network, a tensor-based mechanism for mining the infection intensity among nodes is proposed that retains the network structure information. The influence of the relationships between nodes on the infection intensity is also analyzed. Second, given the diversity and complexity of the content and structure of infected and normal nodes in the network, and considering the advantages of representation learning in data feature extraction, a corresponding representation learning method is adopted for the characteristics of infection intensity among nodes. This efficiently computes the relationships between entities and relations in a low-dimensional space, achieving low-dimensional, dense, real-valued representation learning for the characteristics of propagation spatial data. We also design a new method, Tensor2vec, to learn the latent structural features of malware propagation. Finally, considering the convolution ability of GCN on non-Euclidean data, we propose a dynamic prediction model of malware propagation based on representation learning and GCN to solve the time effectiveness problem of the malware propagation carrier. The experimental results show that the proposed model can effectively predict the behaviors of the nodes in the network and reveal the influence of different node characteristics on the malware propagation situation.
Objective: To construct a symptom-formula-herb heterogeneous graph dataset from the Treatise on Febrile Diseases (Shang Han Lun, 《伤寒论》) and to explore an optimal learning method for node attribute representation based on the graph convolutional network (GCN). Methods: Clauses that contain symptoms, formulas, and herbs were abstracted from the Treatise on Febrile Diseases to construct symptom-formula-herb heterogeneous graphs, which were used to propose a GCN-based node representation learning method, the Traditional Chinese Medicine Graph Convolution Network (TCM-GCN). The symptom-formula, symptom-herb, and formula-herb heterogeneous graphs were processed with the TCM-GCN to realize high-order message passing and neighbor aggregation and obtain new node representation attributes, thereby acquiring the nodes' sum-aggregations of symptoms, formulas, and herbs to lay a foundation for the downstream tasks of the prediction models. Results: Comparisons among node representations with multi-hot encoding, non-fusion encoding, and fusion encoding showed that the Precision@10, Recall@10, and F1-score@10 of the fusion encoding were 9.77%, 6.65%, and 8.30% higher, respectively, than those of the non-fusion encoding in the prediction studies of the model. Conclusion: Node representations by fusion encoding achieved comparatively ideal results, indicating that the TCM-GCN is effective in realizing node-level representations of the heterogeneous graph structured Treatise on Febrile Diseases dataset and is able to improve the performance of the downstream tasks of the diagnosis model.
Recently, sparse representation classification (SRC) and Fisher discrimination dictionary learning (FDDL) have emerged as important methods for vehicle classification. In this paper, inspired by recent breakthroughs in discrimination dictionary learning and multi-task joint covariate selection, we address vehicle classification in real-world applications by formulating it as a multi-task joint sparse representation model based on Fisher discrimination dictionary learning, which merges the strengths of multiple features across multiple sensors. To improve classification accuracy in complex scenes, we develop a new method, called multi-task joint sparse representation classification based on Fisher discrimination dictionary learning, for vehicle classification. In the proposed method, acoustic and seismic sensor data sets are captured by multiple heterogeneous sensors that measure the same physical event simultaneously, and multi-dimensional frequency spectrum features of the sensor data are extracted using Mel frequency cepstral coefficients (MFCC). Moreover, we extend our model to handle sparse environmental noise. We experimentally demonstrate the benefits of joint information fusion from different sensors, based on Fisher discrimination dictionary learning, in vehicle classification tasks.
In the era of big data, learning discriminant feature representations from network traffic has been identified as an essential task for improving the detection ability of an intrusion detection system (IDS). Owing to the lack of accurately labeled network traffic data, many unsupervised feature representation learning models have been proposed with state-of-the-art performance. Yet these models fail to consider the classification error while learning the feature representation. Intuitively, the learnt feature representation may therefore degrade the performance of the classification task. For the first time in the field of intrusion detection, this paper proposes an unsupervised IDS model that leverages a deep autoencoder (DAE) for learning a robust feature representation and a one-class support vector machine (OCSVM) for finding a more compact decision hyperplane for intrusion detection. Specifically, the proposed model defines a new unified objective function that minimizes the reconstruction and classification errors simultaneously. This contribution not only enables the model to support joint learning of the feature representation and the classifier but also guides it to learn a robust feature representation that can improve the discrimination ability of the classifier for intrusion detection. Three sets of evaluation experiments are conducted to demonstrate the potential of the proposed model. First, an ablation evaluation on the benchmark dataset NSL-KDD validates the design decisions of the proposed model. Next, a performance evaluation on the recent intrusion dataset UNSW-NB15 indicates the stable performance of the proposed model. Finally, a comparative evaluation verifies the efficacy of the proposed model against recently published state-of-the-art methods.
Roller bearing failure is one of the most common faults in rotating machines. Various techniques for bearing fault diagnosis based on fault feature extraction have been proposed, but feature extraction from fault signals requires expert prior knowledge and human labour. Recently, deep learning algorithms have been applied extensively in the condition monitoring of rotating machines to learn features automatically from the input data. Given its robust performance in image recognition, the convolutional neural network (CNN) architecture has been widely used to automatically learn discriminative features from vibration images and classify health conditions. This paper proposes and evaluates a two-stage method, RGBVI-CNN, for roller bearing fault diagnosis. The first stage generates RGB vibration images (RGBVIs) from the input vibration signals. First, the 1-D vibration signals are converted to 2-D grayscale vibration images. Next, the regions of interest (ROI) are found in the converted 2-D grayscale vibration images. Finally, to produce vibration images with more discriminative characteristics, an algorithm is applied to the 2-D grayscale vibration images to produce connected-components-based RGB vibration images (RGBVIs) with sets of colour and texture features. In the second stage, a CNN-based architecture is employed to automatically learn features from the RGBVIs and classify bearing health conditions. Two cases of fault classification of rolling element bearings are used to validate the proposed method. Experimental results demonstrate that RGBVI-CNN can generate advantageous health condition features from bearing vibration signals and classify the health conditions under different working loads with high accuracy. Moreover, several classification models trained using RGBVI-CNN offered high performance in testing in terms of overall classification accuracy, precision, recall, and F-score.
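The first conversion step in this abstract, from a 1-D vibration signal to a 2-D grayscale image, can be sketched as follows. The abstract does not specify the mapping, so this is only one plausible reading: min-max scaling to the 0-255 range followed by row-wise reshaping is an assumption, and the function name `signal_to_grayscale` and the `width` parameter are hypothetical. The ROI detection and connected-components colouring stages are not covered.

```python
def signal_to_grayscale(signal, width):
    """Reshape a 1-D signal into rows of `width` pixels scaled to 0..255.

    Trailing samples that do not fill a complete row are dropped.
    """
    height = len(signal) // width
    if width <= 0 or height == 0:
        raise ValueError("signal is too short for the requested width")
    used = signal[: height * width]
    lo, hi = min(used), max(used)
    span = hi - lo
    # A constant signal maps to an all-black image to avoid dividing by zero.
    pixels = [0 if span == 0 else round(255 * (x - lo) / span) for x in used]
    return [pixels[r * width:(r + 1) * width] for r in range(height)]


# Five samples with width 2: the last sample is dropped, leaving a 2x2 image.
image = signal_to_grayscale([0.0, 1.0, 2.0, 3.0, 4.0], 2)
```

In practice the signal would typically be segmented into fixed-length windows first so that every image has the same dimensions before being passed to a CNN.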
Diabetic retinopathy (DR) is a retinal disease that causes irreversible blindness. DR occurs due to the patient's high blood sugar level, and it is difficult to detect at an early stage because no symptoms appear initially. To prevent blindness, early detection and regular treatment are needed. Automated detection based on machine intelligence may assist the ophthalmologist in examining the patient's condition more accurately and efficiently. The purpose of this study is to produce an automated screening system for the recognition and grading of diabetic retinopathy using machine learning through deep transfer and representational learning. The artificial intelligence technique used is transfer learning on the deep neural network Inception-v4. Two configuration variants of transfer learning are applied to Inception-v4: fine-tune mode and fixed feature extractor mode. Both configuration modes achieved decent accuracy, but the fine-tune mode outperforms the fixed feature extractor mode. The fine-tune configuration achieved 96.6% accuracy in early detection of DR and 97.7% accuracy in grading the disease, and outperformed the state-of-the-art methods in the relevant literature.
This paper introduces a method for generating new units within a cluster and an algorithm for generating new clusters. The model automatically builds up its dynamically growing internal representation structure during the learning process. Compared with other typical classification algorithms such as Kohonen's self-organizing map, the model realizes a multilevel classification of the input pattern with adjustable accuracy and is well suited to parallel computation. The idea is suitable for the high-level storage of complex data structures for object recognition.
Contrastive self-supervised representation learning on attributed graph networks with Graph Neural Networks has attracted considerable research interest recently. However, two challenges remain. First, most real-world systems are multi-relational: entities are linked by different types of relations, and each relation is a view of the graph network. Second, the rich multi-scale information (structure-level and feature-level) of the graph network can be seen as self-supervised signals, which are not fully exploited. This study presents a novel contrastive self-supervised representation learning framework on attributed multiplex graph networks with multi-scale information (named CoLM²S). It mainly contains two components: intra-relation contrastive learning and inter-relation contrastive learning. Specifically, the contrastive self-supervised representation learning framework on attributed single-layer graph networks with multi-scale information (CoLMS), which uses a graph convolutional network as the encoder to capture intra-relation information with multi-scale structure-level and feature-level self-supervised signals, is introduced first. The structure-level information includes the edge structure and sub-graph structure, and the feature-level information represents the output of different graph convolutional layers. Second, according to the consensus assumption among inter-relations, the CoLM²S framework is proposed to jointly learn the various graph relations in an attributed multiplex graph network and achieve a global consensus node embedding. The proposed method can fully distil the graph information. Extensive experiments on unsupervised node clustering and graph visualisation tasks demonstrate the effectiveness of our methods, which outperform existing competitive baselines.
Most modern face recognition and classification systems rely mainly on hand-crafted image feature descriptors. In this paper, we propose a novel deep learning algorithm combining unsupervised and supervised learning, named deep belief network embedded with Softmax regress (DBNESR), as a natural source of additional, complementary hierarchical representations, which relieves us of the complicated hand-crafted feature-design step. DBNESR first learns hierarchical feature representations by greedy layer-wise unsupervised learning in a feed-forward (bottom-up) and back-forward (top-down) manner, and then performs more efficient recognition with Softmax regress by supervised learning. For comparison with algorithms based only on supervised learning, we also design several classifiers: BP, HBPNNs, RBF, HRBFNNs, SVM, and a multiple classification decision fusion classifier (MCDFC), which is a hybrid HBPNNs-HRBFNNs-SVM classifier. The experiments show the following. First, the proposed DBNESR is optimal for face recognition, with the highest and most stable recognition rates. Second, the algorithm combining unsupervised and supervised learning performs better than all the purely supervised learning algorithms. Third, hybrid neural networks perform better than single-model neural networks. Fourth, ordered from largest to smallest, the average recognition rates rank as DBNESR, MCDFC, SVM, HRBFNNs, RBF, HBPNNs, BP, and the variances rank as BP, RBF, HBPNNs, HRBFNNs, SVM, MCDFC, DBNESR. Finally, the hierarchical feature representations learned by DBNESR reflect its capability for modeling hard artificial intelligence tasks.
With the wide application of location-based social networks (LBSNs), personalized point-of-interest (POI) recommendation has become popular, especially in the commercial field. Unfortunately, it is challenging to accurately recommend POIs to users because the user-POI matrix is extremely sparse. In addition, a user's check-in activities are affected by many influential factors, yet most existing studies capture only a few of them and are hard to extend to incorporate other heterogeneous information in a unified way. To address these problems, we propose a meta-path-based deep representation learning (MPDRL) model for personalized POI recommendation. In this model, we design eight types of meta-paths to fully utilize the rich heterogeneous information in LBSNs for the representations of users and POIs, and to deeply mine the correlations between users and POIs. To further improve recommendation performance, we design an attention-based long short-term memory (LSTM) network to learn the importance of different influential factors on a user's specific check-in activity. To verify the effectiveness of the proposed method, we conduct extensive experiments on a real-world dataset, Foursquare. Experimental results show that the MPDRL model improves on all comparison methods by at least 16.97% and 23.55% in terms of the metrics Precision@N (Pre@N) and Recall@N (Rec@N), respectively.
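For readers unfamiliar with the two reported metrics, the standard definitions of Precision@N and Recall@N can be sketched as follows. This is a generic illustration of the usual definitions, not the evaluation code of the MPDRL paper; the function name and the POI identifiers are hypothetical.

```python
def precision_recall_at_n(recommended, relevant, n):
    """Return (Precision@N, Recall@N) for one user.

    recommended: ranked list of POI ids, best first.
    relevant: set of POI ids the user actually visited in the test data.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    top_n = recommended[:n]
    hits = sum(1 for poi in top_n if poi in relevant)
    precision = hits / n
    # Recall is undefined without relevant items; report 0.0 in that case.
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


ranked = ["cafe", "museum", "park", "gym"]
visited = {"park", "cafe", "zoo"}
pre_at_2, rec_at_2 = precision_recall_at_n(ranked, visited, 2)
```

With N = 2, one of the two recommended POIs was visited, so precision is one half, and one of the three visited POIs was recovered, so recall is one third. Reported figures are normally averaged over all test users.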
Many network presentation learning algorithms (NPLA) proposed in recent years originate from the process of random walks between nodes. Although these algorithms can obtain good embedding results, they have limitations. For instance, only the structural information of nodes is considered when these algorithms are constructed. To address this issue, a label and community information-based network presentation learning algorithm (LC-NPLA) is proposed in this paper. First, the first-order neighbors of nodes are reconstructed using the community information and label information of nodes. Next, the random walk strategy is improved by integrating the degree information and label information of nodes. Then, the node sequences obtained from random walk sampling are transformed into node representation vectors by the Skip-Gram model. Finally, experimental results on ten real-world networks demonstrate that the proposed algorithm has great advantages in label classification, network reconstruction, and link prediction tasks compared with three benchmark algorithms.
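A random walk that takes degree and label information into account can be sketched as follows. The abstract does not give the transition rule used by LC-NPLA, so the weighting below (neighbor degree plus a fixed bonus when the neighbor shares the current node's label) is purely an assumption for illustration, as are the function name `biased_walk`, the `label_bonus` parameter, and the toy graph. The resulting node sequences would then be passed to a Skip-Gram model, which is not shown.

```python
import random


def biased_walk(graph, labels, start, length, rng, label_bonus=1.0):
    """Sample one walk of up to `length` nodes from an undirected graph.

    graph: dict mapping node -> set of neighbor nodes.
    labels: dict mapping node -> label (nodes may be unlabeled).
    The next node is drawn with weight = degree(neighbor), plus
    `label_bonus` if the neighbor shares the current node's label.
    """
    walk = [start]
    while len(walk) < length:
        current = walk[-1]
        neighbors = sorted(graph[current])  # sorted for reproducibility
        if not neighbors:
            break
        current_label = labels.get(current)
        weights = []
        for nb in neighbors:
            same = current_label is not None and labels.get(nb) == current_label
            weights.append(len(graph[nb]) + (label_bonus if same else 0.0))
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


# Hypothetical toy network with two labels.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
labels = {"a": "x", "b": "x", "c": "y", "d": "y"}
walk = biased_walk(graph, labels, "a", 6, random.Random(0))
```

Passing an explicit `random.Random` instance keeps sampling reproducible without touching the global random state.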
Network anomaly detection plays a vital role in safeguarding network security. However, existing network anomaly detection is typically based on the one-class zero-positive scenario. This approach is susceptible to overfitting during training due to discrepancies in data distribution between the training set and the test set, a phenomenon known as prediction drift. Additionally, the rarity of anomaly data, often masked by normal data, further complicates network anomaly detection. To address these challenges, we propose the PUNet network, which combines the strengths of traditional machine learning and deep learning techniques for anomaly detection. Specifically, PUNet employs a reconstruction-based autoencoder pre-trained on normal data, enabling the network to capture latent features and correlations within the data. Subsequently, PUNet integrates a sampling algorithm to construct a pseudo-label candidate set among the outliers based on the reconstruction loss of the samples. This approach mitigates the prediction drift problem by incorporating abnormal samples. Furthermore, PUNet uses the CatBoost classifier for anomaly detection to tackle potential data imbalance within the candidate set. Extensive experimental evaluations demonstrate that PUNet effectively resolves the prediction drift and data imbalance problems, significantly outperforming competing methods.
Graph Neural Networks (GNNs) play a significant role in tasks related to homophilic graphs. Traditional GNNs, based on the assumption of homophily, employ low-pass filters over neighboring nodes to achieve information aggregation and embedding. However, in heterophilic graphs, nodes from different categories often establish connections, while nodes of the same category are located further apart in the graph topology. This characteristic poses challenges to traditional GNNs, leading to the issues of "distant node modeling deficiency" and "failure of the homophily assumption". In response, this paper introduces the Spatial-Frequency domain Adaptive Heterophilic Graph Neural Networks (SFA-HGNN), which integrates adaptive embedding mechanisms for both the spatial and frequency domains to address these issues. Specifically, for the first problem, we propose the "Distant Spatial Embedding Module", which selects and aggregates distant nodes through high-order random-walk transition probabilities to enhance modeling capability. For the second issue, we design the "Proximal Frequency Domain Embedding Module", which constructs adaptive filters to separate the high- and low-frequency signals of nodes and introduces frequency-domain guided attention mechanisms to fuse the relevant information, thereby reducing the noise introduced by the failure of the homophily assumption. We deploy SFA-HGNN on six publicly available heterophilic networks, achieving state-of-the-art results on four of them. Furthermore, we elaborate on the hyperparameter selection mechanism and validate the performance of each module through experimentation, demonstrating a positive correlation between "node structural similarity", "node attribute vector similarity", and "node homophily" in heterophilic networks.
Funding for the TDNR study: Projects 11661069 and 61763041 supported by the National Natural Science Foundation of China; Project IRT_15R40 supported by Changjiang Scholars and Innovative Research Team in University, China; Project 2017TS045 supported by the Fundamental Research Funds for the Central Universities, China.
Funding for the HCRVD study: Major Science and Technology Projects in Henan Province, China (Grant No. 221100210600).
Funding for the multi-airport homogeneity study: Natural Science Foundation of Tianjin (No. 20JCQNJC00720); Fundamental Research Fund for the Central Universities (No. 3122021052).
Funding: This research is partially supported by the National Natural Science Foundation of China (Grant No. 61772098), the Chongqing Technology Innovation and Application Development Project (Grant No. cstc2020jscxmsxmX0150), the Chongqing Science and Technology Innovation Leading Talent Support Program (CSTCCXLJRC201908), the Basic and Advanced Research Projects of CSTC (No. cstc2019jcyj-zdxmX0008), and the Science and Technology Research Program of Chongqing Municipal Education Commission (Grant No. KJZD-K201900605).
Abstract: Traditional malware research takes recognition and detection as its main entry point, without focusing on propagation trends or predicting which nodes will be infected next. The complexity of network structure, the diversity of network nodes, and the sparsity of data all make propagation difficult to predict. This paper proposes a malware propagation prediction model based on representation learning and Graph Convolutional Networks (GCN) to address these problems. First, to solve the inaccuracy of infection intensity calculation caused by the sparsity of node interaction behavior data in the malware propagation network, a tensor-based mechanism for mining the infection intensity among nodes is proposed that retains the network structure information. The influence of the relationship between nodes on the infection intensity is also analyzed. Second, given the diversity and complexity of the content and structure of infected and normal nodes in the network, and considering the advantages of representation learning in data feature extraction, a corresponding representation learning method is adopted for the characteristics of infection intensity among nodes. This method efficiently computes the relationships between entities and relations in a low-dimensional space, achieving a low-dimensional, dense, real-valued representation of the propagation spatial data. We also design a new method, Tensor2vec, to learn the latent structural features of malware propagation. Finally, considering the convolution ability of GCN for non-Euclidean data, we propose a dynamic prediction model of malware propagation based on representation learning and GCN to solve the time-effectiveness problem of the malware propagation carrier. The experimental results show that the proposed model can effectively predict the behaviors of the nodes in the network and reveal the influence of different node characteristics on the malware propagation situation.
Funding: New-Generation Artificial Intelligence Major Program in the Sci-Tech Innovation 2030 Agenda from the Ministry of Science and Technology of China (2018AAA0102100); Hunan Provincial Department of Education key project (21A0250); the First Class Discipline Open Fund of Hunan University of Traditional Chinese Medicine (2022ZYX08).
Abstract: Objective: To construct a symptom-formula-herb heterogeneous-graph-structured dataset from the Treatise on Febrile Diseases (Shang Han Lun, 《伤寒论》) and to explore an optimal method for learning node representations with node attributes based on the graph convolutional network (GCN). Methods: Clauses that contain symptoms, formulas, and herbs were extracted from the Treatise on Febrile Diseases to construct symptom-formula-herb heterogeneous graphs, on which a GCN-based node representation learning method, the Traditional Chinese Medicine Graph Convolution Network (TCM-GCN), was proposed. The symptom-formula, symptom-herb, and formula-herb heterogeneous graphs were processed with the TCM-GCN to perform high-order message passing and neighbor aggregation and obtain new node representation attributes, thereby acquiring the sum-aggregations of symptom, formula, and herb nodes as a foundation for the downstream prediction models. Results: Comparisons among node representations with multi-hot encoding, non-fusion encoding, and fusion encoding showed that the Precision@10, Recall@10, and F1-score@10 of the fusion encoding were 9.77%, 6.65%, and 8.30% higher, respectively, than those of the non-fusion encoding in the prediction studies of the model. Conclusion: Node representations by fusion encoding achieved comparatively ideal results, indicating that the TCM-GCN is effective in realizing node-level representations of the heterogeneous-graph-structured Treatise on Febrile Diseases dataset and can improve the performance of the downstream tasks of the diagnosis model.
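For readers unfamiliar with the metrics reported in the Results, the following is a minimal sketch of how Precision@k, Recall@k, and F1-score@k are conventionally computed for one ranked prediction list. The function name and the herb identifiers are hypothetical, and the paper's exact evaluation protocol (for example, averaging over clauses) is not stated in the abstract.

```python
def precision_recall_f1_at_k(ranked, relevant, k=10):
    """Standard top-k ranking metrics for a single query.

    ranked:   predicted items, best first
    relevant: set of ground-truth items
    """
    top_k = ranked[:k]
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / k if k > 0 else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: ten predicted herbs, four true herbs, two of them retrieved.
ranked_herbs = [f"h{i}" for i in range(1, 11)]
true_herbs = {"h1", "h3", "h20", "h21"}
p, r, f1 = precision_recall_f1_at_k(ranked_herbs, true_herbs, k=10)
```

Here two of the four true herbs appear in the top ten, so precision is 2/10 and recall is 2/4; F1 is their harmonic mean.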
Funding: This work was supported by the National Natural Science Foundation of China (NSFC) under Grant Nos. 61771299, 61771322, 61375015, and 61301027.
Abstract: Recently, sparse representation classification (SRC) and Fisher discrimination dictionary learning (FDDL) have emerged as important methods for vehicle classification. In this paper, inspired by recent breakthroughs in discrimination dictionary learning and multi-task joint covariate selection, we address vehicle classification in real-world applications by formulating it as a multi-task joint sparse representation model based on Fisher discrimination dictionary learning, which merges the strengths of multiple features from multiple sensors. To improve classification accuracy in complex scenes, we develop a new method for vehicle classification, called multi-task joint sparse representation classification based on Fisher discrimination dictionary learning. In the proposed method, acoustic and seismic data sets are captured by multiple heterogeneous sensors that measure the same physical event simultaneously, and multi-dimensional frequency spectrum features of the sensor data are extracted using Mel frequency cepstral coefficients (MFCC). Moreover, we extend our model to handle sparse environmental noise. We experimentally demonstrate the benefits of joint information fusion from different sensors based on Fisher discrimination dictionary learning in vehicle classification tasks.
Funding: This work was supported by the Research Deanship of Prince Sattam Bin Abdulaziz University, Al-Kharj, Saudi Arabia (Grant No. 2020/01/17215). The author also thanks the Deanship of the College of Computer Engineering and Sciences for the technical support provided to complete the project.
Abstract: In the era of big data, learning discriminative feature representations from network traffic has been identified as an essential task for improving the detection ability of an intrusion detection system (IDS). Owing to the lack of accurately labeled network traffic data, many unsupervised feature representation learning models have been proposed with state-of-the-art performance. Yet these models fail to consider the classification error while learning the feature representation, so the learned representation may degrade the performance of the classification task. For the first time in the field of intrusion detection, this paper proposes an unsupervised IDS model that leverages a deep autoencoder (DAE) for learning a robust feature representation and a one-class support vector machine (OCSVM) for finding a more compact decision hyperplane for intrusion detection. Specifically, the proposed model defines a new unified objective function to minimize the reconstruction and classification errors simultaneously. This contribution not only enables the model to support joint learning of the feature representation and the classifier but also guides it toward a robust feature representation that improves the discrimination ability of the classifier for intrusion detection. Three sets of evaluation experiments are conducted to demonstrate the potential of the proposed model. First, the ablation evaluation on the benchmark dataset NSL-KDD validates the design decisions of the proposed model. Next, the performance evaluation on the recent intrusion dataset UNSW-NB15 demonstrates the stable performance of the proposed model. Finally, the comparative evaluation verifies the efficacy of the proposed model against recently published state-of-the-art methods.
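The abstract does not give the form of the unified objective. One plausible way to write a joint autoencoder and one-class SVM objective is sketched below, purely as an assumption: $f_\theta$ and $g_\theta$ denote a hypothetical encoder and decoder, $\lambda$ is an assumed trade-off weight, and the second term is the standard OCSVM primal with parameter $\nu$ applied to the encoded features.

```latex
\min_{\theta,\,w,\,\rho,\,\xi}\;
\underbrace{\frac{1}{n}\sum_{i=1}^{n}\bigl\lVert x_i - g_\theta\bigl(f_\theta(x_i)\bigr)\bigr\rVert_2^2}_{\text{reconstruction error}}
\;+\;
\lambda\,\underbrace{\Bigl(\tfrac{1}{2}\lVert w\rVert^2 + \frac{1}{\nu n}\sum_{i=1}^{n}\xi_i - \rho\Bigr)}_{\text{one-class SVM term}}
\quad\text{s.t.}\quad
w^{\top} f_\theta(x_i) \ge \rho - \xi_i,\qquad \xi_i \ge 0 .
```

Under this reading, gradients from the OCSVM term flow back into the encoder, which is what would let the classifier shape the learned representation rather than being fitted afterwards.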
Abstract: Roller bearing failure is one of the most common faults in rotating machines. Various techniques for bearing fault diagnosis based on fault feature extraction have been proposed, but feature extraction from fault signals requires expert prior knowledge and human labour. Recently, deep learning algorithms have been applied extensively in the condition monitoring of rotating machines to learn features automatically from the input data. Given its robust performance in image recognition, the convolutional neural network (CNN) architecture has been widely used to automatically learn discriminative features from vibration images and classify health conditions. This paper proposes and evaluates a two-stage method, RGBVI-CNN, for roller bearing fault diagnosis. The first stage generates RGB vibration images (RGBVIs) from the input vibration signals. First, the 1-D vibration signals are converted to 2-D grayscale vibration images. Next, the regions of interest (ROI) are located in the converted 2-D grayscale vibration images. Finally, to produce vibration images with more discriminative characteristics, an algorithm is applied to the 2-D grayscale vibration images to produce connected-components-based RGBVIs with sets of colour and texture features. In the second stage, a CNN-based architecture is employed to automatically learn features from these RGBVIs and to classify bearing health conditions. Two cases of fault classification of rolling element bearings are used to validate the proposed method. Experimental results demonstrate that RGBVI-CNN can generate advantageous health condition features from bearing vibration signals and classify the health conditions under different working loads with high accuracy. Moreover, several classification models trained using RGBVI-CNN achieved high overall classification accuracy, precision, recall, and F-score in testing.
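The first step of the pipeline above, converting a 1-D vibration signal to a 2-D grayscale image, can be sketched as follows. The abstract does not specify the mapping, so the row-wise reshaping, min-max scaling to 0..255, and dropping of trailing samples are all assumptions, and the function name is hypothetical.

```python
def signal_to_grayscale_image(signal, width):
    """Reshape a 1-D signal into rows of `width` pixels scaled to 0..255.

    Assumptions (not from the abstract): samples fill the image row by row,
    intensities use min-max scaling, and trailing samples that do not fill
    a complete row are dropped.
    """
    height = len(signal) // width
    if width <= 0 or height == 0:
        raise ValueError("signal is too short for the requested width")
    used = signal[: height * width]
    lo, hi = min(used), max(used)
    span = hi - lo
    # A constant signal maps to an all-black image.
    pixels = [0 if span == 0 else round(255 * (x - lo) / span) for x in used]
    return [pixels[r * width:(r + 1) * width] for r in range(height)]


# Toy signal: five samples, two pixels per row, so the last sample is dropped.
image = signal_to_grayscale_image([0.0, 1.0, 2.0, 3.0, 4.0], width=2)
```

The resulting nested list has one inner list per image row; the region-of-interest and connected-components steps described in the abstract would operate on such a grid.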
Funding: Supported by the National Research Foundation (NRF) of Korea under the auspices of the Ministry of Science and ICT, Republic of Korea (Grant No. NRF-2020R1G1A1012741), received by M. R. Bhutta. https://nrf.kird.re.kr/main.do.
Abstract: Diabetic retinopathy (DR) is a retinal disease that causes irreversible blindness. DR occurs due to the patient's high blood sugar level, and it is difficult to detect at an early stage because no symptoms appear initially. To prevent blindness, early detection and regular treatment are needed. Automated detection based on machine intelligence may assist the ophthalmologist in examining the patient's condition more accurately and efficiently. The purpose of this study is to produce an automated screening system for the recognition and grading of diabetic retinopathy using machine learning through deep transfer and representational learning. The artificial intelligence technique used is transfer learning on the deep neural network Inception-v4. Two configuration variants of transfer learning are applied to Inception-v4: fine-tune mode and fixed feature extractor mode. Both configuration modes achieved decent accuracy values, but the fine-tuning method outperforms the fixed feature extractor mode. The fine-tune configuration achieved 96.6% accuracy in early detection of DR and 97.7% accuracy in grading the disease, outperforming the state-of-the-art methods in the relevant literature.
Funding: Supported by the National "Fifteenth Year Plan" Key Project (2001BA307B01 02 01).
Abstract: This paper introduces a method for generating new units within a cluster and an algorithm for generating new clusters. The model automatically builds up its dynamically growing internal representation structure during the learning process. Compared with other typical classification algorithms such as Kohonen's self-organizing map, the model realizes a multilevel classification of the input pattern with an optional accuracy and offers strong support for parallel computation on the main processor. The idea is suitable for the high-level storage of complex data structures for object recognition.
Funding: Supported by the National Natural Science Foundation of China (NSFC) under Grant No. 61873274.
Abstract: Contrastive self-supervised representation learning on attributed graph networks with Graph Neural Networks has attracted considerable research interest recently. However, two challenges remain. First, most real-world systems are multi-relational: entities are linked by different types of relations, and each relation is a view of the graph network. Second, the rich multi-scale information (structure-level and feature-level) of the graph network can serve as self-supervised signals but is not fully exploited. This study presents a novel contrastive self-supervised representation learning framework on attributed multiplex graph networks with multi-scale information (named CoLM^(2)S). It mainly contains two components: intra-relation contrastive learning and inter-relation contrastive learning. Specifically, the contrastive self-supervised representation learning framework on attributed single-layer graph networks with multi-scale information (CoLMS), which uses a graph convolutional network as the encoder, is introduced first to capture the intra-relation information with multi-scale structure-level and feature-level self-supervised signals. The structure-level information includes the edge structure and sub-graph structure, and the feature-level information is the output of the different graph convolutional layers. Second, following the consensus assumption among inter-relations, the CoLM^(2)S framework is proposed to jointly learn the various graph relations in an attributed multiplex graph network and obtain a global consensus node embedding. The proposed method can fully distil the graph information. Extensive experiments on unsupervised node clustering and graph visualisation tasks demonstrate the effectiveness of our methods, which outperform existing competitive baselines.
Abstract: Most modern face recognition and classification systems rely mainly on hand-crafted image feature descriptors. In this paper, we propose a novel deep learning algorithm combining unsupervised and supervised learning, named deep belief network embedded with Softmax regress (DBNESR), as a natural source of additional, complementary hierarchical representations, which relieves us of the complicated hand-crafted feature-design step. DBNESR first learns hierarchical feature representations by greedy layer-wise unsupervised learning in a feed-forward (bottom-up) and back-forward (top-down) manner and then performs more efficient recognition with Softmax regress by supervised learning. For comparison with algorithms based only on supervised learning, we also design several classifiers: BP, HBPNNs, RBF, HRBFNNs, SVM, and a multiple classification decision fusion classifier (MCDFC), which is a hybrid HBPNNs-HRBFNNs-SVM classifier. The experiments show the following. First, the proposed DBNESR is optimal for face recognition, with the highest and most stable recognition rates. Second, the algorithm combining unsupervised and supervised learning performs better than all purely supervised learning algorithms. Third, hybrid neural networks perform better than single-model neural networks. Fourth, ordered from largest to smallest, the average recognition rates rank as DBNESR, MCDFC, SVM, HRBFNNs, RBF, HBPNNs, BP, while the variances rank as BP, RBF, HBPNNs, HRBFNNs, SVM, MCDFC, DBNESR. Finally, the results reflect the capability of the hierarchical feature representations learned by DBNESR to model hard artificial intelligence tasks.
Funding: National Natural Science Foundation of China (No. 61972080); Shanghai Rising-Star Program, China (No. 19QA1400300).
Abstract: With the wide application of location-based social networks (LBSNs), personalized point of interest (POI) recommendation has become popular, especially in the commercial field. Unfortunately, it is challenging to accurately recommend POIs to users because the user-POI matrix is extremely sparse. In addition, a user's check-in activities are affected by many influential factors, yet most existing studies capture only a few of them and are hard to extend to incorporate other heterogeneous information in a unified way. To address these problems, we propose a meta-path-based deep representation learning (MPDRL) model for personalized POI recommendation. In this model, we design eight types of meta-paths to fully utilize the rich heterogeneous information in LBSNs for the representations of users and POIs and to deeply mine the correlations between users and POIs. To further improve recommendation performance, we design an attention-based long short-term memory (LSTM) network to learn the importance of different influential factors for a user's specific check-in activity. To verify the effectiveness of the proposed method, we conduct extensive experiments on a real-world dataset, Foursquare. Experimental results show that the MPDRL model improves on all comparison methods by at least 16.97% and 23.55% in terms of Precision@N (Pre@N) and Recall@N (Rec@N), respectively.
Funding: We thank the National Natural Science Foundation of China (Nos. 61966039, 62241604) and the Scientific Research Fund Project of the Education Department of Yunnan Province (No. 2023Y0565). This work was also supported in part by the Xingdian Talent Support Program for Young Talents (No. XDYC-QNRC-2022-0518).
Abstract: Many network presentation learning algorithms (NPLA) proposed in recent years originate from the process of random walks between nodes. Although these algorithms can obtain good embedding results, they also have limitations; for instance, only the structural information of nodes is considered when such algorithms are constructed. To address this issue, a label and community information-based network presentation learning algorithm (LC-NPLA) is proposed in this paper. First, the first-order neighbors of nodes are reconstructed using the community information and label information of the nodes. Next, the random walk strategy is improved by integrating the degree information and label information of nodes. Then, the node sequences obtained from random walk sampling are transformed into node representation vectors by the Skip-Gram model. Finally, experimental results on ten real-world networks demonstrate that the proposed algorithm has clear advantages over three benchmark algorithms in the label classification, network reconstruction, and link prediction tasks.
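The idea of a random walk that integrates degree and label information can be sketched as below. The abstract does not give LC-NPLA's actual transition rule, so the weighting used here (neighbor degree, multiplied by a bonus when the neighbor shares the current node's label) is a hypothetical stand-in, as are the function name, the `same_label_bonus` parameter, and the toy graph.

```python
import random


def biased_random_walk(adj, labels, start, length, same_label_bonus=2.0, rng=None):
    """Sample one walk of up to `length` nodes from `start`.

    Hypothetical transition rule (not from the paper): each neighbor is
    weighted by its degree, multiplied by `same_label_bonus` when it shares
    a known label with the current node.
    """
    rng = rng or random.Random()
    walk = [start]
    while len(walk) < length:
        current = walk[-1]
        neighbors = adj.get(current, [])
        if not neighbors:
            break  # dead end: return the shorter walk
        weights = []
        for n in neighbors:
            w = float(len(adj.get(n, [])))  # degree of the candidate neighbor
            if labels.get(n) is not None and labels.get(n) == labels.get(current):
                w *= same_label_bonus
            weights.append(max(w, 1e-9))  # keep every weight strictly positive
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


# Toy undirected graph given as adjacency lists, with partial node labels.
adj = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"], "d": ["c"]}
labels = {"a": 0, "b": 0, "c": 1}
walk = biased_random_walk(adj, labels, "a", 6, rng=random.Random(0))
```

Walks sampled this way would then serve as "sentences" for a Skip-Gram model; one common choice is gensim's `Word2Vec` with `sg=1`, though the abstract does not name an implementation.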
Abstract: Network anomaly detection plays a vital role in safeguarding network security. However, the existing network anomaly detection task is typically based on the one-class zero-positive scenario. This approach is susceptible to overfitting during training due to discrepancies in data distribution between the training set and the test set, a phenomenon known as prediction drift. Additionally, the rarity of anomaly data, often masked by normal data, further complicates network anomaly detection. To address these challenges, we propose the PUNet network, which combines the strengths of traditional machine learning and deep learning techniques for anomaly detection. Specifically, PUNet employs a reconstruction-based autoencoder pre-trained on normal data, enabling the network to capture potential features and correlations within the data. Subsequently, PUNet integrates a sampling algorithm to construct a pseudo-label candidate set among the outliers based on the reconstruction loss of the samples. This approach mitigates the prediction drift problem by incorporating abnormal samples. Furthermore, PUNet uses the CatBoost classifier for anomaly detection to tackle potential data imbalance issues within the candidate set. Extensive experimental evaluations demonstrate that PUNet effectively resolves the prediction drift and data imbalance problems, significantly outperforming competing methods.
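The pseudo-label step described above, picking candidate anomalies from the samples the autoencoder reconstructs worst, can be illustrated with a minimal sketch. The abstract does not describe PUNet's sampling algorithm, so the simple top-fraction rule, the function name, and the `fraction` parameter here are assumptions.

```python
def select_pseudo_anomalies(recon_errors, fraction=0.1):
    """Return indices of the samples with the largest reconstruction errors.

    Assumption: a fixed top `fraction` of samples is taken as the
    pseudo-anomaly candidate set; the paper's sampling rule may differ.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if not recon_errors:
        return []
    count = max(1, int(len(recon_errors) * fraction))
    ranked = sorted(range(len(recon_errors)),
                    key=lambda i: recon_errors[i], reverse=True)
    return sorted(ranked[:count])


# Hypothetical per-sample reconstruction losses from a pre-trained autoencoder.
errors = [0.1, 0.9, 0.2, 0.15, 0.8, 0.05, 0.1, 0.12, 0.11, 0.3]
candidates = select_pseudo_anomalies(errors, fraction=0.2)
```

The selected indices would be labeled as anomalous and the rest as normal before training a downstream classifier; the abstract names CatBoost for that stage.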
Funding: Supported by the Fundamental Research Funds for the Central Universities (Grant No. 2022JKF02039).
Abstract: Graph Neural Networks (GNNs) play a significant role in tasks related to homophilic graphs. Traditional GNNs, based on the assumption of homophily, employ low-pass filters over neighboring nodes to achieve information aggregation and embedding. However, in heterophilic graphs, nodes from different categories often establish connections, while nodes of the same category are located further apart in the graph topology. This characteristic poses challenges to traditional GNNs, leading to the issues of "distant node modeling deficiency" and "failure of the homophily assumption". In response, this paper introduces the Spatial-Frequency domain Adaptive Heterophilic Graph Neural Network (SFA-HGNN), which integrates adaptive embedding mechanisms for both the spatial and frequency domains to address these issues. Specifically, for the first problem, we propose the "Distant Spatial Embedding Module", which selects and aggregates distant nodes through high-order random-walk transition probabilities to enhance modeling capability. For the second issue, we design the "Proximal Frequency Domain Embedding Module", which constructs adaptive filters to separate the high-frequency and low-frequency signals of nodes and introduces frequency-domain guided attention mechanisms to fuse the relevant information, thereby reducing the noise introduced by the failure of the homophily assumption. We deploy SFA-HGNN on six publicly available heterophilic networks, achieving state-of-the-art results on four of them. Furthermore, we elaborate on the hyperparameter selection mechanism and validate the performance of each module through experiments, demonstrating a positive correlation between "node structural similarity", "node attribute vector similarity", and "node homophily" in heterophilic networks.
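The "high-order random-walk transition probabilities" mentioned above have a standard definition: with the one-step matrix P = D^{-1}A, the k-step probabilities are the entries of P^k. The sketch below computes them for a toy graph; how SFA-HGNN then selects distant nodes from these values is not stated in the abstract, and the function names are hypothetical.

```python
def transition_matrix(adj_matrix):
    """Row-normalize an adjacency matrix: P[i][j] = A[i][j] / degree(i)."""
    P = []
    for row in adj_matrix:
        degree = sum(row)
        P.append([x / degree if degree else 0.0 for x in row])
    return P


def matmul(A, B):
    """Plain dense matrix product of two nested-list matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


def k_step_transition(adj_matrix, k):
    """Return P^k, the k-step random-walk transition probabilities (k >= 1)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    P = transition_matrix(adj_matrix)
    result = P
    for _ in range(k - 1):
        result = matmul(result, P)
    return result


# Path graph 0 - 1 - 2: node 2 is two hops away from node 0.
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
P2 = k_step_transition(A, 2)
```

In this example a walker starting at node 0 has no one-step probability of reaching node 2, but the two-step matrix assigns it a nonzero value, which is the kind of signal that can surface non-adjacent nodes as aggregation candidates.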