Funding: Supported by the National Natural Science Foundation of China (Nos. 61966039, 62241604), the Scientific Research Fund Project of the Education Department of Yunnan Province (No. 2023Y0565), and, in part, the Xingdian Talent Support Program for Young Talents (No. XDYC-QNRC-2022-0518).
Abstract: In recent years, many network presentation learning algorithms (NPLA) have been built on random walks between nodes. Although these algorithms can obtain good embedding results, they have limitations. For instance, they consider only the structural information of nodes. To address this issue, this paper proposes a label and community information-based network presentation learning algorithm (LC-NPLA). First, the first-order neighbors of nodes are reconstructed using the community information and the label information of nodes. Next, the random walk strategy is improved by integrating the degree information and label information of nodes. Then, the node sequences obtained from random walk sampling are transformed into node representation vectors by the Skip-Gram model. Finally, experimental results on ten real-world networks demonstrate that the proposed algorithm outperforms three benchmark algorithms in label classification, network reconstruction, and link prediction tasks.
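The abstract does not give the transition rule, so the following is only a minimal sketch of a random walk biased by degree and label information. The weighting formula (neighbor degree multiplied by a bonus when labels match), the function name `biased_walk`, and the parameter `same_label_bonus` are all assumptions made for illustration, not the LC-NPLA definition. The resulting node sequences are the kind of input that would then be passed to a Skip-Gram model.

```python
import random


def biased_walk(adj, labels, start, length, same_label_bonus=1.0, rng=None):
    """Sample one walk; the weighting below is an assumed, illustrative rule."""
    rng = rng or random.Random(0)
    walk = [start]
    while len(walk) < length:
        current = walk[-1]
        neighbors = sorted(adj[current])  # sorted for reproducibility
        if not neighbors:
            break  # dead end: stop the walk early
        weights = []
        for n in neighbors:
            degree = len(adj[n])
            # Assumption: prefer neighbors that share the current node's label.
            same = labels.get(n) is not None and labels.get(n) == labels.get(current)
            weights.append(degree * (1.0 + (same_label_bonus if same else 0.0)))
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


# Toy undirected graph with two label classes (hypothetical data).
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
labels = {"a": 0, "b": 0, "c": 1, "d": 1}
walk = biased_walk(adj, labels, "a", 6, rng=random.Random(42))
print(walk)
```

Setting `same_label_bonus=0.0` reduces the rule to a purely degree-weighted walk, which makes it easy to compare against the label-aware variant.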
Funding: Supported by the National Science Foundation of China (Grant Nos. 61272067, 61370229), the National Key Technology R&D Program of China (Grant Nos. 2012BAH27F05, 2013BAH72B01), the National High Technology R&D Program of China (Grant No. 2013AA01A212), and the S&T Projects of Guangdong Province (Grant Nos. 2016B010109008, 2014B010117007, 2015A030401087, 2015B010109003, 2015B010110002).
Abstract: XML data can be represented as a tree or graph, and query processing for XML data requires the structural information among nodes. Designing an efficient labeling scheme for the nodes of order-sensitive XML trees is an important way to manage XML data well. Previous labeling schemes, such as region and prefix schemes, often sacrifice update performance and require growing label space when new nodes are inserted. To overcome these limitations, this paper proposes a new labeling idea: separating structure from order. Based on this idea, a novel Prime-based Middle Fraction Labeling Scheme (PMFLS) is designed, together with a series of algorithms that determine the structural relationships among nodes and support updates. PMFLS combines the advantages of prefix and region schemes by expressing structural information and sequential information separately. PMFLS also supports order-sensitive updates without relabeling or recalculation, and its label space is stable. Experiments and analysis on several benchmarks show that PMFLS handles updates efficiently and significantly improves query processing performance with good scalability.
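The abstract does not specify how PMFLS builds its labels, so the sketch below illustrates only the general idea of separating structure from order. It is not PMFLS itself. Structure uses the classic prime-product labeling (a node's label is its parent's label times a fresh prime, so ancestry reduces to divisibility). Order uses a fractional key placed midway between two siblings, which is an assumed stand-in for the "middle fraction" idea. The class `Labeler` and the helper `between` are hypothetical names.

```python
from fractions import Fraction
from itertools import count


def primes():
    """Yield primes in increasing order by trial division."""
    found = []
    for n in count(2):
        if all(n % p for p in found):
            found.append(n)
            yield n


class Labeler:
    def __init__(self):
        self._primes = primes()
        self.structure = {}  # node -> product of primes on the path from the root
        self.order = {}      # node -> sibling order key, independent of structure

    def add(self, node, parent=None, order_key=Fraction(1)):
        parent_label = self.structure[parent] if parent is not None else 1
        self.structure[node] = parent_label * next(self._primes)
        self.order[node] = order_key

    def is_ancestor(self, ancestor, descendant):
        # Divisibility of structure labels encodes the ancestor relationship.
        return (ancestor != descendant
                and self.structure[descendant] % self.structure[ancestor] == 0)


def between(left, right):
    """Assumed order key for a node inserted between two siblings."""
    return (left + right) / 2


tree = Labeler()
tree.add("root")
tree.add("a", "root", Fraction(1))
tree.add("b", "root", Fraction(2))
# Insert "c" between "a" and "b" without touching any existing label.
tree.add("c", "root", between(tree.order["a"], tree.order["b"]))
tree.add("g", "a")
siblings = sorted(["a", "b", "c"], key=tree.order.get)
print(siblings, tree.is_ancestor("a", "g"), tree.is_ancestor("b", "g"))
```

Because the order keys live in a separate map, inserting `c` changes neither the structure labels nor the order keys of `a` and `b`, which is the property the abstract describes as updating without relabeling.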
Funding: Supported by the National Natural Science Foundation of China (NSFC) (Grant Nos. 62162022, 62162024), the Key Research and Development Program of Hainan Province (Grant Nos. ZDYF2020040, ZDYF2021GXJS003), the Major Science and Technology Project of Hainan Province (Grant No. ZDKJ2020012), the Hainan Provincial Natural Science Foundation of China (Grant Nos. 620MS021, 621QN211), and the Science and Technology Development Center of the Ministry of Education Industry-University-Research Innovation Fund (2021JQR017).
Abstract: Multi-Label Text Classification (MLTC) faces two challenges: extracting rich semantic features from text and identifying relationships between labels. Many studies on semantic feature extraction turn to external knowledge to improve the model's understanding of textual content, and often overlook intrinsic textual cues such as label statistical features, even though such endogenous information aligns naturally with the classification task. To exploit this intrinsic knowledge, we introduce a novel Gate-Attention mechanism, which integrates statistical features from the text itself into the semantic representation and improves the model's capacity to understand and represent the data. To mine label correlations, we propose a Dual-end enhancement mechanism, which mitigates the information loss and erroneous transmission inherent in traditional long short-term memory propagation. Extensive experiments on the AAPD and RCV1-2 datasets confirm the efficacy of both the Gate-Attention mechanism and the Dual-end enhancement mechanism. Our final model outperforms the baseline model, which attests to its robustness. These findings underscore the importance of considering not only external knowledge but also the intrinsic characteristics of textual data when building MLTC models.
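The abstract does not define the Gate-Attention mechanism, so the block below is only a generic sketch of gated fusion, one common way to blend two feature vectors. It is not the authors' formulation. The function name `gated_fusion`, the per-dimension weights, and the toy values are assumptions. A sigmoid gate decides, for each dimension, how much of the semantic feature versus the statistical feature to keep.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(semantic, statistical, w_sem, w_stat, bias):
    """Blend two equal-length vectors with a per-dimension sigmoid gate."""
    fused = []
    for s, t, ws, wt, b in zip(semantic, statistical, w_sem, w_stat, bias):
        gate = sigmoid(ws * s + wt * t + b)  # gate value in (0, 1)
        fused.append(gate * s + (1.0 - gate) * t)
    return fused


# With zero weights and bias the gate is 0.5, so the output is the average.
balanced = gated_fusion([1.0, 0.0], [0.0, 2.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0])
print(balanced)
```

In a trained model the weights and bias would be learned, so the gate could favor statistical features only where they help.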
Funding: Supported by the National Key Research and Development Program of China (Nos. 2021YFF0307203, 2019QY1300), the Youth Innovation Promotion Association CAS (No. 2021156), the Strategic Priority Research Program of Chinese Academy of Sciences (No. XDC02040100), and the National Natural Science Foundation of China (No. 61802404).
Abstract: The Domain Name System (DNS), one of the most critical pieces of internet infrastructure, has been abused in various cyber attacks. Current malicious domain detection capabilities are limited by insufficient credible label information, severe class imbalance, and the loosely clustered distribution of domain samples across different malicious activities. This paper proposes a malicious domain detection framework named PUMD, which introduces a Positive and Unlabeled (PU) learning solution to address insufficient label information, adopts customized sample weights to reduce the impact of class imbalance, and constructs evidence features based on resource overlapping to reduce the intra-class distance of malicious samples. In addition, a feature selection strategy based on permutation importance and binning is proposed to screen the most informative detection features. Finally, we conduct experiments on the open-source real DNS traffic dataset provided by QI-ANXIN Technology Group to evaluate the PUMD framework's ability to capture potential command and control (C&C) domains used for malicious activities. The experimental results show that PUMD achieves the best detection performance under different label frequencies and class imbalance ratios.
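As a rough illustration of the permutation importance half of the feature selection strategy mentioned above, the sketch below shuffles one feature column at a time and measures the drop in accuracy. The binning step is not covered because the abstract does not describe it. The toy model, the data, and the function names are assumptions for illustration, not part of PUMD.

```python
import random


def accuracy(model, rows, targets):
    return sum(model(r) == t for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(model, rows, targets, n_repeats=10, seed=0):
    """Mean accuracy drop when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, targets)
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)  # break the link between feature j and the target
            shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - accuracy(model, shuffled, targets))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical data: feature 0 determines the class, feature 1 is noise.
rows = [[1.0, 0.3], [1.0, 0.9], [1.0, 0.1], [1.0, 0.7],
        [0.0, 0.2], [0.0, 0.8], [0.0, 0.4], [0.0, 0.6]]
targets = [1, 1, 1, 1, 0, 0, 0, 0]


def model(row):
    # Toy classifier that looks only at feature 0.
    return int(row[0] > 0.5)


importances = permutation_importance(model, rows, targets)
print(importances)
```

Features whose shuffling barely changes the score are candidates for removal, so the noise feature here receives an importance of zero because the toy model never reads it.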