Abstract
Multi-label classification is widely used in practical problems such as gene classification, drug discovery, and text classification. Existing multi-label classification algorithms usually select nodes from the network at random to form the training set. However, different nodes in the network play different roles during classification, so for a fixed training-set size, different choices of training nodes yield different classification accuracy. We therefore introduce the concept of seed nodes: classification starts from the seed nodes and, through iterative inference, obtains the labels of all other nodes in the network. We propose the SHDA (Nodes Selection of High Degree from Each Affiliation) algorithm, which selects a proportion of the high-degree nodes from each community (affiliation) of the network, merges them, and post-processes the result to obtain the seed nodes. Experiments on real-world datasets show that using the seed nodes as the training set improves the accuracy of multi-label classification in networked data.
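The selection step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function name `select_seed_nodes`, the `ratio` parameter, the rounding rule (ceiling, with at least one node per community), and the tie-breaking order are all assumptions made for illustration. The abstract does not say how communities are detected or what the final processing step involves, so the sketch takes the communities as given and treats processing as simple de-duplication of nodes that belong to several communities.

```python
import math
from collections import defaultdict


def select_seed_nodes(edges, communities, ratio):
    """Pick the highest-degree nodes from each community and merge them.

    edges:       iterable of (u, v) pairs describing an undirected network
    communities: dict mapping a community id to its member nodes
                 (communities may overlap)
    ratio:       fraction of each community to select, in (0, 1]
                 (hypothetical parameter, not named in the abstract)
    """
    if not 0 < ratio <= 1:
        raise ValueError("ratio must be in (0, 1]")

    # Node degree = number of incident edges.
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    seeds = set()
    for members in communities.values():
        members = list(set(members))
        if not members:
            continue
        # Assumption: round the quota up and take at least one node.
        quota = max(1, math.ceil(ratio * len(members)))
        # Highest degree first; ties broken by node name for determinism.
        ranked = sorted(members, key=lambda n: (-degree[n], str(n)))
        # Merging into a set removes nodes chosen by several communities.
        seeds.update(ranked[:quota])
    return seeds


# Toy network: two overlapping communities that share node 4.
edges = [(1, 2), (1, 3), (1, 4), (2, 3), (4, 5), (5, 6), (5, 7), (6, 7)]
communities = {"A": [1, 2, 3, 4], "B": [4, 5, 6, 7]}
seeds = select_seed_nodes(edges, communities, ratio=0.25)
print(sorted(seeds))  # [1, 5]
```

In this toy network, nodes 1 and 5 have the highest degree in their respective communities, so they become the seed nodes. Their labels would then serve as the training set for whichever multi-label classifier is used downstream.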
Source
Acta Electronica Sinica (《电子学报》), 2016, No. 9, pp. 2074-2080 (7 pages)
Indexed in: EI, CAS, CSCD, Peking University Core Journals
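Funding
National Basic Research Program of China (973 Program) (No. 2013CB329604)
Innovative Research Team Program of the Ministry of Education of China (No. IRT13059)
National Natural Science Foundation of China (No. 61229301, No. 61503114)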