Abstract
This paper proposes a multi-label short-text classification algorithm that combines a similarity graph with a random walk model. First, a similarity graph is built with the samples and labels as nodes, and the weights between samples and labels are computed with the help of an external knowledge base, which yields the matching degree between the sample to be predicted and the label set. Then, the multi-label data are mapped onto a multi-label dependency graph, and a random walk with restart is run on this graph, using the matching degrees from the first step as the initial prediction values, to compute the probability distribution over the nodes. Once this distribution stabilizes, it is taken as the probability distribution over the labels, from which the label set of the text to be predicted is determined. Experimental results show that the proposed algorithm performs well on multi-label text classification and significantly improves classification performance over comparable algorithms.
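The second stage described in the abstract (a random walk with restart over a label dependency graph, seeded with the matching degrees) can be illustrated roughly as follows. This is only a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not say how the dependency graph is weighted or how the final label set is cut off, so the co-occurrence weighting in `cooccurrence_transition`, the `restart_prob` value, and the threshold rule in `select_labels` are all hypothetical choices made for illustration.

```python
def cooccurrence_transition(label_sets, n_labels):
    """Build a column-stochastic transition matrix from label co-occurrence.

    Assumption: edge weight = number of training samples in which two labels
    appear together. The paper may weight the dependency graph differently.
    """
    counts = [[0.0] * n_labels for _ in range(n_labels)]
    for labels in label_sets:
        for a in labels:
            for b in labels:
                if a != b:
                    counts[a][b] += 1.0
    for j in range(n_labels):
        col = sum(counts[i][j] for i in range(n_labels))
        if col == 0:
            counts[j][j] = 1.0  # isolated label: keep its mass in place
        else:
            for i in range(n_labels):
                counts[i][j] /= col
    return counts


def random_walk_with_restart(transition, matching, restart_prob=0.15,
                             tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - c) * T p + c * r until p stops changing.

    `matching` holds the initial sample-to-label matching degrees; it is
    normalised and used both as the start vector and as the restart vector r.
    """
    total = sum(matching)
    if total <= 0:
        raise ValueError("matching degrees must have a positive sum")
    n = len(matching)
    r = [m / total for m in matching]
    p = r[:]
    for _ in range(max_iter):
        nxt = [
            (1 - restart_prob) * sum(transition[i][j] * p[j] for j in range(n))
            + restart_prob * r[i]
            for i in range(n)
        ]
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


def select_labels(probs, threshold):
    """Hypothetical cut-off rule: keep labels whose probability >= threshold."""
    return [i for i, x in enumerate(probs) if x >= threshold]


# Toy usage: three labels, three training samples, one sample to predict.
T = cooccurrence_transition([{0, 1}, {0, 1}, {1, 2}], n_labels=3)
label_probs = random_walk_with_restart(T, matching=[0.6, 0.3, 0.1])
print(select_labels(label_probs, threshold=0.3))
```

Because the transition matrix is column-stochastic and the restart vector sums to one, each iterate remains a probability distribution, and the restart term makes the update a contraction, so the loop settles on a stable distribution as the abstract describes.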
Authors
LI Xiao-hong (李晓红)
WANG Shan-shan (王闪闪)
MA Yu-yin (马堉银)
MA Hui-fang (马慧芳)
College of Computer Science and Engineering, Northwest Normal University, Lanzhou 730070, China
Source
Computer Engineering & Science (《计算机工程与科学》)
CSCD
PKU Core Journal (北大核心)
2021, Issue 6, pp. 1081-1087 (7 pages)
Funding
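National Natural Science Foundation of China (61762078, 61967013)
Innovation and Entrepreneurship Fund for Institutions of Higher Education (2020B-089)
Gansu Province Science and Technology Program (20JR5RA518)
Natural Science Foundation of Gansu Province (20JR10RA076)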