
A short text multi-label classification method combining similarity graph and random walk model
Cited by: 4
Abstract This paper proposes a multi-label short text classification algorithm that combines a similarity graph with a random walk model. First, a similarity graph is built with the samples and labels as nodes, and the weights between samples and labels are computed with the help of an external knowledge base, yielding the matching degree between the sample to be predicted and the label set. Next, the multi-label data are mapped onto a multi-label dependency graph, and a random walk with restart is run on that graph, with the matching degrees from the first step serving as the initial prediction values, to compute the probability distribution over the nodes. Once the distribution stabilizes, it is taken as the probability distribution over the labels, from which the label set of the text is determined. Experimental results show that the proposed algorithm performs well on multi-label text classification and that its classification performance is significantly better than that of comparable algorithms.
Authors LI Xiao-hong; WANG Shan-shan; MA Yu-yin; MA Hui-fang (College of Computer Science and Engineering, Northwest Normal University, Lanzhou 730070, China)
Source Computer Engineering & Science (indexed in CSCD and the Peking University Core Journals list), 2021, No. 6, pp. 1081-1087 (7 pages)
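Funding National Natural Science Foundation of China (61762078, 61967013); Higher Education Innovation and Entrepreneurship Fund (2020B-089); Gansu Province Science and Technology Program (20JR5RA518); Natural Science Foundation of Gansu Province (20JR10RA076).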
Keywords multi-label short text classification; similarity graph; random walk with restart; WordNet
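
The second stage described in the abstract, a random walk with restart over a label dependency graph seeded with the first-stage matching degrees, can be illustrated with a minimal sketch. This record does not give the paper's formulas, so everything below is an assumption made for illustration: the function name `random_walk_with_restart`, the column-normalized transition matrix, the restart probability of 0.15, the toy co-occurrence weights, and the mean-probability threshold used to pick labels.

```python
import numpy as np


def random_walk_with_restart(adjacency, initial_scores, restart_prob=0.15,
                             tol=1e-8, max_iter=1000):
    """Iterate p <- (1 - c) * P @ p + c * r until p stops changing.

    adjacency      : (n, n) non-negative label-to-label weights (assumed form
                     of the multi-label dependency graph)
    initial_scores : length-n non-negative matching degrees from stage one,
                     used as the restart distribution r
    restart_prob   : c, the probability of jumping back to r (assumed value)
    """
    W = np.array(adjacency, dtype=float)  # copy so the caller's data is untouched
    # Give isolated labels a self-loop so every column can be normalized.
    isolated = np.flatnonzero(W.sum(axis=0) == 0)
    W[isolated, isolated] = 1.0
    P = W / W.sum(axis=0)                 # column-stochastic transition matrix

    r = np.asarray(initial_scores, dtype=float)
    r = r / r.sum()                       # restart vector as a distribution
    p = r.copy()
    for _ in range(max_iter):
        p_next = (1.0 - restart_prob) * (P @ p) + restart_prob * r
        if np.abs(p_next - p).sum() < tol:  # L1 change below tolerance
            return p_next
        p = p_next
    return p


# Toy example: four labels with hypothetical co-occurrence weights.
label_graph = [[0, 3, 1, 0],
               [3, 0, 0, 1],
               [1, 0, 0, 2],
               [0, 1, 2, 0]]
match_degrees = [0.6, 0.1, 0.2, 0.1]      # hypothetical stage-one scores

label_probs = random_walk_with_restart(label_graph, match_degrees)
# Assumed decision rule: keep labels whose probability is at least the mean.
predicted_labels = np.flatnonzero(label_probs >= label_probs.mean())
print(label_probs, predicted_labels)
```

Because the transition matrix is column-stochastic and the restart vector sums to one, every iterate remains a probability distribution, which is what allows the stationary vector to be read as label probabilities. The thresholding rule is only one plausible way to turn those probabilities into a label set.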
