Multi-label classification algorithm based on PLSA learning of probability distribution semantic information (cited by: 8)
Abstract Most multi-label algorithms use feature and label embedding to mine the semantic information of the label space, but these methods do not exploit the relationships that may exist between features and labels. Label-specific features account for this relationship well: each label may correspond to its own set of features. Work on label-specific features mainly improves performance by exploiting correlations among labels and among features and by reshaping the label space. However, such methods neither specify the logical relationship between features and labels nor establish whether the same kind of relationship holds between labels and instances. How to use these two kinds of semantic information to improve multi-label algorithms is therefore worth studying. This paper proposes a new multi-label classification algorithm that uses PLSA (Probabilistic Latent Semantic Analysis) to learn the semantic information of probability distributions. First, a latent variable in the sample matrix is treated as the label, and the PLSA model is used to obtain the feature-label and label-instance conditional probability distribution matrices, which express the possible relationships as conditional probability distributions. Second, a model is built to learn the semantic information in these probability distribution matrices and to apply it to label prediction and classification. Finally, the proposed algorithm is compared with other multi-label classification algorithms on 13 public multi-label text datasets, and statistical hypothesis tests are performed. The experimental results suggest that learning the semantic information of probability distributions is a reasonable way to improve the performance of multi-label algorithms.
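The PLSA step described in the abstract can be illustrated with a minimal sketch of textbook PLSA fitted by EM. This is not the authors' implementation: the function name `plsa`, its parameters, and the toy data are assumptions made for illustration, and the abstract does not say how the latent variable is tied to the observed labels or how the later learning model is built, so neither is shown. The sketch only produces the two kinds of conditional probability matrices the abstract mentions, P(feature | latent) and P(latent | instance).

```python
import numpy as np


def plsa(X, n_topics, n_iter=50, seed=0):
    """Fit generic PLSA by EM on a nonnegative (n_instances, n_features) matrix X.

    Returns
      p_w_z: (n_topics, n_features), each row is P(feature | latent)
      p_z_d: (n_instances, n_topics), each row is P(latent | instance)
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = X.shape
    eps = 1e-12  # guards against division by zero

    # Random initialisation, normalised so each row is a distribution.
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # E-step: P(z | d, w) is proportional to P(z | d) * P(w | z).
        # joint has shape (n_docs, n_topics, n_words).
        joint = p_z_d[:, :, None] * p_w_z[None, :, :]
        post = joint / (joint.sum(axis=1, keepdims=True) + eps)

        # M-step: re-estimate both matrices from counts weighted by P(z | d, w).
        weighted = X[:, None, :] * post
        p_w_z = weighted.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + eps
        p_z_d = weighted.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + eps

    return p_w_z, p_z_d


if __name__ == "__main__":
    # Toy instance-by-feature count matrix (made-up data, not from the paper).
    X = np.random.default_rng(1).integers(1, 5, size=(6, 8)).astype(float)
    p_w_z, p_z_d = plsa(X, n_topics=3, n_iter=20)
    print(p_w_z.shape, p_z_d.shape)
```

The dense three-way array keeps the E-step and M-step easy to read but uses memory proportional to instances × latent variables × features, so a sparse formulation would be needed for text datasets of realistic size.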
Authors Wang Yibin (王一宾); Zheng Weijie (郑伟杰); Cheng Yusheng (程玉胜); Cao Tiancheng (曹天成) (School of Computer and Information, Anqing Normal University, Anqing, 246113, China; The University Key Laboratory of Intelligent Perception and Computing of Anhui Province, Anqing Normal University, Anqing, 246133, China)
Source Journal of Nanjing University (Natural Science) (《南京大学学报(自然科学版)》), indexed in CAS, CSCD, and PKU Core (北大核心), 2021, No. 1, pp. 75-89, 15 pages
Funding National Natural Science Foundation of China (617022012).
Keywords multi-label learning; probability distributions; semantic analysis; label correlations