Hierarchy Multi-label Classification Based on Path Selection

Cited by: 2
Abstract Multi-label classification assigns more than one label to each instance. When the labels are organized in a predefined hierarchy, the task is called hierarchical multi-label classification (HMC). Traditional classification settings (binary and flat multi-label classification) tend to ignore the structural relationships between labels, whereas HMC takes the hierarchical relationships within the label set into account in order to improve classification performance. HMC is a structured output prediction task in which the classes are organized into a predefined hierarchy, either a tree or a directed acyclic graph (DAG), and an instance may belong to multiple classes. HMC methods are either global or local. Global approaches treat the problem as a whole over the full label set but tend to hit performance bottlenecks as the dataset grows. Local approaches decompose the problem into binary classification subproblems for individual labels, but usually do not fully exploit the hierarchy information and cannot handle cases in which the prediction should terminate at an internal node of the label tree. Hierarchical multi-label classification based on path selection addresses the case in which the predicted labels do not reach a leaf node of the label tree. In the classification phase, branches with low probability are pruned, which allows non-mandatory leaf node prediction. The method then evaluates every possible path from the root of the pruned hierarchy, taking into account the predicted probability and the level of each node, and selects the highest-scoring label paths, that is, one or more paths whose score is above a threshold. It was tested on three datasets with tree-structured hierarchies against a number of state-of-the-art HMC methods. The experiments show that the method obtains better results when dealing with deep hierarchies with densely populated leaf nodes.
Authors ZHANG Chun-yan (张春焰); LI Tao (李涛); LIU Zheng (刘峥) (School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210046, China)
Source Computer Technology and Development (《计算机技术与发展》), 2018, No. 10, pp. 37-43 (7 pages)
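Funding Ministry of Education-China Mobile Research Fund, 2015 (5-10); Natural Science Foundation of Jiangsu Province, General Program (BK20171447); Natural Science Research Project of Jiangsu Higher Education Institutions, General Program (17JKB520024)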
Keywords hierarchical multi-label classification; multi-label learning; path selection; hierarchical classification; text classification; hierarchical label tree; pruning
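
The procedure outlined in the abstract (prune low-probability branches, enumerate the remaining root paths, score each path from node probabilities and levels, keep paths above a threshold) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not give the scoring formula, so the level-weighted mean used here (weight 1/k at level k) is a placeholder, and the names `pruned_paths`, `path_score`, `select_paths`, `prune_below`, and `accept`, along with the toy tree and probabilities, are hypothetical.

```python
from typing import Dict, List, Tuple

Tree = Dict[str, List[str]]  # parent label -> child labels


def pruned_paths(tree: Tree, prob: Dict[str, float], root: str,
                 prune_below: float) -> List[List[str]]:
    """Enumerate root paths after dropping branches with low probability.

    A path ends where no child survives pruning, so it may stop at an
    internal node of the original tree (non-mandatory leaf prediction).
    """
    paths: List[List[str]] = []

    def walk(node: str, path: List[str]) -> None:
        kept = [c for c in tree.get(node, []) if prob[c] >= prune_below]
        if not kept:
            if path:  # skip the empty path if every top-level branch is pruned
                paths.append(path)
            return
        for child in kept:
            walk(child, path + [child])

    walk(root, [])
    return paths


def path_score(path: List[str], prob: Dict[str, float]) -> float:
    """Assumed scoring rule: mean probability with weight 1/k at level k."""
    weights = [1.0 / level for level in range(1, len(path) + 1)]
    weighted = sum(w * prob[node] for w, node in zip(weights, path))
    return weighted / sum(weights)


def select_paths(tree: Tree, prob: Dict[str, float], root: str = "root",
                 prune_below: float = 0.4,
                 accept: float = 0.6) -> List[Tuple[float, List[str]]]:
    """Return (score, path) pairs whose score reaches `accept`, best first."""
    candidates = pruned_paths(tree, prob, root, prune_below)
    scored = [(path_score(p, prob), p) for p in candidates]
    return sorted((sp for sp in scored if sp[0] >= accept), reverse=True)


# Toy label tree and per-node probabilities (made-up values standing in
# for the outputs of local per-label classifiers).
tree = {"root": ["A", "B"], "A": ["A1", "A2"], "B": ["B1"]}
prob = {"A": 0.9, "A1": 0.8, "A2": 0.2, "B": 0.5, "B1": 0.3}

candidates = pruned_paths(tree, prob, "root", prune_below=0.4)
selected = select_paths(tree, prob)
for score, path in selected:
    print(f"{score:.3f}", " > ".join(path))
```

In this toy run, pruning removes `A2` and `B1`, so the candidate path through `B` stops at an internal node, and only the path through `A` and `A1` clears the acceptance threshold. Changing the weighting in `path_score` changes how strongly deeper levels influence the result.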