Abstract
Most existing hierarchical text classification methods are based on a hierarchical category tree. However, because a single category may have multiple images in such a tree, tree-based classification can produce inconsistent results. A natural solution is to describe category relationships with a graph structure, which is also how people describe them in practice. This paper therefore presents a hierarchical multi-label text classification method that works directly on a hierarchical graph, called GraphHMLTC. The method determines the order in which categories are classified from a topological sort of the vertices of the graph (a directed acyclic graph) rather than from the top-down hierarchy of a tree, and it maintains this topological order dynamically according to the classification situation. Experimental results show that GraphHMLTC improves classification accuracy to a considerable degree compared with BoosTexter.MH, a representative non-hierarchical multi-label classification method. This work thus indicates that classification based on a hierarchical graph is feasible and superior.
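The idea of ordering category decisions by a topological sort of a directed acyclic graph can be illustrated with a minimal Python sketch. It is not the GraphHMLTC algorithm itself, whose details the abstract does not give. The function name `classify_on_dag`, the `parents` mapping, the per-category classifier `accepts`, and the rule that a non-root category is tested only when at least one of its parents was assigned are all assumptions made for illustration.

```python
from collections import deque


def classify_on_dag(parents, accepts):
    """Visit categories in a topological order of a DAG and assign labels.

    parents: dict mapping every category to a list of its parent categories
             (root categories map to an empty list).
    accepts: hypothetical binary classifier, called as accepts(category) -> bool.
    Returns (tested, assigned): the categories actually tested, in order,
    and the set of categories assigned to the document.
    """
    # Build the child lists from the parent lists.
    children = {c: [] for c in parents}
    for child, ps in parents.items():
        for p in ps:
            children[p].append(child)

    # Kahn-style traversal: a category becomes ready once all parents are visited.
    pending = {c: len(ps) for c, ps in parents.items()}
    ready = deque(c for c, n in pending.items() if n == 0)
    tested, assigned = [], set()

    while ready:
        c = ready.popleft()
        # Assumed rule: test a non-root category only if some parent was assigned.
        # Categories with no assigned parent are skipped, so the effective
        # order depends on the decisions made so far.
        if not parents[c] or any(p in assigned for p in parents[c]):
            tested.append(c)
            if accepts(c):
                assigned.add(c)
        for ch in children[c]:
            pending[ch] -= 1
            if pending[ch] == 0:
                ready.append(ch)
    return tested, assigned
```

In this sketch a category with two parents, which a tree would have to duplicate, appears once and receives a single decision after both parents have been visited, which is the consistency property the abstract attributes to the graph formulation.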
Source
Application Research of Computers (《计算机应用研究》)
CSCD
PKU Core Journal (北大核心)
2010, No. 3, pp. 909-912 (4 pages)
Keywords
text classification (TC)
hierarchical classification
multi-label classification
directed acyclic graph
topological sorting