A graph-based method for hierarchical multi-label text classification (cited by: 1)
Abstract: Most existing hierarchical text classification methods are based on a hierarchical category tree. Because a single category may appear as several mirror copies in such a tree, tree-based classification can produce inconsistent results. A natural remedy is to describe the category relationships with a graph, which is also how people describe them in practice. This paper therefore proposes a hierarchical multi-label text classification method that works directly on a graph, called GraphHMLTC. The method determines the order in which categories are classified from a topological sort of the vertices of a directed acyclic graph, rather than from the top-down hierarchy of a tree, and it maintains this topological order dynamically according to the classification situation. Experiments show that GraphHMLTC improves classification accuracy considerably compared with BoosTexter.MH, a representative non-hierarchical multi-label classification method. The work thus indicates that classification based on a hierarchical graph is both feasible and advantageous.
Author: Luo Jun (罗俊)
Source: Application Research of Computers (《计算机应用研究》), CSCD, Peking University core journal, 2010, No. 3, pp. 909-912 (4 pages)
Keywords: text classification (TC); hierarchical classification; multi-label classification; directed acyclic graph; topological sorting
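
The abstract only outlines the approach, so the following is a minimal sketch of the general idea rather than the GraphHMLTC algorithm itself: categories form a directed acyclic graph, a topological sort fixes the order in which they are tested, and a category is tested only when it is a root or at least one of its parents accepted the document. The function names, the toy category graph, the keyword-based `accepts` stand-in for a trained classifier, and the skip rule are all assumptions made for illustration. In particular, the sketch computes the order once and skips categories, which is a simplification of the dynamic maintenance of the topological order that the abstract mentions but does not detail.

```python
from collections import deque


def topological_order(parents):
    """Kahn's algorithm; `parents` maps each category to a list of its parents."""
    children = {c: [] for c in parents}
    indegree = {c: len(ps) for c, ps in parents.items()}
    for c, ps in parents.items():
        for p in ps:
            children[p].append(c)
    # Start from the root categories (no parents), in a deterministic order.
    queue = deque(sorted(c for c, d in indegree.items() if d == 0))
    order = []
    while queue:
        c = queue.popleft()
        order.append(c)
        for child in children[c]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    if len(order) != len(parents):
        raise ValueError("category graph contains a cycle")
    return order


def classify(doc, parents, accepts):
    """Assign labels by visiting categories in topological order.

    Assumed rule (not taken from the paper): a non-root category is tested
    only if at least one of its parents has already accepted the document.
    """
    labels = set()
    for c in topological_order(parents):
        ps = parents[c]
        if ps and not any(p in labels for p in ps):
            continue  # no parent accepted the document, so skip this category
        if accepts(c, doc):
            labels.add(c)
    return labels


# Hypothetical category graph: "machine_learning" has two parents, which a
# tree could only represent by duplicating the node.
parents = {
    "science": [],
    "sports": [],
    "computing": ["science"],
    "statistics": ["science"],
    "machine_learning": ["computing", "statistics"],
    "football": ["sports"],
}

# Toy stand-in for per-category classifiers: one trigger word per category.
keywords = {
    "science": "research",
    "sports": "match",
    "computing": "algorithm",
    "statistics": "regression",
    "machine_learning": "classifier",
    "football": "goal",
}


def accepts(category, doc):
    return keywords[category] in doc


print(sorted(classify("research on a classifier algorithm", parents, accepts)))
# prints ['computing', 'machine_learning', 'science']
```

Because "machine_learning" appears once in the graph with two parents, it receives a single decision, whereas a tree would hold one copy under each parent and the copies could disagree.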
