
Hierarchical Text Classification Model Based on Blocking Priori Knowledge (基于阻塞先验知识的文本层次分类模型)

Cited by: 4
Abstract: Blocking is a major cause of performance loss in hierarchical text classification. To address it, this paper proposes a two-step hierarchical text classification model based on blocking prior knowledge. First, the blocking distribution is estimated, and a "blocking pair" recognition technique is presented that focuses on identifying the most serious blocking directions. Second, this blocking prior knowledge is incorporated into the classification process: a hierarchy topology refinement algorithm guides blocked documents back onto the correct classification path. Experiments on TanCorp, a Chinese text classification corpus, show that the model effectively improves hierarchical classification performance without increasing the number of classifiers, making it one way to address the blocking problem in hierarchical classification. In addition, the model is more stable than flat text classification algorithms.
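Source: Pattern Recognition and Artificial Intelligence (《模式识别与人工智能》), indexed in EI, CSCD, and the Peking University Core Journals list, 2010, No. 4, pp. 456-463 (8 pages)
Funding: Supported by the National Natural Science Foundation of China (Nos. 60475019, 60775036, 60970061) and the Specialized Research Fund for the Doctoral Program of Higher Education of China (No. 20060247039)
Keywords: Blocking, Text Classification, Hierarchical Structure, Prior Knowledge, Dynamic Refinement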
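
The two steps named in the abstract can be illustrated with a small sketch. It is not the authors' algorithm, which this record does not spell out. It assumes that a "blocking pair" is a (true category, predicted category) pair at the top level of the hierarchy whose misrouting rate on held-out data exceeds a threshold, and that refinement means also searching the subtree a document was likely blocked from. The function names, the `min_rate` threshold, and the toy labels are all hypothetical.

```python
from collections import Counter

def blocking_pairs(true_top, pred_top, min_rate=0.1):
    """Estimate serious blocking directions from held-out predictions.

    Returns {(true, predicted): rate}, where rate is the fraction of
    documents of the true category that were routed to the predicted one.
    Assumption: a pair is "serious" when rate >= min_rate.
    """
    totals = Counter(true_top)
    errors = Counter((t, p) for t, p in zip(true_top, pred_top) if t != p)
    return {
        pair: count / totals[pair[0]]
        for pair, count in errors.items()
        if count / totals[pair[0]] >= min_rate
    }

def candidate_subtrees(pred, pairs):
    """Subtrees to search for a document routed to `pred`.

    Assumption: besides the predicted subtree, also search every category
    whose documents are frequently blocked into `pred`, so that a blocked
    document can return to its correct path without extra classifiers.
    """
    extra = sorted(t for (t, p) in pairs if p == pred)
    return [pred] + extra

# Toy held-out data: top-level true labels and top-level routing decisions.
true_top = ["sports"] * 4 + ["finance"] * 4 + ["health"] * 2
pred_top = ["sports", "sports", "finance", "finance",
            "finance", "finance", "finance", "health",
            "health", "health"]

pairs = blocking_pairs(true_top, pred_top, min_rate=0.3)
print(pairs)
print(candidate_subtrees("finance", pairs))
```

In this toy data, half of the sports documents are routed to finance, so that direction is kept as a blocking pair, while the single finance document routed to health falls below the threshold. A document routed to finance would then be scored against the leaf categories of both subtrees using the classifiers that already exist.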

