
A Text Categorization Method for Overlapping Classes Based on Information Granularity (基于信息粒度的交叠类文本分类方法)

Cited by: 7
Abstract: This paper analyzes the causes of sample misclassification in text categorization from the perspective of information granularity and, drawing on human cognitive style, proposes an information-granularity-based method for categorizing text with overlapping classes. The method transforms the granularity space that describes the training set and re-partitions the training samples, which increases the differences among them and thereby adds prior knowledge for classification. Following the characteristics of human cognition, it then builds a hierarchical classifier on the re-partitioned training set. Experiments on corpora from different domains and of different types quantify the effect of the degree of class overlap on classification performance and evaluate the new method. The results show that the method effectively improves classification performance and is especially suitable when the degree of class overlap is high.
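Source: Journal of the China Society for Scientific and Technical Information (《情报学报》; CSSCI, Peking University Core), 2011, No. 4, pp. 339-346 (8 pages)
Funding: Supported by the National 863 Program project "Research on Key Technologies for Network Public Opinion Situation Analysis and Early Warning"
Keywords: information granularity; text categorization; cognitive style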
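
The abstract describes re-partitioning the training set at a different granularity and then classifying hierarchically, but this record gives no algorithmic detail. The sketch below is only one possible reading of that idea, not the paper's method: it assumes bag-of-words vectors, a nearest-centroid classifier, and a cosine-similarity threshold for deciding which classes overlap. All names (`HierarchicalCentroidClassifier`, `group_overlapping`, `threshold`) and the toy corpus are hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt


def vectorize(text):
    """Bag-of-words term counts for a whitespace-tokenized text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    """Sum of term-count vectors (scale does not matter for cosine)."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return total


def group_overlapping(centroids, threshold):
    """Assumed overlap test: merge classes whose centroids have
    cosine similarity >= threshold (merging is transitive)."""
    labels = sorted(centroids)
    parent = {c: c for c in labels}

    def find(c):
        while parent[c] != c:
            c = parent[c]
        return c

    for i, a in enumerate(labels):
        for b in labels[i + 1:]:
            if cosine(centroids[a], centroids[b]) >= threshold:
                parent[find(b)] = find(a)

    groups = defaultdict(list)
    for c in labels:
        groups[find(c)].append(c)
    return list(groups.values())


class HierarchicalCentroidClassifier:
    """Two-level classifier: coarse group first, then class within group."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold

    def fit(self, texts, labels):
        by_class = defaultdict(list)
        for text, label in zip(texts, labels):
            by_class[label].append(vectorize(text))
        self.class_centroids = {y: centroid(vs) for y, vs in by_class.items()}
        # Coarse granularity: overlapping classes become one group.
        self.groups = group_overlapping(self.class_centroids, self.threshold)
        self.group_centroids = [
            centroid([self.class_centroids[c] for c in g]) for g in self.groups
        ]
        return self

    def predict(self, text):
        v = vectorize(text)
        # Level 1: pick the most similar coarse group.
        gi = max(range(len(self.groups)),
                 key=lambda i: cosine(v, self.group_centroids[i]))
        # Level 2: pick the most similar class inside that group.
        return max(self.groups[gi],
                   key=lambda c: cosine(v, self.class_centroids[c]))


# Toy corpus (hypothetical): two sports classes share vocabulary.
texts = [
    "football match goal team", "football league goal striker",
    "basketball match team dunk", "basketball league team rebound",
    "stock market bank shares", "bank interest market bond",
]
labels = [
    "sports_football", "sports_football",
    "sports_basketball", "sports_basketball",
    "finance", "finance",
]
clf = HierarchicalCentroidClassifier(threshold=0.3).fit(texts, labels)
prediction = clf.predict("team goal striker")
```

In this sketch the two sports classes are merged at the coarse level because their centroids share terms, so the first decision only separates sports from finance, and the second decision is made among the overlapping classes alone. The threshold value is an arbitrary choice for the toy data.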

