Text Classification Algorithm Based on Information Gain Weights of Features (cited by 19)

Classifying Text Corpus Based on Information Gain Weight of Feature
Abstract: To improve the training speed of classifiers without sacrificing accuracy, this work designs three classification algorithms based on information gain (IG) feature weights, named IG-C1, IG-C2 and IG-C. They classify an unlabeled text according to the size of each feature's contribution to IG (the feature weights produced in the feature selection phase) and the number of times the feature occurs in the new text. All three algorithms have low time complexity and are simple to implement. A performance comparison against Naive Bayes and the Vector Space Model on the Reuters-21578 and 20 Newsgroups data sets shows that IG-C gives the best classification results.
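The abstract describes the IG-C family only at a high level: feature weights come out of the information-gain feature-selection step, and a new text is scored from those weights together with how often each feature occurs in it. The Python sketch below illustrates that general idea under stated assumptions; it is not a reproduction of the paper's IG-C1, IG-C2 or IG-C. IG is computed in its standard mutual-information form over term presence and absence, as commonly used for feature selection (Yang and Pedersen). The per-class weight P(t, c) · log(P(c | t) / P(c)), the top-k selection rule, and the helper names `train_ig_weights`, `select_features` and `classify` are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def train_ig_weights(docs, labels):
    """Return (ig, weights) estimated from tokenized training documents.

    ig[t]         -- information gain of term t, computed in its equivalent
                     mutual-information form over term presence/absence.
    weights[c][t] -- HYPOTHETICAL per-class weight: the "term present" part of
                     t's contribution to IG for class c,
                     P(t, c) * log(P(c | t) / P(c)).
    """
    n = len(docs)
    classes = sorted(set(labels))
    n_c = Counter(labels)                     # documents per class
    df = Counter()                            # documents containing t
    df_c = {c: Counter() for c in classes}    # documents of class c containing t
    for doc, c in zip(docs, labels):
        for t in set(doc):                    # presence, not frequency
            df[t] += 1
            df_c[c][t] += 1

    ig, weights = {}, {c: {} for c in classes}
    for t, n_t in df.items():
        p_t = n_t / n
        p_not_t = 1.0 - p_t
        gain = 0.0
        for c in classes:
            p_c = n_c[c] / n
            n_tc = df_c[c][t]                 # class-c docs containing t
            n_ntc = n_c[c] - n_tc             # class-c docs without t
            if n_tc:                          # 0 * log 0 is taken as 0
                p_tc = n_tc / n
                term = p_tc * math.log(p_tc / (p_t * p_c))
                gain += term
                weights[c][t] = term
            if n_ntc:
                p_ntc = n_ntc / n
                gain += p_ntc * math.log(p_ntc / (p_not_t * p_c))
        ig[t] = gain
    return ig, weights


def select_features(ig, k):
    """Keep the k terms with the highest information gain (hypothetical rule)."""
    return set(sorted(ig, key=ig.get, reverse=True)[:k])


def classify(doc, weights, selected):
    """Score each class as sum over selected terms of tf(t, doc) * weights[c][t]."""
    tf = Counter(t for t in doc if t in selected)
    scores = {c: sum(k * w.get(t, 0.0) for t, k in tf.items())
              for c, w in weights.items()}
    return max(scores, key=scores.get)


# Toy usage example with made-up data.
docs = [["ball", "goal", "team"], ["goal", "match", "team"],
        ["stock", "market", "price"], ["market", "trade", "price"]]
labels = ["sport", "sport", "finance", "finance"]
ig, weights = train_ig_weights(docs, labels)
selected = select_features(ig, k=4)            # goal, team, market, price
print(classify(["team", "goal", "goal"], weights, selected))  # -> sport
```

Training here is one counting pass over the corpus plus one pass over the vocabulary, and classification is a weighted sum over the terms of the new text, so there is no iterative optimization. That is in the spirit of the low time complexity the abstract claims, though the paper's actual weighting and decision rules may differ.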
Source: Journal of Beijing University of Technology, 2006, No. 5, pp. 456-460 (5 pages). Indexed in: EI, CAS, CSCD, Peking University Core Journals.
Funding: National Natural Science Foundation of China (60173014); Beijing Natural Science Foundation (4022003).
Keywords: text processing; classification of information; feature extraction; entropy

References (6)

  • [1] SEBASTIANI F. Machine learning in automated text categorization[J]. ACM Computing Surveys, 2002, 34(1): 1-47.
  • [2] YANG Y, PEDERSEN J O. A comparative study on feature selection in text categorization[C]//FISHER D H. Proceedings of the 14th International Conference on Machine Learning (ICML-97). San Francisco: Morgan Kaufmann Publishers, 1997: 412-420.
  • [3] LU Yuchang, LU Mingyu, LI Fan, ZHOU Lizhu (陆玉昌, 鲁明羽, 李凡, 周立柱). Analysis and construction of word weighting functions in the vector space method[J]. Journal of Computer Research and Development, 2002, 39(10): 1205-1210. (cited by 126)
  • [4] MITCHELL T M. Machine learning[M]. Beijing: China Machine Press, 2003.
  • [5] McCALLUM A, NIGAM K. A comparison of event models for Naive Bayes text classification[R]//AAAI-98 Workshop on Learning for Text Categorization. Madison: AAAI Press, 1998: 41-48.
  • [6] LI Wen-bin, ZHONG Ning, LIU Chun-nian. Design and implementation of an E-mail classifier[C]//The 2nd International Conference on Active Media Technology. Hong Kong: World Scientific, 2003: 423-430.