
A Web Text Classification Method Based on an Improved Bayes Algorithm (cited by: 1)

A Modified Complement Naive Bayes for Chinese Web Page Classification
Abstract: Classifiers based on complement naive Bayes lose accuracy on skewed data sets because of overfitting. To address this, the paper proposes a modified complement naive Bayes algorithm that uses a better estimate of the prior class probability. Experiments on the effect of skewed data distributions show that the modified algorithm keeps high classification accuracy, remains robust as the data become more skewed, and achieves higher precision than the other naive Bayes algorithms. The paper also discusses how Web page classification differs from plain text classification, presents a title-weighted vector space model, and analyzes the effect of the title weight factor on classifier precision. Experimental results show that the title-weighted vector space model improves precision by 5% on average compared with plain text classification.
Source: Modern Computer (现代计算机, mid-month edition), 2012, No. 4, pp. 3-7 (5 pages)
Funding: National 863 High-Tech Program (No. 2008AA01Z119)
Keywords: Naive Bayes; Complement Naive Bayes; Web Text Classification; Skewed Data Distribution
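
The two ideas named in the abstract can be illustrated with a minimal Python sketch. It is not the paper's method: the record gives no formulas, so the classifier below is the standard complement naive Bayes of Rennie et al. (2003) without the paper's modified prior estimate, and the function names, the `title_weight` value of 3.0, and the smoothing parameter `alpha` are all assumptions made for illustration.

```python
import math
from collections import Counter


def title_weighted_counts(title_tokens, body_tokens, title_weight=3.0):
    """Build a term-count vector in which each title occurrence counts
    title_weight times (hypothetical weighting scheme, not the paper's)."""
    counts = Counter(body_tokens)
    for tok in title_tokens:
        counts[tok] += title_weight
    return counts


def train_complement_nb(docs, labels, alpha=1.0):
    """Standard complement naive Bayes: for each class, estimate smoothed
    log term probabilities from all documents NOT in that class."""
    vocab = sorted({t for d in docs for t in d})
    weights = {}
    for c in sorted(set(labels)):
        complement = Counter()
        for d, y in zip(docs, labels):
            if y != c:
                complement.update(d)
        total = sum(complement.values()) + alpha * len(vocab)
        weights[c] = {t: math.log((complement[t] + alpha) / total) for t in vocab}
    return weights


def predict(weights, doc):
    """Pick the class whose complement fits the document worst (lowest score).
    Terms unseen in training are ignored."""
    scores = {
        c: sum(f * w.get(t, 0.0) for t, f in doc.items())
        for c, w in weights.items()
    }
    return min(scores, key=scores.get)


if __name__ == "__main__":
    # Toy, made-up documents: (title tokens, body tokens, label).
    train = [
        (["football"], ["match", "goal", "team"], "sports"),
        (["stock"], ["market", "price", "bank"], "finance"),
        (["goal"], ["team", "win"], "sports"),
    ]
    docs = [title_weighted_counts(t, b) for t, b, _ in train]
    labels = [y for _, _, y in train]
    model = train_complement_nb(docs, labels)
    print(predict(model, title_weighted_counts(["football"], ["team", "price"])))
```

Estimating each class from its complement means a rare class is still scored against a large pool of counts, which is the usual motivation for complement naive Bayes on skewed data. Raising `title_weight` makes title terms dominate the score, so it would need tuning on held-out data.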

