
Multi-feature Fusion Method for Blog Post Classification (cited by: 6)
Abstract: Blogs have become one of the most popular applications on the Internet. Blog posts vary widely in content, so classifying them is important. Because blog posts differ from news articles, applying common text classification methods to them directly gives unsatisfactory results. This paper proposes a new method that makes full use of several features specific to blog posts, such as tags and user-defined categories, and fuses these features. In addition, user-defined categories are preprocessed to filter out noise words unrelated to the category. Experimental results show that the multi-feature fusion method effectively improves the accuracy of blog post classification.
Authors: Mai Lin (麦林), Yu Nenghai (俞能海)
Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), indexed in CSCD and the Peking University Core Journals list, 2010, No. 6, pp. 1129-1132 (4 pages)
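Funding: Supported by the National Natural Science Foundation of China (60672056), the National High Technology Research and Development Program of China ("863" Program) (2008AA01Z117), and the Specialized Research Fund for the Doctoral Program of Higher Education (20070358040)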
Keywords: text classification; blog post classification; blog post feature; multi-feature fusion
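
The abstract names two ideas: filtering noise words out of user-defined categories, and fusing several blog-specific features. The sketch below illustrates one possible reading of that, not the authors' actual method, since the abstract does not say how the features are fused. It assumes a weighted sum of per-feature Naive Bayes log scores; the noise-word list, the channel weights, the training data, and all function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical noise words for user-defined category names (not from the paper).
NOISE_WORDS = {"my", "misc", "other", "default"}


def clean_category(tokens):
    """Drop category tokens that carry no topical signal."""
    return [t for t in tokens if t.lower() not in NOISE_WORDS]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing for one feature channel."""

    def __init__(self):
        self.class_counts = Counter()
        self.token_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, samples, labels):
        for tokens, label in zip(samples, labels):
            self.class_counts[label] += 1
            self.token_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def log_scores(self, tokens):
        total = sum(self.class_counts.values())
        scores = {}
        for label, n in self.class_counts.items():
            denom = sum(self.token_counts[label].values()) + len(self.vocab)
            score = math.log(n / total)
            for t in tokens:
                score += math.log((self.token_counts[label][t] + 1) / denom)
            scores[label] = score
        return scores


def classify(models, weights, post):
    """Fuse channels by a weighted sum of log scores (an assumed scheme)."""
    fused = defaultdict(float)
    for channel, model in models.items():
        for label, score in model.log_scores(post[channel]).items():
            fused[label] += weights[channel] * score
    return max(fused, key=fused.get)


# Toy training data, invented for illustration only.
train = [
    ({"body": ["match", "goal", "team"], "tags": ["football"],
      "category": clean_category(["my", "sports"])}, "sports"),
    ({"body": ["stock", "market", "fund"], "tags": ["finance"],
      "category": clean_category(["money", "misc"])}, "finance"),
]
labels = [label for _, label in train]
models = {
    ch: NaiveBayes().fit([post[ch] for post, _ in train], labels)
    for ch in ("body", "tags", "category")
}
weights = {"body": 1.0, "tags": 2.0, "category": 1.5}  # hypothetical weights

new_post = {"body": ["team", "market"], "tags": ["football"],
            "category": clean_category(["misc", "sports"])}
predicted = classify(models, weights, new_post)
print(predicted)
```

In this toy case the body text alone is ambiguous between the two classes, and the tag and cleaned category channels break the tie. In practice the weights would have to be tuned on held-out data.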

