Feature Selection Methods of Microblogging Based on Improved Chi-Square Statistics (基于改进卡方统计的微博特征提取方法)
Abstract Based on an analysis of the feature information in microblog text, this paper proposes a microblog feature extraction method built on an improved chi-square statistic. First, the set of classification features for microblog posts is expanded, and factors such as frequency are introduced into the traditional chi-square statistic to improve feature selection. Second, building on the traditional calculation of feature term weights, a new method based on the improved chi-square statistic is proposed to improve the weighting. Finally, the method is tested with the classical KNN and SVM algorithms, and the experimental results show that it improves the accuracy of microblog classification.
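Source Computer Engineering and Applications (《计算机工程与应用》), CSCD, 2014, No. 19, pp. 113-117, 142 (6 pages)
Funding National Natural Science Foundation of China (Nos. 61105040, 61203284, 61272361); Beijing Natural Science Foundation (No. 4133085); Youth Top-Notch Talent Cultivation Program of the Beijing Municipal Education Commission; Beijing University of Technology Basic Science Research Fund for Mathematics and Statistics (No. 006000542213501)
Keywords microblog classification; chi-square statistic; feature selection; weight calculation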
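
The abstract takes the traditional chi-square statistic as its starting point but does not state the improved formula, so the following is only a minimal sketch of the textbook baseline, chi2(t, c) = N(AD - CB)^2 / ((A+C)(B+D)(A+B)(C+D)), computed from document counts. The function names `chi_square` and `chi_square_scores` and the toy data are assumptions made for illustration; the frequency factor described in the paper is not reproduced here.

```python
from collections import Counter


def chi_square(a, b, c, d):
    """Traditional chi-square score for a term t and a category k.

    a: documents in k that contain t
    b: documents outside k that contain t
    c: documents in k that do not contain t
    d: documents outside k that do not contain t
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # A degenerate table (e.g. the term never occurs) carries no signal.
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def chi_square_scores(docs, labels, category):
    """Score every term against one category; docs are lists of tokens."""
    n_in = sum(1 for y in labels if y == category)
    n_out = len(labels) - n_in
    in_df, out_df = Counter(), Counter()
    for tokens, y in zip(docs, labels):
        # Document frequency: count each term at most once per document.
        (in_df if y == category else out_df).update(set(tokens))
    scores = {}
    for term in set(in_df) | set(out_df):
        a, b = in_df[term], out_df[term]
        scores[term] = chi_square(a, b, n_in - a, n_out - b)
    return scores


# Hypothetical toy corpus, already tokenized.
docs = [["goal", "match"], ["match", "team"],
        ["stock", "market"], ["market", "price"]]
labels = ["sports", "sports", "finance", "finance"]
scores = chi_square_scores(docs, labels, "sports")
top_terms = sorted(scores, key=scores.get, reverse=True)
```

Because the baseline uses only document counts, a term that appears in every document of the other category scores as high as one that appears in every document of the target category, and within-document frequency is ignored entirely. Shortcomings of this kind are the usual motivation for adding frequency-style factors, as the abstract describes.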