Abstract
Based on an analysis of the textual feature information of microblogs, this paper proposes a microblog feature extraction method based on an improved chi-square statistic. First, the classification features of microblog information are expanded, and factors such as frequency are introduced into the traditional chi-square statistic to improve feature selection. Second, building on the traditional calculation of feature-term weights, a new method based on the improved chi-square statistic is proposed to improve the weight calculation. Finally, the method is tested with the classical KNN and SVM algorithms, and the experimental results show that it improves the accuracy of microblog information classification.
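For reference, the traditional chi-square statistic that the method takes as its starting point can be sketched as below. This is only the standard baseline, not the improved statistic proposed in the paper, whose formula is not given in this abstract. The function names and the toy corpus are assumptions made for illustration.

```python
from collections import Counter


def chi_square(a, b, c, d):
    """Classical chi-square score of a term for a category.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that lack the term
    d: documents outside the category that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # Degenerate table (e.g. term in every document): no evidence.
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def chi_square_scores(docs, labels, category):
    """Score every term against one category using document counts."""
    n_in = sum(1 for y in labels if y == category)
    n_out = len(labels) - n_in
    df_in, df_out = Counter(), Counter()
    for tokens, y in zip(docs, labels):
        # set() counts each term once per document (document frequency).
        (df_in if y == category else df_out).update(set(tokens))
    scores = {}
    for term in set(df_in) | set(df_out):
        a, b = df_in[term], df_out[term]
        scores[term] = chi_square(a, b, n_in - a, n_out - b)
    return scores


# Hypothetical toy corpus: already-tokenized posts with category labels.
docs = [["goal", "match"], ["goal", "team"],
        ["stock", "market"], ["stock", "team"]]
labels = ["sports", "sports", "finance", "finance"]
scores = chi_square_scores(docs, labels, "sports")
```

In this sketch a term is counted once per document, so how often it occurs inside a post has no effect on its score. That is the kind of gap that introducing frequency into the statistic, as the abstract describes, would address.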
Source
Computer Engineering and Applications (《计算机工程与应用》)
CSCD
2014, Issue 19, pp. 113-117, 142 (6 pages in total)
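Funding
National Natural Science Foundation of China (Nos. 61105040, 61203284, 61272361)
Beijing Natural Science Foundation (No. 4133085)
Young Top-Notch Talent Cultivation Program of the Beijing Municipal Education Commission
Basic Science Research Fund for Mathematics and Statistics, Beijing University of Technology (No. 006000542213501)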
Keywords
microblog classification
chi-square statistic
feature selection
weight calculation