Abstract
In machine learning, training sets are often incomplete, so it is necessary to build incremental models with active learning capabilities. For incremental models based on pseudo-relevance feedback, existing incremental learning methods have proposed several strategies for selecting feedback samples, but improving the class confidence of those samples remains an important problem. This paper presents a pseudo-relevance feedback strategy based on K-Means clustering. For documents classified by a naive Bayes (NB) classifier, the centroid vector is searched for by gradually reducing the number of samples. This step extracts the feedback documents and a new feature set, which are then fed back to improve the NB classifier. The strategy is applied to Chinese text sentiment classification. Experiments show that the accuracy of the extracted centroid vectors improves as the feedback size grows. To some extent, the method converts posterior probabilities into prior probabilities. As new features are added, tuning the CHI (chi-square) threshold accordingly yields higher precision and recall than the baseline, which demonstrates that the method is feasible.
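The feedback loop summarized above can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation. Documents are bags of tokens, and the classifier is a multinomial naive Bayes with Laplace smoothing. The hypothetical helper `decremental_centroid` reads "searching the centroid by gradually reducing the sample number" as repeatedly dropping the document farthest (Euclidean distance on raw term counts) from its class centroid until `keep` documents remain. It uses a single centroid per predicted class and does not run full K-Means. New terms from the feedback documents are kept only if their CHI score for some class reaches `chi_threshold`. All function and parameter names are illustrative.

```python
import math
from collections import Counter

def train_nb(docs, labels, vocab):
    """Multinomial naive Bayes with Laplace smoothing; returns (log_prior, log_lik)."""
    log_prior, log_lik = {}, {}
    for c in sorted(set(labels)):
        c_docs = [d for d, y in zip(docs, labels) if y == c]
        log_prior[c] = math.log(len(c_docs) / len(docs))
        counts = Counter(t for d in c_docs for t in d if t in vocab)
        total = sum(counts.values())
        log_lik[c] = {t: math.log((counts[t] + 1) / (total + len(vocab))) for t in vocab}
    return log_prior, log_lik

def predict_nb(model, doc):
    """Return the class with the highest posterior log score; unknown terms are ignored."""
    log_prior, log_lik = model
    scores = {c: log_prior[c] + sum(log_lik[c][t] for t in doc if t in log_lik[c])
              for c in log_prior}
    return max(scores, key=scores.get)

def centroid(vectors):
    """Mean of sparse term-count vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {t: x / len(vectors) for t, x in total.items()}

def euclidean(v, c):
    keys = set(v) | set(c)
    return math.sqrt(sum((v.get(k, 0) - c.get(k, 0)) ** 2 for k in keys))

def decremental_centroid(vectors, keep):
    """Hypothetical: drop the vector farthest from the current centroid until `keep` remain."""
    idx = list(range(len(vectors)))
    while len(idx) > keep:
        c = centroid([vectors[i] for i in idx])
        idx.remove(max(idx, key=lambda i: euclidean(vectors[i], c)))
    return idx

def chi_square(term, docs, labels, c):
    """CHI statistic N(AD - CB)^2 / ((A+C)(B+D)(A+B)(C+D)) for a term and class c."""
    n = len(docs)
    a = sum(1 for d, y in zip(docs, labels) if term in d and y == c)
    b = sum(1 for d, y in zip(docs, labels) if term in d and y != c)
    cc = sum(1 for d, y in zip(docs, labels) if term not in d and y == c)
    dd = n - a - b - cc
    denom = (a + cc) * (b + dd) * (a + b) * (cc + dd)
    return n * (a * dd - cc * b) ** 2 / denom if denom else 0.0

def pseudo_relevance_feedback(train_docs, train_labels, unlabeled, keep, chi_threshold):
    """One feedback round: classify, pick high-confidence docs per class, retrain."""
    vocab = {t for d in train_docs for t in d}
    model = train_nb(train_docs, train_labels, vocab)
    predicted = [predict_nb(model, d) for d in unlabeled]
    fb_docs, fb_labels = [], []
    for c in set(predicted):
        members = [i for i, y in enumerate(predicted) if y == c]
        vecs = [Counter(unlabeled[i]) for i in members]
        for j in decremental_centroid(vecs, min(keep, len(members))):
            fb_docs.append(unlabeled[members[j]])  # pseudo-label = predicted class
            fb_labels.append(c)
    docs, labels = train_docs + fb_docs, train_labels + fb_labels
    # Expand the feature set with new terms whose best CHI score passes the threshold.
    new_terms = {t for d in fb_docs for t in d} - vocab
    vocab |= {t for t in new_terms
              if max(chi_square(t, docs, labels, c) for c in set(labels)) >= chi_threshold}
    return train_nb(docs, labels, vocab)
```

For example, with `train_docs = [["good", "great"], ["bad", "awful"]]` and a few unlabeled reviews containing the unseen words "nice" and "terrible", one feedback round with `keep=1` and `chi_threshold=1.0` adds both words to the feature set, so a document containing only "nice" is then classified as positive. Raising `chi_threshold` admits fewer new features, which mirrors the abstract's point that the CHI threshold has to be tuned together with feature growth.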
Source
Computer Simulation (《计算机仿真》)
CSCD
Peking University Core Journal (北大核心)
2013, No. 11, pp. 268-271 (4 pages)
Keywords
Pseudo relevance feedback
Sentiment classification
Naive Bayesian
K-Means clustering