Abstract
An incremental Naive Bayes algorithm that can incorporate new features is proposed. To select incremental samples from an unlabeled corpus, a minimum posterior probability is introduced alongside the traditional class-confidence threshold, so that the two together form a double threshold for sample selection. When a new feature is detected in the incremental corpus, it is added to the feature space and the classifier is updated accordingly. The approach is found to complement the class-confidence threshold well. Finally, the proposed algorithm is validated on both unlabeled and labeled corpora. Experimental results show that the improved incremental Naive Bayes algorithm achieves better incremental learning performance than the traditional incremental algorithm.
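The abstract does not give the exact decision rule or update formulas, so the following Python is only a minimal sketch of how such a scheme might look, not the authors' implementation. It assumes a multinomial Naive Bayes with Laplace smoothing, and it assumes that the double threshold accepts an unlabeled sample when the largest class posterior is at least `conf_threshold` and the smallest class posterior is at most `min_post_threshold`. The names `IncrementalNB`, `incremental_step`, `conf_threshold`, and `min_post_threshold` are hypothetical, as is the toy data.

```python
import math
from collections import defaultdict


class IncrementalNB:
    """Multinomial Naive Bayes whose feature space can grow (illustrative sketch)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                    # Laplace smoothing constant
        self.vocab = set()                    # feature space, grows over time
        self.class_docs = defaultdict(int)    # documents seen per class
        self.class_tokens = defaultdict(int)  # total tokens seen per class
        self.word_counts = defaultdict(lambda: defaultdict(int))

    def update(self, tokens, label):
        """Fold one document into the counts and return the features that were new."""
        new_features = set(tokens) - self.vocab
        self.vocab |= new_features            # add unseen features to the feature space
        self.class_docs[label] += 1
        for t in tokens:
            self.word_counts[label][t] += 1
            self.class_tokens[label] += 1
        return new_features

    def posteriors(self, tokens):
        """Return P(class | tokens); tokens outside the feature space are skipped."""
        total_docs = sum(self.class_docs.values())
        v = len(self.vocab)
        log_scores = {}
        for c, n_docs in self.class_docs.items():
            score = math.log(n_docs / total_docs)
            denom = self.class_tokens[c] + self.alpha * v
            for t in tokens:
                if t in self.vocab:
                    count = self.word_counts[c].get(t, 0)
                    score += math.log((count + self.alpha) / denom)
            log_scores[c] = score
        top = max(log_scores.values())        # subtract the max for numerical stability
        exp = {c: math.exp(s - top) for c, s in log_scores.items()}
        z = sum(exp.values())
        return {c: e / z for c, e in exp.items()}


def incremental_step(model, tokens, conf_threshold=0.9, min_post_threshold=0.05):
    """Self-train on one unlabeled document if it passes both thresholds.

    ASSUMPTION: this two-sided rule is one possible reading of the
    "double threshold"; the source abstract does not define it.
    """
    post = model.posteriors(tokens)
    label = max(post, key=post.get)
    if post[label] >= conf_threshold and min(post.values()) <= min_post_threshold:
        model.update(tokens, label)           # also admits any new features
        return label
    return None


# Toy usage with made-up data: seed with labeled documents, then offer an
# unlabeled document that contains the unseen feature "striker".
model = IncrementalNB()
model.update(["goal", "match", "team"], "sports")
model.update(["team", "win", "goal"], "sports")
model.update(["stock", "market", "bank"], "finance")
model.update(["bank", "loan", "market"], "finance")
model.update(["film", "actor", "scene"], "culture")
accepted = incremental_step(model, ["goal", "team", "match", "striker"],
                            conf_threshold=0.8, min_post_threshold=0.1)
```

Because the sketch stores raw counts and derives probabilities at prediction time, enlarging the feature space changes the smoothing denominator for every class automatically, which is one simple way to keep the classifier consistent after a new feature is added.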
Source
Journal on Communications (《通信学报》)
EI
CSCD
Peking University Core Journals (北大核心)
2016, Issue 10, pp. 81-91 (11 pages)
Keywords
Naive Bayes
incremental algorithm
feature space
evaluation index