Abstract
Constructing a text classifier from incomplete data is an important problem. An improved method based on the Bernoulli mixture model and the expectation-maximization (EM) algorithm is presented. First, initial estimates of the likelihood-function parameters are obtained from the labeled data using the Bernoulli mixture model and the naive Bayes algorithm. A weighted EM algorithm is then used to estimate the parameters of the classifier's prior probability model, which yields the final classifier. Experimental results show that the method outperforms naive Bayes text classification in both precision and recall.
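The two-stage procedure in the abstract (initialize from labeled data, then refine with a weighted EM) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the weighting scheme, so the weight `lam` is assumed here to scale down the contribution of unlabeled documents, and the M-step is assumed to re-estimate both the class priors and the Bernoulli parameters. The function names, the smoothing constant `alpha`, and the iteration count are likewise hypothetical.

```python
import numpy as np

def fit_bernoulli_nb(X, R, alpha=1.0):
    """M-step: class priors and Bernoulli parameters from binary features
    X (n_docs, n_words) and soft labels R (n_docs, n_classes)."""
    class_mass = R.sum(axis=0)  # effective number of documents per class
    prior = (class_mass + alpha) / (class_mass.sum() + alpha * R.shape[1])
    # Smoothed probability that each word occurs in a document of each class.
    theta = (R.T @ X + alpha) / (class_mass[:, None] + 2 * alpha)
    return prior, theta

def posterior(X, prior, theta):
    """E-step: P(class | document) under the Bernoulli naive Bayes model."""
    log_p = np.log(prior) + X @ np.log(theta).T + (1 - X) @ np.log(1 - theta).T
    log_p -= log_p.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(log_p)
    return p / p.sum(axis=1, keepdims=True)

def weighted_em(X_lab, y_lab, X_unl, n_classes, lam=0.5, n_iter=20):
    """Initialize from labeled data, then refine with unlabeled data.
    ASSUMPTION: lam in [0, 1] down-weights the unlabeled documents."""
    R_lab = np.eye(n_classes)[y_lab]                # one-hot labels
    prior, theta = fit_bernoulli_nb(X_lab, R_lab)   # initial estimates
    X_all = np.vstack([X_lab, X_unl])
    for _ in range(n_iter):
        R_unl = lam * posterior(X_unl, prior, theta)                       # E-step
        prior, theta = fit_bernoulli_nb(X_all, np.vstack([R_lab, R_unl]))  # M-step
    return prior, theta
```

A new document with binary word vector `x` would then be assigned to `posterior(x[None, :], prior, theta).argmax()`. Setting `lam=0` reduces the sketch to plain naive Bayes trained on the labeled documents only.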
Source
Journal of Computer Applications (《计算机应用》)
CSCD
Peking University Core Journals (北大核心)
2007, No. 5, pp. 1235-1237, 1250 (4 pages)
Funding
Ministry of Education Excellent Talents Support Program, 2004 (NCET-04-0496)
Key Scientific Research Project of the Ministry of Education (105087)