Abstract
This paper presents an improved statistical algorithm for classifying Chinese web pages. Two popular approaches to text classification are the traditional similarity-based method and the Bayes-model-based method. After studying both, we modify the Bayes model classifier with a separability criterion based on probability distributions: the likelihood ratio of the class density functions is used to add separability information to the feature words. We ran comparative experiments with the similarity-based method, the Bayes method, and the improved Bayes method. They show that the improved algorithm maximizes the separation between classes and therefore achieves higher precision and recall.
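The abstract does not give the exact weighting formula, so what follows is only a minimal sketch under stated assumptions, not the paper's method. It uses a multinomial naive Bayes classifier with Laplace smoothing, and each word's log-likelihood term is multiplied by a separability weight. The weight here is the largest absolute log ratio between a word's class-conditional probability and its average probability in the other classes. That ratio is a hypothetical stand-in for the paper's class-density likelihood ratio. The names `train` and `classify` and the `alpha` and `weighted` parameters are illustrative. Pages are assumed to be already segmented into words.

```python
import math
from collections import Counter, defaultdict


def train(docs, labels, alpha=1.0):
    """Fit a multinomial naive Bayes model plus hypothetical per-word separability weights.

    docs: list of token lists (pages assumed already word-segmented).
    labels: class label for each document; at least two classes required.
    """
    classes = sorted(set(labels))
    if len(classes) < 2:
        raise ValueError("need at least two classes")
    vocab = set()
    word_counts = defaultdict(Counter)
    for tokens, c in zip(docs, labels):
        word_counts[c].update(tokens)
        vocab.update(tokens)
    doc_counts = Counter(labels)
    log_priors = {c: math.log(doc_counts[c] / len(labels)) for c in classes}
    # Laplace-smoothed estimate of P(word | class).
    cond = {}
    for c in classes:
        total = sum(word_counts[c].values())
        cond[c] = {w: (word_counts[c][w] + alpha) / (total + alpha * len(vocab))
                   for w in vocab}
    # Assumed separability weight: max over classes of |log(P(w|c) / mean P(w|other classes))|.
    # A word equally likely in every class gets weight 0.
    weights = {}
    for w in vocab:
        best = 0.0
        for c in classes:
            others = [cond[o][w] for o in classes if o != c]
            rest = sum(others) / len(others)
            best = max(best, abs(math.log(cond[c][w] / rest)))
        weights[w] = best
    return classes, log_priors, cond, weights


def classify(model, tokens, weighted=True):
    """Return the class with the highest (optionally weighted) log-posterior score."""
    classes, log_priors, cond, weights = model
    scores = {}
    for c in classes:
        score = log_priors[c]
        for w in tokens:
            if w not in weights:  # words unseen in training are ignored
                continue
            k = weights[w] if weighted else 1.0
            score += k * math.log(cond[c][w])
        scores[c] = score
    return max(scores, key=scores.get)


if __name__ == "__main__":
    docs = [["足球", "比赛", "进球"], ["篮球", "比赛", "球员"],
            ["股票", "市场", "上涨"], ["基金", "市场", "收益"]]
    labels = ["sports", "sports", "finance", "finance"]
    model = train(docs, labels)
    print(classify(model, ["比赛", "进球"]))  # -> sports
```

Passing `weighted=False` gives the plain Bayes baseline, so both variants can be compared on the same data. The weighting only rescales each word's evidence. Words that discriminate strongly between classes count for more, and words with the same probability in every class drop out of the decision.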
Source
Microprocessors (《微处理机》)
2002, No. 1, pp. 26-28 (3 pages)