Abstract
Text categorization (TC) is the process of assigning a natural-language text to one of a set of predefined categories according to its topic. It is a fundamental task in natural language processing and has been an active research topic in recent years. This paper analyzes why the naive Bayes algorithm classifies poorly on small training sets, modifies and adjusts the algorithm accordingly, and proposes a text categorization method based on an improved naive Bayes classifier. Experimental results show that the improved method performs better than the original algorithm.
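The abstract does not state what the improvement to naive Bayes actually is. For reference, the sketch below shows only a standard multinomial naive Bayes text classifier with Laplace (add-one) smoothing, which is assumed here to be a reasonable stand-in for the baseline. Smoothing is the usual first remedy for small training sets, where many words never appear in some class. It is not the authors' method. The function names `train_nb` and `predict_nb`, the smoothing parameter `alpha`, and the toy data are hypothetical. Documents are assumed to be already split into words (for Chinese text, by a word segmenter).

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels, alpha=1.0):
    """Train a multinomial naive Bayes model on pre-segmented documents.

    Hypothetical helper: returns log priors and smoothed log likelihoods.
    """
    class_docs = Counter(labels)            # number of documents per class
    word_counts = defaultdict(Counter)      # per-class word frequencies
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)

    n_docs = len(docs)
    priors = {c: math.log(n / n_docs) for c, n in class_docs.items()}

    likelihoods = {}
    for c in class_docs:
        total = sum(word_counts[c].values())
        denom = total + alpha * len(vocab)  # Laplace smoothing denominator
        likelihoods[c] = {
            w: math.log((word_counts[c][w] + alpha) / denom) for w in vocab
        }
    return priors, likelihoods

def predict_nb(model, tokens):
    """Return the class with the highest log posterior score."""
    priors, likelihoods = model
    scores = {}
    for c, log_prior in priors.items():
        score = log_prior
        for w in tokens:
            if w in likelihoods[c]:         # ignore words unseen in training
                score += likelihoods[c][w]
        scores[c] = score
    return max(scores, key=scores.get)

# Toy, already-segmented training data (illustrative only).
docs = [
    ["stock", "market", "rise"],
    ["bank", "stock", "profit"],
    ["match", "goal", "team"],
    ["team", "win", "match"],
]
labels = ["finance", "finance", "sports", "sports"]
model = train_nb(docs, labels)

print(predict_nb(model, ["stock", "profit"]))  # finance
print(predict_nb(model, ["team", "goal"]))     # sports
```

Working in log space avoids floating-point underflow when a document contains many words, and the `alpha` term keeps an unseen word in one class from driving that class's probability to zero.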
Source
Journal of The Hebei Academy of Sciences (《河北省科学院学报》)
Indexed in: CAS
2007, No. 1, pp. 22-25 (4 pages)
Funding
Natural Science Foundation of Hebei Province (project no. 2004000132)
Keywords
Text categorization
Naive Bayes
KNN
HowNet
Chinese word segmentation