Abstract
Text classification helps people make more effective use of the ever-growing mass of information on the Internet. Web documents, however, no longer consist of a single block of text; most contain several elements, such as a title, keywords, an abstract, and a body. Applying a traditional text classifier to such documents gives noticeably poor results. This paper proposes a multi-element coordinated text classification algorithm that makes combined use of all of these elements: each element is classified with the KNN algorithm, and the simulated annealing algorithm together with Bayes' theorem is then used to coordinate the weight of each element. Experimental results show that the algorithm is feasible and that the resulting classifier achieves higher classification accuracy than a classifier built from the text content alone.
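The abstract gives no implementation details, so the following is only a minimal sketch of the general idea under stated assumptions: one KNN vote per element, a weighted sum of those votes, and simulated annealing to tune the weights against a validation set. The function names, the cosine similarity measure, the Gaussian perturbation, and the cooling schedule are all hypothetical choices made for illustration. The Bayes' theorem step is not modeled here, because the abstract does not say how it enters the weighting.

```python
import math
import random
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def knn_scores(query, train, element, k=3):
    """Per-class vote share among the k nearest training docs, for one element."""
    sims = sorted(((cosine(query[element], doc[element]), label)
                   for doc, label in train),
                  key=lambda pair: pair[0], reverse=True)[:k]
    total = sum(s for s, _ in sims)
    scores = Counter()
    for s, label in sims:
        # Fall back to equal votes when every similarity is zero.
        scores[label] += s / total if total else 1.0 / len(sims)
    return scores

def classify(query, train, weights, k=3):
    """Combine the per-element KNN votes with the given element weights."""
    combined = Counter()
    for element, w in weights.items():
        for label, s in knn_scores(query, train, element, k).items():
            combined[label] += w * s
    return max(combined, key=combined.get)

def accuracy(weights, train, valid, k=3):
    """Fraction of validation documents classified correctly."""
    hits = sum(classify(doc, train, weights, k) == label for doc, label in valid)
    return hits / len(valid)

def anneal_weights(elements, train, valid, steps=200, t0=1.0, cooling=0.98, seed=0):
    """Search for element weights with simulated annealing (assumed schedule)."""
    rng = random.Random(seed)
    weights = {e: 1.0 / len(elements) for e in elements}  # start from uniform
    current = accuracy(weights, train, valid)
    best_w, best = dict(weights), current
    t = t0
    for _ in range(steps):
        # Perturb every weight, keep it positive, then renormalize to sum to 1.
        cand = {e: max(1e-6, w + rng.gauss(0, 0.1)) for e, w in weights.items()}
        z = sum(cand.values())
        cand = {e: w / z for e, w in cand.items()}
        score = accuracy(cand, train, valid)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if score >= current or rng.random() < math.exp((score - current) / t):
            weights, current = cand, score
            if current > best:
                best_w, best = dict(weights), current
        t *= cooling
    return best_w, best
```

Each document here is assumed to be a dict that maps an element name (for example "title" or "body") to a Counter of its tokens, and `train` and `valid` are lists of (document, label) pairs. Tuning the weights on a held-out validation set rather than on the training set is a design choice of this sketch, not something the abstract states.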
Source
《现代计算机》
2013, No. 7, pp. 9-12 (4 pages)
Modern Computer
Keywords
Text Classification
KNN Algorithm
Multiple Elements
Simulated Annealing Algorithm
Bayes' Theorem