Abstract
Automatic text classification is the task of having a computer assign a document to the related categories of a predefined taxonomy, based on the document's content. Most existing text classification algorithms are built on the Vector Space Model (VSM), which cannot fully express the semantic features and structure of documents, and this limitation degrades classifier performance. To address this problem, this paper constructs a similarity matrix from the training documents, extracts the topic (subject) information of each category from that matrix, and builds a classifier from this topic information. Finally, this classifier is combined with a classic classifier to determine the category of a text. Experiments show that the proposed method considerably improves classifier performance.
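The abstract does not state how document similarity is measured, how a category's topic information is extracted from the similarity matrix, which classic classifier is used, or how the two are combined. The sketch below is therefore only an illustration under loudly labeled assumptions, not the paper's method: documents become bag-of-words term-frequency vectors, similarity is cosine similarity, the hypothetical `category_topics` rule merges the `top_k` most central training documents of each category (centrality = mean similarity to the other documents of the same category), a 1-nearest-neighbour scorer stands in for the "classic classifier", and the two score sets are mixed linearly with a hypothetical weight `alpha`.

```python
import math
from collections import Counter


def tf_vector(text):
    # Assumption: lowercase whitespace tokenisation, raw term frequencies.
    return Counter(text.lower().split())


def cosine(a, b):
    # Cosine similarity of two sparse term-frequency vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def similarity_matrix(vectors):
    # Pairwise document-document similarity matrix over the training set.
    n = len(vectors)
    return [[cosine(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]


def category_topics(docs, labels, top_k=2):
    # Hypothetical topic extraction: for each category, sum the vectors of
    # the top_k documents with the highest mean within-category similarity.
    vecs = [tf_vector(d) for d in docs]
    sim = similarity_matrix(vecs)
    topics = {}
    for c in set(labels):
        idx = [i for i, lab in enumerate(labels) if lab == c]

        def centrality(i):
            others = [j for j in idx if j != i]
            return sum(sim[i][j] for j in others) / len(others) if others else 1.0

        topic = Counter()
        for i in sorted(idx, key=centrality, reverse=True)[:top_k]:
            topic.update(vecs[i])
        topics[c] = topic
    return topics


def topic_scores(text, topics):
    # Topic-based classifier: similarity of the text to each category topic.
    v = tf_vector(text)
    return {c: cosine(v, t) for c, t in topics.items()}


def nearest_neighbour_scores(text, docs, labels):
    # Stand-in "classic" classifier (assumption): best 1-NN cosine per category.
    v = tf_vector(text)
    scores = {}
    for d, lab in zip(docs, labels):
        scores[lab] = max(scores.get(lab, 0.0), cosine(v, tf_vector(d)))
    return scores


def combined_classify(text, docs, labels, alpha=0.5):
    # Linear combination of the two score sets (hypothetical weighting).
    ts = topic_scores(text, category_topics(docs, labels))
    ks = nearest_neighbour_scores(text, docs, labels)
    combined = {c: alpha * ts[c] + (1 - alpha) * ks.get(c, 0.0) for c in ts}
    return max(combined, key=combined.get)


if __name__ == "__main__":
    docs = ["stock market shares trading", "market prices stock fall",
            "football match goal score", "team wins football match"]
    labels = ["finance", "finance", "sports", "sports"]
    print(combined_classify("stock shares rise", docs, labels))  # finance
```

With the toy training set in the `__main__` block, the query "stock shares rise" is closest to the finance topic vector and to the first finance document, so both scorers agree and the combined label is `finance`. Setting `alpha` to 1.0 uses only the topic-based classifier; setting it to 0.0 uses only the stand-in classic classifier.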
Source
Computer and Modernization (《计算机与现代化》)
2010, No. 11, pp. 9-11, 15 (4 pages)