Abstract
Most text categorization methods are based on the vector space model, but the document vectors this model produces are high-dimensional, which makes it difficult to improve classifier efficiency. To address this shortcoming, this paper presents a Chinese text categorization method based on a word vector space model. The main idea is to represent the feature words of a text as space vectors, obtain a word-class support matrix through training, and then compute the similarity between a text and each class from the words of the text to be classified and the word-class support matrix. Experiments show that the method achieves relatively high classification precision and efficiency.
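The abstract does not give the formulas for the support values or the similarity score, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes (hypothetically) that the support of a word for a class is the fraction of that word's training occurrences that fall in the class, and that a text's similarity to a class is the sum of the supports of its words. The function names `train_support` and `classify` and the toy data are illustrative.

```python
from collections import Counter, defaultdict


def train_support(docs):
    """Build a word-class support table from (tokens, label) pairs.

    ASSUMPTION: support(word, class) is the share of the word's training
    occurrences that appear in that class. The paper may define it differently.
    """
    counts = defaultdict(Counter)
    for tokens, label in docs:
        for word in tokens:
            counts[word][label] += 1
    support = {}
    for word, by_class in counts.items():
        total = sum(by_class.values())
        support[word] = {c: n / total for c, n in by_class.items()}
    return support


def classify(tokens, support, classes):
    """Score each class by summing the supports of the text's words.

    ASSUMPTION: a plain sum is used as the text-class similarity.
    Words never seen in training contribute nothing.
    """
    scores = {c: 0.0 for c in classes}
    for word in tokens:
        for c, s in support.get(word, {}).items():
            if c in scores:
                scores[c] += s
    best = max(scores, key=scores.get)
    return best, scores


# Toy training set (made up for illustration only).
train = [
    (["stock", "market", "bank"], "finance"),
    (["bank", "loan", "rate"], "finance"),
    (["match", "goal", "team"], "sports"),
    (["team", "coach", "bank"], "sports"),
]
support = train_support(train)
label, scores = classify(["bank", "goal", "team"], support, ["finance", "sports"])
```

Under these assumptions, classification needs only one table lookup per word in the text and never builds a vector over the whole vocabulary, which is consistent with the efficiency motivation stated in the abstract.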
Source
《合肥工业大学学报(自然科学版)》
CAS
CSCD
PKU Core (Peking University Core Journals)
2007, No. 10, pp. 1261-1264 (4 pages)
Journal of Hefei University of Technology: Natural Science
Funding
Supported by the Natural Science Foundation of Anhui Province (050420207)