Abstract
With the growth of the Internet, techniques for Chinese text classification need to improve. The vector space model is commonly used in Chinese text classification. Because words serve as the features of a Chinese text, the feature space is very high-dimensional, and not all of these dimensions are useful for classification, so feature selection is especially important. This paper uses two classifiers, KNN and SVM, to compare several feature selection methods on the same footing, with each classifier's parameters set to their best values, and finds that information gain (IG) gives comparatively good classification results. Average recall is used to compare the feature selection methods both vertically and horizontally. The experiments show that the overall classification performance of SVM is better than that of KNN.
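The abstract names information gain (IG) as a feature selection criterion and average recall as the evaluation measure, but it gives no formulas. The sketch below is one common reading of both, not the authors' implementation. It assumes IG is computed for the binary feature "term occurs in the document" and that average recall means the unweighted (macro) mean of per-class recall. The toy corpus, the class labels, and the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(docs, labels, term):
    """IG of the binary feature 'term occurs in the document'.

    Assumption: each document is a set of already segmented words.
    """
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    # Class entropy remaining after splitting on the term, weighted by split size.
    conditional = sum(len(part) / len(labels) * entropy(part)
                      for part in (with_term, without_term) if part)
    return entropy(labels) - conditional


def average_recall(y_true, y_pred):
    """Macro-averaged recall: unweighted mean of per-class recall (assumed)."""
    recalls = []
    for c in sorted(set(y_true)):
        predicted_for_c = [p for t, p in zip(y_true, y_pred) if t == c]
        recalls.append(sum(p == c for p in predicted_for_c) / len(predicted_for_c))
    return sum(recalls) / len(recalls)


# Hypothetical toy corpus: four segmented documents in two classes.
docs = [{"股票", "市场"}, {"股票", "银行"}, {"足球", "比赛"}, {"足球", "市场"}]
labels = ["finance", "finance", "sports", "sports"]

# Rank the vocabulary by IG; a selector would keep the top-k terms.
vocabulary = sorted(set().union(*docs))
ranked = sorted(vocabulary,
                key=lambda t: information_gain(docs, labels, t),
                reverse=True)
```

In this toy corpus a term that appears in only one class, such as "股票", receives the maximum IG of 1 bit, while "市场", which appears equally in both classes, receives 0 and would be dropped first. The retained terms would then form the reduced vector space passed to a classifier such as KNN or SVM.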
Source
Electronic Technology (Shanghai) (《电子技术(上海)》)
2007, No. 11, pp. 132-134 (3 pages)
Keywords
Text classification
Feature selection
Support vector machine (SVM)