
A Comparison of Feature Selection Methods under Two Classifiers (cited by: 1)
Abstract: With the growth of the Internet, Chinese text classification techniques need to improve. The vector space model is commonly used for Chinese text classification. When words serve as the features of a Chinese text, the feature space has very high dimensionality, and not all of these dimensions are useful for classification, so feature selection is especially important. This article uses two classifiers to compare several feature selection methods on the same footing, with each classifier's parameters set to their best values. Information gain (IG) is found to give the better classification results. Average recall is used to compare the feature selection methods both longitudinally and cross-sectionally. The experiments show that SVM performs better overall than KNN.
Author: Wang Xiaowei (王晓微)
Source: Electronic Technology (《电子技术(上海)》), 2007, No. 11, pp. 132-134 (3 pages)
Keywords: text classification; feature selection; support vector machine (SVM)
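
The abstract names information gain (IG) as the best-performing feature selection method and average recall as the evaluation measure, but this record does not give the formulas. The following is a minimal sketch using the standard textbook definitions, not the article's own implementation. It assumes binary term presence per document and macro-averaging (each class weighted equally); the function names and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(docs, labels, term):
    """IG(t) = H(C) - [P(t) H(C|t) + P(not t) H(C|not t)].

    docs is a list of sets of words; a term either occurs in a document or not.
    """
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = 0.0
    for subset in (with_term, without_term):
        if subset:  # skip an empty split to avoid dividing by zero
            conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional


def macro_recall(true_labels, predicted_labels):
    """Mean of per-class recall, with every class weighted equally (assumed)."""
    recalls = []
    for c in sorted(set(true_labels)):
        preds_for_c = [p for t, p in zip(true_labels, predicted_labels) if t == c]
        recalls.append(sum(p == c for p in preds_for_c) / len(preds_for_c))
    return sum(recalls) / len(recalls)


# Hypothetical toy corpus: each document is a set of words.
docs = [{"goal", "match"}, {"goal", "team"}, {"stock", "market"}, {"stock", "bank"}]
labels = ["sports", "sports", "finance", "finance"]

# Rank the vocabulary by IG; a selector would keep the top-k terms.
vocabulary = {term for doc in docs for term in doc}
ranked = sorted(vocabulary, key=lambda t: information_gain(docs, labels, t), reverse=True)
```

In this toy corpus, "goal" and "stock" separate the two classes perfectly and so receive the highest score, while a term that appears in only one document, such as "match", scores lower. The selected terms would then form the reduced vector space passed to a classifier such as SVM or KNN.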

