Comparison and Analysis of Feature Extraction Methods for Text Categorization

Cited by: 8
Abstract: This paper studies how different feature extraction methods affect classification performance in text categorization. It compares four methods, cross entropy (CE), information gain (IG), mutual information (MI), and the χ² statistic (CHI), in terms of their effect on text classifier performance, and analyzes how each of them influences the performance of the SVM and KNN classifiers.
Authors: 屈军 (Qu Jun), 林旭 (Lin Xu)
Source: Modern Computer (《现代计算机》), 2007, No. 4, pp. 10-13 (4 pages)
Keywords: automatic text categorization; KNN; SVM; feature extraction
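
As an illustration of two of the scoring criteria named in the abstract, the sketch below computes the χ² statistic and (pointwise) mutual information for one term and one class from a 2x2 document-count table. It uses the commonly cited contingency-table definitions; the paper's exact formulas are not shown in this record, so the function names, the argument layout, and the zero-count handling are assumptions made for illustration only.

```python
import math


def chi_square(a, b, c, d):
    """Chi-square score of a term for a class (assumed 2x2 count layout).

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class that lack the term
    d: documents outside the class that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # Degenerate table (an empty row or column): treat as uninformative.
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def mutual_information(a, b, c, d):
    """Pointwise mutual information log(P(t, c) / (P(t) * P(c)))."""
    n = a + b + c + d
    if a == 0:
        # The term never co-occurs with the class.
        return float("-inf")
    return math.log(a * n / ((a + c) * (a + b)))


if __name__ == "__main__":
    # Hypothetical counts: the term appears in 40 of 50 in-class documents
    # and in 10 of 150 out-of-class documents.
    print(chi_square(40, 10, 10, 140))
    print(mutual_information(40, 10, 10, 140))
```

In a feature selection step, each term would typically be scored this way against every class, the per-class scores combined (for example by taking the maximum), and only the highest-scoring terms kept as features for the SVM or KNN classifier.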
