
Research on a Text Classification Method Based on PCA and KNN (cited by: 4)
Abstract: Feature selection and the classification algorithm are two key techniques in text classification. This paper proposes a text classification method that combines principal component analysis (PCA) with the k-nearest neighbors (KNN) algorithm. The method uses PCA to select features from the high-dimensional space of text vectors. To overcome the adverse effects of improperly chosen category features, it then classifies with KNN, which keeps errors in the classification process to a minimum. To verify the method's effectiveness, simulation experiments were run on UCI standard data sets. The results show that the PCA-KNN method outperforms a method combining PCA with random forests and can improve text classification accuracy to some extent.
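Source: Journal of Northeast Electric Power University (《东北电力大学学报》), 2013, No. 6, pp. 60-63 (4 pages)
Funding: National Natural Science Foundation of China (11226263, 11201057, 61202261); Natural Science Foundation of Jilin Province (201215165)
Keywords: principal component analysis (PCA); dimensionality reduction; KNN algorithm; text classification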
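
The two-stage idea in the abstract (reduce the dimension of the text vectors with PCA, then classify with KNN) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy term-count vectors, the class labels, the function names (`fit_pca`, `project`, `knn_predict`), the choice of two components and k = 3, and the use of power iteration with deflation in place of a full eigendecomposition are all assumptions made for illustration.

```python
import math
from collections import Counter


def fit_pca(rows, n_components, iters=200):
    """Return (mean, components): the top principal directions of `rows`."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the centered data (d x d).
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    components = []
    for _component in range(n_components):
        # Power iteration from a fixed, non-uniform unit start vector.
        v = [float(j + 1) for j in range(d)]
        scale = math.sqrt(sum(x * x for x in v))
        v = [x / scale for x in v]
        for _step in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # The Rayleigh quotient estimates the eigenvalue; deflating the matrix
        # lets the next pass converge to the next principal direction.
        lam = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d))
                  for i in range(d))
        for i in range(d):
            for j in range(d):
                cov[i][j] -= lam * v[i] * v[j]
        components.append(v)
    return mean, components


def project(row, mean, components):
    """Center one vector and map it onto the principal directions."""
    centered = [x - m for x, m in zip(row, mean)]
    return [sum(c * x for c, x in zip(comp, centered)) for comp in components]


def knn_predict(points, labels, query, k):
    """Majority vote among the k training points closest to `query`."""
    nearest = sorted(zip((math.dist(p, query) for p in points), labels))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Hypothetical term-count vectors over a 4-word vocabulary (toy data only).
train = [
    [5, 4, 0, 1], [4, 5, 1, 0], [6, 5, 0, 0],   # class "sports"
    [0, 1, 5, 4], [1, 0, 4, 5], [0, 0, 6, 5],   # class "tech"
]
train_labels = ["sports"] * 3 + ["tech"] * 3
queries = [[5, 5, 1, 0], [0, 1, 5, 5]]

mean, components = fit_pca(train, n_components=2)
reduced = [project(r, mean, components) for r in train]
predictions = [
    knn_predict(reduced, train_labels, project(q, mean, components), k=3)
    for q in queries
]
print(predictions)
```

The query vectors are centered with the training mean and projected with the training components, so the nearest-neighbor search runs entirely in the reduced space. For real corpora a numerical library would normally replace the hand-written eigenvector routine.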


