
Improvement and Realization of CHI Text Classification Feature Selection Method (cited by: 1)
Abstract: As an effective data mining technique, text classification has attracted growing attention. The text classification process is complex and involves a variety of key technologies. Among them, feature selection plays an important role, and CHI is a commonly used text feature selection method. To address the shortcomings of the CHI model, this paper optimizes it step by step based on the term frequency of each feature term and on its positive or negative correlation, so that both term frequency and positive/negative correlation information are used effectively. Subsequent text classification experiments demonstrate the feasibility of the improved CHI text feature selection method.
Author: Lin Zhijian (林智健), College of Computer and Information Science, Chongqing Normal University, Chongqing 401331, China
Source: Information & Computer (《信息与电脑》), 2018, No. 7, pp. 172-176 (5 pages)
Keywords: text classification; data preprocessing; feature selection
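
For context, the conventional CHI (chi-square) score that the abstract takes as its starting point can be sketched as below. This is only the textbook statistic computed from a 2x2 document-count table; the record does not give the paper's improved formula, so none is reproduced here. The function names `chi_square` and `correlation_sign` and the count names `a`, `b`, `c`, `d` are illustrative assumptions, not taken from the paper.

```python
def chi_square(a, b, c, d):
    """Textbook CHI score of a term for one class, from document counts.

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class that do not contain the term
    d: documents outside the class that do not contain the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # Degenerate table (an empty row or column): treat the score as 0.
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def correlation_sign(a, b, c, d):
    """Return +1, -1, or 0 according to the sign of (a*d - c*b).

    A positive value means the term tends to occur inside the class,
    a negative value means it tends to occur outside it.
    """
    diff = a * d - c * b
    return (diff > 0) - (diff < 0)


# A term that appears only inside the class and a term that appears only
# outside it receive the same CHI score, because the difference is squared.
inside_only = chi_square(10, 0, 0, 10)
outside_only = chi_square(0, 10, 10, 0)
```

Because the score squares the difference `a*d - c*b`, it cannot by itself tell positive correlation from negative correlation, and it counts documents rather than occurrences of the term within them. These appear to be the two kinds of information (term frequency and positive/negative correlation) that the abstract says the improved method exploits.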

