
Research on Feature Selection Methods for Text Categorization Based on the χ² Statistic (cited by: 44)

Improved approach to CHI in feature extraction
Abstract: Feature extraction is an essential step in text categorization, and its quality directly affects categorization accuracy. Building on a study of feature extraction methods for text categorization, this paper analyzes the shortcomings of the χ² statistic (CHI) and proposes an improved CHI approach that takes frequency, concentration, and distribution into account. A comparative experiment was carried out to measure how the method performs in text categorization before and after the improvement. In the experiment, the improved CHI approach classified better than the traditional CHI approach, which verifies its effectiveness and feasibility.
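Source: Journal of Computer Applications (《计算机应用》), CSCD, Peking University Core Journal, 2008, No. 2, pp. 513-514, 518 (3 pages)
Funding: Natural Science Foundation of the Chongqing Science and Technology Commission (CSTC2006BB2021)
Keywords: feature extraction; χ² statistic (CHI approach); frequency; concentration; distribution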
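For orientation, the traditional χ² (CHI) score that the abstract takes as its baseline is usually written, for a term t and a category c, as χ²(t, c) = N(AD − CB)² / ((A + C)(B + D)(A + B)(C + D)). The sketch below computes that textbook form only. The function name `chi_square` and the argument names are illustrative assumptions, and the paper's frequency, concentration, and distribution adjustments are not reproduced here because this record does not give their formulas.

```python
def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Textbook chi-square score of a term for one category.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that do not contain the term
    d: documents outside the category that do not contain the term
    """
    n = a + b + c + d  # total number of documents
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # Degenerate table (an empty row or column): treat the term as uninformative.
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


# A term found only inside the category and a term found only outside it
# receive the same score, because the difference (a*d - c*b) is squared.
print(chi_square(10, 0, 0, 10))
print(chi_square(0, 10, 10, 0))
```

The score is built from document counts alone, so it does not reflect how often a term occurs within a document, and the squaring hides the direction of the association. These are commonly noted limitations of the plain statistic; whether they are the exact shortcomings the paper analyzes is not stated in this record.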