Text Feature Selection Algorithm Based on Feature Weight and Inter-Word Correlation (cited by: 3)

TEXT FEATURE SELECTION ALGORITHM BASED ON CORRELATION OF FEATURES WEIGHT AND WORDS
Abstract: The traditional ReliefF algorithm measures discrete features with a binary difference, so it cannot reflect how much two feature values differ, and it cannot remove redundant features. To address this, the paper proposes the mRMR-ReliefF feature selection algorithm. The algorithm uses probabilities to compensate for this weakness in measuring feature differences and introduces a new difference function, so that the selected features better reflect intra-class relevance and inter-class difference in the texts. The algorithm also incorporates inter-word correlation, which favors feature words strongly related to the class while eliminating redundant features. Comparative experiments on three algorithms show that the proposed algorithm provides a more effective feature subset for text classification.
Source: Computer Applications and Software (《计算机应用与软件》), CSCD, Peking University Core Journal (北大核心), 2012, No. 9, pp. 33-36 (4 pages)
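Funding: National Natural Science Foundation of China (60603047); Scientific Research Foundation for Returned Overseas Chinese Scholars, Ministry of Education; Liaoning Province Science and Technology Plan Project (2008216014); Liaoning Provincial Department of Education Higher Education Research Fund (L2010229); Dalian Outstanding Young Science and Technology Talent Fund (2008J23JH026)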
Keywords: ReliefF algorithm; mRMR-ReliefF algorithm; feature selection; difference function; inter-word correlation; text classification
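
The limitation of the binary difference mentioned in the abstract can be illustrated with a small sketch. The abstract does not give the paper's new difference function, so the probability-based measure below is an assumption swapped in for illustration: a value-difference-metric-style distance between class-conditional distributions. The function names (`binary_diff`, `class_conditionals`, `probabilistic_diff`) and the toy data are hypothetical and not taken from the paper.

```python
from collections import Counter, defaultdict


def binary_diff(x, y):
    """Standard ReliefF difference for a discrete feature: 0 if equal, else 1."""
    return 0.0 if x == y else 1.0


def class_conditionals(values, labels):
    """Estimate P(class | feature value) from paired observations."""
    counts = defaultdict(Counter)
    for v, c in zip(values, labels):
        counts[v][c] += 1
    return {
        v: {c: n / sum(ctr.values()) for c, n in ctr.items()}
        for v, ctr in counts.items()
    }


def probabilistic_diff(x, y, cond, classes):
    """Illustrative (assumed) measure: half the L1 distance between
    P(class | x) and P(class | y), which lies in [0, 1]."""
    return 0.5 * sum(
        abs(cond[x].get(c, 0.0) - cond[y].get(c, 0.0)) for c in classes
    )


# Toy data invented for illustration: a discretized term-frequency level
# per document, and the document's class.
values = ["low", "low", "mid", "mid", "high", "high"]
labels = ["sports", "sports", "sports", "politics", "politics", "politics"]

cond = class_conditionals(values, labels)
classes = sorted(set(labels))

# The binary difference treats every pair of distinct values alike.
print(binary_diff("low", "mid"), binary_diff("low", "high"))

# The probability-based difference separates "slightly" from "very" different.
print(probabilistic_diff("low", "mid", cond, classes))
print(probabilistic_diff("low", "high", cond, classes))
```

Under the binary difference, `low` versus `mid` and `low` versus `high` are indistinguishable, whereas the probability-based variant grades them by how differently the two values distribute over the classes. Redundancy removal through inter-word correlation is a separate step and is not shown here.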