
Feature selection method for text based on linear combination (基于线性组合文本特征选择方法)

Cited by: 4
Abstract: Common feature selection algorithms for text classification use an evaluation function to score how well each individual feature discriminates between categories. These algorithms consider only the relevance between features and categories and ignore the correlation among the features themselves, so the resulting feature set contains redundant features. To address this problem, this paper proposes a new feature selection algorithm for text classification. The algorithm selects features that discriminate strongly between categories and are weakly correlated with one another. Experiments show that it outperforms traditional feature selection algorithms.
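The abstract describes the idea only at a high level. The sketch below illustrates one common way to trade relevance off against redundancy. It is an assumption for illustration, not the paper's algorithm. Each candidate feature is scored as a linear combination: `alpha` times its mutual information with the class, minus `(1 - alpha)` times its mean mutual information with the features already chosen. Features are then added greedily. The following are all hypothetical:

- the weight `alpha`;
- the use of mutual information for both terms;
- the names `select_features`, `top_k_by_relevance`, `mean_redundancy`, and `mutual_information`.

The paper's keywords point to a fuzzy-correlation measure, which is not reproduced here.

```python
# Hypothetical sketch (not the paper's algorithm): greedy feature selection whose
# score linearly combines relevance to the class and redundancy with features
# already chosen. Mutual information is used for both terms as an assumption.
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def top_k_by_relevance(columns, labels, k):
    """Traditional filter: score each feature on its own and keep the k best."""
    relevance = [mutual_information(col, labels) for col in columns]
    # sorted() is stable, so equal scores keep their original (index) order.
    return sorted(range(len(columns)), key=lambda j: -relevance[j])[:k]


def mean_redundancy(columns, j, selected):
    """Average mutual information between feature j and the selected features."""
    if not selected:
        return 0.0
    return sum(mutual_information(columns[j], columns[s]) for s in selected) / len(selected)


def select_features(columns, labels, k, alpha=0.5):
    """Greedily pick k features maximizing
    alpha * relevance - (1 - alpha) * mean redundancy (alpha is a hypothetical weight)."""
    relevance = [mutual_information(col, labels) for col in columns]
    selected = []
    remaining = list(range(len(columns)))
    while remaining and len(selected) < k:
        scores = {
            j: alpha * relevance[j] - (1 - alpha) * mean_redundancy(columns, j, selected)
            for j in remaining
        }
        best = max(remaining, key=scores.get)  # ties go to the lowest index
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy corpus of 4 documents; each column is 0/1 presence of one term.
term_a = [0, 0, 1, 1]
term_b = [0, 0, 1, 1]  # exact duplicate of term_a
term_c = [0, 1, 0, 1]  # statistically independent of term_a
labels = [0, 0, 0, 1]  # class = term_a AND term_c
columns = [term_a, term_b, term_c]

print(top_k_by_relevance(columns, labels, 2))  # [0, 1]
print(select_features(columns, labels, 2))     # [0, 2]
```

In the toy data, terms 0 and 1 are exact duplicates and all three terms are equally relevant to the class. Ranking features independently therefore keeps both copies. The combined score penalizes the second copy for its redundancy with the first and picks the independent term 2 instead.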
Source: 《计算机应用研究》 (Application Research of Computers), 2011, No. 6, pp. 2099-2101 (3 pages). Indexed in CSCD; Peking University core journal.
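Funding: National Natural Science Foundation of China (70971059); Liaoning Province Innovative Research Team Program (2009T045); Liaoning Province Key Science and Technology Program (2007308003)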
Keywords: text classification; feature selection; fuzzy correlation; redundancy

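References (9)

  • [1] WANG Wei-ling, CHU Jian-chong, XU Li-ke. A relevance-based feature selection algorithm[J]. Computer Applications and Software, 2009, 26(8): 259-261. (In Chinese; cited by 2)
  • [2] YANG Yan-chuang, YANG Bing-ru, ZHANG Ke-jun. Research on rough-set text classification based on jointly extracted features[J]. Application Research of Computers, 2007, 24(7): 97-98. (In Chinese; cited by 4)
  • [3] LIU Hai-feng, WANG Yuan-yuan, YAO Ze-qing, ZHANG Shu-zu. A hybrid feature dimensionality reduction method for text classification[J]. Computer Engineering, 2009, 35(2): 194-196. (In Chinese; cited by 11)
  • [4] YU Lei, LIU Huan. Feature selection for high-dimensional data: a fast correlation-based filter solution[C]//Proc of the 20th International Conference on Machine Learning. 2003: 856-863.
  • [5] TAN Song-bo. Research on high-performance text classification algorithms[D]. Beijing: Institute of Computing Technology, Chinese Academy of Sciences, 2005. (In Chinese)
  • [6] MAKREHCHI M, KAMEL M S. Text classification using small number of features[C]//Proc of the 4th International Conference on Machine Learning and Data Mining in Pattern Recognition. Berlin: Springer-Verlag, 2005: 580-589.
  • [7] YANG Yi-ming, LIU Xin. A re-examination of text categorization methods[C]//Proc of SIGIR'99. New York: ACM, 1999: 42-49.
  • [8] ZHANG H. The optimality of naive Bayes[C]//Proc of the 17th International FLAIRS Conference. 2004.
  • [9] YANG Yi-ming. An evaluation of statistical approaches to text categorization[J]. Journal of Information Retrieval, 1999, 1(1/2): 67-88.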
