
Mutual information-based feature selection algorithm for nominal data

Cited by: 3
Abstract This paper proposes a new feature selection algorithm for nominal (categorical-attribute) data. It gives a new definition of an evaluation function that can directly assess feature selection on nominal data, and it reconstructs the formulas for the amount of information, the conditional mutual information, and the dependence between features so that they can be computed on nominal data. On this basis, it proposes a mutual information-based feature selection algorithm called More Relevance Less Redundancy (MRLR). When selecting features, MRLR considers both relevance and redundancy: mutual information measures the dependence of each feature on the class label (relevance) and the dependence between features (redundancy). Extensive simulation experiments show that, on nominal data, MRLR obtains feature subsets that are less redundant and more representative, and that the algorithm is efficient and stable.
Source Computer Engineering and Applications (《计算机工程与应用》), CSCD, 2014, Issue 16, pp. 135-139 (5 pages)
Keywords nominal data; feature selection; mutual information
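
The relevance and redundancy idea described in the abstract can be illustrated with a minimal sketch. The abstract does not state the paper's actual evaluation function, so the score used here (relevance minus mean redundancy, in the style of mRMR) is an assumption, as are the function names `entropy`, `mutual_information`, and `select_features` and the toy data. Mutual information is estimated directly from value counts, which is why no discretization step is needed for nominal data.

```python
from collections import Counter
from math import log2


def entropy(xs):
    """Shannon entropy (in bits) of a sequence of categorical values."""
    n = len(xs)
    return -sum((c / n) * log2(c / n) for c in Counter(xs).values())


def mutual_information(xs, ys):
    """I(X; Y) = H(X) + H(Y) - H(X, Y), estimated from value counts."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def select_features(columns, labels, k):
    """Greedily pick k features from `columns` (name -> list of values).

    Assumed score (not taken from the paper): relevance I(f; class)
    minus the mean redundancy I(f; s) over already selected features s.
    """
    selected = []
    remaining = list(columns)
    while remaining and len(selected) < k:

        def score(name):
            relevance = mutual_information(columns[name], labels)
            if not selected:
                return relevance
            redundancy = sum(
                mutual_information(columns[name], columns[s]) for s in selected
            ) / len(selected)
            return relevance - redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: "color_copy" duplicates "color", so it is
# fully redundant once "color" has been selected.
labels = ["a", "a", "a", "a", "b", "b", "b", "b"]
columns = {
    "color": ["r", "r", "r", "g", "g", "b", "b", "b"],
    "color_copy": ["r", "r", "r", "g", "g", "b", "b", "b"],
    "shape": ["s", "t", "s", "s", "t", "t", "s", "t"],
}

selected = select_features(columns, labels, k=2)
print(selected)
```

On this toy data a relevance-only ranking would take both copies of the color feature, whereas the redundancy term steers the second pick toward `shape`, which is the behavior the abstract attributes to considering both criteria.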

