An Approximate Markov Blanket Feature Selection Algorithm
Cited by: 15
Abstract: Feature selection (FS) can effectively improve the efficiency and accuracy of classification. Traditional FS approaches usually score individual features and seldom evaluate feature subsets. Building on an analysis of feature relevance, features are further divided into four categories: strongly relevant, weakly relevant, irrelevant, and redundant. The paper establishes the connection between Markov Blanket (MB) theory and feature relevance and, combining it with the Chi-Square statistical test, proposes a forward-selection approximate Markov Blanket feature selection algorithm that obtains an approximately optimal feature subset. Experimental results show that the selected subset, with far fewer features than the original set, achieves classification performance better than or close to that of the full feature set. In high-dimensional feature spaces such as text categorization, the method is also compared with other feature selection methods (OCFS, DF, CHI, IG); classification experiments on the 20 Newsgroups dataset show that the feature subsets obtained by the proposed method outperform those selected by the other methods.
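The abstract describes the method only at a high level: score each feature's relevance to the class with a Chi-Square test, then select features forward, discarding any candidate for which the already-selected features contain an approximate Markov blanket. The following Python sketch illustrates that general idea; it is not the authors' exact algorithm, and the helper names chi2_score and approx_mb_select are illustrative. It assumes discrete, non-negative integer-coded features, and the redundancy criterion follows the FCBF-style approximate-Markov-blanket test of Yu & Liu (reference 9 below), which the paper's own criterion may refine.

```python
import numpy as np
from scipy.stats import chi2_contingency

def chi2_score(a, b):
    """Pearson Chi-Square statistic between two discrete variables.
    Illustrative helper; assumes non-negative integer codes."""
    a = np.asarray(a, dtype=int)
    b = np.asarray(b, dtype=int)
    table = np.zeros((a.max() + 1, b.max() + 1))
    np.add.at(table, (a, b), 1)                    # contingency table of counts
    # Drop empty rows/columns so expected frequencies are well defined.
    table = table[table.sum(axis=1) > 0][:, table.sum(axis=0) > 0]
    if table.shape[0] < 2 or table.shape[1] < 2:   # degenerate: no testable association
        return 0.0
    stat, _, _, _ = chi2_contingency(table)
    return stat

def approx_mb_select(X, y):
    """Forward selection with approximate-Markov-blanket redundancy removal.
    X: (n_samples, n_features) integer-coded features; y: class labels."""
    X = np.asarray(X)
    relevance = np.array([chi2_score(X[:, j], y) for j in range(X.shape[1])])
    selected = []
    for j in np.argsort(-relevance):               # strongest class relevance first
        # Discard j if some already-selected feature f is at least as relevant
        # to the class and more strongly associated with j than j is with the
        # class: f then acts as an approximate Markov blanket for j.
        if not any(relevance[f] >= relevance[j] and
                   chi2_score(X[:, f], X[:, j]) >= relevance[j]
                   for f in selected):
            selected.append(j)
    return selected                                # indices of the retained subset

# Usage sketch: keep = approx_mb_select(X, y); X_reduced = X[:, keep]
```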
Source: Chinese Journal of Computers (计算机学报; indexed in EI, CSCD, Peking University Core), 2007, No. 12: 2074-2081 (8 pages)
Funding: Supported by the National Science Fund for Distinguished Young Scholars (60425206), the National Natural Science Foundation of China (60503020), and the Natural Science Research Program of Jiangsu Higher Education Institutions (04kjb520096)
Keywords: feature selection; relevance; Markov Blanket; Chi-Square test; categorization

References (17)

1. Mitchell T M. Machine Learning. New York: McGraw-Hill, 1997
2. Duda R O, Hart P E, Stork D G. Pattern Classification. 2nd Edition. New York: John Wiley & Sons, 2000
3. Rennie J D, Shih L, Teevan J, Karger D R. Tackling the poor assumptions of naive Bayes text classifiers//Proceedings of the 20th International Conference on Machine Learning. Washington, DC, 2003: 616-623
4. Joachims T. Text categorization with support vector machines: Learning with many relevant features//Proceedings of the 10th European Conference on Machine Learning. Chemnitz, Germany, 1998: 137-142
5. Dash M, Liu H. Feature selection for classification. Intelligent Data Analysis, 1997, 1: 131-156
6. Kohavi R, John G H. Wrappers for feature subset selection. Artificial Intelligence, 1997, 97: 273-324
7. Das S. Filters, wrappers and a boosting-based hybrid for feature selection//Proceedings of the 18th International Conference on Machine Learning. Williams College, 2001: 74-81
8. Yang Y, Pedersen J O. A comparative study on feature selection in text categorization//Proceedings of the 14th International Conference on Machine Learning. Nashville, 1997: 412-420
9. Yu L, Liu H. Efficient feature selection via analysis of relevance and redundancy. Journal of Machine Learning Research, 2004, 5: 1205-1224
10. Qu G, Hariri S, Yousif M. A new dependency and correlation analysis for features. IEEE Transactions on Knowledge and Data Engineering, 2005, 17: 1199-1207

Co-cited documents: 111

Citing articles: 15

Second-level citing articles: 125
