
A semi-supervised feature selection algorithm for imbalanced data (cited by: 2)
Abstract: To address the high feature dimensionality and the scarcity of labeled samples in imbalanced data, a semi-supervised feature selection algorithm based on a genetic algorithm (GA) and Biased-SVM is proposed. The method first trains a Biased-SVM model, which is designed to handle imbalanced data, on the initial labeled sample set. The trained model then assigns labels to the unlabeled samples, and the newly labeled samples are added to the initial labeled sample set to form a new labeled sample set. Finally, a GA-based feature selection method for imbalanced data selects the optimal feature subset. Experimental results show that, across different labeled-sample rates, the proposed method achieves a relatively high average feature subset reduction rate and a relatively high average minority-class recognition rate.
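The four steps described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the Biased-SVM is approximated by a linear SVM with class-dependent hinge penalties (`c_pos`, `c_neg`) trained by subgradient descent, and the GA fitness (a weighted sum of minority-class recall and feature reduction, with weight `alpha`) is an assumption, since the abstract does not state the fitness function. All function names, parameter values, and the toy data are hypothetical.

```python
import random

def train_biased_svm(X, y, c_pos=5.0, c_neg=1.0, lam=0.01, lr=0.05, epochs=100):
    """Linear SVM with class-dependent hinge penalties (a stand-in for Biased-SVM).
    Labels: +1 = minority class, -1 = majority class."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            c = c_pos if yi > 0 else c_neg  # heavier penalty on minority errors
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            w = [wj * (1 - lr * lam) for wj in w]  # L2 regularisation shrink
            if margin < 1:  # hinge-loss subgradient step
                w = [wj + lr * c * yi * xj for wj, xj in zip(w, xi)]
                b += lr * c * yi
    return w, b

def predict(model, x):
    w, b = model
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

def pseudo_label(X_lab, y_lab, X_unlab):
    """Steps 1-3: train on labeled data, label the unlabeled data, merge the sets."""
    model = train_biased_svm(X_lab, y_lab)
    return X_lab + X_unlab, y_lab + [predict(model, x) for x in X_unlab]

def fitness(mask, X, y, alpha=0.8):
    """Assumed fitness: alpha * minority recall + (1 - alpha) * feature reduction.
    Recall is measured on the training data itself to keep the sketch short."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0
    Xs = [[x[j] for j in cols] for x in X]
    model = train_biased_svm(Xs, y, epochs=30)
    pos = [x for x, yi in zip(Xs, y) if yi == 1]
    recall = sum(predict(model, x) == 1 for x in pos) / len(pos) if pos else 0.0
    return alpha * recall + (1 - alpha) * (1 - len(cols) / len(mask))

def ga_select(X, y, pop_size=12, generations=10, p_mut=0.1, seed=0):
    """Step 4: GA over 0/1 feature masks (assumes at least two features)."""
    rng = random.Random(seed)
    d = len(X[0])
    pop = [[rng.randint(0, 1) for _ in range(d)] for _ in range(pop_size)]
    for _ in range(generations):
        elite = sorted(pop, key=lambda m: fitness(m, X, y), reverse=True)[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, d)  # one-point crossover
            child = a[:cut] + b[cut:]
            children.append([1 - g if rng.random() < p_mut else g for g in child])
        pop = elite + children
    return max(pop, key=lambda m: fitness(m, X, y))

# Toy imbalanced data: feature 0 carries the class signal, features 1-3 are noise.
rng = random.Random(1)

def make_sample(label):
    return [label * 2.0 + rng.gauss(0, 0.5)] + [rng.gauss(0, 1) for _ in range(3)]

y_lab = [1] * 4 + [-1] * 16
X_lab = [make_sample(lab) for lab in y_lab]
X_unlab = [make_sample(lab) for lab in [1] * 8 + [-1] * 32]

X_all, y_all = pseudo_label(X_lab, y_lab, X_unlab)
best_mask = ga_select(X_all, y_all)
print("selected feature mask:", best_mask)
```

Setting `c_pos` larger than `c_neg` makes misclassified minority samples cost more, which is the basic idea behind biasing an SVM toward the small class. A practical version would evaluate fitness on held-out data rather than on the training set.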
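Authors: Du Limin (杜利敏), Xu Yang (徐扬)
Source: Journal of Henan Polytechnic University (Natural Science) (《河南理工大学学报(自然科学版)》), indexed in CAS and the Peking University Core Journals list, 2017, No. 5, pp. 95-99, 105 (6 pages)
Funding: National Natural Science Foundation of China, Young Scientists Fund (Grant No. 61305074)
Keywords: genetic algorithm; Biased-SVM; imbalanced data; semi-supervised learning; feature selection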
