Abstract
To address the robustness and stability of feature selection algorithms, and to exploit the large amounts of cheap unlabeled data available in practical applications, this paper proposes a Dual Ensemble based Semi-Supervised Feature Selection method (ESSFS). First, an ensemble of weak classifiers is combined with the cluster structure information carried by the unlabeled data to enlarge the labeled dataset. Then, different feature selection methods are applied to the enlarged labeled dataset, and their results are combined by a simple ensemble operation to obtain the final feature subset. Experimental results on several public datasets from the UCI machine learning repository and on datasets from the predictive toxicology domain show that ESSFS is promising for improving learning accuracy.
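The two stages described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the weak classifiers, the clustering method, or the fusion rule, so everything below is an assumption made for illustration: the function names `expand_labeled_set` and `fuse_rankings`, the `min_agreement` threshold, the rule that a pseudo-label must match the majority label of the sample's cluster, and the Borda-style rank fusion are all hypothetical stand-ins, not the authors' ESSFS procedure.

```python
from collections import Counter


def expand_labeled_set(votes, cluster_ids, cluster_labels, min_agreement=1.0):
    """Pick unlabeled samples that can be pseudo-labeled (assumed rule).

    votes[i]       -- labels predicted for unlabeled sample i, one per weak classifier
    cluster_ids[i] -- index of the cluster that sample i falls into
    cluster_labels -- dict: cluster index -> majority label of the labeled
                      samples in that cluster
    A sample is accepted only when enough classifiers agree on one label
    and that label matches the label associated with its cluster.
    """
    accepted = {}
    for i, sample_votes in enumerate(votes):
        label, count = Counter(sample_votes).most_common(1)[0]
        agreement = count / len(sample_votes)
        if agreement >= min_agreement and cluster_labels.get(cluster_ids[i]) == label:
            accepted[i] = label
    return accepted


def fuse_rankings(rankings, k):
    """Fuse several feature rankings with a Borda-style count (assumed rule).

    Each ranking lists feature names from best to worst. A feature earns
    more points the higher it is ranked; the k highest-scoring features
    are returned, with ties broken by feature name.
    """
    scores = Counter()
    for ranking in rankings:
        n = len(ranking)
        for position, feature in enumerate(ranking):
            scores[feature] += n - position
    ordered = sorted(scores, key=lambda f: (-scores[f], f))
    return ordered[:k]


# Hypothetical toy data: three weak classifiers voting on three unlabeled samples.
votes = [["a", "a", "a"], ["a", "b", "a"], ["b", "b", "b"]]
cluster_ids = [0, 0, 1]
cluster_labels = {0: "a", 1: "a"}
pseudo_labels = expand_labeled_set(votes, cluster_ids, cluster_labels)

# Hypothetical rankings produced by three different feature selection methods.
rankings = [["f1", "f2", "f3"], ["f2", "f1", "f3"], ["f2", "f3", "f1"]]
selected = fuse_rankings(rankings, k=2)
```

In this toy run only the first sample is pseudo-labeled: the second lacks unanimous votes and the third disagrees with its cluster. Requiring both signals to agree is one conservative way to limit label noise when enlarging the labeled set; a looser threshold would add more samples at the cost of more mislabeled ones.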
Source
《小型微型计算机系统》
CSCD
PKU Core (Peking University Core Journals)
2010, Issue 8, pp. 1604-1608 (5 pages)
Journal of Chinese Computer Systems
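Funding
Supported by the Natural Science Foundation of Fujian Province (2007J0016)
Supported by the Ministry of Education Foundation for Returned Overseas Scholars (教外司留[2008]890号)
Keywords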
feature selection
semi-supervised
dual ensemble
stability