Abstract
Feature selection is one of the fundamental problems in machine learning and is essential when processing large-scale data. Most existing feature selection methods evaluate the classification ability of a feature with a single value computed against the class variable. To address this limitation, a feature selection method based on subclass-problem classification ability is proposed. The method measures each feature by its classification ability on every subclass problem together with the weighted average of these abilities, so that features with strong overall classification ability are selected, and features that are strong on some subclass problem but weak overall are also retained. The proposed method is compared with three existing feature selection methods on four public gene expression datasets. The experimental results demonstrate that the method is effective and improves classification prediction accuracy.
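The selection criterion described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the Fisher-style one-vs-rest score, the uniform weights, and the names subproblem_scores and select_features are assumptions, since the paper's exact formulas are not given here.

import numpy as np

def subproblem_scores(X, y):
    # One-vs-rest score of every feature for every subclass problem.
    # X: (n_samples, n_features) data matrix; y: (n_samples,) class labels.
    # Entry (c, j) estimates how well feature j separates class c from the rest
    # (a Fisher-style ratio used here as a stand-in for the paper's measure).
    classes = np.unique(y)
    scores = np.empty((len(classes), X.shape[1]))
    for i, c in enumerate(classes):
        pos, neg = X[y == c], X[y != c]
        gap = (pos.mean(axis=0) - neg.mean(axis=0)) ** 2
        spread = pos.var(axis=0) + neg.var(axis=0) + 1e-12
        scores[i] = gap / spread
    return scores

def select_features(X, y, k, weights=None):
    # Rank features by the weighted average of their subclass-problem scores,
    # then add, for each subclass problem, the best not-yet-chosen feature so
    # that features strong on one subproblem but weak overall are also kept.
    S = subproblem_scores(X, y)
    w = np.full(S.shape[0], 1.0 / S.shape[0]) if weights is None else np.asarray(weights)
    overall = w @ S
    chosen = list(np.argsort(overall)[::-1][: max(k - S.shape[0], 0)])
    for row in S:
        for j in np.argsort(row)[::-1]:
            if j not in chosen:
                chosen.append(j)
                break
    return np.array(chosen[:k])

For example, on a gene expression matrix X with labels y, select_features(X, y, k=50) would return the column indices of 50 candidate genes chosen either for strong average ability or for strong ability on a single subclass problem.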
Authors
LIU Lei, ZHENG Taoran, ZHAO Chenfei, LIU Lin, WANG Shuqin, HE Maowei
(College of Computer and Information Engineering, Tianjin Normal University, Tianjin 300387, China; School of Computer Science and Software Engineering, Tianjin Polytechnic University, Tianjin 300387, China)
Source
Journal of Tianjin Normal University (Natural Science Edition)
Indexed in: CAS; Peking University Core Journals (北大核心)
2018, No. 2, pp. 77-80 (4 pages)
Funding
National Natural Science Foundation of China (61070089)
Key Project of the Tianjin Applied Basic and Frontier Technology Research Program (15JCYBJC4600)
Tianjin Science and Technology Program (16ZLZDZF00150)
Keywords
subclass problem
feature selection
classification ability