Abstract
To identify gene subsets that are highly correlated with tumor classification in high-dimensional gene expression profile data, a two-stage hybrid feature selection algorithm based on minimal redundancy maximal relevance (mRMR) and an improved krill herd (IKH) algorithm is proposed. In the first stage, mRMR evaluates the importance of each feature to select a gene subset with high relevance and low redundancy. In the second stage, the IKH algorithm iteratively searches this subset to extract the final features. A support vector machine (SVM) is used as the classifier, and the method is evaluated and compared on six tumor gene datasets. Experimental results show that the proposed method performs better than the compared algorithms in terms of both classification accuracy and the number of selected features.
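The first (filter) stage can be illustrated with a minimal greedy mRMR sketch. It assumes the expression values have already been discretized into integer states and uses the difference form of the criterion (relevance I(f; c) minus the mean mutual information with the already selected features); the abstract does not say which discretization or mRMR variant the authors used, so both are assumptions here. The helper names `mutual_information` and `mrmr_select` are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence


def mutual_information(x: Sequence[int], y: Sequence[int]) -> float:
    """Mutual information I(X;Y) in nats for two equal-length discrete sequences."""
    n = len(x)
    px = Counter(x)
    py = Counter(y)
    pxy = Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in pxy.items():
        # p(a,b) * log(p(a,b) / (p(a) * p(b))), written with raw counts
        mi += (c / n) * math.log(c * n / (px[a] * py[b]))
    return mi


def mrmr_select(features: List[Sequence[int]], labels: Sequence[int], k: int) -> List[int]:
    """Greedy mRMR (difference form): return up to k feature indices.

    Each step picks the remaining feature that maximizes
    relevance I(f; labels) minus mean redundancy I(f; s) over selected s.
    Assumption: features are already discretized to integer states.
    """
    if k <= 0 or not features:
        return []
    relevance = [mutual_information(f, labels) for f in features]
    remaining = list(range(len(features)))
    # The first feature is simply the most relevant one.
    first = max(remaining, key=lambda i: relevance[i])
    selected = [first]
    remaining.remove(first)
    while remaining and len(selected) < k:
        def score(i: int) -> float:
            redundancy = sum(
                mutual_information(features[i], features[j]) for j in selected
            ) / len(selected)
            return relevance[i] - redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    f0 = [0, 0, 0, 0, 1, 1, 1, 1]
    f1 = list(f0)                    # exact duplicate of f0 (fully redundant)
    f2 = [0, 0, 1, 1, 0, 0, 1, 1]    # independent of f0
    labels = [0, 0, 1, 1, 1, 1, 1, 1]  # depends on both f0 and f2
    print(mrmr_select([f0, f1, f2], labels, 2))
```

In this toy example all three features are equally relevant, but mRMR keeps f0 and f2 and skips the duplicate f1, which is the redundancy reduction the first stage relies on. The second stage (IKH search with an SVM-based fitness) is not sketched, because the abstract does not describe the specific improvements made to the krill herd algorithm.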
Authors
WU Chenwen; JI Haibin (School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China)
Source
Journal of Northwest University (Natural Science Edition)
CAS
CSCD
Peking University Core Journal
2022, No. 2, pp. 262-269 (8 pages)
Funding
National Natural Science Foundation of China (6206070101)
Natural Science Foundation of Gansu Province (21JR7RA293)