
SC-BPSO: A Filter-Fused BPSO Feature Selection Method in Hepatocellular Carcinoma Classification
Abstract: Early diagnosis significantly improves the survival rate of cancer patients, and the effect is especially pronounced in hepatocellular carcinoma (HCC). Machine learning is an effective tool for cancer classification, but selecting a low-dimensional feature subset that still yields high classification accuracy from complex, high-dimensional cancer datasets remains difficult. This paper proposes SC-BPSO, a two-stage feature selection method. A new filter, the SC filter, uses a combination of the Spearman correlation coefficient and the chi-square test of independence as its evaluation function; the SC filter is then combined with a wrapper based on binary particle swarm optimization (BPSO). The method is applied to a high-dimensional cancer classification problem: distinguishing normal samples from HCC samples. First, microRNA sequence data from 130 liver tissue samples (64 HCC, 66 normal liver tissue), obtained from the National Center for Biotechnology Information (NCBI) and the European Bioinformatics Institute (EBI), were preprocessed, and the MiRME algorithm was used to extract three types of features from the raw sequence files: microRNA expression, editing level, and post-editing expression. Then, the parameters of SC-BPSO were tuned for the HCC classification setting and a key feature subset was selected. Finally, classification models were built and their predictions were compared with those obtained from feature subsets selected by an information gain filter, an information gain ratio filter, and a BPSO wrapper, using four classifiers (random forest, support vector machine, decision tree, and KNN) with identical parameters. With the feature subset selected by SC-BPSO, classification accuracy reached 98.4%. The results indicate that, compared with the other three feature selection algorithms, SC-BPSO finds smaller feature subsets with higher accuracy, which may be important for cancer classification on high-dimensional data with few samples.
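The two-stage idea described in the abstract (a Spearman plus chi-square filter, followed by a BPSO wrapper) can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the abstract does not say how the two statistics are combined, which classifier drives the wrapper's fitness, or what parameter values were used. The function names, the "pass both tests" rule, the thresholds `min_rho` and `alpha`, the median split used for the chi-square test, the leave-one-out 1-nearest-neighbour fitness, the per-feature penalty, and all BPSO parameters below are hypothetical choices made only so the example runs with the Python standard library.

```python
import math
import random

def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out, i = [0.0] * len(values), 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out

def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy) if sx and sy else 0.0

def chi2_pvalue(x, y):
    """Chi-square independence test: feature split at its median vs. a 0/1 label."""
    med = sorted(x)[len(x) // 2]
    table = [[0, 0], [0, 0]]
    for v, label in zip(x, y):
        table[int(v >= med)][label] += 1
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = sum(table[i]) * (table[0][j] + table[1][j]) / len(x)
            if expected:
                stat += (table[i][j] - expected) ** 2 / expected
    return math.erfc(math.sqrt(stat / 2))  # exact tail probability for 1 d.o.f.

def sc_filter(X, y, min_rho=0.3, alpha=0.05):
    """Stage 1 (assumed rule): keep features that pass both tests."""
    cols = [[row[j] for row in X] for j in range(len(X[0]))]
    return [j for j, c in enumerate(cols)
            if abs(spearman(c, y)) >= min_rho and chi2_pvalue(c, y) < alpha]

def loo_1nn_accuracy(X, y, feats):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier on `feats`."""
    if not feats:
        return 0.0
    hits = 0
    for i in range(len(X)):
        nearest = min((k for k in range(len(X)) if k != i),
                      key=lambda k: sum((X[i][f] - X[k][f]) ** 2 for f in feats))
        hits += y[nearest] == y[i]
    return hits / len(X)

def bpso(X, y, candidates, n_particles=8, n_iter=15, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Stage 2: binary PSO over the candidates; each bit says keep or drop."""
    rng, d = random.Random(seed), len(candidates)

    def fitness(bits):
        feats = [c for c, b in zip(candidates, bits) if b]
        return loo_1nn_accuracy(X, y, feats) - 0.01 * len(feats)  # favour small subsets

    pos = [[rng.randint(0, 1) for _ in range(d)] for _ in range(n_particles)]
    vel = [[0.0] * d for _ in range(n_particles)]
    pbest, pbest_fit = [p[:] for p in pos], [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for j in range(d):
                v = (w * vel[i][j] + c1 * rng.random() * (pbest[i][j] - pos[i][j])
                     + c2 * rng.random() * (gbest[j] - pos[i][j]))
                vel[i][j] = max(-6.0, min(6.0, v))  # clamp to keep the sigmoid responsive
                # Sigmoid transfer: velocity becomes the probability that the bit is 1.
                pos[i][j] = int(rng.random() < 1 / (1 + math.exp(-vel[i][j])))
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
            if f > gbest_fit:
                gbest, gbest_fit = pos[i][:], f
    return [c for c, b in zip(candidates, gbest) if b]

# Toy data (not from the paper): columns 0 and 2 track the label, column 1 does not.
y = [0] * 10 + [1] * 10
X = [[i, i % 5, 19 - i] for i in range(20)]
candidates = sc_filter(X, y)
selected = bpso(X, y, candidates)
print(candidates, selected)
```

On this toy data the filter discards column 1, whose ranks are distributed identically in both classes, so the wrapper only has to search over the two remaining columns. The point of the two-stage design is that the cheap filter shrinks the search space before the more expensive wrapper evaluates candidate subsets with a classifier.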
Authors: ZHOU Nan (周楠); ZHENG Yun (郑云) (Basic Chemistry Laboratory Building Room 410, Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China; State Key Laboratory of Non-Human Primate Biomedicine, Institute of Primate Translational Medicine, Kunming University of Science and Technology, Kunming 650500, China)
Source: Chinese Journal of Biochemistry and Molecular Biology (《中国生物化学与分子生物学报》), 2022, No. 8, pp. 1106-1116 (11 pages). Indexed in CAS, CSCD, and the Peking University Core Journals list.
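Funding: Supported by the National Natural Science Foundation of China (No. 31760314) and the National Key Research and Development Program of China (No. 2018YFA0108502).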
Keywords: cancer classification; feature selection; machine learning; hepatocellular carcinoma (HCC); microRNA (miRNA); binary particle swarm optimization (BPSO)