Classification of leukemia gene expression data based on particle swarm optimization algorithm (cited by: 1)
Abstract
Objective To classify leukemia tumor samples using microarray gene expression data from molecular biology and an intelligent optimization algorithm.
Methods A particle swarm optimization (PSO) algorithm was used to train and test the classification model. From 72 leukemia gene expression samples covering 7129 genes, three data sets containing 50, 100 and 200 feature genes were selected, and 10 classification tests were run for each gene count. A classification model based on the K-means algorithm was built to verify the classification performance of the PSO algorithm under the same conditions. Accuracy, precision, recall and F1 score, together with boxplots and heatmaps, were used for analysis and comparison.
Results The data used to test the PSO classifier contained 20 acute lymphoblastic leukemia (ALL) samples and 14 acute myelocytic leukemia (AML) samples. The mean classification accuracy over the 10 runs was about 90% for every gene count. The accuracy of the PSO algorithm was not stable: across the 10 runs, the mean and best accuracy differed markedly. Recall for the ALL subtype was markedly higher than for the AML subtype and close to 100% in every case, whereas precision for the AML subtype was markedly higher than for the ALL subtype and close to 100% in every case; the F1 values were not readily comparable. The K-means algorithm behaved similarly to the PSO algorithm in that classification performance decreased as the number of genes increased. With 200 genes, the K-means algorithm classified poorly: both stability and accuracy dropped sharply and fell below the PSO results under the same conditions. With 100 genes, ALL recall was 100%, higher than AML recall, and AML precision was 100%, higher than ALL precision. With 200 genes, the mean ALL recall and F1 score were higher than those of AML and the mean AML precision was higher than that of ALL, while the indicators for the best run differed little. For the same leukemia samples under different feature gene counts, the PSO algorithm achieved higher classification accuracy but lacked stability, and overall it outperformed the K-means algorithm.
Conclusion The PSO algorithm can be applied to the classification of leukemia gene expression samples.
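The pattern reported in the Results (ALL recall near 100% together with AML precision near 100%) is what the metric definitions produce when nearly every error is an AML sample labeled as ALL. The sketch below works through this with the reported test set composition of 20 ALL and 14 AML samples. The predictions are hypothetical and chosen only for illustration; they are not the paper's results, and the helper name `per_class_metrics` is an assumption.

```python
def per_class_metrics(y_true, y_pred, positive):
    """Return (precision, recall, f1), treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Test set composition reported in the abstract: 20 ALL and 14 AML samples.
y_true = ["ALL"] * 20 + ["AML"] * 14

# HYPOTHETICAL predictions (not from the paper): every ALL sample is correct,
# and 3 of the 14 AML samples are mislabeled as ALL.
y_pred = ["ALL"] * 20 + ["ALL"] * 3 + ["AML"] * 11

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
all_metrics = per_class_metrics(y_true, y_pred, "ALL")
aml_metrics = per_class_metrics(y_true, y_pred, "AML")

print(f"accuracy = {accuracy:.3f}")  # 31/34 -> 0.912
print("ALL precision/recall/F1 = %.3f / %.3f / %.3f" % all_metrics)  # 0.870 / 1.000 / 0.930
print("AML precision/recall/F1 = %.3f / %.3f / %.3f" % aml_metrics)  # 1.000 / 0.786 / 0.880
```

In this hypothetical case no ALL sample is missed, so ALL recall is 100%, and no sample is wrongly labeled AML, so AML precision is 100%, while overall accuracy stays near 90%.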
Authors LIU Ya-jie (刘亚杰); GAO Lian (高莲); ZHOU Jie (周杰); YAO Rui-han (姚瑞晗); ZHU Ling (朱玲); WANG Xiao-yan (王晓燕) (The Third Affiliated Hospital of Kunming Medical University/Yunnan Cancer Hospital, Kunming 650118, Yunnan, China; School of Information Science and Engineering, Kunming 650091, Yunnan, China; School of Information Engineering, Yunnan Agricultural University, Kunming 650201, Yunnan, China)
Source Biomedical Engineering and Clinical Medicine (《生物医学工程与临床》), CAS, 2020, No. 1, pp. 75-80 (6 pages)
Funding Supported by the Yunnan Cancer Hospital Doctoral Research Start-up Fund (BSJJ201513).
Keywords classification of gene expression samples; leukemia; particle swarm optimization (PSO); K-means