Mining 100-kernel weight related genes in maize by combining a genetic algorithm with XGBoost (cited by: 1)
Abstract: Transcriptome sequencing data from RNA-Seq have high feature dimensionality, so finding phenotype-related genes with traditional bioinformatics methods requires substantial computing resources. Moreover, differential expression analysis yields a large set of candidate genes, and further screening depends on existing prior knowledge. To address this problem, the paper proposes GA-XGBoost, a transcriptome analysis method that combines a genetic algorithm with XGBoost and uses machine learning to narrow the candidate gene set for subsequent analysis. Comparative experiments and follow-up analysis of the association between genes and the 100-kernel weight trait were carried out on a set of high-quality maize data. Compared with XGBoost models trained directly on all genes or on the differentially expressed genes, the XGBoost model trained on the candidate genes selected by the proposed method achieved the lowest mean squared error (MSE) in predicting maize 100-kernel weight. Compared with the 1542 differentially expressed genes obtained from differential expression analysis, GA-XGBoost reduced the candidate set to 48 genes, which the authors report as a 31-fold reduction, indicating that the proposed method can improve the capability and efficiency of transcriptome data analysis.
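The abstract does not specify the chromosome encoding, genetic operators, or fitness function of GA-XGBoost, so the following is only a minimal sketch of the general idea of genetic-algorithm wrapper feature selection, under stated assumptions. It assumes a binary mask over genes, one-point crossover, bit-flip mutation, and elitist selection, with validation MSE as the fitness. To stay dependency-free, a simple least-squares line fitted on the mean expression of the selected genes stands in for XGBoost; in practice a regressor such as `xgboost.XGBRegressor` would presumably take its place. All function names, parameter values, and the toy data are hypothetical.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b with a single predictor."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return 0.0, my
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return a, my - a * mx


def mse_for_mask(mask, X_train, y_train, X_val, y_val):
    """Validation MSE of the stand-in model using only genes where mask == 1."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return float("inf")  # an empty gene set cannot predict anything

    def summarize(row):
        return mean(row[j] for j in cols)

    a, b = fit_line([summarize(r) for r in X_train], y_train)
    return mean((a * summarize(r) + b - y) ** 2 for r, y in zip(X_val, y_val))


def ga_select(X_train, y_train, X_val, y_val, pop_size=30, generations=40,
              mutation_rate=0.05, seed=0):
    """Search for a gene mask with low validation MSE (assumed GA settings)."""
    rng = random.Random(seed)
    n_genes = len(X_train[0])

    def fitness(mask):
        return mse_for_mask(mask, X_train, y_train, X_val, y_val)

    # Start from the all-genes mask plus random masks, so the result is
    # never worse than the all-genes baseline under elitist selection.
    population = [[1] * n_genes]
    population += [[rng.randint(0, 1) for _ in range(n_genes)]
                   for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=fitness)
        elite = population[: pop_size // 2]      # best half survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            cut = rng.randrange(1, n_genes)      # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - bit if rng.random() < mutation_rate else bit
                     for bit in child]           # bit-flip mutation
            children.append(child)
        population = elite + children
    best = min(population, key=fitness)
    return best, fitness(best)


def make_toy_data(n_samples=60, n_genes=12, seed=1):
    """Synthetic data: only genes 0 and 1 drive the trait (illustration only)."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(n_genes)] for _ in range(n_samples)]
    y = [row[0] + row[1] + rng.gauss(0, 0.1) for row in X]
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    mask, err = ga_select(X[:40], y[:40], X[40:], y[40:])
    print("selected genes:", [j for j, bit in enumerate(mask) if bit])
    print("validation MSE:", err)
```

The wrapper design means the search only needs a function that maps a gene subset to a prediction error, which is why the regressor can be swapped without touching the genetic operators. A real pipeline would likely use cross-validation rather than a single validation split to score each mask.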
Authors: YANG Shuai; GUO Maozu; ZHAO Lingling; LI Yang (School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing 100044, China; Beijing Key Laboratory of Intelligent Processing for Building Big Data, Beijing 100044, China; School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China)
Source: CAAI Transactions on Intelligent Systems (《智能系统学报》), indexed in CSCD and the Peking University Core Journals list, 2022, No. 1, pp. 170-180 (11 pages)
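Funding: National Natural Science Foundation of China (62031003, 61871020); High-Level Innovation Team Construction Program of Beijing Municipal Universities (IDHT20190506); National Key R&D Program of China, sub-project (2020YFF0305501); Key Science and Technology Program of the Beijing Municipal Education Commission (KZ201810016019).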
Keywords: genetic algorithm; eXtreme gradient boosting; machine learning; maize; transcriptome analysis; 100-kernel weight; gene ontology; Kyoto Encyclopedia of Genes and Genomes