Machine learning based genetic prediction of height in Han Chinese males in northern China
Abstract
Objective To test 573 SNPs previously reported as height-associated in a Japanese population for association with height in 1578 Han Chinese males from northern China, and to build height prediction models and evaluate their performance.
Methods Multiple linear regression was used to verify which of the 573 SNPs are associated with height in Han Chinese males from northern China. The SNP panel was then optimized by bidirectional stepwise regression based on the Akaike information criterion (AIC). Based on polygenic risk scores (PRS), five supervised learning algorithms were used for genetic prediction of adult height: multiple linear regression (MLR), logistic regression (LR), random forest (RF), k-nearest neighbors (KNN) and multilayer perceptron (MLP).
Results The association with adult height was verified for 47 SNPs. For quantitative prediction of height, the random forest model built on these 47 SNPs was the most accurate (R² = 0.14, 95% CI = 0.04~0.24; MAE = 4.87, 95% CI = 4.68~5.06). For qualitative prediction of height, using the 139 SNPs selected by bidirectional stepwise regression, the KNN classifier performed best for tall stature (≥178 cm) (AUC = 0.68, 95% CI = 0.66~0.71), and the logistic regression classifier performed best for short stature (<165 cm) (AUC = 0.70, 95% CI = 0.68~0.72).
Conclusion Machine learning algorithms show considerable potential for qualitative prediction of complex phenotypes. More height-associated SNPs specific to Chinese populations need to be identified to improve the accuracy of the prediction models.
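The abstract names a polygenic score and the AUC as its main tools but gives no formulas. The sketch below is a minimal illustration of both, not the authors' pipeline. It assumes a score that is a weighted sum of effect-allele counts, and a rank-based AUC for the "tall stature" (≥178 cm) label. The genotypes, per-allele weights, mean height of 171.0 and noise level of 5.0 are all synthetic, hypothetical values. Only the 47-SNP panel size and the 178 cm threshold come from the abstract.

```python
import random


def polygenic_score(genotype, weights):
    """Weighted sum of effect-allele counts (0, 1 or 2 per SNP)."""
    return sum(g * w for g, w in zip(genotype, weights))


def auc(scores, labels):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. This equals the area under the ROC curve.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Synthetic cohort: every number below is a hypothetical stand-in.
rng = random.Random(0)
n_snps, n_people = 47, 500
weights = [rng.gauss(0.0, 0.3) for _ in range(n_snps)]  # assumed effects, cm per allele
genotypes = [[rng.choice([0, 1, 2]) for _ in range(n_snps)]
             for _ in range(n_people)]

scores = [polygenic_score(g, weights) for g in genotypes]
mean_score = sum(scores) / n_people

# Simulated height = assumed mean + centered genetic score + random noise.
heights = [171.0 + (s - mean_score) + rng.gauss(0.0, 5.0) for s in scores]

# Binary "tall stature" label using the threshold quoted in the abstract.
is_tall = [1 if h >= 178 else 0 for h in heights]

tall_auc = auc(scores, is_tall)
print(f"AUC of the raw score for tall stature: {tall_auc:.2f}")
```

In the study itself the score feeds trained classifiers such as KNN and logistic regression. Ranking individuals by the raw score, as done here, is the simplest possible baseline and is used only to keep the example dependency-free.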
Authors Luo Yiqian (罗祎倩); Sun Yanan (孙亚男); Li Caixia (李彩霞); Fan Hong (范虹); Zhao Wenting (赵雯婷) (School of Computer Science, Shaanxi Normal University, Xi'an 710119, China; Key Laboratory of Forensic Genetics, Institute of Forensic Science of China, Beijing 100038, China; Jining Medical University, Jining 272067, China)
Source Chinese Journal of Forensic Medicine (《中国法医学杂志》), CSCD, 2023, No. 3, pp. 290-297 (8 pages)
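Funding Ministry of Public Security Special Program for Basic Work on Strengthening Police through Science and Technology (2021JC15); Key Project of the Natural Science Foundation of Shaanxi Province (2022ZJ-39); Open Project of the Key Laboratory of Forensic Genetics, Ministry of Public Security (2021FGKFKT07); Beijing Nova Program (20220484149).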
Keywords Forensic genetics; Height prediction; Machine learning; Northern Han Chinese population