
Optimization scheme of machine learning model for genetic division between northern Han, southern Han, Korean and Japanese

Cited by: 1
Abstract: Han Chinese, Korean and Japanese are the main populations of East Asia. Han Chinese show a north-to-south gradient of admixture, and these East Asian populations differ to varying degrees in genetic structure. To achieve fine-scale genetic classification of southern (S-) and northern (N-) Han Chinese, Korean and Japanese individuals, we collected and analyzed 1185 East Asian ancestry informative SNPs (AISNPs) selected from previous literature reports and our laboratory's earlier data. First, two machine learning algorithms, softmax and random forest, were used to build population classification models. Then, phylogenetic trees, STRUCTURE and principal component analysis were used to evaluate how well the AISNP panels selected by the different models classified the populations. The optimal combination, a 234-AISNP panel, achieved fine-scale differentiation among the target populations in four classification schemes. The softmax model reached an accuracy of 92%, accurately distinguishing S-Han, N-Han, Korean and Japanese individuals. The two machine learning models tested in this study provide an important reference for the high-resolution discrimination of closely related populations and can serve as tools for optimizing marker panels when developing forensic DNA ancestry inference systems.
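The abstract names softmax (multinomial logistic) regression as the best-performing classifier. As a rough illustration only, the sketch below fits a softmax classifier to a genotype matrix in which each SNP is coded as 0, 1 or 2 copies of an allele, with the four target populations as classes. Everything numeric here is an assumption: the genotypes are synthetic, and the allele frequencies, sample sizes, learning rate, epoch count and L2 penalty are hypothetical choices, not the authors' data or settings. The resulting accuracy says nothing about the 92% reported in the study.

```python
import numpy as np

POPS = ["S-Han", "N-Han", "Korean", "Japanese"]  # class labels named in the abstract


def simulate_genotypes(n_per_pop, n_snps, rng):
    """Draw SYNTHETIC 0/1/2 genotypes; each population gets made-up allele frequencies."""
    base = rng.uniform(0.1, 0.9, size=n_snps)
    X, y = [], []
    for k in range(len(POPS)):
        # Hypothetical per-population frequency shifts to mimic ancestry-informative SNPs
        freqs = np.clip(base + rng.normal(0.0, 0.15, size=n_snps), 0.01, 0.99)
        X.append(rng.binomial(2, freqs, size=(n_per_pop, n_snps)))
        y.append(np.full(n_per_pop, k))
    return np.vstack(X).astype(float), np.concatenate(y)


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def fit_softmax(X, y, n_classes, lr=0.1, epochs=500, l2=1e-3):
    """Batch gradient descent on mean cross-entropy with an L2 penalty on W."""
    n, d = X.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]  # one-hot targets
    for _ in range(epochs):
        P = softmax(X @ W + b)
        G = (P - Y) / n  # gradient of mean cross-entropy w.r.t. the logits
        W -= lr * (X.T @ G + l2 * W)
        b -= lr * G.sum(axis=0)
    return W, b


def predict(X, W, b):
    return np.argmax(X @ W + b, axis=1)


rng = np.random.default_rng(0)
X, y = simulate_genotypes(n_per_pop=150, n_snps=234, rng=rng)  # 234 mirrors the panel size only
idx = rng.permutation(len(y))
train, test = idx[:450], idx[450:]

# Standardize with training-set statistics only, to avoid leaking test information
mu, sd = X[train].mean(axis=0), X[train].std(axis=0) + 1e-8
Xs = (X - mu) / sd

W, b = fit_softmax(Xs[train], y[train], len(POPS))
acc = (predict(Xs[test], W, b) == y[test]).mean()
print(f"held-out accuracy on synthetic data: {acc:.2f}")
```

A held-out split is used because accuracy measured on the training individuals would overstate performance; with real AISNP data one would typically use cross-validation and compare against other classifiers such as random forest.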
Authors: Yongqiang Kong, Jinkai Liu, Jiaqi Gu, Jingyi Xu, Yunuo Zheng, Yiliang Wei, Shaoyuan Wu (孔永强, 刘金凯, 顾佳琪, 徐景怡, 郑雨诺, 魏以梁, 伍少远)
Affiliations: Key Laboratory of Tianjin for Epigenetics, Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China; Key Laboratory of Phylogeny and Comparative Genomics of Jiangsu Province, Jiangsu Normal University, Xuzhou 221116, China
Source: Hereditas (Beijing) (《遗传》), 2022, No. 11, pp. 1028-1043 (16 pages). Indexed in CAS, CSCD and the Peking University Core Journals list.
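Funding: Supported by the Open Project of the Key Laboratory of Forensic Genetics, Ministry of Public Security (No. 2020FGKFKT01) and the Postgraduate Research & Practice Innovation Program of Jiangsu Province (Nos. KYCX20_2286, KYCX21_2597).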
Keywords: forensic genetics; ancestry informative SNPs; machine learning; East Asia; S-Han and N-Han