Abstract
Genomic data are increasingly used in livestock and poultry genetics and breeding. Genotype imputation is an important tool for handling missing genotypes, and its quality directly affects downstream analyses, so a sound imputation strategy is needed to obtain good results. Using simulated data, this study examined how reference population size, the genetic relationship (distance) between the target and reference populations, the number (proportion) of target sites, minor allele frequency (MAF), and the imputation algorithm affect genotype imputation. The results showed that the number of target sites was significantly positively correlated with imputation quality (P<0.05) and was the main factor affecting imputation accuracy. Reference population size was the main factor affecting the imputation error rate of Beagle5.1, whereas the number of target sites was the main factor affecting that of Minimac4. Genetic distance between the target and reference populations had a more pronounced effect on the imputation quality of Beagle5.1 than on that of Minimac4. In general, sites with higher MAF had higher imputation error rates. When the reference population was small and the number of target sites was large, Minimac4 ran faster than Beagle5.1, but this trend reversed as the reference population size increased. Provided that imputation quality was maintained, Beagle5.1 placed relatively lower demands on the factors studied here. Beagle5.1 imputed better when the number of target sites was low and the reference population was large, while Minimac4 was better suited to a small reference population with a larger number of target sites. This study formulated different strategies for different imputation purposes, and the results provide a reference for genotype imputation standards.
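The abstract reports imputation error rates and relates them to MAF but does not state the formula used. As a minimal sketch, assuming the error rate is the proportion of imputed genotypes that differ from the true (masked) genotypes, and assuming genotypes are coded as 0/1/2 allele dosages, the per-MAF-bin comparison could look like the following. The function names and bin edges are hypothetical and are not taken from the paper.

```python
def minor_allele_frequency(genotypes):
    """MAF of one site from 0/1/2 allele-dosage genotypes (assumed coding)."""
    freq = sum(genotypes) / (2 * len(genotypes))
    return min(freq, 1 - freq)


def error_rate(true_sites, imputed_sites):
    """Overall share of imputed genotypes that differ from the true ones."""
    wrong = total = 0
    for true, imputed in zip(true_sites, imputed_sites):
        wrong += sum(t != i for t, i in zip(true, imputed))
        total += len(true)
    return wrong / total


def error_rate_by_maf(true_sites, imputed_sites, edges=(0.05, 0.1, 0.2, 0.3, 0.5)):
    """Error rate per MAF bin; each bin is keyed by its upper edge (hypothetical bins)."""
    counts = {edge: [0, 0] for edge in edges}  # upper edge -> [wrong, total]
    for true, imputed in zip(true_sites, imputed_sites):
        maf = minor_allele_frequency(true)
        upper = next(edge for edge in edges if maf <= edge)  # MAF never exceeds 0.5
        counts[upper][0] += sum(t != i for t, i in zip(true, imputed))
        counts[upper][1] += len(true)
    # Report only the bins that actually contain sites.
    return {edge: w / n for edge, (w, n) in counts.items() if n}


# Toy data: two masked sites, four individuals each (illustrative values only).
true_sites = [[0, 0, 0, 1], [0, 1, 2, 1]]
imputed_sites = [[0, 0, 0, 0], [0, 1, 1, 2]]
overall = error_rate(true_sites, imputed_sites)
per_bin = error_rate_by_maf(true_sites, imputed_sites)
```

In this toy input the first site has MAF 0.125 and the second has MAF 0.5, so they fall into different bins and their error rates can be compared directly, which mirrors the kind of MAF-stratified comparison the abstract summarizes.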
Authors
邓天宇
杜立新
王立贤
赵福平
DENG Tianyu; DU Lixin; WANG Lixian; ZHAO Fuping (Key Laboratory of Animal Genetics, Breeding and Reproduction (Poultry) of Ministry of Agriculture, Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing 100193, China)
Source
《畜牧兽医学报》
CAS
CSCD
PKU Core (Peking University Core Journals)
2020, Issue 9, pp. 2068-2078 (11 pages)
ACTA VETERINARIA ET ZOOTECHNICA SINICA
Funding
National Natural Science Foundation of China (31572357)
National Swine Industry Technology System (CARS-35)
Keywords
genotype imputation
simulation data
reference population size
imputation method
error rate