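Abstract: To raise yield and improve disease resistance, soybean breeding needs to become more effective. This work designs cross-validation experiments that partition soybean genotype and phenotype data into groups, allocating the groups according to the number of individuals and markers, and predicts phenotypes with gBLUP (genomic Best Linear Unbiased Prediction). Comparing different groupings across several soybean traits gives a range for the prediction accuracy, which provides a basis for subsequent breeding analysis. When only genotype data are available and phenotypes are missing, phenotypes are simulated from a specified heritability and number of simulated loci (NQTN), and the accuracy is then obtained under the different groupings. This widens the range of soybean data to which prediction can be applied.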
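The workflow in this abstract (simulate a phenotype from a set heritability and NQTN, then score gBLUP under k-fold cross-validation) could be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The function names, the normally distributed QTN effects, the VanRaden-style relationship matrix, the use of the known heritability as the shrinkage ratio, and Pearson correlation as the accuracy measure are all assumptions introduced here.

```python
import numpy as np

def simulate_phenotype(genotypes, h2, nqtn, rng):
    """Simulate phenotypes from an (individuals x markers) 0/1/2 matrix."""
    n, m = genotypes.shape
    qtn = rng.choice(m, size=nqtn, replace=False)   # causal loci, picked at random (assumption)
    effects = rng.normal(size=nqtn)                 # normal QTN effects (assumption)
    g = genotypes[:, qtn] @ effects                 # genetic values
    var_e = g.var() * (1.0 - h2) / h2               # residual variance implied by h2
    return g + rng.normal(scale=np.sqrt(var_e), size=n)

def genomic_relationship(genotypes):
    """VanRaden-style genomic relationship matrix (assumed form of G)."""
    p = genotypes.mean(axis=0) / 2.0                # allele frequencies
    z = genotypes - 2.0 * p                         # centered genotypes
    return z @ z.T / (2.0 * np.sum(p * (1.0 - p)))

def gblup_cross_validation(genotypes, y, h2, k, rng):
    """Return the prediction accuracy (Pearson r) of each of k folds."""
    n = len(y)
    G = genomic_relationship(genotypes)
    lam = (1.0 - h2) / h2                           # ratio of residual to genetic variance
    folds = np.array_split(rng.permutation(n), k)
    accuracies = []
    for val in folds:
        trn = np.setdiff1d(np.arange(n), val)
        mu = y[trn].mean()
        lhs = G[np.ix_(trn, trn)] + lam * np.eye(len(trn))
        alpha = np.linalg.solve(lhs, y[trn] - mu)
        pred = mu + G[np.ix_(val, trn)] @ alpha     # predicted phenotypes for the held-out fold
        accuracies.append(np.corrcoef(pred, y[val])[0, 1])
    return np.array(accuracies)

rng = np.random.default_rng(42)
genotypes = rng.integers(0, 3, size=(300, 60)).astype(float)  # toy data, not soybean data
y = simulate_phenotype(genotypes, h2=0.6, nqtn=10, rng=rng)
acc = gblup_cross_validation(genotypes, y, h2=0.6, k=5, rng=rng)
print("accuracy per fold:", np.round(acc, 3))
print("range:", round(float(acc.min()), 3), "to", round(float(acc.max()), 3))
```

Repeating the call with different values of `k` or different random seeds gives the kind of accuracy range across groupings that the abstract describes.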
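Abstract: This study compares genetic parameter estimates obtained with different methods, to inform future selection strategies for carcass and meat quality traits in Beijing-You chickens. Genetic parameters for carcass and meat quality traits were estimated with traditional best linear unbiased prediction (BLUP) and genomic best linear unbiased prediction (GBLUP). From a Beijing-You chicken population with a fairly complete pedigree, 615 cockerels of similar body weight at 100 days of age were selected. Body weight at 100 days (BW), dressing percentage (EP), breast muscle percentage (BMP), leg muscle percentage (LMP), abdominal fat percentage (AFP), tenderness (T, expressed as shear force) and intramuscular fat (IMF) were measured, and each bird was genotyped with a SNP chip (Illumina, 60K). Heritability estimates from the two methods differed considerably for IMF and shear force (SF) and only slightly for the other traits; except for tenderness, GBLUP heritability estimates were lower than BLUP estimates. Among the carcass traits, dressing percentage had low heritability and the rest had moderate heritability. Tenderness had low heritability, while IMF heritability was moderate under BLUP (h^2=0.256) and low under GBLUP (h^2=0.107). Under BLUP, IMF showed high negative genetic correlations with BW, BMP and SF (-0.572, -0.420, -0.682), a moderate negative genetic correlation with EP (-0.234), and a moderate positive genetic correlation with AFP (0.420). Under GBLUP, IMF showed high negative genetic correlations with BW, BMP and SF (-0.808, -0.725, -0.784), a high negative genetic correlation with EP (-0.626), and a low positive genetic correlation with AFP (0.097). In summary, for some traits the heritability and genetic correlation estimates from traditional BLUP and the newer GBLUP differ considerably, and both should be weighed in practical breeding to improve breeding efficiency.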
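Abstract: To carry out a genome-wide association study of body weight in Yellow River carp and to compare the prediction accuracy of genomic selection models, 613 Yellow River carp (Cyprinus carpio) were genotyped with a carp 250K high-density SNP chip and phenotyped for body weight. A genome-wide association study (GWAS) was performed, and 10 genomic selection models, including GBLUP, Bayesian, RKHS and machine learning models, were compared on variant datasets of different sizes selected from the GWAS results, in order to identify a model suited to body weight in Yellow River carp. The GWAS located 5 SNPs associated with body weight on chromosomes 1 and 21, and screening the regions around these SNPs identified the genes WBP1L, GPM6B, TIMMDC1, RCAN1 and EOGT. With the top 100 SNPs associated with body weight as the dataset, the machine learning model XGBoost had the highest prediction accuracy, 0.26. With 500, 1000, 3000, 5000 and 20000 SNPs, GBLUP had the highest accuracy in every case: 0.3084, 0.3444, 0.4393, 0.4526 and 0.4007, respectively. XGBoost, LightGBM and GBLUP also had comparatively low coefficients of variation, indicating relatively stable predictions. In conclusion, 5 candidate genes associated with body weight in Yellow River carp were identified (WBP1L, GPM6B, TIMMDC1, RCAN1 and EOGT), and GBLUP gave the highest prediction accuracy of the 10 genomic selection models, so it can be used for genomic selection of body weight in Yellow River carp.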
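Abstract: [Objective] To improve the accuracy of genetic parameter estimates for body measurement traits in Yunong black pigs and speed up selection progress. [Methods] Best linear unbiased prediction (BLUP) and genomic best linear unbiased prediction (GBLUP) were used to build three single-trait animal models: model 1 based on BLUP, model 2 based on GBLUP, and model 3 based on GBLUP including the genomic inbreeding coefficient. Genetic parameters of body measurement traits in 702 Yunong black pigs were estimated with the average information restricted maximum likelihood (AIREML) algorithm. [Results] Model 1 was less accurate than models 2 and 3. Compared with model 2, model 3 improved the accuracy of genetic parameter estimates for chest circumference, ham circumference and loin muscle depth. Under model 3, the heritabilities of body height, ham circumference, backfat thickness and loin muscle depth were 0.566, 0.302, 0.467 and 0.652, which are high; those of body length, chest circumference and cannon circumference were 0.152, 0.122 and 0.255, which are moderate. Phenotypic correlations among the body measurement traits ranged from -0.009 to 0.576, and genetic correlations from -0.108 to 0.985. [Conclusion] When estimating genetic parameters of body measurement traits in Yunong black pigs, a GBLUP model that includes the inbreeding coefficient can improve the accuracy of genetic evaluation. These results provide a scientific basis for accelerating genetic progress in production.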
Funding: Supported by the US Department of Agriculture, Agriculture and Food Research Initiative National Institute of Food and Agriculture Competitive grant no. 2015-67015-22947.
Abstract: Background: A random multiple-regression model that simultaneously fits all allele substitution effects for additive markers or haplotypes as uncorrelated random effects was proposed for Best Linear Unbiased Prediction using whole-genome data. Leave-one-out cross validation can be used to quantify the predictive ability of a statistical model. Methods: Naive application of leave-one-out cross validation is computationally intensive because the training and validation analyses need to be repeated n times, once for each observation. Efficient leave-one-out cross validation strategies are presented here, requiring little more effort than a single analysis. Results: The efficient leave-one-out cross validation strategies were 786 times faster than the naive application for a simulated dataset with 1,000 observations and 10,000 markers, and 99 times faster with 1,000 observations and 100 markers. These efficiencies relative to the naive approach using the same model will increase with the number of observations. Conclusions: Efficient leave-one-out cross validation strategies are presented here, requiring little more effort than a single analysis.
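To illustrate why leave-one-out cross validation need not cost n refits, the sketch below uses a well-known identity for ridge-type marker regression: the leave-one-out residual of observation i equals its ordinary residual divided by 1 - h_ii, where h_ii is the i-th diagonal element of the hat matrix. This is offered as one plausible instance of the idea, not necessarily the exact strategy of the paper. The function names, the fixed shrinkage parameter `lam`, and the omission of an intercept are assumptions made here to keep the identity exact and the example short.

```python
import numpy as np

def loo_residuals_fast(X, y, lam):
    """Leave-one-out residuals from a single fit, via e_i / (1 - h_ii)."""
    m = X.shape[1]
    A = X.T @ X + lam * np.eye(m)          # ridge coefficient matrix
    H = X @ np.linalg.solve(A, X.T)        # hat matrix, y_hat = H @ y
    residuals = y - H @ y
    return residuals / (1.0 - np.diag(H))

def loo_residuals_naive(X, y, lam):
    """Leave-one-out residuals by refitting n times, once per observation."""
    n, m = X.shape
    out = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i           # drop observation i from training
        Xi, yi = X[keep], y[keep]
        b = np.linalg.solve(Xi.T @ Xi + lam * np.eye(m), Xi.T @ yi)
        out[i] = y[i] - X[i] @ b           # prediction error for the held-out record
    return out

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 10))              # toy marker covariates
y = rng.normal(size=30)                    # toy phenotypes
fast = loo_residuals_fast(X, y, lam=2.0)
naive = loo_residuals_naive(X, y, lam=2.0)
print("max abs difference:", float(np.max(np.abs(fast - naive))))
```

The fast version solves one system, while the naive version solves n of them, which is the source of the speed-up that grows with the number of observations.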
Funding: Supported by the National Natural Science Foundation of China to Guo-Bo Chen (31771392) and the Zhejiang Provincial People's Hospital Research Startup to Guo-Bo Chen (ZRY2018A004).
Abstract: Computational efficiency has become a key issue in genomic prediction (GP) owing to the massive historical datasets accumulated. We developed a new super-fast GP approach (SHEAPY) that combines randomized Haseman-Elston regression (RHE-reg) with a modified Algorithm for Proven and Young (APY) in an additive-effect model, using the former to estimate heritability and the latter to invert a large genomic relationship matrix for best linear prediction. In simulations with varied training population sizes, GBLUP, HEAPY|A and SHEAPY showed similar predictive performance when the core population was half the size of a large training population and the heritability was fixed, and SHEAPY was computationally faster than GBLUP and HEAPY|A. In simulations with varied heritability, SHEAPY showed better predictive ability than GBLUP in all cases and than HEAPY|A in most cases when the core population was 4/5 the size of a small training population and the training population size was fixed. As a proof of concept, SHEAPY was applied to two real datasets. In an Arabidopsis thaliana F2 population, the predictive performance of SHEAPY was similar to or better than that of GBLUP and HEAPY|A in most cases when the core population (200) was 2/3 the size of a small training population (300). In a sorghum multiparental population, SHEAPY showed higher predictive accuracy than HEAPY|A for all three traits, and than GBLUP for two traits. SHEAPY may become the GP method of choice for large-scale genomic data.
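The heritability step can be illustrated with plain Haseman-Elston regression, sketched below. Note that this is the ordinary, non-randomized form, swapped in for simplicity; it is not the RHE-reg variant or the SHEAPY pipeline described in the abstract. With a standardized phenotype, the expected cross-product y_i * y_j for two different individuals is h2 * G_ij, so regressing the cross-products on the off-diagonal relationships gives an estimate of h2. The function name, the regression through the origin, and the simulated toy data are assumptions made for this example.

```python
import numpy as np

def he_regression_h2(G, y):
    """Estimate heritability by regressing y_i * y_j on G_ij for all pairs i < j."""
    y = (y - y.mean()) / y.std()           # standardize so total variance is 1
    upper = np.triu_indices_from(G, k=1)   # off-diagonal pairs only
    g = G[upper]
    cross = np.outer(y, y)[upper]
    return float(g @ cross / (g @ g))      # least-squares slope through the origin

rng = np.random.default_rng(1)
n, m = 400, 200
X = rng.integers(0, 3, size=(n, m)).astype(float)   # toy 0/1/2 genotypes
Z = (X - X.mean(axis=0)) / X.std(axis=0)            # standardized markers
G = Z @ Z.T / m                                     # genomic relationship matrix
g = Z @ rng.normal(size=m)
g = g / g.std()                                     # genetic values with variance 1
y = g + rng.normal(size=n)                          # residual variance 1, so true h2 is about 0.5
h2_hat = he_regression_h2(G, y)
print("estimated heritability:", round(h2_hat, 3))
```

Because the estimate needs only cross-products and no iterative likelihood maximization, this family of estimators scales well, which is consistent with the speed motivation stated in the abstract.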