Maize stalk rot reduces grain yield and quality. Information about the genetics of resistance to maize stalk rot could help breeders design effective breeding strategies for the trait. Genomic prediction may be a more effective breeding strategy for stalk-rot resistance than marker-assisted selection. We performed a genome-wide association study (GWAS) and genomic prediction of resistance in testcross hybrids of 677 inbred lines from the Tuxpeño and non-Tuxpeño heterotic pools grown in three environments and genotyped with 200,681 single-nucleotide polymorphisms (SNPs). Eighteen SNPs associated with stalk rot shared genomic regions with gene families previously associated with plant biotic and abiotic responses. More favorable SNP haplotypes traced to tropical than to temperate progenitors of the inbred lines. Incorporating genotype-by-environment (G×E) interaction increased genomic prediction accuracy.
Background: Survival from birth to slaughter is an important economic trait in commercial pig production. Increasing survival can improve both economic efficiency and animal welfare. The aim of this study was to explore the impact of genotyping strategies and statistical models on the accuracy of genomic prediction for survival in pigs during the total growing period from birth to slaughter. Results: We simulated pig populations with different direct and maternal heritabilities and used a linear mixed model, a logit model, and a probit model to predict genomic breeding values of pig survival based on individual survival records with binary outcomes (0, 1). The results show that when only live animals have genotype data, unbiased genomic predictions can be achieved by using variances estimated from a pedigree-based model. Models using genomic information achieved up to 59.2% higher accuracy of estimated breeding values than the pedigree-based model, depending on the genotyping scenario. The scenario of genotyping all individuals, both dead and alive, obtained the highest accuracy. When an equal number of individuals (80%) were genotyped, genotyping a random sample of individuals achieved higher accuracy than genotyping only live individuals. The linear, logit, and probit models achieved similar accuracy. Conclusions: Genomic prediction of pig survival is feasible when only live pigs have genotypes, but genomic information from dead individuals can increase the accuracy of genomic prediction by 2.06% to 6.04%.
Background: Genomic selection (GS) has revolutionized animal and plant breeding since its first implementation by enabling early selection before phenotypes are measured. Besides the genome, transcriptome and metabolome information are increasingly considered new sources for GS. Difficulties in building models with multi-omics data for GS and the limited availability of specimens have both delayed progress in investigating multi-omics. Results: We used the cosine kernel to map genomic and transcriptomic data to n×n symmetric matrices (the G matrix and the T matrix), combined with best linear unbiased prediction (BLUP) for GS. We defined five kernel-based prediction models: genomic BLUP (GBLUP), transcriptome BLUP (TBLUP), multi-omics BLUP (MBLUP, M = ratio×G + (1−ratio)×T), multi-omics single-step BLUP (mssBLUP), and weighted multi-omics single-step BLUP (wmssBLUP), the last two to integrate transcribed individuals with a genotyped resource population. Evaluations of predictive accuracy for four traits in a Chinese Simmental beef cattle population showed that (1) MBLUP was far preferred to GBLUP (ratio = 1.0), (2) the prediction accuracy of wmssBLUP and mssBLUP improved on GBLUP by 4.18% and 3.37% on average, respectively, and (3) the accuracy of wmssBLUP increased with the proportion of transcribed cattle in the whole resource population. Conclusions: We concluded that including transcriptome data in GS has the potential to improve accuracy. Moreover, wmssBLUP is a promising alternative for the present situation, in which many individuals are genotyped but few are transcribed.
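The kernel combination M = ratio×G + (1−ratio)×T quoted in the abstract above can be illustrated with a minimal sketch. The function names `cosine_kernel` and `blend_kernels` and the toy data are assumptions made for illustration; the abstract does not say how the data were centered or scaled before the kernel was applied, so none is done here.

```python
import math

def cosine_kernel(rows):
    """Return the n x n cosine-similarity matrix between individuals (one row each).

    Assumes no row is all zeros (that would give a zero norm).
    """
    norms = [math.sqrt(sum(v * v for v in r)) for r in rows]
    n = len(rows)
    return [
        [sum(a * b for a, b in zip(rows[i], rows[j])) / (norms[i] * norms[j])
         for j in range(n)]
        for i in range(n)
    ]

def blend_kernels(G, T, ratio):
    """Element-wise M = ratio * G + (1 - ratio) * T."""
    return [
        [ratio * g + (1.0 - ratio) * t for g, t in zip(g_row, t_row)]
        for g_row, t_row in zip(G, T)
    ]

# Made-up toy data: 3 individuals, 4 SNPs (0/1/2 coding) and 3 expression values.
genotypes = [[0, 1, 2, 1], [1, 1, 0, 2], [2, 0, 1, 1]]
expression = [[5.1, 0.3, 2.2], [4.8, 0.9, 2.0], [1.2, 3.3, 0.7]]

G = cosine_kernel(genotypes)
T = cosine_kernel(expression)
M = blend_kernels(G, T, ratio=0.5)  # ratio=1.0 would reproduce G exactly
```

With ratio = 1.0 the blended matrix equals G, which matches the abstract's remark that GBLUP corresponds to ratio = 1.0.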
Background Pork quality can directly affect customer purchase tendency, and meat quality traits have become valuable in modern pork production. However, genetic improvement has been slow due to high phenotyping costs. In this study, whole genome sequence (WGS) data were used to evaluate the prediction accuracy of genomic best linear unbiased prediction (GBLUP) for meat quality in large-scale crossbred commercial pigs. Results We produced WGS data (18,695,907 SNPs and 2,106,902 INDELs passing quality control) from 1,469 sequenced Duroc×(Landrace×Yorkshire) pigs and developed a reference panel for genomic prediction of meat quality, including meat color score, marbling score, L* (lightness), a* (redness), and b* (yellowness). The prediction accuracy was defined as the Pearson correlation coefficient between adjusted phenotypes and genomic estimated breeding values in the validation population. Using different marker density panels derived from WGS data, accuracy differed substantially among meat quality traits, varying from 0.08 to 0.47. Results showed that MultiBLUP outperformed GBLUP and yielded accuracy increases ranging from 17.39% to 75%. We optimized the marker density and found that medium- and high-density marker panels are beneficial for the estimation of heritability for meat quality. Moreover, we conducted genotype imputation from the 50K chip to the WGS level in the same population and found that the average concordance rate exceeded 95%, with r^(2) = 0.81. Conclusions Overall, estimation of heritability for meat quality traits can benefit from the use of WGS data. This study showed the superiority of using WGS data in genomic prediction to genetically improve pork quality.
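The accuracy measure defined in the abstract above, a Pearson correlation between adjusted phenotypes and genomic estimated breeding values (GEBVs) in the validation set, can be sketched as follows. The function names and the toy numbers are assumptions for illustration only and are not taken from the study; `relative_gain` shows how a percentage improvement between two accuracies is usually computed.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def relative_gain(baseline_accuracy, new_accuracy):
    """Percentage improvement of new_accuracy over baseline_accuracy."""
    return 100.0 * (new_accuracy - baseline_accuracy) / baseline_accuracy

# Made-up validation-set values: adjusted phenotypes and their predicted GEBVs.
adjusted_phenotypes = [3.1, 2.4, 4.0, 3.6, 2.9]
gebvs = [0.2, -0.1, 0.5, 0.1, 0.0]
accuracy = pearson(adjusted_phenotypes, gebvs)
```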
Genomic prediction (GP) in plant breeding has the potential to predict and identify the best-performing hybrids based on the genotypes of their parental lines. In a GP experiment, 34 elite inbred lines were selected to make 285 single-cross hybrids in a partial-diallel cross design. These lines represented a mini-core collection of Chinese maize germplasm and comprised 18 inbred lines from the Stiff Stalk heterotic group and 16 inbred lines from the Non-Stiff Stalk heterotic group. The parents were genotyped by sequencing and the 285 hybrids were phenotyped for nine yield and yield-related traits at two locations in the summer sowing area (SUS) and three locations in the spring sowing area (SPS) in the main maize-producing regions of China. Multiple GP models were employed to assess the accuracy of trait prediction in the hybrids. By ten-fold cross-validation, the prediction accuracies of yield performance of the hybrids estimated by the genomic best linear unbiased prediction (GBLUP) model in SUS and SPS were 0.51 and 0.46, respectively. The prediction accuracies of the remaining yield-related traits estimated with GBLUP ranged from 0.49 to 0.86 and from 0.53 to 0.89 in SUS and SPS, respectively. When additive, dominance, and epistasis effects, genotype-by-environment interaction, and multi-trait effects were incorporated into the prediction model, the prediction accuracy of hybrid yield performance was improved. The ratio of training to testing population and the size of the training population optimal for yield prediction were determined. Multiple prediction models can improve prediction accuracy in hybrid breeding.
Background Carcass traits are crucial for broiler ducks, but they can only be measured postmortem. Genomic selection (GS) is an effective approach in animal breeding to improve selection and reduce costs. However, the performance of genomic prediction for duck carcass traits remains largely unknown. Results In this study, we estimated the genetic parameters, performed GS using different models and marker densities, and compared the estimation performance between GS and conventional BLUP on 35 carcass traits in an F2 population of ducks. Most of the cut weight traits and intestine length traits were estimated to have high and moderate heritabilities, respectively, while the heritabilities of percentage slaughter traits were dynamic. The reliability of genomic prediction using GBLUP increased by an average of 0.06 compared to the conventional BLUP method. The permutation studies revealed that 50K markers achieved ideal prediction reliability, while 3K markers still achieved 90.7% of the predictive capability, which would further reduce the cost for duck carcass traits. The genomic relationship matrix normalized by our true variance method, instead of the widely used 2pi(1-pi), achieved an increase in prediction reliability for most traits. We found that most of the Bayesian models performed better, especially BayesN. Compared to GBLUP, BayesN further improved the predictive reliability by an average of 0.06 for duck carcass traits. Conclusion This study demonstrates that genomic selection for duck carcass traits is promising. Genomic prediction can be further improved by modifying the genomic relationship matrix using our proposed true variance method and by several Bayesian models. The permutation study provides a theoretical basis for using low-density arrays to reduce genotyping costs in duck genomic selection.
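For orientation, the "widely used 2pi(1-pi)" normalization mentioned in the abstract above is the conventional scaling of the genomic relationship matrix, G = ZZ' / (2 Σ p_i(1 − p_i)), where Z holds genotypes (coded 0/1/2) centered by twice the allele frequency. The sketch below shows only this conventional form; the authors' "true variance" normalization is not described in the abstract and is not reproduced. The function name and toy genotypes are assumptions.

```python
def genomic_relationship_matrix(genotypes):
    """Conventional G matrix: Z Z' / (2 * sum_j p_j * (1 - p_j)).

    genotypes: list of individuals, each a list of 0/1/2 allele counts.
    Assumes at least one marker is polymorphic (otherwise the scale is zero).
    """
    n = len(genotypes)
    m = len(genotypes[0])
    # Allele frequency of the counted allele at each marker.
    p = [sum(row[j] for row in genotypes) / (2.0 * n) for j in range(m)]
    scale = 2.0 * sum(pj * (1.0 - pj) for pj in p)
    # Center each genotype by its expectation 2 * p_j.
    Z = [[row[j] - 2.0 * p[j] for j in range(m)] for row in genotypes]
    return [
        [sum(a * b for a, b in zip(Z[i], Z[k])) / scale for k in range(n)]
        for i in range(n)
    ]

# Made-up toy genotypes: 4 individuals x 3 markers.
toy_genotypes = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [1, 2, 1]]
G_toy = genomic_relationship_matrix(toy_genotypes)
```

Because each column of Z sums to zero when frequencies are estimated from the same sample, every row of the resulting matrix also sums to zero, which is a quick sanity check on an implementation.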
Fusarium ear rot (FER) is a destructive maize fungal disease worldwide. In this study, three tropical maize populations consisting of 874 inbred lines were used to perform genome-wide association study (GWAS) and genomic prediction (GP) analyses of FER resistance. Broad phenotypic variation and high heritability for FER were observed, although it was highly influenced by large genotype-by-environment interactions. In the 874 inbred lines, GWAS with a general linear model (GLM) identified 3034 single-nucleotide polymorphisms (SNPs) significantly associated with FER resistance at the P-value threshold of 1×10^(-5); the average phenotypic variation explained (PVE) by these associations was 3%, with a range from 2.33% to 6.92%, and 49 of these associations had PVE values greater than 5%. The GWAS analysis with a mixed linear model (MLM) identified 19 significantly associated SNPs at the P-value threshold of 1×10^(-4); the average PVE of these associations was 1.60%, with a range from 1.39% to 2.04%. Within each of the three populations, the number of significantly associated SNPs identified by GLM and MLM ranged from 25 to 41 and from 5 to 22, respectively. Overlapping SNP associations across populations were rare. A few stable genomic regions conferring FER resistance were identified, located in bins 3.04/05, 7.02/04, 9.00/01, 9.04, 9.06/07, and 10.03/04. The genomic regions in bins 9.00/01 and 9.04 are new. GP produced moderate accuracies with genome-wide markers, and relatively high accuracies with SNP associations detected from GWAS. Moderate prediction accuracies were observed when the training and validation sets were closely related. These results implied that FER resistance in maize is controlled by minor QTL with small effects and is highly influenced by the genetic background of the populations studied. Genomic selection (GS) incorporating SNP associations detected from GWAS is a promising tool for improving FER resistance in maize.
Background: Recently, machine learning (ML) has become attractive in genomic prediction, but its superiority in genomic prediction over conventional (ss)GBLUP methods and the choice of optimal ML methods need to be investigated. Results: In this study, 2566 Chinese Yorkshire pigs with reproduction trait records were genotyped with the GenoBaits Porcine SNP 50 K and PorcineSNP50 panels. Four ML methods, including support vector regression (SVR), kernel ridge regression (KRR), random forest (RF), and Adaboost.R2, were implemented. Through 20 replicates of fivefold cross-validation (CV) and one prediction for younger individuals, the utility of ML methods in genomic prediction was explored. In CV, ML methods significantly outperformed the conventional methods: genomic BLUP (GBLUP), single-step GBLUP (ssGBLUP), and the Bayesian method BayesHE. ML methods improved the genomic prediction accuracy of GBLUP, ssGBLUP, and BayesHE by 19.3%, 15.0%, and 20.8%, respectively. In addition, ML methods yielded smaller mean squared error (MSE) and mean absolute error (MAE) in all scenarios. ssGBLUP yielded an improvement of 3.8% on average in accuracy compared to that of GBLUP, and the accuracy of BayesHE was close to that of GBLUP. In genomic prediction of younger individuals, RF and Adaboost.R2_KRR performed better than GBLUP and BayesHE, while ssGBLUP performed comparably with RF. ssGBLUP yielded slightly higher accuracy and lower MSE than Adaboost.R2_KRR in the prediction of total number of piglets born, while for number of piglets born alive, Adaboost.R2_KRR performed significantly better than ssGBLUP. Among ML methods, Adaboost.R2_KRR consistently performed well in our study. Our findings also demonstrated that optimal hyperparameters are useful for ML methods. After tuning hyperparameters in CV and in predicting genomic outcomes of younger individuals, the average improvement was 14.3% and 21.8% over those using default hyperparameters, respectively. Conclusion: Our findings demonstrated that ML methods had better overall prediction performance than conventional genomic selection methods and could be new options for genomic prediction. Among ML methods, Adaboost.R2_KRR consistently performed well in our study, and tuning hyperparameters is necessary for ML methods. The optimal hyperparameters depend on the characteristics of the traits, datasets, etc.
Background: Genotyping by sequencing (GBS) still has problems with missing genotypes. Imputation is important for using GBS for genomic predictions, especially at low depths, due to the large number of missing genotypes. Minor allele frequency (MAF) is widely used as a marker data editing criterion for genomic predictions. In this study, three imputation methods (Beagle, IMPUTE2, and FImpute software) based on four MAF editing criteria were investigated with regard to the imputation accuracy of missing genotypes and the accuracy of genomic predictions, based on simulated data of a livestock population. Results: Four MAF criteria (no MAF limit, MAF ≥ 0.001, MAF ≥ 0.01, and MAF ≥ 0.03) were used for editing marker data before imputation. Beagle, IMPUTE2, and FImpute software were applied to impute the original GBS. Additionally, IMPUTE2 also imputed the expected genotype dosage after genotype correction (GcIM). The reliability of genomic predictions was calculated using GBS and imputed GBS data. The results showed that imputation accuracies were the same for the three imputation methods, except for the data with sequencing read depth (depth) = 2, where FImpute had a slightly lower imputation accuracy than Beagle and IMPUTE2. GcIM was observed to be the best for all of the imputations at depth = 4, 5, and 10, but the worst for depth = 2. For genomic prediction, retaining more SNPs with no MAF limit resulted in higher reliability. As the depth increased to 10, the prediction reliabilities approached those using true genotypes in the GBS loci. Beagle and IMPUTE2 had the largest increases in prediction reliability of 5 percentage points, and FImpute gained 3 percentage points at depth = 2. The best prediction was observed at depth = 4, 5, and 10 using GcIM, but the worst prediction was also observed using GcIM at depth = 2. Conclusions: The current study showed that imputation accuracies were relatively low for GBS with low depths and high for GBS with high depths. Imputation resulted in larger gains in the reliability of genomic predictions for GBS with lower depths. These results suggest applying IMPUTE2 based on corrected GBS (GcIM) to improve genomic predictions at higher depths, and that FImpute software could be a good alternative for routine imputation.
Background: Genotyping by sequencing (GBS) is a robust method to genotype markers. Many factors can influence the genotyping quality. One is that heterozygous genotypes could be wrongly genotyped as homozygotes, depending on the genotyping depth. In this study, a method correcting this type of genotyping error was demonstrated. The efficiency of this correction method and its effect on genomic prediction were assessed using simulated data of livestock populations. Results: Chip array (Chip) data and four depths of GBS data were simulated. After quality control (call rate ≥ 0.8 and MAF ≥ 0.01), the remaining numbers of Chip and GBS SNPs were both approximately 7,000, averaged over 10 replicates. GBS genotypes were corrected with the proposed method. The reliability of genomic prediction was calculated using GBS, corrected GBS (GBSc), true genotypes for the GBS loci (GBSr), and Chip data. The results showed that GBSc had higher rates of correct genotype calls and higher correlations with true genotypes than GBS. For genomic prediction, using Chip data resulted in the highest reliability. As the depth increased to 10, the prediction reliabilities using GBS and GBSc data approached those using true GBS data. The reliabilities of genomic prediction using GBSc data were 0.604, 0.672, 0.684, and 0.704 after genomic correction, with improvements of 0.013, 0.009, 0.006, and 0.001 at depth = 2, 4, 5, and 10, respectively. Conclusions: The current study showed that a correction method for GBS data increased the genotype accuracies and, consequently, improved genomic predictions. These results suggest that a correction of GBS genotypes is necessary, especially for GBS data with low depths.
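The dependence of heterozygote miscalls on depth, noted in the abstract above, can be made concrete with a small sketch. It assumes a simplified model that is not taken from the paper: exactly `depth` reads cover the locus, each read samples either allele with probability 0.5, and there is no sequencing error. Under those assumptions a true heterozygote looks homozygous whenever all reads carry the same allele, which happens with probability 2 × 0.5^depth. The authors' correction method itself is not described in the abstract and is not reproduced here.

```python
def het_miscall_probability(depth):
    """Probability that a true heterozygote shows only one allele in `depth` reads.

    Simplified model (an assumption): equal allele sampling, no sequencing error.
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    return 0.5 ** (depth - 1)  # equals 2 * 0.5 ** depth

# The four depths compared in the abstract.
for depth in (2, 4, 5, 10):
    print(depth, het_miscall_probability(depth))
```

Under this model, half of all heterozygotes would be miscalled at depth 2 but fewer than 0.2% at depth 10, which is consistent with the abstract's observation that correction matters most at low depths.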
Many rice-growing areas are affected by high concentrations of arsenic (As). Rice varieties that prevent As uptake and/or accumulation can mitigate As threats to human health. Genomic selection is known to facilitate rapid selection of superior genotypes for complex traits. We explored the predictive ability (PA) of genomic prediction with single-environment models, accounting or not for trait-specific markers, multi-environment models, and multi-trait and multi-environment models, using the genotypic (1600K SNPs) and phenotypic (grain As content, grain yield, and days to flowering) data of the Bengal and Assam Aus Panel. Under the baseline single-environment model, PA of up to 0.707 and 0.654 was obtained for grain yield and grain As content, respectively; the three prediction methods (Bayesian Lasso, genomic best linear unbiased prediction, and reproducing kernel Hilbert spaces) were considered to perform similarly, and marker selection based on linkage disequilibrium allowed the number of SNPs to be reduced to 17K without a negative effect on the PA of genomic predictions. Single-environment models giving distinct weight to trait-specific markers in the genomic relationship matrix outperformed the baseline models by up to 32%. Multi-environment models, accounting for genotype×environment interactions, and multi-trait and multi-environment models outperformed the baseline models by up to 47% and 61%, respectively. Among the multi-trait and multi-environment models, the Bayesian multi-output regressor stacking function obtained the highest predictive ability (0.831 for grain As) with much higher efficiency in computing time. These findings pave the way for breeding for As tolerance in the progenies of biparental crosses involving members of the Bengal and Assam Aus Panel. Genomic prediction can also be applied to breeding for other complex traits under multiple environments.
Background: Presently, multi-omics data (e.g., genomics, transcriptomics, proteomics, and metabolomics) are available to improve genomic predictors. Omics data not only offer new data layers for genomic prediction but also provide a bridge between organismal phenotypes and genome variation that cannot be readily captured at the genome sequence level. Therefore, using multi-omics data to select feature markers is a feasible strategy to improve the accuracy of genomic prediction. In this study, simultaneously using whole-genome sequencing (WGS) and gene expression level data, four strategies for single-nucleotide polymorphism (SNP) preselection were investigated for genomic predictions in the Drosophila Genetic Reference Panel. Results: Using genomic best linear unbiased prediction (GBLUP) with complete WGS data, the prediction accuracies were 0.208±0.020 (0.181±0.022) for the startle response and 0.272±0.017 (0.307±0.015) for starvation resistance in the female (male) lines. Compared with GBLUP using complete WGS data, neither GBLUP nor the genomic feature BLUP (GFBLUP) improved the prediction accuracy using SNPs preselected from complete WGS data based on the results of genome-wide association studies (GWASs) or transcriptome-wide association studies (TWASs). Furthermore, by using SNPs preselected from the WGS data based on the results of the expression quantitative trait locus (eQTL) mapping of all genes, only the startle response had greater accuracy than GBLUP with the complete WGS data. The best accuracy values in the female and male lines were 0.243±0.020 and 0.220±0.022, respectively. Importantly, by using SNPs preselected based on the results of the eQTL mapping of significant genes from TWAS, both GBLUP and GFBLUP resulted in great accuracy and small bias of genomic prediction. Compared with GBLUP using complete WGS data, the best accuracy values represented increases of 60.66% and 39.09% for starvation resistance and 27.40% and 35.36% for the startle response in the female and male lines, respectively. Conclusions: Overall, multi-omics data can assist genomic feature preselection and improve the performance of genomic prediction. The new knowledge gained from this study will enrich the use of multi-omics in genomic prediction.
Genomic selection has been demonstrated as a powerful technology to revolutionize animal breeding. However, marker density and minor allele frequency can affect the predictive ability of genomic estimated breeding values (GEBVs). To investigate the impact of marker density and minor allele frequency on predictive ability, we estimated GEBVs by constructing different subsets of single nucleotide polymorphisms (SNPs) based on varying marker densities and minor allele frequency (MAF) for average daily gain (ADG), live weight (LW), and carcass weight (CW) in 1,059 Chinese Simmental beef cattle. Two strategies were proposed for SNP selection to construct different marker densities: 1) select evenly spaced SNPs (Strategy 1), and 2) select SNPs with large effects estimated from BayesB (Strategy 2). Predictive ability was assessed in terms of the correlation between predicted genomic values and corrected phenotypes from 10-fold cross-validation. Predictive abilities for ADG, LW, and CW using autosomal SNPs were 0.13±0.002, 0.21±0.003, and 0.25±0.003, respectively. In our study, the predictive ability increased dramatically as more SNPs were included in the analysis, up to 200K, for Strategy 1. Under Strategy 2, we found that the predictive ability increased slightly when marker densities increased from 5K to 20K, which indicated that the predictive ability of 20K (3% of 770K) SNPs with large effects was equal to that of using all SNPs. For different MAF bins, we obtained the highest predictive ability for the three traits with MAF bin 0.01-0.1. Our results suggest that designing a low-density chip by selecting low-frequency markers with large SNP effect sizes should be helpful for commercial application in Chinese Simmental cattle.
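The marker-subsetting steps described in the abstract above (MAF bins and evenly spaced SNPs) can be sketched in a few lines. All function names and toy data are assumptions; in particular, "evenly spaced" is interpreted here as evenly spaced by marker order, whereas the study may have spaced markers by physical position, which the abstract does not specify.

```python
def minor_allele_frequency(calls):
    """MAF of one SNP from 0/1/2 allele counts across individuals."""
    p = sum(calls) / (2.0 * len(calls))
    return min(p, 1.0 - p)

def snps_in_maf_bin(genotypes_by_snp, low, high):
    """Indices of SNPs whose MAF falls in the half-open bin [low, high)."""
    return [
        idx for idx, calls in enumerate(genotypes_by_snp)
        if low <= minor_allele_frequency(calls) < high
    ]

def evenly_spaced_indices(n_markers, n_selected):
    """Pick n_selected marker indices spread evenly over the marker order."""
    return [i * n_markers // n_selected for i in range(n_selected)]

# Made-up toy data: 4 SNPs genotyped in 4 animals.
toy_snps = [
    [0, 0, 1, 2],  # MAF 0.375
    [2, 2, 2, 1],  # MAF 0.125
    [0, 0, 0, 0],  # monomorphic, MAF 0.0
    [1, 1, 1, 1],  # MAF 0.5
]
low_frequency = snps_in_maf_bin(toy_snps, 0.01, 0.2)
subset = evenly_spaced_indices(10, 5)
```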
Germplasm conserved in gene banks is underutilized, owing mainly to the cost of characterization. Genomic prediction can be applied to predict the genetic merit of germplasm. Germplasm utilization could be greatly accelerated if prediction accuracy were sufficiently high with a training population of practical size. Large-scale resequencing projects in rice have generated high-quality genome-wide variation information for many diverse accessions, making it possible to investigate the potential of genomic prediction in rice germplasm management and exploitation. We phenotyped six traits in nearly 2000 indica (XI) and japonica (GJ) accessions from the Rice 3K project and investigated different scenarios for forming training populations. A composite core training set was considered at two levels, targeting prediction of subpopulations within subspecies or prediction across subspecies. Composite training sets incorporating 400 or 200 accessions from either subpopulation of XI or GJ showed satisfactory prediction accuracy. A composite training set of 600 XI and GJ accessions showed sufficiently high prediction accuracy for both XI and GJ subspecies. Comparable or even higher prediction accuracy was observed for the composite training set than for the corresponding homogeneous training sets comprising accessions only of specific subpopulations of XI or GJ (within-subspecies level) or pure XI or GJ accessions (across-subspecies level) that were included in the composite training set. Validation using an independent population of 281 rice cultivars supported the predictive ability of the composite training set. Reliability, which reflects the robustness of a training set, was markedly higher for the composite training set than for the corresponding homogeneous training sets. A core training set formed from diverse accessions could accurately predict the genetic merit of rice germplasm.
Computational efficiency has become a key issue in genomic prediction (GP) owing to the massive historical datasets accumulated. We developed a new super-fast GP approach (SHEAPY) combining randomized Haseman-Elston regression (RHE-reg) with a modified Algorithm for Proven and Young (APY) in an additive-effect model, using the former to estimate heritability and then the latter to invert a large genomic relationship matrix for best linear prediction. In simulation results with varied sizes of training population, GBLUP, HEAPY|A, and SHEAPY showed similar predictive performance when the size of a core population was half that of a large training population and the heritability was a fixed value, and the computational speed of SHEAPY was faster than that of GBLUP and HEAPY|A. In simulation results with varied heritability, SHEAPY showed better predictive ability than GBLUP in all cases and than HEAPY|A in most cases when the size of a core population was 4/5 that of a small training population and the training population size was a fixed value. As a proof of concept, SHEAPY was applied to the analysis of two real datasets. In an Arabidopsis thaliana F2 population, the predictive performance of SHEAPY was similar to or better than that of GBLUP and HEAPY|A in most cases when the size of a core population (200) was 2/3 of that of a small training population (300). In a sorghum multiparental population, SHEAPY showed higher predictive accuracy than HEAPY|A for all three traits, and than GBLUP for two traits. SHEAPY may become the GP method of choice for large-scale genomic data.
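As background for the heritability step named in the abstract above, the following is a minimal sketch of plain (non-randomized) Haseman-Elston regression, not the RHE-reg or SHEAPY procedure itself. It assumes phenotypes standardized to mean 0 and variance 1, so that the expected product y_i·y_j for two different individuals is h²·G_ij, and it estimates h² as the slope of a through-origin regression of those products on the off-diagonal relationships. The function name and toy values are hypothetical.

```python
def haseman_elston_h2(y, G):
    """Through-origin slope of y_i * y_j on G_ij over all pairs i < j.

    y: standardized phenotypes (an assumption: mean 0, variance 1).
    G: genomic relationship matrix as a list of lists.
    """
    numerator = 0.0
    denominator = 0.0
    n = len(y)
    for i in range(n):
        for j in range(i + 1, n):
            numerator += G[i][j] * y[i] * y[j]
            denominator += G[i][j] ** 2
    return numerator / denominator

# Toy check built so that y_i * y_j equals 0.5 * G_ij for every pair,
# which makes the estimated slope 0.5 by construction.
y_toy = [1.0, -1.0, 0.5]
G_he = [[1.0 if i == j else 2.0 * y_toy[i] * y_toy[j] for j in range(3)]
        for i in range(3)]
h2_estimate = haseman_elston_h2(y_toy, G_he)  # 0.5
```

The double loop visits every pair, so its cost grows with the square of the number of individuals; avoiding that cost on large datasets is the motivation the abstract gives for a randomized variant.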
Genome-wide association mapping studies (GWAS) based on Big Data are a potential approach to improve marker-assisted selection in plant breeding. The number of available phenotypic and genomic data sets in which medium-sized populations of several hundred individuals have been studied is rapidly increasing. Combining these data and using them in GWAS could increase both the power of QTL discovery and the accuracy of estimation of underlying genetic effects, but is hindered by data heterogeneity and lack of interoperability. In this study, we used genomic and phenotypic data sets, focusing on Central European winter wheat populations evaluated for heading date. We explored strategies for integrating these data and subsequently the resulting potential for GWAS. Establishing interoperability between data sets was greatly aided by some overlapping genotypes and a linear relationship between the different phenotyping protocols, resulting in high-quality integrated phenotypic data. In this context, genomic prediction proved to be a suitable tool to study the relevance of interactions between genotypes and experimental series, which was low in our case. Contrary to expectations, fewer associations between markers and traits were found in the larger combined data than in the individual experimental series. However, the predictive power based on the marker-trait associations of the integrated data set was higher across data sets. Therefore, the results show that the integration of medium-sized to Big Data is an approach to increase the power to detect QTL in GWAS. The results encourage further efforts to standardize and share data in the plant breeding community.
Deep learning (DL) plays a critical role in processing and converting data into knowledge and decisions. DL technologies have been applied in a variety of applications, including image, video, and genome sequence analysis. In deep learning, the most widely used architecture is the convolutional neural network (CNN), which learns discriminative features in a supervised setting. In comparison to other classic neural networks, a CNN makes use of a limited number of artificial neurons and is therefore well suited to the recognition and processing of wheat gene sequences. Wheat is an essential cereal crop for people around the world. Identification of wheat genotypes affects the potential agricultural development of many countries. In quantitative genetics, prediction of genetic values is a central issue. Wheat is an allohexaploid (AABBDD) with three distinct genomes. The wheat genome is quite large compared with those of many other species, and given the diversity of genetic knowledge and structure available for wheat breeding lines, genome sequence approaches based on artificial intelligence (AI) techniques are necessary. This paper focuses on how using the wheat genome sequence can assist wheat producers in making better use of their genetic resources and managing genetic variation in their breeding programs, and it proposes a novel deep-learning-based model while offering a fundamental overview of genomic prediction theory and current constraints. In this paper, the hyperparameters of the CNN are optimized to decrease the need for manual search and enhance network performance, using a newly proposed model built on an optimization algorithm and convolutional neural networks (CNN).
In genomic selection, prediction accuracy is highly driven by the number of animals in the reference population (RP). Combining related populations from different countries and regions, or using a related population with a large RP, has been considered a viable strategy in cattle breeding. The genetic relationship between related populations is important for improving genomic predictive ability. In this study, we used 122 French bulls as test individuals. The genomic estimated breeding values (GEBVs) evaluated using the French RP, the American RP and the Chinese RP were compared. The results showed that the GEBVs were in higher concordance when using the French RP and the American RP than when using the Chinese RP. Persistence analysis, kinship analysis and principal component analysis (PCA) were performed for 270 French bulls, 270 American bulls and 270 Chinese bulls to interpret the results. All the analyses illustrated that the genetic relationship between French bulls and American bulls was closer than that between French bulls and Chinese bulls. Another reason could be that the RP in China was smaller than the other two RPs. In conclusion, using the RP of a related population to predict GEBVs of animals in a target population is feasible when the two populations have a close genetic relationship and the related population is large.
Wheat is a staple food for more than 35% of the world's population, with wheat flour used to make hundreds of baked goods. Superior end-use quality is a major breeding target; however, improving it is especially time-consuming and expensive. Furthermore, genes encoding seed-storage proteins (SSPs) form multigene families and are repetitive, with gaps commonplace in several genome assemblies. To overcome these barriers and efficiently identify superior wheat SSP alleles, we developed "PanSK" (Pan-SSP k-mer) for genotype-to-phenotype prediction based on an SSP-based pangenome resource. PanSK uses 29-mer sequences that represent each SSP gene at the pangenomic level to reveal untapped diversity across landraces and modern cultivars. Genome-wide association studies with k-mers identified 23 SSP genes associated with end-use quality that represent novel targets for improvement. We evaluated the effect of rye secalin genes on end-use quality and found that removal of ω-secalins from 1BL/1RS wheat translocation lines is associated with enhanced end-use quality. Finally, using machine-learning-based prediction inspired by PanSK, we predicted the quality phenotypes with high accuracy from genotypes alone. This study provides an effective approach for genome design based on SSP genes, enabling the breeding of wheat varieties with superior processing capabilities and improved end-use quality.
Genomic selection, the application of genomic prediction (GP) models to select candidate individuals, has significantly advanced in the past two decades, effectively accelerating genetic gains in plant breeding. This article provides a holistic overview of key factors that have influenced GP in plant breeding during this period. We delved into the pivotal roles of training population size and genetic diversity, and their relationship with the breeding population, in determining GP accuracy. Special emphasis was placed on optimizing training population size. We explored its benefits and the associated diminishing returns beyond an optimum size. This was done while considering the balance between resource allocation and maximizing prediction accuracy through current optimization algorithms. The density and distribution of single-nucleotide polymorphisms, level of linkage disequilibrium, genetic complexity, trait heritability, statistical machine-learning methods, and non-additive effects are the other vital factors. Using wheat, maize, and potato as examples, we summarize the effect of these factors on the accuracy of GP for various traits. The search for high accuracy in GP (theoretically reaching one when using Pearson's correlation as a metric) is an active research area as yet far from optimal for various traits. We hypothesize that with ultra-high sizes of genotypic and phenotypic datasets, effective training population optimization methods and support from other omics approaches (transcriptomics, metabolomics and proteomics) coupled with deep-learning algorithms could overcome the boundaries of current limitations to achieve the highest possible prediction accuracy, making genomic selection an effective tool in plant breeding.
Funding: funded by the CGIAR Research Program (CRP) on MAIZE; the USAID through the Accelerating Genetic Gains Supplemental Project (Amend. No. 9 MTO 069033); and the One CGIAR Initiative on Accelerated Breeding; funding from the governments of Australia, Belgium, Canada, China, France, India, Japan, the Republic of Korea, Mexico, the Netherlands, New Zealand, Norway, Sweden, Switzerland, the United Kingdom, the United States, and the World Bank; supported by the China Scholarship Council.
Abstract: Maize stalk rot reduces grain yield and quality. Information about the genetics of resistance to maize stalk rot could help breeders design effective breeding strategies for the trait. Genomic prediction may be a more effective breeding strategy for stalk-rot resistance than marker-assisted selection. We performed a genome-wide association study (GWAS) and genomic prediction of resistance in testcross hybrids of 677 inbred lines from the Tuxpeño and non-Tuxpeño heterotic pools grown in three environments and genotyped with 200,681 single-nucleotide polymorphisms (SNPs). Eighteen SNPs associated with stalk rot shared genomic regions with gene families previously associated with plant biotic and abiotic responses. More favorable SNP haplotypes traced to tropical than to temperate progenitors of the inbred lines. Incorporating genotype-by-environment (G×E) interaction increased genomic prediction accuracy.
Funding: funded by the "Genetic improvement of pig survival" project from the Danish Pig Levy Foundation (Aarhus, Denmark); the China Scholarship Council (CSC) provided a scholarship to the first author.
Abstract: Background: Survival from birth to slaughter is an important economic trait in commercial pig production. Increasing survival can improve both economic efficiency and animal welfare. The aim of this study is to explore the impact of genotyping strategies and statistical models on the accuracy of genomic prediction for survival in pigs during the total growing period from birth to slaughter. Results: We simulated pig populations with different direct and maternal heritabilities and used a linear mixed model, a logit model, and a probit model to predict genomic breeding values of pig survival based on individual survival records with binary outcomes (0, 1). The results show that when only live animals have genotype data, unbiased genomic predictions can be achieved by using variances estimated from a pedigree-based model. Models using genomic information achieved up to 59.2% higher accuracy of estimated breeding values than the pedigree-based model, depending on the genotyping scenario. The scenario of genotyping all individuals, both dead and alive, obtained the highest accuracy. When an equal number of individuals (80%) were genotyped, genotyping a random sample of individuals achieved higher accuracy than genotyping only live individuals. The linear model, logit model and probit model achieved similar accuracy. Conclusions: Genomic prediction of pig survival is feasible when only live pigs have genotypes, but genomic information on dead individuals can increase the accuracy of genomic prediction by 2.06% to 6.04%.
Funding: funds from the National Natural Science Foundation of China (32172693); the Program of National Beef Cattle and Yak Industrial Technology System (CARS-37).
Abstract: Background: Genomic selection (GS) has revolutionized animal and plant breeding since its first implementation by enabling early selection before phenotypes are measured. Besides the genome, transcriptome and metabolome information are increasingly considered new sources for GS. Difficulties in building models with multi-omics data for GS and the limited availability of specimens have both delayed progress in investigating multi-omics. Results: We utilized the cosine kernel to map genomic and transcriptomic data to n×n symmetric matrices (the G matrix and the T matrix), combined with best linear unbiased prediction (BLUP) for GS. We defined five kernel-based prediction models: genomic BLUP (GBLUP), transcriptome BLUP (TBLUP), multi-omics BLUP (MBLUP, M = ratio×G + (1-ratio)×T), multi-omics single-step BLUP (mssBLUP), and weighted multi-omics single-step BLUP (wmssBLUP) to integrate transcribed individuals and the genotyped resource population. The predictive accuracy evaluations for four traits in the Chinese Simmental beef cattle population showed that (1) MBLUP was far preferred to GBLUP (ratio = 1.0), (2) the prediction accuracy of wmssBLUP and mssBLUP showed average improvements of 4.18% and 3.37% over GBLUP, and (3) the accuracy of wmssBLUP increased with the growing proportion of transcribed cattle in the whole resource population. Conclusions: We concluded that the inclusion of transcriptome data in GS had the potential to improve accuracy. Moreover, wmssBLUP is a promising alternative for the present situation, in which many individuals are genotyped but fewer are transcribed.
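The kernel combination in the abstract above can be illustrated with a minimal sketch. It assumes the cosine kernel is the plain cosine similarity between the marker (or expression) vectors of two animals; the toy data, the function names `cosine_kernel` and `combine_kernels`, and the value `ratio=0.5` are illustrative assumptions and are not taken from the study.

```python
import math

def cosine_kernel(rows):
    """Return the n x n matrix of cosine similarities between the rows.

    Assumes no row is all zeros (the norm would be zero otherwise).
    """
    norms = [math.sqrt(sum(v * v for v in r)) for r in rows]
    n = len(rows)
    return [[sum(a * b for a, b in zip(rows[i], rows[j])) / (norms[i] * norms[j])
             for j in range(n)]
            for i in range(n)]

def combine_kernels(G, T, ratio):
    """Element-wise M = ratio * G + (1 - ratio) * T, as in the MBLUP definition."""
    return [[ratio * g + (1.0 - ratio) * t for g, t in zip(g_row, t_row)]
            for g_row, t_row in zip(G, T)]

# Hypothetical toy data: 3 animals, 4 SNPs coded 0/1/2 and 3 expression levels.
genotypes = [[0, 1, 2, 1], [2, 1, 0, 1], [0, 1, 2, 2]]
expression = [[5.0, 1.0, 0.5], [4.0, 2.0, 0.1], [5.5, 0.9, 0.6]]

G = cosine_kernel(genotypes)    # genomic kernel
T = cosine_kernel(expression)   # transcriptomic kernel
M = combine_kernels(G, T, ratio=0.5)
```

With `ratio=1.0` the combined kernel reduces to G, which matches the abstract's note that GBLUP corresponds to ratio = 1.0.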
Funding: supported by a Technical Innovation of Crossbred in Swine and Breed High Fertility Lines Project (2022B0202090002); a Local Innovative and Research Teams Project of Guangdong Province (2019BT02N630); a Natural Science Foundation of Guangdong Province project (2018B030313011); Innovative Teams of Modern Agriculture and Industry Technology System of Guangdong Province (2022KJ26).
Abstract: Background: Pork quality can directly affect customers' purchasing decisions, and meat quality traits have become valuable in modern pork production. However, genetic improvement has been slow due to high phenotyping costs. In this study, whole genome sequence (WGS) data were used to evaluate the prediction accuracy of genomic best linear unbiased prediction (GBLUP) for meat quality in large-scale crossbred commercial pigs. Results: We produced WGS data (18,695,907 SNPs and 2,106,902 INDELs passing quality control) from 1,469 sequenced Duroc×(Landrace×Yorkshire) pigs and developed a reference panel for genomic prediction of meat quality, including meat color score, marbling score, L* (lightness), a* (redness), and b* (yellowness). Prediction accuracy was defined as the Pearson correlation coefficient between adjusted phenotypes and genomic estimated breeding values in the validation population. Using marker panels of different densities derived from the WGS data, accuracy differed substantially among meat quality traits, varying from 0.08 to 0.47. MultiBLUP outperformed GBLUP and yielded accuracy increases ranging from 17.39% to 75%. We optimized the marker density and found that medium- and high-density marker panels are beneficial for the estimation of heritability for meat quality. Moreover, we conducted genotype imputation from 50K chip to WGS level in the same population and found that the average concordance rate exceeded 95% with r^2 = 0.81. Conclusions: Overall, estimation of heritability for meat quality traits can benefit from the use of WGS data. This study showed the superiority of using WGS data in genomic prediction to genetically improve pork quality.
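The accuracy metric defined in the abstract above (Pearson correlation between adjusted phenotypes and genomic estimated breeding values in the validation population) can be sketched as follows. The numbers are made up for illustration and are not values from the study; `pearson` and `relative_gain` are hypothetical helper names.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def relative_gain(baseline, improved):
    """Percentage increase of one accuracy over another."""
    return (improved - baseline) / baseline * 100.0

# Made-up validation data: adjusted phenotypes and GEBVs for five animals.
adjusted_phenotypes = [3.1, 2.4, 4.0, 3.6, 2.9]
gebv = [0.2, -0.3, 0.5, 0.1, 0.0]

accuracy = pearson(adjusted_phenotypes, gebv)
gain = relative_gain(0.20, 0.25)  # made-up accuracies: a rise from 0.20 to 0.25 is a 25% gain
```

Percentage improvements such as those reported between models are relative to the baseline accuracy, so the same absolute gain looks larger for traits with low baseline accuracy.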
Funding: the National Natural Science Foundation of China (32272049, 32261143757); Sustainable Development International Cooperation Program from Bill & Melinda Gates Foundation (2022YFAG1002); the National Key Research and Development Program of China (2020YFE0202300); the Agricultural Science & Technology Innovation Program (CAASZDRW202109); the China Scholarship Council.
Abstract: Genomic prediction (GP) in plant breeding has the potential to predict and identify the best-performing hybrids based on the genotypes of their parental lines. In a GP experiment, 34 elite inbred lines were selected to make 285 single-cross hybrids in a partial-diallel cross design. These lines represented a mini-core collection of Chinese maize germplasm and comprised 18 inbred lines from the Stiff Stalk heterotic group and 16 inbred lines from the Non-Stiff Stalk heterotic group. The parents were genotyped by sequencing and the 285 hybrids were phenotyped for nine yield and yield-related traits at two locations in the summer sowing area (SUS) and three locations in the spring sowing area (SPS) in the main maize-producing regions of China. Multiple GP models were employed to assess the accuracy of trait prediction in the hybrids. By ten-fold cross-validation, the prediction accuracies of yield performance of the hybrids estimated by the genomic best linear unbiased prediction (GBLUP) model in SUS and SPS were 0.51 and 0.46, respectively. The prediction accuracies of the remaining yield-related traits estimated with GBLUP ranged from 0.49 to 0.86 and from 0.53 to 0.89 in SUS and SPS, respectively. When additive, dominance, epistasis effects, genotype-by-environment interaction, and multi-trait effects were incorporated into the prediction model, the prediction accuracy of hybrid yield performance was improved. The ratio of training to testing population and size of training population optimal for yield prediction were determined. Multiple prediction models can improve prediction accuracy in hybrid breeding.
Funding: supported by grants from the Key Technologies Research on New Breed of Broiler Poultry by Integration of Breeding, Reproduction and Promotion (2021CXGC010805-02); Taishan Industry Leadership Talent Project of Shandong province in China (TSCY20190108); China Agriculture Research System of MOF and MARA (CARS-42); the Science and Technology Innovation Project of the Chinese Academy of Agricultural Sciences (CXGC-IAS-09).
Abstract: Background: Carcass traits are crucial for broiler ducks, but they can only be measured postmortem. Genomic selection (GS) is an effective approach in animal breeding to improve selection and reduce costs. However, the performance of genomic prediction for duck carcass traits remains largely unknown. Results: In this study, we estimated the genetic parameters, performed GS using different models and marker densities, and compared the estimation performance between GS and conventional BLUP on 35 carcass traits in an F2 population of ducks. Most of the cut weight traits and intestine length traits were estimated to have high and moderate heritabilities, respectively, while the heritabilities of percentage slaughter traits were dynamic. The reliability of genomic prediction using GBLUP increased by an average of 0.06 compared to the conventional BLUP method. The permutation studies revealed that 50K markers achieved ideal prediction reliability, while 3K markers still achieved 90.7% of the predictive capability, which would further reduce the cost for duck carcass traits. The genomic relationship matrix normalized by our true variance method, instead of the widely used 2p_i(1-p_i), could achieve an increase in prediction reliability in most traits. Most of the Bayesian models performed better, especially BayesN. Compared to GBLUP, BayesN can further improve the predictive reliability by an average of 0.06 for duck carcass traits. Conclusion: This study demonstrates that genomic selection for duck carcass traits is promising. Genomic prediction can be further improved by modifying the genomic relationship matrix using our proposed true variance method and by several Bayesian models. The permutation study provides a theoretical basis for using low-density arrays to reduce genotyping costs in duck genomic selection.
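For context, the "widely used 2p_i(1-p_i)" normalization mentioned above can be sketched as follows. The sketch assumes the standard construction in which genotypes coded 0/1/2 are centred by twice the allele frequency and the cross-products are divided by the sum of 2p_i(1-p_i) over markers. The authors' true variance method is not described in the abstract and is not attempted here; the function name `genomic_relationship` and the toy genotypes are illustrative.

```python
def genomic_relationship(genotypes):
    """Genomic relationship matrix scaled by 2 * sum(p_i * (1 - p_i)).

    genotypes: one list per individual with allele counts 0/1/2 per marker.
    Assumes at least one marker is polymorphic (otherwise the scaling is zero).
    """
    n = len(genotypes)
    m = len(genotypes[0])
    # Frequency of the counted allele at each marker.
    p = [sum(ind[k] for ind in genotypes) / (2.0 * n) for k in range(m)]
    # Centre each genotype by its expected value 2 * p_k.
    Z = [[ind[k] - 2.0 * p[k] for k in range(m)] for ind in genotypes]
    scale = 2.0 * sum(pk * (1.0 - pk) for pk in p)
    return [[sum(Z[i][k] * Z[j][k] for k in range(m)) / scale for j in range(n)]
            for i in range(n)]

# Hypothetical toy genotypes: 4 individuals, 3 markers.
toy = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [1, 2, 1]]
G = genomic_relationship(toy)
```

Because every marker column is centred, each row of the resulting matrix sums to zero, which is a quick sanity check on an implementation.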
Funding: The authors gratefully acknowledge the financial support from the MasAgro project funded by Mexico's Secretary of Agriculture and Rural Development (SADER), the Genomic Open-source Breeding Informatics Initiative (GOBII) (grant number OPP1093167) supported by the Bill & Melinda Gates Foundation, and the CGIAR Research Program (CRP) on maize (MAIZE). MAIZE receives W1&W2 support from the Governments of Australia, Belgium, Canada, China, France, India, Japan, the Republic of Korea, Mexico, Netherlands, New Zealand, Norway, Sweden, Switzerland, the United Kingdom, USA, and the World Bank. The authors also thank the National Natural Science Foundation of China (grant number 31801442), the CIMMYT–China Specialty Maize Research Center Project funded by the Shanghai Municipal Finance Bureau, and the China Scholarship Council.
Abstract: Fusarium ear rot (FER) is a destructive maize fungal disease worldwide. In this study, three tropical maize populations consisting of 874 inbred lines were used to perform genome-wide association study (GWAS) and genomic prediction (GP) analyses of FER resistance. Broad phenotypic variation and high heritability for FER were observed, although it was highly influenced by large genotype-by-environment interactions. In the 874 inbred lines, GWAS with a general linear model (GLM) identified 3034 single-nucleotide polymorphisms (SNPs) significantly associated with FER resistance at the P-value threshold of 1×10^(-5); the average phenotypic variation explained (PVE) by these associations was 3%, with a range from 2.33% to 6.92%, and 49 of these associations had PVE values greater than 5%. The GWAS analysis with a mixed linear model (MLM) identified 19 significantly associated SNPs at the P-value threshold of 1×10^(-4); the average PVE of these associations was 1.60%, with a range from 1.39% to 2.04%. Within each of the three populations, the number of significantly associated SNPs identified by GLM and MLM ranged from 25 to 41 and from 5 to 22, respectively. Overlapping SNP associations across populations were rare. A few stable genomic regions conferring FER resistance were identified, located in bins 3.04/05, 7.02/04, 9.00/01, 9.04, 9.06/07, and 10.03/04. The genomic regions in bins 9.00/01 and 9.04 are new. GP produced moderate accuracies with genome-wide markers, and relatively high accuracies with SNP associations detected from GWAS. Moderate prediction accuracies were observed when the training and validation sets were closely related. These results implied that FER resistance in maize is controlled by minor QTL with small effects and is highly influenced by the genetic background of the populations studied. Genomic selection (GS) incorporating SNP associations detected from GWAS is a promising tool for improving FER resistance in maize.
Funding: supported by grants from the National Key Research and Development Project (2019YFE0106800); Modern Agriculture Science and Technology Key Project of Hebei Province (19226376D); China Agriculture Research System of MOF and MARA.
Abstract: Background: Recently, machine learning (ML) has become attractive in genomic prediction, but its superiority over conventional (ss)GBLUP methods and the choice of optimal ML methods need to be investigated. Results: In this study, 2566 Chinese Yorkshire pigs with reproduction trait records were genotyped with the GenoBaits Porcine SNP 50K and PorcineSNP50 panels. Four ML methods, including support vector regression (SVR), kernel ridge regression (KRR), random forest (RF) and Adaboost.R2, were implemented. Through 20 replicates of fivefold cross-validation (CV) and one prediction for younger individuals, the utility of ML methods in genomic prediction was explored. In CV, ML methods significantly outperformed the conventional methods genomic BLUP (GBLUP), single-step GBLUP (ssGBLUP) and the Bayesian method BayesHE, improving on their genomic prediction accuracy by 19.3%, 15.0% and 20.8%, respectively. In addition, ML methods yielded smaller mean squared error (MSE) and mean absolute error (MAE) in all scenarios. ssGBLUP yielded an improvement of 3.8% on average in accuracy compared to that of GBLUP, and the accuracy of BayesHE was close to that of GBLUP. In genomic prediction of younger individuals, RF and Adaboost.R2_KRR performed better than GBLUP and BayesHE, while ssGBLUP performed comparably with RF. ssGBLUP yielded slightly higher accuracy and lower MSE than Adaboost.R2_KRR in the prediction of total number of piglets born, while for number of piglets born alive, Adaboost.R2_KRR performed significantly better than ssGBLUP. Among ML methods, Adaboost.R2_KRR consistently performed well in our study. Our findings also demonstrated that optimal hyperparameters are useful for ML methods. After tuning hyperparameters in CV and in predicting genomic outcomes of younger individuals, the average improvement was 14.3% and 21.8% over those using default hyperparameters, respectively. Conclusion: Our findings demonstrated that ML methods had better overall prediction performance than conventional genomic selection methods and could be new options for genomic prediction. Among ML methods, Adaboost.R2_KRR consistently performed well in our study, and tuning hyperparameters is necessary for ML methods. The optimal hyperparameters depend on the character of the traits, the datasets, etc.
Funding: This study was funded by the Genomic Selection in Animals and Plants (GenSAP) research project financed by the Danish Council of Strategic Research (Aarhus, Denmark). Xiao Wang received Ph.D. stipends from the Technical University of Denmark (DTU Bioinformatics and DTU Compute), Denmark, and the China Scholarship Council, China.
Abstract: Background: Genotyping by sequencing (GBS) still has problems with missing genotypes. Imputation is important for using GBS for genomic predictions, especially at low depths, due to the large number of missing genotypes. Minor allele frequency (MAF) is widely used as a marker data editing criterion for genomic predictions. In this study, three imputation methods (Beagle, IMPUTE2 and FImpute software) based on four MAF editing criteria were investigated with regard to imputation accuracy of missing genotypes and accuracy of genomic predictions, based on simulated data of a livestock population. Results: Four MAF criteria (no MAF limit, MAF≥0.001, MAF≥0.01 and MAF≥0.03) were used for editing marker data before imputation. Beagle, IMPUTE2 and FImpute software were applied to impute the original GBS. Additionally, IMPUTE2 also imputed the expected genotype dosage after genotype correction (GcIM). The reliability of genomic predictions was calculated using GBS and imputed GBS data. The results showed that imputation accuracies were the same for the three imputation methods, except for the data of sequencing read depth (depth) = 2, where FImpute had a slightly lower imputation accuracy than Beagle and IMPUTE2. GcIM was observed to be the best for all of the imputations at depth = 4, 5 and 10, but the worst for depth = 2. For genomic prediction, retaining more SNPs with no MAF limit resulted in higher reliability. As the depth increased to 10, the prediction reliabilities approached those using true genotypes in the GBS loci. Beagle and IMPUTE2 had the largest increases in prediction reliability of 5 percentage points, and FImpute gained 3 percentage points at depth = 2. The best prediction was observed at depth = 4, 5 and 10 using GcIM, but the worst prediction was also observed using GcIM at depth = 2. Conclusions: The current study showed that imputation accuracies were relatively low for GBS with low depths and high for GBS with high depths. Imputation resulted in larger gains in the reliability of genomic predictions for GBS with lower depths. These results suggest applying IMPUTE2 based on corrected GBS (GcIM) to improve genomic predictions at higher depths, and FImpute software could be a good alternative for routine imputation.
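The MAF editing step described above can be sketched as a simple filter. The sketch assumes genotypes are coded as 0/1/2 allele counts with `None` for missing calls and treats "no MAF limit" as a threshold of 0; the marker names, toy data and helper names are hypothetical, and only the four thresholds come from the abstract.

```python
def minor_allele_frequency(column):
    """MAF of one marker; column holds 0/1/2 allele counts, None if missing."""
    called = [g for g in column if g is not None]
    if not called:
        return 0.0
    freq = sum(called) / (2.0 * len(called))
    return min(freq, 1.0 - freq)

def filter_by_maf(markers, threshold):
    """Keep the markers whose MAF is at least the threshold."""
    return {name: col for name, col in markers.items()
            if minor_allele_frequency(col) >= threshold}

# Hypothetical toy markers.
markers = {
    "snp_a": [0, 1, 2, None, 1],   # MAF 0.5 among the four called genotypes
    "snp_b": [0, 0, 0, 0, 0],      # monomorphic, MAF 0
    "snp_c": [1] + [0] * 24,       # one heterozygote in 25 animals, MAF 0.02
}

# The four editing criteria named in the abstract; 0.0 stands for "no MAF limit".
kept = {t: sorted(filter_by_maf(markers, t)) for t in (0.0, 0.001, 0.01, 0.03)}
```

Stricter thresholds discard rare variants such as `snp_c`, which is the trade-off the abstract reports: retaining more SNPs with no MAF limit gave higher prediction reliability in that simulation.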
Funding: supported by the Genomic Selection in Animals and Plants (GenSAP) research project financed by the Danish Council of Strategic Research (Aarhus, Denmark); the scholarship provided by the China Scholarship Council (CSC).
Abstract: Background: Genotyping by sequencing (GBS) is a robust method to genotype markers. Many factors can influence the genotyping quality. One is that heterozygous genotypes could be wrongly genotyped as homozygotes, depending on the genotyping depth. In this study, a method correcting this type of genotyping error was demonstrated. The efficiency of this correction method and its effect on genomic prediction were assessed using simulated data of livestock populations. Results: Chip array (Chip) data and GBS data at four depths were simulated. After quality control (call rate ≥ 0.8 and MAF ≥ 0.01), the remaining numbers of Chip and GBS SNPs were both approximately 7,000, averaged over 10 replicates. GBS genotypes were corrected with the proposed method. The reliability of genomic prediction was calculated using GBS, corrected GBS (GBSc), true genotypes for the GBS loci (GBSr) and Chip data. The results showed that GBSc had higher rates of correct genotype calls and higher correlations with true genotypes than GBS. For genomic prediction, using Chip data resulted in the highest reliability. As the depth increased to 10, the prediction reliabilities using GBS and GBSc data approached those using true GBS data. The reliabilities of genomic prediction using GBSc data were 0.604, 0.672, 0.684 and 0.704 after genomic correction, with improvements of 0.013, 0.009, 0.006 and 0.001 at depth = 2, 4, 5 and 10, respectively. Conclusions: The current study showed that a correction method for GBS data increased the genotype accuracies and, consequently, improved genomic predictions. These results suggest that a correction of GBS genotypes is necessary, especially for GBS data with low depths.
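Why the miscalling of heterozygotes depends on depth can be seen from a simple probability sketch. It assumes each read at a heterozygous locus samples either allele with probability 1/2 and that there are no sequencing errors, so a heterozygote looks homozygous whenever all reads carry the same allele. This is a textbook simplification used for illustration, not the correction method proposed in the study.

```python
def het_miscall_probability(depth):
    """Probability that all reads at a heterozygous locus show the same allele.

    Under the equal-sampling assumption this is 2 * (1/2) ** depth,
    which simplifies to (1/2) ** (depth - 1).
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    return 0.5 ** (depth - 1)

# The four depths considered in the abstract.
for depth in (2, 4, 5, 10):
    print(depth, het_miscall_probability(depth))
```

Under these assumptions half of the heterozygotes are miscalled at depth 2 but fewer than 0.2% at depth 10, which is consistent with the abstract's finding that correction matters most at low depths.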
Abstract: Many rice-growing areas are affected by high concentrations of arsenic (As). Rice varieties that prevent As uptake and/or accumulation can mitigate As threats to human health. Genomic selection is known to facilitate rapid selection of superior genotypes for complex traits. We explored the predictive ability (PA) of genomic prediction with single-environment models, accounting or not for trait-specific markers, multi-environment models, and multi-trait and multi-environment models, using the genotypic (1600K SNPs) and phenotypic (grain As content, grain yield and days to flowering) data of the Bengal and Assam Aus Panel. Under the baseline single-environment model, PA of up to 0.707 and 0.654 was obtained for grain yield and grain As content, respectively; the three prediction methods (Bayesian Lasso, genomic best linear unbiased prediction and reproducing kernel Hilbert spaces) performed similarly, and marker selection based on linkage disequilibrium allowed the number of SNPs to be reduced to 17K without a negative effect on the PA of genomic predictions. Single-environment models giving distinct weight to trait-specific markers in the genomic relationship matrix outperformed the baseline models by up to 32%. Multi-environment models, accounting for genotype×environment interactions, and multi-trait and multi-environment models outperformed the baseline models by up to 47% and 61%, respectively. Among the multi-trait and multi-environment models, the Bayesian multi-output regressor stacking function obtained the highest predictive ability (0.831 for grain As) with much higher efficiency in computing time. These findings pave the way for breeding for As tolerance in the progenies of biparental crosses involving members of the Bengal and Assam Aus Panel. Genomic prediction can also be applied to breeding for other complex traits under multiple environments.
Funding: supported by the National Natural Science Foundation of China (31772556); the Local Innovative and Research Teams Project of Guangdong Province (2019BT02N630); the grants from the earmarked fund for China Agriculture Research System (CARS-35); the Science and Technology Innovation Strategy projects of Guangdong Province (Grant No. 2018B020203002).
Abstract: Background: Presently, multi-omics data (e.g., genomics, transcriptomics, proteomics, and metabolomics) are available to improve genomic predictors. Omics data not only offer new data layers for genomic prediction but also provide a bridge between organismal phenotypes and genome variation that cannot be readily captured at the genome sequence level. Therefore, using multi-omics data to select feature markers is a feasible strategy to improve the accuracy of genomic prediction. In this study, simultaneously using whole-genome sequencing (WGS) and gene expression level data, four strategies for single-nucleotide polymorphism (SNP) preselection were investigated for genomic predictions in the Drosophila Genetic Reference Panel. Results: Using genomic best linear unbiased prediction (GBLUP) with complete WGS data, the prediction accuracies were 0.208±0.020 (0.181±0.022) for the startle response and 0.272±0.017 (0.307±0.015) for starvation resistance in the female (male) lines. Compared with GBLUP using complete WGS data, both GBLUP and the genomic feature BLUP (GFBLUP) did not improve the prediction accuracy using SNPs preselected from complete WGS data based on the results of genome-wide association studies (GWASs) or transcriptome-wide association studies (TWASs). Furthermore, by using SNPs preselected from the WGS data based on the results of the expression quantitative trait locus (eQTL) mapping of all genes, only the startle response had greater accuracy than GBLUP with the complete WGS data. The best accuracy values in the female and male lines were 0.243±0.020 and 0.220±0.022, respectively. Importantly, by using SNPs preselected based on the results of the eQTL mapping of significant genes from TWAS, both GBLUP and GFBLUP resulted in great accuracy and small bias of genomic prediction. Compared with the GBLUP using complete WGS data, the best accuracy values represented increases of 60.66% and 39.09% for the starvation resistance and 27.40% and 35.36% for startle response in the female and male lines, respectively. Conclusions: Overall, multi-omics data can assist genomic feature preselection and improve the performance of genomic prediction. The new knowledge gained from this study will enrich the use of multi-omics in genomic prediction.
Funding: supported by the National Natural Science Foundation of China (31201782, 31672384 and 31372294); the Agricultural Science and Technology Innovation Program of Chinese Academy of Agricultural Sciences (ASTIPIAS03); the Cattle Breeding Innovative Research Team of Chinese Academy of Agricultural Sciences (cxgc-ias-03); the Key Technology R&D Program of China during the 12th Five-Year Plan period (2011BAD28B04); the National High Technology Research and Development Program of China (863 Program 2013AA102505-4); the Beijing Natural Science Foundation, China (6154032).
Abstract: Genomic selection has been demonstrated as a powerful technology to revolutionize animal breeding. However, marker density and minor allele frequency can affect the predictive ability of genomic estimated breeding values (GEBVs). To investigate the impact of marker density and minor allele frequency on predictive ability, we estimated GEBVs by constructing different subsets of single nucleotide polymorphisms (SNPs) based on varying marker densities and minor allele frequency (MAF) for average daily gain (ADG), live weight (LW) and carcass weight (CW) in 1,059 Chinese Simmental beef cattle. Two strategies were proposed for SNP selection to construct different marker densities: 1) select evenly spaced SNPs (Strategy 1), and 2) select SNPs with large effects estimated from BayesB (Strategy 2). Furthermore, predictive ability was assessed in terms of the correlation between predicted genomic values and corrected phenotypes from 10-fold cross-validation. Predictive abilities for ADG, LW and CW using autosomal SNPs were 0.13±0.002, 0.21±0.003 and 0.25±0.003, respectively. In our study, the predictive ability increased dramatically as more SNPs were included in the analysis until 200K for Strategy 1. Under Strategy 2, we found the predictive ability slightly increased when marker densities increased from 5K to 20K, which indicated that the predictive ability of 20K (3% of 770K) SNPs with large effects was equal to the predictive ability of using all SNPs. For different MAF bins, we obtained the highest predictive ability for the three traits with MAF bin 0.01-0.1. Our results suggest that designing a low-density chip by selecting low-frequency markers with large SNP effect sizes should be helpful for commercial application in Chinese Simmental cattle.
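The two SNP selection strategies above can be sketched on marker indices. The sketch assumes markers are already ordered along the genome and that per-SNP effect estimates (for example from a BayesB run) are available as a list; the function names and the toy effects are hypothetical, and spacing is by index rather than by physical position.

```python
def evenly_spaced(n_markers, k):
    """Strategy 1 sketch: pick k marker indices spread evenly over n_markers."""
    if not 0 < k <= n_markers:
        raise ValueError("k must be between 1 and n_markers")
    step = n_markers / k
    return [int(i * step) for i in range(k)]

def largest_effects(effects, k):
    """Strategy 2 sketch: indices of the k markers with the largest |effect|."""
    ranked = sorted(range(len(effects)), key=lambda i: abs(effects[i]), reverse=True)
    return sorted(ranked[:k])

# Hypothetical effect estimates for five markers.
toy_effects = [0.1, -0.9, 0.3, 0.05, 0.7]

subset_even = evenly_spaced(10, 5)
subset_large = largest_effects(toy_effects, 2)
```

Selecting by effect size uses phenotype information, so in a cross-validation setting the effects would normally be estimated within each training fold to avoid leaking validation data into the marker choice.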
Funding: Funded by the National Key Research and Development Program of China (2020YFE0202300) and the International Postdoctoral Exchange Fellowship Program (Talent-Introduction Program) in 2020.
Abstract: Germplasm conserved in gene banks is underutilized, owing mainly to the cost of characterization. Genomic prediction can be applied to predict the genetic merit of germplasm. Germplasm utilization could be greatly accelerated if prediction accuracy were sufficiently high with a training population of practical size. Large-scale resequencing projects in rice have generated high-quality genome-wide variation information for many diverse accessions, making it possible to investigate the potential of genomic prediction in rice germplasm management and exploitation. We phenotyped six traits in nearly 2000 indica (XI) and japonica (GJ) accessions from the Rice 3K project and investigated different scenarios for forming training populations. A composite core training set was considered at two levels: prediction of subpopulations within subspecies and prediction across subspecies. Composite training sets incorporating 400 or 200 accessions from either subpopulation of XI or GJ showed satisfactory prediction accuracy. A composite training set of 600 XI and GJ accessions showed sufficiently high prediction accuracy for both XI and GJ subspecies. Prediction accuracy for the composite training set was comparable to or even higher than that for the corresponding homogeneous training sets, which comprised only accessions of specific subpopulations of XI or GJ (within-subspecies level) or only pure XI or GJ accessions (across-subspecies level) that were included in the composite training set. Validation using an independent population of 281 rice cultivars supported the predictive ability of the composite training set. Reliability, which reflects the robustness of a training set, was markedly higher for the composite training set than for the corresponding homogeneous training sets. A core training set formed from diverse accessions could accurately predict the genetic merit of rice germplasm.
Funding: Supported by the National Natural Science Foundation of China (31771392) to Guo-Bo Chen and the Zhejiang Provincial People’s Hospital Research Startup (ZRY2018A004) to Guo-Bo Chen.
Abstract: Computational efficiency has become a key issue in genomic prediction (GP) owing to the massive historical datasets accumulated. We developed a new super-fast GP approach (SHEAPY) that combines randomized Haseman-Elston regression (RHE-reg) with a modified Algorithm for Proven and Young (APY) in an additive-effect model, using the former to estimate heritability and the latter to invert a large genomic relationship matrix for best linear prediction. In simulations with varied training population sizes, GBLUP, HEAPY|A and SHEAPY showed similar predictive performance when the core population was half the size of a large training population and the heritability was fixed, and SHEAPY was faster than GBLUP and HEAPY|A. In simulations with varied heritability, SHEAPY showed better predictive ability than GBLUP in all cases and than HEAPY|A in most cases when the core population was 4/5 the size of a small training population and the training population size was fixed. As a proof of concept, SHEAPY was applied to two real datasets. In an Arabidopsis thaliana F2 population, the predictive performance of SHEAPY was similar to or better than that of GBLUP and HEAPY|A in most cases when the core population (200) was 2/3 the size of a small training population (300). In a sorghum multiparental population, SHEAPY showed higher predictive accuracy than HEAPY|A for all three traits, and than GBLUP for two traits. SHEAPY may become the GP method of choice for large-scale genomic data.
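To make the heritability step concrete, the sketch below shows one common form of randomized Haseman-Elston regression: a method-of-moments estimator in which the trace of the squared genomic relationship matrix is approximated with random probe vectors, so the matrix itself is never formed. This is an illustration under assumptions, not the SHEAPY implementation; the exact RHE-reg variant used by the authors may differ, the APY inversion step is not shown, and the function name, probe count and simulated data are hypothetical.

```python
import numpy as np

def rhe_heritability(Z, y, n_probes=50, seed=0):
    """Moment-based heritability estimate.

    Z: n x m genotype matrix with standardized columns, so K = Z Z' / m.
    y: phenotype vector of length n.
    """
    n, m = Z.shape
    y = (y - y.mean()) / y.std()
    rng = np.random.default_rng(seed)
    probes = rng.standard_normal((n, n_probes))
    # K @ probes computed as Z (Z' probes) / m, without building the n x n matrix.
    K_probes = Z @ (Z.T @ probes) / m
    # Hutchinson-style estimate: tr(K^2) = E[ ||K z||^2 ] for z ~ N(0, I).
    tr_K2 = np.sum(K_probes ** 2) / n_probes
    tr_K = np.sum(Z ** 2) / m          # exact trace of K
    yKy = np.sum((Z.T @ y) ** 2) / m   # y' K y
    yy = y @ y
    # Normal equations of the regression of y y' on K and I.
    lhs = np.array([[tr_K2, tr_K], [tr_K, float(n)]])
    rhs = np.array([yKy, yy])
    sigma_g2, sigma_e2 = np.linalg.solve(lhs, rhs)
    return float(sigma_g2 / (sigma_g2 + sigma_e2))

# Simulated check (assumption): additive trait with true heritability 0.5.
rng = np.random.default_rng(42)
n, m, h2_true = 1000, 500, 0.5
freqs = rng.uniform(0.1, 0.9, m)
G = rng.binomial(2, freqs, size=(n, m)).astype(float)
Z = (G - G.mean(axis=0)) / G.std(axis=0)
beta = rng.normal(0.0, np.sqrt(h2_true / m), m)
y = Z @ beta + rng.normal(0.0, np.sqrt(1.0 - h2_true), n)

h2_hat = rhe_heritability(Z, y)
print(f"estimated heritability: {h2_hat:.2f}")
```

The cost is dominated by products of `Z` with the probe block, which scale with the number of animals times the number of markers times the number of probes, rather than with the square of the number of animals.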
Funding: Funded within the Wheat BigData Project (German Federal Ministry of Food and Agriculture, FKZ2818408B18).
Abstract: Genome-wide association mapping studies (GWAS) based on Big Data are a potential approach to improve marker-assisted selection in plant breeding. The number of available phenotypic and genomic data sets in which medium-sized populations of several hundred individuals have been studied is rapidly increasing. Combining these data and using them in GWAS could increase both the power of QTL discovery and the accuracy of estimation of the underlying genetic effects, but is hindered by data heterogeneity and lack of interoperability. In this study, we used genomic and phenotypic data sets from Central European winter wheat populations evaluated for heading date. We explored strategies for integrating these data and the resulting potential for GWAS. Establishing interoperability between data sets was greatly aided by some overlapping genotypes and a linear relationship between the different phenotyping protocols, resulting in high-quality integrated phenotypic data. In this context, genomic prediction proved to be a suitable tool for studying the relevance of interactions between genotypes and experimental series, which was low in our case. Contrary to expectations, fewer marker-trait associations were found in the larger combined data than in the individual experimental series. However, the predictive power based on the marker-trait associations of the integrated data set was higher across data sets. The results therefore show that integrating medium-sized data sets into Big Data is an approach to increase the power to detect QTL in GWAS. The results encourage further efforts to standardize and share data in the plant breeding community.
Funding: This research was supported by a Korea Institute for Advancement of Technology (KIAT) grant funded by the Korea Government (MOTIE) (P0012724, The Competency Development Program for Industry Specialist), a National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. RS-2023-00218176), and the Soonchunhyang University Research Fund.
Abstract: Deep learning (DL) plays a critical role in processing and converting data into knowledge and decisions. DL technologies have been applied in a variety of applications, including image, video, and genome sequence analysis. The most widely used deep learning architecture is the convolutional neural network (CNN), which learns discriminative features in a supervised setting. Compared with other classic neural networks, a CNN uses a limited number of artificial neurons, so it is well suited to the recognition and processing of wheat gene sequences. Wheat is an essential cereal crop for people around the world. Wheat genotype identification affects the potential development of the agricultural sector in many countries. Prediction of genetic values is a central issue in quantitative genetics. Wheat is an allohexaploid (AABBDD) with three distinct genomes. The wheat genome is large compared with those of many other species, and diverse genetic knowledge is available for wheat breeding lines; genome sequence approaches based on artificial intelligence (AI) techniques are therefore necessary. This paper focuses on using the wheat genome sequence to help wheat producers make better use of their genetic resources and manage genetic variation in their breeding programs. It also proposes a novel deep-learning model and offers a fundamental overview of genomic prediction theory and current constraints. The CNN hyperparameters are optimized with a newly proposed model that combines an optimization algorithm with the CNN, reducing the need for manual search and enhancing network performance.
Funding: Supported by the earmarked fund for China Agriculture Research System (CARS-36); the National Natural Science Foundation of China (31671327, 31701077, 31371258); the Program for Changjiang Scholar and Innovation Research Team in University (Grant No. IRT1191); the Anhui Science and Technology Key Project (17030701008); and the Anhui Academy of Agricultural Sciences Key Laboratory Project (18S0404).
Abstract: In genomic selection, prediction accuracy is strongly driven by the number of animals in the reference population (RP). Combining related populations from different countries and regions, or using a related population with a large RP, has been considered a viable strategy in cattle breeding. The genetic relationship between related populations is important for improving genomic predictive ability. In this study, we used 122 French bulls as test individuals. The genomic estimated breeding values (GEBVs) evaluated using the French RP, the American RP and the Chinese RP were compared. The results showed that GEBVs were in higher concordance when using the French and American RPs than when using the Chinese RP. Persistence analysis, kinship analysis and principal component analysis (PCA) were performed for 270 French bulls, 270 American bulls and 270 Chinese bulls to interpret the results. All the analyses indicated that French bulls were genetically closer to American bulls than to Chinese bulls. Another reason could be that the RP in China was smaller than the other two RPs. In conclusion, using the RP of a related population to predict GEBVs of animals in a target population is feasible when the two populations have a close genetic relationship and the related population is large.
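As an illustration of how PCA can expose the relative closeness of populations, the sketch below computes principal component scores from a centered genotype matrix and compares population centroids. It is a toy example under assumptions, not the study's analysis: the three populations are simulated (two with small allele-frequency drift from a common base, one with larger drift), and the function name and all parameters are hypothetical.

```python
import numpy as np

def genotype_pcs(G, n_pcs=2):
    """Principal component scores of a column-centered genotype matrix via SVD."""
    Gc = G - G.mean(axis=0)
    U, S, _ = np.linalg.svd(Gc, full_matrices=False)
    return U[:, :n_pcs] * S[:n_pcs]

# Simulated populations (assumption): A and B drift little from a shared base
# set of allele frequencies, C drifts more.
rng = np.random.default_rng(0)
n_per_pop, m = 270, 1000
base_freq = rng.uniform(0.2, 0.8, m)

def sample_population(drift_sd):
    """Draw 0/1/2 genotypes after perturbing the base allele frequencies."""
    p = np.clip(base_freq + rng.normal(0.0, drift_sd, m), 0.05, 0.95)
    return rng.binomial(2, p, size=(n_per_pop, m)).astype(float)

G = np.vstack([sample_population(0.02),   # population A
               sample_population(0.02),   # population B
               sample_population(0.15)])  # population C

pcs = genotype_pcs(G)
# Rows are stacked by population, so reshape groups them as (pop, animal, pc).
centroids = pcs.reshape(3, n_per_pop, -1).mean(axis=1)
dist_ab = float(np.linalg.norm(centroids[0] - centroids[1]))
dist_ac = float(np.linalg.norm(centroids[0] - centroids[2]))
print(f"centroid distance A-B: {dist_ab:.2f}, A-C: {dist_ac:.2f}")
```

In this simulation the A-B centroid distance is smaller than the A-C distance, mirroring the kind of pattern the abstract reports for French, American and Chinese bulls.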
Funding: STI 2030-Major Projects (2023ZD04069); the National Natural Science Foundation of China (grant no. 32125030); the Pinduoduo-China Agricultural University Research Fund (PC2023A01003); and the Major Program of the National Agricultural Science and Technology of China (NK20220601).
Abstract: Wheat is a staple food for more than 35% of the world's population, with wheat flour used to make hundreds of baked goods. Superior end-use quality is a major breeding target; however, improving it is especially time-consuming and expensive. Furthermore, genes encoding seed-storage proteins (SSPs) form multigene families and are repetitive, with gaps commonplace in several genome assemblies. To overcome these barriers and efficiently identify superior wheat SSP alleles, we developed "PanSK" (Pan-SSP k-mer) for genotype-to-phenotype prediction based on an SSP-based pangenome resource. PanSK uses 29-mer sequences that represent each SSP gene at the pangenomic level to reveal untapped diversity across landraces and modern cultivars. Genome-wide association studies with k-mers identified 23 SSP genes associated with end-use quality that represent novel targets for improvement. We evaluated the effect of rye secalin genes on end-use quality and found that removal of ω-secalins from 1BL/1RS wheat translocation lines is associated with enhanced end-use quality. Finally, using machine-learning-based prediction inspired by PanSK, we predicted the quality phenotypes with high accuracy from genotypes alone. This study provides an effective approach for genome design based on SSP genes, enabling the breeding of wheat varieties with superior processing capabilities and improved end-use quality.
Funding: Supported by SLU Grogrund (#SLU-LTV.2020.1.1.1-654) and an Einar and Inga Nilsson Foundation grant. J.I.y.S. was supported by grant PID2021-123718OB-I00, funded by MCIN/AEI/10.13039/501100011033 and by “ERDF A way of making Europe,” and CEX2020-000999-S. R.R.V. was supported by Novo Nordisk Fonden (0074727) and SLU’s Centre for Biological Control. In addition, J.I.y.S. and J.F.-G. were supported by the Beatriz Galindo Program BEAGAL 18/00115.
Abstract: Genomic selection, the application of genomic prediction (GP) models to select candidate individuals, has advanced significantly in the past two decades, effectively accelerating genetic gains in plant breeding. This article provides a holistic overview of key factors that have influenced GP in plant breeding during this period. We delved into the pivotal roles of training population size and genetic diversity, and their relationship with the breeding population, in determining GP accuracy. Special emphasis was placed on optimizing training population size. We explored its benefits and the diminishing returns beyond an optimum size, while considering the balance between resource allocation and maximizing prediction accuracy through current optimization algorithms. The density and distribution of single-nucleotide polymorphisms, level of linkage disequilibrium, genetic complexity, trait heritability, statistical machine-learning methods, and non-additive effects are the other vital factors. Using wheat, maize, and potato as examples, we summarize the effect of these factors on the accuracy of GP for various traits. The search for high accuracy in GP (theoretically reaching one when Pearson’s correlation is used as the metric) is an active research area, and accuracy is as yet far from optimal for various traits. We hypothesize that very large genotypic and phenotypic datasets, effective training population optimization methods, and support from other omics approaches (transcriptomics, metabolomics and proteomics), coupled with deep-learning algorithms, could overcome current limitations and achieve the highest possible prediction accuracy, making genomic selection an effective tool in plant breeding.