Abstract: Population aging has become a major challenge for healthcare in China. More than 23 million Chinese are currently aged ≥ 80 years, with an annual increase of 5%. The Chinese population aged 80 years or older is expected to reach 30.67 million by 2020 and 74 million by 2040.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 32070557 and 32270673) and the Huazhong Agricultural University Scientific & Technological Self-innovation Foundation, China (Grant No. 2014RC020).
Abstract: Multilocus genome-wide association studies have become the state-of-the-art tool for dissecting the genetic architecture of complex and multiomic traits. However, most existing multilocus methods require relatively long computational time when analyzing large datasets. To address this issue, in this study we proposed a fast mrMLM method, namely the best linear unbiased prediction multilocus random-SNP-effect mixed linear model (BLUPmrMLM). First, genome-wide single-marker scanning in mrMLM was replaced in BLUPmrMLM by vectorized Wald tests based on the best linear unbiased prediction (BLUP) values of marker effects and their variances. Then, adaptive best subset selection (ABESS) was used to identify potentially associated markers on each chromosome, reducing the computational time needed to estimate marker effects via empirical Bayes. Finally, shared memory and parallel computing schemes were used to reduce the computational time further. In simulation studies, BLUPmrMLM outperformed GEMMA, EMMAX, mrMLM, and FarmCPU, as well as the control method (BLUPmrMLM with ABESS removed), in terms of computational time, power, accuracy in estimating quantitative trait nucleotide positions and effects, false positive rate, false discovery rate, false negative rate, and F1 score. In the reanalysis of two large rice datasets, BLUPmrMLM significantly reduced the computational time and identified more previously reported genes than the aforementioned methods. This study provides an excellent multilocus model method for the analysis of large-scale and multiomic datasets. The software mrMLM v5.1 is available at BioCode (https://ngdc.cncb.ac.cn/biocode/tool/BT007388) or GitHub (https://github.com/YuanmingZhang65/mrMLM).
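The Wald test mentioned in the abstract above can be illustrated with a minimal sketch. It assumes that each marker already has an effect estimate (for example a BLUP value) and a variance for that estimate, and that the statistic is referred to a chi-square distribution with one degree of freedom. The function name `wald_pvalues` and the plain-Python loop are illustrative choices, not the BLUPmrMLM implementation, which the abstract describes as vectorized.

```python
import math


def wald_pvalues(effects, variances):
    """Return a (statistic, p_value) pair for each marker.

    effects   -- estimated marker effects (hypothetical input, e.g. BLUP values)
    variances -- variances of those estimates, in the same order
    """
    results = []
    for b, v in zip(effects, variances):
        if v <= 0:
            raise ValueError("each variance must be positive")
        w = b * b / v  # Wald statistic: squared estimate over its variance
        # Upper tail of a chi-square(1) variable: P(Z^2 > w) = erfc(sqrt(w / 2))
        p = math.erfc(math.sqrt(w / 2.0))
        results.append((w, p))
    return results
```

A marker whose estimate is 1.96 standard errors from zero gets a p-value close to 0.05, which is a quick sanity check on the tail formula.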
Funding: Supported by the National Key Research and Development Program of China (2016YFD0100404), the National Basic Research Program of China (2014CB138200), the National Natural Science Foundation of China (91735305, 1571268), the Fundamental Research Funds of the Central Non-profit Scientific Institution (Y2018LM04), the Xinjiang Key R&D Program (2018B01006-3), and the Huazhong Agricultural University Scientific & Technological Self-innovation Foundation (2662016PY096014RC020). This research was also partly supported by the open funds of the National Key Laboratory of Crop Genetic Improvement.
Abstract: Deciphering the genetic mechanisms underlying agronomic traits is of great importance for crop improvement. Most of these traits are controlled by multiple quantitative trait loci (QTLs), and identifying the underlying genes by conventional QTL fine-mapping is time-consuming and labor-intensive. Here, we devised a new method, named quantitative trait gene sequencing (QTG-seq), to accelerate QTL fine-mapping. QTG-seq combines QTL partitioning to convert a quantitative trait into a near-qualitative trait, sequencing of bulked segregant pools from a large segregating population, and a robust new algorithm for identifying candidate genes. Using QTG-seq, we fine-mapped a plant-height QTL in maize (Zea mays L.), qPH7, to a 300-kb genomic interval and verified that a gene encoding an NF-YC transcription factor was the functional gene. Functional analysis suggested that the qPH7-encoded protein might influence plant height by interacting with a CO-like protein and an AP2 domain-containing protein. Selection footprint analysis indicated that qPH7 was subject to strong selection during maize improvement. In summary, QTG-seq provides an efficient method for QTL fine-mapping in the era of “big data”.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 31871242, U1602261, 31701071, 21873034, and 31571268), the Huazhong Agricultural University Scientific & Technological Self-innovation Foundation, China (Grant No. 2014RC020), and the State Key Laboratory of Cotton Biology Open Fund, China (Grant No. CB2019B01).
Abstract: Previous studies have reported that some important loci are missed in single-locus genome-wide association studies (GWAS), especially because of the large phenotypic error in field experiments. To solve this issue, multi-locus GWAS methods have been recommended. However, only a few software packages for multi-locus GWAS are available. Therefore, we developed an R software package named mrMLM v4.0.2. This software integrates the mrMLM, FASTmrMLM, FASTmrEMMA, pLARmEB, pKWmEB, and ISIS EM-BLASSO methods developed by our lab. There are four components in mrMLM v4.0.2: dataset input, parameter setting, software running, and result output. The fread function in data.table is used to read datasets quickly, especially big datasets, and the doParallel package is used to conduct parallel computation on multiple CPUs. In addition, the graphical user interface software mrMLM.GUI v4.0.2, built upon Shiny, is also available. To confirm the correctness of the aforementioned programs, all the methods in mrMLM v4.0.2 and three widely used methods were used to analyze real and simulated datasets. The results confirm the superior performance of mrMLM v4.0.2 over the other methods currently available. False positive rates are effectively controlled, albeit with a less stringent significance threshold. mrMLM v4.0.2 is publicly available at BioCode (https://bigd.big.ac.cn/biocode/tools/BT007077) or R (https://cran.r-project.org/web/packages/mrMLM.GUI/index.html) as open-source software.
Funding: Supported by the National Natural Science Foundation of China (32070557 and 31871242), the Fundamental Research Funds for the Central Universities (2662020ZKPY017), the Huazhong Agricultural University Scientific & Technological Self-Innovation Foundation (2014RC020), and the State Key Laboratory of Cotton Biology Open Fund (CB2021B01).
Abstract: Although genome-wide association studies are widely used to mine genes for quantitative traits, the effects to be estimated are confounded, and the methodologies for detecting interactions are imperfect. To address these issues, the mixed model proposed here first estimates the genotypic effects for AA, Aa, and aa, and the genotypic polygenic background replaces the additive and dominance polygenic backgrounds. Then, the estimated genotypic effects are partitioned into additive and dominance effects using a one-way analysis of variance model. This strategy was further expanded to cover QTN-by-environment interactions (QEIs) and QTN-by-QTN interactions (QQIs) using the same mixed-model framework. Thus, a three-variance-component mixed model was integrated with our multi-locus random-SNP-effect mixed linear model (mrMLM) method to establish a new methodological framework, 3VmrMLM, that detects all types of loci and estimates their effects. In Monte Carlo studies, 3VmrMLM correctly detected all types of loci and estimated their effects almost without bias, with high power and accuracy and a low false positive rate. In re-analyses of 10 traits in 1439 rice hybrids, the detection of 269 known genes, 45 known gene-by-environment interactions, and 20 known gene-by-gene interactions strongly validated 3VmrMLM. Further analyses of known genes showed more small (67.49%), minor-allele-frequency (35.52%), and pleiotropic (30.54%) genes, with higher repeatability across datasets (54.36%) and more dominance loci. In addition, a heteroscedasticity mixed model for multiple environments and dimension reduction methods for a large number of environments were developed to detect QEIs, and variable selection under a polygenic background was proposed for QQI detection. This study provides a new approach for revealing the genetic architecture of quantitative traits.
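The step of splitting genotypic effects into additive and dominance effects can be illustrated with the textbook definitions. This is a sketch under a stated assumption: it uses the unweighted midpoint of the two homozygotes, whereas the abstract above says the authors use a one-way analysis of variance model, which may differ (for instance by weighting genotypes). The function name `partition_genotypic_effects` is hypothetical.

```python
def partition_genotypic_effects(g_AA, g_Aa, g_aa):
    """Split three genotypic effects into (additive, dominance).

    Assumes the classical unweighted definitions:
      additive  = half the difference between the two homozygotes
      dominance = deviation of the heterozygote from the homozygote midpoint
    """
    midpoint = (g_AA + g_aa) / 2.0
    additive = (g_AA - g_aa) / 2.0
    dominance = g_Aa - midpoint
    return additive, dominance
```

For example, genotypic effects of 10, 8, and 4 for AA, Aa, and aa give an additive effect of 3 and a dominance effect of 1, since the heterozygote lies one unit above the midpoint of 7.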
Funding: Supported by the National Natural Science Foundation of China (42007099, U2003214, and 41977099) and the West Light Foundation of the Chinese Academy of Sciences (2018-XBQNXz-B-016).
Funding: This work was supported by the National Natural Science Foundation of China (32070557 and 31871242), the Fundamental Research Funds for the Central Universities (2662020ZKPY017), and the Huazhong Agricultural University Scientific and Technological Self-Innovation Foundation (2014RC020).
Abstract: Theoretical and applied studies demonstrate the difficulty of detecting extremely over-dominant and small-effect genes for quantitative traits via bulked segregant analysis (BSA) in an F₂ population. To address this issue, we proposed an integrated strategy for mapping various types of quantitative trait loci (QTLs) via a combination of BSA and whole-genome sequencing. In this strategy, the read counts of marker alleles in two extreme pools were used to predict the read counts of marker genotypes. These observed and predicted numbers were used to construct a new statistic, G_w, for detecting quantitative trait genes (QTGs), and the method was named dQTG-seq1. This method was significantly better than existing BSA methods. If the goal was to identify extremely over-dominant and small-effect genes, another reserved DNA/RNA sample from each extreme-phenotype F₂ plant was sequenced, and the observed numbers of marker alleles and genotypes were used to calculate G_w to detect QTGs; this method was named dQTG-seq2. In analyses of simulated and real rice datasets, dQTG-seq2 identified many more extremely over-dominant and small-effect genes than BSA and QTL mapping methods. dQTG-seq2 may be extended to other heterogeneous mapping populations. The significance threshold of G_w in this study was determined by permutation experiments. In addition, a handbook for the R software dQTG.seq, which is available at https://cran.r-project.org/web/packages/dQTG.seq/index.html, has been provided in the supplemental materials for the users' convenience. This study provides a new strategy for identifying all types of QTLs for quantitative traits in an F₂ population.
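To give a feel for how counts from two extreme pools can be compared, the sketch below computes the classical likelihood-ratio G statistic for a table of genotype counts. This is explicitly not the G_w statistic of the abstract above, whose definition is not given here. It is the standard G-test of independence, swapped in as an illustration, and the function name `g_statistic` and its inputs are hypothetical.

```python
import math


def g_statistic(low_pool, high_pool):
    """Classical G statistic for a 2 x k table of counts.

    low_pool, high_pool -- counts per genotype (e.g. AA, Aa, aa) in each pool
    Returns 2 * sum(observed * ln(observed / expected)) over non-empty cells.
    """
    if len(low_pool) != len(high_pool):
        raise ValueError("both pools need the same number of categories")
    rows = [low_pool, high_pool]
    row_totals = [sum(row) for row in rows]
    col_totals = [lo + hi for lo, hi in zip(low_pool, high_pool)]
    total = sum(row_totals)
    g = 0.0
    for r, row in enumerate(rows):
        for c, observed in enumerate(row):
            if observed > 0:  # empty cells contribute nothing to the sum
                expected = row_totals[r] * col_totals[c] / total
                g += 2.0 * observed * math.log(observed / expected)
    return g
```

Identical genotype counts in both pools give a statistic of zero, and the value grows as the pools diverge. In practice a significance threshold could be set by permutation, as the abstract states was done for G_w.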
Funding: This work was supported by the National Natural Science Foundation of China (32070557, 31871242) and the Huazhong Agricultural University Scientific & Technological Self-Innovation Foundation (2014RC020).
Abstract: In most existing methods and software packages for genome-wide association studies (GWAS) that detect quantitative trait nucleotides (QTNs), QTN-by-environment interactions (QEIs), and QTN-by-QTN interactions (QQIs), only the allele substitution effect and its interaction-related effects are detected and estimated, conditional on method-specific polygenic background control. This leads to confounding in effect estimation and insufficient polygenic background control (Li et al., 2022; Supplemental Tables 1-3).