Funding: Supported by the National Natural Science Foundation of China (No. 81072389, 81373102, 81473070 and 81402765); Research Fund for the Doctoral Program of Higher Education of China (No. 20113234110002); Key Grant of the Natural Science Foundation of the Jiangsu Higher Education Institutions of China (No. 10KJA330034); College Philosophy and Social Science Foundation from the Education Department of Jiangsu Province of China (No. 2013SJB790059, 2013SJD790032); Research Foundation from Xuzhou Medical College (No. 2012KJ02); Research and Innovation Project for College Graduates of Jiangsu Province of China (No. CXLX13_574); and the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD).
Abstract: In the past few years, genome-wide association studies (GWAS) have been highly successful in identifying genetic susceptibility loci underlying many complex diseases and traits. The findings provide important genetic insights into the pathogenesis of diseases. In this paper, we present an overview of widely used approaches and strategies for the analysis of GWAS and offer general considerations for dealing with GWAS data. Issues regarding data quality control, population structure, association analysis, multiple comparison and visual presentation of GWAS results are discussed; other advanced topics, including missing heritability, meta-analysis, set-based association analysis, copy number variation analysis and GWAS cohort analysis, are also briefly introduced.
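The multiple-comparison issue mentioned in this abstract is often handled with a Bonferroni correction. The following is a minimal sketch, assuming a family-wise error rate of 0.05; the function names, marker names, and p-values are hypothetical and do not come from the paper.

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def significant_markers(p_values, alpha=0.05):
    """Return the names of markers whose p-value is below the Bonferroni threshold."""
    threshold = bonferroni_threshold(len(p_values), alpha)
    return [name for name, p in p_values.items() if p < threshold]


# Hypothetical p-values, for illustration only.
example = {"snp1": 1e-9, "snp2": 0.03, "snp3": 0.2, "snp4": 4e-3}
print(significant_markers(example))
```

With one million tested markers, the same rule gives a per-marker threshold of 0.05 / 1,000,000 = 5e-8.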
Funding: Supported by the National Natural Science Foundation of China (32070557 and 31871242); the Fundamental Research Funds for the Central Universities (2662020ZKPY017); the Huazhong Agricultural University Scientific & Technological Self-Innovation Foundation (2014RC020); and the State Key Laboratory of Cotton Biology Open Fund (CB2021B01).
Abstract: Although genome-wide association studies are widely used to mine genes for quantitative traits, the effects to be estimated are confounded, and the methodologies for detecting interactions are imperfect. To address these issues, the mixed model proposed here first estimates the genotypic effects for AA, Aa, and aa, and the genotypic polygenic background replaces additive and dominance polygenic backgrounds. Then, the estimated genotypic effects are partitioned into additive and dominance effects using a one-way analysis of variance model. This strategy was further expanded to cover QTN-by-environment interactions (QEIs) and QTN-by-QTN interactions (QQIs) using the same mixed-model framework. Thus, a three-variance-component mixed model was integrated with our multi-locus random-SNP-effect mixed linear model (mrMLM) method to establish a new methodological framework, 3VmrMLM, that detects all types of loci and estimates their effects. In Monte Carlo studies, 3VmrMLM correctly detected all types of loci and almost unbiasedly estimated their effects, with high power and accuracy and a low false positive rate. In re-analyses of 10 traits in 1439 rice hybrids, detection of 269 known genes, 45 known gene-by-environment interactions, and 20 known gene-by-gene interactions strongly validated 3VmrMLM. Further analyses of known genes showed more small (67.49%), minor-allele-frequency (35.52%), and pleiotropic (30.54%) genes, with higher repeatability across datasets (54.36%) and more dominance loci. In addition, a heteroscedasticity mixed model for multiple environments and dimension reduction methods for a large number of environments were developed to detect QEIs, and variable selection under a polygenic background was proposed for QQI detection. This study provides a new approach for revealing the genetic architecture of quantitative traits.
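The partition of genotypic effects into additive and dominance effects can be illustrated with the textbook single-locus parameterization. This is only a sketch under standard quantitative-genetics definitions; the symbols $\mu_{AA}$, $\mu_{Aa}$, $\mu_{aa}$ (genotypic values), $m$ (midpoint), $a$ (additive effect), and $d$ (dominance effect) are introduced here for illustration, and the paper's own one-way analysis of variance formulation may differ in detail.

```latex
\begin{aligned}
m &= \tfrac{1}{2}\left(\mu_{AA} + \mu_{aa}\right), \\
a &= \tfrac{1}{2}\left(\mu_{AA} - \mu_{aa}\right), \\
d &= \mu_{Aa} - m,
\end{aligned}
\qquad\text{so that}\qquad
\mu_{AA} = m + a,\quad \mu_{Aa} = m + d,\quad \mu_{aa} = m - a .
```

For example, genotypic values of 12, 11, and 8 for AA, Aa, and aa give $m = 10$, $a = 2$, and $d = 1$.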
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 31871242, U1602261, 31701071, 21873034, and 31571268); the Huazhong Agricultural University Scientific & Technological Self-innovation Foundation, China (Grant No. 2014RC020); and the State Key Laboratory of Cotton Biology Open Fund, China (Grant No. CB2019B01).
Abstract: Previous studies have reported that some important loci are missed in single-locus genome-wide association studies (GWAS), especially because of the large phenotypic error in field experiments. To solve this issue, multi-locus GWAS methods have been recommended. However, only a few software packages for multi-locus GWAS are available. Therefore, we developed an R package named mrMLM v4.0.2. This software integrates the mrMLM, FASTmrMLM, FASTmrEMMA, pLARmEB, pKWmEB, and ISIS EM-BLASSO methods developed by our lab. There are four components in mrMLM v4.0.2: dataset input, parameter setting, software running, and result output. The fread function in data.table is used to quickly read datasets, especially big datasets, and the doParallel package is used to conduct parallel computation using multiple CPUs. In addition, the graphical user interface software mrMLM.GUI v4.0.2, built upon Shiny, is also available. To confirm the correctness of the aforementioned programs, all the methods in mrMLM v4.0.2 and three widely used methods were used to analyze real and simulated datasets. The results confirm the superior performance of mrMLM v4.0.2 over other methods currently available. False positive rates are effectively controlled, albeit with a less stringent significance threshold. mrMLM v4.0.2 is publicly available at BioCode (https://bigd.big.ac.cn/biocode/tools/BT007077) or R (https://cran.r-project.org/web/packages/mrMLM.GUI/index.html) as open-source software.
Funding: This study was supported by the National Science Foundation of China (31172192); New Century Excellent Talents (NCET-11-0646); and the Fundamental Research Funds for the Central Universities (2011JQ009, 2012PY009).
Abstract: Backfat thickness is a good predictor of carcass lean content, an economically important trait, and a main breeding target in pig improvement. In this study, the candidate genes and genomic regions associated with the tenth rib backfat thickness trait were identified in two independent pig populations, using a genome-wide association study of porcine 60K SNP genotype data and applying the compressed mixed linear model (CMLM) statistical method. For each population, the 30 most significant single-nucleotide polymorphisms (SNPs) were selected, and SNP annotation was implemented using Sus scrofa Build 10.2. In the first population, 25 significant SNPs were distributed on seven chromosomes, and SNPs on SSC1 and SSC7 showed great significance for fat deposition. The most significant SNP (ALGA0006623) was located on SSC1, upstream of the MC4R gene. In the second population, 27 significant SNPs were recognized by annotation, and 12 SNPs on SSC12 were related to fat deposition. Two haplotype blocks, M1GA0016251-MARC0075799 and ALGA0065251-MARC0014203-M1GA0016298-ALGA0065308, were detected in significant regions where the PIPNC1 and GH1 genes were identified as contributing to fat metabolism. The results indicated that the genetic mechanism regulating backfat thickness is complex, and that genome-wide associations can be affected by populations with different genetic backgrounds.
Funding: Supported by the Key Research and Development Program of Guangxi (2021AB13014); Major Project of Guangxi Innovation Driven (AA18118016); National Key Research and Development Program of China (2017YFC0908000); Natural Key Research and Development Project (2020YFA0113200); National Natural Science Foundation of China (81770759, 82060145, 31970814); Natural Science Foundation of Guangxi Zhuang Autonomous Region (2021JJA140912); Advanced Innovation Teams and Xinghu Scholars Program of Guangxi Medical University, Guangxi Key Laboratory for Genomic and Personalized Medicine (19-050-22, 19-185-33, 20-065-33, 22-35-17); Major Project of Scientific Research and Technology Development Plan of Nanning (20221023); Guangxi Natural Science Foundation (2022GXNSFAA035641); and Self-funded Project of Health Commission of Guangxi Zhuang Autonomous Region (Z-A20230620).
Abstract: The Chinese tree shrew (Tupaia belangeri chinensis) has emerged as a promising model for investigating adrenal steroid synthesis, but it is unclear whether the same cells produce steroid hormones and whether their production is regulated in the same way as in humans. Here, we comprehensively mapped the cell types and pathways of steroid metabolism in the adrenal gland of Chinese tree shrews using single-cell RNA sequencing, spatial transcriptome analysis, mass spectrometry, and immunohistochemistry. We compared the transcriptomes of various adrenal cell types across tree shrews, humans, macaques, and mice. Results showed that tree shrew adrenal glands expressed many of the same key enzymes for steroid synthesis as humans, including CYP11B2, CYP11B1, CYB5A, and CHGA. Biochemical analysis confirmed the production of aldosterone, cortisol, and dehydroepiandrosterone, but not dehydroepiandrosterone sulfate, in the tree shrew adrenal glands. Furthermore, genes in adrenal cell types in tree shrews were correlated with genetic risk factors for polycystic ovary syndrome, primary aldosteronism, hypertension, and related disorders in humans based on genome-wide association studies. Overall, this study suggests that the adrenal glands of Chinese tree shrews may consist of closely related cell populations with functional similarity to those of the human adrenal gland. Our comprehensive results (publicly available at http://gxmujyzmolab.cn:16245/scAGMap/) should facilitate the advancement of this animal model for the investigation of adrenal gland disorders.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 32070557 and 32270673) and the Huazhong Agricultural University Scientific & Technological Self-innovation Foundation, China (Grant No. 2014RC020).
Abstract: Multilocus genome-wide association study has become the state-of-the-art tool for dissecting the genetic architecture of complex and multiomic traits. However, most existing multilocus methods require relatively long computational time when analyzing large datasets. To address this issue, we proposed a fast mrMLM method, namely, the best linear unbiased prediction multilocus random-SNP-effect mixed linear model (BLUPmrMLM). First, genome-wide single-marker scanning in mrMLM was replaced in BLUPmrMLM by vectorized Wald tests based on the best linear unbiased prediction (BLUP) values of marker effects and their variances. Then, adaptive best subset selection (ABESS) was used to identify potentially associated markers on each chromosome to reduce computational time when estimating marker effects via empirical Bayes. Finally, shared memory and parallel computing schemes were used to reduce the computational time. In simulation studies, BLUPmrMLM outperformed GEMMA, EMMAX, mrMLM, and FarmCPU, as well as the control method (BLUPmrMLM with ABESS removed), in terms of computational time, power, accuracy for estimating quantitative trait nucleotide positions and effects, false positive rate, false discovery rate, false negative rate, and F1 score. In the reanalysis of two large rice datasets, BLUPmrMLM significantly reduced the computational time and identified more previously reported genes than the aforementioned methods. This study provides an excellent multilocus method for the analysis of large-scale and multiomic datasets. The software mrMLM v5.1 is available at BioCode (https://ngdc.cncb.ac.cn/biocode/tool/BT007388) or GitHub (https://github.com/YuanmingZhang65/mrMLM).
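The Wald test on a marker effect and its variance can be sketched as follows. This is a minimal illustration and not the BLUPmrMLM implementation: the function names and input values are hypothetical, the loop is plain Python rather than vectorized array code, and the statistic is assumed to follow a chi-square distribution with one degree of freedom under the null hypothesis.

```python
import math


def wald_test(effect, variance):
    """Wald statistic and p-value (chi-square, 1 df) for a single marker effect."""
    if variance <= 0:
        raise ValueError("variance must be positive")
    stat = effect * effect / variance
    # For 1 degree of freedom, P(chi2 > w) = erfc(sqrt(w / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


def scan_markers(effects, variances):
    """Apply the Wald test to every marker; returns a list of (stat, p_value)."""
    return [wald_test(b, v) for b, v in zip(effects, variances)]


# Hypothetical effect estimates and variances, for illustration only.
results = scan_markers([0.0, 1.96, 5.0], [1.0, 1.0, 1.0])
for stat, p in results:
    print(f"W = {stat:.4f}, p = {p:.3g}")
```

Markers with small p-values would then be passed to a later selection step; in the abstract that role is played by ABESS followed by empirical Bayes estimation.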
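Abstract: Genome-wide association study (GWAS) is an effective method for locating variants in the genome that are significantly associated with traits. With improved phenotype recording, advances in high-throughput genotyping technology, and refinements in statistical methods, GWAS has been widely applied in fields such as human disease and animal and plant genetics. False positives are one of the major factors affecting the reliability of GWAS results. To control false positives, in addition to adjusting P values, GWAS models have been continually improved, from the simplest analysis of variance (or the chi-square test for qualitative traits), to the general linear model (GLM) with fixed-effect covariates, and then to the mixed linear model (MLM) with random effects, thereby controlling false positives caused by a variety of confounding factors. Fitting the genetic effects of individuals as random effects defined by a genomic relationship matrix (GRM) is currently the common approach. Because parameter estimation in MLM is computationally expensive, researchers have continually worked on optimizing both model solving and GRM construction (the latter also improves computational efficiency), eventually reducing the time complexity of MLM-based computation step by step from O(MN^3) to O(MN) and achieving a leap in both computational speed and statistical power. To address the false positives caused by unbalanced case-control ratios in qualitative traits, researchers further corrected the generalized linear mixed model (GLMM). This paper gives a fairly comprehensive introduction to the basic principles and development of GWAS, with emphasis on the details of the improvements and optimizations of MLM in GWAS. It also lists applications of GWAS in agriculture, including research results in plants, animals, and microorganisms, as well as haplotype-based GWAS applications. Finally, future developments of GWAS are discussed from two perspectives: further improving the statistical power of GWAS and GWAS experimental design.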
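The mixed linear model with a GRM-defined random effect is commonly written in the following form. This is a generic sketch and is not taken from the paper; all symbols are assumptions introduced for illustration: $y$ is the phenotype vector, $X$ the design matrix of fixed effects (covariates and the tested marker), $\beta$ the fixed effects, $Z$ the incidence matrix, $u$ the random genetic effects, $K$ the genomic relationship matrix, and $\sigma_g^2$, $\sigma_e^2$ the genetic and residual variance components.

```latex
y = X\beta + Zu + e, \qquad
u \sim N\!\left(0,\; K\sigma_g^2\right), \qquad
e \sim N\!\left(0,\; I\sigma_e^2\right)
```

Under this form the phenotypic covariance is $\operatorname{Var}(y) = ZKZ^{\top}\sigma_g^2 + I\sigma_e^2$. Re-estimating the variance components for every tested marker is what makes the naive approach expensive.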
Funding: Supported by the National Natural Science Foundation of China (31301004) and the Fundamental Research Funds for the Central Universities (KJQN201422).
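Abstract: Multi-locus association analysis is increasingly widely applied in genetic studies of humans, animals, and plants. This paper reviews the main methods and important software platforms for multi-locus association analysis within the mixed linear model (MLM) framework, including the establishment and development of mixed linear model methodology for genome-wide association studies (GWAS), the development of multi-locus model methods, the development of multi-locus GWAS mixed linear model methods, and the factors influencing GWAS methodological research. Finally, future directions for association analysis are discussed.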
Abstract: It is well known that gender differences exist in the onset, progression, and prognosis of cardiovascular diseases (CVDs), and that risk factors such as high blood pressure and lipid profiles vary between men and women. Currently, sex differences are stressed as important variables to take into account when examining the etiology of CVD. Genome-wide association studies of CVD have employed sex as a covariate in their analytical models, but have generally disregarded potential genetic heterogeneity (GHS) attributable to sex.
Funding: Supported by grants from the Research Foundation of Anhui Sanlian University.
Abstract: Frequent traffic accidents constitute a major danger to human beings. Accident-prone drivers, who have stable physiological, psychological, and behavioral characteristics, are one of the most prominent causes of traffic accidents. The internal link between individual characteristics and accident proneness has been a difficult point in accident prevention research. The authors selected accident-prone drivers as cases and safe drivers as controls from 18,360 drivers enrolled from three public transportation corporations in China using an area-stratified sampling method. Cases and controls were matched 1:1. The authors performed a genome-wide association study (GWAS) of 179 cases and 179 controls using the U.S. Affymetrix Genome-Wide Human Mapping SNP 6.0 Array. The authors observed that the gene frequencies of 34 single-nucleotide polymorphisms (SNPs) in three regions were higher in cases than in controls (P < 10^(-4)). The authors then tested 6 strongly associated SNPs in two independent replication sets, in 349 pairs of case-control drivers, using the U.S. ABI 3730 sequencing method. The results indicated that SNP rs6069499 within the linked CBLN4 gene is strongly associated with accident proneness (P_combined = 6.37×10^(-10)). Because the CBLN4 gene is mainly involved in adrenal development and the regulation of secretion, the authors measured 12 biochemical parameters of the blood using radioimmunoassay. The levels of dopamine (DA) and adrenocorticotropic hormone (ACTH) showed significant differences between accident-prone drivers and safe drivers (P_(DA) = 0.03, P_(ACTH) = 0.01). These results suggest that accident-prone drivers may have an idiosyncratic susceptibility.
Funding: Supported by the National Key Research and Development Program (2016YFD0100802) and the National Science Foundation Collaborative Research grant (DBI-1458515).
Abstract: Precise mapping of quantitative trait loci (QTLs) is critical for assessing genetic effects and identifying candidate genes for quantitative traits. Interval mapping and composite interval mapping have been the methods of choice for several decades and have provided tools for identifying genomic regions harboring causal genes for quantitative traits. Historically, the concept was developed on the basis of sparse marker maps, where genotypes of loci within intervals could not be observed. Currently, the genomes of many organisms have been saturated with markers thanks to new sequencing technologies. Genotyping by sequencing usually generates hundreds of thousands of single nucleotide polymorphisms (SNPs), which often include the causal polymorphisms. The concept of an interval no longer applies, prompting the need for a norm change in QTL mapping technology to make use of high-volume genomic data. Here we developed a statistical method and a software package to map QTLs by binning markers into haplotype blocks, called bins. The new method detects associations of bins with quantitative traits. It borrows the mixed model methodology with polygenic control from genome-wide association studies (GWAS) and can handle all kinds of experimental populations under the linear mixed model (LMM) framework. We tested the method using both simulated data and data from populations of rice. The results showed that this method has higher power than current methods. An R package named binQTL is available from GitHub.
Funding: Supported by the National Natural Science Foundation of China (81820108028, 81922061, 82003530).
Abstract: Genome-wide association studies (GWASs) have shown that the genetic architecture of cancers is highly polygenic and have enabled researchers to identify genetic risk loci for cancers. The genetic variants associated with a cancer can be combined into a polygenic risk score (PRS), which captures part of an individual’s genetic susceptibility to cancer. Recently, PRSs have been widely used in cancer risk prediction and have been shown to be capable of identifying groups of individuals who could benefit from knowledge of their probabilistic susceptibility to cancer. This has led to increased interest in understanding the potential utility of PRSs for further refining the assessment and management of cancer risk. In this context, we provide an overview of the major discoveries from cancer GWASs. We then review the methodologies used for PRS construction and describe steps for the development and evaluation of risk prediction models that include PRS and/or conventional risk factors. The potential utility of PRSs in cancer risk prediction, screening, and precision prevention is illustrated. Challenges and practical considerations relevant to the implementation of PRSs in health care settings are discussed.
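Combining associated variants into a PRS is, in its simplest form, a weighted sum of risk-allele dosages. The following is a minimal sketch, assuming per-variant weights such as GWAS log odds ratios and dosages coded 0, 1, or 2; the function name, variant identifiers, and numbers are hypothetical, and the review covers more elaborate construction methods than this one.

```python
def polygenic_risk_score(dosages, weights):
    """Weighted sum of risk-allele dosages over the variants listed in `weights`.

    dosages: dict mapping variant ID -> risk-allele count (0, 1, or 2)
    weights: dict mapping variant ID -> per-allele effect size (e.g. log odds ratio)
    """
    missing = set(weights) - set(dosages)
    if missing:
        raise KeyError(f"missing dosages for variants: {sorted(missing)}")
    return sum(weights[v] * dosages[v] for v in weights)


# Hypothetical weights and genotypes, for illustration only.
weights = {"rsA": 0.10, "rsB": 0.25, "rsC": -0.05}
person = {"rsA": 2, "rsB": 1, "rsC": 0}
print(round(polygenic_risk_score(person, weights), 4))
```

In practice the raw score is usually standardized against a reference population before it is combined with conventional risk factors in a prediction model.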