Funding: Supported by the National Natural Science Foundation of China, No. 61932008, and the Natural Science Foundation of Shanghai, No. 21ZR1403200 (both to JC).
Abstract: Neurodegenerative diseases cause great medical and economic burdens for both patients and society; however, their complex molecular mechanisms are not yet well understood. With the development of high-coverage sequencing technology, researchers have started to notice that genomic repeat regions, previously neglected in the search for disease culprits, are active contributors to multiple neurodegenerative diseases. In this review, we describe the association between repeat element variants and multiple degenerative diseases through genome-wide association studies and targeted sequencing. We discuss the identification of disease-relevant repeat element variants, further powered by the advancement of long-read sequencing technologies and their related tools, and summarize recent findings in the molecular mechanisms of repeat element variants in brain degeneration, such as those causing transcriptional silencing or RNA-mediated gain of toxic function. Furthermore, we describe how in silico predictions using innovative computational models, such as deep learning language models, could enhance and accelerate our understanding of the functional impact of repeat element variants. Finally, we discuss future directions to advance current findings for a better understanding of neurodegenerative diseases and the clinical applications of genomic repeat elements.
Funding: Supported by the National Key R&D Program of China (2019YFE0119000), the National Natural Science Foundation of China (31872561), the National Science Fund for Distinguished Young Scholars (32225049), and the Alliance of International Science Organizations (ANSO-CR-PP-2021-03).
Abstract: Common carp are among the oldest domesticated fish in the world. As such, there are many food and ornamental carp strains with abundant phenotypic variations due to natural and artificial selection. Hebao red carp (HB, Cyprinus carpio wuyuanensis), an indigenous strain in China, is renowned for its unique body morphology and reddish skin. To reveal the genetic basis underlying the distinct skin color of HB, we constructed an improved high-fidelity (HiFi) HB genome with good contiguity, completeness, and correctness. Genome structure comparison was conducted between HB and a representative wild strain, Yellow River carp (YR, C. carpio haematopterus), to identify structural variants and genes under positive selection. Signatures of artificial selection during domestication were identified in HB and YR populations, while phenotype mapping was performed in a segregating population generated by HB × YR crosses. Body color in HB was associated with regions with fixed mutations. The simultaneous mutation and superposition of a pair of homologous genes (mitfa) in chromosomes A06 and B06 conferred the reddish color in domesticated HB. Transcriptome analysis of common carp with different alleles of the mitfa mutation confirmed that gene duplication can buffer the deleterious effects of mutation in allotetraploids. This study provides new insights into genotype-phenotype associations in allotetraploid species and lays a foundation for future breeding of common carp.
Funding: Supported by the National Natural Science Foundation of China (31900309), the Guangdong Basic and Applied Basic Research Foundation (2019A1515011644), the Key-Area Research and Development Program of Guangdong Province (2021B0202020001), the Seed Industry Development Project of Agricultural and Rural Department of Guangdong Province (2022), and the Innovation Group Project of Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai) (311021006).
Abstract: Due to the difficulty in accurately identifying structural variants (SVs) across genomes, their impact on cis-regulatory divergence of closely related species, especially fish, remains to be explored. Recently identified broad H3K4me3 domains are essential for the regulation of genes involved in several biological processes. However, the role of broad H3K4me3 domains in phenotypic divergence remains poorly understood. Siniperca chuatsi and S. scherzeri are closely related but divergent in several phenotypic traits, making them an ideal model to study cis-regulatory evolution in sister species. Here, we generated chromosome-level genomes of S. chuatsi and S. scherzeri, with assembled genome sizes of 716.35 and 740.54 Mb, respectively. The evolutionary histories of S. chuatsi and S. scherzeri were studied by inferring dynamic changes in ancestral population sizes. To explore the genetic basis of adaptation in S. chuatsi and S. scherzeri, we performed gene family expansion and contraction analysis and identified positively selected genes (PSGs). To investigate the role of SVs in cis-regulatory divergence of closely related fish species, we identified high-quality SVs as well as divergent H3K27ac and H3K4me3 domains in the genomes of S. chuatsi and S. scherzeri. Integrated analysis revealed that cis-regulatory divergence caused by SVs played an essential role in phenotypic divergence between S. chuatsi and S. scherzeri. Additionally, divergent broad H3K4me3 domains were mostly associated with cancer-related genes in S. chuatsi and S. scherzeri and contributed to their phenotypic divergence.
Funding: Supported by grants from the National Key Research and Development Program of China (2022YFF1003001), the National Natural Science Foundation of China (32072576), the National Modern Agriculture Industry Technology System (CARS-23-G42), the Jiangsu Provincial Key Research and Development Program (BE2021376), the Innovation Program of the Beijing Academy of Agricultural and Forestry Sciences (KJCX20230121), and the Collaborative Innovation Program for Leafy and Root Vegetables of the Beijing Vegetable Research Center, Beijing Academy of Agricultural and Forestry Sciences (XTCX202302).
Abstract: The domestication of Brassica oleracea has resulted in diverse morphological types with distinct patterns of organ development. Here we report a graph-based pan-genome of B. oleracea constructed from high-quality genome assemblies of different morphotypes. The pan-genome harbors over 200 structural variant hotspot regions enriched in auxin- and flowering-related genes. Population genomic analyses revealed that early domestication of B. oleracea focused on leaf or stem development. Gene flows resulting from agricultural practices and variety improvement were detected among different morphotypes. Selective-sweep and pan-genome analyses identified an auxin-responsive small auxin up-regulated RNA gene and a CLAVATA3/ESR-RELATED family gene as crucial players in leaf–stem differentiation during the early stage of B. oleracea domestication and the BoKAN1 gene as instrumental in shaping the leafy heads of cabbage and Brussels sprouts. Our pan-genome and functional analyses further revealed that variations in the BoFLC2 gene play key roles in the divergence of vernalization and flowering characteristics among different morphotypes, and variations in the first intron of BoFLC3 are involved in fine-tuning the flowering process in cauliflower. This study provides a comprehensive understanding of the pan-genome of B. oleracea and sheds light on the domestication and differential organ development of this globally important crop species.
Funding: Supported by the National Program for Brain Science and Brain-like Intelligence Technology of China (2021ZD0200800), the Beijing Municipal Science and Technology Commission (Z181100001518005), the National Natural Science Foundation of China (31401139, 32170613, 81671358, 81873803), and the Natural Science Foundation of Beijing Municipality (7232225).
Abstract: Autism spectrum disorder (ASD) is a neurodevelopmental disorder with high genetic heritability but also high heterogeneity. Fully understanding its genetics requires whole-genome sequencing (WGS), but ASD studies utilizing WGS data in the Chinese population are limited. In this study, we present a WGS study of 334 individuals, including 112 ASD patients and their non-ASD parents. We identified 146 de novo variants in coding regions in 85 cases and 60 inherited variants in coding regions. By integrating these variants with an association model, we identified 33 potential risk genes (P < 0.001) enriched in neuron- and regulation-related biological processes. Besides the well-known ASD genes (SCN2A, NF1, SHANK3, CHD8, etc.), several high-confidence genes were highlighted by a series of functional analyses, including CTNND1, DGKZ, LRP1, DDN, ZNF483, NR4A2, SMAD6, INTS1, and MRPL12, with further supporting evidence from GO enrichment, expression, and network analysis. We also integrated RNA-seq data to analyze the effect of the variants on gene expression and found that 12 genes had relatively biased expression in the individuals carrying the related variants. We further presented the clinical phenotypes of the probands carrying the risk genes in both our samples and Caucasian samples to show the effect of the risk genes on phenotype. Regarding variants in noncoding regions, a total of 74 de novo variants and 30 inherited variants were predicted as pathogenic with high confidence and were mapped to specific genes or regulatory features. The number of de novo variants found in a patient was significantly associated with the parents' ages at the birth of the child and showed a trend of association with gender. We also identified small de novo structural variants in ASD trios. The results of this study provide important evidence for understanding the genetic mechanism of ASD.
Funding: Supported by the National Key R&D Program of China (Grant Nos. 2018YFC0910400 and 2017YFC0907500), the National Science Foundation of China (Grant Nos. 31671372, 61702406, and 31701739), the Fundamental Research Funds for the Central Universities, the World-Class Universities (Disciplines), the Characteristic Development Guidance Funds for the Central Universities, and the Shanghai Municipal Science and Technology Major Project (Grant No. 2017SHZDZX01).
Abstract: Complex structural variants (CSVs) are genomic alterations that have more than two breakpoints and are considered as the simultaneous occurrence of simple structural variants. However, detecting the compounded mutational signals of CSVs is challenging through a commonly used model-match strategy. As a result, there has been limited progress for CSV discovery compared with simple structural variants. Here, we systematically analyzed the multi-breakpoint connection feature of CSVs and proposed Mako, utilizing a bottom-up guided model-free strategy, to detect CSVs from paired-end short-read sequencing. Specifically, we implemented a graph-based pattern growth approach, where the graph depicts potential breakpoint connections, and pattern growth enables CSV detection without pre-defined models. Comprehensive evaluations on both simulated and real datasets revealed that Mako outperformed other algorithms. Notably, validation rates of CSVs on real data based on experimental and computational validations as well as manual inspections are around 70%, where the medians of experimental and computational breakpoint shift are 13 bp and 26 bp, respectively. Moreover, the Mako CSV subgraph effectively characterized the breakpoint connections of a CSV event and uncovered a total of 15 CSV types, including two novel types of adjacent segment swap and tandem dispersed duplication. Further analysis of these CSVs also revealed the impact of sequence homology on the formation of CSVs. Mako is publicly available at https://github.com/xjtu-omics/Mako.
Funding: Supported by the Natural Science Foundation Project of CQ CSTC (No. 2009BB1241) and the Ministry of Science and Technology of China (Nos. 2006AA10A117 and 2005CB121003).
Abstract: Short interspersed elements (SINEs), which are mainly composed of Bm1, are abundant in the domesticated silkworm. A novel 294 bp SINE family, designated BmSE, was identified by mining the database of the complete Bombyx mori genome. A representative BmSE element is flanked by an 11 bp target site duplication sequence posterior to the poly(A) at the 3′ end and has the sequence motifs of an internal promoter of RNA polymerase III, which are similar to those of Bm1. The repetitive elements of BmSE are widely distributed in all 28 chromosomes of the genome and share the common (ATTT) repeats at the ends. GC-content distribution shows that BmSE tends to accumulate in regions of higher AT content than Bm1 does. A high proportion of the BmSEs are mapped to the introns of coding sequences, whereas several elements are also present in the UTRs of some transcripts, indicating that BmSEs are indeed exonized with UTRs. Of the 615 identified structural variants (SVs) of BmSE among the 40 domesticated and wild silkworms, only 230 SVs were found in the domesticated silkworms, indicating that many recent SV events of BmSE occurred after domestication, which was probably due to its mobilization. Our analysis might assist in developing BmSE as a potential marker and in understanding the evolutionary roles of SINEs in the domesticated silkworm.