Neurodegenerative diseases cause great medical and economic burdens for both patients and society; however, their complex molecular mechanisms are not yet well understood. With the development of high-coverage sequencing technology, researchers have started to notice that genomic repeat regions, previously neglected in the search for disease culprits, are active contributors to multiple neurodegenerative diseases. In this review, we describe the association between repeat element variants and multiple degenerative diseases through genome-wide association studies and targeted sequencing. We discuss the identification of disease-relevant repeat element variants, further powered by the advancement of long-read sequencing technologies and their related tools, and summarize recent findings in the molecular mechanisms of repeat element variants in brain degeneration, such as those causing transcriptional silencing or RNA-mediated gain of toxic function. Furthermore, we describe how in silico predictions using innovative computational models, such as deep learning language models, could enhance and accelerate our understanding of the functional impact of repeat element variants. Finally, we discuss future directions to advance current findings for a better understanding of neurodegenerative diseases and the clinical applications of genomic repeat elements.
Endophthalmitis is a serious ophthalmic disease characterized by changes in the eye's posterior segment, such as hypopyon and intraocular inflammation, with vitritis being a hallmark. Infectious endophthalmitis can lead to irreversible vision loss, accompanied by eye pain or eye distention, and in the most severe cases to removal of the eyeball. Microorganisms such as bacteria, fungi, viruses, and parasites typically account for the disease, and the entry pathways of the microbes can be divided into endogenous and exogenous routes, according to the origin of the etiological agents. Exogenous endophthalmitis can arise in various settings (such as postoperative complications or trauma), while endogenous endophthalmitis results from pathogens carried to the eye by the bloodstream. This review aims to summarize the application of new technology in pathogen identification of endophthalmitis so as to prevent the disease and better guide clinical diagnosis and treatment.
High-quality DNA extraction is a crucial step in metagenomic studies. Bias introduced by different isolation kits impairs comparison across datasets. A trending topic is, however, the analysis of multiple metagenomes from the same patients to draw a holistic picture of the microbiota associated with diseases. We thus collected bile, stool, saliva, plaque, sputum, and conjunctival swab samples and performed DNA extraction with three commercial kits. For each combination of specimen type and DNA extraction kit, 20 gigabases (Gb) of metagenomic data were generated using short-read sequencing. While profiles of the specimen types showed close proximity to each other, we observed notable differences in the alpha diversity and composition of the microbiota depending on the DNA extraction kit. No single kit outperformed the others on every specimen. We reached consistently good results using the Qiagen QIAamp DNA Microbiome Kit. Depending on the specimen, our data indicate that over 10 Gb of sequencing data are required to achieve sufficient resolution, but DNA-based identification is superior to identification by mass spectrometry. Finally, long-read nanopore sequencing confirmed the results (correlation coefficient > 0.98). Our results thus suggest using a strategy with only one kit for studies aiming for a direct comparison of multiple microbiotas from the same patients.
The Human Genome Project opened an era of (epi)genomic research and also provided a platform for the development of new sequencing technologies. During and after the project, several sequencing technologies have continued to dominate the nucleic acid sequencing market. Currently, Illumina (short-read), PacBio (long-read), and Oxford Nanopore (long-read) are the most popular sequencing technologies. Unlike PacBio and the popular short-read sequencers before it, which, as examples of second-generation or so-called next-generation sequencing platforms, rely on synthesis during sequencing, nanopore technology directly sequences native DNA and RNA molecules. Nanopore sequencing therefore avoids converting mRNA into cDNA molecules, which not only allows for the sequencing of extremely long native DNA and full-length RNA molecules but also documents modifications that have been made to those native DNA or RNA bases. In this review on direct DNA sequencing and direct RNA sequencing using Oxford Nanopore technology, we focus on their development and application achievements and discuss their challenges and future perspectives. We also address the problems researchers may encounter when applying these approaches to their research topics, and how to resolve them.
Background: Oxford Nanopore long-read sequencing technology addresses current limitations for DNA methylation detection that are inherent in short-read bisulfite sequencing or methylation microarrays. A number of analytical tools, such as Nanopolish, Guppy/Tombo and DeepMod, have been developed to detect DNA methylation in Nanopore data. However, additional improvements can be made in computational efficiency, prediction accuracy, and contextual interpretation of complex genomic regions (such as repetitive regions and low GC density regions). Method: In the current study, we apply the Transformer architecture to detect DNA methylation from the ionic signals of Oxford Nanopore sequencing data. The Transformer is an algorithm that adopts a self-attention architecture in neural networks and has been widely used in natural language processing. Results: Compared to traditional deep-learning methods such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs), the Transformer may have specific advantages in DNA methylation detection, because the self-attention mechanism can help detect relationships between bases that are far from each other and pay more attention to important bases that carry characteristic methylation-specific signals within a specific sequence context. Conclusion: We demonstrated the ability of Transformers to detect methylation in ionic signal data.
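The long-range argument in the Transformer abstract above can be illustrated with a minimal sketch of scaled dot-product self-attention. This is not the authors' model: it assumes identity projections (queries, keys and values are the input vectors themselves, whereas a real Transformer learns separate projections), and the four 2-dimensional vectors in `toy_signal` are hypothetical stand-ins for per-base signal features.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Scaled dot-product self-attention with Q = K = V = x (an assumption).

    x is a list of equal-length feature vectors, one per sequence position.
    Returns (outputs, weights); weights[i][j] is how much position i
    attends to position j.
    """
    d = len(x[0])
    weights = []
    for q in x:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in x]
        weights.append(softmax(scores))
    # Each output is the attention-weighted average of all value vectors.
    outputs = [
        [sum(w * v[j] for w, v in zip(row, x)) for j in range(d)]
        for row in weights
    ]
    return outputs, weights


# Hypothetical features: positions 0 and 3 look alike, 1 and 2 look alike.
toy_signal = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
outputs, weights = self_attention(toy_signal)
print([round(w, 3) for w in weights[0]])
```

In this toy input, position 0 gives the distant position 3 the same weight as itself, because their vectors match; the distance between positions plays no role in the score, which is the property the abstract appeals to.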
Circular RNA (circRNA) is a special type of non-coding RNA that participates in diverse biological processes in both animals and plants. Five years ago, we developed a comprehensive plant circRNA database (PlantcircBase), which has attracted much attention from the plant circRNA community. Here, we report an updated PlantcircBase (v.7.0), which contains 171,118 circRNAs from 21 plant species. Over 31,000 of the circRNAs have full-length sequences constructed based on analysis of 749 bulk RNA sequencing (RNA-seq) datasets downloaded from the public domain and Nanopore long-read sequencing results of rice RNAs newly generated in this study. A plant multiple conservation score (PMCS), based on the conservation of both sequence and expression profiles, was calculated for each circRNA to quantify and compare the conservation of all circRNAs. A new parameter, plant circRNA confidence level (PCCL), is introduced to measure the identity reliability of each circRNA based on experimental validation results and the number of references that support the circRNA. All this information and other details of circRNAs can be browsed, searched, and downloaded from PlantcircBase 7.0, which also provides online bioinformatics tools for visualization and sequence alignment. PlantcircBase 7.0 is publicly and freely accessible at http://ibi.zju.edu.cn/plantcircbase/.
Mung bean is an economically important legume crop species that is used as a food, consumed as a vegetable, and used as an ingredient and even as a medicine. To explore the genomic diversity of mung bean, we assembled a high-quality reference genome (Vrad_JL7) that was 479.35 Mb in size, with a contig N50 length of 10.34 Mb. A total of 40,125 protein-coding genes were annotated, representing 96.9% of the genetic region. We also sequenced 217 accessions, mainly landraces and cultivars from China, and identified 2,229,343 high-quality single-nucleotide polymorphisms (SNPs). Population structure revealed that the Chinese accessions diverged into two groups and were distinct from non-Chinese lines. Genetic diversity analysis based on genomic data from 750 accessions in 23 countries supported the hypothesis that mung bean was first domesticated in south Asia and introduced to east Asia, probably through the Silk Road. We constructed the first pan-genome of mung bean germplasm and assembled 287.73 Mb of non-reference sequences. Among the genes, 83.1% were core genes and 16.9% were variable. Presence/absence variation (PAV) events of nine genes involved in the regulation of the photoperiodic flowering pathway were identified as being under selection during the adaptation process to promote early flowering in the spring. Genome-wide association studies (GWASs) revealed 2,912 SNPs and 259 gene PAV events associated with 33 agronomic traits, including a SNP in the coding region of the SWEET10 homolog (jg24043) involved in crude starch content and a PAV event in a large fragment containing 11 genes for color-related traits. This high-quality reference genome and pan-genome will provide insights into mung bean breeding.
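The contig N50 quoted for Vrad_JL7 is an assembly contiguity statistic: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly. A minimal sketch of that calculation follows; the function name `n50` and the toy lengths are illustrative assumptions, not the Vrad_JL7 contigs or the authors' tooling.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are sorted from longest to shortest and accumulated until the
    running sum reaches at least half of the total assembly length; the
    length of the contig that crosses that threshold is the N50.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths in Mb (total 30, so the threshold is 15).
toy_contigs = [10, 8, 6, 4, 2]
print(n50(toy_contigs))  # 10 + 8 = 18 >= 15, so this prints 8
```

A larger N50 relative to the assembly size means that fewer, longer contigs account for half of the sequence.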
The ability of third-generation sequencing technologies to provide longer sequence reads contributes to the use of the longest possible amplicons as specific bacterial markers for metabarcoding studies. Nanopore sequencing technologies are increasingly used worldwide to profile microbiomes in environmental and food samples. The identification of beneficial or pathogenic bacteria in fermented dairy foods is related to their valuable health properties and also contributes to food safety. Here we describe and optimise a PCR-based methodology for amplifying almost the entire ribosomal operon sequence (16S-ITS-23S) and subsequently sequencing it on a MinION device. We used three different strategies for sequencing data processing and analysis. Two of those utilized user-friendly software that does not require familiarity with any programming language. We tested all workflows on a simple mock community composed of a mixture of DNA from 7 bacteria. Our scripted bioinformatics pipeline, denoted "AEROS", which represents an approach based on taxonomic classification with our reference database called AEROS-DB (Almost Entire Ribosomal Operon Sequences), was applied to traditional Slovak sheep cheese made from unpasteurized milk. All bacterial genera included in the mock community were detected with relatively small differences from the expected relative abundance using each of the three approaches. The AEROS approach also provided more accurate composition data on this community at the species level. The results suggest that the use of almost entire rrn operon sequences in metabarcoding studies is suitable for analyzing the bacterial consortia in cheeses and related fermented dairy products.
Over the past decade, nanopore sequencing has experienced significant advancements and changes, transitioning from an emerging technology to a significant instrument in the field of genomic sequencing. As next-generation sequencing technology continues to advance, nanopore sequencing is improving as well. This paper reviews the developments, applications, and outlook of nanopore sequencing technology. Currently, nanopore sequencing supports both DNA and RNA sequencing, making it widely applicable in areas such as telomere-to-telomere (T2T) genome assembly, direct RNA sequencing (DRS), and metagenomics. The openness and versatility of nanopore sequencing have established it as a preferred option for an increasing number of research teams, signaling a transformative influence on life science research. As nanopore sequencing technology advances, it provides a faster, more cost-effective approach with extended read lengths, demonstrating significant potential for complex genome assembly, pathogen detection, environmental monitoring, and human disease research, and offering a fresh perspective in sequencing technologies.
Funding (review of repeat element variants in neurodegenerative diseases): supported by the National Natural Science Foundation of China, No. 61932008, and the Natural Science Foundation of Shanghai, No. 21ZR1403200 (both to JC).
Funding (review of direct DNA and RNA sequencing with Oxford Nanopore technology): supported by the Key-Areas Research and Development Program of Guangdong Province (2020B020220004); the Youth Innovation Promotion Association, Chinese Academy of Sciences (2017399); the Science and Technology Program of Guangzhou (202002030097); the Hong Kong Research Grants Council Area of Excellence Scheme (AoE/M-403/16); the ECS (27204518); and the TRS of the HKSAR government (T21-705/20-N).
Funding (PlantcircBase 7.0 study): supported by the National Natural Science Foundation of China (32101729, 31871589, and 91740108) and the National Postdoctoral Program for Innovative Talents (BX20200301).
Funding (mung bean reference genome and pan-genome study): supported by the National Key R&D Program of China (2019YFD1000700/2019YFD1000702), the China Agricultural Research System (CARS-08-G3), the Key Research and Development Program of Hebei (21326305D), the Hebei Agriculture Research System (HBCT2018070203), and the Hebei Talent Project.
Funding (rrn operon metabarcoding study of dairy products): supported by the Slovak Research and Development Agency under Contract no. APVV-20-0001 and published with the support of the Operational Program Integrated Infrastructure within the project "Výskum v sieti SANET a moznosti jej d’alsieho vyuzitia a rozvoja/Research in the SANET network and possibilities of its further use and development", ITMS code 313011W988, co-financed by the ERDF. The study is a result of the implementation of the project PreveLynch, ITMS 2014+: 313011V578.
Funding (review of nanopore sequencing technology): financially supported by the Natural Science Foundation of China (32470055 and U23A20148), the China Postdoctoral Science Foundation (2024M753580), and the Agricultural Science and Technology Innovation Program (CAAS-ZDRW202308).