Previous studies on genetic diseases predominantly focused on protein-coding variations, overlooking the vast noncoding regions of the human genome. The development of high-throughput sequencing technologies and functional genomics tools has enabled the systematic identification of functional noncoding variants. These variants can affect gene expression, gene regulation, and chromatin conformation, thereby contributing to disease pathogenesis. Understanding the mechanisms by which noncoding variants act in genetic diseases is indispensable for developing precisely targeted therapies and implementing personalized medicine strategies. The intricacies of noncoding regions introduce a multitude of challenges and research opportunities. In this review, we introduce a spectrum of noncoding variants involved in genetic diseases, along with research strategies and advanced technologies for identifying them precisely and for understanding the complexity of the noncoding genome in depth. We also discuss the research challenges and propose potential solutions for unraveling the genetic basis of rare and complex diseases.
AlphaFold2 (AF2) is an artificial intelligence (AI) system developed by DeepMind that can predict the three-dimensional (3D) structures of proteins from their amino acid sequences with atomic-level accuracy. Protein structure prediction is one of the most challenging problems in computational biology and chemistry and has puzzled scientists for 50 years. The advent of AF2 represents unprecedented progress in protein structure prediction and has attracted much attention. The subsequent release of more than 200 million protein structures predicted by AF2 further aroused great enthusiasm in the scientific community, especially in biology and medicine. AF2 is thought to have a significant impact on structural biology and on research areas that need protein structure information, such as drug discovery, protein design, and prediction of protein function. Although AF2 was developed only recently, there are already quite a few application studies of AF2 in biology and medicine, many of which have preliminarily demonstrated its potential. To better understand AF2 and promote its applications, in this article we summarize the principle and system architecture of AF2 and the keys to its success, with a particular focus on reviewing its applications in biology and medicine. Limitations of current AF2 predictions are also discussed.
Dear Editor, Pulmonary fibrosis (PF) is characterized by failed alveolar re-epithelialization and fibroblast activation.¹ A continuing struggle in the field has been how to diagnose and treat the disease early and effectively on the basis of shared pathogenetic mechanisms. Transforming growth factor-β (TGF-β) signaling and mitochondria are involved in PF pathogenesis.² Mitochondrial respiratory chain complexes (RCCs) I, III, IV, and V comprise both nuclear DNA (nDNA)- and mitochondrial DNA (mtDNA)-encoded subunits, and their biogenesis depends on the cooperation of cytoplasmic (cyto) and mitochondrial (mito) translation (Supplementary Fig. 1).³ To date, a correlation between PF and RCC biogenesis has not been reported.
With the rapid development of various omics and related technologies, as well as a revolutionary surge in computing power, bioinformatics has entered a period of unprecedented development opportunities. China's bioinformatics research, as a cohesive team effort, continues to grow and has achieved many gratifying discoveries [1,2]. The vigorous development of bioinformatics in China today is inseparable from the foundations laid and promoted by the older generation of scientists who pioneered bioinformatics research.
Eukaryotic genomes undergo pervasive transcription, generating vast amounts of noncoding RNAs alongside protein-coding mRNAs [1]. These noncoding RNAs, including small noncoding RNAs, long noncoding RNAs (lncRNAs), and circular RNAs, have been shown to play critical roles in gene regulation, chromatin remodeling, assembly of membraneless organelles, and other essential biological processes. They function through a diverse range of mechanisms [2], [3], [4], [5]. Dysregulation of noncoding RNAs contributes to human disease pathogenesis and affects plant development and stress responses [6], [7], [8]. Over the past decade, significant progress has been made in unraveling the functions of noncoding RNAs and elucidating the molecular mechanisms by which they operate. The involvement of noncoding RNAs in human disease pathogenesis and in the regulation of agronomic traits has garnered increasing attention.
Accumulating evidence suggests that non-coding RNAs (ncRNAs) are both widespread and functionally important in many eukaryotic organisms. In this study, we employed a special size-fractionation and cDNA library construction method, followed by 454 deep sequencing, to systematically profile rice intermediate-size ncRNAs. Our analysis identified 1349 ncRNAs in total, including 754 novel ncRNAs of unknown functional category. The chromosomal distribution of all identified ncRNAs showed no strand bias and displayed a pattern similar to that of protein-coding genes, with few chromosome dependencies. More than half of the ncRNAs were centered around the plus strand of the 5' and 3' termini of coding regions. Most of the novel ncRNAs were rice-specific, whereas 78% of the small nucleolar RNAs (snoRNAs) were conserved. Tandem duplication drove the expansion of over half of the snoRNA gene families. Furthermore, 90% of the snoRNA candidates were shown to produce small RNAs of 20–30 nt, 80% of which were associated with ARGONAUTE proteins generally, and with AGO1b in particular. Overall, our findings provide a comprehensive view of an intermediate-size non-coding transcriptome in a monocot species, which will serve as a useful platform for in-depth analysis of ncRNA functions.
Recent advances in genome-wide techniques have allowed the identification of thousands of non-coding RNAs of various sizes in eukaryotes, some of which have further been shown to serve important functions in many biological processes. However, in the model plant Arabidopsis, little is known about novel intermediate-sized ncRNAs (im-ncRNAs; 50–300 nt). Using a modified isolation strategy combined with deep-sequencing technology, we identified 838 im-ncRNAs genome-wide in Arabidopsis. More than half (58%) are new ncRNA species, mostly evolutionarily divergent. Interestingly, annotated protein-coding genes with novel im-ncRNAs derived from their 5' UTRs tend to be highly expressed. The average abundance of intergenic im-ncRNAs was comparable to that of mRNAs in seedlings, but subsets exhibited significantly lower expression in senescing leaves. Further, intergenic im-ncRNAs were regulated by genetic and epigenetic mechanisms similar to those of protein-coding genes, and some showed developmentally regulated expression patterns. Large-scale reverse genetic screening showed that down-regulation of a number of im-ncRNAs resulted in either obvious molecular changes or abnormal developmental phenotypes in vivo, indicating the functional importance of im-ncRNAs in plant growth and development. Together, our results demonstrate that novel Arabidopsis im-ncRNAs are developmentally regulated, functional components of the transcriptome.
regulation of miRNA genes contributes to the pathogenesis of a wide range of human diseases, including cancer. The TAR DNA-binding protein 43 (TDP-43), an RNA/DNA-binding protein associated with neurodegeneration, is involved in miRNA biogenesis. Here, we systematically examined miRNAs regulated by TDP-43 using RNA-Seq coupled with an siRNA-mediated knockdown approach. TDP-43 knockdown affected the expression of a number of miRNAs. In addition, TDP-43 down-regulation altered the patterns of different miRNA isoforms (isomiRs) and miRNA arm selection, suggesting a previously unknown role of TDP-43 in miRNA processing. A number of TDP-43-associated miRNAs, and their candidate target genes, are associated with human cancers. Our data reveal highly complex roles of TDP-43 in regulating different miRNAs and their target genes. Our results suggest that TDP-43 may promote migration of lung cancer cells by regulating miR-423-3p. In contrast, TDP-43 increases miR-500a-3p expression and binds to the mature miR-500a-3p sequence. Reduced expression of miR-500a-3p is associated with poor survival of lung cancer patients, suggesting that TDP-43 may have a suppressive role in cancer by regulating miR-500a-3p. The cancer-associated genes LIF and PAPPA are possible targets of miR-500a-3p. Our work suggests that TDP-43-regulated miRNAs may play multifaceted roles in the pathogenesis of cancer.
The central dogma states that genes encoded in DNA are first transcribed into messenger RNA (mRNA) and then translated into functional proteins (Crick, 1970). This dogma has been written into numerous textbooks and learned by myriad students. However, with the completion of the human genome project in June 2000, an astonishing fact was revealed: only 1.5% of the human genome encodes proteins (Lander et al., 2001; Venter et al., 2001). This fact raised three fundamental questions: (i) why does the human genome have so few protein-coding genes? (ii) how can the apparent differences between humans and other species be explained with such a limited set of coding genes? (iii) what are the roles of the noncoding regions of our genome?
Small proteins are proteins of fewer than 100 amino acids translated from small open reading frames (sORFs), which were usually missed in previous genome annotations. The significance of small proteins has been revealed in recent years, along with the discovery of their diverse functions. However, systematic annotation of small proteins is still insufficient. SmProt was developed specifically to provide valuable information on small proteins to the scientific community. Here we present an update of SmProt, which emphasizes the reliability of translated sORFs, genetic variants in translated sORFs, disease-specific sORF translation events or sequences, and a remarkably increased data volume. Further components, such as non-ATG translation initiation, function, and new sources, are also included. SmProt incorporates 638,958 unique small proteins curated from 3,165,229 primary records, which were computationally predicted from 419 ribosome profiling (Ribo-seq) datasets or collected from the literature and other sources, covering 370 cell lines or tissues in 8 species (Homo sapiens, Mus musculus, Rattus norvegicus, Drosophila melanogaster, Danio rerio, Saccharomyces cerevisiae, Caenorhabditis elegans, and Escherichia coli). In addition, small protein families identified from human microbiomes were also collected. All datasets in SmProt are freely accessible and available for browsing, searching, and bulk download at http://bigdata.ibp.ac.cn/SmProt/.
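The central object in SmProt is the sORF. The sketch below illustrates what "small" means here. It scans the three forward reading frames of a DNA string for ATG-initiated ORFs that encode fewer than 100 amino acids. `find_sorfs` is a hypothetical helper, not SmProt's pipeline, and it makes several simplifying assumptions:

- it ignores the reverse strand;
- it ignores nested start codons;
- it ignores the non-ATG initiation that SmProt also records;
- its `min_aa` cutoff is an assumption added for illustration.

SmProt entries come from Ribo-seq-based prediction and curation rather than from sequence scanning alone.

```python
# Hypothetical helper for illustration; not part of SmProt.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_sorfs(seq: str, max_aa: int = 100, min_aa: int = 10) -> list[tuple[int, int, int]]:
    """Return (start, end, length_aa) for ATG-initiated ORFs on the forward strand
    whose peptide (stop codon excluded) has min_aa <= length < max_aa residues.

    Only the first ATG after the previous stop (or the frame start) is used,
    so nested start codons inside a longer ORF are not reported separately.
    """
    seq = seq.upper()
    hits = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                length_aa = (i - start) // 3  # codons from ATG up to, not including, the stop
                if min_aa <= length_aa < max_aa:
                    hits.append((start, i + 3, length_aa))  # end coordinate includes the stop codon
                start = None  # look for the next ATG in this frame
    return hits


# Toy transcript: ATG + 20 alanine codons + TAA, i.e. a 21-residue peptide.
toy = "CC" + "ATG" + "GCT" * 20 + "TAA" + "GG"
print(find_sorfs(toy))
```

Taking only the first ATG keeps each reported ORF maximal within its frame. Handling alternative starts such as CTG or GTG would mean adding them to the start check.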
Dear Editor, Living systems such as Caenorhabditis elegans are organized by interaction networks of biopolymers and small molecules. The molecular engineering of specific targets to monitor and control these biological molecules has led to significant advances in our understanding of how biological systems are organized, maintain themselves, and disintegrate (Prescher and Bertozzi, 2005). For instance, green fluorescent protein (GFP), derived from the jellyfish Aequorea victoria, helps us to accurately locate and observe proteins over long periods. In addition, many novel LOV (light, oxygen, or voltage) domains, such as LOV2-Jα and Vivid, offer the possibility of cell-based motility management and tissue-based control of gene expression (Wu et al., 2009; Wang et al., 2012).
Unlike classic skarn-type scheelite deposits, which acquire sufficient Ca²⁺ directly from the surrounding limestones, the scheelite orebodies of the Shangfang tungsten (W) deposit occur mainly in amphibolite, which provides a new perspective on the mineralization mechanism of W deposits. The ability of hydrothermal scheelite (CaWO₄) to incorporate REE³⁺ into its Ca²⁺ lattice sites makes it a useful mineral for tracing fluid-rock interactions in hydrothermal mineralization systems. In this study, the REE compositions of scheelite and some silicate minerals were measured systematically in situ by laser ablation inductively coupled plasma mass spectrometry (LA-ICP-MS) to assess the extent of fluid-rock interaction in the Late Mesozoic quartz-vein-type Shangfang W deposit. According to the variations in CaO and REE among scheelite and silicate minerals, amphibole and actinolite in the amphibolite may be able to release large amounts of Ca²⁺ and REE³⁺ into the ore-forming fluids during chlorite alteration, which is critical for scheelite precipitation. Furthermore, an improved batch crystallization model was adopted to simulate scheelite precipitation and fluid evolution. The results of both the in-situ measurements and the model calculations demonstrate that early-stage scheelite precipitated with medium rare-earth element (MREE)-rich patterns and [Eu/Eu*]N < 1. Early-stage scheelite consumes more MREE than LREE and HREE from the fluid, gradually producing residual fluids with strong MREE depletion and [Eu/Eu*]N > 1. Even though the REE partition coefficients are constant, later-stage scheelite also inherits a certain degree of MREE depletion and the [Eu/Eu*]N features of the residual fluids.
As a common mineral, scheelite forms in various types of hydrothermal ore deposits (e.g., tungsten and gold deposits). Hence, the improved batch crystallization model could also be used to obtain detailed information on fluid evolution in other types of hydrothermal deposits. The model calculations also illustrate that the Eu anomalies of scheelite are not an effective index of the oxygen fugacity of the fluids but are instead dominantly controlled by the continuous precipitation of scheelite.
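The fractionation argument above can be illustrated with a generic stepwise batch-crystallization calculation. This is a minimal sketch, not the study's "improved" model, and it rests on three assumptions:

- the partition coefficients in `D` are hypothetical values, chosen only so that scheelite favours MREE and takes up Eu less readily than Sm and Gd;
- the starting fluid is flat, with every chondrite-normalized value equal to 1;
- each step precipitates an assumed mass fraction `frac` of the current system in equilibrium with the residual fluid.

The Eu anomaly uses the common geometric-mean definition [Eu/Eu*]N = Eu_N / sqrt(Sm_N × Gd_N).

```python
import math

# Hypothetical scheelite/fluid partition coefficients (illustrative only, not from the study).
# MREE (Sm, Eu, Gd) are favoured over LREE (La) and HREE (Yb); Eu is set below Sm and Gd.
D = {"La": 2.0, "Sm": 10.0, "Eu": 4.0, "Gd": 10.0, "Yb": 2.0}


def batch_step(fluid: dict[str, float], frac: float):
    """One closed-system batch step: a mass fraction `frac` precipitates as scheelite
    in equilibrium with the residual fluid. Mass balance C0 = frac*Cs + (1-frac)*Cl
    with Cs = D*Cl gives Cl = C0 / (frac*D + 1 - frac)."""
    residual = {el: c / (frac * D[el] + 1.0 - frac) for el, c in fluid.items()}
    mineral = {el: D[el] * residual[el] for el in fluid}
    return mineral, residual


def eu_anomaly(pattern: dict[str, float]) -> float:
    """[Eu/Eu*]N = Eu_N / sqrt(Sm_N * Gd_N) for chondrite-normalized values."""
    return pattern["Eu"] / math.sqrt(pattern["Sm"] * pattern["Gd"])


def simulate(steps: int, frac: float):
    """Repeat batch steps, each acting on the residual fluid of the previous one."""
    fluid = {el: 1.0 for el in D}  # assumed flat, chondrite-normalized starting fluid
    history = []
    for _ in range(steps):
        mineral, fluid = batch_step(fluid, frac)
        history.append((mineral, fluid))
    return history


history = simulate(steps=10, frac=0.05)
early_mineral, _ = history[0]
late_mineral, late_fluid = history[-1]
print(f"early scheelite [Eu/Eu*]N = {eu_anomaly(early_mineral):.2f}")
print(f"residual fluid [Eu/Eu*]N  = {eu_anomaly(late_fluid):.2f}")
print(f"residual fluid Sm_N/La_N  = {late_fluid['Sm'] / late_fluid['La']:.2f}")
```

Even with constant `D`, this sketch behaves as follows:

- the first precipitate is MREE-rich with [Eu/Eu*]N < 1;
- the residual fluid drifts toward MREE depletion and [Eu/Eu*]N > 1;
- late precipitates inherit part of that depletion.

This mirrors the qualitative behaviour described in the abstract, not its quantitative results.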
Estimating taxonomic content is a key problem in metagenomic sequencing data analysis. However, extracting such content from high-throughput next-generation sequencing data is very time-consuming with currently available software. Here, we present CloudLCA, a parallel LCA (lowest common ancestor) algorithm that significantly improves the efficiency of determining taxonomic composition in metagenomic data analysis. Results show that CloudLCA (1) has a running time that grows nearly linearly with dataset size, (2) displays linear speedup as the number of processors grows, especially for large datasets, and (3) reaches a speed of nearly 215 million reads per minute on a cluster with ten thin nodes. Compared with MEGAN, a well-known metagenome analyzer, CloudLCA is up to 5 times faster, and its peak memory usage is approximately 18.5% that of MEGAN when running on a fat node. CloudLCA can be run on a single multiprocessor node or on a cluster. It is expected to become part of MEGAN to accelerate read analysis. It generates the same output as MEGAN, which can be imported directly into MEGAN for subsequent analysis. Moreover, CloudLCA is a universal solution for finding the lowest common ancestor and can be applied in other fields requiring an LCA algorithm.
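The per-read operation being parallelized here is naive LCA assignment: each read is placed on the lowest taxon shared by all the reference taxa it hits. The sketch below shows this on a hypothetical toy taxonomy stored as a child-to-parent map, where the root is its own parent. Because read assignments are independent, it splits the reads into chunks.

This is an illustration under stated assumptions, not CloudLCA's or MEGAN's code:

- it omits the score-based hit filtering that MEGAN applies before LCA;
- Python threads share the GIL, so `ThreadPoolExecutor` here only demonstrates the chunked structure; real speedup would need processes or a cluster framework.

```python
from concurrent.futures import ThreadPoolExecutor


def lineage(taxon: str, parent: dict[str, str]) -> list[str]:
    """Path from `taxon` up to the root; the root is its own parent."""
    path = [taxon]
    while parent[taxon] != taxon:
        taxon = parent[taxon]
        path.append(taxon)
    return path


def lca(taxa: list[str], parent: dict[str, str]) -> str:
    """Lowest common ancestor of a non-empty list of taxa."""
    common = lineage(taxa[0], parent)  # ordered from the taxon up to the root
    for t in taxa[1:]:
        ancestors = set(lineage(t, parent))
        common = [a for a in common if a in ancestors]  # keeps upward order
    return common[0]  # deepest shared ancestor


def assign_chunk(chunk, parent):
    """Assign every read in one chunk to the LCA of its hit taxa."""
    return {read: lca(hits, parent) for read, hits in chunk}


def assign_reads(hits_by_read, parent, workers=4):
    """Split reads into roughly equal chunks and assign each chunk independently."""
    items = list(hits_by_read.items())
    size = max(1, -(-len(items) // workers))  # ceiling division
    chunks = [items[i:i + size] for i in range(0, len(items), size)]
    result = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(assign_chunk, chunks, [parent] * len(chunks)):
            result.update(partial)
    return result


# Hypothetical toy taxonomy and per-read hits.
parent = {
    "root": "root",
    "Bacteria": "root",
    "Proteobacteria": "Bacteria",
    "Escherichia coli": "Proteobacteria",
    "Salmonella enterica": "Proteobacteria",
    "Firmicutes": "Bacteria",
    "Bacillus subtilis": "Firmicutes",
}
hits = {
    "read1": ["Escherichia coli"],
    "read2": ["Escherichia coli", "Salmonella enterica"],
    "read3": ["Escherichia coli", "Bacillus subtilis"],
}
print(assign_reads(hits, parent, workers=2))
```

A read with one hit stays at that taxon, hits within one phylum collapse to the phylum, and hits across phyla rise to "Bacteria".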
This article records the author's experience of participating in early human genome and bioinformatics research in China, especially on the non-coding sequences of the genome. It also describes the beginning of human genome research in China, including the experts and teams involved in the International Human Genome Project. All progress in bioinformatics originates from the inheritance of theoretical biology and the groundwork laid by philosophers in China.
The Cys-rich domain, core region, and basic domain are highly conserved and very important for the trans-activation activity of the HIV-1 Tat trans-activator. The three-dimensional structures of 6 mutants of the HIV-1 Tat protein were constructed using molecular dynamics simulation. The structural variations of the mutants were analyzed, and the factors that led to the abolition of trans-activation activity are discussed.
With the development of computational methods and RNA sequencing technology for assembling the transcriptome, it is becoming clear that the mammalian genome is pervasively transcribed, and large numbers of long noncoding RNAs (lncRNAs), which make up the major part of the transcriptome, have been identified (Ravasi et al., 2006; Birney et al., 2007; …
In recent years, large numbers of non-coding RNAs (ncRNAs) have been identified in C. elegans, but their functions are still not well studied. In C. elegans, CEP-1 is the sole homolog of the p53 family of genes. To obtain transcription profiles of ncRNAs regulated by CEP-1 under normal and UV-stressed conditions, we applied the 'not-so-random' (NSR) hexamer priming strategy to RNA sequencing in C. elegans. This NSR-seq strategy efficiently depleted rRNA transcripts from the samples and showed high technical replicability. We identified more than 1,000 ncRNAs whose apparent expression was repressed by CEP-1, while around 200 were activated. Around 40% of the promoters of CEP-1-activated ncRNAs contain a putative CEP-1-binding site. CEP-1-regulated ncRNAs were frequently clustered and concentrated on the X chromosome. These results indicate that numerous ncRNAs are involved in the CEP-1 transcriptional network and that these are especially enriched on the X chromosome in C. elegans.
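The idea behind 'not-so-random' priming is to replace fully random hexamers with a subset that cannot anneal perfectly to rRNA, which steers reverse transcription away from rRNA. The sketch below builds such a pool using the simplest assumed criterion: drop every hexamer whose reverse complement, the RNA site it would pair with, occurs in the supplied rRNA sequences.

Several details are assumptions:

- the helper name `nsr_hexamers` and the toy rRNA string are hypothetical;
- a real design would use the organism's full rRNA sequences (here, C. elegans) and possibly further filters;
- this is not presented as the study's exact protocol.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def nsr_hexamers(rrna_seqs: list[str], k: int = 6) -> list[str]:
    """All k-mer primers that cannot base-pair perfectly with any rRNA k-mer.

    An RT primer anneals to the RNA template, so it matches rRNA exactly when
    its reverse complement appears in the rRNA sense sequence.
    """
    rrna_kmers = set()
    for seq in rrna_seqs:
        dna = seq.upper().replace("U", "T")  # read RNA using DNA letters
        for i in range(len(dna) - k + 1):
            rrna_kmers.add(dna[i:i + k])
    return [
        primer
        for primer in ("".join(p) for p in product("ACGT", repeat=k))
        if revcomp(primer) not in rrna_kmers
    ]


# Toy "rRNA" fragment (hypothetical); real use would pass full rRNA sequences.
pool = nsr_hexamers(["ACGUACGUAA"])
print(len(pool), "of", 4 ** 6, "hexamers kept")
```

With full-length rRNAs, the excluded set grows to a large fraction of the 4,096 possible hexamers. That is why such pools are much smaller than a random-hexamer mix.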
Although only about 2% of the human genome has proved to consist of protein-coding genes, recent advances in genome-wide analysis have revealed that the majority of the genome is transcribed, mainly from noncoding segments that were once considered "junk sequences" or "dark matter" (Liu et al., 2011a; Zhang et al., 2014b). In addition to the well-characterized housekeeping noncoding RNAs (ncRNAs) (tRNAs, rRNAs, small nuclear RNAs, and small nucleolar RNAs) and some small regulatory ncRNAs (microRNAs and small interfering RNAs), the mammalian transcriptome also pervasively includes long noncoding RNAs (lncRNAs, at least 200 nt) (Rinn and Chang, 2012; Xie et al., 2012).
Since the launch of the human genome sequencing project in the 1990s, genomic research has achieved definite results. By the beginning of the present century, the complete genomes of several model organisms had been sequenced. These included a number of prokaryotic microorganisms; the eukaryotes yeast (Saccharomyces cerevisiae), nematode (C. elegans), fruit fly (Drosophila melanogaster), and thale cress (Arabidopsis thaliana); and the major part of the human genome. These achievements signified the start of a new era of data mining and analysis of the human genome. The language of human genetics would gradually be read and understood, and the genetic information underlying metabolism, development, differentiation, and evolution would progressively become known to mankind. Large amounts of data are already accumulating, but many of the rules needed to interpret this information are still unknown. Bioinformatics research is thus not only becoming more important but also facing severe challenges as well as great opportunities.
The establishment of the US National Center for Biotechnology Information (NCBI) in 1988 marks a starting point for curating bioinformatic resources for the public [1]. One of its many purposes was to echo the Human Genome Project (HGP) initiative, launched at a time when two landmark reports were published simultaneously: "Mapping and Sequencing the Human Genome" by the National Research Council [2] and "Mapping Our Genes – The Genome Project: How Big, How Fast?" by the US Congress [3].
Funding (review of noncoding variants in genetic diseases): Supported by the National Key Research and Development Program of China (82030030); the 1·3·5 Project for Disciplines of Excellence, West China Hospital, Sichuan University (ZYJC20002) to H. Yuan; and the Sichuan Science and Technology Program (2022YFS0211) to K. Wu.
Funding (AlphaFold2 review): Supported by the National Key R&D Program of China (2021YFC2500203), the Beijing Natural Science Foundation Haidian Origination and Innovation Joint Fund (L222007), the National Natural Science Foundation of China (32070670), and the Innovation Project for Institute of Computing Technology, CAS (E161080).
Funding (pulmonary fibrosis letter): Supported by the National Key R&D Program of China (2018YFA0106900, 2018YFA0106901) and the National Natural Science Foundation of China (NSFC) (31830043).
Funding and acknowledgments (rice intermediate-size ncRNAs): This work was supported by grants from the National Basic Research Program of China (973 Program) (2012CB910900), the National Natural Science Foundation of China (31171156, U1031001), the Ministry of Science and Technology of China (2011CB100101, 2009DFB30030, 2008AA022301), and the Ministry of Agriculture of China (2008ZX08012-005, 2009ZX08012-021B). We thank Dr. Ning Wei and Abigail Coplin for reading and commenting on this manuscript. No conflict of interest declared.
Funding (Arabidopsis im-ncRNAs): Supported by grants from the National Basic Research Program of China (973 Program) and the National Natural Science Foundation of China, in part by the Peking-Tsinghua Center for Life Sciences, and by a grant from the Next-Generation BioGreen 21 Program, Rural Development Administration, Republic of Korea.
Funding: We thank Geir Skogerbø for careful reading of the manuscript and valuable suggestions. This work was supported by the National Natural Science Foundation of China (Grant Nos. 31520103905 and 31701122) and the National High Technology Research and Development Program ("863" Program) of China (2014AA021502). MC, LZ, and JL are supported by grants from the National Basic Research Program (973 Program) (No. 2013CB917803) and the National Natural Science Foundation of China (Grant No. 91132710). RK is supported by the National Natural Science Foundation of China (Grant No. 31501133). WM is supported by NIH (F30 NS090893). JYW is supported by NIH (R01CA175360).
Abstract: regulation of miRNA genes contributes to the pathogenesis of a wide range of human diseases, including cancer. The TAR DNA-binding protein 43 (TDP-43), an RNA/DNA-binding protein associated with neurodegeneration, is involved in miRNA biogenesis. Here, we systematically examined miRNAs regulated by TDP-43 using RNA-Seq coupled with siRNA-mediated knockdown. TDP-43 knockdown affected the expression of a number of miRNAs. In addition, TDP-43 down-regulation altered the patterns of different miRNA isoforms (isomiRs) and miRNA arm selection, suggesting a previously unknown role of TDP-43 in miRNA processing. A number of TDP-43-associated miRNAs, and their candidate target genes, are associated with human cancers. Our data reveal highly complex roles of TDP-43 in regulating different miRNAs and their target genes. Our results suggest that TDP-43 may promote migration of lung cancer cells by regulating miR-423-3p. In contrast, TDP-43 increases miR-500a-3p expression and binds to the mature miR-500a-3p sequence. Reduced expression of miR-500a-3p is associated with poor survival of lung cancer patients, suggesting that TDP-43 may have a suppressive role in cancer by regulating miR-500a-3p. The cancer-associated genes LIF and PAPPA are possible targets of miR-500a-3p. Our work suggests that TDP-43-regulated miRNAs may play multifaceted roles in the pathogenesis of cancer.
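The abstract refers to changes in miRNA arm selection after TDP-43 knockdown. As a hedged illustration only, one common way to summarize arm usage is a log2 ratio of 5p-arm to 3p-arm read counts, compared between control and knockdown libraries. The function names, the pseudocount, and the read counts below are hypothetical and are not taken from the study's pipeline.

```python
import math
from typing import Dict, Tuple

def arm_log_ratio(reads_5p: int, reads_3p: int, pseudocount: float = 1.0) -> float:
    """log2((5p + c) / (3p + c)); positive values indicate 5p-arm dominance.
    The pseudocount c (hypothetical default) avoids division by zero."""
    return math.log2((reads_5p + pseudocount) / (reads_3p + pseudocount))

def arm_shift(control: Dict[str, Tuple[int, int]],
              knockdown: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """Change in arm log-ratio (knockdown minus control) for miRNAs present in both."""
    shared = control.keys() & knockdown.keys()
    return {name: arm_log_ratio(*knockdown[name]) - arm_log_ratio(*control[name])
            for name in sorted(shared)}

# Hypothetical (5p, 3p) read counts per precursor; not data from the study.
control = {"mir-A": (299, 99), "mir-B": (50, 50)}
knockdown = {"mir-A": (99, 99), "mir-B": (50, 50)}
print(arm_shift(control, knockdown))
```

A negative shift means the knockdown library uses the 3p arm relatively more than the control library. A real analysis would also normalize library sizes and test significance across replicates.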
Funding: This work was supported by the National Natural Science Foundation of China (91940000). We thank Drs. Xiaorong Zhang and Jing Hu for critical reading of this manuscript. We apologize that many excellent works supported by the Major Research Program could not be highlighted in this comment due to space limitations.
Abstract: The central dogma states that genes encoded in DNA are first transcribed into messenger RNA (mRNA) and then translated into functional proteins (Crick, 1970). This dogma has been written into numerous textbooks and learned by myriad students. However, with the completion of the Human Genome Project in June 2000, an astonishing fact was revealed: only 1.5% of the human genome encodes proteins (Lander et al., 2001; Venter et al., 2001). This fact raised three fundamental questions: (i) why does the human genome have so few protein-coding genes? (ii) how can the apparent differences between humans and other species be explained with such a limited set of coding genes? (iii) what are the roles of the noncoding regions in our genome?
Funding: Supported by the National Key R&D Program of China (Grant No. 2016YFC0901702), the National Natural Science Foundation of China (Grant Nos. 81902519, 91940306, 31871294, 31701117, and 31970647), the National Key R&D Program of China (Grant Nos. 2017YFC0907503, 2016YFC0901002, and 2018YFA0106901), the Strategic Priority Research Program of Chinese Academy of Sciences (Grant No. XDB38040300), the 13th Five-year Informatization Plan of Chinese Academy of Sciences (Grant No. XXH13505-05), Special Investigation on Science and Technology Basic Resources, Ministry of Science and Technology, China (Grant No. 2019FY100102), and the National Genomics Data Center, China.
Abstract: Small proteins are proteins of fewer than 100 amino acids translated from small open reading frames (sORFs), which were usually missed in previous genome annotations. The significance of small proteins has become clear in recent years, along with the discovery of their diverse functions. However, systematic annotation of small proteins is still insufficient. SmProt was developed specifically to provide the scientific community with valuable information on small proteins. Here we present an update of SmProt, which emphasizes the reliability of translated sORFs, genetic variants in translated sORFs, disease-specific sORF translation events or sequences, and a remarkably increased data volume. Additional components such as non-ATG translation initiation, function, and new sources are also included. SmProt incorporates 638,958 unique small proteins curated from 3,165,229 primary records, which were either computationally predicted from 419 ribosome profiling (Ribo-seq) datasets or collected from the literature and other sources, covering 370 cell lines or tissues in 8 species (Homo sapiens, Mus musculus, Rattus norvegicus, Drosophila melanogaster, Danio rerio, Saccharomyces cerevisiae, Caenorhabditis elegans, and Escherichia coli). In addition, small protein families identified from human microbiomes were also collected. All datasets in SmProt are free to access and are available for browsing, searching, and bulk download at http://bigdata.ibp.ac.cn/SmProt/.
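The definition above (fewer than 100 amino acids, translated from an sORF) can be illustrated with a naive sequence scan. This is a minimal sketch that only considers ATG starts on the forward strand. SmProt itself relies on Ribo-seq evidence and literature curation and also covers non-ATG initiation, so this is not its method. The `min_aa` cutoff is a hypothetical parameter added to skip trivially short ORFs.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_sorfs(seq: str, max_aa: int = 100, min_aa: int = 10) -> List[Tuple[int, int, int]]:
    """Return (start, end, n_aa) for forward-strand, ATG-initiated ORFs with
    min_aa <= n_aa < max_aa. `end` is exclusive and includes the stop codon;
    n_aa counts the initiator Met but not the stop codon."""
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA input
    hits = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] != "ATG":
                i += 3
                continue
            j = i + 3
            # Walk codon by codon until an in-frame stop codon or the sequence end.
            while j + 3 <= len(seq) and seq[j:j + 3] not in STOP_CODONS:
                j += 3
            if j + 3 > len(seq):
                break  # no in-frame stop: open-ended ORF, ignored in this sketch
            n_aa = (j - i) // 3
            if min_aa <= n_aa < max_aa:
                hits.append((i, j + 3, n_aa))
            i = j + 3  # resume after the stop; nested downstream ATGs are not reported
    return hits

print(find_sorfs("ATG" + "GCT" * 20 + "TAA"))
```

Reporting only the first ATG of each ORF is a simplification. Ribo-seq-based pipelines instead decide which start codon is actually used from ribosome footprint evidence.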
Funding: This project was supported by the National Basic Research Program (973 Program) (Nos. 2010CB912303 and 2010CB833701), the National Natural Science Foundation of China (Grant Nos. 90913022 and 31170818), and a project from the Chinese Academy of Sciences (KSCX2-EW-Q-11).
Abstract: Dear Editor, Living systems such as Caenorhabditis elegans are organized by interaction networks of biopolymers and small molecules. The molecular engineering of specific targets to monitor and control these biological molecules has led to significant advances in our understanding of how biological systems are organized, maintain themselves, and disintegrate (Prescher and Bertozzi, 2005). For instance, green fluorescent protein (GFP) derived from the jellyfish Aequorea victoria helps us accurately locate and observe proteins over long periods. In addition, many novel LOV (light, oxygen, or voltage) domains such as LOV2-Jα and Vivid offer the possibility of cell-based motility management and tissue-based gene expression control (Wu et al., 2009; Wang et al., 2012).
Funding: Financially supported by the National Science Foundation of China (No. 41803012) and the China Postdoctoral Science Foundation (No. 2017M622546).
Abstract: Unlike classic skarn-type scheelite deposits, which acquire sufficient Ca2+ directly from surrounding limestones, all of the scheelite orebodies of the Shangfang tungsten (W) deposit occur mainly in amphibolite, which provides a new perspective on the mineralization mechanism of W deposits. Because hydrothermal scheelite (CaWO4) can incorporate REE3+ into its Ca2+ lattice sites, it is a useful mineral for tracing fluid-rock interactions in hydrothermal mineralization systems. In this study, the REE compositions of scheelite and some silicate minerals were measured systematically in situ by laser ablation inductively coupled plasma mass spectrometry (LA-ICP-MS) to assess the extent of fluid-rock interaction in the Late Mesozoic quartz-vein-type Shangfang W deposit. Based on the variations in CaO and REE among scheelite and silicate minerals, amphibole and actinolite in the amphibolite may release large amounts of Ca2+ and REE3+ into the ore-forming fluids during chlorite alteration, which is critical for scheelite precipitation. Furthermore, an improved batch crystallization model was adopted to simulate scheelite precipitation and fluid evolution. Both the in-situ measurements and the model calculations demonstrate the precipitation of early-stage scheelite that is rich in medium rare-earth elements (MREE) and has [Eu/Eu*]N < 1. The early-stage scheelite consumes more MREE than LREE and HREE from the fluid, which gradually produces residual fluids with strong MREE depletion and [Eu/Eu*]N > 1. Even though the REE partition coefficients are constant, later-stage scheelite also inherits a certain degree of MREE depletion and the [Eu/Eu*]N feature from the residual fluids. Because scheelite is a common mineral in various types of hydrothermal ore deposits (e.g., tungsten and gold deposits), the improved batch crystallization model may also yield detailed information on fluid evolution in other types of hydrothermal deposits. The model calculations further illustrate that the Eu anomalies of scheelite are not an effective index of fluid oxygen fugacity but are instead dominantly controlled by the continuous precipitation of scheelite.
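The abstract argues that constant partition coefficients can still shift the Eu anomaly and MREE pattern of later scheelite. That mechanism can be illustrated with a generic stepwise equilibrium (batch) mass balance. This is not the authors' improved model. The partition coefficients, the step size x, and the flat chondrite-normalized starting fluid below are hypothetical values chosen only to show the effect. Each step precipitates a mass fraction x of scheelite in equilibrium with the fluid, so C_f,old = (1 - x) * C_f,new + x * D * C_f,new. The Eu anomaly is computed as Eu_N / sqrt(Sm_N * Gd_N).

```python
import math
from typing import Dict, List, Tuple

# Hypothetical scheelite/fluid partition coefficients (illustrative, not measured):
# MREE (Sm, Gd) partition most strongly, Eu less so, LREE/HREE least.
D: Dict[str, float] = {"La": 2.0, "Nd": 5.0, "Sm": 8.0, "Eu": 3.0,
                       "Gd": 8.0, "Dy": 5.0, "Yb": 2.0}

def eu_anomaly(c: Dict[str, float]) -> float:
    """Eu/Eu* = Eu_N / sqrt(Sm_N * Gd_N); inputs are already chondrite-normalized."""
    return c["Eu"] / math.sqrt(c["Sm"] * c["Gd"])

def batch_step(fluid: Dict[str, float], x: float) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Precipitate a mass fraction x of scheelite in equilibrium with the fluid.
    Returns (residual fluid, precipitated scheelite) concentrations."""
    new_fluid = {el: fluid[el] / (1.0 - x + x * D[el]) for el in fluid}
    solid = {el: D[el] * new_fluid[el] for el in fluid}
    return new_fluid, solid

def simulate(n_steps: int, x: float = 0.05) -> List[Tuple[Dict[str, float], Dict[str, float]]]:
    fluid = {el: 1.0 for el in D}  # flat normalized pattern, so Eu/Eu* = 1 initially
    history = []
    for _ in range(n_steps):
        fluid, solid = batch_step(fluid, x)
        history.append((fluid, solid))
    return history

for step, (fluid, solid) in enumerate(simulate(10), start=1):
    print(step, round(eu_anomaly(solid), 3), round(eu_anomaly(fluid), 3))
```

With these made-up values, the first scheelite shows Eu/Eu* < 1 while the residual fluid develops Eu/Eu* > 1. After several steps, later scheelite carries a positive anomaly even though D never changes. This is the qualitative behavior the abstract attributes to continuous precipitation rather than to oxygen fugacity.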
Abstract: Estimating taxonomic content is a key problem in metagenomic sequencing data analysis. However, extracting such content from high-throughput next-generation sequencing data is very time-consuming with currently available software. Here, we present CloudLCA, a parallel LCA (lowest common ancestor) algorithm that significantly improves the efficiency of determining taxonomic composition in metagenomic data analysis. Results show that CloudLCA (1) has a running time that grows nearly linearly with dataset size, (2) displays linear speedup as the number of processors grows, especially for large datasets, and (3) reaches a speed of nearly 215 million reads per minute on a cluster with ten thin nodes. Compared with MEGAN, a well-known metagenome analyzer, CloudLCA is up to 5 times faster, and its peak memory usage is approximately 18.5% that of MEGAN when running on a fat node. CloudLCA can run on a single multiprocessor node or on a cluster. It is expected to become part of MEGAN to accelerate read analysis. It generates the same output as MEGAN, which can be imported directly into MEGAN for downstream analysis. Moreover, CloudLCA is a universal solution for finding the lowest common ancestor and can be applied in other fields requiring an LCA algorithm.
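The core operation that CloudLCA parallelizes is assigning each read to the lowest common ancestor of the taxa hit by its alignments. The sketch below shows only that per-read step, on a hypothetical toy taxonomy stored as child-to-parent links. It is not CloudLCA's or MEGAN's code, and it omits MEGAN's score and support filters. Because each read is handled independently, these per-read calls are the unit of work a parallel implementation could distribute across processors or nodes.

```python
from typing import Dict, Iterable, List

# Hypothetical toy taxonomy: child -> parent; the root points to itself.
PARENT: Dict[str, str] = {
    "root": "root",
    "Bacteria": "root",
    "Proteobacteria": "Bacteria",
    "Escherichia": "Proteobacteria",
    "E. coli": "Escherichia",
    "Salmonella": "Proteobacteria",
    "Firmicutes": "Bacteria",
    "Bacillus": "Firmicutes",
}

def path_to_root(taxon: str, parent: Dict[str, str]) -> List[str]:
    """Return [taxon, its parent, ..., root]."""
    path = [taxon]
    while parent[taxon] != taxon:
        taxon = parent[taxon]
        path.append(taxon)
    return path

def lca(taxa: Iterable[str], parent: Dict[str, str]) -> str:
    """Deepest node shared by the root paths of all taxa (taxa must be non-empty)."""
    taxa = list(taxa)
    common = path_to_root(taxa[0], parent)  # ordered deepest -> root
    for t in taxa[1:]:
        ancestors = set(path_to_root(t, parent))
        common = [a for a in common if a in ancestors]  # order is preserved
    return common[0]

def assign_reads(read_hits: Dict[str, List[str]], parent: Dict[str, str]) -> Dict[str, str]:
    """Assign every read with at least one hit to the LCA of its hit taxa."""
    return {read: lca(hits, parent) for read, hits in read_hits.items() if hits}

print(assign_reads({"r1": ["E. coli", "Salmonella"], "r2": ["E. coli", "Bacillus"]}, PARENT))
```

Walking parent links is O(depth) per taxon, which is adequate for a sketch. Production tools typically precompute node depths or Euler-tour indices to make repeated LCA queries cheaper.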
Abstract: This article records the author's experience of participating in early human genome and bioinformatics research in China, especially research on the non-coding sequences of the genome. It also describes the beginnings of human genome research in China, including the experts and teams involved in the International Human Genome Project. All progress in bioinformatics originates from the heritage of theoretical biology and the groundwork laid by philosophers in China.
Abstract: The Cys-rich domain, core region, and basic domain are highly conserved and very important for the trans-activation activity of the HIV-1 Tat trans-activator. The three-dimensional structures of 6 mutants of the HIV-1 Tat protein were constructed using molecular dynamics simulation. The structural variations of the mutants were analyzed, and the factors that abolish trans-activation activity are discussed.
Funding: Supported by a grant from the National Natural Science Foundation of China (No. 31300889).
Abstract: With the development of computational methods and RNA sequencing technology for transcriptome assembly, it is becoming clear that the mammalian genome is pervasively transcribed, and large numbers of long noncoding RNAs (lncRNAs), which make up the major part of the transcriptome, have been identified (Ravasi et al., 2006; Birney et al., 2007;
Abstract: In recent years, large numbers of non-coding RNAs (ncRNAs) have been identified in C. elegans, but their functions are still not well studied. In C. elegans, CEP-1 is the sole homolog of the p53 family of genes. To obtain transcription profiles of ncRNAs regulated by CEP-1 under normal and UV-stressed conditions, we applied the 'not-so-random' (NSR) hexamer priming strategy to RNA sequencing in C. elegans. This NSR-seq strategy efficiently depleted rRNA transcripts from the samples and showed high technical replicability. We identified more than 1,000 ncRNAs whose apparent expression was repressed by CEP-1, while around 200 were activated. Around 40% of the promoters of CEP-1-activated ncRNAs contain a putative CEP-1-binding site. CEP-1-regulated ncRNAs were frequently clustered and concentrated on the X chromosome. These results indicate that numerous ncRNAs are involved in the CEP-1 transcriptional network and that these are especially enriched on the X chromosome in C. elegans.
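The NSR idea is to prime reverse transcription with a hexamer pool from which primers that would anneal to rRNA have been removed. Below is a minimal sketch of that selection step, assuming exact matching only: a hexamer primer is treated as rRNA-annealing if its reverse complement occurs anywhere in the supplied rRNA sequences. The rRNA string in the example is a hypothetical placeholder. A real design would use the full C. elegans rRNA sequences and might also consider mismatches.

```python
from itertools import product
from typing import Iterable, List

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(s: str) -> str:
    return s.translate(COMPLEMENT)[::-1]

def nsr_hexamers(rrna_seqs: Iterable[str], k: int = 6) -> List[str]:
    """Return all k-mers whose reverse complement never occurs in the rRNA sequences,
    i.e. primers that cannot anneal perfectly to any supplied rRNA."""
    rrna_kmers = set()
    for s in rrna_seqs:
        s = s.upper().replace("U", "T")  # accept RNA or DNA notation
        for i in range(len(s) - k + 1):
            rrna_kmers.add(s[i:i + k])
    all_kmers = ("".join(p) for p in product("ACGT", repeat=k))
    # A primer anneals where the RNA contains the primer's reverse complement.
    return [h for h in all_kmers if reverse_complement(h) not in rrna_kmers]

pool = nsr_hexamers(["GCUAGCUAGGCAUUCGAUCG"])  # hypothetical rRNA fragment
print(len(pool))
```

With a full-length rRNA set, the excluded fraction is much larger than in this toy case. That exclusion is what lets NSR priming deplete rRNA-derived cDNA without a separate rRNA removal step.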
Funding: Supported by grants from the National Key Research and Development Plan (2016YFA0100702, 2016YFC0902502), the National Key Basic Research Program (973 Program) (Nos. 2013CB531304 and 2011CBA01104), the National Sciences Foundation of China (Nos. 31301152, 31670789, 31671316, 31370789, and 30825023), and the CAMS Innovation Fund for Medical Sciences (CIFMS, 2016-I2M-2-001, 2016-I2M-1-001, 2016-I2M-1-004).
Abstract: Although only about 2% of the human genome has proved to consist of protein-coding genes, recent advances in genome-wide analysis have revealed that the majority of the genome is transcribed, mainly from noncoding segments that were once considered "junk sequences" or "dark matter" (Liu et al., 2011a; Zhang et al., 2014b). In addition to the well-characterized housekeeping noncoding RNAs (ncRNAs) (tRNA, rRNA, small nuclear RNA, and small nucleolar RNAs) and some small regulatory ncRNAs (microRNAs and small interfering RNAs), mammalian genomes are also pervasively transcribed into long noncoding RNAs (lncRNAs, at least 200 nt) (Rinn and Chang, 2012; Xie et al., 2012).
Abstract: Since the launch of the human genome sequencing project in the 1990s, genomic research has achieved definite results. By the beginning of the present century, the complete genomes of several model organisms had been sequenced, including a number of prokaryotic microorganisms and the eukaryotes yeast (Saccharomyces cerevisiae), nematode (C. elegans), fruit fly (Drosophila melanogaster), and thale cress (Arabidopsis thaliana), as well as the major part of the human genome. These achievements signified that a new era of data mining and analysis of the human genome had begun. The language of human genetics would gradually be read and understood, and the genetic information underlying metabolism, development, differentiation, and evolution would progressively become known to mankind. Large amounts of data are already accumulating, but many of the rules that should guide the understanding of this information are still unknown. Bioinformatics research is thus not only becoming more important but also faces severe challenges as well as great opportunities.
Funding: Supported by the National High-tech R&D Program of China (863 Program Grant Nos. 2012AA020402 and 2012AA02A202).
Abstract: A starting point for curating bioinformatic resources for the public is marked by the establishment of the US National Center for Biotechnology Information (NCBI) in 1988 [1]. One of its many purposes was certainly to echo the initiative of the Human Genome Project (HGP), at a time when two landmark reports were published simultaneously: "Mapping and Sequencing the Human Genome" by the National Research Council [2] and "Mapping Our Genes – The Genome Project: How Big, How Fast?" by the US Congress [3].