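Abstract: Small open reading frames (sORFs) generally refer to open reading frames in the genome that can encode short peptides of about 100 amino acids or fewer. They are widespread in plant genomes but, because they encode short peptides, are often overlooked in genome annotation. With the development of translatomic and proteomic sequencing technologies, translationally active sORFs have been shown to be widespread in plant genomes and to participate in regulating important processes such as plant growth and development. This article summarizes recent research progress on plant sORFs, mainly covering their origin and classification, bioinformatic prediction methods, and biological functions, and on this basis offers an outlook on future directions for plant sORF research.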
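To make the length criterion concrete, the following is a minimal Python sketch that scans the forward strand of a DNA sequence for ATG-initiated ORFs encoding at most 100 amino acids. It is not a method from the review. The helper name `find_sorfs`, the ATG-only start rule, the forward-strand-only scan, and the 10-residue lower bound are illustrative assumptions. Real sORF predictors typically add further evidence such as Ribo-seq three-nucleotide periodicity or sequence conservation.

```python
from typing import List, Tuple

# Stop codons of the standard genetic code.
STOPS = {"TAA", "TAG", "TGA"}


def find_sorfs(seq: str, max_aa: int = 100, min_aa: int = 10) -> List[Tuple[int, int, int]]:
    """Return (start, end, peptide_length) for ATG...stop ORFs on the forward strand.

    Hypothetical helper (assumption, not from the source): coordinates are 0-based,
    `end` is exclusive and includes the stop codon, and `peptide_length` counts
    residues from the initiating Met up to, but not including, the stop codon.
    """
    seq = seq.upper()
    hits = []
    for frame in range(3):
        start = None  # position of the first ATG in the current open stretch
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                aa_len = (i - start) // 3
                # Keep only ORFs within the assumed sORF length window.
                if min_aa <= aa_len <= max_aa:
                    hits.append((start, i + 3, aa_len))
                start = None  # wait for the next ATG after this stop
    return sorted(hits)
```

Because only the first ATG of each open stretch is used, internal in-frame ATGs are not reported as separate, shorter sORFs. Whether to enumerate them is a design choice that real annotation pipelines handle differently.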
Funding: Supported by the National Key R&D Program of China (Grant No. 2016YFC0901702), the National Natural Science Foundation of China (Grant Nos. 81902519, 91940306, 31871294, 31701117, and 31970647), the National Key R&D Program of China (Grant Nos. 2017YFC0907503, 2016YFC0901002, and 2018YFA0106901), the Strategic Priority Research Program of the Chinese Academy of Sciences (Grant No. XDB38040300), the 13th Five-year Informatization Plan of the Chinese Academy of Sciences (Grant No. XXH13505-05), the Special Investigation on Science and Technology Basic Resources, Ministry of Science and Technology, China (Grant No. 2019FY100102), and the National Genomics Data Center, China.
Abstract: Small proteins specifically refer to proteins of fewer than 100 amino acids translated from small open reading frames (sORFs), which were usually missed in previous genome annotations. The significance of small proteins has been revealed in recent years, along with the discovery of their diverse functions. However, systematic annotation of small proteins is still insufficient. SmProt was specially developed to provide valuable information on small proteins for the scientific community. Here we present an update of SmProt, which emphasizes the reliability of translated sORFs, genetic variants in translated sORFs, disease-specific sORF translation events or sequences, and a remarkably increased data volume. More components, such as non-ATG translation initiation, function, and new sources, are also included. SmProt incorporates 638,958 unique small proteins curated from 3,165,229 primary records, which were computationally predicted from 419 ribosome profiling (Ribo-seq) datasets or collected from the literature and other sources, covering 370 cell lines or tissues in 8 species (Homo sapiens, Mus musculus, Rattus norvegicus, Drosophila melanogaster, Danio rerio, Saccharomyces cerevisiae, Caenorhabditis elegans, and Escherichia coli). In addition, small protein families identified from human microbiomes were also collected. All datasets in SmProt are freely accessible and available for browsing, searching, and bulk download at http://bigdata.ibp.ac.cn/SmProt/.
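Abstract: Small open reading frames (sORFs) are widespread in the genomes of diverse organisms. Because their sequences are short and their products, small proteins (also called microproteins or miniproteins), are difficult to detect, sORFs have long been insufficiently annotated and studied. In recent years, with continuing advances in high-throughput sequencing, translatomics, and mass spectrometry, large numbers of new sORFs have been discovered in various organisms, and the small proteins they encode and the translational regulation they mediate have been applied in research on drug development and plant disease-resistance mechanisms. However, research on and applications of sORFs in microorganisms remain relatively limited. This article reviews recent progress in the discovery and identification of sORF-encoded small proteins and in the regulation of mRNA translation by upstream open reading frames (uORFs), with a focus on the identification and functional characterization of sORFs in microbial genomes. It aims to provide a reference for a deeper understanding of the functions and mechanisms of microbial sORFs, as well as for research on small proteins and translational regulation in higher organisms such as plants and animals.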
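uORFs are commonly defined as ORFs whose start codon lies in the 5′UTR, upstream of the main coding sequence (CDS). The Python function below illustrates that definition only and is a hedged sketch. The function name `find_uorfs` and the `UORF` record are hypothetical. The function takes a transcript string and the 0-based index of the annotated CDS start codon, reports every ATG-initiated uORF in any frame, and flags whether each one terminates before the CDS or overlaps it. It assumes ATG starts and the three standard stop codons. It does not determine whether a given uORF is actually translated, or how strongly it represses translation of the downstream CDS.

```python
from dataclasses import dataclass
from typing import List, Optional

# Stop codons of the standard genetic code.
STOPS = {"TAA", "TAG", "TGA"}


@dataclass
class UORF:
    start: int            # 0-based position of the uORF ATG
    end: Optional[int]    # exclusive end (after the stop codon); None if no in-frame stop
    overlaps_cds: bool    # True if the uORF runs past the main CDS start


def find_uorfs(transcript: str, cds_start: int) -> List[UORF]:
    """Find ATG-initiated ORFs whose start codon lies entirely in the 5'UTR.

    Hypothetical helper: every ATG is reported, so nested in-frame ATGs
    produce separate entries.
    """
    seq = transcript.upper()
    result = []
    for i in range(cds_start - 2):  # i + 3 <= cds_start keeps the ATG inside the 5'UTR
        if seq[i:i + 3] != "ATG":
            continue
        end = None
        # Walk downstream in the uORF's own frame until the first stop codon.
        for j in range(i + 3, len(seq) - 2, 3):
            if seq[j:j + 3] in STOPS:
                end = j + 3
                break
        overlaps = end is None or end > cds_start
        result.append(UORF(start=i, end=end, overlaps_cds=overlaps))
    return result
```

The distinction between contained and overlapping uORFs is kept because the two are often treated as separate classes when translational regulation is discussed.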
Funding: Supported by the National Natural Science Foundation of China (No. 81870237).
Abstract: With the development of modern sequencing techniques and bioinformatics, genomic regions once thought to be noncoding have been found to encode abundant functional micropeptides (miPs), a class of small polypeptides. Although miPs are difficult to analyze and identify, a growing number of studies have begun to focus on them. More and more miPs have been shown to be essential for energy metabolism homeostasis, immune regulation, and tumor growth and development. Many reports have shown that miPs are particularly important for regulating glucose and lipid metabolism and mitochondrial function. MiPs are also involved in the progression of related diseases. This paper reviews the sources and identification of miPs, as well as the functional significance of miPs in metabolism-related diseases, with the aim of revealing their potential clinical applications.
Funding: This work is supported by the National Natural Science Foundation of China (nos. 31872872 and U1804113), the National Key Research and Development Program of China (no. 2016YFD0101003), and the Henan Association for Science and Technology.
Abstract: Non-conventional peptides (NCPs), which include small open reading frame-encoded peptides, play critical roles in fundamental biological processes. In this study, we developed an integrated peptidogenomic pipeline that uses high-throughput mass spectra to probe a customized six-frame translation database, and applied it to the large-scale identification of NCPs in plants. A total of 1993 and 1860 NCPs were unambiguously identified in maize and Arabidopsis, respectively. These NCPs showed distinct characteristics compared with conventional peptides and were derived from introns, 3′UTRs, 5′UTRs, junctions, and intergenic regions. Furthermore, our results showed that translation events in unannotated transcripts occur more broadly than previously thought. In addition, we found that dozens of maize NCPs are enriched within regions associated with phenotypic variation and domestication selection, indicating that they are potentially involved in the genetic regulation of complex traits and domestication in maize. Taken together, our study developed an integrated peptidogenomic pipeline for the large-scale identification of NCPs in plants, which should facilitate the global characterization of NCPs in other plants. The large-scale identification of NCPs in both a monocot (maize) and a dicot (Arabidopsis) indicates that a large portion of the plant genome can be translated into biologically functional molecules, which has important implications for functional genomic studies.
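A short sketch can illustrate how a six-frame translation database is built. The Python code below is not the authors' pipeline and rests on several assumptions. It translates a nucleotide sequence in the three forward and three reverse-complement frames using the standard genetic code (NCBI table 1). It then splits each frame at stop codons into candidate peptide stretches. The function names and the 8-residue minimum length are hypothetical choices. A real peptidogenomic workflow would also record genomic coordinates and digest candidates in silico (for example, with trypsin cleavage rules) before matching spectra.

```python
from typing import Dict, List

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate_frame(seq: str, frame: int) -> str:
    """Translate seq starting at offset `frame`; unknown codons (e.g. with N) become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(frame, len(seq) - 2, 3))


def six_frame_database(seq: str, min_len: int = 8) -> Dict[str, List[str]]:
    """Map frame labels (+1..+3, -1..-3) to stop-delimited peptide stretches (hypothetical helper)."""
    seq = seq.upper()
    strands = {"+": seq, "-": reverse_complement(seq)}
    db: Dict[str, List[str]] = {}
    for sign, strand in strands.items():
        for frame in range(3):
            protein = translate_frame(strand, frame)
            # Split at stop codons ('*') and keep stretches long enough to be searchable.
            db[f"{sign}{frame + 1}"] = [p for p in protein.split("*") if len(p) >= min_len]
    return db
```

The sketch splits at stops rather than requiring an ATG start, so candidates with non-canonical starts are retained. This plausibly matters when the goal is to capture peptides from UTRs, introns, and intergenic regions.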