Abstract
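The recent discovery of small open reading frames (sORFs) with peptide-coding capacity in non-coding RNA has renewed interest in this long-neglected genomic element, and sORFs have quickly become a focus of current research. Because of their low expression level, low abundance, and short sequences, effective research methods and data resources for peptide-coding sORFs are still lacking. Existing studies concentrate on a few eukaryotic model organisms, and very little work has addressed prokaryotes, which are widespread in nature. The discovery of peptide-coding sORFs therefore poses a serious challenge to genome annotation in the current precision era. Against this background, this paper first systematically examines the distribution and functional characteristics of peptide-coding sORFs shorter than 100 amino acids in more than 80 prokaryotes of different types, and compares the sequence composition, distribution, and evolutionary features of sORFs across length ranges. The results show that peptide-coding sORFs are widespread in prokaryotic genomes and that, as sequence length decreases, sequence complexity decreases and the biological functions performed become relatively concentrated. On this basis, and in light of the current state of the field, the paper summarizes the problems and challenges in peptide-coding sORF research, laying a solid theoretical foundation for future work.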
Whether a transcript encodes protein has been the gold standard for distinguishing protein-coding genes from non-coding RNA (ncRNA), but the recent detection of peptide-coding small open reading frames (sORFs) in lncRNA has challenged this standard. A growing number of studies show that peptide-coding sORFs occur throughout different regions of eukaryotic genomes and play important roles in biological activities. Because of their low expression level, low abundance, and short sequence length, few computational or experimental methods or data resources have been developed for peptide-coding sORFs, so their study is still at an early stage. At present, most studies of peptide-coding sORFs concentrate on a few model eukaryotes, and little is known about their intrinsic features; peptide-coding sORFs therefore pose additional challenges for genome annotation in the era of precision medicine. In this work, a comprehensive sequence and function analysis of peptide-coding sORFs was performed for the first time on more than 80 prokaryotic genomes. The results show that peptide-coding sORFs are also widespread in prokaryotic genomes and that many peptide-coding sORF sequences are conserved among different genomes. Further analysis indicates that, as the sequence length of peptide-coding sORFs decreases, sequence complexity decreases and their functions become relatively concentrated. Finally, we summarize the problems and challenges posed by peptide-coding sORFs, which provides a solid theoretical basis for future sORF-related studies.
Source
《生物化学与生物物理进展》
SCIE
CAS
CSCD
PKU Core (北大核心)
2018, Issue 1, pp. 59-67 (9 pages)
Progress In Biochemistry and Biophysics
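Funding
Supported by the National Natural Science Foundation of China (61771093, 61671107) and the Key Program of the Natural Science Foundation of Shandong Province (ZR2016JL027).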