Abstract
Small proteins specifically refer to proteins consisting of fewer than 100 amino acids translated from small open reading frames (sORFs), which were usually missed in previous genome annotation. The significance of small proteins has been revealed in recent years, along with the discovery of their diverse functions. However, systematic annotation of small proteins is still insufficient. SmProt was specially developed to provide valuable information on small proteins for the scientific community. Here we present the update of SmProt, which emphasizes reliability of translated sORFs, genetic variants in translated sORFs, disease-specific sORF translation events or sequences, and remarkably increased data volume. More components such as non-ATG translation initiation, function, and new sources are also included. SmProt incorporates 638,958 unique small proteins curated from 3,165,229 primary records, which were computationally predicted from 419 ribosome profiling (Ribo-seq) datasets or collected from literature and other sources from 370 cell lines or tissues in 8 species (Homo sapiens, Mus musculus, Rattus norvegicus, Drosophila melanogaster, Danio rerio, Saccharomyces cerevisiae, Caenorhabditis elegans, and Escherichia coli). In addition, small protein families identified from human microbiomes were also collected. All datasets in SmProt are free to access, and available for browsing, searching, and bulk download at http://bigdata.ibp.ac.cn/SmProt/.
Funding
Supported by the National Key R&D Program of China (Grant No. 2016YFC0901702)
National Natural Science Foundation of China (Grant Nos. 81902519, 91940306, 31871294, 31701117, and 31970647)
the National Key R&D Program of China (Grant Nos. 2017YFC0907503, 2016YFC0901002, and 2018YFA0106901)
the Strategic Priority Research Program of Chinese Academy of Sciences (Grant No. XDB38040300)
the 13th Five-year Informatization Plan of Chinese Academy of Sciences (Grant No. XXH13505-05)
Special Investigation on Science and Technology Basic Resources, Ministry of Science and Technology, China (Grant No. 2019FY100102)
the National Genomics Data Center, China.