Prediction of plant long non-coding RNA by fusing multiple features

Abstract: Long non-coding RNAs (lncRNAs) are RNA transcripts longer than 200 nt that lack protein-coding ability. Studies have shown that lncRNAs play important roles in regulating plant growth and development, epigenetic responses, and responses to various stresses. Compared with research in humans and animals, however, the study of plant lncRNAs is still in its infancy, and accurately identifying lncRNAs among large numbers of transcripts remains one of the key problems in the field. In this study, we constructed new plant lncRNA and mRNA datasets, analyzed the sequence and structural characteristics of the plant lncRNAs they contain, and extracted k-mer frequency, secondary structure, open reading frame (ORF), and sequence geometric flexibility features. Using a Support Vector Machine (SVM) evaluated with the jackknife test, we predicted plant lncRNAs and measured how fusing different features affects prediction performance; the accuracy reached 96.14%.
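The abstract names k-mer frequency and ORF features and fuses several feature types, but gives no parameters. The sketch below shows one common way to compute these two feature groups and fuse them by simple concatenation. Everything specific here is an assumption rather than the paper's settings: the k values (1 to 3), the ACGU alphabet with T mapped to U, the forward-strand AUG-to-stop ORF definition, and the hypothetical function names `kmer_frequencies`, `longest_orf`, and `fused_features`. The secondary-structure and geometric-flexibility features are not reproduced.

```python
from itertools import product

# Hypothetical helpers; k values and the ORF definition are assumptions,
# not the settings used in the paper.

def kmer_frequencies(seq, k):
    """Normalized k-mer frequencies over ACGU, in lexicographic k-mer order."""
    seq = seq.upper().replace("T", "U")
    kmers = ["".join(p) for p in product("ACGU", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # skip windows containing N or other ambiguous bases
            counts[word] += 1
            total += 1
    return [counts[w] / total if total else 0.0 for w in kmers]

STOP_CODONS = {"UAA", "UAG", "UGA"}

def longest_orf(seq):
    """Length (nt, stop codon included) of the longest AUG...stop ORF on the forward strand."""
    seq = seq.upper().replace("T", "U")
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "AUG":
                start = i  # first AUG after the previous stop opens an ORF
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None  # look for the next AUG in this frame
    return best

def fused_features(seq, ks=(1, 2, 3)):
    """Fuse feature groups by concatenation: k-mer frequencies, ORF length, ORF coverage."""
    vec = []
    for k in ks:
        vec.extend(kmer_frequencies(seq, k))
    orf_len = longest_orf(seq)
    vec.extend([orf_len, orf_len / len(seq) if seq else 0.0])
    return vec

print(len(fused_features("GGAUGCCCUGAGG")))  # 4 + 16 + 64 + 2 = 86
```

Concatenation is the simplest fusion scheme; because ORF length in nucleotides is on a very different scale from the frequencies, the fused vectors would normally be standardized before being passed to an SVM.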
Authors: YAN Lingjuan, CHEN Yingli, YAN Dongxue, FAN Zhiyu (School of Physical Science and Technology, Inner Mongolia University, Hohhot 010021, China)
Source: Chinese Journal of Bioinformatics, 2021, Issue 2, pp. 128-135 (8 pages)
Funding: National Natural Science Foundation of China (Nos. 61861035 and 31870838).
Keywords: plant lncRNA; feature extraction; multiple features fusion; Support Vector Machine
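The abstract evaluates an SVM with the jackknife test, which in sequence-classification work means leave-one-out cross-validation: each sample is held out once, the model is trained on the remaining n-1 samples, and accuracy is the fraction of held-out samples predicted correctly. The abstract does not name the SVM implementation, kernel, or parameters, so the self-contained sketch below swaps in a linear SVM trained with the Pegasos subgradient method; a kernel SVM library such as LIBSVM would be the more typical choice. The function names and the defaults for `lam`, `epochs`, and `seed` are hypothetical.

```python
import random

# Stand-in linear SVM (Pegasos subgradient training); not necessarily the
# SVM, kernel, or parameters used in the paper. Labels must be +1 / -1.

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Minimize lam/2 * ||w||^2 + mean hinge loss; a constant 1 feature acts as the bias."""
    rng = random.Random(seed)
    Xb = [list(x) + [1.0] for x in X]
    w = [0.0] * len(Xb[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(Xb)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, Xb[i]))
            w = [(1 - eta * lam) * wj for wj in w]  # shrink (regularization step)
            if margin < 1:  # hinge-loss subgradient step on a margin violation
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, Xb[i])]
    return w

def predict(w, x):
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1

def jackknife_accuracy(X, y, **svm_kwargs):
    """Leave-one-out (jackknife) test: train on n-1 samples, predict the held-out one."""
    correct = 0
    for i in range(len(X)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        w = train_linear_svm(X_train, y_train, **svm_kwargs)
        correct += predict(w, X[i]) == y[i]
    return correct / len(X)

# Toy, clearly separable data standing in for fused lncRNA (+1) / mRNA (-1) vectors.
X = [[2, 2], [3, 3], [2, 3], [3, 2], [-2, -2], [-3, -3], [-2, -3], [-3, -2]]
y = [1, 1, 1, 1, -1, -1, -1, -1]
print(jackknife_accuracy(X, y))
```

The jackknife retrains the model once per sample, so its cost grows linearly with dataset size on top of the training cost; comparing feature combinations, as the abstract describes, amounts to rerunning `jackknife_accuracy` on different fused feature matrices.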