Identification of MicroRNA Precursors with Support Vector Machine and String Kernel (times cited: 1)

Abstract: MicroRNAs (miRNAs) are a family of short (21-23 nt) regulatory non-coding RNAs processed from longer (70-110 nt) miRNA precursors (pre-miRNAs). Distinguishing true from false precursors plays an important role in the computational identification of miRNAs. In existing approaches, numerical features are extracted from precursor sequences and their secondary structures to suit particular classification methods; however, such features may lose useful discriminative information hidden in the sequences and structures. In this study, pre-miRNA sequences and their secondary structures are used directly to construct an exponential kernel based on the weighted Levenshtein distance between two sequences. This string kernel is then combined with a support vector machine (SVM) to detect true and false pre-miRNAs. Using 331 training samples of true and false human pre-miRNAs, two key SVM parameters are selected by 5-fold cross-validation and grid search, and 5 realizations with different 5-fold partitions are executed. Across 16 independent test sets (3 human, 8 animal, 2 plant, 1 virus, and 2 artificially generated false human pre-miRNA sets), our method statistically outperforms the previous SVM-based technique on 11 sets, including 3 human, 7 animal, and 1 false human pre-miRNA set. In particular, pre-miRNAs with multiple loops, which were usually excluded in previous work, are correctly identified in this study with an accuracy of 92.66%.
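The abstract describes the kernel only at a high level: an exponential function of a weighted Levenshtein distance computed on pre-miRNA sequences together with their secondary structures. The Python sketch below illustrates that idea under assumptions that are ours, not the paper's: each position is encoded as a (nucleotide, dot-bracket symbol) pair, a substitution costs 0.5 per mismatched component, insertions and deletions cost 1, and the kernel has the form K(x, y) = exp(-gamma * d(x, y)). The function names and the `gamma` parameter are hypothetical and chosen for illustration only.

```python
import math

# Hypothetical encoding (an assumption, not taken from the paper): each
# position is a (nucleotide, dot-bracket symbol) pair, e.g. ("G", "(").
Symbol = tuple[str, str]


def pair_symbols(seq: str, structure: str) -> list[Symbol]:
    """Zip an RNA sequence with its dot-bracket secondary structure."""
    if len(seq) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    return list(zip(seq.upper(), structure))


def substitution_cost(a: Symbol, b: Symbol) -> float:
    """Assumed costs: 0.5 for a nucleotide mismatch plus 0.5 for a
    structure-symbol mismatch, so identical pairs cost 0 and pairs that
    differ in both components cost 1."""
    return 0.5 * (a[0] != b[0]) + 0.5 * (a[1] != b[1])


def weighted_levenshtein(x, y, sub=substitution_cost, indel=1.0) -> float:
    """Edit distance by dynamic programming, keeping one row at a time.

    `sub(p, q)` prices a substitution; `indel` prices an insertion or deletion.
    """
    prev = [j * indel for j in range(len(y) + 1)]
    for i in range(1, len(x) + 1):
        curr = [i * indel] + [0.0] * len(y)
        for j in range(1, len(y) + 1):
            curr[j] = min(
                prev[j] + indel,                        # delete x[i-1]
                curr[j - 1] + indel,                    # insert y[j-1]
                prev[j - 1] + sub(x[i - 1], y[j - 1]),  # match or substitute
            )
        prev = curr
    return prev[-1]


def exp_kernel(x, y, gamma: float) -> float:
    """Exponential string kernel K(x, y) = exp(-gamma * d(x, y))."""
    return math.exp(-gamma * weighted_levenshtein(x, y))


def gram_matrix(samples, gamma: float) -> list[list[float]]:
    """Symmetric kernel matrix over all samples (diagonal entries are 1)."""
    n = len(samples)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            K[i][j] = K[j][i] = exp_kernel(samples[i], samples[j], gamma)
    return K


if __name__ == "__main__":
    # Toy hairpins, not real pre-miRNAs.
    a = pair_symbols("GGGAAACCC", "(((...)))")
    b = pair_symbols("GGGAUACCC", "(((...)))")  # one A->U change in the loop
    print(weighted_levenshtein(a, b))             # 0.5: only the nucleotide differs
    print(round(exp_kernel(a, b, gamma=1.0), 4))  # 0.6065
```

A Gram matrix built this way could be passed to an SVM implementation that accepts precomputed kernels (for example, scikit-learn's `SVC(kernel="precomputed")`), with the SVM parameters then chosen by cross-validated grid search as the abstract describes. One caveat of this design: exponentiated edit distances are not guaranteed to give a positive semidefinite Gram matrix for every `gamma`, so the solver may have to cope with an indefinite kernel.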
Source: Genomics, Proteomics & Bioinformatics (indexed in SCIE, CAS, CSCD), 2008, No. 2, pp. 121-128 (8 pages).
Funding: National Natural Science Foundation of China (Nos. 60405001 and 60875001); Natural Science Foundation of Jiangsu Province, China (No. BK2004142).
Keywords: string kernel, support vector machine, microRNA, precursor, weighted Levenshtein distance