Abstract
Objective: To construct a classifier for microRNA precursors (pre-miRNAs) with high sensitivity and high specificity. Methods: MiRscreen, a support vector machine (SVM) classifier that distinguishes pre-miRNAs from pseudo pre-miRNAs, was built from a training set of 300 experimentally validated human pre-miRNAs (positive samples) and 300 pseudo pre-miRNAs (negative samples) randomly selected from 3′ UTR fragments that fold into stem-loop structures. To improve classifier performance, a genetic algorithm was used to search for C and γ, two parameters that strongly affect SVM performance. Results and conclusion: On the training set, sensitivity was 99.33% and specificity was 100%. On the remaining 91 human pre-miRNAs and 91 pseudo pre-miRNAs from 3′ UTRs, sensitivity and specificity were 91.21% (83/91) and 93.41% (85/91), respectively. Of 1353 experimentally validated pre-miRNAs from 20 animal and virus species other than human, MiRscreen correctly identified 1192, a sensitivity of 88.10%; sensitivity reached 100% for eight species: Marek's disease virus, rhesus lymphocryptovirus, Epstein-Barr virus, simian virus 40, Xenopus laevis, Canis familiaris, Ovis aries and Macaca mulatta. On a mixed negative set of 1353 pseudo pre-miRNAs (556 folded from 100 randomly selected RefSeq genes and 797 randomly selected from folds of human chromosome 19), specificity was 85.14% (1152/1353). Compared with six other methods of the same kind, MiRscreen performed well in both sensitivity and specificity on the independent test set and had the highest classification accuracy, 86.62%, more than 6% above the other methods. Its AUC of 0.938 was also clearly higher than that of each of the other six methods. MiRscreen can therefore facilitate the experimental identification of pre-miRNAs.
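The reported percentages can be checked directly from the counts given in the abstract. The short sketch below recomputes them; the helper name `rate` is illustrative, and the assumption that the 86.62% accuracy is taken over the pooled independent test set (1353 positives plus 1353 negatives) is not stated in the abstract, although the arithmetic is consistent with it.

```python
def rate(correct, total):
    """Return correct/total as a percentage rounded to two decimals."""
    return round(100.0 * correct / total, 2)

# Counts reported in the abstract for the independent test sets.
tp, positives = 1192, 1353  # non-human pre-miRNAs correctly recognised
tn, negatives = 1152, 1353  # pseudo pre-miRNAs correctly rejected

sensitivity = rate(tp, positives)
specificity = rate(tn, negatives)
# Assumption: accuracy is computed over positives and negatives pooled.
accuracy = rate(tp + tn, positives + negatives)

# Held-out human set: 91 pre-miRNAs and 91 pseudo pre-miRNAs.
holdout_sensitivity = rate(83, 91)
holdout_specificity = rate(85, 91)

print(sensitivity, specificity, accuracy)
print(holdout_sensitivity, holdout_specificity)
```

Sensitivity here is the fraction of true pre-miRNAs recognised, and specificity is the fraction of pseudo pre-miRNAs rejected, so the two are computed on disjoint sample sets.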
Source
《军事医学科学院院刊》
CSCD
Peking University Core Journals
2008, Issue 3, pp. 287-292 (6 pages)
Bulletin of the Academy of Military Medical Sciences
Funding
National Natural Science Foundation of China (grants 30500105, 30470411)
Keywords
microRNAs
classification
genetic algorithm
support vector machines