Abstract
This paper presents a new method for predicting human ncRNA genes based on a support vector machine (SVM). First, human ncRNA and mRNA sequence data are extracted from the GENCODE and UCSC databases, and 86 features, such as mononucleotide and dinucleotide frequencies, are selected as the raw data. Second, the discrete wavelet transform (DWT) is used to remove redundant information and noise. Finally, an ncRNA gene prediction model that combines the discrete wavelet transform with the support vector machine (DWT-SVM) is built. Experimental results show that the DWT-SVM model predicts ncRNA in the test set with an accuracy of 93.71%, which is better than the results of the PCA-SVM and DWT-KNN prediction models.
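The first two stages of the pipeline described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not list all 86 features or name the wavelet family, so the sketch uses only the 4 mononucleotide and 16 dinucleotide frequencies and a single-level Haar transform. The function names `kmer_frequencies`, `haar_dwt`, and `sequence_features` are hypothetical.

```python
import math
from itertools import product

ALPHABET = "ACGT"  # RNA input is mapped onto the DNA alphabet (U -> T)


def kmer_frequencies(seq, k):
    """Relative frequency of every k-mer over ACGT, in lexicographic order."""
    seq = seq.upper().replace("U", "T")
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows containing N or other symbols
            counts[window] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]


def haar_dwt(signal):
    """Single-level Haar DWT: returns (approximation, detail) coefficients."""
    signal = list(signal)
    if len(signal) % 2:
        signal.append(signal[-1])  # pad odd-length input by repeating the last value
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def sequence_features(seq):
    """Assumed reduced feature vector: Haar approximation of the raw frequencies."""
    raw = kmer_frequencies(seq, 1) + kmer_frequencies(seq, 2)  # 4 + 16 = 20 values
    approx, _detail = haar_dwt(raw)  # detail coefficients are discarded here
    return approx  # 10 values


if __name__ == "__main__":
    print(sequence_features("ACGUACGUAAGGCCUU"))
```

Keeping only the approximation coefficients halves the feature dimension, which is one common way a DWT is used to suppress noise and redundancy. The final stage, training an SVM classifier on these reduced vectors (for example with scikit-learn's `SVC`), is omitted to keep the sketch dependency-free.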
Source
Journal of Qingdao University of Science and Technology: Natural Science Edition (《青岛科技大学学报(自然科学版)》)
CAS
2017, No. 2, pp. 112-118 (7 pages)
Funding
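National Natural Science Foundation of China (51372125)
Natural Science Foundation of Shandong Province (ZR2013AM007, ZR2014FL021)
Shandong Province Higher Education Science and Technology Program (J13LI54)
Qingdao University of Science and Technology Undergraduate Innovation Training Program (201606002)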
Keywords
non-coding RNA
gene prediction
support vector machine
discrete wavelet transform