Abstract
Objective: Noncoding RNA-protein interactions (ncRPI) are of great biological significance, and predicting them has become an important approach to studying the functions of noncoding RNA (ncRNA) and proteins. Methods: Features were extracted from the sequences of ncRNAs and proteins, and the raw data were preprocessed with a convolutional autoencoder (CAE). Three machine learning models, LightGBM (LBM), random forest (RF) and extreme gradient boosting (XGB), were then trained to predict ncRPI. Results: The three models were evaluated by 5-fold cross validation (CV) on the RPI369 and RPI488 datasets. On RPI369 the accuracies were 0.757 (LBM), 0.791 (RF) and 0.791 (XGB); on RPI488 they were 0.918 (LBM), 0.908 (RF) and 0.918 (XGB). The three models also achieved high area under curve (AUC) values on the larger datasets RPI1807, RPI2241 and RPI13254: all three models reached an AUC of 0.99 on RPI1807, and the lowest AUC among the three models was 0.87 on RPI2241 and 0.81 on RPI13254. Conclusions: Machine learning methods can be used to predict whether an ncRNA and a protein interact.
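The sequence-based feature extraction and the 5-fold cross validation mentioned in the abstract can be illustrated with a short sketch. The abstract does not say which sequence features were used, so the k-mer frequency encoding below is an assumption chosen only for illustration. The function names (`kmer_frequencies`, `pair_features`, `k_fold_indices`), the choices of k, and the toy sequences are likewise hypothetical. The CAE preprocessing and the LBM, RF and XGB classifiers are not reproduced here.

```python
from itertools import product
import random

RNA_ALPHABET = "ACGU"
PROTEIN_ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def kmer_frequencies(seq, alphabet, k):
    """Return normalized k-mer frequencies of seq, in a fixed k-mer order."""
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip windows containing unknown characters
            counts[kmer] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[km] / total for km in kmers]


def pair_features(rna_seq, protein_seq, rna_k=3, protein_k=2):
    """Concatenate RNA and protein k-mer frequencies (assumed encoding)."""
    return (kmer_frequencies(rna_seq, RNA_ALPHABET, rna_k)
            + kmer_frequencies(protein_seq, PROTEIN_ALPHABET, protein_k))


def k_fold_indices(n_samples, n_folds=5, seed=0):
    """Shuffle sample indices and yield (train, test) index lists per fold."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for i in range(n_folds):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


if __name__ == "__main__":
    # Toy sequences, not taken from any RPI dataset.
    features = pair_features("ACGUACGUGGCA", "MKTAYIAKQR")
    print(len(features))  # 4**3 + 20**2 feature values
    for train, test in k_fold_indices(10):
        print(sorted(test))
```

Under these assumptions, each ncRNA-protein pair becomes a fixed-length vector that a classifier could consume, and every sample appears in exactly one test fold across the five splits.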
Authors
程淑萍
谭建军
门婧睿
CHENG Shuping; TAN Jianjun; MEN Jingrui (College of Life Science and Bioengineering, Beijing University of Technology, Intelligent Physiological Measurement and Clinical Translation, Beijing International Base for Scientific and Technological Cooperation, Beijing 100124)
Source
《北京生物医学工程》
2019, Issue 4, pp. 353-359 (7 pages)
Beijing Biomedical Engineering
Funding
Supported by the National Natural Science Foundation of China (21173014)