
Prediction of ncRNA-protein interactions based on machine learning methods

Times cited: 4
Abstract
Objective: Noncoding RNA-protein interactions (ncRPIs) are biologically important, and predicting them has become one of the main approaches to studying the functions of noncoding RNAs (ncRNAs) and proteins.
Methods: Features were extracted from the sequences of ncRNAs and proteins, and the raw data were preprocessed with a convolutional autoencoder (CAE). Three machine learning models, LightGBM (LBM), random forest (RF) and extreme gradient boosting (XGB), were then trained to predict ncRPIs.
Results: Under 5-fold cross-validation on the RPI369 and RPI488 datasets, all three models achieved high prediction accuracy. On RPI369 the accuracies were 0.757 (LBM), 0.791 (RF) and 0.791 (XGB); on RPI488 they were 0.918 (LBM), 0.908 (RF) and 0.918 (XGB). The three models also achieved high AUC (area under the curve) values on the larger RPI1807, RPI2241 and RPI13254 datasets: all three reached an AUC of 0.99 on RPI1807, and the lowest AUC among the three models was 0.87 on RPI2241 and 0.81 on RPI13254.
Conclusions: Machine learning methods can predict whether an ncRNA and a protein interact.
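The abstract does not say which sequence features were extracted or how the evaluation was coded, so the following is only a minimal, standard-library sketch under stated assumptions. It uses k-mer composition (3-mers for RNA, single amino-acid composition for protein) as a hypothetical stand-in for the paper's unspecified encoding, omits the convolutional autoencoder and the LightGBM/RF/XGBoost classifiers entirely, and assumes, as is usual, that AUC means the area under the ROC curve. All function names are hypothetical. It shows how an RNA-protein pair becomes a fixed-length vector, how shuffled 5-fold cross-validation splits are formed, and how accuracy and AUC are computed.

```python
from collections import Counter
from itertools import product
import random

RNA_ALPHABET = "ACGU"
PROTEIN_ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def kmer_composition(seq, alphabet, k):
    """Frequency of every k-mer over `alphabet`, in itertools.product order.

    Windows containing symbols outside the alphabet are counted in the
    denominator but match no feature, so such vectors sum to less than 1.
    """
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    windows = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(windows)
    total = max(len(windows), 1)  # avoid division by zero for short sequences
    return [counts[m] / total for m in kmers]


def pair_features(rna, protein, rna_k=3, protein_k=1):
    """Hypothetical pair encoding: RNA k-mer composition + protein k-mer composition."""
    rna = rna.upper().replace("T", "U")  # accept DNA-style input
    return (kmer_composition(rna, RNA_ALPHABET, rna_k)
            + kmer_composition(protein.upper(), PROTEIN_ALPHABET, protein_k))


def kfold_indices(n, n_splits=5, seed=0):
    """Yield (train, test) index lists for shuffled, unstratified k-fold CV."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_splits] for i in range(n_splits)]
    for i, test in enumerate(folds):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield sorted(train), sorted(test)


def accuracy(y_true, y_pred):
    """Fraction of predicted labels that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """ROC AUC as P(random positive scores above random negative); ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    print(len(pair_features("ACGUUAGC", "MKVLA")))              # 84
    print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))                  # 0.75
    print(roc_auc([1, 1, 0, 0], [0.9, 0.2, 0.5, 0.1]))           # 0.75
```

In a full pipeline, the `train`/`test` indices from `kfold_indices` would select rows of the feature matrix used to fit a classifier such as LightGBM, RF or XGBoost; its predicted labels would feed `accuracy` and its predicted probabilities would feed `roc_auc`. The pairwise AUC loop costs O(positives × negatives), which is fine for RPI369-sized data; a rank-based formula scales better for sets as large as RPI13254.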
Authors: CHENG Shuping (程淑萍), TAN Jianjun (谭建军), MEN Jingrui (门婧睿)
Affiliation: College of Life Science and Bioengineering, Beijing University of Technology; Intelligent Physiological Measurement and Clinical Translation, Beijing International Base for Scientific and Technological Cooperation; Beijing 100124
Source: Beijing Biomedical Engineering (《北京生物医学工程》), 2019, No. 4, pp. 353-359 (7 pages)
Funding: Supported by the National Natural Science Foundation of China (21173014)
Keywords: ncRNA-protein interaction; LightGBM; random forest; extreme gradient boosting; convolutional autoencoder