SRF-LDA: A Stacking-Based Ensemble Learning Model for LncRNA-Disease Association Prediction
Abstract: Long non-coding RNAs (lncRNAs) are non-coding RNAs longer than 200 nt and an important component of the non-coding genome. Many experiments have confirmed that lncRNAs are closely involved in the occurrence and development of human diseases. However, disease relationships are known for only a small fraction of lncRNAs, and most remain to be studied. Accurately identifying disease-related lncRNAs therefore helps to elucidate the mechanisms by which lncRNAs act in disease and to explore new treatments. To improve the prediction of lncRNA-disease associations (LDAs), this study implements an LDA prediction model based on stacking ensemble learning, SRF-LDA. The model has three parts. First, three types of features are fused as model input: lncRNA K-mer features, the Gaussian interaction profile kernel similarity of diseases, and known LDAs. Second, a stacking ensemble learning strategy classifies the fused features: several random forest classifiers with different parameters serve as base models, and a support vector machine serves as the meta-model that combines and optimizes their outputs, yielding more accurate and robust LDA predictions. Third, the model is trained and evaluated with ten-fold cross-validation. The results show that the method performs well in predicting LDAs, with an average AUC of 0.9246 and an average AUPR of 0.9166, outperforming several existing LDA prediction models.
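The feature-fusion step described in the abstract could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the K-mer length, the kernel bandwidth, or how the three feature types are combined, so the names `kmer_frequencies`, `gip_kernel`, `pair_features`, the parameter `gamma_prime`, the toy sequences, and the concatenation order are all hypothetical. The Gaussian interaction profile kernel uses the commonly cited form K(i, j) = exp(-γ‖p_i − p_j‖²), with γ normalized by the mean squared profile norm.

```python
import math
from itertools import product


def kmer_frequencies(seq, k=3, alphabet="ACGU"):
    """Return the normalized k-mer frequency vector (length 4**k) of an RNA sequence."""
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper().replace("T", "U")  # accept DNA-style input as well
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows containing ambiguous bases such as N
            counts[window] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]


def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel between the rows of a binary matrix.

    K[i][j] = exp(-gamma * ||p_i - p_j||^2), gamma = gamma_prime / mean(||p_i||^2).
    """
    norms = [sum(v * v for v in row) for row in profiles]
    mean_norm = sum(norms) / len(norms)
    gamma = gamma_prime / mean_norm if mean_norm else 0.0

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [[math.exp(-gamma * sq_dist(p, q)) for q in profiles] for p in profiles]


def pair_features(l_idx, d_idx, sequences, assoc, disease_sim, k=3):
    """Fused feature vector for one (lncRNA, disease) pair.

    Assumption: fusion is plain concatenation of the lncRNA k-mer vector, the
    disease's GIP similarity row, and the lncRNA's known-association profile.
    The queried entry is zeroed so the label does not leak into the features.
    """
    masked = list(assoc[l_idx])
    masked[d_idx] = 0
    return kmer_frequencies(sequences[l_idx], k) + list(disease_sim[d_idx]) + masked


# Toy data (hypothetical): 3 lncRNAs x 3 diseases, 1 = known association.
sequences = ["AUGGCUAUGG", "GGCCAUUAGC", "UUAGGCAUCG"]
assoc = [[1, 0, 1], [0, 1, 0], [1, 0, 0]]

# A disease's interaction profile is its column of the association matrix.
disease_profiles = [list(col) for col in zip(*assoc)]
disease_sim = gip_kernel(disease_profiles)

features = pair_features(0, 2, sequences, assoc, disease_sim, k=2)
print(len(features))
```

Vectors built this way for known and sampled unknown pairs would then be the input to the stacked classifier (random forest base models with a support vector machine meta-model) that the abstract describes.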
Authors: 孙捷 (Sun Jie), 谭者斌 (Tan Zhebin)
Source: Hans Journal of Computational Biology (《计算生物学》), CAS, 2023, Issue 4, pp. 35-44 (10 pages)