SRF-LDA: A Stacking-Based Ensemble Learning Model for LncRNA-Disease Association Prediction


Abstract: Long non-coding RNAs (lncRNAs) are non-coding RNAs longer than 200 nt and are an important component of the non-coding genome. Many experiments have confirmed that lncRNAs are closely involved in the occurrence and development of human diseases. However, the disease relationships of only a small fraction of lncRNAs are known, and those of most lncRNAs remain to be studied. Accurately identifying disease-related lncRNAs therefore helps to study the mechanisms by which lncRNAs act in disease and to explore new treatments. In this study, to improve the prediction of lncRNA-disease associations (LDAs), we implemented an LDA prediction model based on stacking ensemble learning, called SRF-LDA. SRF-LDA has three parts. The first part fuses three types of features as model input: lncRNA K-mer features, the Gaussian interaction profile kernel similarity of diseases, and known LDAs. The second part applies a stacking ensemble strategy: several random forest classifiers with different parameters serve as base models that classify the features, and a support vector machine serves as the meta-model that combines and optimizes the random forest outputs, yielding more accurate and robust LDA predictions. The third part trains and evaluates the model with ten-fold cross-validation. The results show that the method performs well in predicting LDAs, with an average AUC of 0.9246 and an average AUPR of 0.9166, outperforming several existing LDA prediction models.
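The pipeline summarized in the abstract can be illustrated with a minimal sketch, assuming scikit-learn and NumPy. It is not the authors' implementation: the function names, the random forest settings, the RBF kernel for the support vector machine, and the inner cross-validation of the stacking step are all assumptions chosen for illustration, and the abstract does not say how the three feature types are concatenated. Average precision is used here as a common approximation of AUPR.

```python
from itertools import product

import numpy as np
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.svm import SVC


def kmer_frequencies(seq, k=3):
    """Normalized k-mer frequency vector (4**k entries) of a nucleotide sequence."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = np.zeros(len(kmers))
    seq = seq.upper().replace("U", "T")
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])  # windows with ambiguous bases are skipped
        if j is not None:
            counts[j] += 1
    total = counts.sum()
    return counts / total if total > 0 else counts


def gip_kernel_similarity(assoc):
    """Gaussian interaction profile kernel between the rows of a 0/1 matrix.

    For disease similarity, pass the association matrix with diseases as rows.
    """
    assoc = np.asarray(assoc, dtype=float)
    sq_norms = (assoc ** 2).sum(axis=1)
    mean_sq = sq_norms.mean()
    gamma = 1.0 / mean_sq if mean_sq > 0 else 1.0  # bandwidth scaled by mean profile norm
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * assoc @ assoc.T
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


def build_stacking_model(rf_settings=((50, None), (100, 10), (200, 5)),
                         inner_cv=5, random_state=0):
    """Random forests with different (n_estimators, max_depth) feed an SVM meta-model."""
    base_models = [
        (f"rf_{n}_{d}",
         RandomForestClassifier(n_estimators=n, max_depth=d, random_state=random_state))
        for n, d in rf_settings  # hypothetical settings, not taken from the paper
    ]
    return StackingClassifier(
        estimators=base_models,
        final_estimator=SVC(kernel="rbf"),  # kernel choice is an assumption
        stack_method="predict_proba",
        cv=inner_cv,
    )


def evaluate(model, X, y, n_splits=10, random_state=0):
    """Mean AUC and mean average precision over stratified k-fold cross-validation."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=random_state)
    scores = cross_validate(model, X, y, cv=cv,
                            scoring=("roc_auc", "average_precision"))
    return scores["test_roc_auc"].mean(), scores["test_average_precision"].mean()
```

In such a setup, each lncRNA-disease pair would be represented by one feature vector (for example, the lncRNA's K-mer frequencies joined with the disease's similarity row) and labeled 1 for a known association or 0 otherwise, after which `evaluate(build_stacking_model(), X, y)` returns the two averaged scores.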

Authors: Sun Jie (孙捷), Tan Zhebin (谭者斌)

Affiliations: School of Science, Dalian Jiaotong University; School of Software, Dalian Jiaotong University

Source: Hans Journal of Computational Biology (《计算生物学》), CAS, 2023, No. 4, pp. 35-44 (10 pages)

Keywords: lncRNA; disease; lncRNA-disease association; random forest; variable importance; feature selection; support vector machine

Classification number: R73 [Medicine and Health: Oncology]

