Abstract
Studies have shown that long non-coding RNAs (lncRNAs) play important roles in the biological processes of many organisms. Identifying potential lncRNA-disease associations (LDAs) helps elucidate disease pathogenesis and supports the timely diagnosis, prevention, and treatment of diseases. In this paper, we propose MGSGCN, a graph convolutional network model based on multi-graph structures and an attention mechanism, to predict LDAs. The model integrates disease semantic similarity, lncRNA functional similarity, and the Gaussian interaction profile (GIP) kernel similarity and cosine similarity of diseases and lncRNAs to construct feature vectors for diseases and lncRNAs. Building on a graph convolutional network (GCN) and a graph attention network (GAT), MGSGCN is trained and used for prediction with a multi-graph structural strategy that extracts enclosing subgraphs and propagates interaction information. Under five-fold cross-validation (5-CV), MGSGCN achieves accuracies of 94.55% on Dataset1 and 87.44% on Dataset2. MGSGCN was also compared with four computational models from previous studies, and the evaluation metrics indicate that it achieves good classification performance. In addition, a case study of lncRNAs associated with cervical cancer shows that MGSGCN predicts LDAs that have not yet been experimentally confirmed, suggesting that the model can identify new LDAs.
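The abstract lists GIP kernel similarity and cosine similarity among the inputs to the lncRNA and disease feature vectors but does not give their formulas. The sketch below assumes the commonly used definitions: each lncRNA's interaction profile is its row of the binary lncRNA-disease association matrix (each disease's profile is its column), the GIP similarity is exp(-γ·||x_i - x_j||²) with bandwidth γ = γ' / (mean squared profile norm) and γ' = 1, and cosine similarity is the normalized dot product of two profiles. The function names and the toy matrix are hypothetical, and how MGSGCN combines these matrices with semantic and functional similarity is not described in the abstract, so the sketch stops at the two similarity matrices.

```python
import math
from typing import List

Matrix = List[List[float]]


def gip_kernel_similarity(profiles: Matrix, gamma_prime: float = 1.0) -> Matrix:
    """GIP kernel similarity between the rows of `profiles` (hypothetical helper).

    Assumes the common bandwidth normalization gamma = gamma_prime / mean(||x_i||^2).
    """
    n = len(profiles)
    mean_sq_norm = sum(sum(v * v for v in row) for row in profiles) / n
    # An all-zero matrix would make gamma undefined; fall back to gamma = 0.
    gamma = gamma_prime / mean_sq_norm if mean_sq_norm > 0 else 0.0
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            sim[i][j] = math.exp(-gamma * sq_dist)
    return sim


def cosine_similarity(profiles: Matrix) -> Matrix:
    """Cosine similarity between rows; all-zero profiles get similarity 0."""
    n = len(profiles)
    norms = [math.sqrt(sum(v * v for v in row)) for row in profiles]
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if norms[i] > 0 and norms[j] > 0:
                dot = sum(a * b for a, b in zip(profiles[i], profiles[j]))
                sim[i][j] = dot / (norms[i] * norms[j])
    return sim


# Hypothetical toy association matrix: 3 lncRNAs (rows) x 3 diseases (columns),
# where 1 marks a known association and 0 an unknown one.
A = [
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
]
disease_profiles = [list(col) for col in zip(*A)]  # diseases are the columns of A

lnc_gip = gip_kernel_similarity(A)
lnc_cos = cosine_similarity(A)
dis_gip = gip_kernel_similarity(disease_profiles)
dis_cos = cosine_similarity(disease_profiles)

print(round(lnc_gip[0][1], 4))  # 0.4724  (gamma = 0.75, squared distance = 1)
print(round(lnc_cos[0][1], 4))  # 0.7071  (1 / sqrt(2))
```

GIP similarity stays positive even for profiles with no shared associations, whereas cosine similarity is 0 for such pairs, which is one plausible reason to use both views side by side.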
Source
Hans Journal of Biomedicine
2024, Issue 3, pp. 457-470 (14 pages)