Abstract
To address the problem that a single type of biological data cannot reveal complex biological processes and disease mechanisms, a disease-causing gene prediction method based on multi-information fusion, DGPMIF, is proposed. First, a heterogeneous network with disease-phenotype, disease-gene, protein-protein, and gene-ontology associations is constructed; a network embedding algorithm extracts low-dimensional vector representations of the nodes in this network, and a network topology algorithm is combined with it to extract network structural features. Second, cosine similarity is used to measure the similarity of node vectors and to predict the relationships between diseases and genes. Finally, the effectiveness of DGPMIF is verified through case studies of specific diseases and comparison with classic disease-causing gene prediction methods. The results show that different types of association data play an important role in enhancing disease-causing gene prediction, and that multi-level information fusion improves prediction performance. DGPMIF can efficiently mine the information contained in the network and offers a useful reference for research on predicting disease-gene associations.
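The similarity step described in the abstract can be illustrated with a minimal sketch. It assumes that each disease and gene already has an embedding vector; the toy vectors, the gene labels, and the helper names `cosine_similarity` and `rank_genes` are hypothetical and are not taken from the paper. The network embedding and topology feature extraction are not reproduced here, so this should be read as an illustration of cosine-based ranking only, not as the authors' implementation.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # A zero vector has no direction; treat it as dissimilar.
        return 0.0
    return dot / (norm_u * norm_v)


def rank_genes(disease_vec, gene_vecs):
    """Return (gene, score) pairs sorted from most to least similar."""
    scores = {g: cosine_similarity(disease_vec, v) for g, v in gene_vecs.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy embeddings (made up for illustration only).
disease_vec = [1.0, 0.0, 1.0]
gene_vecs = {
    "gene_a": [1.0, 0.0, 1.0],
    "gene_b": [0.0, 1.0, 0.0],
    "gene_c": [1.0, 1.0, 0.0],
}

ranking = rank_genes(disease_vec, gene_vecs)
for gene, score in ranking:
    print(f"{gene}: {score:.3f}")
```

In this toy setting, genes whose vectors point in nearly the same direction as the disease vector rank first, which is the intuition behind using cosine similarity to propose candidate disease-gene links.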
Authors
MA Jinlong (马金龙)
ZHAI Meijing (翟美静)
School of Information Science and Engineering, Hebei University of Science and Technology, Shijiazhuang, Hebei 050018, China
Source
Hebei Journal of Industrial Science and Technology (《河北工业科技》)
CAS
2024, No. 1, pp. 27-35 (9 pages)
Funding
Hebei Provincial Science and Technology Program (Grant No. 23550801D).
Keywords
other disciplines of artificial intelligence
disease-causing genes
heterogeneous network
information fusion
network embedding
network structural characteristics