Abstract
Modern sequencing technology generates massive amounts of biological data, and the rapidly developing field of bioinformatics continues to uncover the biological information hidden in these data. Studying the association between genotype and disease phenotype through biological networks makes it possible to predict pathogenic genes and to identify the diseases that genes cause. Based on the modularity of disease genes, this paper integrates a protein-protein interaction network, a disease phenotype similarity network, and a disease-gene association network into a heterogeneous biological network, and improves the web page ranking algorithm TrustRank to prioritize candidate genes and diseases, thereby enabling prediction. The paper also develops a gene-disease search system on the Spark platform, with data stored in HBase, forming a big data solution for storage, processing, and analysis that offers new ideas for clinical diagnosis and disease treatment.
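The abstract does not give the details of the improved TrustRank, so the following is only a minimal sketch of the baseline idea it builds on: a seed-biased PageRank run over a merged heterogeneous network. The function names, the toy genes (G1 to G4), the toy diseases (D1, D2), the edge weights, and the damping factor are all illustrative assumptions, not values from the paper.

```python
def build_heterogeneous_network(ppi_edges, phenotype_edges, gene_disease_edges):
    """Merge three weighted, undirected edge lists into one adjacency map."""
    adjacency = {}
    for edges in (ppi_edges, phenotype_edges, gene_disease_edges):
        for a, b, weight in edges:
            adjacency.setdefault(a, {})[b] = weight
            adjacency.setdefault(b, {})[a] = weight
    return adjacency


def trustrank(adjacency, seeds, damping=0.85, iterations=100, tol=1e-10):
    """Baseline TrustRank: PageRank whose restart mass goes only to seed nodes.

    This is the textbook form, not the paper's improved variant.
    """
    total_seed = sum(seeds.values())
    if total_seed <= 0:
        raise ValueError("at least one seed with positive weight is required")
    nodes = sorted(adjacency)
    seed_vec = {n: seeds.get(n, 0.0) / total_seed for n in nodes}
    out_weight = {n: sum(adjacency[n].values()) for n in nodes}
    score = dict(seed_vec)
    for _ in range(iterations):
        # Restart step: a (1 - damping) share of the mass returns to the seeds.
        new = {n: (1.0 - damping) * seed_vec[n] for n in nodes}
        for n in nodes:
            if out_weight[n] == 0:
                # Isolated node: hand its mass back to the seeds.
                for m in nodes:
                    new[m] += damping * score[n] * seed_vec[m]
                continue
            # Propagation step: split the node's score by edge weight.
            for m, w in adjacency[n].items():
                new[m] += damping * score[n] * w / out_weight[n]
        delta = sum(abs(new[n] - score[n]) for n in nodes)
        score = new
        if delta < tol:
            break
    return score


# Hypothetical toy data (not from the paper).
ppi_edges = [("G1", "G2", 1.0), ("G2", "G3", 1.0), ("G3", "G4", 1.0)]
phenotype_edges = [("D1", "D2", 0.8)]
gene_disease_edges = [("G1", "D1", 1.0), ("G4", "D2", 1.0)]

network = build_heterogeneous_network(ppi_edges, phenotype_edges, gene_disease_edges)

# Query disease D1 and its known gene G1 act as trusted seeds.
scores = trustrank(network, seeds={"D1": 1.0, "G1": 1.0})

# Rank the candidate genes by propagated score, highest first.
candidates = ["G2", "G3", "G4"]
ranked = sorted(candidates, key=lambda g: scores[g], reverse=True)
print(ranked)
```

In this sketch the query disease and its known gene serve as the trusted seeds, and candidate genes are ordered by how much score reaches them through the protein interaction, phenotype similarity, and gene-disease edges.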
Authors
YANG Qin (杨勤)
ZANG Tianyi (臧天仪)
School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
Source
Intelligent Computer and Applications (《智能计算机与应用》)
2019, Issue 1, pp. 272-276 (5 pages)