Abstract
Proteins are essential molecules for carrying out important biological activities, and accurate knowledge of their functions would greatly advance life science research and its applications. High-throughput techniques have generated a tremendous number of protein sequences, so predicting protein functions at large scale with computational methods has become one of the core tasks in bioinformatics. Prediction methods based on protein-protein interaction networks are a current research hotspot in protein function prediction, but they still fall short in reducing the impact of data noise, making full use of network topology, and integrating multi-source data. This paper proposes the Bi-Weighted Vote (BiWV) algorithm for protein function prediction, which combines the global topological similarity obtained by Random Walk with Resistance (RWS) with the semantic similarity between functional terms. Building on BiWV, the Bi-Weighted Vote algorithm with Pathway (BiWV-P) is further proposed by integrating biological pathway information. On Saccharomyces cerevisiae and Homo sapiens data sets, the proposed algorithms are compared with three existing algorithms: TMC, UBiRW and ProHG. The experimental results indicate that BiWV and BiWV-P predict protein functions effectively and achieve higher micro-accuracy and micro-F1 than the other algorithms on many data sets.
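The abstract gives only the outline of the voting scheme, so the following is a minimal sketch of the general idea rather than the paper's BiWV algorithm. It assumes that the topological weight can be approximated by a plain random walk with restart (a stand-in for RWS, whose definition is not given here) and that the semantic weight is the best similarity between a candidate term and a voter's own terms. The names `random_walk_with_restart`, `bi_weighted_vote` and `term_sim` are hypothetical and chosen for illustration only.

```python
def random_walk_with_restart(adj, seed, restart=0.5, iters=50):
    """Approximate visiting probabilities of a walker that starts at `seed`.

    `adj` maps each protein to the list of its interaction partners.
    This is ordinary random walk with restart, used here as an assumed
    stand-in for the paper's Random Walk with Resistance.
    """
    nodes = list(adj)
    p = {v: 1.0 if v == seed else 0.0 for v in nodes}
    for _ in range(iters):
        # With probability `restart` the walker jumps back to the seed.
        nxt = {v: (restart if v == seed else 0.0) for v in nodes}
        for u in nodes:
            if not adj[u]:
                continue  # isolated protein: its mass is dropped in this sketch
            share = (1.0 - restart) * p[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        p = nxt
    return p


def bi_weighted_vote(adj, annotations, term_sim, target, restart=0.5):
    """Rank candidate function terms for `target` by a doubly weighted vote.

    Each annotated protein votes for every candidate term with weight
    (topological similarity to target) * (semantic similarity between the
    candidate term and the voter's closest annotated term).
    """
    topo = random_walk_with_restart(adj, target, restart)
    candidates = {t for terms in annotations.values() for t in terms}
    scores = {}
    for term in candidates:
        total = 0.0
        for protein, terms in annotations.items():
            if protein == target:
                continue  # the target must not vote for itself
            semantic = max((term_sim(term, u) for u in terms), default=0.0)
            total += topo.get(protein, 0.0) * semantic
        scores[term] = total
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

In this sketch a term annotated on a distant protein still contributes, but its vote is damped by both the walk probability and the term similarity, which is the intuition behind weighting votes twice.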
Authors
唐家琪
吴璟莉
廖元秀
王金艳
TANG Jia-qi; WU Jing-li; LIAO Yuan-xiu; WANG Jin-yan (School of Computer Science & Information Engineering, Guangxi Normal University, Guilin, Guangxi 541004, China; Guangxi Key Laboratory of Multi-Source Information Mining & Safety, Guangxi Normal University, Guilin, Guangxi 541004, China; Guangxi Regional Multi-Source Information Integration & Intelligent Processing Cooperation Innovation Center, Guilin, Guangxi 541004, China)
Source
Computer Science (《计算机科学》)
CSCD
Peking University Core Journal
2019, No. 4, pp. 222-227 (6 pages)
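Funding
National Natural Science Foundation of China (61762015, 61502111, 61662007, 61763003)
Guangxi Natural Science Foundation (2015GXNSFAA139288)
"Bagui Scholar" Program Special Fund
Guangxi Science and Technology Base and Talent Special Project (AD16380008)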
Keywords
Protein-protein interaction network
Function prediction
Random walk
Semantic similarity
Biological pathway