Abstract
Constructing dynamic protein networks and identifying protein complexes within them are active research topics in bioinformatics. To address the shortcomings of existing algorithms on these problems, this paper proposes a protein complex identification algorithm based on the hidden Markov model (HMM-PC). First, an initial protein network is constructed from the gene co-expression characteristics of proteins. The network is then weighted using shared functional annotations, shared domains, and connection strength, which yields the dynamic protein network. On this basis, the algorithm takes into account how the topology of the protein network at the previous time point influences the topology at the current time point, and uses a hidden Markov model (HMM) to describe the relationship between protein complexes and the individual nodes of the network. Complex identification in the dynamic protein network is thereby modeled as the problem of finding the optimal state sequence in an HMM, and the protein complexes are identified with the Viterbi algorithm. Finally, theoretical analysis shows that the proposed algorithm has low complexity. Yeast protein networks from the DIP and MIPS data sets are used as test objects. Extensive simulation results also show that HMM-PC is robust and outperforms existing complex identification algorithms in terms of recall, precision, F-measure, and efficiency.
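The optimal-state-sequence step mentioned in the abstract can be illustrated with a generic Viterbi decoder. This is only a minimal sketch of the textbook algorithm, not the HMM-PC implementation: the abstract does not specify the state space, observations, or probabilities, so the state names (`in_complex`, `outside`), the observation symbols (`high`, `mid`, `low`, standing in for discretized edge weights), and every number below are hypothetical.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (best_path, best_log_prob) for a discrete HMM (textbook Viterbi)."""
    if not observations:
        return [], 0.0

    def log(p):
        # Work in log space to avoid underflow on long sequences.
        return math.log(p) if p > 0 else float("-inf")

    # score[t][s]: best log-probability of any state path ending in s at time t.
    score = [{s: log(start_p[s]) + log(emit_p[s][observations[0]]) for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path
    for t in range(1, len(observations)):
        score.append({})
        back.append({})
        for s in states:
            prev, best = max(
                ((r, score[t - 1][r] + log(trans_p[r][s])) for r in states),
                key=lambda pair: pair[1],
            )
            score[t][s] = best + log(emit_p[s][observations[t]])
            back[t][s] = prev

    # Trace back from the best final state.
    last = max(states, key=lambda s: score[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, score[-1][last]


# Hypothetical toy model: is a protein inside a complex at each time point?
STATES = ("in_complex", "outside")
START_P = {"in_complex": 0.6, "outside": 0.4}
TRANS_P = {
    "in_complex": {"in_complex": 0.7, "outside": 0.3},
    "outside": {"in_complex": 0.4, "outside": 0.6},
}
EMIT_P = {
    "in_complex": {"high": 0.5, "mid": 0.4, "low": 0.1},
    "outside": {"high": 0.1, "mid": 0.3, "low": 0.6},
}

path, log_prob = viterbi(["high", "mid", "low"], STATES, START_P, TRANS_P, EMIT_P)
print(path, math.exp(log_prob))
```

For a sequence of length T over N states, this decoder runs in O(T·N²) time. That figure describes the generic algorithm only and says nothing about the complexity analysis in the paper itself.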
Authors
LI Peng (李鹏)
LUO Aijing (罗爱静)
MIN Hui (闵慧)
TAN Sunyi (谭荪怡)
GUO Huimin (郭惠敏)
Affiliations: The Third Xiangya Hospital of Central South University, Changsha 410006, China; School of Informatics, Hunan University of Chinese Medicine, Changsha 410208, China; Key Laboratory of Medical Information Research (Central South University), College of Hunan Province, Changsha 410006, China; Software Department, Hunan College of Information, Changsha 410200, China
Source
Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》)
Indexed in: CSCD; Peking University Core Journals (北大核心)
2021, Issue 10, pp. 1980-1989 (10 pages)
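Funding
National Social Science Foundation of China, Key Project (17AZD037)
National Key Research and Development Program of China (2017YFC1703306)
Natural Science Foundation of Hunan Province, Youth Project (2019JJ50453)
Natural Science Foundation of Hunan Province, General Project (2018JJ2301)
Key Project of the Hunan Provincial Department of Science and Technology (2017SK2111)
Open Fund Project of Hunan University of Chinese Medicine (2018JK02)
General Project of the Hunan Provincial Department of Education (19C1318)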