Abstract
Protein complexes are important for helping biologists understand cell organization and function, and identifying complexes from protein-protein interaction (PPI) networks by computational methods is an active research topic. However, PPI networks contain a large amount of false-negative and false-positive noise, and the set of known protein complexes is incomplete, so overcoming the noise in PPI networks and making better use of known complexes are key problems in protein complex identification. To address them, this paper proposes NOBEL, an algorithm that identifies protein complexes via supervised learning based on the topological information of protein complexes. First, NOBEL constructs a weighted PPI network from the biological and topological information of proteins, which reduces the noise in the network. Then, it extracts complex topological information from both the weighted and the unweighted PPI networks as features and trains a supervised learning model on them, so that the model can effectively learn the information contained in the complexes. Finally, the trained model is applied to PPI networks to identify protein complexes. Experiments on four real PPI networks show that, compared with seven other protein complex identification algorithms, NOBEL improves F-measure by at least 4.39% on Gavin, 1.32% on DIP, 2.39% on WI-PHI_core, and 2.34% on WI-PHI_extend.
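As a rough illustration of the feature-extraction step, the sketch below computes a few simple topological features for one candidate complex. The abstract does not list the features NOBEL actually uses, so the feature set (size, density, weighted density, mean internal degree), the function name `topological_features`, and the edge-weight dictionary layout are all assumptions made for this example.

```python
from itertools import combinations


def topological_features(nodes, weights):
    """Return simple topological features for one candidate complex.

    nodes   -- iterable of protein identifiers in the candidate
    weights -- dict mapping frozenset({u, v}) to an edge weight
               (hypothetical representation of a weighted PPI network)
    """
    nodes = sorted(set(nodes))
    n = len(nodes)
    possible = n * (n - 1) // 2  # number of node pairs in the candidate

    # Weights of the edges that actually exist inside the candidate.
    internal = [
        weights[frozenset(pair)]
        for pair in combinations(nodes, 2)
        if frozenset(pair) in weights
    ]
    edge_count = len(internal)

    return {
        "size": n,
        # Unweighted density: fraction of possible pairs that interact.
        "density": edge_count / possible if possible else 0.0,
        # Weighted density: total internal weight over possible pairs.
        "weighted_density": sum(internal) / possible if possible else 0.0,
        # Mean number of internal interactions per protein.
        "mean_degree": 2 * edge_count / n if n else 0.0,
    }


# Toy network with made-up weights, for illustration only.
weights = {
    frozenset(("A", "B")): 0.9,
    frozenset(("B", "C")): 0.8,
    frozenset(("A", "C")): 0.7,
    frozenset(("C", "D")): 0.2,
}
features = topological_features(["A", "B", "C"], weights)
```

Feature dictionaries of this kind, computed for known complexes and for non-complex subgraphs, could then serve as training rows for any supervised classifier; which model NOBEL trains is not stated in the abstract.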
Authors
WANG Xiaoxu (王晓旭)
LIU Xiaoxia (刘晓霞)
School of Electronics Engineering and Computer Science, Dalian Maritime University, Dalian, Liaoning 116026, China
Source
Journal of Chinese Information Processing (《中文信息学报》)
CSCD
Peking University Core Journal (北大核心)
2021, Issue 9, pp. 82-93 (12 pages)
Funding
China Postdoctoral Science Foundation (2020M680931).