
Protein function prediction method based on PPI network and machine learning

Times cited: 7
Abstract: Existing protein function prediction methods based on Protein-Protein Interaction (PPI) networks have low prediction precision and are susceptible to noise in the network data. To address this, a machine learning protein function prediction method named HPMM (HC, PCA and MLP based Method) was proposed, combining Hierarchical Clustering (HC), Principal Component Analysis (PCA) and a Multi-Layer Perceptron (MLP). HPMM considers protein information at both the macro and micro levels: it integrates protein family, domain and important-site information into the PPI network as vertex attributes to alleviate the effect of noise in the network data. First, features were extracted with HC and PCA, yielding functional modules and attribute principal components. Second, an MLP model was trained to establish a mapping from these multiple features to multiple functions, which was then used to predict protein functions. The experiments used three Homo sapiens PPI networks annotated with Molecular Function (MF), Biological Process (BP) and Cellular Component (CC) terms respectively, and compared the prediction performance of HPMM, the Cosine Iterative Algorithm (CIA) and the Diffusing GO Terms in the Directed PPI Network (GoDIN) algorithm. The experimental results indicate that HPMM achieves higher precision and F-measure than CIA and GoDIN, both of which are based purely on the PPI network.
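The abstract names hierarchical clustering as the step that yields functional modules but does not say which protein distance, linkage rule or cut-off HPMM uses. A minimal sketch, assuming 1 minus the Jaccard similarity of closed neighbourhoods as the distance, average linkage, and a fixed distance threshold as the cut-off (these choices, the toy network, and the function names `hierarchical_modules` and `module_features` are all hypothetical, not taken from the paper), could look like this:

```python
from itertools import combinations


def neighbourhood_distance(adj, u, v):
    """1 - Jaccard similarity of the closed neighbourhoods of u and v (assumed metric)."""
    nu, nv = adj[u] | {u}, adj[v] | {v}
    return 1.0 - len(nu & nv) / len(nu | nv)


def hierarchical_modules(edges, threshold):
    """Average-linkage agglomerative clustering of PPI proteins.

    Merging stops once the closest pair of clusters is farther apart than
    `threshold`; the surviving clusters are treated as functional modules.
    """
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    nodes = sorted(adj)
    dist = {frozenset((u, v)): neighbourhood_distance(adj, u, v)
            for u, v in combinations(nodes, 2)}
    clusters = [[n] for n in nodes]

    def linkage(c1, c2):
        # Average pairwise protein distance between two clusters.
        total = sum(dist[frozenset((u, v))] for u in c1 for v in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        (i, j), best = min(
            (((i, j), linkage(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best > threshold:
            break
        clusters[i] += clusters[j]  # i < j, so deleting j keeps index i valid
        del clusters[j]
    return [sorted(c) for c in clusters]


def module_features(protein, modules):
    """One-hot module-membership feature vector for a protein."""
    return [1 if protein in m else 0 for m in modules]


# Toy network: two triangles joined by the single edge C-D.
edges = [("A", "B"), ("A", "C"), ("B", "C"),
         ("D", "E"), ("D", "F"), ("E", "F"), ("C", "D")]
modules = hierarchical_modules(edges, threshold=0.5)
print(modules)                        # [['A', 'B', 'C'], ['D', 'E', 'F']]
print(module_features("E", modules))  # [0, 1]
```

Average linkage is used here only because it is less sensitive to a single spurious interaction than single linkage, which fits the abstract's concern with network noise; any other linkage or cut-off could be substituted.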
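For the attribute principal components, the vertex attributes (protein family, domain and important-site information) can be arranged as a protein-by-attribute matrix and reduced with PCA. The sketch below is a hypothetical standard-library version that finds the leading components by power iteration with deflation; the attribute encoding, the number of components `k`, and the toy matrix are assumptions, not values reported in the paper.

```python
import math
import random


def pca_components(X, k, iters=500, seed=0):
    """Top-k principal axes of the rows of X (power iteration + deflation).

    Returns the column means and a list of unit-length component vectors.
    """
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - means[j] for j in range(d)] for row in X]
    # Sample covariance matrix of the attributes (d x d).
    cov = [[sum(r[a] * r[b] for r in Xc) / (n - 1) for b in range(d)]
           for a in range(d)]

    def matvec(M, v):
        return [sum(M[a][b] * v[b] for b in range(d)) for a in range(d)]

    rng = random.Random(seed)
    components = []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        for _ in range(iters):
            w = matvec(cov, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # no variance left: fewer than k components exist
                return means, components
            v = [x / norm for x in w]
        # Fix the sign so the largest-magnitude loading is positive.
        if max(v, key=abs) < 0:
            v = [-x for x in v]
        eigenvalue = sum(x * y for x, y in zip(v, matvec(cov, v)))
        components.append(v)
        # Deflate: remove the variance explained by this component.
        cov = [[cov[a][b] - eigenvalue * v[a] * v[b] for b in range(d)]
               for a in range(d)]
    return means, components


def project(x, means, components):
    """Principal-component scores of one protein's attribute vector."""
    centred = [xi - mi for xi, mi in zip(x, means)]
    return [sum(c * p for c, p in zip(centred, comp)) for comp in components]


# Toy attribute matrix: 4 proteins x 2 attribute scores (hypothetical values).
X = [[0, 0], [2, 0], [0, 1], [2, 1]]
means, comps = pca_components(X, k=2)
print(means)  # [1.0, 0.5]
print(project([2, 1], means, comps[:1]))
```

In practice a linear-algebra library would replace the hand-written power iteration; it is spelled out here only to keep the example dependency-free.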
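The abstract describes the MLP as a mapping from multiple features to multiple functions, i.e. a multi-label classifier. Its architecture, activations, loss and training schedule are not given, so the following is only a sketch assuming one sigmoid hidden layer, one sigmoid output per GO term, binary cross-entropy and plain stochastic gradient descent. The input vector is assumed to be the module-membership features concatenated with the attribute principal-component scores; `MultiLabelMLP`, `mean_bce` and the toy data are hypothetical.

```python
import math
import random


def sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp to avoid math.exp overflow
    return 1.0 / (1.0 + math.exp(-z))


class MultiLabelMLP:
    """One hidden layer; one independent sigmoid output per GO term."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        self.W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.W2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
        self.b2 = [0.0] * n_out

    def forward(self, x):
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.W1, self.b1)]
        y = [sigmoid(sum(w * hj for w, hj in zip(row, h)) + b)
             for row, b in zip(self.W2, self.b2)]
        return h, y

    def train_step(self, x, t, lr):
        h, y = self.forward(x)
        # Sigmoid output + binary cross-entropy gives dLoss/dz = y - t per label.
        d_out = [yk - tk for yk, tk in zip(y, t)]
        # Back-propagate through the hidden sigmoid (uses W2 before the update).
        d_hid = [h[j] * (1 - h[j]) * sum(self.W2[k][j] * d_out[k] for k in range(len(y)))
                 for j in range(len(h))]
        for k, dk in enumerate(d_out):
            for j, hj in enumerate(h):
                self.W2[k][j] -= lr * dk * hj
            self.b2[k] -= lr * dk
        for j, dj in enumerate(d_hid):
            for i, xi in enumerate(x):
                self.W1[j][i] -= lr * dj * xi
            self.b1[j] -= lr * dj

    def predict(self, x, threshold=0.5):
        """Assign every GO term whose output probability reaches the threshold."""
        return [1 if yk >= threshold else 0 for yk in self.forward(x)[1]]


def mean_bce(model, X, T):
    """Mean binary cross-entropy over proteins (summed over labels)."""
    total = 0.0
    for x, t in zip(X, T):
        for yk, tk in zip(model.forward(x)[1], t):
            yk = min(max(yk, 1e-12), 1 - 1e-12)
            total -= tk * math.log(yk) + (1 - tk) * math.log(1 - yk)
    return total / len(X)


# Hypothetical features: [module 1, module 2, first attribute PC score].
X = [[1, 0, 0.9], [1, 0, 0.7], [0, 1, -0.8], [0, 1, -0.6]]
# Hypothetical annotations for three GO terms (multi-label targets).
T = [[1, 1, 0], [1, 1, 0], [0, 1, 1], [0, 1, 1]]

model = MultiLabelMLP(n_in=3, n_hidden=4, n_out=3)
loss_before = mean_bce(model, X, T)
for _ in range(2000):
    for x, t in zip(X, T):
        model.train_step(x, t, lr=0.5)
loss_after = mean_bce(model, X, T)
print(round(loss_before, 3), round(loss_after, 3))
print([model.predict(x) for x in X])
```

Using independent sigmoid outputs rather than a softmax lets one protein receive several GO terms at once, which is what a multi-feature to multi-function mapping requires.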
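HPMM is compared with CIA and GoDIN on precision and F-measure. The abstract does not define how these scores are aggregated over proteins; the sketch below assumes one convention, pooling true positives over all test proteins (micro-averaging). The function name, protein names and GO-term labels are placeholders.

```python
def precision_recall_f(predicted, actual):
    """Micro-averaged precision, recall and F-measure.

    predicted, actual: dict mapping protein -> set of GO terms.
    Proteins missing from `predicted` count as having no predictions.
    """
    tp = n_pred = n_true = 0
    for protein, true_terms in actual.items():
        pred_terms = predicted.get(protein, set())
        tp += len(pred_terms & true_terms)
        n_pred += len(pred_terms)
        n_true += len(true_terms)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_true if n_true else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


actual = {"P1": {"t1", "t2"}, "P2": {"t3"}, "P3": {"t1"}}
predicted = {"P1": {"t1", "t4"}, "P2": {"t3"}}
p, r, f = precision_recall_f(predicted, actual)
print(round(p, 3), round(r, 3), round(f, 3))  # 0.667 0.5 0.571
```

Here 2 of the 3 predicted terms are correct (precision 2/3) and 2 of the 4 true terms are recovered (recall 1/2), so the F-measure is their harmonic mean, 4/7.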
Source: Journal of Computer Applications, 2018, No. 3, pp. 722-727 (6 pages); indexed in CSCD and the Peking University Core Journals list.
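Funding: National Natural Science Foundation of China (61363035, 61762015); Natural Science Foundation of Guangxi (2015GXNSFAA139288); Special Fund of the "Bagui Scholar" Program; Systematic Research Fund of the Guangxi Key Laboratory of Multi-source Information Mining and Security (14-A-03-02, 15-A-03-02); Innovation Project of Guangxi Graduate Education (XYCSZ2017067).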
Keywords: function prediction; machine learning; Protein-Protein Interaction (PPI); Hierarchical Clustering (HC); Principal Component Analysis (PCA); Multi-Layer Perceptron (MLP)