Funding: Supported by the Natural Sciences and Engineering Research Council of Canada and the National Natural Science Foundation of China (61428209, 61232001).
Abstract: Genes associated with similar diseases are often functionally related. This principle is supported by many biological data sources, such as disease phenotype similarities, protein complexes, protein-protein interactions, pathways and gene expression profiles. Integrating multiple types of biological data is an effective way to identify disease genes for many genetic diseases. To capture gene-disease associations from biological networks, a kernel-based Markov random field (MRF) method is proposed that combines graph kernels with the MRF method. In the proposed method, three kinds of kernels are employed to describe the overall relationships of vertices in five biological networks, and a novel weighted MRF method is developed to integrate those data. In addition, an improved Gibbs sampling procedure and a novel parameter estimation method are proposed to generate predictions from the kernel-based MRF method. Numerical experiments are carried out by integrating known gene-disease associations, protein complexes, protein-protein interactions, pathways and gene expression profiles. The proposed kernel-based MRF method is evaluated by leave-one-out cross-validation and achieves an AUC score of 0.771 when all of these biological data are integrated, indicating that the method is promising compared with many existing methods.
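As a reading aid for the reported AUC of 0.771, the following is a minimal sketch of how an AUC can be computed from prediction scores, using the standard pairwise (Mann-Whitney) definition. The function name `auc_from_scores` and the toy scores and labels are hypothetical; the abstract does not describe the authors' evaluation code, and this sketch does not implement the kernel-based MRF or its Gibbs sampler.

```python
def auc_from_scores(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    AUC is the fraction of (positive, negative) pairs in which the
    positive item receives the higher score; ties count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores for five candidate genes; label 1 marks a held-out
# known disease gene, label 0 a gene with no known association.
toy_scores = [0.9, 0.4, 0.7, 0.2, 0.6]
toy_labels = [1, 0, 0, 0, 1]
toy_auc = auc_from_scores(toy_scores, toy_labels)
```

In a leave-one-out setting, each known gene-disease association would be hidden in turn, scored by the model, and pooled with the scores of unassociated candidates before a computation of this kind.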
Funding: This research is supported in part by HKRGC Grant 7017/07P, HKU CRCG Grants, the HKU strategic theme grant on computational sciences, the HKU Hung Hing Ying Physical Science Research Grant, National Natural Science Foundation of China Grant No. 10971075 and Guangdong Provincial Natural Science Grant No. 9151063101000021. A preliminary version of this paper was presented at the OSB2009 conference and published in the corresponding conference proceedings [25]. The authors would like to thank the anonymous referees for their helpful comments and suggestions.
Abstract: Predicting protein functions is an important issue in the post-genomic era. This paper studies several network-based kernels, including the local linear embedding (LLE) kernel, the diffusion kernel and the Laplacian kernel, to uncover the relationship between protein functions and protein-protein interactions (PPI). The authors first construct kernels based on PPI networks, then apply support vector machine (SVM) techniques to classify proteins into different functional groups. Five-fold cross-validation is then applied to 359 selected GO terms to compare the performance of the different kernels with guilt-by-association methods, including neighbor counting and Chi-square methods. Finally, the authors predict the functions of some unknown genes and partially verify the accuracy of these predictions using information from other data sources.
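To make the network-based kernels more concrete, here is a minimal sketch of two of them on a toy graph. It assumes the common definitions: the diffusion kernel as exp(-beta * L) and the Laplacian kernel as the Moore-Penrose pseudoinverse of L, where L = D - A is the combinatorial graph Laplacian. The abstract does not state the exact definitions or parameters the authors used, so the function names, the value of `beta` and the four-protein path graph are all hypothetical.

```python
import numpy as np


def graph_laplacian(adj):
    """Combinatorial Laplacian L = D - A of an undirected graph."""
    adj = np.asarray(adj, dtype=float)
    return np.diag(adj.sum(axis=1)) - adj


def diffusion_kernel(adj, beta=0.5):
    """Diffusion kernel exp(-beta * L), computed by eigendecomposition.

    L is symmetric, so L = V diag(w) V^T and
    exp(-beta * L) = V diag(exp(-beta * w)) V^T.
    """
    eigvals, eigvecs = np.linalg.eigh(graph_laplacian(adj))
    return (eigvecs * np.exp(-beta * eigvals)) @ eigvecs.T


def laplacian_kernel(adj):
    """Laplacian kernel, taken here as the pseudoinverse of L (an assumption)."""
    return np.linalg.pinv(graph_laplacian(adj))


# Toy PPI network: a path of four proteins, 0 - 1 - 2 - 3.
adj = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])
K_diff = diffusion_kernel(adj)
K_lap = laplacian_kernel(adj)
```

Either matrix could then be passed to an SVM that accepts a precomputed kernel. In `K_diff`, directly interacting proteins receive a larger similarity than proteins several interactions apart, which is the property that lets a kernel capture more than immediate neighbors.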
Funding: Financial support was provided by the National Natural Science Foundation of China (21003048, 10974054 and 20933002) and the Shanghai PuJiang Program (09PJ1404000). XXY is also supported by the "Scientific Research Foundation for Agricultural Machinery Bureau of Jiangsu Province (gxz10008)". CGJ is also supported by "the Fundamental Research Funds for the Central Universities".
Abstract: The binding of Endonuclease colicin 9 (E9) by Immunity protein 9 (Im9) was found to involve several hotspots from helix III of Im9 at the protein-protein interface that contribute the dominant binding energy to the complex. In the current work, MD simulations of the WT and three hotspot mutants (D51A, Y54A and Y55A of Im9) of the E9-Im9 complex were carried out to investigate the specific interaction mechanisms of these three hotspot residues. The changes in binding energy between the WT and the mutants of the complex were computed by the MM/PBSA method using a polarized force field and were in excellent agreement with experimental values, verifying that these three residues are indeed hotspots of the binding complex. Energy decomposition analysis revealed that binding of D51 to E9 was dominated by electrostatic interaction, owing to the carboxyl group of Asp51, which forms a hydrogen bond with K89. For the hotspots Y54 and Y55, van der Waals interaction from the aromatic side chain of tyrosine provided the dominant contribution. For comparison, calculations using the standard (nonpolarizable) AMBER99SB force field produced binding energy changes for these mutations in the opposite direction to the experimental observations. Dynamic hydrogen bond analysis showed that conformations sampled from MD simulation in the standard AMBER force field were distorted from the native state and disrupted the inter-protein hydrogen bond network of the complex. The current work further demonstrates that electrostatic polarization plays a critical role in modulating protein-protein binding.
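For readers unfamiliar with the quantities compared above, the binding energy change of a mutant relative to the WT can be written in the standard MM/PBSA form below. This is the textbook formulation, not a transcription of the authors' equations: the abstract does not say whether an entropy term was included, and the sign convention for the mutant-minus-WT difference is an assumption.

```latex
\begin{align}
\Delta G_{\mathrm{bind}} &= G_{\mathrm{complex}} - G_{\mathrm{E9}} - G_{\mathrm{Im9}}, \\
G &= E_{\mathrm{MM}} + G_{\mathrm{PB}} + G_{\mathrm{SA}} - TS, \\
E_{\mathrm{MM}} &= E_{\mathrm{int}} + E_{\mathrm{ele}} + E_{\mathrm{vdW}}, \\
\Delta\Delta G_{\mathrm{bind}} &= \Delta G_{\mathrm{bind}}^{\mathrm{mutant}} - \Delta G_{\mathrm{bind}}^{\mathrm{WT}}.
\end{align}
```

Here $E_{\mathrm{ele}}$ and $E_{\mathrm{vdW}}$ are the electrostatic and van der Waals terms referred to in the energy decomposition, $G_{\mathrm{PB}}$ and $G_{\mathrm{SA}}$ are the polar (Poisson-Boltzmann) and nonpolar (surface area) solvation terms, and under this convention a positive $\Delta\Delta G_{\mathrm{bind}}$ means the alanine mutation weakens binding, as expected for a hotspot residue.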