Abstract: To explore the molecular mechanism by which Indigo Naturalis acts on chronic myelocytic leukemia (CML), this study combined protein-protein interaction network analysis, molecular docking, and in vitro cell experiments. CML-related genes were obtained from the Online Mendelian Inheritance in Man (OMIM) database, and STRING 10.0 was used for text mining and for constructing the CML protein-protein interaction network. The interaction data were imported into Cytoscape 3.4.0, and the CentiScaPe 2.1 plug-in was used for topological analysis. Small active compounds of Indigo Naturalis were obtained from a third-party database and optimized with ChemOffice 8.0 and Sybyl 8.1 to build a small-molecule ligand library. Molecular docking was carried out with the Surflex-Dock module, and the key target was identified from the docking scores. The resulting CML protein-protein interaction network consisted of 425 nodes (proteins) and 2,799 edges (interactions), and JAK2 was identified as the key gene. CML is a polygenic disease, and JAK2 is likely to be a key node.
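The topological analysis step can be illustrated with a minimal sketch. The abstract does not say which centrality measures were used, so degree centrality is an assumption here, and the toy edge list is made up for illustration and is not data from the study.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Count how many interactions each node has in an undirected edge list."""
    degree = defaultdict(int)
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return dict(degree)


# Made-up toy interactions, for illustration only (not from the paper).
toy_edges = [
    ("JAK2", "STAT5A"),
    ("JAK2", "ABL1"),
    ("JAK2", "GRB2"),
    ("ABL1", "GRB2"),
]

degrees = degree_centrality(toy_edges)
# The node with the most interactions is the candidate hub.
hub = max(degrees, key=degrees.get)
print(hub, degrees[hub])
```

On a real network the same ranking would normally be computed inside Cytoscape; the sketch only shows what "a key node by degree" means.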
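Abstract: Proteins are an essential material basis of life, and accurate annotation of their functions can greatly advance life science research. Existing protein function prediction methods usually focus only on the functions that a protein is known to have (positive examples) and ignore the functions that are irrelevant to the protein (negative examples). Previous studies have shown that incorporating negative examples can reduce the complexity of protein function prediction and improve its accuracy. This paper proposes a method for predicting irrelevant functions of proteins based on dimensionality reduction (IFDR). IFDR performs random walks on the adjacency matrix of the protein interaction network and on the protein-function label association matrix to explore the intrinsic relationships among proteins and to estimate missing functional labels. It then uses singular value decomposition to project each of the two matrices into a low-dimensional real-valued matrix, and finally predicts negative examples by semi-supervised regression. Experiments on yeast, human, and Arabidopsis thaliana protein datasets show that IFDR predicts negative examples more accurately than existing related algorithms, and that reducing the dimensionality of both the interaction network and the functional label space improves the accuracy of negative-example prediction.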
Funding: This research is supported in part by HKRGC Grant 7017/07P, HKU CRCG Grants, the HKU strategic theme grant on computational sciences, the HKU Hung Hing Ying Physical Science Research Grant, National Natural Science Foundation of China Grant No. 10971075, and Guangdong Provincial Natural Science Grant No. 9151063101000021. The preliminary version of this paper was presented at the OSB2009 conference and published in the corresponding conference proceedings [25]. The authors would like to thank the anonymous referees for their helpful comments and suggestions.
Abstract: Predicting protein functions is an important issue in the post-genomic era. This paper studies several network-based kernels, including the local linear embedding (LLE) kernel, the diffusion kernel, and the Laplacian kernel, to uncover the relationship between protein functions and protein-protein interactions (PPI). The authors first construct kernels based on PPI networks and then apply support vector machine (SVM) techniques to classify proteins into different functional groups. Five-fold cross-validation is applied to 359 selected GO terms to compare the performance of the different kernels with guilt-by-association methods, including neighbor counting and Chi-square methods. Finally, the authors predict the functions of some unknown genes and verify the accuracy of these predictions in part using information from other data sources.
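As a point of reference for the guilt-by-association baselines mentioned above, here is a minimal sketch of neighbor counting: a protein is assigned the term that occurs most often among its interaction partners. The function name, the toy network, and the placeholder term labels are all hypothetical, and the paper's exact variant may differ.

```python
from collections import Counter


def neighbor_counting(protein, adjacency, annotations, top_k=1):
    """Return the top_k terms that occur most often among a protein's neighbors."""
    counts = Counter()
    for neighbor in adjacency.get(protein, ()):
        counts.update(annotations.get(neighbor, ()))
    return [term for term, _ in counts.most_common(top_k)]


# Hypothetical toy data: P1 is unannotated and has three annotated neighbors.
adjacency = {"P1": ["P2", "P3", "P4"]}
annotations = {
    "P2": ["term_a"],
    "P3": ["term_a", "term_b"],
    "P4": ["term_b", "term_a"],
}

predicted = neighbor_counting("P1", adjacency, annotations)
print(predicted)
```

Kernel methods such as the diffusion kernel differ from this baseline in that they also use information from proteins more than one step away in the network.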
Funding: Supported in part by the National Natural Science Foundation of China (61370024, 61428209, 61232001) and the Program for New Century Excellent Talents in University (NCET-12-0547).
Abstract: Identifying disease-causing genes among a large number of candidates is a fundamental challenge in human disease studies. However, determining the real disease-causing genes by biological experiments remains time-consuming and laborious. With advances in high-throughput techniques, a large number of protein-protein interactions have been identified, and several methods based on protein interaction networks have been proposed to address this issue. In this paper, we propose a shortest path-based algorithm, named SPranker, to prioritize disease-causing genes in protein interaction networks. Considering that diseases with similar phenotypes are generally caused by functionally related genes, we further propose an improved algorithm, SPGOranker, which integrates the semantic similarity of gene ontology (GO) annotations. SPGOranker considers not only the topological similarity between protein pairs in a protein interaction network but also their functional similarity. SPranker and SPGOranker were applied to 1598 known orphan disease-causing genes from 172 orphan diseases and compared with three state-of-the-art approaches: ICN, VS, and RWR. The experimental results show that SPranker and SPGOranker outperform ICN, VS, and RWR for the prioritization of orphan disease-causing genes. Importantly, in the case study of severe combined immunodeficiency, SPranker and SPGOranker predict several novel causal genes.
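The general idea of shortest-path prioritization can be sketched as follows. This is not SPranker's actual scoring function, which the abstract does not give; it is an assumed simplification that ranks candidates by their distance to the nearest known disease gene in an unweighted network. All names and the toy graph are hypothetical.

```python
from collections import deque


def bfs_distances(adjacency, sources):
    """Multi-source breadth-first search: hops from each node to its nearest source."""
    dist = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def rank_candidates(adjacency, seeds, candidates):
    """Sort candidates by distance to the nearest seed; unreachable genes go last."""
    dist = bfs_distances(adjacency, seeds)
    return sorted(candidates, key=lambda g: (dist.get(g, float("inf")), g))


# Hypothetical toy network: S1 - A - B - C, with D disconnected.
adjacency = {
    "S1": ["A"],
    "A": ["S1", "B"],
    "B": ["A", "C"],
    "C": ["B"],
    "D": [],
}

ranking = rank_candidates(adjacency, seeds=["S1"], candidates=["C", "A", "D", "B"])
print(ranking)
```

A GO-aware variant in the spirit of SPGOranker would additionally weight each candidate by its functional similarity to the seed genes; that step is omitted here because the abstract does not specify how the two scores are combined.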
Funding: Project supported by the National Scientific Research Foundation of Hunan Province, China (Nos. 14C0096, 10C0408, and 10B010), the Natural Science Foundation of Hunan Province, China (Nos. 13JJ4106 and 14JJ3138), and the Science and Technology Plan Project of Hunan Province, China (No. 2010FJ3044).
Abstract: Protein complexes are the basic units of macromolecular organization and help us to understand the mechanisms of the cell. The development of high-throughput proteomic techniques such as yeast two-hybrid, tandem affinity purification, and mass spectrometry has supplied a large amount of protein-protein interaction data, which makes it possible to predict overlapping complexes through computational methods. Research shows that overlapping complexes can contribute to identifying essential proteins, which are necessary for the organism to survive and reproduce. Most studies concentrate on the evaluation of protein complexes, but few focus on the predicted overlaps. In this paper, an evaluation criterion called the overlap maximum matching ratio (OMMR) is proposed to analyze the similarity between the identified overlaps and the benchmark overlap modules. Comparison of essential proteins and gene ontology (GO) analysis are also used to assess the quality of overlaps. We perform a comprehensive comparison of several overlapping complex prediction approaches using three yeast protein-protein interaction (PPI) networks, focusing on the overlaps identified by these algorithms. Experimental results indicate the importance of overlaps and reveal the relationship between overlaps and the identification of essential proteins.
Funding: Supported by the National Natural Science Foundation of China (30970780); the Ph.D. Programs Foundation of the Ministry of Education of China (20091103110005); the Project for the Innovation Team of Beijing; the National Natural Science Foundation of China (81370038); the Beijing Natural Science Foundation (7142012); the Science and Technology Project of Beijing Municipal Education Commission (km201410005003); the Rixin Fund of Beijing University of Technology (2013-RX-L04); and the Basic Research Fund of Beijing University of Technology.
Abstract: Informative proteins are proteins that play critical functional roles inside cells, and knowledge of them is fundamental to translating bioinformatics into clinical practice. Many existing methods for identifying informative biomarkers are heuristic and arbitrary and do not consider the dynamic characteristics of biological processes. In this paper, we present a generative model for identifying informative proteins by systematically analyzing the topological variation of dynamic protein-protein interaction networks (PPINs). In this model, a common representation of multiple PPINs is learned using a deep feature generation model. The original PPINs are then rebuilt from this representation, and the reconstruction errors are analyzed to locate the informative proteins. Experiments were carried out on data from the yeast cell cycle and from different prostate cancer stages. We analyze the effectiveness of reconstruction by comparing different methods, and we compare the resulting rankings of informative proteins with those from baseline methods. Our method is able to reveal the critical members in these dynamic processes, which can be studied further to assess their potential for biomarker research.
Funding: The Special Fund for Agro-scientific Research in the Public Interest of the People's Republic of China (201403075); the Major Technology Project of Hainan (ZDZX2013010-1); the Program for Top Young Talents in the Chinese Academy of Tropical Agricultural Sciences (ITBB130102); and the China Postdoctoral Science Foundation (20110490003).
Abstract: Proteomic analysis of upland cotton was performed to profile the global detectable proteomes of ovules and fibers using two-dimensional electrophoresis (2DE). A total of 1,203 independent protein spots were collected from representative 2DE gels, digested with trypsin, and identified by matrix-assisted laser desorption and ionization time-of-flight/time-of-flight (MALDI-TOF/TOF) mass spectrometry. The mass spectrometry or tandem mass spectrometry (MS or MS/MS) data were then searched against a local database constructed from Gossypium hirsutum genome sequences, resulting in successful identification of 975 protein spots (411 for ovules and 564 for fibers). Functional annotation analysis of the 975 identified proteins revealed that ovule-specific proteins were mainly enriched in functions related to fatty acid elongation, sulfur amino acid metabolism, and post-replication repair, while fiber-specific proteins were enriched in functions related to root hair elongation, galactose metabolism, and D-xylose metabolic processes. Further annotation analysis of the most abundant protein spots showed that 28.96% of the total proteins in the ovule were mainly located in the Golgi apparatus, endoplasmic reticulum, mitochondrion, and ribosome, whereas in fibers, 27.02% of the total proteins were located in the cytoskeleton, nuclear envelope, and cell wall. Quantitative real-time polymerase chain reaction (qRT-PCR) analyses of the ovule-specific protein spots P61, P93, and P198 and the fiber-specific protein spots 230, 477, and 511 were performed to validate the proteomics data. Protein-protein interaction network analyses revealed very different network cluster patterns between ovules and fibers. This work provides the largest protein identification dataset of 2DE-detectable proteins in cotton ovules and fibers and indicates potentially important roles of tissue-specific proteins, thus providing insights into the cotton ovule and fiber proteomes on a global scale.
Funding: Project supported by the Key Project of Hebei North University (No. 120177) and the Science and Technology Research Project of Hebei Province Department Institutions of Higher Learning (No. Z2015047), China.
Abstract: Bone mesenchymal stem cells (BMSCs) differentiated into neurons have been widely proposed for use in cell therapy of many neurological disorders. It is therefore important to understand the molecular mechanisms underlying this differentiation. We screened differentially expressed genes between immature neural tissues and untreated BMSCs to identify the genes responsible for neuronal differentiation from BMSCs. GSE68243 gene microarray data of rat BMSCs and GSE18860 gene microarray data of rat neurons were obtained from the Gene Expression Omnibus database. Transcriptome Analysis Console software showed that 1248 genes were up-regulated and 1273 were down-regulated in neurons compared with BMSCs. Gene Ontology functional enrichment, protein-protein interaction networks, functional modules, and hub genes were analyzed using DAVID, STRING 10, the BiNGO tool, and NetworkAnalyzer software, revealing that nine hub genes, Nrcam, Sema3a, Mapk8, Dlg4, Slit1, Creb1, Ntrk2, Cntn2, and Pax6, may play a pivotal role in neuronal differentiation from BMSCs. Seven genes, Dcx, Nrcam, Sema3a, Cntn2, Slit1, Ephb1, and Pax6, were shown to be hub nodes within the neuronal development network, while six genes, Fgf2, Tgfβ1, Vegfa, Serpine1, Il6, and Stat1, appeared to play an important role in suppressing neuronal differentiation. However, additional studies are required to confirm these results.
Abstract: Ongoing advances in computational biology research have generated massive amounts of protein-protein interaction (PPI) data. The availability of PPI data for several organisms has prompted the development of computational methods for the measurement, analysis, modeling, comparison, clustering, and alignment of biological networks. Nevertheless, exact network comparison is computationally intractable, and as a result several alternative methods have been used instead. We illustrate a probabilistic approach to relating protein nodes that belong to different networks using the Chapman-Kolmogorov (CK) formula. We compared the CK formula with SMETANA, a semi-Markov random walk method, and found that CK outperforms SMETANA in efficiency, speed, space, and complexity. We modified the SMETANA source code, available in MATLAB, in light of the CK formula. Discriminant-Expectation Maximization (D-EM) assesses the parameters of a protein network dataset and determines a linear transformation that simplifies the assumed probabilistic form of the data distribution and finds good features dynamically. Our implementation shows that D-EM performs satisfactorily in protein network alignment applications.
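For readers unfamiliar with the Chapman-Kolmogorov formula, the sketch below checks the identity on a small Markov chain: the (m+n)-step transition matrix equals the product of the m-step and n-step matrices. It illustrates the identity only and makes no claim about how the paper applies it to network alignment; the transition matrix is a made-up example.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def matpow(p, n):
    """Return the n-step transition matrix P^n by repeated multiplication."""
    size = len(p)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = matmul(result, p)
    return result


# Made-up 3-state transition matrix; each row sums to 1.
P = [
    [0.5, 0.5, 0.0],
    [0.25, 0.5, 0.25],
    [0.0, 0.5, 0.5],
]

# Chapman-Kolmogorov: P^(2+3) should equal P^2 multiplied by P^3.
five_step = matpow(P, 5)
two_then_three = matmul(matpow(P, 2), matpow(P, 3))
max_gap = max(
    abs(five_step[i][j] - two_then_three[i][j]) for i in range(3) for j in range(3)
)
print(max_gap)
```

In a random-walk setting, entry (i, j) of the n-step matrix is the probability of reaching node j from node i in exactly n steps, which is why the identity is useful for reasoning about multi-step walks.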