Funding: Project supported by the Gansu Province Industrial Support Plan (Grant No. 2023CYZC-25), the Natural Science Foundation of Gansu Province (Grant No. 23JRRA770), and the National Natural Science Foundation of China (Grant No. 62162040).
Abstract: Essential proteins are indispensable for cell growth and survival. The study of essential proteins is important for understanding cellular functions and biological mechanisms. Therefore, various computational methods have been proposed to identify essential proteins. Unfortunately, most methods based on network topology consider only the interactions between a protein and its neighboring proteins, not its interactions with proteins at higher-order distances. In this paper, we propose the DSEP algorithm, which integrates network topology properties and subcellular localization information in protein–protein interaction (PPI) networks based on four-order distances and then uses random walks to identify essential proteins. We also propose a method to calculate the finite-order distance of the network, which can greatly reduce the time complexity of our algorithm. We conducted a comprehensive comparison of the DSEP algorithm with 11 existing classical algorithms for identifying essential proteins using multiple evaluation methods. The results show that DSEP is superior to these 11 methods.
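The abstract above does not spell out how the finite-order distance is computed. As a rough illustration only, one way to obtain hop distances up to a fixed order is a breadth-first search that stops expanding at the limit. The function name `bounded_distances`, the `max_order=4` default, and the toy graph are assumptions made for this sketch, not the DSEP procedure.

```python
from collections import deque


def bounded_distances(adj, source, max_order=4):
    """Return {node: hop distance} for nodes within max_order hops of source.

    adj is a hypothetical adjacency dict {node: [neighbors]} standing in
    for a PPI network.
    """
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == max_order:
            continue  # nodes at the limit are recorded but not expanded
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


# Toy path graph A-B-C-D-E-F (illustrative data only).
adj = {
    "A": ["B"], "B": ["A", "C"], "C": ["B", "D"],
    "D": ["C", "E"], "E": ["D", "F"], "F": ["E"],
}
result = bounded_distances(adj, "A")
print(result)
```

Because the search never expands nodes beyond the order limit, each source visits only its local neighborhood rather than the whole network, which is where a saving over all-pairs shortest paths would come from.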
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 11861045, 11361033, and 62162040).
Abstract: Essential proteins play an important role in disease diagnosis and drug development. Many methods have been devoted to essential protein prediction using various kinds of biological information. However, they ignore either the noise present in the biological information itself or the noise generated during feature extraction. To overcome these problems, we propose a novel method for predicting essential proteins called attention gate-graph attention network and temporal convolutional network (AG-GATCN). In AG-GATCN, we use an improved temporal convolutional network (TCN) to extract features from gene expression sequences. To address the noise in the gene expression sequence itself and the noise generated after the dilated causal convolution, we introduce an attention mechanism and a gating mechanism into the TCN. In addition, we use a graph attention network (GAT) to extract protein–protein interaction (PPI) network features, constructing the feature matrix with the node2vec technique and 7 centrality metrics; to mitigate the GAT oversmoothing problem, we introduce a gated tanh unit (GTU) into the GAT. Finally, we integrate the two types of features to predict essential proteins. The experimental results show that AG-GATCN achieves better performance than existing methods for predicting essential proteins.
Funding: Project supported by the National Natural Science Foundation of China (Grant Nos. 11861045 and 62162040).
Abstract: Predicting essential proteins is crucial for discovering the process of cellular organization and viability. We propose a biased random walk with restart algorithm for essential protein prediction, called BRWR. Firstly, the common random walk process often sets the probabilities of a particle transferring to adjacent nodes to be equal, neglecting the influence of similarity structure on the transition probability. To address this problem, we define a novel transition probability matrix by integrating gene expression similarity and subcellular location similarity. The particles thus obtain biased transition probabilities when performing the random walk, which further exploits the biological properties embedded in the network structure. Secondly, we use gene ontology (GO) term scores and subcellular scores to calculate the initial probability vector of the random walk with restart. Finally, when the biased random walk with restart reaches a steady state, the protein importance score is obtained. To demonstrate the superiority of BRWR, we conduct experiments on the YHQ, BioGRID, Krogan, and Gavin PPI networks. The results show that BRWR is superior to other state-of-the-art methods in essential protein recognition performance. In particular, compared with the contrast methods, the improvements of BRWR in terms of ACC are 1.4%–5.7%, 1.3%–11.9%, 2.4%–8.8%, and 0.8%–14.2%, respectively. Therefore, BRWR is effective and reasonable.
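The steps described in the abstract above (a biased transition matrix, an initial probability vector, iteration to a steady state) follow the general shape of a random walk with restart. The sketch below shows that general scheme only. The function name `biased_rwr`, the restart probability `alpha=0.3`, and all edge weights and initial scores are hypothetical placeholders, not values or formulas from the BRWR paper.

```python
def biased_rwr(weights, initial, alpha=0.3, tol=1e-12, max_iter=10_000):
    """Iterate p <- (1 - alpha) * T^T p + alpha * r until p stops changing.

    weights[u][v] is a non-negative similarity weight on edge u -> v
    (a stand-in for a combined expression/localization similarity);
    row-normalizing it gives the biased transition probabilities T.
    initial holds unnormalized prior scores (a stand-in for GO and
    subcellular scores); normalizing it gives the restart vector r.
    """
    nodes = list(initial)
    total = sum(initial.values())
    r = {n: initial[n] / total for n in nodes}
    out_sum = {u: sum(weights[u].values()) for u in nodes}
    if any(s <= 0 for s in out_sum.values()):
        raise ValueError("every node needs at least one positively weighted edge")

    p = dict(r)
    for _ in range(max_iter):
        nxt = {n: alpha * r[n] for n in nodes}
        for u in nodes:
            for v, w in weights[u].items():
                nxt[v] += (1 - alpha) * p[u] * w / out_sum[u]
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Toy three-protein network with made-up similarity weights.
weights = {
    "P1": {"P2": 0.9, "P3": 0.1},
    "P2": {"P1": 0.9, "P3": 0.5},
    "P3": {"P1": 0.1, "P2": 0.5},
}
initial = {"P1": 1.0, "P2": 1.0, "P3": 1.0}
scores = biased_rwr(weights, initial)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

With uniform weights the walk reduces to the equal-probability case the abstract criticizes; unequal weights are what make it biased.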
Funding: Supported by the Shenzhen KQTD Project (No. KQTD20200820113106007), the China Scholarship Council (No. 201906725017), the Collaborative Education Project of Industry-University Cooperation of the Chinese Ministry of Education (No. 201902098015), the Teaching Reform Project of Hunan Normal University (No. 82), and the National Undergraduate Training Program for Innovation (No. 202110542004).
Abstract: Essential proteins play a vital role in biological processes, and combining gene expression profiles with Protein-Protein Interaction (PPI) networks can improve the identification of essential proteins. However, gene expression data are prone to significant fluctuations due to noise interference in topological networks. In this work, we discretized gene expression data and used the discrete similarities of the gene expression profiles to eliminate noise fluctuation. We then proposed the Pearson Jaccard coefficient (PJC), which consists of continuous and discrete similarities in the gene expression data. Using graph theory as the basis, we fused the newly proposed similarity coefficient with existing network topology prediction algorithms at each protein node to recognize essential proteins. This strategy exhibited a high recognition rate and good specificity. We validated the new similarity coefficient PJC on the Krogan, Gavin, and DIP PPI datasets of yeast and evaluated the results by receiver operating characteristic analysis, jackknife analysis, top analysis, and accuracy analysis. Compared with node-based network topology centrality methods and centrality methods that fuse biological information, the new similarity coefficient PJC significantly improved the prediction performance for essential proteins of DC, IC, eigenvector centrality, subgraph centrality, betweenness centrality, closeness centrality, NC, PeC, and WDC. We also compared the PJC coefficient with other methods using the NF-PIN algorithm, which predicts proteins by constructing active PPI networks through dynamic gene expression. The experimental results showed that the newly proposed similarity coefficient PJC has clear advantages in predicting essential proteins.
Funding: Supported by the National Natural Science Foundation of China (Nos. 62162040 and 11861045).
Abstract: Identifying essential proteins from protein-protein interaction networks is important for studies on biological evolution and new drug development. Most existing criteria for prioritizing essential proteins focus on only a single attribute of the proteins in the network and therefore suffer from information loss. To overcome this problem, a relatively comprehensive and effective novel method for essential protein identification based on improved multicriteria decision making (MCDM), called essential proteins identification-technique for order preference by similarity to ideal solution (EPI-TOPSIS), is proposed. First, considering different attributes of proteins, we propose three methods from different aspects to evaluate the significance of the proteins: gene-degree centrality (GDC) for gene expression sequences, and subcellular-neighbor-degree centrality (SNDC) and subcellular-in-degree centrality (SIDC) for subcellular location information and protein complexes. Then, betweenness centrality (BC) and these three methods are considered together as the multiple criteria of the decision-making model. The analytic hierarchy process is used to evaluate the weight of each criterion, and the essential proteins are prioritized by an ideal-solution method of MCDM, i.e., TOPSIS. Experiments are conducted on the YDIP, YMIPS, Krogan, and BioGRID networks. The results indicate that EPI-TOPSIS outperforms several state-of-the-art approaches for identifying essential proteins across the performance measures.
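The ranking step named in the abstract above is TOPSIS, a standard multicriteria method. The sketch below implements textbook TOPSIS, not the improved variant the abstract proposes. It assumes every criterion is a benefit criterion (larger is better) and that the criterion weights, which the paper derives with the analytic hierarchy process, are simply given. The function name `topsis` and the numbers in the toy matrix are invented for illustration.

```python
import math


def topsis(matrix, weights):
    """Return one closeness score in [0, 1] per row (alternative).

    matrix[i][j] is the value of criterion j for alternative i.
    Assumes benefit criteria, no all-zero column, and at least two
    distinct rows (otherwise a division by zero occurs).
    """
    n = len(weights)
    # Vector-normalize each column, then apply the criterion weight.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n)] for row in matrix]
    # Positive and negative ideal solutions.
    best = [max(row[j] for row in weighted) for j in range(n)]
    worst = [min(row[j] for row in weighted) for j in range(n)]
    scores = []
    for row in weighted:
        d_best = math.dist(row, best)
        d_worst = math.dist(row, worst)
        scores.append(d_worst / (d_best + d_worst))
    return scores


# Rows: three hypothetical proteins; columns: BC, GDC, SNDC, SIDC.
matrix = [
    [0.9, 0.8, 0.7, 0.9],
    [0.5, 0.4, 0.6, 0.3],
    [0.1, 0.2, 0.1, 0.2],
]
weights = [0.25, 0.25, 0.25, 0.25]  # placeholder for AHP-derived weights
scores = topsis(matrix, weights)
print(scores)
```

A protein that is best on every criterion coincides with the positive ideal and scores 1, and one that is worst on every criterion scores 0. Proteins would then be ranked by descending score.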
Abstract: The prediction of essential proteins, the minimal set required for a living cell to support cellular life, is an important task for understanding the cellular processes of an organism. Fast progress in high-throughput technologies and the production of large amounts of data enable the discovery of essential proteins at the system level by analyzing Protein-Protein Interaction (PPI) networks, replacing biological or chemical experiments. Furthermore, additional gene-level annotation information, such as Gene Ontology (GO) terms, helps to detect essential proteins with higher accuracy. Various centrality algorithms have been used to determine essential proteins in a PPI network, and recently motif centrality GO, which is based on network motifs and GO terms, has worked best in detecting essential proteins in a PPI network of baker's yeast (Saccharomyces cerevisiae) compared with other centrality algorithms. However, each centrality algorithm contributes to the detection of essential proteins with different properties, which makes integrating them a logical next step. In this paper, we construct a new feature space, named CENT-ING-GO, consisting of various centrality measures and GO terms, and provide a computational approach to predict essential proteins with various machine learning techniques. The experimental results show that the CENT-ING-GO feature space improves performance over the INT-GO feature space from previous work by Acencio and Lemke in 2009. We also demonstrate that pruning a PPI network with informative GO terms can further improve the prediction performance.
Funding: Supported by the National Natural Science Foundation for Excellent Young Scholars (No. 61622213) and the National Natural Science Foundation of China (Nos. 61232001, 61370024, and 61428209).
Abstract: Essential proteins are those necessary for the survival or reproduction of species. Discovering such proteins is fundamental for understanding the minimal requirements for cellular life and is also meaningful for disease study and drug design. With the development of high-throughput techniques, a large number of Protein-Protein Interactions (PPIs) can be used to identify essential proteins at the network level. Although a series of network-based computational methods have been proposed, improving the prediction precision remains a challenge because of the high rate of false positives in PPI networks. In this paper, we propose a new method, GOS, to identify essential proteins by integrating Gene expression, Orthology, and Subcellular localization information. The gene expression and subcellular localization information are used to determine whether a neighbor in the PPI network is reliable. Only reliable neighbors are considered when we analyze the topological characteristics of a protein in a PPI network. We also analyze the orthologous attributes of each protein to reflect its conservation, and use a random walk model to integrate a protein's topological characteristics and its orthology. The experimental results on the yeast PPI network show that the proposed method GOS outperforms ten existing methods: DC, BC, CC, SC, EC, IC, NC, PeC, ION, and CSC.
Funding: Supported by the National Natural Science Foundation of China (Nos. 61232001, 61502166, 61502214, 61379108, and 61370024) and the Scientific Research Fund of Hunan Provincial Education Department (Nos. 15CY007 and 10A076).
Abstract: Essential proteins are vital to the survival of a cell. Various features are related to the essentiality of proteins, such as biological and topological features. Many computational methods have been developed to identify essential proteins by using these features. However, it is still a big challenge to design an effective method that can select suitable features and integrate them to predict essential proteins. In this work, we first collect 26 features and use SVM-RFE to select some of them to create a feature space for predicting essential proteins. We then remove the features that share biological meaning with other features in the feature space according to their Pearson Correlation Coefficients (PCC). The experiments are carried out on S. cerevisiae data. Six features are determined to be the best subset of features. To assess the prediction performance of our method, we further compare it with several machine learning methods, such as SVM, Naive Bayes, Bayes Network, and NBTree, when different numbers of features are used as input. The results show that the methods using the 6 features outperform those using other features, which confirms the effectiveness of our feature selection method for essential protein prediction.
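The abstract above describes removing features that are redundant with others according to their Pearson correlation, but gives neither the threshold nor the exact rule. The sketch below shows one plausible greedy version under stated assumptions: features arrive already ranked (standing in for an SVM-RFE ranking, which is not reproduced here), and a feature is dropped if its absolute correlation with any already-kept feature reaches a cutoff. The names `pearson` and `drop_redundant`, the `threshold=0.8` cutoff, and the toy feature values are all hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences.

    Assumes neither sequence is constant (otherwise the denominator is zero).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def drop_redundant(features, ranked_names, threshold=0.8):
    """Keep features in rank order, skipping any that correlate strongly
    (|PCC| >= threshold) with a feature that was already kept."""
    kept = []
    for name in ranked_names:
        if all(abs(pearson(features[name], features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Made-up feature columns for five proteins.
features = {
    "degree": [1, 2, 3, 4, 5],
    "degree_x2": [2, 4, 6, 8, 10],  # perfectly correlated with "degree"
    "expression": [5, 1, 4, 2, 3],
}
ranked_names = ["degree", "degree_x2", "expression"]  # stand-in for an SVM-RFE ranking
selected = drop_redundant(features, ranked_names)
print(selected)
```

Processing features in rank order means that, of two highly correlated features, the one ranked higher by the upstream selector is the one retained.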
Funding: Supported by the Initiative Research Program of Wuhan University (No. 410100020), the Advanced Talent Independent Research Program of Wuhan University (No. 410100011), and the National Natural Science Foundation of China (No. 210700228).
Abstract: Dear Editor, Protein-protein interactions (PPIs) often play important roles in biological processes (Zhang et al., 2016). The split Renilla luciferase complementation assay (SRLCA) is one of the methods for studying PPIs. SRLCA is based on the complementation of the N-terminal (LN) and C-terminal (LC) domains of Renilla luciferase, two non-functional halves that are fused to possibly interacting proteins and emit luminescence upon complementation (Deng et al., 2011; Jiang et al., 2010) (Supplementary Figure S1A).
Funding: Supported by grants from the National Institutes of Health, USA (NCRR R21RR024869, NIGMS RO1GM086423 and RO1GM121534 to Y.H.) and the Start-up Foundation of Nanjing Medical University (2012RC04 to J.H.). The University of Pittsburgh Medical School Center for Biologic Imaging was supported by grant 1S100D019973-01 from the NIH, USA.
Abstract: In the past two decades, extensive studies have focused on a group of so-called polarity proteins that play conserved and essential functions in establishing and maintaining cell polarity in epithelial cells. Among them, Crumbs (Crb) is the only transmembrane polarity protein characterized to date (Tepass et al.,
Funding: Project supported by the National Scientific Research Foundation of Hunan Province, China (Nos. 14C0096, 10C0408, and 10B010), the Natural Science Foundation of Hunan Province, China (Nos. 13JJ4106 and 14JJ3138), and the Science and Technology Plan Project of Hunan Province, China (No. 2010FJ3044).
Abstract: Protein complexes are the basic units of macromolecular organization and help us to understand the cell's mechanisms. The development of high-throughput proteomic techniques such as yeast two-hybrid, tandem affinity purification, and mass spectrometry supplies a large amount of protein-protein interaction data, which makes it possible to predict overlapping complexes through computational methods. Research shows that overlapping complexes can contribute to identifying essential proteins, which are necessary for the organism to survive and reproduce, and for life's activities. Scholars have paid considerable attention to the evaluation of protein complexes, but few have focused on the predicted overlaps. In this paper, an evaluation criterion called overlap maximum matching ratio (OMMR) is proposed to analyze the similarity between the identified overlaps and the benchmark overlap modules. Comparison of essential proteins and gene ontology (GO) analysis are also used to assess the quality of overlaps. We perform a comprehensive comparison of several overlapping complex prediction approaches using three yeast protein-protein interaction (PPI) networks. We focus on the analysis of overlaps identified by these algorithms. Experimental results indicate the importance of overlaps and reveal the relationship between overlaps and the identification of essential proteins.