The prediction of essential proteins, the minimal set of proteins a living cell requires to sustain cellular life, is an important task for understanding the cellular processes of an organism. Rapid progress in high-throughput technologies and the resulting large volumes of data enable the discovery of essential proteins at the system level by analyzing Protein-Protein Interaction (PPI) networks, in place of biological or chemical experiments. Furthermore, gene-level annotation information, such as Gene Ontology (GO) terms, helps detect essential proteins with higher accuracy. Various centrality algorithms have been used to identify essential proteins in a PPI network. Recently, motif centrality GO, which is based on network motifs and GO terms, performed best among centrality algorithms at detecting essential proteins in a PPI network of baker's yeast (Saccharomyces cerevisiae). However, each centrality algorithm contributes to the detection of essential proteins with different properties, which makes integrating them a logical next step. In this paper, we construct a new feature space, named CENT-ING-GO, consisting of various centrality measures and GO terms, and provide a computational approach to predict essential proteins with various machine learning techniques. The experimental results show that the CENT-ING-GO feature space improves performance over the INT-GO feature space from previous work by Acencio and Lemke (2009). We also demonstrate that pruning a PPI network with informative GO terms can further improve prediction performance.
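As a rough illustration of how centrality measures and GO terms can be combined into one feature vector per protein, the sketch below computes degree and closeness centrality on a toy network and appends binary GO-term indicators. The protein names, GO labels, the choice of these two centralities, and every function name are assumptions made for illustration; this is not the CENT-ING-GO construction itself, whose exact features the abstract does not list.

```python
from collections import deque

# Hypothetical toy PPI network (undirected); protein names are made up.
EDGES = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P3"), ("P4", "P5")]
# Hypothetical GO annotations per protein (placeholder labels, not real GO IDs).
GO_TERMS = {"P1": {"GO:A", "GO:B"}, "P2": {"GO:A"}, "P3": set(),
            "P4": {"GO:B"}, "P5": set()}


def build_adjacency(edges):
    """Build an undirected adjacency map from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree_centrality(adj):
    """Degree of each node divided by the maximum possible degree (n - 1)."""
    n = len(adj)
    return {node: len(neighbors) / (n - 1) for node, neighbors in adj.items()}


def closeness_centrality(adj):
    """Reachable-node count divided by the sum of BFS distances to them."""
    result = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total > 0 else 0.0
    return result


def feature_vectors(edges, go_terms):
    """Return (GO vocabulary, {protein: [degree, closeness, GO flags...]})."""
    adj = build_adjacency(edges)
    deg = degree_centrality(adj)
    clo = closeness_centrality(adj)
    vocabulary = sorted(set().union(*go_terms.values()))
    features = {}
    for protein in sorted(adj):
        annotated = go_terms.get(protein, set())
        flags = [1.0 if term in annotated else 0.0 for term in vocabulary]
        features[protein] = [deg[protein], clo[protein]] + flags
    return vocabulary, features


if __name__ == "__main__":
    vocab, feats = feature_vectors(EDGES, GO_TERMS)
    print("columns: degree, closeness,", ", ".join(vocab))
    for protein, vector in feats.items():
        print(protein, vector)
```

Vectors of this shape, paired with known essential/non-essential labels, could then be passed to any standard classifier; which classifiers the authors used is not stated here.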
A network motif is defined as a frequent and unique subgraph pattern in a network. Searching for motifs involves counting all possible instances or listing all patterns, isomorphism testing (known as NP-hard), and a large amount of repeated processing for statistical evaluation. Although many efficient algorithms have been introduced, exhaustive search methods are still infeasible, and feasible approximation methods are not yet plausible. Additionally, the fast and continual growth of biological networks makes the problem more challenging. As a consequence, parallel algorithms have been developed, and distributed computing has also been tested in cloud computing environments. In this paper, we survey current algorithms for network motif detection and existing software tools. We then show that some of these methods have been used in parallel network motif search algorithms with static or dynamic load balancing techniques. With the advent of cloud computing services, network motif search has been implemented with MapReduce on the Hadoop Distributed File System (HDFS), and with Storm, but without statistical testing. Overall, the survey covers network motif search algorithms in general, including existing parallel methods as well as cloud-computing-based search, and shows the promise of cloud-computing-based motif search methods.
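The two costs named above, instance counting and repeated statistical evaluation, can be sketched on a toy example. The code below counts one fixed pattern (triangles) and compares the count with degree-preserving randomized copies of the network through a z-score. The edge list, the function names, the swap-based randomization, and the z-score criterion are all assumptions chosen for illustration; they are common choices, not a method taken from the survey.

```python
import random
from statistics import mean, pstdev

# Hypothetical toy undirected network: two triangles joined by an edge, plus a tail.
EDGES = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"),
         ("D", "E"), ("E", "F"), ("D", "F"), ("F", "G")]


def count_triangles(edges):
    """Count triangles; assumes unique undirected edges and no self-loops."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # Each triangle is found once from each of its three edges.
    return sum(len(adj[u] & adj[v]) for u, v in edges) // 3


def randomize(edges, rng, swaps_per_edge=10):
    """Return a randomized copy that keeps every node's degree (edge swaps)."""
    edge_list = [tuple(sorted(e)) for e in edges]
    edge_set = set(edge_list)
    for _ in range(swaps_per_edge * len(edge_list)):
        i, j = rng.sample(range(len(edge_list)), 2)
        (a, b), (c, d) = edge_list[i], edge_list[j]
        if rng.random() < 0.5:
            c, d = d, c  # try the other pairing of endpoints
        new1, new2 = tuple(sorted((a, d))), tuple(sorted((c, b)))
        # Skip swaps that would create a self-loop or a duplicate edge.
        if a == d or c == b or new1 in edge_set or new2 in edge_set:
            continue
        edge_set -= {edge_list[i], edge_list[j]}
        edge_set |= {new1, new2}
        edge_list[i], edge_list[j] = new1, new2
    return edge_list


def motif_z_score(edges, n_random=100, seed=0):
    """Return (observed count, mean random count, z-score) for triangles."""
    rng = random.Random(seed)
    observed = count_triangles(edges)
    counts = [count_triangles(randomize(edges, rng)) for _ in range(n_random)]
    mu, sigma = mean(counts), pstdev(counts)
    z = (observed - mu) / sigma if sigma > 0 else float("nan")
    return observed, mu, z


if __name__ == "__main__":
    print(motif_z_score(EDGES))
```

Each randomized network needs its own full count, so the statistical step multiplies the search cost by `n_random`. Because the randomized copies are independent of each other, that step is a natural candidate for parallel or distributed execution.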