Uncovering causal genes for human inherited diseases, as the primary step toward understanding the pathogenesis of these diseases, requires a combined analysis of genetic and genomic data. Although bioinformatics methods have been designed to prioritize candidate genes resulting from genetic linkage analysis or association studies, the coverage of both diseases and genes in existing methods is quite limited, which prevents scanning for causal genes at the whole-genome level for a significant proportion of diseases. To overcome this limitation, we propose a method named pgWalk that prioritizes candidate genes by integrating multiple phenomic and genomic data. We derive three types of phenotype similarities among 7719 diseases and nine types of functional similarities among 20327 genes. Based on each pair of phenotype and gene similarities, we construct a disease-gene network and then simulate a random walker wandering on this heterogeneous network to quantify the strength of association between a candidate gene and a query disease. A weighted version of Fisher's method with dependence correction is adopted to integrate the 27 scores obtained in this way, and a final q-value is calibrated for prioritizing candidate genes. A series of validation experiments is conducted to demonstrate the superior performance of this approach. We further show the effectiveness of this method in exome sequencing studies of autism and epileptic encephalopathies. An online service and the standalone software of pgWalk can be found at http://bioinfo.au.tsinghua.edu.cn/jianglab/pgwalk.
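The random-walk step described in the abstract above can be illustrated with a generic random walk with restart on a toy disease-gene network. This is only a sketch under stated assumptions: the abstract does not give pgWalk's transition matrix, restart probability, or edge weights, so the function `random_walk_with_restart`, the helper `undirected`, the restart value 0.5, and the toy nodes `D1`, `D2`, `G1` to `G3` are all hypothetical and are not taken from the pgWalk software.

```python
def undirected(edges):
    """Build a symmetric weighted adjacency dict from (u, v, weight) triples."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj


def random_walk_with_restart(adj, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Generic random walk with restart (an illustration, not pgWalk itself).

    adj   : dict node -> dict neighbour -> non-negative weight
    seeds : set of nodes the walker restarts from (here, the query disease)
    Returns a dict node -> steady-state visiting probability.
    Assumes every node has at least one weighted edge; a node without
    outgoing weight would leak probability mass in this simple version.
    """
    nodes = sorted(adj)
    out_weight = {u: sum(adj[u].values()) for u in nodes}
    p0 = {u: (1.0 / len(seeds) if u in seeds else 0.0) for u in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # With probability `restart` jump back to the seeds ...
        nxt = {u: restart * p0[u] for u in nodes}
        # ... otherwise move to a neighbour in proportion to edge weight.
        for u in nodes:
            if out_weight[u] == 0:
                continue
            for v, w in adj[u].items():
                nxt[v] += (1.0 - restart) * p[u] * w / out_weight[u]
        diff = sum(abs(nxt[u] - p[u]) for u in nodes)
        p = nxt
        if diff < tol:
            break
    return p


# Hypothetical toy network: D* are diseases, G* are genes.
toy_edges = [
    ("D1", "D2", 0.8),  # phenotype similarity between two diseases
    ("D2", "G1", 1.0),  # known disease-gene association
    ("G1", "G2", 0.9),  # functional similarity between genes
    ("G2", "G3", 0.3),
]
toy_adj = undirected(toy_edges)
toy_probs = random_walk_with_restart(toy_adj, seeds={"D1"})
toy_ranking = sorted(["G1", "G2", "G3"], key=lambda g: toy_probs[g], reverse=True)
print(toy_ranking)
```

In this toy setting the query disease `D1` has no known gene, yet candidate genes still receive scores through the phenotypically similar disease `D2`, which is the kind of coverage gain a heterogeneous network is meant to provide. Combining such scores across several networks (for example with Fisher's method, as the abstract mentions) would be a separate step not shown here.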
The identification of communities is imperative for understanding network structures and functions. Applying community detection algorithms to biological networks reveals their community structure, which helps in analyzing the topological structures and predicting the behaviors of these networks. In this paper, we analyze the diseasome network using a new method, a disease-gene network detecting algorithm based on principal component analysis, which can be used to investigate the connections between nodes within the same group. Experimental results on real-world networks demonstrate that our algorithm is more efficient in detecting community structures than other well-known algorithms.
Various posttranslational modifications (PTMs) participate in nearly all aspects of biological processes by regulating protein functions, and aberrant states of PTMs are frequently implicated in human diseases. Therefore, an integral resource of PTM–disease associations (PDAs) would be a great help for both academic research and clinical use. In this work, we report PTMD, a well-curated database containing PTMs that are associated with human diseases. We manually collected 1950 known PDAs in 749 proteins for 23 types of PTMs and 275 types of diseases from the literature. Database analyses show that phosphorylation has the largest number of disease associations, whereas neurologic diseases have the largest number of PTM associations. We classified all known PDAs into six classes according to the PTM status in diseases and demonstrated that the upregulation and presence of PTM events account for a predominant proportion of disease-associated PTM events. By reconstructing a disease–gene network, we observed that breast cancers have the largest number of associated PTMs and AKT1 has the largest number of PTMs connected to diseases. Finally, the PTMD database was developed with detailed annotations and can be a useful resource for further analyzing the relations between PTMs and human diseases. PTMD is freely accessible at http://ptmd.biocuckoo.org.
With the continuing development and improvement of genome-wide techniques, a great number of candidate genes have been discovered. How to identify the most likely disease genes among a large number of candidates has become a fundamental challenge in human health. A common view is that genes related to a specific disease or to similar diseases tend to reside in the same neighbourhood of biomolecular networks. Based on this observation, many methods have recently been developed to tackle the challenge. In this review, we first introduce the concept of disease genes, their properties, and the data available for identifying them. We then review recent computational approaches for prioritizing candidate disease genes based on Protein-Protein Interaction (PPI) networks and examine their advantages and disadvantages. Furthermore, existing software and network resources are summarized. Finally, we discuss key issues in prioritizing candidate disease genes and point out some future research directions.
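The neighbourhood idea in the review above (disease genes tend to sit close together in a PPI network) can be illustrated with the simplest form of guilt by association: scoring each candidate by the fraction of its direct interaction partners that are already known disease genes. This is a minimal sketch, not a method taken from the review; the function `direct_neighbour_scores` and all gene names in the toy data are hypothetical.

```python
def direct_neighbour_scores(ppi, known_disease_genes, candidates):
    """Score each candidate by the share of its PPI partners that are known disease genes.

    ppi                 : dict gene -> set of interacting genes
    known_disease_genes : set of genes already linked to the disease
    candidates          : iterable of genes to prioritize
    """
    scores = {}
    for gene in candidates:
        neighbours = ppi.get(gene, set())
        hits = neighbours & known_disease_genes
        # Genes with no recorded interactions get a score of 0.
        scores[gene] = len(hits) / len(neighbours) if neighbours else 0.0
    return scores


# Hypothetical toy data: K1 and K2 are known disease genes, C1..C3 are candidates.
toy_ppi = {
    "C1": {"K1", "K2", "X"},
    "C2": {"K1", "X", "Y", "Z"},
    "C3": {"Y"},
}
toy_known = {"K1", "K2"}
toy_scores = direct_neighbour_scores(toy_ppi, toy_known, ["C1", "C2", "C3"])
toy_order = sorted(toy_scores, key=toy_scores.get, reverse=True)
print(toy_order)
```

Direct-neighbour counting ignores longer paths and is sensitive to how well studied a gene is, which is one reason global approaches such as random walks are often preferred in practice.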
Autoimmune diseases (ADs) arise from an abnormal immune response of the body against substances and tissues normally present in the body. More than a hundred ADs have been described in the literature so far. Although their etiology remains largely unclear, various types of ADs tend to share more associated genes with other types of ADs than with non-AD types. Here we present GAAD, a gene and AD association database. In GAAD, we collected 44,762 associations between 49 ADs and 4249 genes from public databases and MEDLINE documents. We manually verified the associations to ensure their quality and credibility. We reconstructed and recapitulated the relationships among ADs using their shared genes, which further validated the quality of our data. We also provide a list of significantly co-occurring gene pairs among ADs; with the embedded tools, users can query gene co-occurrences and construct customized co-occurrence networks with genes of interest. To make GAAD more straightforward for experimental biologists and medical scientists, we extracted additional information describing the associations through text mining, including the putative diagnostic value of the associations, the type and position of gene polymorphisms, expression changes of implicated genes, and the phenotypical consequences, and grouped the associations accordingly. GAAD is freely available at http://gaad.medgenius.info.
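One common way to judge whether two genes co-occur across diseases more often than expected by chance is a one-sided hypergeometric test. The abstract above does not state which test GAAD uses, so the following is only an illustrative sketch under that assumption; the function `cooccurrence_pvalue` and the example counts are hypothetical.

```python
from math import comb


def cooccurrence_pvalue(n_diseases, n_with_a, n_with_b, n_with_both):
    """One-sided hypergeometric p-value for gene co-occurrence.

    Returns P(X >= n_with_both), where X is the number of diseases shared by
    gene A and gene B if gene B's n_with_b diseases were drawn at random
    (without replacement) from n_diseases, of which n_with_a involve gene A.
    """
    upper = min(n_with_a, n_with_b)
    total = comb(n_diseases, n_with_b)
    tail = sum(
        comb(n_with_a, i) * comb(n_diseases - n_with_a, n_with_b - i)
        for i in range(n_with_both, upper + 1)
    )
    return tail / total


# Small check case: 10 diseases, each gene linked to 4, all 4 shared.
toy_p = cooccurrence_pvalue(10, 4, 4, 4)

# Hypothetical counts on the scale of the 49 ADs mentioned in the abstract.
example_p = cooccurrence_pvalue(49, 10, 8, 6)
print(toy_p, example_p)
```

When many gene pairs are screened at once, the resulting p-values would normally need a multiple-testing correction before any pair is called significant.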
Funding (pgWalk study): This work was supported by the National Basic Research Program of China (2012CB316504), the National High Technology Research and Development Program of China (2012AA020401), and the National Natural Science Foundation of China (61175002).
Funding (diseasome community detection study): Supported in part by the Natural Science Foundation of Education Department of Jiangsu Province (No. 12KJB520019), the National Science Foundation of Jiangsu Province (No. BK20130452), the Science and Technology Innovation Foundation of Yangzhou University (No. 2012CXJ026), the National Natural Science Foundation of China (Nos. 61070047, 61070133, and 61003180), and the National Key Basic Research and Development (973) Program of China (No. 2012CB316003).
Funding (PTMD study): Supported by grants from the Special Project on Precision Medicine under the National Key R&D Program of China (Grant Nos. 2017YFC0906600 and 2016YFC0903003), the Natural Science Foundation of China (Grant Nos. 31671360 and 81670462), the Fundamental Research Funds for the Central Universities (Grant No. 2017KFXKJC001), the National Program for Support of Top-Notch Young Professionals, and the program for HUST Academic Frontier Youth Team, China.