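Abstract: To screen the chicken IFIT5 gene for potential biologically functional nsSNPs (non-synonymous single nucleotide polymorphisms) and UTR-SNPs (untranslated region SNPs), five online SNP tools were used to analyze the nsSNPs and predict deleterious sites. SWISS-MODEL, PyMOL, the ConSurf server, and other software were then used to analyze the deleterious nsSNPs with respect to protein spatial structure, changes in amino acid hydrogen bonding, and conservation. In addition, UTRScan was used to predict whether the UTR-SNPs alter binding pattern elements. The results showed that the six deleterious nsSNPs screened from the IFIT5 gene (L220I, L223M, L223V, Y375C, E391G, and Y414D) are all potentially functional sites, of which E391G is the most likely to affect protein structure and function. Nine UTR-SNPs located in uORFs were also identified that may affect the transcription pattern of the IFIT5 gene.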
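Substitution labels such as E391G follow the common reference-residue, position, variant-residue convention with 1-based positions. As a minimal sketch of how such a label can be parsed and applied to a protein sequence before structure modeling, the following may help; the function names `parse_substitution` and `apply_substitution` are hypothetical and are not taken from any of the tools named in the abstract.

```python
import re

# One-letter reference residue, 1-based position, one-letter variant residue.
_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitution(label):
    """Split a label such as 'E391G' into (reference, position, variant)."""
    match = _SUBSTITUTION.match(label)
    if match is None:
        raise ValueError(f"not a single-residue substitution: {label!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def apply_substitution(sequence, label):
    """Return a copy of `sequence` carrying the substitution in `label`.

    Raises ValueError if the position is out of range or the residue at that
    position does not match the reference residue in the label.
    """
    ref, pos, alt = parse_substitution(label)
    if not 1 <= pos <= len(sequence):
        raise ValueError(f"position {pos} is outside the sequence")
    if sequence[pos - 1] != ref:
        raise ValueError(
            f"expected {ref} at position {pos}, found {sequence[pos - 1]}"
        )
    return sequence[: pos - 1] + alt + sequence[pos:]
```

Checking the reference residue before substituting catches the frequent case where a label refers to a different isoform or numbering scheme than the sequence in hand.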
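Abstract: The coagulation factor C homolog gene (COCH), located on human chromosome 14q12-q13, was the first deafness gene identified in humans to be associated with vestibular dysfunction. To date, mutations at 16 sites in COCH have been found to cause the autosomal dominant non-syndromic hearing loss DFNA9, including 13 non-synonymous single nucleotide polymorphisms (nsSNPs). Because the genotype-phenotype relationships of the other nsSNPs in this gene remain unclear, this study used bioinformatics methods to screen all COCH SNPs hierarchically and, combining information on known pathogenic nsSNPs with verification against the protein's three-dimensional structure, predicted for the first time eight high-risk pathogenic nsSNPs (I176T, R180Q, G265E, V269L, I368N, I372T, R416C, and Y424D) in the vWFA (von Willebrand factor type A) domain of cochlin, the protein encoded by COCH. In addition, three-dimensional structural modeling of six known pathogenic nsSNPs (P51S, G87W, I109N, I109T, W117R, and F121S) in the LCCL (Limulus factor C, cochlin, and late gestation lung protein Lgl1) domain showed that every mutant underwent changes in loop or strand structure. This analysis of genotype-phenotype correlation in COCH provides a theoretical basis for hereditary deafness screening and offers guidance for functional studies of cochlin.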
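Abstract: This study used bioinformatics methods to screen the porcine calpastatin gene (CAST) for non-synonymous single nucleotide polymorphisms (nsSNPs) with potential biological function, in order to provide a theoretical reference for subsequent marker-assisted selection to improve economically important traits in pigs. Thirty-one nsSNPs in porcine CAST were retrieved from the Ensembl database and analyzed with ConSurf, SNAP, SIFT, PolyPhen-2, and MUpro in terms of sequence conservation, functional prediction, and stability. Secondary structure prediction for the potentially functional nsSNPs showed that mutations at both rs319338780 and rs790645252 altered the secondary structure of the CAST protein. I-TASSER, TM-align, PredyFlexy, and other software were then used to analyze the effects of these two mutations on the stability and flexibility of the protein's tertiary structure. The rs790645252 mutation changes the proline at position 738 of the amino acid chain to histidine and has a large effect on the tertiary structure of CAST; it may be an important potential functional site affecting pork quality traits.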
Funding: This work was supported by the National Natural Science Foundation of China (31572394), the China Agriculture Research System of MOF and MARA (CARS-41), and the White Feather Broiler Breeding Joint Project of the Ministry of Agriculture and Rural Affairs of China (19190526).
Abstract: Leptin receptor (LEPR) plays a vital role in obesity in humans and animals. The objective of this study was to assess LEPR functional variants for chicken adipose deposition by integrating association and in silico analyses using a unique chicken population, the Northeast Agricultural University broiler lines divergently selected for abdominal fat content (NEAUHLF). Five online bioinformatics tools were used to predict the functionality of the single nucleotide polymorphisms (SNPs) in the coding region. Further, the possible structure-function relationship of high-confidence SNPs was determined by bioinformatics analyses, including conservation and stability analysis based on amino acid residues, prediction of protein ligand-binding sites, and superposition of protein tertiary structures. Meanwhile, we analyzed the association between abdominal fat traits and 20 polymorphisms of the chicken LEPR gene. The integrated results showed that rs731962924 (N867I) and rs13684622 (C1002R) could lead to striking changes in protein structure and function, and that rs13684622 (C1002R) was significantly associated with abdominal fat weight (AFW, P = 0.0413) and abdominal fat percentage (AFP, P = 0.0260) in chickens. We therefore suggest that rs13684622 (C1002R) may be an essential functional SNP affecting chicken abdominal fat deposition, with potential application to improving broiler abdominal fat in molecular marker-assisted selection (MAS) programs. Additionally, coupling association analysis with in silico predictive analysis provides a new avenue for breeders to identify important molecular markers.
Funding: Supported by the National Natural Science Foundation of China (30870827).
Abstract: A substitution in an amino acid sequence can be defined as "intolerant" (non-neutral) or "tolerant" (neutral) according to whether or not it detectably alters protein phenotypes (e.g.,
Funding: Support for this work came from the George Washington University funds to RM. RG's participation is supported by R01 CA135069 and U01 CA168926. Additional support came in part from an appointment to the Research Participation Program at the Center for Biologics Evaluation and Research administered by the Oak Ridge Institute for Science and Education through an interagency agreement between the U.S. Department of Energy and the U.S. Food and Drug Administration.
Abstract: The asparagine-X-serine/threonine (NXS/T) motif, where X is any amino acid except proline, is the consensus motif for N-linked glycosylation. The large number of high-resolution crystal structures of glycosylated proteins allows us to carry out structural analysis of N-linked glycosylation sites (NGS). Our analysis shows that there is enough structural information from diverse glycoproteins to allow the development of rules that can be used to predict NGS. A Python-based tool was developed to investigate asparagines implicated in N-glycosylation in five species: Homo sapiens, Mus musculus, Drosophila melanogaster, Arabidopsis thaliana, and Saccharomyces cerevisiae. Our analysis shows that 78% of all asparagines of the NXS/T motif involved in N-glycosylation are localized in the loop/turn conformation in the human proteome. A similar distribution was found for all the other species examined. Comparative analysis of the occurrence of NXS/T motifs not known to be glycosylated and their reverse sequence (S/TXN) shows a similar distribution across the secondary structural elements, indicating that the NXS/T motif in itself is not biologically relevant. Based on our analysis, we have defined rules to determine NGS. Using machine learning methods based on these rules, we can predict with 93% accuracy whether a particular site will be glycosylated. If structural information is not available, the tool uses structural prediction results, giving 74% accuracy. The tool was used to identify glycosylation sites in 108 human proteins with structures and 2247 proteins without structures that have acquired NXS/T sites due to non-synonymous variation. The tool, Structure Feature Analysis Tool (SFAT), is freely available to the public at http://hive.biochemistry.gwu.edu/tools/sfat.
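The NXS/T consensus described above can be located in a sequence with a short regular expression. The sketch below only enumerates candidate sequons; it is not SFAT, and the function name `find_sequons` is hypothetical. As the abstract notes, the motif alone does not establish that a site is glycosylated.

```python
import re

# N, then any residue except proline, then S or T.
# The lookahead keeps overlapping candidates such as "NNTS".
_SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(sequence):
    """Return the 1-based positions of asparagines in N-X-S/T motifs (X != P)."""
    return [match.start() + 1 for match in _SEQUON.finditer(sequence.upper())]
```

For the toy sequence `MNGSANPTNNTS`, the asparagines at positions 2, 9, and 10 are reported, while the one at position 6 is skipped because proline follows it.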
Funding: Support for this work came from The George Washington University funds to RM, and in part from NIH (Grant No. U01 CA168926) and an appointment to the Research Participation Program at the Center for Biologics Evaluation and Research administered by the Oak Ridge Institute for Science and Education through an interagency agreement between the U.S. Department of Energy and the U.S. Food and Drug Administration.
Abstract: Amino acid changes due to non-synonymous variation are included as annotations for individual proteins in UniProtKB/Swiss-Prot and RefSeq, which present biological data in a protein- or gene-centric fashion. Unfortunately, proteome-wide analysis of non-synonymous single-nucleotide variations (nsSNVs) is not easy to perform because information on nsSNVs and functionally important sites is not well integrated either within or between databases and their search engines. We have developed SNVDis, which allows evaluation of proteome-wide nsSNV distribution in functional sites, domains, and pathways. More specifically, we have integrated human-specific data from major variation databases (UniProtKB, dbSNP, and COSMIC), comprehensive sequence feature annotation from UniProtKB, Pfam, RefSeq, and the Conserved Domain Database (CDD), and pathway information from Protein ANalysis THrough Evolutionary Relationships (PANTHER), and mapped all of them in a uniform and comprehensive way to the human reference proteome provided by UniProtKB/Swiss-Prot. Integrated information on active sites, pathways, binding sites, and domains, extracted from a number of different sources, provides a detailed overview of how nsSNVs are distributed over the human proteome and pathways and how they intersect with functional sites of proteins. Additionally, it is possible to find out whether there is an over- or under-representation of nsSNVs in specific domains, pathways, or user-defined protein lists. The underlying datasets are updated once every 3 months. SNVDis is freely available at http://hive.biochemistry.gwu.edu/tool/snvdis.
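The abstract does not say which statistic SNVDis uses to judge over- or under-representation. One common choice for this kind of question is a hypergeometric tail probability, sketched here as an assumption rather than as the tool's method; the function name and parameter names are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(observed, domain_size, variant_sites, total_sites):
    """Probability of at least `observed` variant sites falling in a domain.

    Assumes `variant_sites` of the `total_sites` residue positions carry a
    variant and that the domain covers `domain_size` positions chosen without
    regard to variant status (the null hypothesis).
    """
    largest = min(domain_size, variant_sites)
    ways_total = comb(total_sites, domain_size)
    ways_extreme = sum(
        comb(variant_sites, k) * comb(total_sites - variant_sites, domain_size - k)
        for k in range(observed, largest + 1)
    )
    return ways_extreme / ways_total
```

A small value suggests over-representation of variants in the domain; under-representation can be assessed with the complementary lower tail. When many domains or pathways are tested at once, a multiple-testing correction would also be needed.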
Funding: This work was supported by grants from the National Basic Research Program (973 project) (2013CB933900, 2012CB910101, and 2012CB911201), the National Natural Science Foundation of China (31171263, 81272578, and 31071154), the International Science and Technology Cooperation Program of China (2014DFB30020), and the China Postdoctoral Science Foundation (2014M550392).
Abstract: Large-scale sequencing has characterized an enormous number of genetic variations (GVs), and the functional analysis of GVs is fundamental to understanding differences in disease susceptibility and therapeutic response among and within populations. Using a combination of a sequence-based predictor with known phosphorylation and protein-protein interaction information, we computationally detected 9606 potential phosSNPs (phosphorylation-related single nucleotide polymorphisms), including 720 known, disease-associated SNPs that dramatically modify the human phosSNP-associated kinase-substrate network. Further analyses demonstrated that the proteins in the network are heavily associated with various signaling and cancer pathways, while cancer genes and drug targets are significantly enriched. We reconstructed four population-specific kinase-substrate networks and found that several inherited disease or cancer genes, such as IRS1, RAF1, and EGFR, were differentially regulated by phosSNPs. Thus, phosSNPs may influence disease susceptibility and be involved in cancer development by reconfiguring phosphorylation networks in different populations. Moreover, by systematically characterizing potential phosphorylation-related cancer mutations (phosCMs) in 12 types of cancers, we observed that both types of GVs preferentially occur in the known cancer genes, while a considerable number of phosphorylated proteins, especially those over-representing cancer genes, contain both phosSNPs and phosCMs. Furthermore, phosSNPs were significantly enriched in amplification genes identified from breast cancers and in tyrosine kinase circuits of lung cancers. Taken together, these results should prove helpful for further elucidation of the functional impacts of disease-associated SNPs.