Clustering of Sequence Similarity Networks and Protein Family Classification (序列相似性网络聚类与蛋白质家族划分). Cited by: 2



Abstract: Graph clustering is a powerful method for inferring protein family classification from sequence information alone. For a set of proteins with complex intra- and inter-family relationships, as in many protein superfamilies, good performance depends on two factors: (1) the input similarity graph must contain enough information for classification, and (2) the algorithm must be robust enough to discover the obscure group structure hidden in the similarity graph. We tested a modularity optimization algorithm, Contraction-Dilation (CD), on a set of sequences from the Pfam enolase clan, which has broad sequence diversity. The results show that the CD output agrees closely with the Pfam classification when the algorithm parameters and the similarity graph are set appropriately. Good performance is maintained over a fairly wide range around the optimal settings, which suggests that the approach is applicable in general situations.
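The two ingredients named in the abstract, a thresholded similarity graph and a modularity score to be optimized, can be illustrated with a minimal sketch. This is not the Contraction-Dilation algorithm itself, whose steps the abstract does not describe; it only builds a graph and evaluates the standard Newman modularity of a candidate partition. The sequence names, similarity scores, threshold, and function names are all hypothetical.

```python
from collections import defaultdict


def build_similarity_graph(scores, threshold):
    """Keep an undirected weighted edge for each pair scoring at or above threshold."""
    graph = defaultdict(dict)
    for (a, b), s in scores.items():
        if a != b and s >= threshold:
            graph[a][b] = s
            graph[b][a] = s
    return graph


def modularity(graph, partition):
    """Newman modularity Q of a partition given as a dict: node -> cluster label."""
    # Each undirected edge is stored twice, so this sum equals 2 * total edge weight.
    two_m = sum(w for nbrs in graph.values() for w in nbrs.values())
    if two_m == 0:
        return 0.0
    internal = defaultdict(float)  # intra-cluster edge weight, counted in both directions
    degree = defaultdict(float)    # summed weighted degree of each cluster
    for u, nbrs in graph.items():
        cu = partition[u]
        for v, w in nbrs.items():
            degree[cu] += w
            if partition[v] == cu:
                internal[cu] += w
    return sum(internal[c] / two_m - (degree[c] / two_m) ** 2 for c in degree)


# Hypothetical pairwise similarity scores (illustrative values, not real alignment data).
scores = {
    ("a1", "a2"): 90, ("a1", "a3"): 85, ("a2", "a3"): 80,
    ("b1", "b2"): 88, ("b1", "b3"): 92, ("b2", "b3"): 81,
    ("a3", "b1"): 30,  # weak cross-family link kept by the threshold
    ("a1", "b2"): 5,   # very weak link dropped by the threshold
}
graph = build_similarity_graph(scores, threshold=20)

# Two candidate partitions: the intended families, and one with a3 misplaced.
good = {"a1": "A", "a2": "A", "a3": "A", "b1": "B", "b2": "B", "b3": "B"}
bad = {"a1": "A", "a2": "A", "a3": "B", "b1": "B", "b2": "B", "b3": "B"}
single = {node: "all" for node in graph}

q_good = modularity(graph, good)
q_bad = modularity(graph, bad)
q_single = modularity(graph, single)
print(q_good, q_bad, q_single)
```

A modularity optimizer searches over partitions for the one with the highest Q. In this toy graph the intended two-family split scores higher than the split with one sequence misplaced, and placing everything in a single cluster scores zero. The choice of threshold controls how much information the graph retains, which corresponds to the first of the two factors in the abstract.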

Authors: 时逢宽 (Shi Fengkuan), 李炜疆 (Li Weijiang)

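Affiliations: Key Laboratory of Industrial Biotechnology, Ministry of Education, Jiangnan University; School of Biotechnology, Jiangnan University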

Source: Journal of Food Science and Biotechnology (《食品与生物技术学报》), 2014, No. 1, pp. 98-103 (6 pages). Indexed in: CAS, CSCD, Peking University Core Journals.

Keywords: graph clustering; protein family; network clustering; similarity graph

Classification number: Q51 [Biology: Biochemistry]



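Citing literature (2)

1. Shi Fengkuan, Li Weijiang. Constructing similarity networks suitable for protein family classification [J]. Industrial Microbiology (工业微生物), 2015, 45(3): 53-57.
2. Wang Yu, Liu Jing, Guan Xiao, Cui Shuanglong, Tang Xinghua. Function prediction based on cereal protein sequences and PPI networks [J]. Journal of Food Science and Biotechnology (食品与生物技术学报), 2023, 42(4): 75-84.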

