An automata approach to matching gapped sequence tags against a protein database


Abstract: Tandem mass spectrometry (MS/MS) is an extremely important method for peptide and protein identification. One approach to interpreting MS/MS data is de novo sequencing, which is becoming increasingly accurate and important. However, de novo sequencing can usually determine only part of a sequence with confidence; the undetermined parts are represented by "mass gaps". We call such a partially determined sequence a gapped sequence tag. When a gapped sequence tag is searched against a database for protein identification, the determined parts must match the database sequence exactly, while each mass gap must match a substring of amino acids whose masses sum to the value of the mass gap. Standard string matching algorithms do not handle this case. In this paper, we present a new, efficient algorithm for finding the matches of gapped sequence tags in a protein database.
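The matching rule stated in the abstract can be illustrated with a small brute-force reference matcher. This is only a sketch of the matching definition, not the automaton-based algorithm the paper proposes. The names `RESIDUE_MASS`, `match_at`, and `find_matches`, the list-based tag representation, and the mass tolerance `tol` are all assumptions introduced for illustration; the residue masses are the standard monoisotopic values.

```python
from typing import List, Optional, Tuple, Union

# Standard monoisotopic residue masses (Da) of the 20 common amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}

# Hypothetical tag representation: a str is a determined part,
# a number is a mass gap in Da.
TagPart = Union[str, float]


def match_at(protein: str, start: int, tag: List[TagPart],
             tol: float = 0.01) -> Optional[int]:
    """Return the end index if `tag` matches `protein` starting at `start`."""
    pos = start
    for part in tag:
        if isinstance(part, str):
            # Determined part: must match the database sequence exactly.
            if not protein.startswith(part, pos):
                return None
            pos += len(part)
        else:
            # Mass gap: consume residues until their total mass reaches the gap.
            # Residue masses are positive, so at most one length can fit.
            total = 0.0
            while pos < len(protein) and total < part - tol:
                mass = RESIDUE_MASS.get(protein[pos])
                if mass is None:  # unknown residue symbol: no match here
                    return None
                total += mass
                pos += 1
            if abs(total - part) > tol:
                return None
    return pos


def find_matches(protein: str, tag: List[TagPart],
                 tol: float = 0.01) -> List[Tuple[int, int]]:
    """Try every start position; return (start, end) pairs of matches."""
    matches = []
    for start in range(len(protein)):
        end = match_at(protein, start, tag, tol)
        if end is not None:
            matches.append((start, end))
    return matches


# Usage: the gap 276.14739 Da equals the mass of "YI" (163.06333 + 113.08406).
hits = find_matches("MKTAYIAKQR", ["TA", 276.14739, "AK"])
```

This scan restarts the comparison at every database position, which is the cost an automaton-based method is meant to avoid. Note that a gap can be filled by any residue combination with the right total mass, so the same tag also matches a protein containing "YL" in place of "YI".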

Author: Zhang Tao (张涛)

Affiliation: Graduate School of the Chinese Academy of Sciences

Source: Computers and Applied Chemistry (《计算机与应用化学》), 2005, No. 10, pp. 845-850 (6 pages); indexed in CAS, CSCD, and the Peking University Core Journals list

Funding: National 863 Program major special project (2002AA103061); National Natural Science Foundation of China (10171099)

Keywords: tandem mass spectrometry (MS/MS); Aho-Corasick algorithm; gapped sequence tag

Classification: TP301.6 [Automation and Computer Technology: Computer System Architecture]


