Abstract
Tandem mass spectrometry (MS/MS) is an extremely important method for peptide and protein identification. One approach to interpreting MS/MS data is de novo sequencing, which is becoming increasingly accurate and important. However, de novo sequencing can usually determine only part of a sequence with confidence, and the undetermined parts are represented by “mass gaps”. We call such a partially determined sequence a gapped sequence tag. When a gapped sequence tag is searched against a protein database for identification, the determined parts must match the database sequence exactly, while each mass gap must match a substring of amino acids whose masses sum to the value of the mass gap. In this setting, standard string matching algorithms no longer apply. In this paper, we present a new, efficient algorithm for finding the matches of gapped sequence tags in a protein database.
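The matching criterion above can be made concrete with a naive brute-force scan. This is a minimal sketch for illustration only, not the authors' efficient algorithm, which the abstract does not describe. Assumptions: a tag is a Python list whose elements are either a determined amino-acid string or a mass gap in daltons; residue masses are the standard monoisotopic values for the 20 standard amino acids (any other letter raises `KeyError`); a gap matches when the substring mass is within a hypothetical tolerance `tol` of 0.02 Da; and `find_gapped_tag` and `match_from` are hypothetical names.

```python
from typing import List, Tuple, Union

# Standard monoisotopic residue masses in daltons (Da), 20 standard amino acids only.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}

# A tag element is either a determined amino-acid string or a mass gap in Da.
TagElement = Union[str, float]


def match_from(protein: str, pos: int, tag: List[TagElement],
               k: int, tol: float) -> List[int]:
    """Return every end index at which tag[k:] matches protein starting at pos."""
    if k == len(tag):
        return [pos]
    elem = tag[k]
    ends: List[int] = []
    if isinstance(elem, str):
        # Determined part: must match the database sequence exactly.
        if protein.startswith(elem, pos):
            ends.extend(match_from(protein, pos + len(elem), tag, k + 1, tol))
    else:
        # Mass gap: grow a substring residue by residue; every length whose
        # total mass is within tol of the gap is a candidate match.
        total = 0.0
        j = pos
        while j < len(protein):
            total += RESIDUE_MASS[protein[j]]
            j += 1
            if total > elem + tol:
                break  # masses are positive, so longer substrings only get heavier
            if abs(total - elem) <= tol:
                ends.extend(match_from(protein, j, tag, k + 1, tol))
    return ends


def find_gapped_tag(protein: str, tag: List[TagElement],
                    tol: float = 0.02) -> List[Tuple[int, int]]:
    """Brute-force scan: return sorted half-open (start, end) spans matching tag."""
    hits = set()
    for start in range(len(protein)):
        for end in match_from(protein, start, tag, 0, tol):
            hits.add((start, end))
    return sorted(hits)


if __name__ == "__main__":
    # N (114.04293 Da) and GG (2 x 57.02146 Da) are nearly isobaric, so the gap
    # matches a one-residue substring in one place and a two-residue one in another.
    print(find_gapped_tag("KNRAKGGR", ["K", 114.04293, "R"]))
```

Running the script prints `[(0, 3), (4, 8)]`: the same tag matches both `KNR` and `KGGR`. This is why exact string matching cannot be applied to the gaps directly. The scan also tries every start position of every protein, which is the cost an efficient algorithm would aim to avoid.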
Source
Computers and Applied Chemistry (《计算机与应用化学》)
Indexed in: CAS, CSCD, Peking University Core Journals (北大核心)
2005, Issue 10, pp. 845-850 (6 pages)
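Funding
National High Technology Research and Development Program of China (863 Program), Major Special Project (2002AA103061); National Natural Science Foundation of China (10171099)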