Abstract
Database searching is a common strategy for protein identification in proteomic studies. Searching against a highly comprehensive but less accurately annotated database may yield more protein identifications, at the risk of errors introduced by inaccurate annotations. Conversely, searching against an accurately annotated but less complete database may miss proteins that the database does not contain. Balancing completeness and accuracy in protein identification is therefore an important problem. Taking the human proteome as an example, this study compared the search results of three commonly used protein databases (IPI, UniProt and Swiss-Prot) on three proteomic datasets generated from different biological samples on different mass spectrometers. Although each database had strengths and weaknesses on different datasets, the overall differences among them were small: for each database, the peptides identified only by that database accounted for no more than 5% of the total, and the differences in protein identifications ranged from 1% to 5%. This indicates that all three databases cover the commonly identified human protein sequences and are highly complete. We therefore recommend the manually curated and continuously updated Swiss-Prot database for routine searches of human proteomic data. When a study aims to identify or quantify sequences not included in Swiss-Prot, such as certain protein isoforms or mutants, researchers can add the target sequences to the Swiss-Prot database before searching, or use a more complete database instead.
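The abstract does not state exactly how the "unique to one database" percentage was computed. The sketch below shows one plausible reading, assuming each database's search result is a set of peptide sequences and that the denominator is that database's own total identifications (an assumption; the abstract says only "the total"). The function name `unique_fractions` and the toy peptides are hypothetical and are not data from the study.

```python
from typing import Dict, Set


def unique_fractions(ids: Dict[str, Set[str]]) -> Dict[str, float]:
    """For each database, return the fraction of its identifications
    that no other database produced (denominator: that database's own
    total, an assumed definition)."""
    result: Dict[str, float] = {}
    for name, own in ids.items():
        # Union of the identifications from every other database.
        others = set().union(*(s for n, s in ids.items() if n != name))
        unique = own - others
        result[name] = len(unique) / len(own) if own else 0.0
    return result


if __name__ == "__main__":
    # Hypothetical toy peptide identifications, NOT results from the study.
    ids = {
        "IPI": {"PEPTIDEA", "PEPTIDEB", "PEPTIDEC", "PEPTIDED"},
        "UniProt": {"PEPTIDEA", "PEPTIDEB", "PEPTIDEC", "PEPTIDEE"},
        "Swiss-Prot": {"PEPTIDEA", "PEPTIDEB", "PEPTIDEC"},
    }
    for name, frac in unique_fractions(ids).items():
        print(f"{name}: {frac:.1%} unique")
```

Using the union of the other databases, rather than pairwise comparisons, matches the phrasing "not identified by the other two databases". If the study instead normalized by the union of all three result sets, only the denominator line would change.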
Source
Chinese Journal of Biomedical Engineering (《中国生物医学工程学报》)
CAS
CSCD
Peking University Core Journal
2013, No. 2, pp. 129-134 (6 pages)
Funding
Youth Science Fund Project of the National Natural Science Foundation of China (31200614)