
Validation Methods of Peptide Identification Results in Proteomics
Abstract: Mass spectrometry-based proteomics aims to identify peptides and proteins in order to provide direct evidence of gene expression, analyze the structures and functions of proteins, study the relationship between proteins and diseases, and provide targeted treatment options. All of these studies depend on the credibility of the identified peptides and proteins. However, it is impossible to check every identified peptide manually, because a single mass spectrometry experiment can yield a very large number of identifications. The target-decoy approach (TDA) was therefore proposed and is widely used to control the quality of identified peptides and proteins; it has also been extended to subclasses of peptides (including ordinary subclasses, variant peptides, and modified peptides) and to cross-linked peptides. However, TDA still has two limitations: (1) its estimate of the false discovery rate (FDR) is inaccurate, and (2) it cannot validate a single identification. Identifications that pass TDA-based FDR control therefore need further validation, and other validation methods applied after TDA-FDR filtering (referred to here as Beyond-TDA methods) have been developed to strengthen peptide validation. This paper reviews peptide-level TDA, both the ordinary method and its specialized extensions, for mass spectrometry data acquired in data-dependent acquisition (DDA) mode, classifies the Beyond-TDA methods, and summarizes the advantages and disadvantages of each method. In the first part of this paper, we introduce the goal of proteomics, the process of mass spectrometry acquisition and analysis, the validation problem, and the early statistical methods for evaluating identification credibility.
In the second part, we describe the ordinary TDA-FDR method in detail, including the assumption that random matches are equally likely to occur in the target and decoy databases, the methods for constructing the decoy database, and the formula for computing TDA-FDR. We also introduce extensions of TDA-FDR to ordinary subclasses of peptides, variant peptides, modified peptides, proteogenomic peptides, cross-linked peptides, and glycopeptides. However, TDA cannot model homologous incorrect peptides, so TDA-FDR underestimates the actual false identification rate. After TDA-FDR filtering, stricter validation methods, namely the Beyond-TDA methods reviewed in the third part of this paper, are therefore needed to ensure credibility. That part covers four kinds of methods: validation based on search space (trap database validation and open search validation), on spectral similarity (synthetic peptide validation and theoretical spectrum prediction), on chemical information (retention time prediction and stable isotope labeling validation), and on machine learning (Percolator, pValid, and DeepRescore). Finally, we summarize the paper and discuss future directions for improving validation methods.
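The ordinary TDA-FDR workflow summarized above (build a decoy database, search targets and decoys together, and count decoy hits to estimate false target hits) can be illustrated with a small sketch. It assumes reversed-sequence decoys, one score per peptide-spectrum match (PSM) where higher is better, and the common #decoy / #target estimator with q-values; the review's own decoy construction and FDR formula may differ, and the names `reverse_decoys`, `tda_q_values`, `accept_targets`, and the `REV_` prefix are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical accession prefix marking decoy entries (an assumption, not a standard).
DECOY_PREFIX = "REV_"


def reverse_decoys(targets: Dict[str, str]) -> Dict[str, str]:
    """Build a decoy protein database by reversing each target sequence.

    Reversal is only one common construction; shuffled or pseudo-reversed
    decoys are alternatives.
    """
    return {DECOY_PREFIX + acc: seq[::-1] for acc, seq in targets.items()}


def tda_q_values(psms: List[Tuple[float, bool]]) -> List[float]:
    """Estimate a q-value for each PSM of a concatenated target-decoy search.

    psms: (score, is_decoy) pairs; a higher score means a better match.
    Walking down the ranking, the FDR at each cut-off is estimated as
    #decoy / #target, which relies on random matches being equally likely
    to hit the target and decoy databases. PSMs with equal scores are not
    grouped together here, which is a simplification.
    Returns q-values in the same order as the input list.
    """
    order = sorted(range(len(psms)), key=lambda i: psms[i][0], reverse=True)
    fdr = [1.0] * len(psms)
    n_target = n_decoy = 0
    for i in order:
        if psms[i][1]:
            n_decoy += 1
        else:
            n_target += 1
        # Cap at 1.0; with no target accepted yet, report the maximum.
        fdr[i] = min(1.0, n_decoy / n_target) if n_target else 1.0

    # q-value = smallest FDR among all cut-offs that still accept this PSM,
    # i.e. a running minimum taken from the bottom of the ranking upward.
    q = [1.0] * len(psms)
    running = 1.0
    for i in reversed(order):
        running = min(running, fdr[i])
        q[i] = running
    return q


def accept_targets(psms: List[Tuple[float, bool]], level: float = 0.01) -> List[int]:
    """Return indices of target PSMs whose q-value is at most `level`."""
    q = tda_q_values(psms)
    return [i for i, (_, is_decoy) in enumerate(psms) if not is_decoy and q[i] <= level]


if __name__ == "__main__":
    print(reverse_decoys({"P1": "PEPTIDEK"}))  # {'REV_P1': 'KEDITPEP'}
    psms = [(50.0, False), (45.0, False), (40.0, True),
            (38.0, False), (30.0, True), (20.0, False)]
    print(tda_q_values(psms))
    print(accept_targets(psms, 0.01))  # [0, 1]
```

In this toy ranking, only the two target PSMs scored above the first decoy pass a 1% threshold; raising `level` to 0.34 would also accept index 3. Because the estimate only counts decoys, it inherits the limitation noted above: incorrect matches that resemble the correct peptide more closely than a random decoy does are not modeled.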
Authors: ZHOU Wen-Jing, ZENG Wen-Feng, CHI Hao, HE Si-Min (Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China)
Source: Progress in Biochemistry and Biophysics, 2023, No. 1, pp. 109-125 (17 pages). Indexed in SCIE, CAS, CSCD, and the Peking University Core Journals list.
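Funding: Key Special Project of the National Key Research and Development Program of China (2016YFA0501300); Excellent Young Scientists Fund of the National Natural Science Foundation of China (32022046).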
Keywords: proteomics; mass spectrometry; target-decoy approach; false discovery rate; validation methods