Funding: supported by the following grants: the National Cancer Institute, USA (a part of the National Institutes of Health, USA; Grant No. T32LM012424) to Yiling Elaine Chen; the National Cancer Institute, USA (Grant No. K08CA201591), the Margaret E Early Medical Research Trust, USA, and the Pediatric Cancer Research Foundation, USA, to Leo David Wang; the National Cancer Institute under Cancer Center Support Grant, USA (Grant No. P30CA033572) to the MS facility at the City of Hope; and the National Institute of General Medical Sciences, USA (a part of the National Institutes of Health, USA; Grant Nos. R01GM120507 and R35GM140888), the National Science Foundation, USA (Grant Nos. DBI-1846216 and DMS-2113754), the Johnson & Johnson WiSTEM2D Award, USA, the Sloan Research Fellowship, USA, and the UCLA David Geffen School of Medicine W.M. Keck Foundation Junior Faculty Award, USA, to Jingyi Jessica Li.
Abstract: Advances in mass spectrometry (MS) have enabled high-throughput analysis of proteomes in biological systems. State-of-the-art MS data analysis relies on database search algorithms to quantify proteins by identifying peptide–spectrum matches (PSMs), which convert mass spectra to peptide sequences. Different database search algorithms use distinct search strategies and thus may identify unique PSMs. However, no existing approach can aggregate all user-specified database search algorithms with both a guaranteed increase in the number of identified peptides and control of the false discovery rate (FDR). To fill this gap, we propose a statistical framework, Aggregation of Peptide Identification Results (APIR), that is universally compatible with all database search algorithms. Notably, under an FDR threshold, APIR is guaranteed to identify at least as many peptides as each individual database search algorithm does. Evaluation of APIR on a complex proteomics standard dataset showed that APIR has higher power than individual database search algorithms while empirically controlling the FDR. Real data studies showed that APIR can identify disease-related proteins and post-translational modifications missed by some individual database search algorithms. The APIR framework is easily extendable to aggregating discoveries made by multiple algorithms in other high-throughput biomedical data analyses, e.g., differential gene expression analysis on RNA sequencing data. The APIR R package is available at https://github.com/yiling0210/APIR.
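APIR's guarantee is stated at an FDR threshold, and database search algorithms commonly estimate the FDR of their PSMs with the target-decoy approach. As general background only (this is not APIR's aggregation procedure), a minimal sketch of target-decoy thresholding for a single search algorithm could look like the following; the function names and toy scores are hypothetical. It estimates the FDR among PSMs scoring at or above a cutoff as the ratio of decoy to target PSMs, and keeps the lowest cutoff whose estimate does not exceed q.

```python
from typing import List, Tuple

Psm = Tuple[float, bool]  # (score, is_decoy); higher scores are better


def target_decoy_cutoff(psms: List[Psm], q: float) -> float:
    """Lowest score cutoff whose estimated FDR (decoys / targets) is <= q.

    Returns float('inf') when no cutoff reaches the threshold.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    n_target = n_decoy = 0
    cutoff = float("inf")
    for i, (score, is_decoy) in enumerate(ranked):
        if is_decoy:
            n_decoy += 1
        else:
            n_target += 1
        # Evaluate only after every PSM tied at this score has been counted.
        if i + 1 < len(ranked) and ranked[i + 1][0] == score:
            continue
        if n_target > 0 and n_decoy / n_target <= q:
            cutoff = score
    return cutoff


def accepted_targets(psms: List[Psm], q: float) -> List[float]:
    """Scores of target PSMs at or above the target-decoy cutoff."""
    cutoff = target_decoy_cutoff(psms, q)
    return [s for s, d in psms if not d and s >= cutoff]
```

Tied scores are evaluated together, so a cutoff never splits a group of equal scores.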
Funding: This work was partially supported by grants from the National Basic Research Program (973) of China (2004CB520804) and the National Natural Science Foundation of China (Nos. 30270657, 30230150, and 3037030).
Abstract: In this study, we systematically analyzed the elution conditions of tryptic peptides and the characteristics of the peptides identified in reverse-phase liquid chromatography and electrospray tandem mass spectrometry (RPLC-MS/MS) analysis. Following protein digestion with trypsin, the peptide mixture was analyzed by online RPLC-MS/MS. Bovine serum albumin (BSA) was used to optimize the acetonitrile (ACN) elution gradient for tryptic peptides, and cytochrome c was used to retest the gradient and the sensitivity of LC-MS/MS. The characteristics of the identified peptides were also analyzed. In our experiments, a 5% to 30% ACN gradient was suitable for tryptic peptide elution, and the sensitivity of LC-MS/MS was 50 fmol. Analysis of the tryptic peptides demonstrated that longer peptides (more than 10 amino acids) and multiply charged peptides (+2, +3) are more likely to be identified, whereas peptide hydropathicity might not be related to the likelihood of identification. The number of identified peptides for a protein might be used to estimate its loading amount under the same sample background. Moreover, the peptides identified in this study show three types of redundancy, namely identification, charge, and sequence redundancy, which may suppress the identification of low-abundance proteins.
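Trypsin conventionally cleaves on the C-terminal side of lysine (K) and arginine (R) unless the next residue is proline (P). A minimal in-silico digestion under that rule, assuming no missed cleavages, together with a filter for the longer peptides (more than 10 residues) that this study found more likely to be identified, might look like the sketch below; the function names and the toy sequence are hypothetical.

```python
import re
from typing import List


def tryptic_digest(protein: str) -> List[str]:
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    # Zero-width split point: preceded by K/R and not followed by P.
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def long_peptides(peptides: List[str], min_len: int = 11) -> List[str]:
    """Keep peptides longer than 10 residues (the default threshold)."""
    return [p for p in peptides if len(p) >= min_len]


toy = "AAKPLLRDEFGHILMNQSTVWYKCCR"  # hypothetical protein sequence
print(tryptic_digest(toy))  # ['AAKPLLR', 'DEFGHILMNQSTVWYK', 'CCR']
print(long_peptides(tryptic_digest(toy)))  # ['DEFGHILMNQSTVWYK']
```

Splitting on a zero-width pattern requires Python 3.7 or later; the K followed by P in the toy sequence is correctly left uncleaved.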
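The abstract names three types of redundancy without defining them. Under the assumed reading that identification redundancy means repeated spectra for the same sequence and charge, charge redundancy means one sequence seen at several charge states, and sequence redundancy means one identified sequence nested inside another (for example, through a missed cleavage), they could be counted as in the hypothetical sketch below. The function name and toy identifications are illustrative only.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple


def redundancy_summary(ids: List[Tuple[str, int]]) -> Dict[str, int]:
    """Count redundant identifications in a list of (sequence, charge) pairs."""
    # Identification redundancy: extra spectra for an already-seen (sequence, charge).
    per_ion = Counter(ids)
    identification = sum(n - 1 for n in per_ion.values())
    # Charge redundancy: extra charge states for an already-seen sequence.
    charges: Dict[str, Set[int]] = defaultdict(set)
    for seq, z in per_ion:
        charges[seq].add(z)
    charge = sum(len(zs) - 1 for zs in charges.values())
    # Sequence redundancy: unique sequences contained in a longer unique sequence.
    seqs = list(charges)
    sequence = sum(1 for s in seqs if any(s != t and s in t for t in seqs))
    return {"identification": identification, "charge": charge, "sequence": sequence}


toy = [("PEPTIDEK", 2), ("PEPTIDEK", 2), ("PEPTIDEK", 3),
       ("PEPTIDEKAGR", 3), ("SAMPLER", 2), ("SAMPLER", 2)]
print(redundancy_summary(toy))  # {'identification': 2, 'charge': 1, 'sequence': 1}
```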
Abstract: Fast atom bombardment mass spectrometry (FAB-MS) combined with on-probe acetylation is applied to distinguish N-terminal series ions from C-terminal series ions of a peptide, which provides valuable information about the sequence of an unknown peptide. In addition to the sequence ions in the high-mass region, the FAB mass spectra contain a number of characteristic ions in the low-mass region. The ions below m/z 200 were found to be characteristic of the amino acid composition of the peptide, allowing that composition to be estimated. Mixture analysis is also discussed.
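On-probe acetylation adds an acetyl group (+42.0106 Da monoisotopic, C2H2O) to the free N-terminal amine, so fragments that retain the N-terminus shift by about 42 Da while C-terminal fragments do not (lysine ε-amino groups are also acetylated, which this sketch ignores). A minimal sketch of that comparison, assuming singly charged fragment ions and a hypothetical 0.5 m/z tolerance suited to low-resolution spectra, could pair peaks from the native and acetylated spectra as follows; the function name is hypothetical.

```python
from typing import Dict, List

ACETYL = 42.0106  # monoisotopic mass added by acetylation (C2H2O), Da


def classify_fragments(native: List[float], acetylated: List[float],
                       tol: float = 0.5) -> Dict[float, str]:
    """Label each native fragment m/z by how it behaves after acetylation.

    Assumes singly charged ions and acetylation of the N-terminus only.
    """
    def present(mz: float) -> bool:
        return any(abs(mz - a) <= tol for a in acetylated)

    labels: Dict[float, str] = {}
    for mz in native:
        if present(mz + ACETYL):
            labels[mz] = "N-terminal"  # shifted by +42 Da: retains the N-terminus
        elif present(mz):
            labels[mz] = "C-terminal"  # unshifted: lacks the acetylated N-terminus
        else:
            labels[mz] = "unassigned"
    return labels
```

A fragment found both shifted and unshifted is labeled N-terminal here; a real analysis would flag such cases as ambiguous.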
Abstract: Typically, protein sequences in a collision-induced dissociation (CID) tandem MS (MS2) dataset are detected by mapping identified peptide ions back to protein sequences with a protein database search (PDS) engine. Finding a particular peptide sequence of interest in CID MS2 records very often requires manual evaluation of the spectrum, regardless of whether the peptide-associated MS2 scan is identified by the PDS algorithm or not. We have developed a compact, cross-platform, database-free command-line utility, pepgrep, which helps to find an MS2 fingerprint for a selected peptide sequence by pattern-matching modelled MS2 data with a Peptide-to-MS2 scoring algorithm. pepgrep can incorporate dozens of mass offsets corresponding to a variety of post-translational modifications (PTMs) into the algorithm. Decoy peptide sequences are used together with the tested peptide sequence to reduce false-positive results. The engine can screen an MS2 data file at a high rate when run in a cluster computing environment. The matched MS2 spectrum can be displayed with a built-in graphical application programming interface (API) or optionally recorded to a file. Using this algorithm, we were able to find additional peptide sequences in the studied CID spectra that were missed by PDS identification. We also found pepgrep especially useful for examining CID spectra of small peptide fractions resulting from, for example, affinity purification techniques; the peptide sequences in such samples are less likely to be positively identified by the routine protein-centric algorithms implemented in PDS engines. The software is freely available at http://bsproteomics.essex.ac.uk:8080/data/download/pepgrep-1.4.tgz.
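pepgrep's own Peptide-to-MS2 scoring algorithm is not described here, so the following is only a hypothetical sketch of the general idea: model singly charged b and y fragment ions for a chosen peptide, count how many have an observed peak within a tolerance, and accept the match only if the target outscores a decoy. The reversed-sequence decoy (keeping the C-terminal residue), the 0.02 m/z tolerance, and all function names are assumptions; PTM mass offsets could be supported by adding a delta to the affected residue masses.

```python
from typing import Dict, List

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE: Dict[str, float] = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565


def fragment_mz(peptide: str) -> List[float]:
    """Singly charged b and y ion m/z values (b1..b(n-1), y1..y(n-1))."""
    masses = [RESIDUE[aa] for aa in peptide]
    ions = []
    for i in range(1, len(masses)):
        ions.append(sum(masses[:i]) + PROTON)           # b_i
        ions.append(sum(masses[-i:]) + WATER + PROTON)  # y_i
    return ions


def match_score(peptide: str, peaks: List[float], tol: float = 0.02) -> int:
    """Count theoretical fragments with an observed peak within tol (m/z)."""
    return sum(any(abs(mz - p) <= tol for p in peaks) for mz in fragment_mz(peptide))


def decoy(peptide: str) -> str:
    """Reverse all residues except the C-terminal one (keeps K/R at the end)."""
    return peptide[-2::-1] + peptide[-1]


def screen(peptide: str, peaks: List[float], tol: float = 0.02) -> bool:
    """Report a hit only if the target outscores its decoy."""
    return match_score(peptide, peaks, tol) > match_score(decoy(peptide), peaks, tol)
```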
Funding: supported by the National Basic Research Program (973 Program) of China (No. 2002CB713807) and the National Key Technologies R&D Program of China (No. 2004BA711A21).
Abstract: In this study, we present a preprocessing method for quadrupole time-of-flight (Q-TOF) tandem mass spectra that increases the accuracy of database searching for peptide (protein) identification. Using the natural isotopic information inherent in tandem mass spectra, we construct a decision tree, after feature selection, to classify peaks in tandem spectra as noise or ion peaks. Furthermore, we recognize overlapping peaks to find the monoisotopic masses of ions for the subsequent identification step. The experimental results show that this preprocessing method increases both the search speed and the reliability of peptide identification.
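The paper classifies peaks with a decision tree trained on isotopic features; that model is not reproduced here. As a simpler, hypothetical illustration of the isotopic information it relies on, the sketch below groups peaks into isotope envelopes spaced by 1.003355/z in m/z (the 13C minus 12C mass difference divided by the charge) and reports the lowest-m/z peak of each envelope as the monoisotopic candidate. The charge states tried, the 0.01 m/z tolerance, and the function name are assumptions, and intensity ratios are ignored.

```python
from typing import List, Sequence, Tuple

C13_DELTA = 1.003355  # 13C minus 12C mass difference, Da


def isotope_clusters(peaks: List[Tuple[float, float]],
                     charges: Sequence[int] = (1, 2),
                     tol: float = 0.01) -> List[Tuple[float, int, int]]:
    """Greedily group (mz, intensity) peaks into isotope envelopes.

    Returns (monoisotopic_mz, charge, n_peaks) for envelopes of >= 2 peaks.
    """
    peaks = sorted(peaks)
    used = set()
    clusters = []
    for i, (mz, _intensity) in enumerate(peaks):
        if i in used:
            continue
        # Try higher charges first so a 2+ envelope is not read as 1+.
        for z in sorted(charges, reverse=True):
            spacing = C13_DELTA / z
            members = [i]
            expected = mz + spacing
            for j in range(i + 1, len(peaks)):
                if j in used:
                    continue
                if abs(peaks[j][0] - expected) <= tol:
                    members.append(j)
                    expected += spacing
                elif peaks[j][0] > expected + tol:
                    break  # peaks are sorted, so no later match is possible
            if len(members) >= 2:
                used.update(members)
                clusters.append((mz, z, len(members)))
                break
    return clusters
```

Isolated peaks are left unassigned, which in this simplified view is one crude hint that a peak may be noise rather than an ion.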