Advances in mass spectrometry (MS) have enabled high-throughput analysis of proteomes in biological systems. The state-of-the-art MS data analysis relies on database search algorithms to quantify proteins by identifying peptide–spectrum matches (PSMs), which convert mass spectra to peptide sequences. Different database search algorithms use distinct search strategies and thus may identify unique PSMs. However, no existing approaches can aggregate all user-specified database search algorithms with a guaranteed increase in the number of identified peptides and a control on the false discovery rate (FDR). To fill this gap, we proposed a statistical framework, Aggregation of Peptide Identification Results (APIR), that is universally compatible with all database search algorithms. Notably, under an FDR threshold, APIR is guaranteed to identify at least as many, if not more, peptides as individual database search algorithms do. Evaluation of APIR on a complex proteomics standard dataset showed that APIR outpowers individual database search algorithms and empirically controls the FDR. Real data studies showed that APIR can identify disease-related proteins and post-translational modifications missed by some individual database search algorithms. The APIR framework is easily extendable to aggregating discoveries made by multiple algorithms in other high-throughput biomedical data analysis, e.g., differential gene expression analysis on RNA sequencing data. The APIR R package is available at https://github.com/yiling0210/APIR.
This year (2018) marks the 60th anniversary of the "central dogma", summarized as "DNA makes RNA makes protein", which was originally proposed by Francis Crick in 1958. Three years later, messenger RNA was identified as the template of protein synthesis. After 60 years of discovery, including discovery of the split nature of eukaryotic genes (i.e., splicing), it has become evident that messenger RNAs are not merely messengers, but a hub of co- and post-transcriptional regulation, which is fundamental to amplifying the complexity encoded in the genome of higher eukaryotic organisms. The mature forms of RNA of protein-coding genes and their abundance have to be tightly regulated through multiple steps of sophisticated processing, including capping, splicing and polyadenylation. In addition, their function also critically depends on proper localization -- sometimes trafficking to the remote parts of the cell such as dendrites and axons of neurons -- and proper control of their stability. Furthermore, thousands of long and small noncoding RNAs are produced to play a wide range of roles in gene regulation. From our perspective, two overarching goals for RNA biology are (i) characterizing the spatial-temporal regulation of various RNA species and elucidating the underlying regulatory mechanisms; and (ii) understanding the functional impact of such regulation on human physiology and disease.
Funding: This work was supported by the following grants: the National Cancer Institute, USA (a part of the National Institutes of Health, USA; Grant No. T32LM012424) to Yiling Elaine Chen; the National Cancer Institute, USA (Grant No. K08CA201591), the Margaret E Early Medical Research Trust, USA, and the Pediatric Cancer Research Foundation, USA, to Leo David Wang; the National Cancer Institute under Cancer Center Support Grant, USA (Grant No. P30CA033572) to the MS facility at the City of Hope; and the National Institute of General Medical Sciences, USA (a part of the National Institutes of Health, USA; Grant Nos. R01GM120507 and R35GM140888), the National Science Foundation, USA (Grant Nos. DBI-1846216 and DMS-2113754), the Johnson & Johnson WiSTEM2D Award, USA, the Sloan Research Fellowship, USA, and the UCLA David Geffen School of Medicine W.M. Keck Foundation Junior Faculty Award, USA, to Jingyi Jessica Li.