We report a significantly-enhanced bioinformatics suite and database for proteomics research called Yale Protein Expression Database(YPED) that is used by investigators at more than 300 institutions worldwide. YPED ...We report a significantly-enhanced bioinformatics suite and database for proteomics research called Yale Protein Expression Database(YPED) that is used by investigators at more than 300 institutions worldwide. YPED meets the data management, archival, and analysis needs of a high-throughput mass spectrometry-based proteomics research ranging from a singlelaboratory, group of laboratories within and beyond an institution, to the entire proteomics community. The current version is a significant improvement over the first version in that it contains new modules for liquid chromatography–tandem mass spectrometry(LC–MS/MS) database search results, label and label-free quantitative proteomic analysis, and several scoring outputs for phosphopeptide site localization. In addition, we have added both peptide and protein comparative analysis tools to enable pairwise analysis of distinct peptides/proteins in each sample and of overlapping peptides/proteins between all samples in multiple datasets. We have also implemented a targeted proteomics module for automated multiple reaction monitoring(MRM)/selective reaction monitoring(SRM) assay development. We have linked YPED's database search results and both label-based and label-free fold-change analysis to the Skyline Panorama repository for online spectra visualization. In addition, we have built enhanced functionality to curate peptide identifications into an MS/MS peptide spectral library for all of our protein database search identification results.展开更多
The massive flow of scholarly publications from traditional paper journals to online outlets has benefited biologists because of its ease to access. However, due to the sheer volume of available biological literature,...The massive flow of scholarly publications from traditional paper journals to online outlets has benefited biologists because of its ease to access. However, due to the sheer volume of available biological literature, researchers are finding it increasingly difficult to locate needed information. As a result, recent biology contests, notably JNLPBA and BioCreAtIvE, have focused on evaluating various methods in which the literature may be navigated. Among these methods, text-mining technology has shown the most promise. With recent advances in text-mining technology and the fact that publishers are now making the full texts of articles available in XML format, TMSs can be adapted to accelerate literature curation, maintain the integrity of information, and ensure proper linkage of data to other resources. Even so, several new challenges have emerged in relation to full text analysis, life-science terminology, complex relation extraction, and information fusion. These challenges must be overcome in order for text-mining to be more effective. In this paper, we identify the challenges, discuss how they might be overcome, and consider the resources that may be helpful in achieving that goal.展开更多
An ever-growing number of resources on model organisms have emerged with the continued development of sequencing technologies. In this paper, we review 13 databases of model organisms, most of which are reported by th...An ever-growing number of resources on model organisms have emerged with the continued development of sequencing technologies. In this paper, we review 13 databases of model organisms, most of which are reported by the National Institutes of Health of the United States(NIH; http://www.nih.gov/science/models/). We provide a brief description for each database, as well as detail its data source and types, functions, tools, and availability of access. In addition,we also provide a quality assessment about these databases. Significantly, the organism databases instituted in the early 1990s––such as the Mouse Genome Database(MGD), Saccharomyces Genome Database(SGD), and Fly Base––have developed into what are now comprehensive, core authority resources. Furthermore, all of the databases mentioned here update continually according to user feedback and with advancing technologies.展开更多
基金supported in part by the National Institutes of Health of the United States(Grant Nos.UL1 RR024139 to Yale Clinical and Translational Science Award,1S10OD018034-01 to 6500 QTrap Mass Spectrometer for Yale University,1S10RR026707-01 to 5500QTrap Mass Spectrometer for Yale University,P30DA018343 to Yale/NIDA Neuroproteomics Center and NIDDK-K01DK089006 awarded to JR)
文摘We report a significantly-enhanced bioinformatics suite and database for proteomics research called Yale Protein Expression Database(YPED) that is used by investigators at more than 300 institutions worldwide. YPED meets the data management, archival, and analysis needs of a high-throughput mass spectrometry-based proteomics research ranging from a singlelaboratory, group of laboratories within and beyond an institution, to the entire proteomics community. The current version is a significant improvement over the first version in that it contains new modules for liquid chromatography–tandem mass spectrometry(LC–MS/MS) database search results, label and label-free quantitative proteomic analysis, and several scoring outputs for phosphopeptide site localization. In addition, we have added both peptide and protein comparative analysis tools to enable pairwise analysis of distinct peptides/proteins in each sample and of overlapping peptides/proteins between all samples in multiple datasets. We have also implemented a targeted proteomics module for automated multiple reaction monitoring(MRM)/selective reaction monitoring(SRM) assay development. We have linked YPED's database search results and both label-based and label-free fold-change analysis to the Skyline Panorama repository for online spectra visualization. In addition, we have built enhanced functionality to curate peptide identifications into an MS/MS peptide spectral library for all of our protein database search identification results.
基金supported by the "National Science Council" under Grant Nos. NSC 97-2218-E-155-001 and NSC96-2752-E-001-001-PAEthe Research Center for Humanities and Social Sciencesthe Thematic Program of "Academia Sinica" under Grant No.AS95ASIA02
文摘The massive flow of scholarly publications from traditional paper journals to online outlets has benefited biologists because of its ease to access. However, due to the sheer volume of available biological literature, researchers are finding it increasingly difficult to locate needed information. As a result, recent biology contests, notably JNLPBA and BioCreAtIvE, have focused on evaluating various methods in which the literature may be navigated. Among these methods, text-mining technology has shown the most promise. With recent advances in text-mining technology and the fact that publishers are now making the full texts of articles available in XML format, TMSs can be adapted to accelerate literature curation, maintain the integrity of information, and ensure proper linkage of data to other resources. Even so, several new challenges have emerged in relation to full text analysis, life-science terminology, complex relation extraction, and information fusion. These challenges must be overcome in order for text-mining to be more effective. In this paper, we identify the challenges, discuss how they might be overcome, and consider the resources that may be helpful in achieving that goal.
基金supported by the Strategic Priority Research Program of the Chinese Academy of Sciences of China(Grant No.XDB13040500)
文摘An ever-growing number of resources on model organisms have emerged with the continued development of sequencing technologies. In this paper, we review 13 databases of model organisms, most of which are reported by the National Institutes of Health of the United States(NIH; http://www.nih.gov/science/models/). We provide a brief description for each database, as well as detail its data source and types, functions, tools, and availability of access. In addition,we also provide a quality assessment about these databases. Significantly, the organism databases instituted in the early 1990s––such as the Mouse Genome Database(MGD), Saccharomyces Genome Database(SGD), and Fly Base––have developed into what are now comprehensive, core authority resources. Furthermore, all of the databases mentioned here update continually according to user feedback and with advancing technologies.