Abstract
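High-throughput transcriptome and proteome technologies, represented by DNA microarrays, two-dimensional electrophoresis, two-dimensional high-pressure liquid chromatography, and mass spectrometry, can generate massive amounts of gene or protein data. Annotating these genes or proteins is the foundation and a prerequisite for downstream processing of the data. Annotating such volumes of data by hand is impractical, yet the annotations returned by current batch annotation websites for genes and proteins are often incomplete. After comparing commonly used batch annotation websites for genes and proteins, this work developed UBROAD (Unified Batch Retriever of Annotation Data), a batch annotation system for genes and proteins. The system integrates six gene- and protein-related data sources: NCBI, Swiss-Prot, BIND, enzyme-expasy, gene2accession, and gene2Unigene. It supports mixed queries across eight identifier types: Uniprot/trEMBL AC, Uniprot Entryname, Genbank Protein Accession Number, Genbank mRNA gi, Genbank mRNA Accession Number, Gene name, Gene ID, and Unigene ID. It offers 38 selectable annotation items, covering the various identifiers as well as the basic information, functional classification, and interactions of genes or proteins, and it delivers the annotation results as a Microsoft Excel spreadsheet. The system is freely available at http://www.bioscience.org.cn/ubroad.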
High-throughput experiments such as microarray analysis and protein mass spectrometry can efficiently produce huge amounts of data. After some basic preprocessing, these data are often presented as a list of gene or protein identifiers accompanied by (semi-)quantitative experimental data such as expression profiles. Detailed annotation of these genes and proteins is needed to analyze the data further and to extract biological meaning. We developed a web-based platform, UBROAD (Unified Batch Retriever of Annotation Data), to facilitate quick and efficient retrieval of annotation data for genes and proteins from various sources. UBROAD integrates several biological data sources, including NCBI, Swiss-Prot, BIND, Enzyme, gene2Unigene, and gene2accession, and supports mixed searches over eight types of identifiers: Uniprot/trEMBL AC, Uniprot Entryname, Genbank Protein Accession Number, Genbank mRNA gi, Genbank mRNA Accession Number, Gene name, Gene ID, and Unigene ID. The output includes up to 38 annotation items and can be downloaded as a Microsoft Excel file. UBROAD is freely available at http://www.bioscience.org.cn/ubroad.
Source
Journal of Shanghai University (Natural Science Edition)
CAS
CSCD
PKU Core Journals
2007, No. 1, pp. 99-104, 110 (7 pages)
Keywords
bioinformatics
database integration
batch annotation