
An R workflow for annotation of gene expression microarray

Cited by: 1
Abstract: Using the R language, the packages Rsubread, Rsamtools, refGenome, and GenomicRanges were integrated into a complete workflow for independent annotation of the probe sequences of gene expression microarrays. The most widely used platforms, GPL570 and GPL10558, and the previously used platform GPL21163 served as test data for re-annotation, and the new GPL570 annotation was compared with existing annotations. To test the practicality of the workflow, the relatively new long non-coding RNA (lncRNA) expression microarray GPL16956 was also annotated. The new annotation covered 89.58% of the probes on GPL570, and 81.54%, 84.68%, and 76.15% of the probes on GPL10558, GPL21163, and GPL16956, respectively. Of the 7107 genes mapped only by the new GPL570 annotation, 411 protein-coding genes were enriched in GO terms, whereas the other two annotations did not map these genes, which the authors take as evidence of the workflow's reliability and advancement. The workflow is therefore practical and effective, and provides a new tool for data mining.
Authors: SUN Xiaojie (孙小洁); ZHENG Fangqiang (郑方强); ZENG Jianming (曾健明) (College of Plant Protection, Shandong Agricultural University, Tai'an 271018, China; Zhuhai Jianming Biomedical Technology Co., Ltd., Zhuhai 519000, China)
Source: Chinese Journal of Bioprocess Engineering (《生物加工过程》), CAS-indexed, 2021, No. 1, pp. 17-22 (6 pages)
Keywords: gene expression microarray (GEO); data mining; R language
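
The abstract describes aligning probe sequences to a genome, overlapping the aligned positions with gene annotations, and reporting the percentage of probes that receive an annotation. The following is a minimal, language-neutral sketch of the overlap and coverage-rate steps only, written in Python rather than the authors' R packages. The `Interval` class, the function names, the toy coordinates, and the choice to ignore strand are all assumptions made for illustration; they are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A named genomic interval (hypothetical helper, 1-based inclusive coordinates)."""
    chrom: str
    start: int
    end: int
    name: str


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two intervals share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def annotate_probes(alignments, genes):
    """Map each aligned probe to the sorted names of the genes it overlaps.

    Probes that overlap no gene are left out of the result.
    Strand is ignored here, which is a simplifying assumption.
    """
    annotation = {}
    for probe in alignments:
        hits = sorted({g.name for g in genes if overlaps(probe, g)})
        if hits:
            annotation[probe.name] = hits
    return annotation


def coverage_rate(annotation, total_probes):
    """Percentage of all probes on the platform that received an annotation."""
    return 100.0 * len(annotation) / total_probes


# Toy gene models and probe alignments (made-up coordinates).
genes = [
    Interval("chr1", 100, 500, "GENE_A"),
    Interval("chr1", 450, 900, "GENE_B"),
    Interval("chr2", 100, 300, "GENE_C"),
]
alignments = [
    Interval("chr1", 120, 144, "probe_1"),  # inside GENE_A
    Interval("chr1", 460, 484, "probe_2"),  # overlaps GENE_A and GENE_B
    Interval("chr2", 400, 424, "probe_3"),  # aligned, but intergenic
]
total_probes = 4  # a fourth probe is assumed not to align at all

annotation = annotate_probes(alignments, genes)
rate = coverage_rate(annotation, total_probes)
print(annotation)
print(f"coverage: {rate:.2f}%")
```

In this toy case two of the four probes are annotated, so the coverage rate is 50%. A real pipeline would replace the nested loop with an indexed overlap query, which is the role GenomicRanges plays in the workflow the abstract describes.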