Abstract
Large-scale gene expression experiments generate vast amounts of cDNA and expressed sequence tag (EST) data tied to specific temporal and spatial states of biological tissues. These data can be used to discover new genes, analyze gene expression patterns, and annotate genomes, and can therefore serve as a reference standard for the experimental design and result analysis of transcriptome studies. Because alternative splicing of eukaryotic genes has been found to be pervasive and to play an important role in physiological and pathological processes, its systematic analysis has become a hotspot of functional genomics research. With expression data growing exponentially and new genomes continually being sequenced, large-scale, automated computation for transcriptome sequence analysis is urgently needed. We discuss the data analysis algorithms, computing requirements, and programs used in different transcriptome sequence analysis systems, and propose a strategy better suited to large-scale analysis of alternative splicing.
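EST clustering, listed among the keywords, is typically an early step in such pipelines: ESTs likely derived from the same gene are grouped before assembly and splice-variant analysis. The sketch below is illustrative only and is not the method described in the paper. It assumes a simple criterion: two ESTs are linked if they share at least `min_shared` distinct k-mers, and clusters are the connected components of the resulting graph, found with union-find. The function and parameter names (`cluster_ests`, `k`, `min_shared`) are hypothetical. Reverse complements and sequencing errors are ignored, although real tools must handle both.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the set of distinct length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


class UnionFind:
    """Disjoint-set forest used to merge linked ESTs into clusters."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def cluster_ests(ests, k=8, min_shared=2):
    """Hypothetical shared-k-mer clustering: ESTs sharing >= min_shared
    distinct k-mers are linked; returns connected components as sorted
    lists of EST indices. Ignores reverse complements (simplification)."""
    # Inverted index: k-mer -> indices of ESTs containing it (ascending).
    index = defaultdict(list)
    for i, seq in enumerate(ests):
        for km in kmers(seq, k):
            index[km].append(i)

    # Count distinct shared k-mers for every pair (i, j) with i < j.
    shared = defaultdict(int)
    for ids in index.values():
        for a in range(len(ids)):
            for b in range(a + 1, len(ids)):
                shared[(ids[a], ids[b])] += 1

    # Link sufficiently similar pairs, then collect the components.
    uf = UnionFind(len(ests))
    for (i, j), count in shared.items():
        if count >= min_shared:
            uf.union(i, j)
    groups = defaultdict(list)
    for i in range(len(ests)):
        groups[uf.find(i)].append(i)
    return sorted(groups.values())


ests = [
    "ACGTACGTGGCCTTAA",  # 3' end overlaps the start of the next EST
    "GTGGCCTTAAGGCATT",
    "TTTTCCCCGGGGAAAA",  # unrelated sequence
]
print(cluster_ests(ests))  # [[0, 1], [2]]
```

The first two ESTs overlap by 10 bases and so share three distinct 8-mers, which places them in the same cluster. The inverted index avoids comparing all pairs of sequences directly, but pairs are still enumerated for each k-mer. This quadratic behavior on common k-mers is one reason large-scale EST clustering is treated as a high-performance computing problem.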
Source
《国防科技大学学报》
EI
CAS
CSCD
Peking University Core Journal
2006, No. 4, pp. 37-42 (6 pages)
Journal of National University of Defense Technology
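Funding
Supported by the Fund of the National Defense Key Laboratory of Parallel and Distributed Processing (51484050304JB4401)
Supported by the Science and Technology Innovation Start-up Fund of the Academy of Military Medical Sciences (04010010402013)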
Keywords
transcriptome
EST clustering
EST assembly
alternative splicing
high performance computing