Abstract
Repetitive sequences (repeats) make up a significant fraction of eukaryotic genomes. On the basis of their sequence characteristics and how they are organized in the genome, they can be divided into tandem repeats, segmental duplications, and interspersed repeats. Most interspersed repeats are derived from transposable elements (TEs). Eukaryotic TEs are subdivided into two major classes, DNA transposons and retrotransposons, according to the intermediate they use to move. The transposition and amplification of TEs have a significant impact on the evolution of genes and the stability of genomes. However, TEs are more complex and diverse in structure and classification than other types of repeats, which makes their identification and classification difficult. Here, we briefly introduce the function and classification of TEs and summarize three steps for the identification, classification, and annotation of TEs in eukaryotic genomes: (1) construction of a repeat library, (2) repeat correction and classification, and (3) genome annotation. For each step, we describe the existing computational approaches and compare their advantages and disadvantages. Accurate genome-wide identification, classification, and annotation of TEs requires combining multiple methods. This review provides useful guidance for biologists who are not familiar with these approaches to find their way through the forest of programs.
Source
《遗传》
CAS
CSCD
PKU Core Journals (北大核心)
2012, Issue 8, pp. 1009-1019 (11 pages)
Hereditas (Beijing)
Funding
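Supported by the Graduate Science and Technology Innovation Fund of Southwest University (Outstanding Doctoral Project; grant no. kb2010106)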
Keywords
eukaryotes
eukaryotic genome
repetitive sequences (repeats)
transposable elements (transposons)
identification
classification