Abstract
To develop functional EST-SSR markers for cabbage (Brassica oleracea var. capitata), 63,201 cabbage ESTs were retrieved from the NCBI public database. Preprocessing and clustering to remove redundancy yielded 23,306 non-redundant ESTs with a total length of 14,828.60 kb. A search of these sequences detected 3,085 SSRs distributed across 2,616 ESTs, an occurrence frequency of 13.24%; EST-SSRs related to the flowering process of cabbage occurred at a frequency of 6.97%. The EST-SSRs had an average length of 23.11 bp and an average distribution frequency of one SSR per 4.81 kb. Among motifs of one to six nucleotides, dinucleotide and trinucleotide repeats were the main types; they occurred at similar frequencies and together accounted for 79.09% of all SSRs. AG/CT and AAG/CTT were the dominant motifs, accounting for 79% of dinucleotide repeats and 34% of trinucleotide repeats, respectively. BLASTx homology analysis of the SSR-containing ESTs found homologs in the database for 1,981 of the 2,616 EST-SSRs (75.73%). Among the proteins with known function, those from Arabidopsis thaliana made up the largest share (66.02%). The EST-SSRs related to cabbage flowering were also analyzed functionally. When the 1,981 ESTs with homologs were classified by GOA, 987 could be assigned to three major functional categories: biological process, molecular function, and cellular component. EST-SSRs related to biological metabolism were the most numerous (263), and 994 ESTs received no functional annotation.
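The SSR search and the summary statistics reported in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the abstract names neither the search tool nor its criteria, so the minimum repeat counts in `MIN_REPEATS` and the helper names `find_ssrs`, `motif_class` and `summarize` are assumptions made for illustration only. The sketch finds perfect repeats of 1 to 6 nt motifs, groups each motif with its rotations and reverse complement (as in AG/CT and AAG/CTT), and recomputes the frequency and density figures from the counts given in the abstract.

```python
from collections import Counter

# Assumed minimum repeat counts per motif length (1-6 nt). The abstract does
# not state the search criteria, so these thresholds are placeholders.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'AGAG')."""
    k = len(motif)
    return not any(k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k))


def motif_class(motif):
    """Canonical label shared by a motif, its rotations and its reverse complement."""
    revcomp = motif.translate(COMPLEMENT)[::-1]
    rotations = [m[i:] + m[:i] for m in (motif, revcomp) for i in range(len(m))]
    return min(rotations)


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, motif, repeat_count) for each perfect SSR, scanning left to right."""
    seq = seq.upper()
    hits, i, n = [], 0, len(seq)
    while i < n:
        hit = None
        for k in sorted(min_repeats):  # prefer the shortest motif that qualifies
            motif = seq[i:i + k]
            if len(motif) < k or not set(motif) <= set("ACGT") or not is_primitive(motif):
                continue
            count = 1
            while seq[i + count * k:i + (count + 1) * k] == motif:
                count += 1
            if count >= min_repeats[k]:
                hit = (i, motif, count)
                break
        if hit:
            hits.append(hit)
            i += len(hit[1]) * hit[2]  # skip past the repeat just recorded
        else:
            i += 1
    return hits


def summarize(n_ssrs, n_ests, total_kb):
    """SSR frequency (percent of ESTs) and average spacing (kb per SSR)."""
    return {
        "frequency_pct": round(100 * n_ssrs / n_ests, 2),
        "kb_per_ssr": round(total_kb / n_ssrs, 2),
    }


if __name__ == "__main__":
    toy = "TTGC" + "AG" * 8 + "CCATC" + "AAG" * 5 + "T"  # made-up sequence
    ssrs = find_ssrs(toy)
    print(ssrs)
    print(Counter(motif_class(motif) for _, motif, _ in ssrs))
    # Counts taken from the abstract: 3,085 SSRs, 23,306 ESTs, 14,828.60 kb.
    print(summarize(3085, 23306, 14828.60))
```

With the counts from the abstract, `summarize` reproduces the reported 13.24% frequency and the spacing of one SSR per 4.81 kb, which suggests that the 13.24% figure is the number of SSRs divided by the number of non-redundant ESTs. The 75.73% homolog rate follows in the same way from 1,981 / 2,616.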
Source
Journal of China Agricultural University (《中国农业大学学报》)
CAS
CSCD
PKU Core Journal
2010, Issue 6, pp. 34-41 (8 pages)
Funding
Jiangsu Province Science and Technology Support Program (BE2008378)
National "863" Program of China (2006AA100108)
Keywords
cabbage (Brassica oleracea var. capitata)
expressed sequence tag (EST)
simple sequence repeat (SSR)
functional analysis