Abstract: Simple sequence repeats (SSRs), defined as tandem repeats of units 1 to 6 bp long, occur abundantly in both coding and non-coding regions of eukaryotic genomes and can affect gene expression. In this study, expressed sequence tags (ESTs) of Betula pendula (silver birch) were analyzed in silico to mine EST-SSRs, annotate proteins, predict open reading frames (ORFs), design primers, and identify codon repetitions. In B. pendula, 7.8% of ESTs contained SSRs, with an average of 1 SSR per 4.78 kb of EST sequence. A total of 188 SSRs were identified using MISA software. Dinucleotide motifs (65.9%) were the most abundant repeat type, followed by tri- (27.1%), tetra- (4.8%), and pentanucleotide (2.2%) motifs. ORF analysis predicted ORFs in 175 of 178 sequences; SSRs were most frequently located in the 5′ UTR (58.43%), followed by the ORF (31.46%) and the 3′ UTR (8.43%). Of the 178 ESTs, 102 were annotated as ribosomal, transport, membrane, carrier, binding, or transferase proteins. For the 102 SSRs (57.3%) with significant matches, a set of 102 primers (100%), each with forward and reverse strands, was designed using Primer3 software. Serine (Ser, 19.6%) was the predominant putative encoded amino acid, and nonpolar amino acids accounted for the largest share (35.3%). These data provide resources for B. pendula and can be useful for in silico comparative analyses of Betulaceae species, including SSR mining.
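As an illustration of the kind of search that SSR mining involves, the following is a minimal Python sketch that finds perfect tandem repeats with regular expressions and reports the average sequence length per SSR. It is not MISA and not the authors' pipeline. The function names `find_ssrs` and `kb_per_ssr` and the minimum repeat counts in `MIN_REPEATS` are hypothetical, since the abstract does not state the thresholds used.

```python
import re

# Hypothetical minimum repeat counts per motif length (bp); the abstract
# does not report the thresholds that were used with MISA.
MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 4, 6: 4}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # One captured motif followed by at least (min_rep - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "AGAG"),
            # so a dinucleotide run is not also counted as tetranucleotide.
            if any(
                unit_len % k == 0 and motif == motif[:k] * (unit_len // k)
                for k in range(1, unit_len)
            ):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // unit_len))
    return sorted(hits)


def kb_per_ssr(seqs, min_repeats=MIN_REPEATS):
    """Average kb of sequence per SSR, or None if no SSR is found."""
    total_bp = sum(len(s) for s in seqs)
    n_ssrs = sum(len(find_ssrs(s, min_repeats)) for s in seqs)
    return total_bp / 1000 / n_ssrs if n_ssrs else None


# Toy sequence: an (AG)8 run and a (CTT)5 run separated by spacer bases.
example = "GGC" + "AG" * 8 + "TTC" + "CTT" * 5 + "GA"
ssrs = find_ssrs(example)
```

Applied to a full EST set, `kb_per_ssr` would give a figure comparable in form to the 1 SSR per 4.78 kb reported above, although the value depends on the thresholds chosen.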