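Abstract: In this study, MISA software was used to screen 108,127 unigenes obtained from pitaya (dragon fruit) transcriptome sequencing. A total of 7,622 EST-SSR loci were detected, an occurrence frequency of 6.02%, with an average of one locus per 9.00 kb. Mononucleotide repeats were dominant, accounting for 56.59% of all EST-SSR loci, followed by dinucleotide and trinucleotide repeats at 28.4% and 14.04%, respectively; the other repeat types were scarce, each accounting for less than 1%. Among dinucleotide repeat motifs, AG/CT and AC/GT were dominant, accounting for 25.37% and 2.02% of all SSR loci, respectively; among trinucleotide repeat motifs, AAG/CCT predominated, accounting for 3.13% of all SSR loci. A total of 125 EST-SSR primer pairs were designed and synthesized. Genomic DNA was extracted from 8 randomly selected pitaya germplasm accessions with clear morphological differences and amplified by PCR, and the primers were preliminarily screened by agarose gel electrophoresis and 10% denaturing polyacrylamide gel electrophoresis, yielding 32 primer pairs that produced sharp, clear bands. These primers were then tested for polymorphism in 38 pitaya accessions, yielding 16 primer pairs with good polymorphism that amplified 47 polymorphic loci in total. The polymorphism information content (PIC) ranged from 0.243 to 0.667, with a mean of 0.459; the mean observed number of alleles (Na) was 3, and the mean Shannon's information index (I) was 0.891. Eight primer combinations, including primers C31931, C13719 and C32141, effectively distinguished the 38 pitaya accessions and were used to construct their EST-SSR DNA fingerprints. UPGMA cluster analysis, with a threshold of 0.62, divided the 38 accessions into 3 groups: the first comprised red-fleshed and pink-fleshed accessions, the second comprised white-fleshed accessions, and the third comprised 2 accessions of the genus Selenicereus. This study developed a set of SSR primers with high polymorphism potential from pitaya transcriptome sequences, and these primers effectively distinguished the 38 pitaya accessions. EST-SSR markers developed from pitaya transcriptome sequencing can therefore provide a richer source of markers for pitaya germplasm identification, genetic relationship analysis and genetic map construction.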
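The per-locus diversity statistics reported in this abstract (PIC and Shannon's information index) can be computed from allele frequencies. The following is a minimal sketch only: it assumes the commonly used Botstein et al. (1980) PIC formula and a natural-log Shannon index, neither of which the abstract spells out, and the function names and example frequencies are hypothetical rather than taken from the study.

```python
import math
from itertools import combinations


def pic(freqs):
    """Polymorphism information content for one locus.

    Assumed formula (Botstein et al., 1980):
    PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2
    """
    homozygosity = sum(p * p for p in freqs)
    correction = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - correction


def shannon_index(freqs):
    """Shannon's information index I = -sum(p_i * ln p_i), skipping zero frequencies."""
    return -sum(p * math.log(p) for p in freqs if p > 0)


if __name__ == "__main__":
    # Hypothetical allele frequencies at a single three-allele locus.
    example = [0.5, 0.3, 0.2]
    print(f"PIC = {pic(example):.3f}, I = {shannon_index(example):.3f}")
```

Averaging these values over all scored loci gives figures comparable in kind to the mean PIC and mean I quoted above, provided the same formulas were used.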
Funding: Supported by the National Basic Research Program of China (Grant Nos. 2010CB945401, 2007CB108800), the National Natural Science Foundation of China (Grant Nos. 30870575, 31071162, 31000590), and the Science and Technology Commission of Shanghai Municipality (Grant No. 11DZ2260300).
Abstract: De novo transcriptome assembly is an important approach in RNA-Seq data analysis: it reconstructs the transcriptome and supports the investigation of gene expression profiles without a reference genome sequence. We carried out transcriptome assemblies with two RNA-Seq datasets, generated from human brain and a cell line, respectively, and compared three strategies to find an efficient way to produce an optimal overall assembly. We first assembled the brain and cell line transcriptomes using a single k-mer length. Next, we tested a range of k-mer lengths and coverage cutoffs. Lastly, we combined the contigs assembled over a range of k values to generate a final assembly. Comparing these results, we found that a single k-mer value is not enough to produce a good assembly, whereas combining the contigs from different k-mer values yields longer contigs and greatly improves the overall assembly.
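The third strategy, pooling contigs from assemblies run at several k values, can be illustrated with a toy merge step. The abstract does not say how the authors combined contigs, so the sketch below is an assumption: it simply pools the contigs, drops any contig contained in a longer one on either strand, and reports N50 for comparison. All function names and sequences are hypothetical, and the quadratic containment check is only suitable for small inputs.

```python
def reverse_complement(seq):
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def merge_assemblies(assemblies):
    """Pool contigs from several k-mer assemblies and remove redundancy.

    assemblies: dict mapping k (int) -> list of contig strings.
    A contig is dropped if it, or its reverse complement, is a substring
    of a longer contig that has already been kept.
    """
    pooled = {c.upper() for contigs in assemblies.values() for c in contigs}
    kept = []
    # Longest first, so shorter contigs are tested against longer ones.
    for contig in sorted(pooled, key=lambda s: (-len(s), s)):
        rc = reverse_complement(contig)
        if any(contig in longer or rc in longer for longer in kept):
            continue
        kept.append(contig)
    return kept


def n50(contigs):
    """Length L such that contigs of length >= L cover at least half the total."""
    lengths = sorted((len(c) for c in contigs), reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return 0


if __name__ == "__main__":
    # Hypothetical contigs from two assemblies run with k=21 and k=31.
    toy = {
        21: ["ACGTACGTTT", "GGGCCA"],
        31: ["TTACGTACGTTTGA", "GGGCCA", "TCAAACG"],
    }
    merged = merge_assemblies(toy)
    print(merged, n50(merged))
```

Real pipelines usually cluster near-identical contigs with a dedicated tool rather than exact substring matching; the point here is only that the merged set keeps the longest representative from any k.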
Funding: Supported by the National Basic Research Program of China (2010CB945401), the National Natural Science Foundation of China (31240038, 31171264, 31071162, 31000590), and the Science and Technology Commission of Shanghai Municipality (11DZ2260300).
Abstract: Transcriptome reconstruction is an important application of RNA-Seq, providing critical information for further analysis of the transcriptome. Although RNA-Seq offers the potential to capture the whole picture of the transcriptome, it still presents special challenges. To handle these difficulties and reconstruct the transcriptome as completely as possible, current computational approaches mainly employ two strategies: de novo assembly and genome-guided assembly. To find the similarities and differences between them, we first chose five representative assemblers from the two classes, and then investigated and compared their algorithmic features in theory and their performance in practice. We found that all the methods can be reduced to graph reduction problems, yet they differ in conceptual and practical implementation; each assembly method therefore has specific advantages and disadvantages, performing worse than the others in certain aspects while outperforming them in other aspects. Finally, we merged the assemblies of the five assemblers and obtained a much better assembly. Additionally, we evaluated an assembler that uses a genome-guided de novo assembly approach, and it achieved good performance. Based on these results, we suggest that, to obtain a comprehensive set of recovered transcripts, it is better to use a combination of de novo assembly and genome-guided assembly.
Funding: Supported by the National Natural Science Foundation of China (No. 31201592), the Modern Agro-industry Technology Research System (No. nycytx-29-14), and the Doctoral Program of Higher Education (No. 20110101110091), China.
Abstract: A total of 8375 genic simple sequence repeat (SSR) loci were discovered from a unigene set assembled from 116282 transcriptomic unigenes in this study. Dinucleotide repeat motifs were the most common, with a frequency of 65.11%, followed by trinucleotide motifs (32.81%). A total of 4100 primer pairs were designed from the SSR loci. Of these, 343 primer pairs (repeat length ≥ 15 bp) were synthesized with an M13 tail and tested for stable amplification and polymorphism in four Pyrus accessions. After the preliminary test, 104 polymorphic genic SSR markers were developed; dinucleotide and trinucleotide repeats represented 97.11% (101) of these. Twenty-eight polymorphic genic SSR markers were selected randomly to further validate genetic diversity among 28 Pyrus accessions. These markers displayed a high level of polymorphism: the number of alleles at these SSR loci ranged from 2 to 17, with a mean of 9.43 alleles per locus, and the polymorphism information content (PIC) values ranged from 0.26 to 0.91. The UPGMA (unweighted pair-group method with arithmetic average) cluster analysis grouped the 28 Pyrus accessions into two groups, Oriental pears and Occidental pears, which is congruent with the traditional taxonomy. These results demonstrate the effectiveness of the markers in analyzing Pyrus phylogenetic relationships, enrich the scarce Pyrus EST-SSR resources, and confirm the potential value of a pear transcriptome database for the development of new SSR markers.
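Discovering SSR loci in unigene sequences amounts to scanning for short motifs repeated in tandem. The sketch below shows one way this might be done with regular expressions. It is not the tool or the parameter set used in the study: the minimum-repeat thresholds in `MIN_REPEATS` are assumed values chosen for illustration, the function names are hypothetical, and motifs are reported as found rather than normalized to a canonical form such as AG/CT.

```python
import re

# Assumed minimum repeat counts per motif length (illustrative, not from the study).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'AA', 'ACAC')."""
    n = len(motif)
    return not any(
        n % d == 0 and motif == motif[:d] * (n // d) for d in range(1, n)
    )


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for unit_len, reps in sorted(min_repeats.items()):
        # A motif of unit_len bases followed by at least reps - 1 further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, reps - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_primitive(motif):
                hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical unigene fragment with one dinucleotide and one trinucleotide SSR.
    fragment = "TTT" + "AG" * 7 + "CCGT" + "AAG" * 5 + "C"
    for start, motif, count in find_ssrs(fragment):
        print(start, motif, count, motif and len(motif) * count)
```

The last printed column is the repeat tract length in bp, the quantity the abstract thresholds at 15 bp when choosing primer pairs to synthesize.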