Abstract
RNA-seq can rapidly provide comprehensive transcript sequence information for a species in a given state. However, many of the Unigenes produced by sequencing and assembly lack a complete open reading frame (ORF). Transcriptome libraries are also somewhat redundant: Unigenes that belong to the same transcript can remain as separate entries because they share no overlapping region and therefore cannot be joined by the assembler. To extend and join ammonium transporter (AMT) Unigenes from the Salicornia europaea transcriptome, we first selected all Unigenes annotated as AMT whose ORFs were incomplete (five in total) and compared their expression patterns across four transcriptomes. Two of them, Uni4 and Uni5, had identical expression patterns, suggesting that they might derive from the same transcript. We then aligned both Unigenes with reference AMT proteins using NCBI blastx to locate them within the transcript and to confirm that their coding sequences did not overlap (two overlapping coding sequences cannot form a single transcript). Uni4 and Uni5 mapped to the 5′ end and the 3′ end of the reference transcript, respectively, so we hypothesized that they belong to one transcript separated by an unknown gap of about 120 bp. To test this hypothesis, we designed a single forward primer on Uni4 and a single reverse primer on Uni5. PCR amplified a fragment of about 800 bp, which was sequenced and aligned with the two Unigenes; the alignment confirmed that Uni4 and Uni5 belong to the same transcript and recovered the missing sequence. The final assembled sequence is 1,667 bp long and contains a complete 1,482 bp ORF encoding 494 amino acids. Phylogenetic analysis placed it in the amt1 subfamily, and the gene was named Seamt1. Bioinformatic predictions indicate that the SeAMT1 protein shares the characteristics of AMT proteins from other species. In this study, we clustered Unigenes by expression pattern to identify candidates derived from the same transcript, and we validated the feasibility of the approach with two further groups of Unigenes. This convenient method facilitates the extension and joining of transcriptome Unigenes, helps obtain complete ORFs, and supports subsequent gene function studies.
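The abstract says Uni4 and Uni5 were flagged because they share an expression pattern across four transcriptomes, but it does not state the exact clustering criterion. The sketch below shows one way such a screen could work, assuming per-Unigene expression values (e.g. RPKM) are available. It swaps in pairwise Pearson correlation with a hypothetical cutoff `min_r=0.95` for whatever clustering the authors used, and the profiles are toy numbers, not data from the study.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # flat profile: correlation undefined, treat as unrelated
    return cov / (sx * sy)


def candidate_pairs(profiles, min_r=0.95):
    """Return Unigene pairs whose profiles are correlated at or above min_r.

    min_r is a hypothetical threshold chosen for illustration.
    """
    pairs = []
    for a, b in combinations(sorted(profiles), 2):
        r = pearson(profiles[a], profiles[b])
        if r >= min_r:
            pairs.append((a, b, round(r, 3)))
    return pairs


# Hypothetical expression values in four transcriptomes (toy numbers only).
profiles = {
    "Uni1": [30.0, 5.0, 2.0, 8.0],
    "Uni2": [1.0, 3.0, 15.0, 12.0],
    "Uni3": [7.0, 7.5, 6.0, 20.0],
    "Uni4": [2.0, 10.0, 4.0, 1.0],
    "Uni5": [4.1, 20.5, 7.9, 2.2],
}
print(candidate_pairs(profiles))
```

Pearson correlation compares the shape of a profile rather than its absolute level, so two fragments with different coverage but the same up/down pattern across samples still score close to 1. With only four samples a high correlation can also arise by chance, which is why the positional check and PCR validation remain necessary.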
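The blastx step amounts to interval arithmetic on the reference protein: the two hits must not overlap, and the residues between them give a rough size for the missing sequence at 3 nt per residue. A minimal sketch, assuming only the subject (reference-protein) coordinates of each hit are used; the `Hit` type, `order_and_gap` and the coordinates in the usage line are hypothetical, chosen to illustrate a gap of about 120 bp.

```python
from typing import NamedTuple, Optional, Tuple


class Hit(NamedTuple):
    """Hypothetical blastx hit of one Unigene on a reference AMT protein."""
    unigene: str
    s_start: int  # first aligned residue on the reference (1-based)
    s_end: int    # last aligned residue on the reference (1-based, inclusive)


def order_and_gap(h1: Hit, h2: Hit) -> Optional[Tuple[str, str, int, int]]:
    """Order two hits N- to C-terminal (5' to 3') and estimate the gap.

    Returns (upstream, downstream, gap_aa, gap_nt), or None when the hits
    overlap, following the premise that overlapping coding sequences cannot
    form one transcript.
    """
    up, down = sorted((h1, h2), key=lambda h: h.s_start)
    gap_aa = down.s_start - up.s_end - 1  # reference residues covered by neither hit
    if gap_aa < 0:
        return None
    return up.unigene, down.unigene, gap_aa, gap_aa * 3


# Usage with hypothetical coordinates:
print(order_and_gap(Hit("Uni5", 291, 494), Hit("Uni4", 1, 250)))
# ('Uni4', 'Uni5', 40, 120)
```

The estimate is only approximate. It assumes each hit extends to the end of its Unigene's coding region and ignores indels relative to the reference, which fits the abstract's description of the gap as "about" 120 bp.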
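The forward primer lies in Uni4 and the reverse primer in Uni5, so the sequenced amplicon overlaps the 3′ end of Uni4 and the 5′ end of Uni5. The three pieces can therefore be joined on their shared ends. The following is a minimal sketch using exact suffix/prefix matching; `merge_by_overlap` and its `min_overlap` parameter are hypothetical. Real Sanger reads contain errors, so a practical pipeline would use a pairwise aligner rather than exact string comparison.

```python
def merge_by_overlap(left: str, right: str, min_overlap: int = 20) -> str:
    """Join two sequences on the longest exact suffix(left) == prefix(right).

    Raises ValueError if no overlap of at least min_overlap bases exists.
    """
    max_k = min(len(left), len(right))
    for k in range(max_k, min_overlap - 1, -1):  # try the longest overlap first
        if left[-k:] == right[:k]:
            return left + right[k:]
    raise ValueError("no overlap of at least min_overlap bases")


# Toy example: a hypothetical 70-nt "transcript" split into three overlapping pieces.
full = ("ATGGCGTACC" "TTAGGACTTC" "AGCGATCCGT" "AACGGTTCAG"
        "ATGCCATGGA" "GTCTTACGCA" "AGTGGCTTAG")
uni4, amplicon, uni5 = full[:30], full[20:60], full[50:]
joined = merge_by_overlap(merge_by_overlap(uni4, amplicon, 8), uni5, 8)
print(joined == full)  # True
```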
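Locating a complete ORF in an assembled sequence and converting its length to a protein length can be sketched as follows. The sketch scans only the forward strand for the longest ATG-to-stop stretch, assuming the orientation is already known from the blastx hits; `longest_orf` and `protein_length` are hypothetical helpers. The mapping from ORF length to residue count depends on whether the stop codon is counted. 1,482 bp is 494 codons, which gives 494 residues if the stop codon is excluded from the ORF length, or 493 residues plus a stop codon if it is included.

```python
STOPS = {"TAA", "TAG", "TGA"}


def longest_orf(seq):
    """Longest ATG-initiated ORF on the forward strand, stop codon included.

    Returns (start, end) as 0-based half-open coordinates, or None.
    ORFs that run off the end without a stop codon are ignored.
    """
    seq = seq.upper()
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):  # every complete codon in this frame
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                end = i + 3
                if best is None or end - start > best[1] - best[0]:
                    best = (start, end)
                start = None
    return best


def protein_length(orf_nt, stop_included=True):
    """Number of amino acids encoded by an ORF of orf_nt nucleotides."""
    codons = orf_nt // 3
    return codons - 1 if stop_included else codons


toy = "CC" + "ATG" + "GCT" * 5 + "TAA" + "GG"  # hypothetical toy sequence
start, end = longest_orf(toy)
print(start, end, protein_length(end - start))  # 2 23 6
```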
Source
Chinese Journal of Biotechnology (《生物工程学报》)
Indexed in: CAS, CSCD, Peking University Core Journals
2014, Issue 11, pp. 1763-1773 (11 pages)
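Funding
National Natural Science Foundation of China (No. 31270660)
Xinjiang Program for Training Outstanding Young Scientific and Technological Talents (No. 2013711018)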