Vector cleaner: a novel method for vector sequence removal
Cited by: 2
Abstract: Target-gene reads produced by automated Sanger sequencing frequently contain fragments of the cloning vector alongside the target gene. To remove these vector sequences quickly and to convert antisense-strand reads to the sense strand, a novel method was proposed and a small program, Vector cleaner, was developed in Perl. Its key features are batch removal of vector sequences and automatic conversion of antisense-strand sequences to sense-strand sequences.

Vector cleaner works in three steps. First, it reads the primer information and the target-gene sequencing reads in batch from input files. Second, it slides a window of half the primer length along each primer, one base at a time, to generate seeds; the seeds are scanned against the sequencing read for perfect matches, and the matches are counted to locate the primers and remove the vector sequences flanking them. Third, it determines the strand by comparing the number of seed matches obtained for the upstream primer with the number obtained for its reverse complement, and converts the read to the sense strand where necessary.

The method was compared with two programs of similar function, SeqMan and Seqclean, on 12 sequencing results of the cotton gene GhVIN1. The 12 sequences were amplified from Gossypium arboreum cv. JLZM and Gossypium raimondii; the cDNA fragments were cloned into the pMD19-T vector and sequenced. Seqclean, which screens for vector against NCBI's UniVec database, was run with default parameters. SeqMan was given the pMD19-T plasmid sequence and run with default parameters. The outputs of the three programs were analysed with the multiple sequence alignment program Clustal X.

Vector cleaner successfully removed the vector sequences from the GhVIN1 reads and exported detailed results, including primer information, product size and target gene sequence, to an Excel file. GhVIN1-1, GhVIN1-2, GhVIN1-3, GhVIN1-4, GhVIN1-7, GhVIN1-8, GhVIN1-10 and GhVIN1-12 were detected as antisense-strand sequences and automatically converted to sense-strand sequences. GhVIN1-2, which had two mismatched bases in the primer region, was also identified and corrected. Compared with Seqclean and SeqMan, Vector cleaner showed higher accuracy and sensitivity: the proportion of correctly cleaned sequences was 100%, 100% and 91.6% for Vector cleaner, SeqMan and Seqclean, respectively, and the proportion of correct nucleotide bases was 99.90%, 99.00% and 94.33%, indicating that SeqMan and Seqclean produced more base-level errors. Vector cleaner is therefore well suited to batch removal of vector sequences in gene cloning work. It outperforms the compared programs in accuracy, converts antisense-strand sequences automatically, and avoids a weakness of traditional vector-cleaning tools, which require the vector sequence as input.
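Authors: ZHAO Ting (赵汀), ZHOU Baoliang (周宝良)
Source: Journal of Nanjing Agricultural University (《南京农业大学学报》), 2014, No. 4, pp. 9-14 (6 pages). Indexed in CAS, CSCD and the Peking University Core Journals list.
Funding: National Basic Research Program of China (973 Program), grant 2011CB109300
Keywords: target gene sequencing; vector sequence removal; Vector cleaner; Perl language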
