
Gene Read Mapping Algorithms Based on MapReduce
Abstract: The rapid development of RNA-seq sequencing technology generates massive volumes of data, which pose a serious challenge to the execution efficiency of existing read mapping algorithms. To address this, the paper proposes two MapReduce-based spaced seed indexing algorithms: PSeqMap, which maps reads that do not span splice sites, and PJuncSeqMap, which maps reads that span splice sites, together with a load-balancing scheme. Both algorithms use the MapReduce framework to parallelize spaced seed indexing. Experimental results on Arabidopsis gene datasets show that the proposed algorithms make full use of the storage and computing capacity of the cluster and process massive gene data efficiently.
Source: Pattern Recognition and Artificial Intelligence (《模式识别与人工智能》), 2014, No. 3, pp. 206-212 (7 pages). Indexed in EI, CSCD, and the Peking University Core Journals list.
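Funding: Supported by the National Natural Science Foundation of China (No. 61272222, 61003116), the Key and Major Special Project of the Natural Science Foundation of Jiangsu Province (No. BK2011005), the Natural Science Foundation of Jiangsu Province (No. BK2011782), and the Graduate Research and Innovation Program of Jiangsu Higher Education Institutions (No. CXLX12_0415).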
Keywords: read mapping, SeqMap, MapReduce
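
The abstract names the technique (spaced seed indexing parallelized with MapReduce) but gives no details, so the following is only a minimal single-process sketch of the general idea, not the paper's PSeqMap or PJuncSeqMap. Everything in it is an assumption made for illustration: the fixed read length, the two complementary masks (with at most one mismatch, at least one half of the read must match exactly), and the function names. Splice-site handling and load balancing are not modeled.

```python
from collections import defaultdict

READ_LEN = 8                       # assumed fixed read length (illustrative)
MASKS = ("11110000", "00001111")   # assumed seed masks: '1' = position used in the key
MAX_MISMATCHES = 1                 # pigeonhole: one of the two halves must match exactly


def seed_key(window, mask_id):
    """Build the shuffle key from the characters selected by one mask."""
    mask = MASKS[mask_id]
    return (mask_id, "".join(c for c, m in zip(window, mask) if m == "1"))


def map_reference(ref):
    """Map step for the reference: emit one record per window and mask."""
    for pos in range(len(ref) - READ_LEN + 1):
        window = ref[pos:pos + READ_LEN]
        for mask_id in range(len(MASKS)):
            yield seed_key(window, mask_id), ("ref", pos)


def map_reads(reads):
    """Map step for the reads: emit one record per read and mask."""
    for read_id, seq in reads.items():
        for mask_id in range(len(MASKS)):
            yield seed_key(seq, mask_id), ("read", read_id)


def shuffle(pairs):
    """Stand-in for the framework's shuffle: group values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_group(values, ref, reads):
    """Reduce step: verify every read/position pair that shares a seed key."""
    positions = [v for tag, v in values if tag == "ref"]
    read_ids = [v for tag, v in values if tag == "read"]
    for read_id in read_ids:
        for pos in positions:
            window = ref[pos:pos + READ_LEN]
            mismatches = sum(a != b for a, b in zip(window, reads[read_id]))
            if mismatches <= MAX_MISMATCHES:
                yield read_id, pos, mismatches


def map_reads_to_reference(ref, reads):
    """Run map, shuffle, and reduce in one process; return sorted unique hits."""
    pairs = list(map_reference(ref)) + list(map_reads(reads))
    hits = set()
    for values in shuffle(pairs).values():
        hits.update(reduce_group(values, ref, reads))
    return sorted(hits)


if __name__ == "__main__":
    reference = "ACGTACGTTTGACCA"
    sample_reads = {"r1": "ACGTTTGA", "r2": "ACGTTTGC", "r3": "GGGGGGGG"}
    for hit in map_reads_to_reference(reference, sample_reads):
        print(hit)
```

In a real cluster deployment the two map functions and the reduce function would run as distributed tasks, and the in-memory `shuffle` would be replaced by the framework's own grouping of records by key.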









