Gene Read Mapping Algorithms Based on MapReduce (cited by: 2)

Abstract: The rapid development of RNA-seq sequencing technology produces massive volumes of data, which poses a serious efficiency challenge for existing read mapping algorithms. To address this, two MapReduce-based algorithms are proposed: a spaced seed indexing algorithm that does not span splice sites (PSeqMap) and a spaced seed indexing algorithm that spans splice sites (PJuncSeqMap), together with a load-balancing solution. Both algorithms use the MapReduce framework to parallelize spaced seed indexing. Experimental results on Arabidopsis gene datasets show that the proposed algorithms make full use of the cluster's storage and computing power and process massive genetic data efficiently.
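As a rough illustration of the general idea named in the abstract (spaced seed indexing expressed as map and reduce steps), the following minimal sketch simulates the phases in plain Python. It is not the PSeqMap or PJuncSeqMap implementation: the seed pattern `MASK`, the function names, and the single-window seeding strategy are all assumptions made for this example, and splice sites and load balancing are not modeled.

```python
from collections import defaultdict

# Hypothetical spaced-seed pattern: '1' = position sampled, '0' = position ignored.
MASK = "110110"


def seed_key(window, mask=MASK):
    """Keep only the characters of `window` at the mask's '1' positions."""
    return "".join(c for c, m in zip(window, mask) if m == "1")


def map_reference(ref, mask=MASK):
    """Map step for the reference: emit (key, ('ref', position)) per window."""
    for pos in range(len(ref) - len(mask) + 1):
        yield seed_key(ref[pos:pos + len(mask)], mask), ("ref", pos)


def map_read(read_id, read, mask=MASK):
    """Map step for a read: emit one key taken from the read's first window."""
    yield seed_key(read[:len(mask)], mask), ("read", read_id)


def shuffle(pairs):
    """Group values by key, as the framework's shuffle phase would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def reduce_candidates(values, ref, reads, max_mismatches):
    """Reduce step: verify every (read, reference position) pair sharing a key."""
    positions = [v for tag, v in values if tag == "ref"]
    read_ids = [v for tag, v in values if tag == "read"]
    for rid in read_ids:
        read = reads[rid]
        for pos in positions:
            window = ref[pos:pos + len(read)]
            if len(window) == len(read) and hamming(window, read) <= max_mismatches:
                yield rid, pos


def map_reads(ref, reads, max_mismatches=1):
    """Run the simulated map, shuffle, and reduce phases; return sorted hits."""
    pairs = list(map_reference(ref))
    for rid, read in reads.items():
        pairs.extend(map_read(rid, read))
    hits = []
    for values in shuffle(pairs).values():
        hits.extend(reduce_candidates(values, ref, reads, max_mismatches))
    return sorted(hits)
```

For example, `map_reads("ACGTACGTTAGC", {"r1": "GTACGTTA", "r2": "GTCCGTTA"})` reports both reads at reference position 2, because the single mismatch in `r2` falls on an ignored mask position. A single mask applied to one window misses reads whose mismatches fall on sampled positions, so a practical tool would combine several seeds; reads are also assumed to be at least as long as the mask.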

Authors: TU Jinjin, YANG Ming, GUO Lina

Affiliation: School of Computer Science and Technology, Nanjing Normal University

Source: Pattern Recognition and Artificial Intelligence (indexed in EI, CSCD, and the Peking University Core Journals list), 2014, No. 3, pp. 206-212 (7 pages)

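Funding: Supported by the National Natural Science Foundation of China (No. 61272222, 61003116), the Key Major Special Project of the Natural Science Foundation of Jiangsu Province (No. BK2011005), the Natural Science Foundation of Jiangsu Province (No. BK2011782), and the Graduate Research and Innovation Program of Jiangsu Higher Education Institutions (No. CXLX12_0415)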

Keywords: Read Mapping, SeqMap, MapReduce

Classification: Q75 [Biology: Molecular Biology]; TP301.6 [Automation and Computer Technology: Computer System Architecture]
