Abstract
Illumina, 454 and SOLiD are currently the three mainstream second-generation sequencing platforms. Illumina and 454 sequence by synthesis with DNA polymerase, while SOLiD sequences by ligation with DNA ligase. Aligning the high-throughput reads these platforms produce generally takes two steps: (1) preprocessing and (2) sequence alignment. Preprocessing methods fall into two classes: hash-table methods and methods based on the Burrows-Wheeler transform of a suffix trie. Sequence alignment methods also fall into two classes: spaced-seed indexing and the Smith-Waterman dynamic programming algorithm. Using data produced by the Illumina and SOLiD platforms, this paper benchmarks several commonly used aligners on a single machine: SHRiMP, MAQ, BFAST, BWA and BOWTIE. The results show that BOWTIE outperforms the others in memory usage, alignment speed and accuracy when aligning Illumina data, while BWA is better suited to aligning SOLiD data. Second-generation alignment techniques still cannot fully keep pace with the high-throughput data produced by second-generation platforms or by third-generation platforms typified by nanopore sequencing. The paper therefore argues that new sequence alignment methods based on cloud computing are an important direction for future research and development.
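The abstract names two building blocks: hash-table indexing for preprocessing and Smith-Waterman dynamic programming for alignment. The following is a minimal seed-and-extend sketch that combines them. It assumes exact k-mer seeds, a linear gap penalty and illustrative scoring parameters (match +2, mismatch -1, gap -2). The function names are hypothetical. The code does not reproduce how SHRiMP, MAQ, BFAST, BWA or BOWTIE actually index or align reads.

```python
from __future__ import annotations

from collections import defaultdict


def build_kmer_index(reference: str, k: int) -> dict[str, list[int]]:
    """Preprocessing (hypothetical helper): map every k-mer of the reference to its start positions."""
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return dict(index)


def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Alignment: best local alignment score of a and b with a linear gap penalty."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores never drop below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def align_read(read: str, reference: str, index: dict[str, list[int]], k: int,
               pad: int = 2) -> list[tuple[int, int]]:
    """Seed with exact k-mer hits, then score each candidate window with Smith-Waterman.

    Returns (implied start on reference, score) pairs, best score first.
    """
    results: list[tuple[int, int]] = []
    seen: set[int] = set()
    for offset in range(len(read) - k + 1):
        for pos in index.get(read[offset:offset + k], []):
            start = pos - offset  # where the whole read would begin on the reference
            if start in seen:
                continue  # several seeds can vote for the same candidate
            seen.add(start)
            lo = max(0, start - pad)  # small padding leaves room for indels
            hi = min(len(reference), start + len(read) + pad)
            results.append((start, smith_waterman(read, reference[lo:hi])))
    return sorted(results, key=lambda r: -r[1])


if __name__ == "__main__":
    ref = "ACGTACGTTAGCCGATAGGCTTACG"
    idx = build_kmer_index(ref, 4)
    # The read differs from ref[8:18] at one base, so the best hit is start 8
    # with 9 matches and 1 mismatch: 9 * 2 - 1 = 17.
    print(align_read("TAGCCCATAG", ref, idx, 4))  # [(8, 17)]
```

This sketch shows why the two steps are separated. The hash lookup cheaply narrows the reference to a few candidate positions. The quadratic Smith-Waterman step then runs only on those short windows instead of on the whole genome.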
Source
Journal of Wuhan University (Natural Science Edition)
CAS
CSCD
PKU Core Journal
2012, No. 5, pp. 463-470 (8 pages)
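Funding
National Natural Science Foundation of China (60970063)
Specialized Research Fund for the Doctoral Program of Higher Education, Ministry of Education of China (20090141110026)
Program for New Century Excellent Talents in University (NCET-10-0644)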
Keywords
second-generation sequencing
read
sequence alignment