Abstract
High-throughput sequencing technology produces DNA sequence reads that are short in length and enormous in volume. This paper analyzes the challenges and opportunities that big data presents in the high-throughput sequencing environment, and it summarizes and discusses research results on algorithms and tools for data compression, metagenomic sequence assembly, and metagenomic sequence analysis. Finally, it outlines the development trends of research on short-read DNA sequence data under high-throughput sequencing.
Source
Big Data Research (《大数据》)
2016, Issue 2, pp. 76-87 (12 pages)
Funding
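Natural Science Foundation of Heilongjiang Province (No. F201313)
Science and Technology Research Project of the Heilongjiang Provincial Department of Education (No. 12541124)
Harbin Science and Technology Innovation (No. 2013RFQXJ114)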
Keywords
high-throughput DNA sequencing
bioinformatics
short-read sequence data compression
short-read sequence data assembly
short-read sequence data analysis