Abstract
Advances in environmental genomics and deep sequencing technologies have greatly expanded our understanding of the composition and structure of microbial communities based on 16S rRNA gene sequences. However, environmental samples are complex and hard to separate, and above all they lack ground-truth information, so quantitative analysis of environmental microbial community structure remains difficult. Simulated sequencing datasets are therefore useful for developing new software: they support both quantitative and qualitative analysis of microbial community composition and structure, and they make it possible to build benchmark datasets for evaluating existing methods for processing 16S rRNA sequence data. In this work, a simulation algorithm for the 454 sequencer (Tsim) is proposed. It simulates the emulsion PCR process with an error-prone PCR error model and the sequencing-by-synthesis process with a normal distribution model. The simulation results show that Tsim simulates the 454 sequencing process well.
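To make the two modelling steps in the abstract concrete, the following Python sketch illustrates the general idea. It is not Tsim and does not reproduce the authors' models; every detail below is an assumption chosen for illustration: emulsion PCR is reduced to a single copy lineage with substitution-only errors at a fixed per-base rate, the nucleotide flow cycle is taken to be `TACG`, and the flow signal for a homopolymer of length n is drawn from a normal distribution with mean n and a standard deviation that grows linearly with n (the parameters `base_sd` and `slope_sd` are hypothetical).

```python
import random

BASES = "ACGT"
FLOW_ORDER = "TACG"  # assumed nucleotide flow cycle (illustrative only)


def error_prone_pcr(template: str, cycles: int, error_rate: float,
                    rng: random.Random) -> str:
    """Follow one copy lineage through `cycles` PCR rounds.

    In each round, every base is replaced by a different random base with
    probability `error_rate` (a simple, substitution-only error model).
    """
    seq = list(template)
    for _ in range(cycles):
        for i, base in enumerate(seq):
            if rng.random() < error_rate:
                seq[i] = rng.choice([b for b in BASES if b != base])
    return "".join(seq)


def flowgram(seq: str, rng: random.Random, base_sd: float = 0.05,
             slope_sd: float = 0.03, flow_order: str = FLOW_ORDER) -> list[float]:
    """Simulate sequencing by synthesis: one signal per nucleotide flow.

    Each flow incorporates the whole homopolymer run matching the flowed
    nucleotide. The signal for a run of length n is drawn from
    Normal(n, base_sd + slope_sd * n) (hypothetical parameters), clipped at 0.
    """
    if set(seq) - set(flow_order):
        raise ValueError("sequence contains bases not in the flow order")
    signals = []
    pos, flow = 0, 0
    while pos < len(seq):
        nuc = flow_order[flow % len(flow_order)]
        n = 0
        while pos < len(seq) and seq[pos] == nuc:
            n += 1
            pos += 1
        sd = base_sd + slope_sd * n
        signals.append(max(0.0, rng.gauss(n, sd)))
        flow += 1
    return signals


def call_bases(signals: list[float], flow_order: str = FLOW_ORDER) -> str:
    """Naive base caller: round each flow signal to a homopolymer length."""
    return "".join(flow_order[i % len(flow_order)] * int(round(s))
                   for i, s in enumerate(signals))


if __name__ == "__main__":
    rng = random.Random(42)
    template = "TTACGGGATCCA"
    amplicon = error_prone_pcr(template, cycles=30, error_rate=1e-4, rng=rng)
    signals = flowgram(amplicon, rng)
    print(call_bases(signals))
```

With both standard deviations set to zero and no PCR errors, the simulated read reproduces the template exactly; increasing `slope_sd` makes long homopolymers the most likely source of miscalled run lengths, which is the kind of error a normal signal model is meant to capture.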
Source
Computer Science (《计算机科学》)
CSCD
Peking University Core Journals list (北大核心)
2014, No. 2, pp. 261-263, 284 (4 pages)
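Funding
Key Program of the National Natural Science Foundation of China (61135001)
National Natural Science Foundation of China (61170134, 60775012)
Aeronautical Science Foundation of China (20100853010)
Doctoral Dissertation Innovation Foundation of Northwestern Polytechnical University (cx201017)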