Abstract
Structural variation (SV) is a major, large-scale type of variation in the human genome. It affects molecular and cellular processes and regulatory functions, and has a substantial influence on gene expression regulation and individual phenotypes. Accurately detecting SVs across a population helps to build a complete map of population genome variation, reveals the characteristics of population genetics and evolution, and supports disease diagnosis and treatment as well as precision medicine. This paper presents a workflow for detecting structural variation in population genomes from high-throughput sequencing data. The workflow combines multiple high-performance SV detection algorithms for comprehensive and accurate SV discovery. It then applies multilayer integration and filtering to obtain a high-precision set of candidate population SVs. Finally, it performs genotype correction, variant trimming and type revision to produce a complete SV map of the population genomes. Applying the workflow to a population of 267 individuals identified 96,202 SVs, whose type composition and frequency distribution are consistent with those reported by other international genome projects. This indicates that the workflow performs well at population-scale SV detection. In addition, by running in parallel, the workflow substantially reduces analysis time while keeping memory usage under control, providing strong support for efficient SV detection in large-scale populations.
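The abstract does not specify how calls from the different detection algorithms are integrated. As one illustration of what a caller-integration and filtering step can look like, the Python sketch below makes several assumptions. It groups calls of the same type on the same chromosome whose start and end breakpoints each fall within a tolerance. It then keeps only clusters reported by a minimum number of distinct callers. The class names, the 500 bp tolerance, the two-caller support threshold and the caller labels are hypothetical choices for illustration, not the parameters or method used by this workflow.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class SVCall:
    chrom: str
    start: int
    end: int
    svtype: str  # e.g. "DEL", "DUP", "INV"
    caller: str  # hypothetical label of the detection tool


@dataclass
class MergedSV:
    chrom: str
    start: int  # coordinates of the first (leftmost) call in the cluster
    end: int
    svtype: str
    callers: Set[str] = field(default_factory=set)


def merge_calls(calls: List[SVCall], max_dist: int = 500,
                min_support: int = 2) -> List[MergedSV]:
    """Cluster same-type calls whose start and end breakpoints both lie within
    max_dist of a cluster's representative call, then keep clusters supported
    by at least min_support distinct callers (hypothetical thresholds)."""
    merged: List[MergedSV] = []
    # Sorting groups calls by chromosome and type, then by position.
    for c in sorted(calls, key=lambda x: (x.chrom, x.svtype, x.start, x.end)):
        target = None
        # Scan recent clusters backwards; because of the sort order, once the
        # chromosome/type differs or the start is too far away, no earlier
        # cluster can match either.
        for m in reversed(merged):
            if (m.chrom != c.chrom or m.svtype != c.svtype
                    or c.start - m.start > max_dist):
                break
            if abs(m.end - c.end) <= max_dist:
                target = m
                break
        if target is None:
            merged.append(MergedSV(c.chrom, c.start, c.end, c.svtype, {c.caller}))
        else:
            target.callers.add(c.caller)
    # Support filter: require agreement between independent callers.
    return [m for m in merged if len(m.callers) >= min_support]


if __name__ == "__main__":
    calls = [
        SVCall("chr1", 1000, 5000, "DEL", "callerA"),
        SVCall("chr1", 1100, 5050, "DEL", "callerB"),
        SVCall("chr1", 1050, 4980, "DEL", "callerC"),
        SVCall("chr1", 90000, 91000, "DUP", "callerA"),
        SVCall("chr2", 1000, 5000, "DEL", "callerB"),
    ]
    for sv in merge_calls(calls):
        print(sv.chrom, sv.start, sv.end, sv.svtype, sorted(sv.callers))
```

Requiring support from several independent callers trades some sensitivity for precision, which matches the abstract's stated goal of a high-precision candidate set. A real pipeline would typically also compare SV lengths or reciprocal overlap and would read and write VCF rather than in-memory objects.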
Authors
CAO Shuqi
LIU Shiqi
JIANG Tao
(Faculty of Computing, Harbin Institute of Technology, Harbin 150001, China)
Source
Chinese Journal of Bioinformatics
2021, No. 4, pp. 232-239 (8 pages)
Funding
National Key Research and Development Program of China (No. 2017YFC0907503)
National Natural Science Foundation of China (No. 32000467)
Keywords
Population genomes
Structural variation
Variation detection
Variant integration