Clustering Mitochondrial DNA Sequences That Experienced Tandem Duplication Based on Alignment-free Comparison in Quasipaa boulengeri
Abstract After local tandem duplication and subsequent random loss in an animal mitochondrial genome, the affected region typically carries multiple gene copies, pseudogenes, and numerous insertions and deletions. Such sequences are hypervariable, difficult to align, and difficult to use for gene-tree reconstruction. In theory, alignment-free comparison methods (AFM) can summarize and visually present the relationships and similarities among these sequences, but to our knowledge relevant evaluations and applications are lacking. We evaluated 3 types of commonly used k-mer-based AFM on a system of intraspecific sequence variation in one such region, located around the origin of light-strand replication. From the frog species Quasipaa boulengeri, 19 sequences ranging from 583 bp to 695 bp were clustered using 28 AFM. For each method, substring lengths of k = 4, 6, 8, 10, 12, 14, 16, 18, and 20 bp were tried. Mitochondrial protein-coding sequences of 1,518 bp from the same individuals were used to reconstruct a Maximum Likelihood tree as the reference topology. The Robinson-Foulds distance between the reference and each AFM topology was calculated, and the major topological differences were recorded. Half of the methods, at a specific k value that was typically 8, produced a tree differing from the reference by only 2 nodes (11.8%). Some methods, however, performed poorly at every k value tested. A small k value of 4 was suited to resolving relationships among sequences with relatively high divergence. These findings demonstrate the feasibility of applying AFM to animal mitochondrial tandem duplication regions. The well-performing combinations of method and k value identified here may be applicable to similar systems, and similar evaluations are recommended for other types of duplication and rearrangement systems.
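To make the workflow in the abstract concrete, here is a minimal sketch of its two building blocks: a k-mer-based alignment-free distance between sequences, and a Robinson-Foulds-style distance between tree topologies. The abstract does not name the 28 methods evaluated, so the Euclidean distance on relative k-mer frequencies is only an assumed, commonly used representative and not necessarily one of the authors' methods. The tree comparison is a simplified rooted variant that counts differing clades. All function names, the nested-tuple tree encoding, and the toy inputs are hypothetical.

```python
from collections import Counter
from itertools import combinations
import math


def kmer_counts(seq: str, k: int) -> Counter:
    """Count all overlapping substrings of length k in a sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_distance(a: str, b: str, k: int) -> float:
    """Euclidean distance between relative k-mer frequency profiles.

    Assumption: one simple k-mer-based measure chosen for illustration.
    """
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    na, nb = sum(ca.values()), sum(cb.values())
    if na == 0 or nb == 0:
        raise ValueError("each sequence must be at least k characters long")
    words = set(ca) | set(cb)
    return math.sqrt(sum((ca[w] / na - cb[w] / nb) ** 2 for w in words))


def distance_matrix(seqs: dict, k: int) -> dict:
    """Pairwise distances keyed by (name1, name2), names in sorted order."""
    return {
        (x, y): kmer_distance(seqs[x], seqs[y], k)
        for x, y in combinations(sorted(seqs), 2)
    }


def clades(tree) -> set:
    """Leaf sets of all internal nodes of a rooted tree given as nested tuples."""
    found = set()

    def walk(node):
        if isinstance(node, str):          # a leaf is represented by its name
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    walk(tree)
    return found


def rf_distance(tree1, tree2) -> int:
    """Number of clades present in exactly one of the two rooted trees."""
    return len(clades(tree1) ^ clades(tree2))


if __name__ == "__main__":
    # Hypothetical toy sequences, not data from the study.
    toy = {"s1": "ACGTACGTAC", "s2": "ACGTACGTTC", "s3": "TTGGCCAATT"}
    for pair, d in distance_matrix(toy, k=4).items():
        print(pair, round(d, 3))
    reference = ((("A", "B"), "C"), "D")
    clustered = ((("A", "C"), "B"), "D")
    print(rf_distance(reference, clustered))
```

In a full analysis the distance matrix would be passed to a clustering or tree-building step (for example neighbor joining or UPGMA), the whole procedure repeated for each k, and each resulting tree compared with the reference tree.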
Authors CAO Yue (曹跃); XIA Yun (夏云); ZHENG Yuchi (郑渝池) (Department of Herpetology, Chengdu Institute of Biology, Chinese Academy of Sciences, Chengdu 610041, China; University of Chinese Academy of Sciences, Beijing 100049, China)
Source Sichuan Journal of Zoology (《四川动物》; Peking University Core Journal), 2018, Issue 3, pp. 261-267 (7 pages)
Funding National Natural Science Foundation of China (31372181, 31572243)
Keywords Quasipaa boulengeri; mitochondrial DNA; alignment-free comparison; clustering; duplication region; Robinson-Foulds distance; protein-coding gene; Maximum Likelihood tree