Abstract
Phylogenetic tree construction is an important problem in bioinformatics. Based on the chaos game representation with frequencies (FCGR) of DNA sequences, this paper proposes a measure of the self-repeatability of a base sequence and a measure of the correlation between sequences, and from these derives a new clustering method that groups sequences according to this correlation. Using the method, a phylogenetic tree of 20 species was built from mitochondrial DNA data obtained from GenBank. The experimental results indicate that the proposed measures and clustering method are effective for phylogenetic tree construction. Moreover, because the method works on a graphical (FCGR) representation of base sequences rather than the traditional string representation, it avoids the sequence alignment step during tree building.
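The abstract does not spell out the paper's repeatability and correlation measures, so the following is only a sketch of the general alignment-free pipeline under stated assumptions. It builds an FCGR matrix for each sequence (k-mer frequencies placed on a 2^k by 2^k grid using the classic CGR corners A=(0,0), C=(0,1), G=(1,1), T=(1,0)), uses plain Pearson correlation between FCGR matrices as a stand-in for the paper's own correlation measure, and groups sequences by average-linkage clustering on the distance 1 - r. The function names `fcgr`, `pearson`, and `average_linkage` and the toy sequences are hypothetical illustrations, not the authors' code or data.

```python
import math
from itertools import combinations

# Assumption: classic CGR corner layout (A, C, G, T at the four unit-square corners).
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def fcgr(seq, k):
    """Return a 2^k x 2^k matrix of k-mer frequencies laid out on the CGR grid."""
    n = 2 ** k
    grid = [[0.0] * n for _ in range(n)]
    seq = seq.upper()
    total = 0
    for start in range(len(seq) - k + 1):
        kmer = seq[start:start + k]
        if any(b not in CORNERS for b in kmer):
            continue  # skip k-mers containing N or other ambiguity codes
        x = y = 0
        for j, base in enumerate(kmer):
            cx, cy = CORNERS[base]
            # Later bases get larger weights, mirroring the CGR update p = (p + corner) / 2.
            x += cx << j
            y += cy << j
        grid[y][x] += 1
        total += 1
    if total:
        grid = [[v / total for v in row] for row in grid]
    return grid


def pearson(a, b):
    """Stand-in similarity: Pearson correlation of two flattened FCGR matrices."""
    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def average_linkage(names, sim):
    """Agglomerative clustering on distance 1 - sim; returns a nested-tuple tree."""
    clusters = {(name,): name for name in names}  # member tuple -> subtree

    def dist(c1, c2):
        return sum(1 - sim[a][b] for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: dist(*pair))
        clusters[c1 + c2] = (clusters.pop(c1), clusters.pop(c2))
    return next(iter(clusters.values()))


if __name__ == "__main__":
    # Hypothetical toy sequences, not data from the paper.
    seqs = {
        "toy1": "ACGTTGCAACGTTGCAACGT",
        "toy2": "ACGTTGCAACGTTGCTACGT",
        "toy3": "GGGCCCGGGCCCAAATTTGG",
    }
    mats = {name: fcgr(s, 2) for name, s in seqs.items()}
    sim = {a: {b: pearson(mats[a], mats[b]) for b in seqs} for a in seqs}
    print(average_linkage(list(seqs), sim))
```

Because every sequence is reduced to a fixed-size matrix regardless of its length, sequences are compared directly without first aligning them, which is the property the abstract highlights.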
Source
Application Research of Computers (《计算机应用研究》)
CSCD
Peking University Core Journal (北大核心)
2012, No. 8, pp. 2956-2960 (5 pages)
Funding
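National Natural Science Foundation of China (41076059)
Natural Science Foundation of Fujian Province (2012J05114)
Fujian Provincial Science and Technology Innovation Platform Program (2009J1007)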
Keywords
phylogenetic tree
chaos game representation (CGR)
clustering
sequence analysis
bioinformatics