Abstract
Given a set of DNA or protein sequences, UPGMA may produce non-unique bifurcating phylogenetic trees whose topology depends on the input order of the sequences; these are usually called "tied trees". In this paper, we first point out the reason for the non-uniqueness of UPGMA trees and then present an improved version of UPGMA, UMGMA (Unweighted Multiple Group Method with Arithmetic Mean), to solve the "tied trees" problem. If the UPGMA tree is unique, UMGMA produces the same tree; otherwise, it outputs a unique multifurcating phylogenetic tree that is independent of the input order of the sequences. Furthermore, UMGMA has an adjustable tolerance parameter that controls the main branching structure of the resulting tree, which can be important for outlining the overall structure of large phylogenetic trees.
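The order dependence of tied UPGMA trees, and one way a tolerance-based multi-merge can remove it, can be illustrated with a small sketch. This is not the authors' UMGMA: the abstract does not give the grouping rule, so the code assumes a hypothetical rule in which every cluster pair whose distance lies within `tol` of the current minimum is linked, and each connected group of linked clusters is merged into a single (possibly multifurcating) node. The function name `cluster_tree`, the distance-dictionary input, and the grouping rule are all illustrative assumptions; branch lengths are omitted and only the topology is printed in Newick form.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Optional


def cluster_tree(labels: List[str],
                 dist: Dict[FrozenSet[str], float],
                 tol: Optional[float] = None) -> str:
    """Return a Newick topology (no branch lengths) built by average linkage.

    tol=None: classic UPGMA; a tie is broken by taking the first minimal
              pair in input order, so the result can depend on that order.
    tol>=0:   HYPOTHETICAL multi-merge rule (an assumption, not the paper's
              published method): all pairs within `tol` of the current
              minimum are linked, and each connected group becomes one node.
    """
    # cluster id -> (canonical Newick label, number of leaves)
    clusters = {i: (name, 1) for i, name in enumerate(labels)}
    # distances between cluster ids, keyed by unordered id pairs
    D = {frozenset((i, j)): dist[frozenset((labels[i], labels[j]))]
         for i, j in combinations(range(len(labels)), 2)}
    next_id = len(labels)

    while len(clusters) > 1:
        pairs = list(combinations(sorted(clusters), 2))
        dmin = min(D[frozenset(p)] for p in pairs)
        if tol is None:
            # Classic UPGMA: merge only the first minimal pair found.
            first = next(p for p in pairs if D[frozenset(p)] == dmin)
            groups = [set(first)]
        else:
            # Link every near-minimal pair, then merge connected components.
            groups = []
            for a, b in pairs:
                if D[frozenset((a, b))] <= dmin + tol:
                    hit = [g for g in groups if a in g or b in g]
                    groups = [g for g in groups if g not in hit]
                    groups.append({a, b}.union(*hit))
        for g in groups:
            size = sum(clusters[m][1] for m in g)
            # Sorting child labels makes the printed topology canonical.
            label = "(" + ",".join(sorted(clusters[m][0] for m in g)) + ")"
            # New distance = arithmetic mean over all leaf pairs (UPGMA rule).
            for k in list(clusters):
                if k not in g:
                    D[frozenset((next_id, k))] = sum(
                        clusters[m][1] * D[frozenset((m, k))] for m in g) / size
            for m in g:
                del clusters[m]
            clusters[next_id] = (label, size)
            next_id += 1
    return next(iter(clusters.values()))[0] + ";"


if __name__ == "__main__":
    # d(A,B) = d(B,C) = 2 is a tie at the minimum distance.
    d3 = {frozenset(("A", "B")): 2, frozenset(("B", "C")): 2,
          frozenset(("A", "C")): 4}
    print(cluster_tree(["A", "B", "C"], d3))         # ((A,B),C);
    print(cluster_tree(["C", "B", "A"], d3))         # ((B,C),A);
    print(cluster_tree(["C", "B", "A"], d3, tol=0))  # (A,B,C);
```

With `tol=None` the two input orders give different bifurcating trees, while with `tol=0` both orders give the same trifurcating node. When there are no ties, `tol=0` reproduces the classic UPGMA topology, and raising `tol` collapses more near-equal merges into coarser multifurcations. Note that connected-component linking can chain clusters whose own distance exceeds the threshold (A and C are 4 apart above); the paper's actual grouping criterion may differ.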
Source
Chinese Journal of Bioinformatics (生物信息学)
2007, No. 4, pp. 160-162 (3 pages)
Funding
Supported by the Beijing Natural Science Foundation (Project No. 4052005)
Keywords
bifurcating tree
multifurcating tree
phylogenetic analysis
distance-based method
uniqueness