Abstract
Phylogenomics, the inference of phylogenetic trees using genome-scale data, is becoming the rule for resolving difficult parts of the tree of life. Its promise resides in the large amount of information available, which should eliminate stochastic error. However, systematic error, which is due to limitations of reconstruction methods, is becoming more apparent. We will illustrate, using animal phylogeny as a case study, the three most efficient approaches to avoid the pitfalls of phylogenomics: (1) using a dense taxon sampling, (2) using probabilistic methods with complex models of sequence evolution that more accurately detect multiple substitutions, and (3) removing the fastest evolving part of the data (e.g., species and positions). The analysis of a dataset of 55 animal species and 102 proteins (25712 amino acid positions) shows that standard site-homogeneous model inference is sensitive to long-branch attraction artifact, whereas the site-heterogeneous CAT model is less so. The latter model correctly locates three very fast evolving species, the appendicularian tunicate Oikopleura, the acoel Convoluta and the myxozoan Buddenbrockia. Overall, the resulting tree is in excellent agreement with the new animal phylogeny, confirming that "simple" organisms like platyhelminths and nematodes are not necessarily of basal emergence. This further emphasizes the importance of secondary simplification in animals, and for organismal evolution in general.
Source
Acta Phytotaxonomica Sinica (《植物分类学报》)
CSCD
Peking University Core Journals (北大核心)
2008, No. 3, pp. 274-286 (13 pages)
Keywords
animals
developmental history
sequencing methods
shortcomings
improvement measures
long-branch attraction (LBA) artifact, new animal phylogeny, phylogenomics, random error, systematic error.