Testing congruence in phylogenomic analysis.

Syst Biol. 2008 Feb;57(1):104-15

Authors: Leigh JW, Susko E, Baumgartner M, Roger AJ

Abstract
Phylogenomic analyses of large sets of genes or proteins have the potential to revolutionize our understanding of the tree of life. However, problems arise because phylogenies estimated from individual loci often differ as a result of different histories, systematic bias, or stochastic error. We have developed Concaterpillar, a hierarchical clustering method based on likelihood-ratio testing that identifies congruent loci for phylogenomic analysis. Concaterpillar also includes a test for shared relative evolutionary rates between genes, indicating whether they should be analyzed separately or by concatenation. In simulation studies, the performance of this method is excellent when a multiple comparison correction is applied. We analyzed a phylogenomic data set of 60 translational protein sequences from the major supergroups of eukaryotes and identified three congruent subsets of proteins. Analysis of the largest set indicates improved congruence relative to the full data set and produces a phylogeny with stronger support for five eukaryote supergroups: the Opisthokonts, the Plantae, the stramenopiles + Apicomplexa (chromalveolates), the Amoebozoa, and the Excavata. In contrast, the phylogeny of the second largest set indicates a close relationship between stramenopiles and red algae, to the exclusion of alveolates, suggesting gene transfer from the red algal secondary symbiont to the ancestral stramenopile host nucleus during the origin of their chloroplast. Investigating phylogenomic data sets for conflicting signals has the potential both to improve phylogenetic accuracy and to inform our understanding of genome evolution.

PMID: 18288620 [PubMed - indexed for MEDLINE]