<%@ Page Language="C#" MasterPageFile="~/Main.master" AutoEventWireup="true" Title="Volume 13 Issue 2" %>
Groves of Phylogenetic Trees
C\'{e}cile An\'{e}1, Oliver Eulenstein2, Raul Piaggio-Talice2, and Michael J. Sanderson3
1Departments of Statistics and of Botany, University of Wisconsin-Madison, Madison, WI 53706, USA
2Department of Computer Science, Iowa State University, Ames, IA 50011-1041, USA
{oeulenst, rpiaggio}@cs.iastate.edu
3Department of Ecology and Evolutionary Biology, University of Arizona, Tucson 85721, USA
Annals of Combinatorics 13 (2) pp.139-167 June, 2009
AMS Subject Classification: 05C05, 68R10, 92D15
A major challenge in biological sciences is the reconstruction of the Tree of Life. To this effect, large genomic databases like GenBank and SwissProt are being mined for clusters from which phylogenies can be inferred. Systematists and comparative biologists commonly combine such phylogenies into informative supertrees that reveal information which was not explicitly displayed in any of the original phylogenies. However, whether a supertree is informative depends on particular overlap properties among the clusters from which it originates. In this work we formally introduce the concept of groves --- sets of clusters with the potential to construct informative supertrees. Thus maximal potential candidate clusters for informative supertree construction can be identified in large databases through groves, prior to inferring trees for each cluster. Groves also have the potential to lead to informative supermatrix construction. We developed methods that (i) efficiently identify particular types of groves and (ii) find lower and upper bounds on the minimal number of groves needed to cover all the trees or data sets in a database. Finally, we apply our methods to the green plant sequences from GenBank.
Keywords: supertree, supermatrix, triplets, clustering, evolution


1. A.V. Aho, Y. Sagiv, T.G. Szymanski, and J.D. Ullman, Inferring a tree from lowest common ancestors with an application to the optimization of relational expressions, SIAM J. Comput. 10 (3) (1981) 405–-421.

2. O.R.P. Bininda-Emonds, Phylogenetic Supertrees: Combining Information to Reveal the Tree of Life, Computational Biology, Vol. 4, Kluwer Academic Publishers, Dordrecht, 2004.

3. O.R.P. Bininda-Emonds, J.L. Gittleman, and M. Steel, The (super)tree of life: procedures, problems, and prospects, Annu. Rev. Ecol. Syst. 33 (2002) 265-–289.

4. D.J. Bryant and M. Steel, Extension operations on sets of leaf-labelled trees, Adv. Appl. Math. 16 (4) (1995) 425–-453.

5. J.A. Cotton, C.S.C. Slater, and M. Wilkinson, Discriminating supported and unsupported relationships in supertrees using triplets, Syst. Biol. 55 (2) (2006) 345-–350.

6. T.J. Davies, T.G. Barraclough, M.W. Chase, P.S. Soltis, D.E. Soltis, and V. Savolainen, Darwin's abominable mystery: insights from a supertree of the angiosperms, Proc. Natl. Acad. Sci. USA 101 (2004) 1904–-1909.

7. A.C. Driskell, C. An´e, J.G. Burleigh, M.M. McMahon, B.C. O'Meara, and M.J. Sanderson, Prospects for building the tree of life from large sequence databases, Science 306 (2004) 1172-–1174.

8. L.R. Foulds and R.L. Graham, The Steiner problem in Phylogeny is NP-complete, Adv. Appl. Math. 3 (1982) 43–-49.

9. M. Hall Jr., Combinatorial Theory, John Wiley & Sons, New York, 1986.

10. M. Kennedy and R.D.M. Page, Seabird supertrees: combining partial estimates of procellariiform phylogeny, Auk 119 (1) (2002) 88-–108.

11. F.-G. Liu, M.M. Miyamoto, N.P. Freire, P.Q. Ong, M.R. Tennant, T.S. Young, and K.F. Gugel, Molecular and morphological supertrees for eutherian (plancental) mammals, Science 291 (2001) 1786-–1789.

12. R.D.M. Page, Phyloinformatics: towards a phylogenetic database, In: Data Mining in Bioinformatics, J.T.L.Wang, M.J. Zaki, H.T.T. Toivonen, and D.E. Shasha, Eds., Springer, Heidelberg, (2005) pp. 219-–241.

13. R.D.M. Page, Taxonomy, supertrees, and the Tree of Life, In: Phylogenetic Supertrees: Combining Information to Reveal the Tree of Life, O.R.P. Bininda-Emonds Ed., Kluwer Academic Publishers, Dordrecht, (2004) pp. 247–-265.

14. C.H. Papadimitriou, Computational Complexity, Addison-Wesley, Reading, Massachusetts, 1994.

15. D. Pisani, A.M. Yates, M.C. Langer, and M.J. Benton, A genus-level supertree of the Dinosauria, Proc. R. Soc. Lond. B 269 (2002) 915-–921.

16. M.J. Sanderson, C. An´e, O. Eulenstein, D. Fern´andez-Baca, J. Kim, M.M. McMahon, and R. Piaggio-Talice, Fragmentation of large data sets in phylogenetic analyses, In: Reconstructing Evolution: New Mathematical and Computational Advances, Mike Steel and Olivier Gascuel, Eds., Oxford University Press, Oxford, (2007) pp. 199-–216.

17. M.J. Sanderson and A.C. Driskell, The challenge of constructing large phylogenetic trees, Trends Plant Sci. 8 (2003) 374-–379.

18. M.J. Sanderson, A. Purvis, and C. Henze, Phylogenetic supertrees: assembling the trees of life, Trends Ecol. Evol. 13 (3) (1998) 105–-109.

19. C. Semple and M. Steel, Phylogenetics, Oxford University Press, New York, 2003.

20. M. Steel, The complexity of reconstructing trees from qualitative characters and subtrees, J. Classication 9 (1992) 91–-116.