Annals of Combinatorics 3 (1999) 103-113
Inverting Random Functions
Michael A. Steel and László A. Székely
Biomathematics Research Centre, University of Canterbury,
Private Bag 4800, Christchurch, New Zeal
Department of Mathematics, University of South Carolina, Columbia, SC 29208, USA
Received September 23, 1998
AMS Subject Classification: 92D15, 62C20, 90C46, 90C47
Abstract. In this paper we study how to invert random functions under different criteria. The motivation for this study is phylogeny reconstruction, since the evolution of biomolecular sequences may be considered as a random function from the set of possible phylogenetic trees to the set of collections of biomolecular sequences of observed species. Our results may affect how we think about maximum likelihood estimation (MLE) in phylogeny. For inverting random functions, MLE is optimal under a first criterion, although it is not optimal under a second criterion which is at least equally natural but more conservative. Furthermore, MLE has to be used differently from the way it has been used in the phylogeny literature, if we have a prior distribution on trees and mutation mechanisms and want to keep MLE optimal under the same first criterion. Some of the results of this paper have been known in the setting of statistical decision theory, but have never been discussed in the context of phylogeny.
Keywords: random function, maximum likelihood estimation, minimax problem, phylogeny reconstruction
1. D.J. Aldous, Probability distributions on cladograms, In: Discrete Random Structures, D.J. Aldous and R. Permantle, Eds., IMA Vol. in Mathematics and its Applications, Vol. 76, Springer-Verlag, 1995, pp. 118.
2. A. Ambainis, R. Desper, M. Farach, and S. Kannan, Nearly tight bounds on the learnability of evolution, Proc. of the 38th IEEE Conference on the Foundations of Computer Science (FOCS97), 1997, pp. 524533.
3. J.O. Berger, Statistical Decision Theory and Bayesian Analysis, 2nd Ed., Springer Series in Statistics, Springer-Verlag, 1985.
4. J.K.M. Brown, Probabilities of evolutionary trees, Syst. Biol. 43 (1994) 7891.
5. J.A. Cavender, Taxonomy with confidence, Math. Biosci. 40 (1978) 270280.
6. M. Cryan, L.A. Goldberg, and P.W. Goldberg, Evolutionary trees can be learned in polynomial time in the two-state general Markov model, Proc. 39th Foundations of Computer Science (FOCS98), 1998, pp. 436445.
7. R. Durbin, S. Eddy, A. Krogh, and G. Mitchison, Biological sequence analysis: Probabilistic models of proteins and nucleic acids, Cambridge University Press, Cambridge, 1998.
8. P.L. Erdös, M.A. Steel, L.A. Székely, and T.Warnow, A few logs suffice to build (almost) all trees I, Random Structures and Algorithms 14 (1999) 153184.
9. P.L. Erdös, M.A. Steel, L.A. Székely, and T.Warnow, A few logs suffice to build (almost) all trees II, to appear in Theoretical Computer Science.
10. M. Farach and S. Kannan, Efficient algorithms for inverting evolution, Proceedings of the ACM Symposium on the Foundations of Computer Science, 1996, pp. 230236.
11. J. Felsenstein, Cases in which parsimony or compatibility methods will be positively misleading, Syst. Zool. 27 (1978) 401410.
12. J. Felsenstein, Evolutionary trees from DNA sequences: A maximum likelihood approach, J. Mol. Evol. 17 (1981) 368376.
13. E.F. Harding, The probabilities of rooted tree shapes generated by random bifurcation, Adv. Appl. Probab. 3 (1971) 4477.
14. M. Kimura, Estimation of evolutionary distances between homologous nucleotide sequences, Proc. Nat. Acad. Sci. USA 78 (1981) 454458.
15. M. Marcus and H. Minc, A Survey of Matrix Theory and Matrix Inequalities, Dover Publications Inc., New York, 1992.
16. J. Neyman, Molecular studies of evolution: A source of novel statistical problems, In: Statistical Decision Theory and Related Topics, S.S. Gupta and J. Yackel, Eds., Academic Press, New York, 1971, pp. 127.
17. A. Schrijver, Theory of Linear and Integer Programming, Wiley-Interscience Series in Discrete Mathematics, John Wiley & Sons Ltd., Chichester, 1986.
18. M.E. Siddall, Success of parsimony in the four-taxon case: Long-branch repulsion by likelihood in the Farris Zone, Cladistics 14 (1998) 209220.
19. M. Steel, M.D. Hendy, and D. Penny, Reconstructing trees from nucleotide pattern probabilities: A survey and some new results, Discrete Appl. Math. 88 (1998) 367396.
20. D.L. Swofford, G.J. Olsen, P.J. Waddell, and D.M. Hillis, Chapter 11: Phylogenetic inference, In: Molecular Systematics, D.M. Hillis, C. Moritz, and B.K. Mable, Eds., 2nd Ed., Sinauer Associates, Inc., Sunderland, 1996, pp. 407514.
21. Z. Yang and B. Rannala, A Markov Chain Monte Carlo Method, Mol. Biol. Evol. 14 (1997) 714724.