Annals of Combinatorics 3 (1999) 81-93

The Combinatorics and Extreme Value Statistics of Protein Threading

John L. Spouge¹, Aron Marchler-Bauer, and Stephen Bryant

¹National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD 20894, USA

Received May 26, 1998

AMS Subject Classification: 62E20, 62E25, 92C40

Abstract. In protein threading, one is given a protein sequence, together with a database of protein core structures that may contain the natural structure of the sequence. The object of protein threading is to correctly identify the structure(s) corresponding to the sequence. Since the core structures are already associated with specific biological functions, threading has the potential to provide biologists with useful insights about the function of a newly discovered protein sequence. Statistical tests for threading results based on the theory of extreme values suggest several combinatorial problems. For example, what is the number of ways $m' = \#_t\{L_i > x_i\}_{i=0}^{n}$ of choosing a sequence $\{X_i\}_{i=1}^{n}$ from the set $\{1, 2, \ldots, t\}$, subject to the difference constraints $\{L_i = X_{i+1} - X_i > x_i\}_{i=0}^{n}$, where $X_0 = 0$, $X_{n+1} = t + 1$, and $\{x_i\}_{i=0}^{n}$ is an arbitrary sequence of integers? The quantity $m'$ has many attractive combinatorial interpretations and reduces in special continuous limits to a probabilistic formula discovered by de Finetti. Just as many important probabilities can be derived from de Finetti's formula, many interesting combinatorial quantities can be derived from $m'$. Empirical results presented here show that the combinatorial approach to threading statistics appears promising, but that structural periodicities in proteins and energetically unimportant structure elements probably introduce statistical correlations that must be better understood.
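To make the counting problem in the abstract concrete, the following is a minimal Python sketch, not the paper's method. The function names `count_brute_force` and `count_nonnegative_case` are hypothetical and chosen only for illustration. The first function enumerates every sequence directly from the stated constraints. The second uses the standard stars-and-bars argument, which applies only under the added assumption that every x_i is nonnegative (so that the sequence is strictly increasing and automatically stays inside {1, ..., t}); the general formula for arbitrary integers x_i is the subject of the paper and is not reproduced here.

```python
from itertools import product
from math import comb


def count_brute_force(t, x):
    """Count sequences (X_1, ..., X_n) drawn from {1, ..., t} such that
    X_{i+1} - X_i > x_i for i = 0, ..., n, with X_0 = 0 and X_{n+1} = t + 1.

    `x` holds the n + 1 integers x_0, ..., x_n (any sign).
    Cost grows as t**n, so this is for small checks only.
    """
    n = len(x) - 1
    total = 0
    for middle in product(range(1, t + 1), repeat=n):
        seq = (0, *middle, t + 1)
        if all(seq[i + 1] - seq[i] > x[i] for i in range(n + 1)):
            total += 1
    return total


def count_nonnegative_case(t, x):
    """Stars-and-bars count, valid only when every x_i >= 0 (an assumption
    of this sketch, not a restriction stated in the abstract).

    Writing L_i = x_i + 1 + M_i with M_i >= 0 and using sum(L_i) = t + 1
    gives sum(M_i) = t - sum(x) - n over n + 1 variables, hence
    C(t - sum(x), n) solutions.
    """
    if any(xi < 0 for xi in x):
        raise ValueError("closed form in this sketch assumes all x_i >= 0")
    n = len(x) - 1
    slack = t - sum(x)
    return comb(slack, n) if slack >= n else 0


if __name__ == "__main__":
    t, x = 6, (0, 1, 2)
    print(count_brute_force(t, x), count_nonnegative_case(t, x))
```

In the nonnegative case, dividing C(t - sum(x), n) by the unconstrained count C(t, n) and letting t grow with x_i / t held fixed gives a ratio that tends to (1 - sum(x_i / t))^n, which is the kind of continuous limit the abstract associates with de Finetti's formula. When some x_i are negative, the requirement that each X_i stay within {1, ..., t} becomes binding and the simple binomial count no longer applies; the brute-force function still handles that case.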

Keywords: protein threading, extreme value statistics, Poisson clumping heuristic, probabilities related to uniform distributions
