Annals of Combinatorics 4 (2000) 247-255

Avoided Strings in Bacterial Complete Genomes and a Related Combinatorial Problem

Bailin Hao, Huimin Xie, Zuguo Yu, and Guoyi Chen

Institute of Theoretical Physics, Academia Sinica, P. O. Box 2735, Beijing 100080, China

Received December 12, 1998

AMS Subject Classification: 05A15, 92C40

Abstract. The visualization of avoided and under-represented strings in some bacterial complete genomes raises a combinatorial problem which may be solved either by using the Goulden-Jackson cluster method or by construction of the minimal finite automaton defined by the set of forbidden words of the corresponding language.

Keywords: complete genomes, avoided strings, language, enumeration, fractal


1.  G. Deckert et al., The complete genome of the hyperthermophilic bacterium Aquifex aeolicus, Nature 392 353–358.

2.  I. Goulden and D.M. Jackson, An inversion theorem for cluster decomposition of sequences with distinguished subsequences, J. London Math. Soc. 20 (1979) 567–576.

3.  L.J. Guibas and A.M. Odlyzko, Periods in strings, J. Combin. Theory A 30 (1981) 19–42.

4.  B.-L. Hao, H.-C. Lee, and S.-Y. Zhang, Fractals related to long DNA sequences and complete genomes, Chaos, Solitons and Fractals, 11 (2000) 825–836.

5.  B.-L. Hao and W.-M. Zheng, Applied Symbolic Dynamics and Chaos, World Scientific, Singapore, 1998.

6.  J. Noonan and D. Zeilberger, The Goulden–Jackson cluster method: extensions, applications and implementations; ˜zeilberg.

7.  N.J.A. Sloane and S. Plouffe, The Encyclopedia of Integer Sequences, Academic Press, 1995;

8.  S. Wolfram, Computation theory of cellular automata, Commun. Math. Phys. 96 (1984) 15–57.

9.  Huimin Xie, Grammatical Complexity and One-Dimensional Dynamical Systems, World Scientific, Singapore, 1996.

Get the DVI | PS | PDF file of this abstract.