Discrete Mathematics in Bioinformatics

Bailin Hao
1T-Life Research Center, Fudan University, Shanghai 200433
2Institute of Theoretical Physics, Academia Sinica, Beijing 100080


Abstract    Full Text PDF

Statistical methods are indispensible in studying biological sequences. However, statistical approaches alone are not powerful enough to amplify the difference between DNA or protein sequences and their randomized counterparts. There is an urgent need for more "deterministic" analysis of biological data. Various methods from discrete mathematics may be of great help, especially, when one widens interest to consider biology-inspired problems as well. We will describe three case studies. A problem inferred from counting avoided short strings in bacterial complete genomes may be solved by using the Goulden-Jackson cluster method in combinatorics. An equivalent problem of true and redundant avoided strings may be solved by making use of formal language theory. A problem related to the uniqueness of decomposition and reconstruction of protein sequences has a natural connection with the problem of Eulerian loops in a graph. All these problems come from our own study of real biological data in recent years.