Discrete Mathematics in Bioinformatics
Bailin Hao
1T-Life Research Center, Fudan University, Shanghai 200433
2Institute of Theoretical Physics, Academia Sinica, Beijing 100080
Abstract Full Text PDF
Statistical methods are indispensible in studying biological sequences.
However, statistical approaches alone are not powerful enough to amplify the
difference between DNA or protein sequences and their randomized counterparts.
There is an urgent need for more "deterministic" analysis of biological data.
Various methods from discrete mathematics may be of great help, especially,
when one widens interest to consider biology-inspired problems as well. We will
describe three case studies. A problem inferred from counting avoided short
strings in bacterial complete genomes may be solved by using the
Goulden-Jackson cluster method in combinatorics. An equivalent problem of true
and redundant avoided strings may be solved by making use of formal language
theory. A problem related to the uniqueness of decomposition and reconstruction
of protein sequences has a natural connection with the problem of Eulerian
loops in a graph. All these problems come from our own study of real biological
data in recent years.