A Word Count Statistic in Computational Biology
Michael S. Waterman
University of Southern California
835 W. 37th Street
Los Angeles, CA 90089-1340
USA
Sequence comparison and database searching are among the most frequent
and useful activities in computational biology and bioinformatics. The goal
is to discover relationships between sequences and thus to suggest
biological features previously unknown. As the sizes of biological sequence
databases grow, more efficient comparison methods are required to carry out
the large number of comparisons. The statistic considered in this talk is
based on the number of k-words common to two random sequences. Estimates of
significance use both Poisson and normal approximations to the distribution
of the random variables.
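
As a rough illustration of the kind of count such a statistic is based on, the Python sketch below tallies matching k-words between two sequences. The function names are hypothetical, and counting matches with multiplicity (every pair of positions whose k-words agree) is an assumption, since the abstract does not give the exact definition. The sketch does not attempt the Poisson or normal significance estimates.

```python
from collections import Counter


def kword_counts(seq, k):
    """Count every overlapping k-word (substring of length k) in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def shared_kword_matches(seq_a, seq_b, k):
    """Number of position pairs (i, j) such that the k-word starting at i
    in seq_a equals the k-word starting at j in seq_b.

    Assumption: matches are counted with multiplicity, so a word seen
    3 times in seq_a and 2 times in seq_b contributes 3 * 2 = 6.
    """
    counts_a = kword_counts(seq_a, k)
    counts_b = kword_counts(seq_b, k)
    # Counter returns 0 for words absent from seq_b.
    return sum(n * counts_b[word] for word, n in counts_a.items())
```

For example, `shared_kword_matches("ACGT", "ACGT", 2)` counts the three 2-words AC, CG, and GT once each. Counting words rather than aligning sequences keeps the cost roughly linear in the sequence lengths, which is in line with the abstract's emphasis on efficient comparison for large databases.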