<%@ Page Language="C#" MasterPageFile="~/Main.master" AutoEventWireup="true" Title="Volume8 Issue4" %>
Constructing Hierarchical Set Systems
Claudine Devauchelle1, AndreasW.M. Dress2, Alexander Grossmann1, Stefan Grünewald3 and Alain Henaut1
1Laboratoire Génome et Informatique, Tour Evry2, 523 Place des Terrasses, 91034 Evry Cedex France
devauchelle@genopole.cnrs.fr, alain.henaut@wanadoo.fr
2 Max Planck Institute for Mathematics in the Sciences, Inselstrasse 22-26, 04103 Leipzig Germany
3 Linnaeus Centre for Bioinformatics, BMC, Box 598, SE 75124 Uppsala, Sweden
Annals of Combinatorics 8 (4) p.441-456 December, 2004
AMS Subject Classification: 05C90, 92B10, 92D15, 62G30, 62-07
In this note, it is shown that by applying ranking procedures to data that allow, for any three objects a1, a2, b in a collection X of objects of interest, to make consistent decisions about which of the two objects a1 or a2 is more similar to b, a family of cluster systems A(k)(k = 0, 1, . . .) can be constructed that start with the associated Apresjan Hierarchy and keep growing until, for k = #X - 1, the full set P(X) of all subsets of X is reached. Various ideas regarding canonical modifications of the similarity values so that these cluster systems contain as many clusters as possible for small values of k (and in particular for k := 0 ) and/or are rooted at a specific element in X, possible applications, e.g. concerning (i) the comparison of distinct dissimilarity data defined on the same set X or (ii) diversity optimization, and new tasks arising in ranking statistics are also discussed.
Keywords: set systems, hierarchies, Apresjan hierarchies, similarity, dissimilarity, Farris transform, ranking procedures, ranking statistics, diversity optimization


1. J.D. Apresjan, Algoritmy Postroenija Klassov po Matritse rasstojaiij (Algorithms for Building Classes on the Matrix of Distances), Masinnyj Perevod i Prikladnaja Lingvistika, 9 (1966) 3--18.

2. J.D. Apresjan, Experimental'noe issledovanie semantiki russkogo glagola (Experimental investigation of the Semantics of the Russian Verb), Moscow, Nauka, 1967.

3. S. Böcker and A. Dress, Maximal hierarchies, Adv. Math. 151 (2000) 270--282.

4. D. Bryant and V. Berry, A family of clustering and tree construction methods, Adv. Math. 27 (2001) 705--732.

5. P. Buneman, The recovery of trees from measures of dissimilarity, In: Mathematics in the Archaeological and Historical Sciences, F. Hodson et al. Eds, Edinburgh University Press, 1971, pp. 387--395.

6. C. Darwin, On The Origin of Species by Means of Natural Selection, or The Preservation of Favoured Races in the Struggle for Life, London, John Murray, Albemarle Street, 1859.

7. P. Diaconis and R.L. Graham, Spearman's footrule as a measure of disarray, J. Roy. Statist. Soc. Ser. B 39 (1977).

8. A. Dress, B. Holland, K.T. Huber, J.H. Koolen, V. Moulton, and J.W. Menkhoff, D additive and D ultra-additive maps, Gromov's trees, and the Farris transform, Discrete Appl. Math., to appear.

9. A. Dress, K.T. Huber, and V. Moulton, The Farris transform in phylogenetic analysis, in preparation.

10. K. Florek, J. Lukaszewicz, J. Perkal, H. Steinhaus, and S. Zubrzycki, Sur la liaison et la division des points d'un ensemble fini, Colloq. Math. 2 (1951) 282--285.

11. C. Devauchelle, A. Dress, S. Grünewald, A. Grossmann, A. Henaut, S.M. Zadegh, M. Monnerot, and J.L. Risler, A rank method for classification, in preparation.

12. C. Devauchelle, A. Dress, S. Grünewald, A. Grossmann, A. Henaut, S.M. Zadegh, and M. Monnerot, Testing rank-based classification methods on protein dissimilarity data, in preparation.

13. A. Guénoche, Order distances in tree reconstruction, DIMACS Ser. Discrete Math. Theoret. Comput. Sci. 37 (1997) 171--182.

14. E. Haeckel, Generelle Morphologie der Organismen, Berlin, G. Reimer, 1866.

15. M.G. Kendall, Rank correlation methods, 4th, Ed., Theory and Applications of Rank Orderstatistics, London, Charles Griffin & Co. Ltd. VIII, 1970.

16. J.B. Kruskal, On the shortest spanning subtree of a graph and the traveling salesman problem, Proc. Amer. Math. Soc. 7 (1956) 48--50.

17. B.G. Mirkin and I. Muchnik, Clustering and multidimensional scaling in Russia (1960-- 1990): a review, In: Clustering and Classification, P. Arabie, L.J. Hubert, and G. de Soete, Eds., World Scientific, London, 1996, pp. 295--339.

18. A.F.P. Rhodes and R.M. Needham, A reduction method for non-arithmetic data, and its application to thesauric translation, Information Processing (Proceedings of an International UNESCO Conference on Information Processing), Paris, 1960, pp. 321--325.

19. JQian et al, Extremal combinatorics of k-hierarchies, in preparation.