The multivariate Watson distribution: Maximum-likelihood estimation and other aspects

This paper studies fundamental aspects of modelling data using multivariate Watson distributions. Although these distributions are natural for modelling axially symmetric data (i.e., unit vectors where ±x are equivalent), using them in high dimensions can be difficult, largely because even basic tasks such as maximum-likelihood estimation are numerically challenging for Watson distributions. Some approximations have been derived to tackle these numerical difficulties, but they are either grossly inaccurate in high dimensions [K.V. Mardia, P. Jupp, Directional Statistics, second ed., John Wiley & Sons, 2000] or, when reasonably accurate [A. Bijral, M. Breitenbach, G.Z. Grudic, Mixture of Watson distributions: a generative model for hyperspherical embeddings, in: Artificial Intelligence and Statistics, AISTATS 2007, 2007, pp. 35-42], lack theoretical justification. We derive new approximations to the maximum-likelihood estimates; our approximations are theoretically well-defined, numerically accurate, and easy to compute. Building on our parameter estimation, we discuss mixture-modelling with Watson distributions; here we uncover a hitherto unknown connection to the "diametrical clustering" algorithm of Dhillon et al. [I.S. Dhillon, E.M. Marcotte, U. Roshan, Diametrical clustering for identifying anticorrelated gene clusters, Bioinformatics 19 (13) (2003) 1612-1619].
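To make the numerical difficulty concrete, the following is a minimal sketch of the standard maximum-likelihood condition for the concentration parameter, not the approximations derived in the paper. It assumes the usual parametrisation, with density proportional to exp(kappa (mu^T x)^2) on the unit sphere in p dimensions, for which kappa solves g(1/2, p/2; kappa) = r, where g(a, c; kappa) = M'(a, c, kappa) / M(a, c, kappa) is a ratio of Kummer functions and r = mu^T S mu for the sample scatter matrix S. The function names, the series evaluation, and the bisection bracket are illustrative choices made here, not taken from the source.

```python
import math


def kummer_m(a, c, x, tol=1e-15, max_terms=10000):
    """Kummer function M(a, c, x) from its power series (intended for x >= 0)."""
    term = 1.0
    total = 1.0
    for n in range(max_terms):
        # Ratio of consecutive series terms: (a+n)/(c+n) * x/(n+1).
        term *= (a + n) / (c + n) * x / (n + 1)
        total += term
        if abs(term) <= tol * abs(total):
            break
    return total


def kummer_ratio(a, c, kappa):
    """g(a, c; kappa) = M'(a, c, kappa) / M(a, c, kappa).

    Uses M'(a, c, x) = (a/c) M(a+1, c+1, x). For kappa < 0 the Kummer
    transformation M(a, c, x) = exp(x) M(c-a, c, -x) is applied to both
    numerator and denominator, so only positive-argument series are summed.
    """
    if kappa >= 0:
        return (a / c) * kummer_m(a + 1, c + 1, kappa) / kummer_m(a, c, kappa)
    return (a / c) * kummer_m(c - a, c + 1, -kappa) / kummer_m(c - a, c, -kappa)


def solve_kappa(r, p, lo=-500.0, hi=500.0, iters=200):
    """Solve g(1/2, p/2; kappa) = r for kappa by bisection.

    g is increasing in kappa, so the root is unique when it lies in the
    bracket [lo, hi]; the bracket is an assumption of this sketch.
    """
    a, c = 0.5, p / 2.0

    def f(k):
        return kummer_ratio(a, c, k) - r

    if not (f(lo) < 0.0 < f(hi)):
        raise ValueError("root not bracketed; r too close to 0 or 1")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    p = 10
    r = kummer_ratio(0.5, p / 2.0, 5.0)  # value of r produced by kappa = 5
    print(solve_kappa(r, p))             # should recover a value close to 5
```

Each bisection step needs two series evaluations, and the number of series terms grows with |kappa|, which is one reason closed-form approximations to the root are attractive in high dimensions.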
