Supervised multidimensional scaling for visualization, classification, and bipartite ranking

Least squares multidimensional scaling (MDS) is a classical method for representing an $n \times n$ dissimilarity matrix $D$. One seeks a set of configuration points $z_1, \ldots, z_n \in \mathbb{R}^S$ such that $D$ is well approximated by the Euclidean distances between the configuration points: $D_{ij} \approx \|z_i - z_j\|_2$. Suppose that, in addition to $D$, a vector of associated binary class labels $y \in \{1,2\}^n$ corresponding to the $n$ observations is available. We propose an extension to MDS that incorporates this outcome vector. Our proposal, supervised multidimensional scaling (SMDS), seeks a set of configuration points $z_1, \ldots, z_n \in \mathbb{R}^S$ such that $D_{ij} \approx \|z_i - z_j\|_2$, and such that $z_{is} > z_{js}$ for $s = 1, \ldots, S$ tends to occur when $y_i > y_j$. This results in a new way to visualize the observations. In addition, we show that SMDS leads to a method for the classification of test observations, which can also be interpreted as a solution to the bipartite ranking problem. This method is explored in a simulation study, as well as on a prostate cancer gene expression data set and on a handwritten digits data set.
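
The two requirements described above (distance fit and label-consistent coordinate ordering) can be sketched as a single objective. The following is a minimal illustration only, not the objective used in the paper: the squared-hinge `ordering_penalty`, the weight `alpha`, and all function names are assumptions introduced here to make the idea concrete.

```python
import numpy as np

def pairwise_distances(Z):
    """Euclidean distances ||z_i - z_j||_2 between all rows of Z (n x S)."""
    diff = Z[:, None, :] - Z[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def stress(D, Z):
    """Least squares MDS stress: sum over pairs i < j of (D_ij - ||z_i - z_j||_2)^2."""
    iu = np.triu_indices(len(Z), k=1)
    return float(((D - pairwise_distances(Z))[iu] ** 2).sum())

def ordering_penalty(Z, y):
    """Hypothetical supervision term (an assumption, not the paper's formula).

    For every pair with y_i > y_j, each coordinate s where z_is fails to
    exceed z_js contributes the squared shortfall (z_js - z_is)^2.
    """
    higher = y[:, None] > y[None, :]          # higher[i, j] is True when y_i > y_j
    gap = Z[None, :, :] - Z[:, None, :]       # gap[i, j, s] = z_js - z_is
    violation = np.maximum(gap, 0.0) ** 2     # zero when z_is > z_js
    return float(violation[higher].sum())

def supervised_objective(D, Z, y, alpha=0.5):
    """Illustrative trade-off between distance fit and label ordering."""
    return (1.0 - alpha) * stress(D, Z) + alpha * ordering_penalty(Z, y)

# Three points on a diagonal; D is taken to be their exact distances.
Z = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
D = pairwise_distances(Z)
y_agree = np.array([1, 1, 2])     # the class-2 point has the largest coordinates
y_conflict = np.array([2, 1, 1])  # the class-2 point has the smallest coordinates
print(supervised_objective(D, Z, y_agree))
print(supervised_objective(D, Z, y_conflict))
```

In this sketch, `alpha = 0` reduces to ordinary least squares MDS, while larger values of `alpha` weight the ordering of the configuration points by class label more heavily.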
