Face recognition from caption-based supervision

In this paper, we present methods for face recognition using a collection of images with captions. We consider two tasks: retrieving all faces of a particular person in a data set, and establishing the correct association between the names in the captions and the faces in the images. Both tasks are challenging because of the very large appearance variation in the images, as well as the potential mismatch between images and their captions. For both tasks, we compare generative and discriminative probabilistic models, as well as methods that maximize subgraph densities in similarity graphs. We extend these approaches with different metric learning techniques that yield face representations which reduce intra-person variability and increase inter-person separation. For the retrieval task, we also study the benefit of query expansion. To evaluate performance, we use a new, fully labeled data set of 31,147 faces, which extends the recent Labeled Faces in the Wild data set. Extensive experimental results show that metric learning significantly improves the performance of all approaches on both tasks.
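The abstract mentions methods that maximize subgraph densities in similarity graphs, combined with metric learning that changes how distances between faces are measured. The following is a minimal Python sketch of how these two ideas can fit together, under stated assumptions: faces are given as descriptor vectors, a hypothetical positive semi-definite matrix `M` stands in for a learned Mahalanobis metric, faces closer than a hypothetical `threshold` are linked by an edge, and the faces of the queried person are taken to be the densest subgraph, found with Charikar's greedy peeling approximation. The names `mahalanobis_sq`, `similarity_graph`, and `densest_subgraph` are illustrative only; this is not the paper's exact formulation.

```python
from typing import List, Sequence, Set

Matrix = List[List[float]]


def mahalanobis_sq(x: Sequence[float], y: Sequence[float], M: Sequence[Sequence[float]]) -> float:
    """Squared Mahalanobis distance (x - y)^T M (x - y); M is assumed PSD."""
    d = [a - b for a, b in zip(x, y)]
    return sum(d[i] * M[i][j] * d[j] for i in range(len(d)) for j in range(len(d)))


def similarity_graph(feats: Sequence[Sequence[float]], M: Sequence[Sequence[float]],
                     threshold: float) -> Matrix:
    """Binary similarity graph (an assumption of this sketch): faces i and j are
    linked when their squared distance under M is below the given threshold."""
    n = len(feats)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if mahalanobis_sq(feats[i], feats[j], M) < threshold:
                w[i][j] = w[j][i] = 1.0
    return w


def densest_subgraph(w: Sequence[Sequence[float]]) -> Set[int]:
    """Greedy peeling (Charikar, 2000) on non-negative weights: repeatedly drop
    the node with the smallest weighted degree and keep the subset with the
    highest density, where density = total edge weight inside / node count."""
    n = len(w)
    alive = set(range(n))
    # Weighted degree of each node inside the current subgraph (diagonal ignored).
    degree = [sum(w[i][j] for j in range(n) if j != i) for i in range(n)]
    total = sum(degree) / 2.0  # each edge is counted twice in the degree sum
    best, best_density = set(alive), (total / n if n else 0.0)
    while len(alive) > 1:
        v = min(alive, key=lambda i: degree[i])
        alive.remove(v)
        total -= degree[v]  # drops every edge between v and the remaining nodes
        for u in alive:
            degree[u] -= w[u][v]
        density = total / len(alive)
        if density > best_density:
            best, best_density = set(alive), density
    return best


if __name__ == "__main__":
    # Toy 2-D "descriptors": three near-duplicate faces of the queried person
    # plus two unrelated faces; the identity matrix stands in for a learned M.
    feats = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [3.0, 3.0], [-3.0, 2.0]]
    identity = [[1.0, 0.0], [0.0, 1.0]]
    print(densest_subgraph(similarity_graph(feats, identity, 0.5)))  # -> {0, 1, 2}
```

In this sketch, a better metric matters because it decides which edges exist: a matrix `M` that shrinks intra-person distances and stretches inter-person ones makes the true faces of the person form a denser cluster, which the peeling step then recovers. The linear scan for the minimum-degree node keeps the code short; a priority queue would be the usual choice for large graphs.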
