Hierarchical loss for classification

Failing to distinguish between a sheepdog and a skyscraper should be worse and penalized more than failing to distinguish between a sheepdog and a poodle; after all, sheepdogs and poodles are both breeds of dogs. However, existing metrics of failure (so-called "loss" or "win") used in textual or visual classification/recognition via neural networks seldom view a sheepdog as more similar to a poodle than to a skyscraper. We define a metric that, inter alia, can penalize failure to distinguish between a sheepdog and a skyscraper more than failure to distinguish between a sheepdog and a poodle. Unlike previously employed possibilities, this metric is based on an ultrametric tree associated with any given tree organization into a semantically meaningful hierarchy of a classifier's classes.

[1]  Ali Farhadi,et al.  YOLO9000: Better, Faster, Stronger , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Georgios Paliouras,et al.  Evaluation measures for hierarchical classification: a unified view and novel approaches , 2013, Data Mining and Knowledge Discovery.

[3]  Fei-Fei Li,et al.  Hierarchical semantic indexing for large scale image retrieval , 2011, CVPR 2011.

[4]  Ke Wang,et al.  Building Hierarchical Classifiers Using Class Proximity , 1999, VLDB.

[5]  Motoaki Kawanabe,et al.  Efficient Classification of Images with Taxonomies , 2009, ACCV.

[6]  Georgios Paliouras,et al.  LSHTC: A Benchmark for Large-Scale Text Classification , 2015, ArXiv.

[7]  Alex A. Freitas,et al.  A survey of hierarchical classification across different application domains , 2010, Data Mining and Knowledge Discovery.

[8]  Tomas Mikolov,et al.  Bag of Tricks for Efficient Text Classification , 2016, EACL.

[9]  Guigang Zhang,et al.  Deep Learning , 2016, Int. J. Semantic Comput..

[10]  Yiming Yang,et al.  RCV1: A New Benchmark Collection for Text Categorization Research , 2004, J. Mach. Learn. Res..

[11]  Thomas Hofmann,et al.  Hierarchical document categorization with support vector machines , 2004, CIKM '04.

[12]  Alex A. Freitas,et al.  A review of performance evaluation measures for hierarchical classifiers , 2007 .

[13]  Georgios Paliouras,et al.  Probabilistic Cascading for Large Scale Hierarchical Classification , 2015, ArXiv.

[14]  Jens Lehmann,et al.  DBpedia - A large-scale, multilingual knowledge base extracted from Wikipedia , 2015, Semantic Web.

[15]  Kyoung Mu Lee,et al.  Large margin learning of hierarchical semantic similarity for image classification , 2015, Comput. Vis. Image Underst..

[16]  Xiang Zhang,et al.  Character-level Convolutional Networks for Text Classification , 2015, NIPS.

[17]  Fei-Fei Li,et al.  What Does Classifying More Than 10, 000 Image Categories Tell Us? , 2010, ECCV.