The DeepScoresV2 Dataset and Benchmark for Music Object Detection

In this paper, we present DeepScoresV2, an extended version of the DeepScores dataset for optical music recognition (OMR). We improve upon the original DeepScores dataset by providing much more detailed annotations, namely (a) annotations for 135 classes including fundamental symbols of non-fixed size and shape, increasing the number of annotated symbols by 23%; (b) oriented bounding boxes; (c) higher-level rhythm and pitch information (onset beat for all symbols and line position for noteheads); and (d) a compatibility mode for easy use in conjunction with the MUSCIMA++ dataset for OMR on handwritten documents. These additions open up the potential for future advancement in OMR research. Additionally, we release two state-of-the-art baselines for DeepScoresV2 based on Faster R-CNN and the Deep Watershed Detector. An analysis of the baselines shows that regular orthogonal bounding boxes are unsuitable for objects which are long, small, and potentially rotated, such as ties and beams, which demonstrates the need for detection algorithms that naturally incorporate object angles. The dataset, code and pre-trained models, as well as user instructions, are publicly available at https://zenodo.org/record/4012193.
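The baseline finding about long, thin, rotated symbols can be illustrated with a little geometry. The sketch below is not part of the DeepScoresV2 tooling; the function names and the example dimensions (a beam-like rectangle of 200 x 6 pixels) are assumptions chosen purely for illustration. It computes how much of an enclosing axis-aligned box a rotated rectangle actually covers, whereas an oriented bounding box would cover it exactly.

```python
import math


def oriented_box_corners(cx, cy, w, h, angle_deg):
    """Corners of a w x h rectangle centred at (cx, cy), rotated by angle_deg."""
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    offsets = ((-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2))
    return [
        (cx + dx * cos_a - dy * sin_a, cy + dx * sin_a + dy * cos_a)
        for dx, dy in offsets
    ]


def axis_aligned_box(corners):
    """Smallest orthogonal box (x_min, y_min, x_max, y_max) around the corners."""
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    return min(xs), min(ys), max(xs), max(ys)


def fill_ratio(w, h, angle_deg):
    """Fraction of the enclosing orthogonal box covered by the rotated rectangle."""
    x0, y0, x1, y1 = axis_aligned_box(oriented_box_corners(0.0, 0.0, w, h, angle_deg))
    return (w * h) / ((x1 - x0) * (y1 - y0))


if __name__ == "__main__":
    # Hypothetical beam: 200 px long, 6 px thick, at several slopes.
    for angle in (0, 5, 15, 30):
        print(f"{angle:>2} deg: fill ratio = {fill_ratio(200, 6, angle):.2f}")
```

Under these assumed dimensions, a beam tilted by 15 degrees fills only about a tenth of its orthogonal box, so most of that box is background or neighbouring symbols. An oriented box keeps the fill ratio at 1 regardless of angle.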