Deep Watershed Detector for Music Object Recognition

Optical Music Recognition (OMR) is an important and challenging area within music information retrieval, and the accurate detection of music symbols in digital images is a core functionality of any OMR pipeline. In this paper, we introduce a novel object detection method, called Deep Watershed Detector (DWD), that is based on synthetic energy maps and the watershed transform. Our method is specifically tailored to high-resolution images that contain a large number of very small objects, and it is therefore able to process full pages of written music. We present state-of-the-art detection results for common music symbols and show that DWD works as well on handwritten music as it does on synthetic scores.
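To make the idea of detecting objects from an energy map more concrete, the following is a minimal sketch and not the authors' implementation. It assumes an energy map is already given as a small grid of integers that peak at object centers (in DWD such a map would be predicted by a network), cuts it at a fixed level, and reports the centroid of each connected region above the cut. The function name `detect_centers`, the toy grid, and the threshold values are hypothetical and chosen only for illustration.

```python
from collections import deque


def detect_centers(energy, threshold):
    """Cut the energy map at `threshold` and return one (row, col)
    centroid per 4-connected region at or above the cut."""
    rows, cols = len(energy), len(energy[0])
    seen = [[False] * cols for _ in range(rows)]
    centers = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or energy[r][c] < threshold:
                continue
            # Flood-fill one connected region above the cut.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx]
                            and energy[ny][nx] >= threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # The centroid of the region serves as the detected position.
            cy = sum(p[0] for p in pixels) / len(pixels)
            cx = sum(p[1] for p in pixels) / len(pixels)
            centers.append((cy, cx))
    return centers


# Toy energy map (assumed data): two nearby objects joined by a low-energy ridge.
energy = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 2, 1, 0, 0, 0],
    [0, 2, 3, 2, 0, 1, 0],
    [0, 1, 2, 1, 1, 2, 1],
    [0, 0, 0, 0, 0, 1, 0],
]

centers_high = detect_centers(energy, threshold=2)  # cut above the ridge
centers_low = detect_centers(energy, threshold=1)   # cut through the ridge
print(len(centers_high), len(centers_low))
```

In this toy grid, the higher cut separates the two touching objects while the lower cut merges them into one region, which suggests why an energy-based formulation can be attractive for pages densely packed with very small symbols.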
