Conference papers

Learning Multi-Pitch Estimation From Weakly Aligned Score-Audio Pairs Using a Multi-Label CTC Loss

Christof Weiss (1, 2, 3, *), Geoffroy Peeters (1, 2, 3)
* Corresponding author
3 S2A - Signal, Statistique et Apprentissage
LTCI - Laboratoire Traitement et Communication de l'Information
Abstract : Detecting the simultaneous activity of pitches in music audio recordings is a central task in music processing, commonly known as multi-pitch estimation or frame-wise polyphonic music transcription. Deep-learning approaches have recently achieved major improvements on this task, but the lack of large annotated datasets beyond the solo piano scenario still limits how fully their potential can be exploited. In this paper, we propose a strategy for training a CNN-based multi-pitch estimator on weakly aligned score-audio pairs of pieces in different instrumentations. To this end, we make use of a multi-label variant of the connectionist temporal classification loss (MCTC), recently proposed for image recognition tasks. We re-formalize the MCTC loss so that it is applicable to multi-pitch estimation and perform several systematic experiments to analyze its behavior and its robustness to training conditions. Finally, we report multi-pitch estimation results on common datasets using weakly aligned training with MCTC, which performs similarly to systems trained on strongly aligned scores.
Contributor : Christof Weiss
Submitted on : Monday, September 20, 2021 - 4:19:04 PM
Last modification on : Monday, November 29, 2021 - 5:32:28 PM

HAL Id : hal-03349673, version 1

Christof Weiss, Geoffroy Peeters. Learning Multi-Pitch Estimation From Weakly Aligned Score-Audio Pairs Using a Multi-Label CTC Loss. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), Oct 2021, Mohonk Mountain House, New Paltz, NY, United States. ⟨hal-03349673⟩