Learning Multi-Pitch Estimation From Weakly Aligned Score-Audio Pairs Using a Multi-Label CTC Loss

Christof Weiss; Geoffroy Peeters

Conference paper. Year: 2021

Christof Weiss

Role: Corresponding author
PersonId: 748463
IdHAL: christof-weiss
ORCID: 0000-0003-2143-4679

Institut Polytechnique de Paris

Département Images, Données, Signal

Signal, Statistique et Apprentissage

Geoffroy Peeters

Role: Author
PersonId: 6738
IdHAL: geoffroy-peeters
ORCID: 0000-0001-5255-3019
IdRef: 187470472

Institut Polytechnique de Paris

Département Images, Données, Signal

Signal, Statistique et Apprentissage

Abstract

Detecting the simultaneous activity of pitches in music audio recordings is a central task in music processing, commonly known as multi-pitch estimation or frame-wise polyphonic music transcription. Deep-learning approaches have recently achieved major improvements on this task, but the lack of large annotated datasets beyond the solo piano scenario still limits their potential. In this paper, we propose a strategy for training a CNN-based multi-pitch estimator on weakly aligned score-audio pairs of pieces in different instrumentations. To this end, we use a multi-label variant of the connectionist temporal classification loss (MCTC), recently proposed for image recognition tasks. We re-formalize the MCTC loss so that it is applicable to multi-pitch estimation and perform several systematic experiments to analyze its behavior and its robustness to training conditions. Finally, we report multi-pitch estimation results on common datasets using weakly aligned training with MCTC, which performs similarly to systems trained on strongly aligned scores.
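
To make the distinction between strongly and weakly aligned targets more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that a strongly aligned target is a list of per-frame sets of active MIDI pitches and that a weakly aligned target keeps only the order of distinct pitch sets, without durations. The function name `weak_targets` and the toy data are hypothetical. This toy reduction also merges a chord that is immediately repeated, whereas CTC-style losses use a blank symbol to keep such repetitions apart. The sketch illustrates the data only, not the MCTC loss itself.

```python
from itertools import groupby


def weak_targets(frame_labels):
    """Drop durations: merge runs of identical consecutive frames.

    frame_labels: list of sets of active MIDI pitches, one set per frame
    (hypothetical representation chosen for this illustration).
    Returns the ordered list of distinct pitch sets as sorted lists.
    """
    frozen = (frozenset(frame) for frame in frame_labels)
    return [sorted(key) for key, _ in groupby(frozen)]


# Strongly aligned: every frame is labelled (C-E dyad, then D, silence, C-E again).
strong = [{60, 64}, {60, 64}, {60, 64}, {62}, {62}, set(), {60, 64}]

# Weakly aligned: only the sequence of pitch sets survives, not their timing.
weak = weak_targets(strong)
print(weak)
```

Under these assumptions, a network trained on `weak` must infer by itself where each pitch set starts and ends in the audio, which is the alignment problem that a CTC-type loss marginalizes over.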

Keywords

Music processing; convolutional neural networks; CTC; multi-pitch estimation; music transcription

Domains

Acoustics [physics.class-ph]; Signal and Image Processing [eess.SP]; Music, musicology and performing arts

Contributor: Christof Weiss

https://telecom-paris.hal.science/hal-03349673

Submitted on: Monday, 20 September 2021, 16:19:04

Last modified on: Monday, 9 October 2023, 12:49:43

Dates and versions

hal-03349673, version 1 (20-09-2021)

Identifiers

HAL Id: hal-03349673, version 1

Cite

Christof Weiss, Geoffroy Peeters. Learning Multi-Pitch Estimation From Weakly Aligned Score-Audio Pairs Using a Multi-Label CTC Loss. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), Oct 2021, Mohonk Mountain House, New Paltz, NY, United States. ⟨hal-03349673⟩

Collections

INSTITUT-TELECOM LTCI IDS S2A IP_PARIS
