Comparing Representations for Audio Synthesis Using Generative Adversarial Networks

Gaël Richard; Javier Nistal; Stefan Plattner

doi:10.23919/Eusipco47968.2020.9287799

Conference paper, Year: 2021

Gaël Richard

Role: Author
PersonId : 14146
IdHAL : gael-richard
IdRef : 094977208

Signal, Statistique et Apprentissage

Département Images, Données, Signal

Javier Nistal

Role: Author
PersonId : 741885
IdHAL : javier-nistal

Sony Computer Science Laboratory Paris

Signal, Statistique et Apprentissage

Département Images, Données, Signal

Stefan Plattner

Role: Author
PersonId : 1298446
ORCID : 0000-0001-5185-5437

Sony Computer Science Laboratory Paris

Abstract

In this paper, we compare different audio signal representations, including the raw audio waveform and a variety of time-frequency representations, for the task of audio synthesis with Generative Adversarial Networks (GANs). We conduct the experiments on a subset of the NSynth dataset. The architecture follows the benchmark Progressive Growing Wasserstein GAN. We perform experiments both in a fully non-conditional manner and with the network conditioned on pitch information. We quantitatively evaluate the generated material using standard metrics for assessing generative models, and compare training and sampling times. We show that the complex-valued Short-Time Fourier Transform (STFT), as well as its magnitude and Instantaneous Frequency, achieve the best results and yield fast generation and inversion times. The code for feature extraction, training, and evaluating the model is available online.
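To make the compared representations concrete, the following is a minimal sketch of how a waveform could be turned into the two STFT-based representations named in the abstract: the complex-valued STFT (as real and imaginary channels) and the magnitude plus Instantaneous Frequency (IF). It is not the authors' released code. The function names `stft`, `complex_channels`, and `mag_and_if`, the window and hop sizes, the log scaling, and the IF normalisation are all assumptions chosen for illustration.

```python
import numpy as np


def stft(signal, n_fft=512, hop=128):
    """Hann-windowed STFT (hypothetical helper).

    Returns a complex array of shape (frames, n_fft // 2 + 1).
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop: i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1)


def complex_channels(spec):
    """Complex-valued representation: real and imaginary parts as two channels."""
    return np.stack([spec.real, spec.imag])


def mag_and_if(spec, eps=1e-6):
    """Magnitude + IF representation (assumed scaling, for illustration only)."""
    log_mag = np.log(np.abs(spec) + eps)
    # Unwrap the phase along the time axis, then take the frame-to-frame
    # difference. After unwrapping, each difference lies in [-pi, pi],
    # so dividing by pi scales the IF to [-1, 1].
    phase = np.unwrap(np.angle(spec), axis=0)
    inst_freq = np.diff(phase, axis=0, prepend=phase[:1]) / np.pi
    return log_mag, inst_freq


# Example input: one second of a 440 Hz sine sampled at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
wave = np.sin(2 * np.pi * 440.0 * t)

spec = stft(wave)
cplx = complex_channels(spec)
log_mag, inst_freq = mag_and_if(spec)
```

For a stationary tone such as this sine, the IF at the dominant bin is constant from frame to frame, which gives an intuition for why a network might find IF easier to model than raw phase.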

Keywords

Audio, Representations, Synthesis, Generative, Adversarial

Domains

Engineering Sciences [physics]; Signal and Image Processing [eess.SP]

Main file

2006.09266.pdf (275.38 KB)

Origin: Files produced by the author(s)

Contributor: Javier Nistal

https://hal.science/hal-03073936

Submitted on: Wednesday, December 16, 2020, 12:45:37

Last modified on: Thursday, February 15, 2024, 14:44:23

Long-term archiving on: Wednesday, March 17, 2021, 19:11:06

Dates and versions

hal-03073936 , version 1 (16-12-2020)

Identifiers

HAL Id : hal-03073936 , version 1
DOI : 10.23919/Eusipco47968.2020.9287799

Cite

Gaël Richard, Javier Nistal, Stefan Plattner. Comparing Representations for Audio Synthesis Using Generative Adversarial Networks. 2020 28th European Signal Processing Conference (EUSIPCO), Jan 2021, Amsterdam (Virtual), Netherlands. pp.161-165, ⟨10.23919/Eusipco47968.2020.9287799⟩. ⟨hal-03073936⟩

Collections

INSTITUT-TELECOM LTCI IDS S2A IP_PARIS

127 views

345 downloads
