Comparing Representations for Audio Synthesis Using Generative Adversarial Networks - Equipe Signal, Statistique et Apprentissage
Conference paper, Year: 2021

Comparing Representations for Audio Synthesis Using Generative Adversarial Networks

Abstract

In this paper, we compare different audio signal representations, including the raw audio waveform and a variety of time-frequency representations, for the task of audio synthesis with Generative Adversarial Networks (GANs). We conduct the experiments on a subset of the NSynth dataset. The architecture follows the benchmark Progressive Growing Wasserstein GAN. We perform experiments both in a fully non-conditional manner and with the network conditioned on pitch information. We quantitatively evaluate the generated material using standard metrics for assessing generative models, and compare training and sampling times. We show that the complex-valued Short-Time Fourier Transform, as well as its magnitude and Instantaneous Frequency, achieve the best results and yield fast generation and inversion times. The code for feature extraction, training, and evaluating the model is available online.
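As an illustration of one of the representations named in the abstract, the following is a minimal sketch of how the magnitude and Instantaneous Frequency (IF) of a Short-Time Fourier Transform might be computed. It is not the authors' released code: the function names `stft` and `mag_and_if`, the Hann window, the FFT size (512), the hop size (128), and the scaling of the IF to [-1, 1] are all assumptions chosen for illustration.

```python
import numpy as np


def stft(signal, n_fft=512, hop=128):
    """Complex STFT with a Hann window (illustrative parameters, not the paper's).

    Returns an array of shape (n_frames, n_fft // 2 + 1).
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop: i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1)


def mag_and_if(spec, eps=1e-6):
    """Split a complex STFT into log-magnitude and instantaneous frequency.

    The IF is taken here as the frame-to-frame difference of the phase,
    unwrapped along the time axis and divided by pi (an assumed scaling).
    """
    log_mag = np.log(np.abs(spec) + eps)
    phase = np.unwrap(np.angle(spec), axis=0)  # unwrap along time (frames)
    # Prepending the first frame makes the first difference zero and
    # keeps the output the same shape as the input.
    inst_freq = np.diff(phase, axis=0, prepend=phase[:1]) / np.pi
    return log_mag, inst_freq
```

Applied to a one-second 440 Hz sine sampled at 16 kHz, `mag_and_if(stft(x))` returns two arrays of identical shape (frames by frequency bins), which could be stacked as two channels of a GAN input. Inverting the representation would require integrating the IF back into a phase track, which this sketch does not cover.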
Main file
2006.09266.pdf (275.38 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-03073936, version 1 (16-12-2020)

Cite

Gaël Richard, Javier Nistal, Stefan Plattner. Comparing Representations for Audio Synthesis Using Generative Adversarial Networks. 2020 28th European Signal Processing Conference (EUSIPCO), Jan 2021, Amsterdam (Virtual), Netherlands. pp.161-165, ⟨10.23919/Eusipco47968.2020.9287799⟩. ⟨hal-03073936⟩