
Cauchy Multichannel Speech Enhancement with a Deep Speech Prior

Mathieu Fontaine 1, 2; Aditya Arie Nugraha 3; Roland Badeau 4, 5; Kazuyoshi Yoshii 3, 6; Antoine Liutkus 7
1 MULTISPEECH - Speech Modeling for Facilitating Oral-Based Communication
Inria Nancy - Grand Est, LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
4 S2A - Signal, Statistique et Apprentissage
LTCI - Laboratoire Traitement et Communication de l'Information
7 ZENITH - Scientific Data Management
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier, CRISAM - Inria Sophia Antipolis - Méditerranée
Abstract : We propose a semi-supervised multichannel speech enhancement system based on a probabilistic model which assumes that both speech and noise follow the heavy-tailed multivariate complex Cauchy distribution. As we advocate, this allows handling strongly adverse noise conditions. The model is parameterized by the source magnitude spectrograms and the source spatial scatter matrices. To deal with the non-additivity of scatter matrices, our first contribution is to perform the enhancement in a projected space. Our second contribution is to combine a latent variable model for speech, trained within the variational autoencoder framework, with a low-rank model for the noise source. At test time, an iterative inference algorithm is applied, which produces the estimated parameters used for separation. The speech latent variables are first estimated from the noisy speech and then updated by gradient descent, while a majorization-equalization strategy is used to update both the noise parameters and the spatial parameters of both sources. Our experimental results show that the Cauchy model outperforms state-of-the-art methods. The standard deviations of the scores also reveal that the proposed method is more robust against non-stationary noise.
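To give a feel for what "heavy-tailed" means in the abstract, the sketch below compares closed-form tail probabilities of a Gaussian and a Cauchy variable. It is only an illustration under simplifying assumptions: it uses the scalar, real-valued case rather than the multivariate complex Cauchy model of the paper, and the function names (`gaussian_tail`, `cauchy_tail`, `tail_table`) are hypothetical helpers, not part of the authors' system.

```python
import math


def gaussian_tail(t: float, sigma: float = 1.0) -> float:
    """Return P(|X| > t) for a zero-mean Gaussian X with standard deviation sigma."""
    return math.erfc(t / (sigma * math.sqrt(2.0)))


def cauchy_tail(t: float, gamma: float = 1.0) -> float:
    """Return P(|X| > t) for a zero-centred Cauchy X with scale gamma."""
    return 1.0 - (2.0 / math.pi) * math.atan(t / gamma)


def tail_table(thresholds):
    """Return (threshold, Gaussian tail, Cauchy tail) triples."""
    return [(t, gaussian_tail(t), cauchy_tail(t)) for t in thresholds]


if __name__ == "__main__":
    # Illustrative thresholds, in units of the scale parameter.
    for t, g, c in tail_table([1.0, 3.0, 10.0]):
        print(f"t={t:5.1f}  Gaussian={g:.3e}  Cauchy={c:.3e}")
```

The Cauchy tail decays only polynomially while the Gaussian tail decays faster than exponentially, so large-amplitude outliers such as bursts of non-stationary noise remain plausible under a Cauchy model instead of dominating the parameter estimates.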
Document type : Conference papers

Cited literature [38 references]
Contributor : TelecomParis HAL
Submitted on : Wednesday, October 16, 2019 - 3:54:18 PM
Last modification on : Monday, May 2, 2022 - 8:50:02 AM
Long-term archiving on : Friday, January 17, 2020 - 4:25:31 PM





Mathieu Fontaine, Aditya Arie Nugraha, Roland Badeau, Kazuyoshi Yoshii, Antoine Liutkus. Cauchy Multichannel Speech Enhancement with a Deep Speech Prior. EUSIPCO 2019 - 27th European Signal Processing Conference, Sep 2019, Coruña, Spain. ⟨10.23919/EUSIPCO.2019.8903091⟩. ⟨hal-02288063⟩


