Convolutional neural networks for phoneme recognition - Télécom Paris
Conference paper. Year: 2018

Convolutional neural networks for phoneme recognition

Abstract

This paper presents a novel application of convolutional neural networks to phoneme recognition. The phonetic transcription of the TIMIT speech corpus is used to label spectrogram segments for training the convolutional neural network. A fixed-size window slides over the spectrogram of each TIMIT utterance, and each resulting spectrogram patch is assigned to the appropriate phone class by parsing TIMIT’s phone transcription. The convolutional neural network is the standard GoogLeNet implementation, trained with mini-batch stochastic gradient descent. After training, phonetic rescoring is performed in the usual way to map the TIMIT phone set to the smaller standard set. Benchmark results are presented for comparison with other state-of-the-art approaches. Finally, conclusions and future directions for extending the approach are discussed.
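
The patch-labelling step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names `parse_phn` and `label_patches`, the patch width, the stride, the hop length, and the rule of labelling a patch by the phone covering its centre frame are all assumptions, since the abstract does not specify them. The only detail taken from TIMIT itself is that a `.phn` transcription lists one `start_sample end_sample phone` triple per line.

```python
import numpy as np


def parse_phn(text):
    """Parse TIMIT .phn content: one 'start_sample end_sample phone' per line."""
    segments = []
    for line in text.strip().splitlines():
        start, end, phone = line.split()
        segments.append((int(start), int(end), phone))
    return segments


def label_patches(spectrogram, segments, hop_length, patch_width, stride):
    """Slide a fixed-width window over a (freq_bins, frames) spectrogram.

    Assumption: each patch takes the label of the phone segment that
    contains the patch's centre frame, and frame i starts at audio
    sample i * hop_length. Patches with no covering segment are skipped.
    """
    n_frames = spectrogram.shape[1]
    patches, labels = [], []
    for first in range(0, n_frames - patch_width + 1, stride):
        centre_sample = (first + patch_width // 2) * hop_length
        phone = next(
            (p for s, e, p in segments if s <= centre_sample < e), None
        )
        if phone is None:
            continue
        patches.append(spectrogram[:, first:first + patch_width])
        labels.append(phone)
    return np.stack(patches), labels


# Toy utterance: three phones of 3200 samples each (hypothetical values).
phn_text = "0 3200 h#\n3200 6400 sh\n6400 9600 iy\n"
# Stand-in spectrogram: 128 frequency bins, 60 frames at a hop of 160 samples.
spec = np.random.rand(128, 60)
patches, labels = label_patches(
    spec, parse_phn(phn_text), hop_length=160, patch_width=10, stride=10
)
print(patches.shape, labels)
```

The resulting `patches` array and `labels` list would then be the image/class pairs fed to the network. A real pipeline would also need to map each phone string to an integer class index and to decide how to treat patches that straddle a phone boundary.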

Dates and versions

hal-02287732, version 1 (13-09-2019)

Cite

Cornelius Glackin, Julie Wall, Gérard Chollet, Nazim Dugan, Nigel Cannings. Convolutional neural networks for phoneme recognition. 7th International Conference on Pattern Recognition Applications and Methods, Jan 2018, Funchal, Portugal. pp.190-195, ⟨10.5220/0006653001900195⟩. ⟨hal-02287732⟩