
TIMIT and NTIMIT Phone Recognition Using Convolutional Neural Networks

Cornelius Glackin, Julie Wall, Gérard Chollet (1, 2), Nazim Dugan, Nigel Cannings
1 MM - Multimédia
LTCI - Laboratoire Traitement et Communication de l'Information
Abstract : This paper presents a novel application of convolutional neural networks to phone recognition, using both the TIMIT and NTIMIT speech corpora. The phonetic transcriptions of these corpora were used to label spectrogram segments for training the network. A sliding window extracted fixed-size images from the spectrograms produced for the TIMIT and NTIMIT utterances, and each image was assigned to the appropriate phone class by parsing the corresponding phone transcription. The GoogLeNet convolutional neural network was implemented and trained using stochastic gradient descent with mini-batches. After training, phonetic rescoring was performed to map the full phone set to the smaller standard set, i.e. the 61-phone set was mapped to the 39-phone set. Benchmark results on both datasets are presented for comparison with other state-of-the-art approaches. The results show that this convolutional neural network approach is particularly well suited to speech data affected by network noise and distortion, as demonstrated by the state-of-the-art benchmark results for NTIMIT.
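The sliding-window labelling and the phone-set folding described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The segment format (frame indices rather than the sample indices used in TIMIT transcription files), the choice to label each window by the phone at its centre frame, the function names, and the window width and step are all assumptions made for illustration. The folding table is only a small subset of the commonly used 61-to-39 mapping.

```python
from typing import Dict, List, Optional, Tuple

# (start_frame, end_frame, phone), end exclusive. Hypothetical format:
# real transcriptions would first be converted from samples to frames.
Segment = Tuple[int, int, str]
Spectrogram = List[List[float]]  # rows are frequency bins, columns are frames

# Illustrative subset of the commonly used 61-to-39 folding, not the full table.
FOLD_SUBSET: Dict[str, str] = {
    "ao": "aa", "ax": "ah", "ix": "ih", "el": "l", "en": "n",
    "zh": "sh", "ux": "uw", "pcl": "sil", "tcl": "sil", "h#": "sil",
}


def phone_at(frame: int, segments: List[Segment]) -> Optional[str]:
    """Return the phone whose segment contains the frame, or None."""
    for start, end, phone in segments:
        if start <= frame < end:
            return phone
    return None


def sliding_windows(
    spectrogram: Spectrogram, segments: List[Segment], width: int, step: int
) -> List[Tuple[Spectrogram, str]]:
    """Cut fixed-width patches and label each by its centre frame (assumption)."""
    n_frames = len(spectrogram[0])
    examples = []
    for left in range(0, n_frames - width + 1, step):
        label = phone_at(left + width // 2, segments)
        if label is None:
            continue  # centre frame falls outside every transcribed segment
        patch = [row[left:left + width] for row in spectrogram]
        examples.append((patch, label))
    return examples


def fold(phone: str) -> str:
    """Map a phone to its folded class; phones not in the subset are unchanged."""
    return FOLD_SUBSET.get(phone, phone)


# Toy input: 2 frequency bins, 10 frames, two transcribed segments.
spec = [[float(f) for f in range(10)], [float(10 * f) for f in range(10)]]
segs = [(0, 5, "h#"), (5, 10, "ix")]
examples = sliding_windows(spec, segs, width=4, step=2)
labels = [label for _, label in examples]
folded = [fold(label) for label in labels]
```

In this sketch the folding is applied to labels after the fact, which mirrors the abstract's description of rescoring after training: the classifier would be trained on the larger phone set and its predictions folded to the smaller one for scoring.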
Contributor : TelecomParis HAL
Submitted on : Friday, September 13, 2019 - 5:34:32 PM
Last modification on : Wednesday, November 3, 2021 - 6:18:55 AM


  • HAL Id : hal-02287997, version 1


Cornelius Glackin, Julie Wall, Gérard Chollet, Nazim Dugan, Nigel Cannings. TIMIT and NTIMIT Phone Recognition Using Convolutional Neural Networks. Pattern Recognition Applications and Methods, Springer, pp.89-100, 2019. ⟨hal-02287997⟩


