TIMIT and NTIMIT Phone Recognition Using Convolutional Neural Networks
Book Sections, Year: 2019

Authors: Cornelius Glackin, Julie Wall, Nazim Dugan, Nigel Cannings


This paper presents a novel application of convolutional neural networks to phone recognition, evaluated on both the TIMIT and NTIMIT speech corpora. The phonetic transcriptions of these corpora were used to label spectrogram segments for training the convolutional neural network. A sliding window extracted fixed-size images from the spectrograms produced for the TIMIT and NTIMIT utterances, and each image was assigned to the appropriate phone class by parsing the corresponding phone transcription. The GoogLeNet convolutional neural network was implemented and trained using stochastic gradient descent with mini-batches. After training, phonetic rescoring was performed to map the 61-phone set to the smaller standard 39-phone set. Benchmark results on both datasets are presented for comparison with other state-of-the-art approaches. The results show that this convolutional neural network approach is particularly well suited to speech degraded by network noise and distortion, as demonstrated by the state-of-the-art benchmark results for NTIMIT.
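The labelling and rescoring steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes TIMIT-style .PHN transcriptions (one "start end phone" triple per line, in sample indices, with contiguous segments), labels each window by the phone found at the window centre (the abstract does not state the assignment rule), and uses only an excerpt of the commonly used 61-to-39 folding. The names parse_phn, label_windows, fold_phone and the width/hop values are hypothetical. Windows are expressed in sample indices; a real pipeline would crop the matching spectrogram columns.

```python
from bisect import bisect_right

# Partial excerpt of the commonly used 61-to-39 phone folding.
# Assumption: the paper uses this standard folding; the abstract does not list it.
FOLD = {
    "ao": "aa", "ax": "ah", "ax-h": "ah", "axr": "er", "hv": "hh",
    "ix": "ih", "el": "l", "em": "m", "en": "n", "nx": "n",
    "eng": "ng", "zh": "sh", "ux": "uw",
    "pcl": "sil", "tcl": "sil", "kcl": "sil", "bcl": "sil",
    "dcl": "sil", "gcl": "sil", "h#": "sil", "pau": "sil", "epi": "sil",
}


def parse_phn(text):
    """Parse TIMIT-style .PHN text into (start, end, phone) tuples."""
    segments = []
    for line in text.strip().splitlines():
        start, end, phone = line.split()
        segments.append((int(start), int(end), phone))
    return segments


def label_windows(segments, width, hop):
    """Slide a fixed-width window over the utterance.

    Each window gets the phone whose segment contains the window centre.
    Assumes segments are sorted and contiguous.
    """
    starts = [start for start, _, _ in segments]
    first, last = segments[0][0], segments[-1][1]
    labelled = []
    for left in range(first, last - width + 1, hop):
        centre = left + width // 2
        idx = bisect_right(starts, centre) - 1  # segment covering the centre
        labelled.append((left, left + width, segments[idx][2]))
    return labelled


def fold_phone(phone):
    """Map a 61-set phone to its folded class; unlisted phones pass through."""
    return FOLD.get(phone, phone)


if __name__ == "__main__":
    example = "0 2000 h#\n2000 3500 sh\n3500 5000 ix\n5000 6000 tcl"
    for left, right, phone in label_windows(parse_phn(example), 1000, 1000):
        print(left, right, phone, fold_phone(phone))
```

Folding is applied here only when scoring, which matches the abstract's description of rescoring after training: the classifier would still be trained on the full phone set.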

Dates and versions

HAL Id: hal-02287997, version 1 (13-09-2019)



Cornelius Glackin, Julie Wall, Gérard Chollet, Nazim Dugan, Nigel Cannings. TIMIT and NTIMIT Phone Recognition Using Convolutional Neural Networks. Pattern Recognition Applications and Methods, Springer, pp. 89-100, 2019. ⟨hal-02287997⟩