
Prosodic Boundary Prediction Model for Vietnamese Text-To-Speech

Nguyen Thi Thu Trang, Nguyen Hoang Ky, Albert Rilliard (1), Christophe d'Alessandro (2)
(1) TLP - Traitement du Langage Parlé
LISN - Laboratoire Interdisciplinaire des Sciences du Numérique, STL - Sciences et Technologies des Langues
(2) IJLRDA-LAM - Lutheries - Acoustique - Musique
DALEMBERT - Institut Jean le Rond d'Alembert
Abstract: This research aims to build a prosodic boundary prediction model to improve the naturalness of Vietnamese speech synthesis. The model can be used directly to predict prosodic boundaries in the synthesis phase of statistical parametric or end-to-end speech synthesis systems. Besides conventional features related to Part-Of-Speech (POS), this paper proposes two efficient features for predicting prosodic boundaries, syntactic blocks and syntactic links, based on a thorough analysis of a Vietnamese dataset. Syntactic blocks are syntactic phrases whose sizes are bounded in their constituent syntactic tree. The syntactic link between two adjacent words is calculated from the distance between them in the syntax tree. Experimental results show that the two proposed predictors improve the quality of a boundary prediction model built with a decision tree classification algorithm: its F1 score is about 36.4% higher than that of the model with only POS features. The final boundary prediction model, which combines POS, syntactic block, and syntactic link features with the LightGBM algorithm, gives the best F1 score, 87.0% on the test data. The proposed model improves TTS systems built with HMM-based, DNN-based, or end-to-end speech synthesis techniques by about 0.3 MOS points (i.e. 6 to 10%) compared to the same systems without it.
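The abstract does not give exact formulas for the two syntactic features, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes (1) the syntactic link of two adjacent words is the number of edges on the path between them through their lowest common ancestor in a constituency tree, and (2) syntactic blocks are obtained by walking the tree top-down and keeping the largest constituents that cover at most `max_size` words. The function names, the nested-tuple tree format, the `max_size` parameter, and the hand-built example sentence "Tôi đọc sách mới" ("I read new books") with its tags are all hypothetical and do not come from the paper's dataset.

```python
from typing import List, Tuple, Union

# Hypothetical tree format: a node is (label, [children]); leaves are word strings.
Tree = Union[str, Tuple[str, list]]


def words(tree: Tree) -> List[str]:
    """Words covered by a node, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [w for child in tree[1] for w in words(child)]


def leaf_paths(tree: Tree, prefix: Tuple[int, ...] = ()) -> List[Tuple[int, ...]]:
    """Child-index path from the root to every word, left to right."""
    if isinstance(tree, str):
        return [prefix]
    paths: List[Tuple[int, ...]] = []
    for i, child in enumerate(tree[1]):
        paths.extend(leaf_paths(child, prefix + (i,)))
    return paths


def tree_distance(p: Tuple[int, ...], q: Tuple[int, ...]) -> int:
    """Edges from leaf p up to the lowest common ancestor and back down to leaf q."""
    common = 0
    for a, b in zip(p, q):
        if a != b:
            break
        common += 1
    return (len(p) - common) + (len(q) - common)


def syntactic_links(tree: Tree) -> List[int]:
    """Assumed syntactic link: tree distance of each adjacent word pair."""
    paths = leaf_paths(tree)
    return [tree_distance(paths[i], paths[i + 1]) for i in range(len(paths) - 1)]


def syntactic_blocks(tree: Tree, max_size: int) -> List[List[str]]:
    """Assumed syntactic blocks: largest constituents covering <= max_size words."""
    covered = words(tree)
    if isinstance(tree, str) or len(covered) <= max_size:
        return [covered]
    return [blk for child in tree[1] for blk in syntactic_blocks(child, max_size)]


def boundary_features(tree: Tree, max_size: int = 3) -> List[Tuple[int, int]]:
    """Per word gap i (between word i and i+1): (link distance, 1 if word i ends a block)."""
    links = syntactic_links(tree)
    block_ends = set()
    pos = 0
    for blk in syntactic_blocks(tree, max_size):
        pos += len(blk)
        block_ends.add(pos - 1)
    return [(links[i], int(i in block_ends)) for i in range(len(links))]


# Hand-built example tree (hypothetical tags: P pronoun, V verb, N noun, A adjective).
sent = ("S", [("NP", [("P", ["Tôi"])]),
              ("VP", [("V", ["đọc"]),
                      ("NP", [("N", ["sách"]), ("A", ["mới"])])])])

print(syntactic_links(sent))      # [6, 5, 4]
print(syntactic_blocks(sent, 3))  # [['Tôi'], ['đọc', 'sách', 'mới']]
print(boundary_features(sent, 3)) # [(6, 1), (5, 0), (4, 0)]
```

In this sketch each gap between words becomes one training row; POS features would be appended to these tuples before fitting a classifier such as a decision tree or LightGBM, as the abstract describes. How the paper actually bounds block sizes or normalizes link distances is not stated here, so both choices above should be treated as placeholders.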
Contributor: Christophe d'Alessandro
Submitted on: Monday, August 30, 2021 - 4:01:55 PM
Last modification on: Wednesday, March 16, 2022 - 3:43:13 AM


Publisher files allowed on an open archive



Nguyen Thi Thu Trang, Nguyen Hoang Ky, Albert Rilliard, Christophe d'Alessandro. Prosodic Boundary Prediction Model for Vietnamese Text-To-Speech. Interspeech 2021, Aug 2021, Brno, Czech Republic. pp.3885-3889, ⟨10.21437/interspeech.2021-125⟩. ⟨hal-03329116⟩


