A. M. Kruspe, Do You Understand What Is Required in a Doctoral Dissertation or Thesis?, Writing your Doctoral Dissertation or Thesis Faster: A Proven Map to Success, pp.2-21

A. Mesaros, Singing voice identification and lyrics transcription for music information retrieval (invited paper), 2013 7th Conference on Speech Technology and Human-Computer Dialogue (SpeD), 2013.

H. Fujihara, M. Goto, J. Ogata, and H. G. Okuno, LyricSynchronizer: Automatic Synchronization System Between Musical Audio Signals and Lyrics, IEEE Journal of Selected Topics in Signal Processing, vol.5, issue.6, pp.1252-1261, 2011.

A. Kruspe, Automatic B**** Detection, Proc. International Society for Music Information Retrieval Conference, pp.3-4, 2016.

C. W. Wightman and D. T. Talkin, The Aligner: Text-to-Speech Alignment Using Markov Models, Progress in Speech Synthesis, pp.313-323, 1997.

A. Haubold and J. R. Kender, Alignment of Speech to Highly Imperfect Text Transcriptions, Multimedia and Expo, 2007 IEEE International Conference on, pp.224-227, 2007.

B. Sharma, C. Gupta, H. Li, and Y. Wang, Automatic Lyrics-to-audio Alignment on Polyphonic Music Using Singing-adapted Acoustic Models, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.396-400, 2019.

ASPRSN's 21st National Convention, October 8-12, 1995, Radisson Hotel, Montreal, Canada, Plastic Surgical Nursing, vol.15, issue.4, pp.220-221, 1995.

D. Stoller, S. Durand, and S. Ewert, End-to-end Lyrics Alignment for Polyphonic Music Using an Audio-to-character Recognition Model, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019.

C. Gupta, E. Yilmaz, and H. Li, Automatic Lyrics Alignment and Transcription in Polyphonic Music: Does Background Music Help?, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020.

E. E. Davies and A. Bentahila, Translation and Code Switching in the Lyrics of Bilingual Popular Songs, The Translator, vol.14, issue.2, pp.247-272, 2008.

S. Watanabe, T. Hori, and J. R. Hershey, Language independent end-to-end architecture for joint language identification and speech recognition, 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 2017.

J. Cho, M. K. Baskar, R. Li, M. Wiesner, S. H. Mallidi et al., Multilingual Sequence-to-Sequence Speech Recognition: Architecture, Transfer Learning, and Language Modeling, 2018 IEEE Spoken Language Technology Workshop (SLT), pp.521-527, 2018.

A. Mesaros and T. Virtanen, Automatic Recognition of Lyrics in Singing, EURASIP Journal on Audio, Speech, and Music Processing, vol.2010, issue.1, p.546047, 2010.

M. Mauch, H. Fujihara, and M. Goto, Integrating Additional Chord Information Into HMM-Based Lyrics-to-Audio Alignment, IEEE Transactions on Audio, Speech, and Language Processing, vol.20, issue.1, pp.200-210, 2012.

G. Dzhambazov and X. Serra, Modeling of phoneme durations for alignment between polyphonic audio and lyrics, Proc. of the 12th International Conference in Sound and Music Computing, pp.281-286, 2015.

G. Dzhambazov and A. Srinivasamurthy, On the Use of Note Onsets for Improved Lyrics-To-Audio Alignment in Turkish Makam Music, Proc. 17th International Society for Music Information Retrieval Conference (ISMIR), pp.716-722, 2016.

G. Meseguer-Brocal, A. Cohen-Hadria, and G. Peeters, DALI: a Large Dataset of Synchronized Audio, Lyrics and Notes, Automatically Created Using Teacher-Student Machine Learning Paradigm, Proc. International Society for Music Information Retrieval Conference (ISMIR), 2018.
URL : https://hal.archives-ouvertes.fr/hal-02019115

R. Hennequin, A. Khlif, F. Voituret, and M. Moussallam, Spleeter: a fast and efficient music source separation tool with pre-trained models, Journal of Open Source Software, vol.5, issue.50, p.2154, 2020.

S. Toshniwal, T. N. Sainath, R. J. Weiss, B. Li, P. Moreno et al., Multilingual Speech Recognition with a Single End-to-End Model, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.4904-4908, 2018.

M. Müller, S. Stüker, and A. Waibel, Language Adaptive Multilingual CTC Speech Recognition, Speech and Computer, pp.473-482, 2017.

K. Li, J. Li, G. Ye, R. Zhao, and Y. Gong, Towards Code-switching ASR for End-to-end CTC Models, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.6076-6080, 2019.

H. Sak, F. de Chaumont Quitry, T. Sainath, and K. Rao, Acoustic modelling with CD-CTC-sMBR LSTM RNNs, Proc. IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), pp.604-609, 2015.

A. Graves, S. Fernández, F. Gomez, and J. Schmidhuber, Connectionist temporal classification, Proceedings of the 23rd international conference on Machine learning - ICML '06, vol.148, pp.369-376, 2006.

T. Schultz and A. Waibel, Language-independent and language-adaptive acoustic modeling for speech recognition, Speech Communication, vol.35, issue.1-2, pp.31-51, 2001.

A. Vaglio, R. Hennequin, M. Moussallam, G. Richard, and F. d'Alché-Buc, Audio-Based Detection of Explicit Content in Music, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.526-530, 2020.
URL : https://hal.archives-ouvertes.fr/hal-02747449

G. D. Forney, The Viterbi algorithm, Proceedings of the IEEE, vol.61, issue.3, pp.268-278, 1973.

A. Hannun, Sequence Modeling with CTC, Distill, vol.2, issue.11, p.8, 2017.

A. Flexer, A closer look on artist filters for musical genre classification, Proc. of the 8th International Conference on Music Information Retrieval, pp.16-17, 2007.

T. Alumäe, S. Tsakalidis, and R. Schwartz, Improved Multilingual Training of Stacked Neural Network Acoustic Models for Low Resource Languages, Interspeech 2016, pp.3883-3887, 2016.

G. Dzhambazov, Knowledge-Based Probabilistic Modeling For Tracking Lyrics In Music Audio Signals, 2017.

J. K. Hansen, Recognition of Phonemes in A-cappella Recordings using Temporal Patterns and Mel Frequency Cepstral Coefficients, Proc. 9th Sound and Music Computing Conference, pp.494-499, 2012.
