
Leveraging lyrics from audio for MIR

Abstract : Lyrics carry much of the semantic content of songs. Such information could help users navigate large song collections and could support recommending new music to them. However, lyrics are often unavailable in textual form. To get around this problem, singing voice recognition systems can be used to obtain transcripts directly from the audio. These systems are generally adapted from speech recognition. Speech transcription is a decades-old field that has recently advanced significantly thanks to developments in machine learning. When applied to the singing voice, however, these algorithms give poor results, and lyrics transcription remains difficult for a number of reasons. In this thesis, we investigate several scientifically and industrially difficult Music Information Retrieval (MIR) problems using lyrics information extracted directly from audio. The emphasis is on making the approaches as relevant as possible in real-world settings. This entails testing them on large and diverse datasets and investigating their scalability. To do so, a large publicly available annotated lyrics dataset is used, and several state-of-the-art lyrics recognition algorithms are successfully adapted. Notably, we present, for the first time, a system that detects explicit content directly from audio. The first research on building a multilingual lyrics-to-audio system is also described. The lyrics-to-audio alignment task is further studied in two experiments quantifying the perception of audio and lyrics synchronization. A novel phonotactic method for language identification is also presented. Finally, we provide the first cover song detection algorithm that makes explicit use of lyrics information extracted from audio.
Contributor : ABES STAR
Submitted on : Friday, February 4, 2022 - 5:12:30 PM
Last modification on : Tuesday, August 16, 2022 - 11:25:10 AM
Long-term archiving on : Thursday, May 5, 2022 - 7:42:25 PM


Version validated by the jury (STAR)


  • HAL Id : tel-03558515, version 1



Andrea Vaglio. Leveraging lyrics from audio for MIR. Signal and Image processing. Institut Polytechnique de Paris, 2021. English. ⟨NNT : 2021IPPAT027⟩. ⟨tel-03558515⟩


