Book sections

ZeroBERTo: Leveraging Zero-Shot Text Classification by Topic Modeling

Abstract : Traditional text classification approaches often require a good amount of labeled data, which is difficult to obtain, especially in restricted domains or less widespread languages. This lack of labeled data has led to the rise of low-resource methods, which assume low data availability in natural language processing. Among them, zero-shot learning stands out: it consists of learning a classifier without any previously labeled data. The best results reported with this approach use language models such as Transformers, but they suffer from two problems: high execution time and an inability to handle long texts as input. This paper proposes a new model, ZeroBERTo, which leverages an unsupervised clustering step to obtain a compressed data representation before the classification task. We show that ZeroBERTo performs better on long inputs and runs in less time, outperforming XLM-R by about 12% in F1 score on the FolhaUOL dataset.
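
The idea of clustering first and classifying afterwards can be illustrated with a small sketch. This is not the authors' implementation. It assumes a greedy word-overlap clustering in place of a real topic model, and a hypothetical keyword-based `toy_scorer` in place of a Transformer-based zero-shot model. All function names, the `HINTS` table, and the example documents are invented for illustration. The point it shows is that the scorer is called once per topic and label, not once per document and label, and that it only sees a short list of topic words, never the full text.

```python
from collections import Counter

def tokens(text):
    """Lowercased words longer than 3 characters (a crude stop-word filter)."""
    words = (w.strip(".,;:!?") for w in text.lower().split())
    return [w for w in words if len(w) > 3]

def cluster(docs, min_shared=2):
    """Greedy unsupervised clustering (an assumption, not a real topic model):
    a document joins the first cluster sharing at least `min_shared` words."""
    clusters = []  # each item: (vocabulary set, list of document indices)
    for i, doc in enumerate(docs):
        words = set(tokens(doc))
        for vocab, members in clusters:
            if len(words & vocab) >= min_shared:
                vocab.update(words)
                members.append(i)
                break
        else:
            clusters.append((words, [i]))
    return clusters

def topic_words(docs, members, k=3):
    """Compressed representation of a cluster: its k most frequent words."""
    counts = Counter()
    for i in members:
        counts.update(tokens(docs[i]))
    return [w for w, _ in counts.most_common(k)]

def classify(docs, labels, scorer, min_shared=2, k=3):
    """Score each topic against each label, then give every document
    the best label of the topic it belongs to."""
    predictions = [None] * len(docs)
    for _, members in cluster(docs, min_shared):
        topic = topic_words(docs, members, k)
        best = max(labels, key=lambda label: scorer(topic, label))
        for i in members:
            predictions[i] = best
    return predictions

# Hypothetical stand-in for a zero-shot model: counts keyword overlap.
HINTS = {
    "sports": {"football", "goal", "match", "team"},
    "economy": {"bank", "rates", "market", "interest"},
}
calls = 0  # number of scorer invocations

def toy_scorer(topic, label):
    global calls
    calls += 1
    return len(set(topic) & HINTS[label])

docs = [
    "The team scored a late goal in the football match",
    "Football fans celebrated the goal",
    "The central bank raised interest rates",
    "Interest rates affect the stock market",
]
predictions = classify(docs, ["sports", "economy"], toy_scorer)
print(predictions, calls)
```

In this toy run the four documents fall into two clusters, so the scorer is invoked four times (two topics times two labels), whereas scoring every document directly would take eight calls. With a real model as the scorer, the saving grows with the number of documents per topic.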

https://hal.telecom-paris.fr/hal-03628242
Contributor : Thomas Palmeira Ferraz
Submitted on : Saturday, June 4, 2022 - 10:57:46 PM
Last modification on : Saturday, June 25, 2022 - 3:14:29 AM

File

preprint_zeroberto.pdf
Files produced by the author(s)

Citation

Alexandre Alcoforado, Thomas Palmeira Ferraz, Rodrigo Gerber, Enzo Bustos, André Seidel Oliveira, et al. ZeroBERTo: Leveraging Zero-Shot Text Classification by Topic Modeling. Vládia Pinheiro; Pablo Gamallo; Raquel Amaro; Carolina Scarton; Fernando Batista; Diego Silva; Catarina Magro; Hugo Pinto. Computational Processing of the Portuguese Language. 15th International Conference, PROPOR 2022, Fortaleza, Brazil, March 21–23, 2022, Proceedings, 13208, Springer International Publishing, pp. 125-136, 2022, Lecture Notes in Computer Science, 978-3-030-98304-8. ⟨10.1007/978-3-030-98305-5_12⟩. ⟨hal-03628242⟩