Measuring the Quality of Semantic Data Augmentation for Sarcasm Detection

Alif Tri Handoyo; Aurélien Diot; Hidayaturrahman Hidayaturrahman; Derwin Suhartono; Bart Lamiroy

Journal article, International Journal of Intelligent Engineering and Systems, Year: 2023


Alif Tri Handoyo

Role: Author

Bina Nusantara University [Jakarta]

Aurélien Diot

Role: Author

Centre de Recherche en Sciences et Technologies de l'Information et de la Communication - EA 3804

Hidayaturrahman Hidayaturrahman

Role: Author

Bina Nusantara University [Jakarta]

Derwin Suhartono

Role: Author

Bina Nusantara University [Jakarta]

Bart Lamiroy

Role: Author
PersonId: 1298
IdHAL: bart-lamiroy
ORCID: 0000-0003-0871-0149
IdRef: 111726980

Centre de Recherche en Sciences et Technologies de l'Information et de la Communication - EA 3804

Abstract

Sarcasm is a form of figurative speech in which the intended meaning of a sentence differs from its literal meaning. Sarcastic expressions tend to confuse automatic NLP approaches in many application domains, which makes detecting them important. One of the challenges for machine learning approaches to sarcasm detection is the difficulty of acquiring ground-truth annotations. As a result, human-annotated datasets usually contain only a few thousand texts and are often unbalanced. In this paper, we propose two data augmentation pipelines to generate more sarcastic data. The first, SMERT-BERT, is a modified SMERTI pipeline that uses RoBERTa as the language model for the text infilling module. The second, SWORD (semantic text exchange by Word-Attribution), modifies the masking module of the SMERTI pipeline to use word-attribution values. Both approaches are combined with the SLOR (syntactic log-odds ratio) metric to filter the generated sarcastic data and keep only the sentences with the best scores. Our experiments show that the SLOR filter makes a significant positive contribution to the augmentation process. In particular, we achieve the best results with the SMERT-BERT pipeline combined with a SLOR filter, which improves the F-measure by 4.00% on the iSarcasm dataset compared to the baseline models.
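SLOR is commonly defined as the language-model log-probability of a sentence minus its unigram log-probability, normalized by sentence length: SLOR(S) = (ln p_M(S) - ln p_u(S)) / |S|, where p_u(S) is the product of the unigram probabilities of the tokens in S. The Python sketch below illustrates how such a score could rank augmented candidates and keep only the best ones. It is a minimal sketch under stated assumptions, not the authors' implementation: the sentence log-probability `lm_logprob` is assumed to come from some external language model (for a masked model such as RoBERTa, a pseudo-log-likelihood would be one possible choice), the unigram model uses add-one smoothing over a toy corpus, and the names `unigram_logprobs`, `slor`, `slor_filter` and the top-k selection rule are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# A candidate is (tokens, sentence log-probability from some language model).
Candidate = Tuple[List[str], float]


def unigram_logprobs(corpus: Sequence[Sequence[str]]) -> Dict[str, float]:
    """Add-one-smoothed unigram log-probabilities estimated from a tokenized corpus."""
    counts = Counter(tok for sent in corpus for tok in sent)
    total = sum(counts.values())
    denom = total + len(counts) + 1  # +1 reserves probability mass for unseen tokens
    logp = {tok: math.log((c + 1) / denom) for tok, c in counts.items()}
    logp["<unk>"] = math.log(1 / denom)
    return logp


def slor(tokens: Sequence[str], lm_logprob: float, uni: Dict[str, float]) -> float:
    """SLOR(S) = (log p_M(S) - log p_u(S)) / |S|, with p_u the product of unigram probabilities."""
    if not tokens:
        raise ValueError("SLOR is undefined for an empty sentence")
    uni_logprob = sum(uni.get(tok, uni["<unk>"]) for tok in tokens)
    return (lm_logprob - uni_logprob) / len(tokens)


def slor_filter(candidates: Sequence[Candidate], uni: Dict[str, float], k: int) -> List[List[str]]:
    """Keep the k candidates with the highest SLOR score (hypothetical top-k rule)."""
    scored = [(slor(toks, lp, uni), toks) for toks, lp in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [toks for _, toks in scored[:k]]


if __name__ == "__main__":
    # Toy corpus and made-up LM log-probabilities, for illustration only.
    corpus = [["oh", "great", "another", "monday"], ["great", "job"], ["another", "day"]]
    uni = unigram_logprobs(corpus)
    candidates = [
        (["monday", "great", "oh", "another"], -20.0),  # scrambled: lower LM score
        (["oh", "great", "another", "monday"], -9.0),
    ]
    print(slor_filter(candidates, uni, k=1))  # [['oh', 'great', 'another', 'monday']]
```

Subtracting the unigram term and dividing by length keeps the score from simply favoring short sentences or sentences made of frequent words. In the toy example both candidates contain the same tokens, so their unigram terms cancel out and the candidate the language model finds more fluent ranks first.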

Keywords

BERT, Data augmentation, Sarcasm detection, SLOR, SMERTI

Domains

Computer Science [cs]

Main file

IntJIntelligentEngSys_2023.pdf (828.98 KB)

Origin: Publisher files authorized for deposit in an open archive
License: CC BY-NC-SA (Attribution - NonCommercial - ShareAlike)


https://hal.science/hal-04194530

Submitted on: Friday, 29 March 2024, 16:56:29

Last modified on: Friday, 19 April 2024, 13:44:22

Dates and versions

hal-04194530, version 1 (29-03-2024)

License

Attribution - NonCommercial - ShareAlike

Identifiers

HAL Id: hal-04194530, version 1

Cite

Alif Tri Handoyo, Aurélien Diot, Hidayaturrahman Hidayaturrahman, Derwin Suhartono, Bart Lamiroy. Measuring the Quality of Semantic Data Augmentation for Sarcasm Detection. International Journal of Intelligent Engineering and Systems, 2023, 6 (5), pp.79-91. ⟨hal-04194530⟩


Collections

URCA CRESTIC
