To Be or Not To Be a Verbal Multiword Expression: A Quest for Discriminating Features - Laboratoire LI, BDTLN team
Preprint, Working Paper. Year: 2020

To Be or Not To Be a Verbal Multiword Expression: A Quest for Discriminating Features

Abstract

Automatic identification of multiword expressions (MWEs) is a prerequisite for semantically-oriented downstream applications. This task is challenging because MWEs, especially verbal ones (VMWEs), exhibit surface variability. However, this variability is usually more restricted than in regular (non-VMWE) constructions, which leads to various variability profiles. We use this fact to determine the optimal set of features which could be used in a supervised classification setting to solve a subproblem of VMWE identification: the identification of occurrences of previously seen VMWEs. Surprisingly, a simple custom frequency-based feature selection method proves more efficient than other standard methods such as the chi-squared test, information gain or decision trees. An SVM classifier using the optimal set of only 6 features outperforms the best systems from a recent shared task on the French seen data.
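As a rough illustration of one of the standard baselines named in the abstract, the sketch below ranks binary features by the Pearson chi-squared statistic against a binary VMWE / non-VMWE label. It is not the custom frequency-based method of the paper, which the abstract does not describe. The feature names (`same_word_order`, `adjacent_components`, `sentence_is_long`), the helper functions, and the eight toy candidates are all hypothetical and invented for this example.

```python
from typing import Dict, List, Sequence


def chi2_binary(feature: Sequence[int], labels: Sequence[int]) -> float:
    """Pearson chi-squared statistic of a 2x2 table (binary feature vs. binary label)."""
    n = len(labels)
    # observed[f][y] = number of candidates with feature value f and label y
    observed = [[0, 0], [0, 0]]
    for f, y in zip(feature, labels):
        observed[f][y] += 1
    row = [sum(observed[0]), sum(observed[1])]
    col = [observed[0][0] + observed[1][0], observed[0][1] + observed[1][1]]
    stat = 0.0
    for f in (0, 1):
        for y in (0, 1):
            expected = row[f] * col[y] / n
            if expected > 0:  # skip empty margins to avoid division by zero
                stat += (observed[f][y] - expected) ** 2 / expected
    return stat


def rank_features(columns: Dict[str, List[int]], labels: List[int], k: int) -> List[str]:
    """Return the k feature names with the highest chi-squared score."""
    scores = {name: chi2_binary(values, labels) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy data (assumption): 1 = candidate is a true VMWE occurrence, 0 = it is not.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
columns = {
    "same_word_order": [1, 1, 1, 1, 0, 0, 0, 0],      # perfectly associated with the label
    "adjacent_components": [1, 1, 1, 0, 1, 0, 0, 0],  # weakly associated
    "sentence_is_long": [1, 0, 1, 0, 1, 0, 1, 0],     # independent of the label
}

top = rank_features(columns, labels, k=2)
print(top)
```

On this toy table the perfectly associated feature scores highest and the independent one scores zero, so only the first two features are kept. In a setting like the one the abstract outlines, the retained features would then feed a supervised classifier such as an SVM.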
Main file
Arxiv.pdf (347.6 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-02905874, version 1 (23-07-2020)

Identifiers

  • HAL Id: hal-02905874, version 1

Cite

Caroline Pasquer, Agata Savary, Jean-Yves Antoine, Carlos Ramisch, Nicolas Labroche, et al. To Be or Not To Be a Verbal Multiword Expression: A Quest for Discriminating Features. 2020. ⟨hal-02905874⟩
