Conference paper, Year: 2020

Data Augmentation via Subtree Swapping for Dependency Parsing of Low-Resource Languages

Abstract

The lack of annotated data is a major obstacle to building reliable NLP systems for most of the world's languages. This problem can be alleviated by automatic data generation. In this paper, we present a new data augmentation method for artificially creating dependency-annotated sentences. The main idea is to swap subtrees between annotated sentences while enforcing strong constraints on those trees, so that the new sentences are as grammatical as possible. We also propose a method for running low-resource experiments with resource-rich languages: we mimic a low-resource language by sampling sentences under a low-resource distribution. In a series of experiments, we show that our data augmentation method outperforms previous proposals that use the same basic inputs.
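The subtree-swapping idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Token` class, the `subtree` and `replace_subtree` helpers, and the two constraints used here (matching UPOS tag and dependency relation on the subtree roots, and contiguous subtree spans) are assumptions chosen for illustration. The abstract only says that "strong constraints" are enforced, without listing them.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Token:
    form: str
    upos: str
    head: int    # 1-based index of the head token; 0 marks the sentence root
    deprel: str


def subtree(sent: List[Token], root: int) -> List[int]:
    """Return the sorted 1-based indices of `root` and all its descendants."""
    nodes = {root}
    grew = True
    while grew:
        grew = False
        for i, tok in enumerate(sent, start=1):
            if tok.head in nodes and i not in nodes:
                nodes.add(i)
                grew = True
    return sorted(nodes)


def replace_subtree(target: List[Token], t_root: int,
                    donor: List[Token], d_root: int) -> Optional[List[Token]]:
    """Copy `target`, replacing the subtree at `t_root` with the donor subtree
    at `d_root`. Return None if the (assumed) constraints are not met."""
    t_tok, d_tok = target[t_root - 1], donor[d_root - 1]
    # Assumed constraint 1: both subtree roots share UPOS tag and relation.
    if (t_tok.upos, t_tok.deprel) != (d_tok.upos, d_tok.deprel):
        return None
    t_span, d_span = subtree(target, t_root), subtree(donor, d_root)
    # Assumed constraint 2: both subtrees cover a contiguous span of tokens.
    for span in (t_span, d_span):
        if span[-1] - span[0] + 1 != len(span):
            return None
    start, end = t_span[0], t_span[-1]
    shift = len(d_span) - len(t_span)

    def remap(h: int) -> int:
        # Heads left of the replaced span (and the root marker 0) keep their
        # index; heads right of it move by the change in length.
        return h if h < start else h + shift

    new: List[Token] = []
    for i, tok in enumerate(target, start=1):
        if i == start:
            for j in d_span:
                d = donor[j - 1]
                if j == d_root:
                    head = remap(t_tok.head)  # attach where the old root was
                else:
                    head = start + (d.head - d_span[0])
                new.append(Token(d.form, d.upos, head, d.deprel))
        if start <= i <= end:
            continue  # drop the tokens of the replaced subtree
        new.append(Token(tok.form, tok.upos, remap(tok.head), tok.deprel))
    return new


a = [Token("The", "DET", 2, "det"), Token("cat", "NOUN", 3, "nsubj"),
     Token("sleeps", "VERB", 0, "root")]
b = [Token("A", "DET", 3, "det"), Token("big", "ADJ", 3, "amod"),
     Token("dog", "NOUN", 4, "nsubj"), Token("barks", "VERB", 0, "root"),
     Token("loudly", "ADV", 4, "advmod")]

# Swapping the two subject subtrees yields two new annotated sentences.
new_a = replace_subtree(a, 2, b, 3)
new_b = replace_subtree(b, 3, a, 2)
print(" ".join(t.form for t in new_a))
print(" ".join(t.form for t in new_b))
```

In this toy example the two subject subtrees are exchanged, and head indices are recomputed so that each new sentence remains a valid dependency tree. A stricter variant could also compare morphological features before allowing a swap.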
Main file: Data Augmentation via Subtree Swapping.pdf (205.61 KB)
Origin: Publisher files allowed on an open archive

Dates and versions

hal-03256711, version 1 (10-06-2021)

Cite

Mathieu Dehouck, Carlos Gómez-Rodríguez. Data Augmentation via Subtree Swapping for Dependency Parsing of Low-Resource Languages. 28th International Conference on Computational Linguistics, Dec 2020, Barcelona, Spain. pp.3818-3830, ⟨10.18653/v1/2020.coling-main.339⟩. ⟨hal-03256711⟩