Conference papers

Data Augmentation via Subtree Swapping for Dependency Parsing of Low-Resource Languages

Abstract: The lack of annotated data is a major obstacle to building reliable NLP systems for most of the world's languages, but it can be alleviated by automatic data generation. In this paper, we present a new data augmentation method for artificially creating new dependency-annotated sentences. The main idea is to swap subtrees between annotated sentences while enforcing strong constraints on those trees, so that the new sentences are as grammatical as possible. We also propose a method for running low-resource experiments with resource-rich languages: a low-resource language is mimicked by sampling sentences under a low-resource distribution. In a series of experiments, we show that the proposed data augmentation method outperforms previous proposals that use the same basic inputs.
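To make the main idea concrete, the following is a minimal Python sketch of grafting a subtree from one annotated sentence into another. It is not the authors' implementation: the `Token` class, the `graft` function, and the specific constraints used here (each subtree must be a contiguous span, and the two subtree roots must share the same part-of-speech tag and dependency relation) are assumptions chosen for illustration. The abstract only states that strong constraints are enforced, without listing them.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Token:
    form: str
    upos: str    # part-of-speech tag
    head: int    # 1-based index of the head token, 0 for the sentence root
    deprel: str  # dependency relation to the head


def subtree_ids(sent: List[Token], root: int) -> List[int]:
    """Sorted 1-based indices of `root` and all of its descendants."""
    ids = {root}
    changed = True
    while changed:
        changed = False
        for i, tok in enumerate(sent, start=1):
            if tok.head in ids and i not in ids:
                ids.add(i)
                changed = True
    return sorted(ids)


def graft(target: List[Token], t_root: int,
          donor: List[Token], d_root: int) -> Optional[List[Token]]:
    """Replace the subtree rooted at t_root in `target` with the subtree
    rooted at d_root in `donor`. Returns None if the (assumed) constraints fail."""
    t_ids, d_ids = subtree_ids(target, t_root), subtree_ids(donor, d_root)
    # Assumed constraint 1: both subtrees are contiguous spans.
    for ids in (t_ids, d_ids):
        if ids[-1] - ids[0] + 1 != len(ids):
            return None
    # Assumed constraint 2: the roots agree on POS tag and relation.
    t_tok, d_tok = target[t_root - 1], donor[d_root - 1]
    if (t_tok.upos, t_tok.deprel) != (d_tok.upos, d_tok.deprel):
        return None

    t_start, t_end, d_start = t_ids[0], t_ids[-1], d_ids[0]
    shift = len(d_ids) - len(t_ids)  # length change caused by the graft

    def remap(head: int) -> int:
        # Heads located after the replaced span move by `shift`.
        return head + shift if head > t_end else head

    attach = remap(t_tok.head)  # where the grafted root attaches
    out: List[Token] = []
    for i, tok in enumerate(target, start=1):
        if i == t_start:
            for j in d_ids:
                d = donor[j - 1]
                head = attach if j == d_root else t_start + (d.head - d_start)
                out.append(Token(d.form, d.upos, head, d.deprel))
        if i < t_start or i > t_end:
            out.append(Token(tok.form, tok.upos, remap(tok.head), tok.deprel))
    return out


s1 = [Token("The", "DET", 2, "det"), Token("cat", "NOUN", 3, "nsubj"),
      Token("sleeps", "VERB", 0, "root")]
s2 = [Token("A", "DET", 3, "det"), Token("big", "ADJ", 3, "amod"),
      Token("dog", "NOUN", 4, "nsubj"), Token("barks", "VERB", 0, "root"),
      Token("loudly", "ADV", 4, "advmod")]

new_sentence = graft(s1, 2, s2, 3)
print(" ".join(t.form for t in new_sentence))  # A big dog sleeps
```

Calling `graft` in both directions (here `graft(s2, 3, s1, 2)` as well) yields the two sentences of a swap. Returning `None` on a failed check keeps rejected candidates easy to filter out; stricter or different constraints would slot into the same place.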

https://hal.archives-ouvertes.fr/hal-03256711
Contributor: Mathieu Dehouck
Submitted on: Thursday, June 10, 2021 - 2:05:14 PM
Last modification on: Tuesday, October 19, 2021 - 10:58:23 AM
Long-term archiving on: Saturday, September 11, 2021 - 6:53:56 PM


Citation

Mathieu Dehouck, Carlos Gómez-Rodríguez. Data Augmentation via Subtree Swapping for Dependency Parsing of Low-Resource Languages. 28th International Conference on Computational Linguistics, Dec 2020, Barcelona, Spain. pp.3818-3830, ⟨10.18653/v1/2020.coling-main.339⟩. ⟨hal-03256711⟩
