Theses

Bootstrap methods for multi-task dependency parsing in low-resource conditions

Abstract : Dependency parsing is an essential component of several NLP applications owing to its ability to capture complex relational information in a sentence. Owing to the wide availability of dependency treebanks, most dependency parsing systems are built using supervised learning techniques. These systems require a significant amount of annotated data and are thus targeted toward specific languages for which this type of data is available. Unfortunately, producing sufficient annotated data for low-resource languages is time- and resource-consuming. To address this issue, the present study investigates three bootstrapping methods: (1) multi-lingual transfer learning, (2) deep contextualized embedding, and (3) co-training. Multi-lingual transfer learning is a typical supervised learning approach that transfers dependency knowledge using multi-lingual training data based on multi-lingual lexical representations. Deep contextualized embedding maximizes the use of lexical features during supervised learning based on enhanced sub-word representations and a language model (LM). Lastly, co-training is a semi-supervised learning method that improves parsing accuracy using unlabeled data. Our approaches have the advantage of requiring only a small bilingual dictionary or easily obtainable unlabeled resources (e.g., Wikipedia) to improve parsing accuracy in low-resource conditions. We evaluated our parser on 57 official CoNLL shared task languages as well as on Komi, a language for which we developed training and evaluation corpora for low-resource scenarios. The evaluation results demonstrated outstanding performance of our approaches in both low- and high-resource dependency parsing in the 2017 and 2018 CoNLL shared tasks. We also conducted a survey of both model transfer learning and semi-supervised methods for low-resource dependency parsing, in which the effect of each method under different conditions was extensively investigated.
Document type : Theses

https://tel.archives-ouvertes.fr/tel-03477961
Contributor : Abes Star
Submitted on : Monday, December 13, 2021 - 4:36:07 PM
Last modification on : Monday, January 10, 2022 - 5:30:15 PM

File

Lim_2020_These.pdf
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-03477961, version 1

Citation

Kyungtae Lim. Bootstrap methods for multi-task dependency parsing in low-resource conditions. Linguistics. Université Paris sciences et lettres, 2020. English. ⟨NNT : 2020UPSLE027⟩. ⟨tel-03477961⟩
