The Corpus for Idiolectal Research (CIDRE) - Langues, textes, traitement informatique, cognition
Conference poster. Year: 2021

The Corpus for Idiolectal Research (CIDRE)

Abstract

It is well known that an idiolect (the language of an individual) evolves over time. However, quantitative studies of this evolution remain scarce because large single-author corpora are lacking (but see Barlow 2013; Mollin 2009; Petré et al. 2019 for a few examples). To study what is specific to an idiolect and how it evolves over a lifetime, we collected, cleaned and dated the works of fiction of 11 highly prolific French writers of the 19th and early 20th centuries. The resulting CIDRE corpus contains 37 million words and more than 400 books.
Main file
POSTER_SEPTEMBRE2021.pdf (819.47 KB)
Origin: files produced by the author(s)

Dates and versions

hal-03353520, version 1 (24 September 2021)

Identifiers

  • HAL Id: hal-03353520, version 1

Cite

Olga Seminck, Philippe Gambette, Dominique Legallois, Thierry Poibeau. The Corpus for Idiolectal Research (CIDRE). European Association of Digital Humanities Conference (EADH 2021), Sep 2021, Krasnoyarsk, Russia. ⟨hal-03353520⟩
