Poster communications

The Corpus for Idiolectal Research (CIDRE)

Abstract : It is well known that an idiolect (the language of an individual) evolves over time. However, quantitative studies on this topic remain scarce because large corpora are lacking (but see Barlow 2013; Mollin 2009; Petré et al. 2019 for a few examples). To study what is specific to an idiolect and how it evolves over a lifetime, we assembled, cleaned and dated the fiction works of 11 very prolific 19th- and early 20th-century French writers. The resulting CIDRE corpus comprises over 400 books totalling 37 million words.

https://hal.archives-ouvertes.fr/hal-03353520
Contributor : Olga Seminck
Submitted on : Friday, September 24, 2021 - 10:19:36 AM
Last modification on : Friday, October 22, 2021 - 11:41:42 AM

File

POSTER_SEPTEMBRE2021.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-03353520, version 1

Citation

Olga Seminck, Philippe Gambette, Dominique Legallois, Thierry Poibeau. The Corpus for Idiolectal Research (CIDRE). European Association of Digital Humanities Conference (EADH 2021), Sep 2021, Krasnoyarsk, Russia. ⟨hal-03353520⟩
