Poster communications

The Corpus for Idiolectal Research (CIDRE)

Abstract : It is well known that an idiolect (the language of an individual) evolves over time. However, quantitative studies on this topic are scarce because large corpora are lacking (but see Barlow 2013; Mollin 2009; Petré et al. 2019 for a few examples). To study what is specific to an idiolect and how it evolves over a lifetime, we assembled, cleaned and dated the works of fiction of 11 very prolific 19th- and early 20th-century French writers. The resulting CIDRE corpus contains 37 million words and over 400 books.
Contributor : Olga Seminck
Submitted on : Friday, September 24, 2021 - 10:19:36 AM
Last modification on : Friday, October 22, 2021 - 11:41:42 AM




  • HAL Id : hal-03353520, version 1


Olga Seminck, Philippe Gambette, Dominique Legallois, Thierry Poibeau. The Corpus for Idiolectal Research (CIDRE). European Association of Digital Humanities Conference (EADH 2021), Sep 2021, Krasnoyarsk, Russia. ⟨hal-03353520⟩


