Journal articles

The Corpus for Idiolectal Research (CIDRE)

Abstract: The Corpus for Idiolectal Research (CIDRE) is a collection of works of fiction by 11 prolific 19th-century French authors (4 women, 7 men; 22 to 62 works per author; 37 million words in total). Every work is dated with the year in which it was written. The works were gathered by script from open-source platforms, such as La Bibliothèque électronique du Québec, and stripped of paratext (text that is not part of the novel itself, e.g. prefaces). We distribute the text files, the dating, other metadata, and the scripts under an open-source license. CIDRE is the first French-language resource for studying style and idiolect diachronically (i.e. stylochronometry) on a larger scale.

https://hal.archives-ouvertes.fr/hal-03310451
Contributor: Olga Seminck
Submitted on: Friday, July 30, 2021 - 2:24:09 PM
Last modification on: Tuesday, October 19, 2021 - 10:58:41 AM

File

JOHD_Cidre.pdf
Publisher files allowed on an open archive

Citation

Olga Seminck, Philippe Gambette, Dominique Legallois, Thierry Poibeau. The Corpus for Idiolectal Research (CIDRE). Journal of Open Humanities Data, Ubiquity Press, 2021, 7, pp.15. ⟨10.5334/johd.42⟩. ⟨hal-03310451⟩
