Gabay, Simon; Clérice, Thibault
The 17th c. is crucial for the French language, as it sees the creation of a strict orthographic norm that largely persists to this day. Despite its significance, the history of spelling systems remains an overlooked area in linguistics, for two reasons. On the one hand, spelling is made up of microchanges, which require a quantitative appro...
Goux, Mathieu
Corpus linguistics and very large annotated corpora have been used for several decades in the diachronic study of French. They have made it possible to refine our knowledge of the language's evolution and to bring to light phenomena that had not previously been studied. Yet their constitution and their search features...
Gabay, Simon; Clérice, Thibault; Reul, Christian
Machine learning begins with machine teaching: in the following paper, we present the data that we have prepared to kick-start the training of reliable OCR models for 17th century prints written in French. The construction of a representative corpus is a major challenge: we need to gather documents from different decades and of different genres to ...
Gabay, Simon; Clérice, Thibault; Reul, Christian
Machine learning starts with machine teaching: in the following paper, we present the data that we have gathered and created to train reliable OCR models for 17th c. French prints, and preliminary results based on these training data and experiments to improve them.
Gabay, Simon
Biros, Camille; Rossi, Caroline; Sahakyan, Inesa
This article offers a descriptive and analytic view of the different stages leading to the constitution of a corpus that is representative of the issues of climate and energy justice. Overall, the corpus contains over five million words and gathers reports, newsletters and web-pages dealing with the most equitable ways of moving to a low-carbon fut...