Discourse on climate and energy justice: a comparative study of Do It Yourself and Bootstrapped corpora
- Publication Date: Sep 24, 2018
- Source: HAL-INRIA
- Language: English
- License: Unknown
Abstract
This article offers a descriptive and analytic view of the stages leading to the constitution of a corpus that is representative of the issues of climate and energy justice. Overall, the corpus contains over five million words and gathers reports, newsletters and web pages dealing with the most equitable ways of moving to a low-carbon future with the aim of limiting climate change. It can be divided into six sub-corpora, according to type of discourse community and method of constitution. We begin by presenting the small Do It Yourself (DIY) corpora which were used as a starting point. Three discourse communities were selected to observe possible variation in their treatment of the issue: Non-Governmental Organisations (NGOs), United Nations institutions, and the Renewable Energy Sector (RES). Sources were selected according to author, date and keywords in the title. Using the AntConc concordancer and the Wmatrix software, we test the reliability of the corpora for their thematic content, terminology and lexical unit classification. Our first results confirm variation between the discourse communities. The discrepancy in sizes and the time-consuming nature of the initial DIY corpus constitution led us to use BootCaT to extend the corpora, using keywords from them as seeds to retrieve and download web pages. We thus contrast a more traditional approach to corpus building with web-as-corpus data-gathering methods. We compare the results found in the BootCaT corpora with those of the DIY corpora to test whether they are as specific. This enables us to draw conclusions on the possible uses and advantages of relatively small corpora for the study of specialised discourse.
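The seeding step described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline and does not call BootCaT itself: it assumes keywords are ranked with a two-corpus log-likelihood score (a keyness statistic commonly used in corpus comparison) and then combined into small random tuples, which is the general form of query that a bootstrapping tool submits to a search engine. The function names `log_likelihood`, `top_keywords` and `seed_tuples`, the tuple size and the toy token lists are all hypothetical.

```python
import itertools
import math
import random
from collections import Counter


def log_likelihood(freq_a, total_a, freq_b, total_b):
    """Two-corpus log-likelihood (G2) keyness score for a single word."""
    combined = (freq_a + freq_b) / (total_a + total_b)
    expected_a = total_a * combined
    expected_b = total_b * combined
    score = 0.0
    # A zero observed frequency contributes nothing to the sum.
    if freq_a > 0:
        score += freq_a * math.log(freq_a / expected_a)
    if freq_b > 0:
        score += freq_b * math.log(freq_b / expected_b)
    return 2 * score


def top_keywords(target_tokens, reference_tokens, n=10):
    """Rank words that are relatively more frequent in the target corpus."""
    target = Counter(target_tokens)
    reference = Counter(reference_tokens)
    total_t, total_r = len(target_tokens), len(reference_tokens)
    scored = []
    for word, freq in target.items():
        ref_freq = reference[word]
        # Keep only words over-represented in the target (positive keyness).
        if freq / total_t > ref_freq / total_r:
            score = log_likelihood(freq, total_t, ref_freq, total_r)
            scored.append((score, word))
    # Highest score first; ties broken alphabetically for reproducibility.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [word for _, word in scored[:n]]


def seed_tuples(seeds, tuple_size=3, n_tuples=10, rng=None):
    """Combine seed keywords into distinct random tuples (assumed query form)."""
    rng = rng or random.Random(0)
    combos = list(itertools.combinations(sorted(set(seeds)), tuple_size))
    rng.shuffle(combos)
    return combos[:n_tuples]


if __name__ == "__main__":
    # Toy data standing in for a DIY sub-corpus and a reference corpus.
    diy = "climate justice energy justice climate the the".split()
    ref = "the the the of of energy".split()
    seeds = top_keywords(diy, ref, n=5)
    for query in seed_tuples(seeds, tuple_size=2, n_tuples=3):
        print(" ".join(query))
```

In practice the tokens would come from the DIY sub-corpora, and the resulting tuples would be passed to the bootstrapping tool rather than printed.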