Biros, Camille Rossi, Caroline Sahakyan, Inesa
This article offers a descriptive and analytic view of the different stages leading to the constitution of a corpus that is representative of the issues of climate and energy justice. Overall, the corpus contains over five million words and gathers reports, newsletters and web pages dealing with the most equitable ways of moving to a low-carbon fut...
Bakari, Wided Bellot, Patrice Neji, Mahmoud
With the development of electronic media and the heterogeneity of Arabic data on the Web, the idea of building a clean corpus for certain applications of natural language processing, including machine translation, information retrieval, and question answering, has become more and more pressing. In this manuscript, we seek to create and develop our own corpus ...
Shen, Mo
In reference to IEEE copyrighted material which is used with permission in this thesis, the IEEE does not endorse any of Kyoto University's products or services. Internal or personal use of this material is permitted. If interested in reprinting/republishing IEEE copyrighted material for advertising or promotional purposes or for creating new colle...
Wallgrün, Jan Oliver Klippel, Alexander Baldwin, Timothy
Spatial language, despite decades of research, still poses substantial challenges for automated systems, for instance in geographic information retrieval or human-robot interaction. We describe an approach to building a corpus of natural language expressions extracted from web documents for analyzing and modeling spatial relational expressions (SRE...
Endrédy, István Novák, Attila
The ever-increasing web is an important source for building large-scale corpora. However, dynamically generated web pages often contain much irrelevant and duplicated text, which impairs the quality of the corpus. To ensure the high quality of web-based corpora, a good boilerplate removal algorithm is needed to extract only the relevant content fro...
Reed, Chris; Mochales Palau, Raquel; Rowe, Glenn; Moens, Marie-Francine
This paper describes the development of a written corpus tagged for argumentative reasoning and the main difficulties found in the process. In addition, it offers a number of examples of how this kind of language resource can be used in developing both novel applications in artificial intelligence and novel theory-testing and theory-building in ph...