
BOUTEF: Bolstering Our Understanding Through an Elaborated Fake News Corpus

Authors
  • Smaïli, Kamel
  • Anissa, Hamza
  • David, Langlois
  • Djegdjiga, Amazouz
Publication Date
Apr 19, 2024
Source
Hal-Diderot
Language
English
License
Unknown

Abstract

This article presents BOUTEF, an original and comprehensive corpus of fake news. It encompasses content in Algerian and Tunisian dialects, Modern Standard Arabic (MSA), French, and English, featuring instances of code-switching between these languages. Moreover, for the Algerian and Tunisian dialects, we have preserved both Latin and Arabic scripts in the dataset. BOUTEF comprises over 3,600 fake news posts collected from various social media platforms spanning from 2010 to 2024. This corpus is developed as part of the TRADEF 4 project and is made available to the research community. Each fake news post in BOUTEF is associated with 16 attributes, providing rich contextual information. The data was gathered from Facebook, Twitter, YouTube, and TikTok, reflecting the diverse sources of misinformation. To enhance the depth of our analysis, we introduce a novel labeling scheme consisting of 40 categories. This scheme is developed through a thorough examination of the collected corpus, and we have also retained a tagging process inspired by Claire Wardle’s categorization. BOUTEF not only contributes to the understanding of fake news in multilingual contexts but also provides valuable resources for further research in this domain.
