Linking Twitter and survey data: asymmetry in quantity and its impact

Authors
  • Al Baghal, Tarek (1)
  • Wenz, Alexander (2)
  • Sloan, Luke (3)
  • Jessop, Curtis (4)
  • (1) University of Essex, Colchester, UK
  • (2) University of Mannheim, Mannheim, Germany
  • (3) Cardiff University, Cardiff, UK
  • (4) NatCen Social Research, London, UK
Type
Published Article
Journal
EPJ Data Science
Publisher
Springer Berlin Heidelberg
Publication Date
Jun 09, 2021
Volume
10
Issue
1
Identifiers
DOI: 10.1140/epjds/s13688-021-00286-7
Source
Springer Nature
License
Green

Abstract

Linked social media and survey data have the potential to be a unique source of information for social research. While the potential usefulness of this methodology is widely acknowledged, very few studies have explored methodological aspects of such linkage. Respondents produce planned amounts of survey data, but highly variable amounts of social media data. This study explores this asymmetry by examining the amount of social media data available to link to surveys. The extent of variation in the amount of data collected from social media could affect the ability to derive meaningful linked indicators and could introduce possible biases. Linked Twitter data from respondents to two longitudinal surveys representative of Great Britain, the Innovation Panel and the NatCen Panel, show that there is indeed substantial variation in the number of tweets posted and the number of followers and friends respondents have. Multivariate analyses of both data sources show that only a few respondent characteristics have a statistically significant effect on the number of tweets posted: the number of followers is the strongest predictor of posting in both panels, women post less than men, and there is some evidence that people with higher education post less, but only in the Innovation Panel. We use sentiment analyses of tweets to provide an example of how the amount of Twitter data collected can impact outcomes using these linked data sources. Results show that the number of negatively coded tweets is related to general happiness, but the number of positive tweets is not. Taken together, the findings suggest that the amount of data collected from social media that can be linked to surveys is an important factor to consider, and they indicate the potential for such linked data sources in social research.
