The SAGE Handbook of Social Media Research Methods was published in 2016. Naturally, the research underpinnings for this excellent volume were generated several years prior to publication. Since then, we’ve seen scandals associated with social media and the Brexit vote, the 2016 US election, and hacked personal data from financial institutions, as well as policy changes including the far-reaching EU General Data Protection Regulation. I can’t help but wonder about research implications for those who study social media or use social media to interact with participants. To learn what my fellow Handbook contributors are thinking, I posed these questions:
- Why do you feel changes are needed, or not, to the research approach you wrote about in the Handbook?
- What would you do or say differently in the post-Cambridge Analytica/social media hacking scandal era?
- What caveats or steps would you recommend for social media researchers now?
The first post of this series is from Luke Sloan, co-editor of the Handbook. He co-wrote Chapter 1, “Introduction to the Handbook of Social Media Research Methods: Goals, Challenges and Innovations,” and Chapter 7, “Social Science ‘Lite’? Deriving Demographic Proxies from Twitter.”
Inferring from Twitter Data
by Luke Sloan, PhD
Unlike survey research, where we can simply ask respondents about their characteristics and views directly, naturally occurring social media data is noisy, and the things we are trying to measure are often not explicit. Examples of this include important demographic characteristics that form the basis of most social scientific research (such as gender, ethnicity, class) and also attitudinal and behavioural measures such as political views. Finding these signals in the noise is a difficult technical and methodological task, but the fact that these signals exist is precisely why social media data is of interest to social scientists. In light of this, many researchers across the globe are working on inferring the characteristics, behaviours and attitudes of social media users for a wide range of reasons, from simply understanding who is using the platform, to understanding hate speech, to predicting elections.
However, inferring characteristics can have negative consequences when applied outside of the ethical framework in which most researchers operate. Whilst academics are bound to the principles of ‘doing no harm to participants’, inferences about users based on their online presence can be used by non-regulated actors to discriminate against or target individuals. For example, an assessment of a user’s financial position could be used against them when they try to secure credit, derived race and/or ethnicity could be used to target users with abusive content, and an estimate of whether someone is in a trade union could be used to discriminate against them in the workplace. Indeed, Twitter have recently changed their terms to avoid their API being used for these purposes: https://developer.twitter.com/en/developer-terms/more-on-restricted-use-cases.html.
This change poses particular problems for researchers across the globe as the academic community become collateral casualties in the war on ‘data misuse’. Academics are subject to exemptions regarding data sharing in other parts of the Twitter terms of service – it remains to be seen whether similar rights will be granted here… but why should academics be treated differently?
Well, as already mentioned, academic researchers are bound to protect participants, and the general consensus in the social sciences is that ‘data are people’, even if presented in a public forum such as Twitter. But as compelling a reason as ethics is that of openness and honesty around uncertainty. No researchers believe that inferring characteristics through such messy and noisy data results in complete accuracy, and much of the work in this area is concerned with understanding the weaknesses of assigning characteristics to virtual representations of users.
Of course, narratives around uncertainty don’t go down well when you’re trying to sell a product that determines likelihood to default on a loan based on social media activity.
All of this demonstrates that we, as an academic community, need to work with tech giants and the public to:
- Explain in what way our work benefits society (such as highlighting inequality online through identifying members of minority groups via inferred ethnicity)
- Extol the principles of ethics and of doing no harm to participants, in which we all believe and to which we are bound
- Continue to be explicit about uncertainty and reflexivity in our work
See Luke’s new article:
Sloan L (2017) Who Tweets in the United Kingdom? Profiling the Twitter Population Using the British Social Attitudes Survey 2015, Social Media + Society, 3:1, http://journals.sagepub.com/doi/full/10.1177/2056305117698981
Post your thoughts and comments below. Follow the series, and read the chapters. Register on the MethodSpace site to receive new posts by email (never any SPAM!)