In the age of social media, Twitter provides unprecedented opportunities for social researchers to listen to millions of voices, observe millions of interactions and gain new insights into our social world. This has implications for research practices, policy decisions and everyday life. But as with all research, methods matter. Luke Sloan, co-editor of the recently released SAGE Handbook of Social Media Research Methods, shares his insights into the methodological challenges of Twitter research, demonstrating how it isn’t all ‘black and white.’ This post originally appeared at SAGE Connection and is reposted here with permission.
On the face of it, it sounds too good to be true… and to an extent it is. Easy access to a wealth of naturally occurring data that is live and instant, allowing us to see real-time changes in behavior and attitudes as events in the real world unfold. Outstanding geographical granularity, down to latitude and longitude. A voice for disenfranchised and hard-to-reach populations that are typically difficult to engage through traditional modes of social research. High-level quantitative aggregated data alongside qualitative micro-interactions. Monitoring the pulse of the world.
Wow, Twitter data really does sound like the holy grail of data sources for academics, government and private companies alike. Except that it isn’t. As with most things, it’s complicated, and we should be concerned by anyone who presents a black-and-white case for what Twitter is telling us about human and social behavior without reflecting on the methodological challenges we face.

Yes, the data is naturally occurring and, as it is not generated or elicited as part of an explicit research project, it is not subject to the Hawthorne effect (altering the behavior of those being studied due to their awareness of being observed). But Twitter data is not produced in a vacuum, and there is nothing ‘natural’ about how behavior is path-dependent on technology, how users construct virtual identities and the myriad ways in which Twitter is used: as a news feed; as a professional network; as a friendship network; for self-promotion; for lurking and so on. The data is not untainted – it’s like buying organic potatoes in a plastic bag: the product looks wholesome but the packaging is artificial (yet you’ll still enjoy eating those chips).
As for the speed at which the data is produced, certainly this is something new. A vast network of human sensors producing data in real time means that responses to emergency events can be coordinated efficiently. It even works for earthquake detection. But there is a lot of noise on Twitter, and how do you know what to focus on and what to filter out in order to monitor the important things? Perhaps more to the point, how can you be monitoring and looking for something that hasn’t happened yet?
The ability to pinpoint the exact location of a user when they produced a tweet is incredibly powerful. It allows us to build a context around the data, to locate that individual within existing geographies relating to Census data, crime rates and voting patterns, to name but a few. If someone tweets that they feel unsafe, we can look at deprivation measures for the area they are in. If someone shows political support for a particular party, we can link this to a constituency and model voting patterns. The ability to link data sources through geography means that we can investigate the most fundamental question – what is the relationship between what people tweet and the real world? Indeed, does Twitter tell us anything at all? Yet bear in mind that the proportion of tweets that carry this data (i.e. that are geotagged) is estimated to be as low as 0.85 percent, and certainly those who geotag are not a random sample of the Twitter population.
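To make the scale of this limitation concrete, the filtering step is easy to sketch. The snippet below is a minimal illustration, assuming tweets arrive as dictionaries in the Twitter REST API v1.1 format, where a geotagged tweet carries a GeoJSON point in its "coordinates" field as [longitude, latitude]; the sample data and function names are invented for the example.

```python
# Minimal sketch: estimate the share of geotagged tweets in a sample and
# extract their coordinates for linkage with other geographic data
# (e.g. Census areas, deprivation measures).
# Assumes Twitter REST API v1.1 tweet objects, where "coordinates" is
# either None or a GeoJSON point: {"type": "Point",
# "coordinates": [longitude, latitude]}.

def geotag_proportion(tweets):
    """Return the proportion of tweets carrying exact coordinates."""
    if not tweets:
        return 0.0
    tagged = [t for t in tweets if t.get("coordinates")]
    return len(tagged) / len(tweets)

def extract_points(tweets):
    """Yield (latitude, longitude) pairs for geotagged tweets only."""
    for t in tweets:
        coords = t.get("coordinates")
        if coords and coords.get("type") == "Point":
            lon, lat = coords["coordinates"]  # GeoJSON order is lon, lat
            yield lat, lon

# Hypothetical sample: only the first tweet is geotagged.
sample = [
    {"text": "feeling unsafe tonight",
     "coordinates": {"type": "Point", "coordinates": [-3.18, 51.48]}},
    {"text": "no location here", "coordinates": None},
    {"text": "another untagged tweet"},
]

print(geotag_proportion(sample))       # 1 of 3 tweets is geotagged
print(list(extract_points(sample)))    # [(51.48, -3.18)]
```

In a real collection the geotagged subset would be far smaller (on the order of the 0.85 percent quoted above), which is exactly why any analysis built on it inherits a heavy selection bias.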
And whilst we’re talking about samples, Twitter users themselves are not a random sample of any population, certainly not in the UK, where tweeters are disproportionately younger and certain class groups are over- or under-represented (see Chapter 7 here). Virtual identities are constructed and the demographics of most Twitter users are largely unknown. When you’re presented with some Twitter data and someone is making a knowledge claim, when you’re about to make an important policy decision based on the analysis, ask yourself one question – do you actually know who was in the sample? Can you even tell what country they’re tweeting from? If you’re interested in the disenfranchised, how do you know they’re present in your data?
Finally, Twitter provides a wealth of quantitative and qualitative information in the form of sentiment, networks, interactions, retweets, frequencies, topics, hashtags, URL sharing and all the rest of the meta-data that surrounds a tweet. But what does it mean? Is a retweet an endorsement, an act of passing on information or can it even be ironic? When you’re limited to 140 characters, how can meaning be ascribed to words?
The point is that Twitter does provide unprecedented opportunities to explore attitudes and behavior, but as with all research, methods matter. Consideration of research design and sampling is key, as is a detailed understanding of the technological parameters of the platform and how it’s used. These observations apply to any research that sources data from social media.
It would be naive (at best) and foolhardy (at worst) to take Twitter at face-value, but that doesn’t mean it’s not useful.
As I said, it’s complicated.