The debut MethodSpace webinar, “Methods in Action: Tackling the Tweet,” offered a number of fascinating tidbits in a wide-ranging hour that combined findings from actual social media research with techniques and tips for running your own experiments. Listeners learned that there may be cultural differences in who chooses to geotag their tweets, that liberals are more likely to retweet conservatives’ messages than vice versa, and how Ellen DeGeneres helped create a “black hole” in political science.
The webinar’s guests were two academics: Luke Sloan, a senior lecturer in quantitative methods at Cardiff University, and Joshua Tucker of New York University’s Social Media and Political Participation (SMaPP) lab. Sloan has worked on a range of projects investigating the use of Twitter data to understand social phenomena, covering topics such as election prediction, tracking (mis)information propagation during food scares, and ‘crime-sensing,’ while Tucker’s comparative political science research on modern elections has led him to pioneering examinations of the nexus of political identity and online communication.
That April 13 webinar is now archived and can be watched below, and a Storify of the tweets during the live webcast appears at the bottom of this post:
The hour-long webinar was not long enough to answer every question from the audience, so Sloan answered some of them afterwards.
Can Twitter be used to understand how people in a specific region use language? Is your group working in this area? Is your group open to collaborating with other labs?
Potentially, yes. You can distinguish between the language of the user interface that tweeters interact with and the language in which they tweet; the latter is more variable. You can then use geotagging to locate certain languages within specific geographies, although my recent research indicates that the use of geotagging is itself a function of both the language of the tweet and the language of the interface: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0142209 I’d be very keen to link up with other research groups working in this area.
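To make the comparison in this answer concrete, here is a minimal Python sketch. It assumes each collected tweet has already been reduced to three hypothetical fields: the tweet’s language, the user’s interface language and optional coordinates. These names are illustrative simplifications, not the exact fields of any version of the Twitter API. The sketch cross-tabulates the share of geotagged tweets for each combination of tweet language and interface language, which is the kind of comparison that can show whether geotagging behavior differs by language.

```python
from collections import defaultdict

# Hypothetical, simplified tweet records (not real Twitter API field names):
# "tweet_lang" = detected language of the tweet text,
# "ui_lang"    = language of the user's interface,
# "coords"     = (longitude, latitude) if geotagged, otherwise None.
tweets = [
    {"tweet_lang": "en", "ui_lang": "en", "coords": (-3.18, 51.48)},
    {"tweet_lang": "en", "ui_lang": "en", "coords": None},
    {"tweet_lang": "cy", "ui_lang": "en", "coords": (-4.08, 52.41)},
    {"tweet_lang": "es", "ui_lang": "es", "coords": None},
]


def geotag_rates(records):
    """Share of geotagged tweets for each (tweet language, interface language) pair."""
    totals = defaultdict(int)
    tagged = defaultdict(int)
    for rec in records:
        key = (rec["tweet_lang"], rec["ui_lang"])
        totals[key] += 1
        if rec["coords"] is not None:
            tagged[key] += 1
    return {key: tagged[key] / totals[key] for key in totals}


for (tweet_lang, ui_lang), rate in sorted(geotag_rates(tweets).items()):
    print(f"tweet={tweet_lang} interface={ui_lang}: {rate:.0%} geotagged")
```

With real data, the geotagged subset for each language could then be mapped to see where that language is used, bearing in mind that the geotagged subset may not represent all users of that language.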
Luke mentioned that part of what he studies is the propagation of misinformation on Twitter. I am interested in hearing more about this. Do you have any theories or research on how this occurs? How have you gone (or how would you go) about studying it?
One way to study the propagation of information is to track the progress of a retweet, seeing who picks it up and when; this spread can then be modeled statistically. In events such as food scares, it’s important to understand how retweets of (potentially) false information proliferate through a social network so that the misinformation can be addressed.
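As a rough illustration of “seeing who picks it up,” the minimal Python sketch below rebuilds a retweet cascade as a tree and measures its size and depth. It assumes we already know, for each retweet, which account it was retweeted from. That is a significant assumption: Twitter’s API generally links a retweet to the original tweet rather than to the intermediate account where it was seen, so these links would usually have to be inferred (for example, from follower relationships). The user names and the helper `cascade_stats` are hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical retweet records: (retweeting user, user they retweeted from).
# The "retweeted from" links are assumed to be known or already inferred.
retweets = [
    ("bob", "alice"),
    ("carol", "alice"),
    ("dave", "bob"),
    ("erin", "dave"),
]


def cascade_stats(source, records):
    """Walk the retweet tree breadth-first, starting from `source`.

    Returns (number of users reached, maximum depth, depth of each user).
    """
    children = defaultdict(list)
    for user, parent in records:
        children[parent].append(user)
    depth = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for child in children[node]:
            if child not in depth:  # skip duplicate or cyclic links
                depth[child] = depth[node] + 1
                queue.append(child)
    return len(depth) - 1, max(depth.values()), depth


size, max_depth, depths = cascade_stats("alice", retweets)
print(f"reached {size} users, maximum depth {max_depth}")
```

Measures such as cascade size and depth, computed across many tweets, are the kind of outcome that could then be fed into a statistical model of how (mis)information spreads.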
Are there any particularly recommended data analysis techniques for Twitter data? Where is a good resource to start?
You can apply and adapt existing statistical modelling techniques, but you need to be aware that issues around representation (and, consequently, p-values) have to be resolved. We recently built a survival model predicting what increases the ‘life’ of a tweet. We found that the predictors were very event-specific, but publishing a tweet on certain days and at certain times does significantly affect its likelihood of being retweeted. This was simply an adaptation of the types of models used in healthcare. For geotagged data, you can plot tweets in packages such as ArcGIS or QGIS; this allows you, for example, to work out the number of tweets in a given area that contain a certain keyword. For election prediction, we could theoretically use mentions of political parties and leaders to predict voter behavior, but because fewer than 1 percent of tweets have geotagging enabled, we often find that we don’t have enough data to do this at the constituency level (see Sloan et al. 2013 for an introduction to geotagging on Twitter: http://www.socresonline.org.uk/18/3/7.html; see Williams et al. 2016 for an example of using tweets about low-level disorder to predict crime rates in local areas: http://bjc.oxfordjournals.org/content/early/2016/03/31/bjc.azw031; and see Burnap et al. 2016 for an example of predicting elections using Twitter data: http://www.sciencedirect.com/science/article/pii/S0261379415002243).
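The survival model mentioned here is not reproduced below. As a minimal sketch of the underlying idea, this Python example implements the simplest survival technique, the Kaplan-Meier estimator. It assumes a hypothetical definition of a tweet’s ‘life’: the hours until its last observed retweet, treated as right-censored if the tweet was still being retweeted when collection stopped. The data and the helper `kaplan_meier` are illustrative. Testing predictors such as posting day and time would require comparing curves across groups or using a regression approach such as a Cox proportional hazards model.

```python
from collections import Counter

# Hypothetical data: (hours until a tweet's last retweet, event_observed).
# event_observed=False means the tweet was still being retweeted when data
# collection stopped, so its true "life" is right-censored.
lifetimes = [(2, True), (3, True), (3, False), (5, True), (8, False), (8, True)]


def kaplan_meier(data):
    """Return [(time, S(time))] at each time where at least one tweet 'died'."""
    events = Counter(t for t, observed in data if observed)
    exits = Counter(t for t, _ in data)  # events plus censorings at each time
    at_risk = len(data)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        if events[t]:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1 - events[t] / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]  # tweets that died or were censored at t leave the risk set
    return curve


for t, s in kaplan_meier(lifetimes):
    print(f"P(life > {t}h) = {s:.3f}")
```

Censored tweets stay in the risk set up to their censoring time, which is what distinguishes this from simply averaging observed lifetimes.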
In your opinion, what is an appropriate amount of data for a study? A million tweets?
How long is a piece of string! If you set up a collection from the Twitter API with appropriate keywords, you’re likely to gather most of the traffic around an event. Just be aware that more data does not, per se, mean better analysis. I would suggest casting your net wide with broad search terms and then refining the data once the collection is complete. For example, if you’re interested in understanding references to the Green Party on Twitter, search for the word ‘Green’ but be prepared to do a lot of post-collection filtering to get rid of the noise, which will probably account for over 99.9 percent of the data you’ve collected.
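The “cast wide, then filter” approach might look something like the following minimal Python sketch. It assumes the tweets collected on the broad term ‘Green’ are available as plain text. The include and exclude patterns, the sample tweets and the helper `is_relevant` are all hypothetical; in practice such rules are usually built up iteratively by reading samples of the collected data.

```python
import re

# Hypothetical post-collection rules for tweets gathered on the broad term "green".
INCLUDE = re.compile(r"\bgreen\s+party\b|\bgreens\b", re.IGNORECASE)
EXCLUDE = re.compile(r"\beat\s+(?:your\s+)?greens\b", re.IGNORECASE)


def is_relevant(text):
    """Keep a tweet if it matches an include rule and no exclude rule."""
    return bool(INCLUDE.search(text)) and not EXCLUDE.search(text)


collected = [
    "Voting Green Party tomorrow",
    "Nothing beats green tea in the morning",
    "The Greens launch their manifesto today",
    "Eat your greens, kids!",
    "Got the green light for the project",
]
kept = [text for text in collected if is_relevant(text)]
print(f"kept {len(kept)} of {len(collected)} tweets")
for text in kept:
    print(text)
```

Any rule set like this will still misclassify some tweets, so hand-checking a random sample of both the kept and the discarded tweets is a sensible final step.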
Luke Sloan is a senior lecturer in quantitative methods and deputy director of the Social Data Science Lab at the School of Social Sciences, Cardiff University, UK. He has worked on a range of projects investigating the use of Twitter data to understand social phenomena, covering topics such as election prediction, tracking (mis)information propagation during food scares, and ‘crime-sensing’. His research focuses on the development of demographic proxies for Twitter data, to further understand who uses the platform and to increase the utility of such data for the social sciences. Sloan sits as an expert member on the Social Media Analytics Review and Information Group, which brings together academics and government agencies, and he works closely with the Office for National Statistics and the Food Standards Agency. He is also co-editor of the SAGE Handbook of Social Media Research Methods, due for publication later this year.
Joshua Tucker is a professor of politics and Russian and Slavic studies at New York University, with an affiliate appointment at NYU-Abu Dhabi. He is the co-principal investigator of the NYU Social Media and Political Participation laboratory, whose research was featured in last month’s special report on technology and politics in The Economist. In 2006, the American Political Science Association awarded him its Emerging Scholar Award, given to the top scholar in the field of Elections, Public Opinion, and Voting Behavior within 10 years of receiving the doctorate. Professor Tucker is also one of the co-authors of The Monkey Cage, a political science and policy blog published by The Washington Post.