25th September 2012 at 11:10 am #2090 Katie Metzler, Participant
Hi all, thought I’d share my summary of the Big Data Conference at the OII last week.
I was really impressed with a couple of the speakers. Lance Bennett from the University of Washington gave a fascinating presentation on large-scale platforms for discussion and deliberation, and asked the question ‘Do people actually want to deliberate, or is Twitter/Facebook really as deep as they want to go?’. The jury is still out. He says there is now a gap between our ability to produce large-scale tools for democracy and the willingness of governments and social movements to accept and use them. He gave examples from both US state politics and the Occupy movement where he offered really powerful online platforms, able to handle large amounts of input from the public so that issues could be debated and discussed in real time. In both cases, leaders resisted adopting them out of fear of the consequences of opening up debate in this way. He was less surprised by the resistance from government than from Occupy, whose leaders worried that opening up debate would have a moderating effect: the hard-core centre want to hear from people committed enough to show up at meetings, not from the 99%, apparently. In terms of methodological issues, he said the next step for his team is to work on natural language processing and visualisation, so that they can categorise massive amounts of textual data in real time, show how discussions branch off, and let you see where you stand in relation to others in a debate.
Theo Bertram, Policy Manager at Google (who incidentally used to lecture me on Samuel Beckett at Bristol Uni), talked about Google and Big Data. He gave yet more fascinating examples of how Google learns from the masses of data it collects: the blue it uses for its links is one of 40 hues it tested on millions of Gmail users before settling on the one that led to the most click-throughs. And did you know that when you are asked to provide human verification to access a web page, and you have to type in two blurred words, you are in fact working for Google? The first word is actually from a book that Google is trying to digitise but can’t read. You type it in and, bingo, Google is one word closer to fulfilling its aim of digitising the world’s books.
Nigel Shadbolt from the University of Southampton was a great speaker. He is the brains behind the government’s open data website, http://data.gov.uk/, which is actually pretty cool. The idea is to ‘drive economic growth through open data.’ So, for example, opening up data about bus stops and timetables leads to apps like Bus Guru, and the government is chuffed because it no longer has to worry about making the data useful, since profit-driven companies can do it better. They are opening the Open Data Institute in Shoreditch in November. http://downtheditch.com/shoreditch-to-house-uk-open-data-institute Nigel said that when we talk about Big Data in social science research, we really should be talking about ‘Broad Data’, because it’s about overlaying many different types of data sets: structured and unstructured, big and small, public and private, open and closed, personal and non-personal, anonymous and identified, aggregate and individual. It’s about finding the structure in all this data and a way to link it all together so that it becomes meaningful. He said the goal is ‘data fusion, or integrating data assets.’ He then had about 20 slides on the challenges of doing this…
Solon Barocas gave a really interesting paper on a bunch of issues around Big Data that don’t get discussed as much as the really obvious challenges (privacy, curation, storage and management, skills and expertise). Mainly, he was concerned with using Big Data (specifically data harvested from behaviour online) to make predictions based on mathematical models that are completely divorced from the social. He gave the following scary example: some researchers managed to come up with a model that can predict depression from patterns of internet use, without even seeing what you’re doing online. So, without knowing which websites you visit, they can predict that you are depressed just from the pattern of your time online and the pattern of your clicks. If the model works, who cares about the rest? Could we be abandoning an interest in causation? He also made a point I thought was worth remembering: these online platforms have all been engineered to push you to behave in certain ways, and have been user-tested to manufacture the behaviours their designers want (and that the design can cope with). So treating the way we behave on these platforms as ‘observational data’ is problematic.
My thoughts? Big Data can tell us some stuff, but it will only ever help us answer certain types of questions, namely those answerable with quant methods. There was some talk that Big Data was going to ‘mathematise’ the social sciences, but I think that’s overstating it. There will still be researchers who want to ask the ‘why’ questions that billions of lines of anonymous data won’t be any good for answering. The potential for companies to learn from their data may be huge, and the potential for governments to make more data-informed decisions also seems real, though the challenges facing government still seem pretty enormous (lack of skills and expertise, no budget to innovate, bigger priorities, massive questions around privacy and whether it can be trusted to manage sensitive data). At the moment, it seems the idea is just to open up our data to third-party profiteers, which I am not overly thrilled about!!
Would be interested to hear what others think!