I recently got jazzed about two findings coming out of the world of computational social science, primarily because they hit so close to home (hello, junior faculty feeling the pressure to produce):
- Dashun Wang looked at the career trajectories of roughly 30,000 individuals across three fields: art, film, and science. He found that across all of these fields, productivity typically comes in temporal bursts, or hot streaks, which can happen at any point in your career (if you haven’t hit yours yet, there’s still time!).
- Satyam Mukherjee and colleagues analyzed 28.5 million scientific papers and 5.3 million U.S. patents and found that, in both science and technology, high-impact papers and patents shared a similar citation pattern: a low mean age of the works they cited, combined with a high variance in that age (go ahead, cite those papers from the 1920s!).
These particular insights, claiming to have uncovered universal “science of science” patterns, re-ignited a debate I often find myself having (with myself and others). Can we really meaningfully compare such different contexts, e.g. careers in film to careers in science? Should we be using computational social science to identify these abstract, near-universal patterns? Does this mean theory, narrative detail, and uniqueness of context are no longer important?
The answer, of course, is no. These abstract universal patterns are great, but so are theory, narrative, and context.
All too often, however, people in the social sciences pose these as a trade-off: “big computation” necessarily comes at the expense of narrative richness, contextual detail, and theory. Or, conversely, narrative richness can only happen without computation. I call this the (assumed) computation/context trade-off, and it has been repeated so often that it has almost become a truism.
This claim, however, is simply not true. There is no computation/context trade-off. Computational methods, including machine learning, can be used to find universal or near-universal patterns like those listed above (if that’s what you’re into), but they can also be used to enhance qualitative, interpretive, and context-specific research. You can take my word for it, but you can also trust many, many others.
It turns out, humans are really good at some things, and really bad at other things. Humans are great at interpretation, critical thinking, and building theory that is important to our disciplines. Humans are pretty darn bad at identifying reliable patterns across large amounts of data, and we are prone to bias when we’re looking for evidence (there are hundreds of cognitive biases to which even scientific researchers fall prey).
Enter the Computer-Human-Computer (C-H-C) research design (with our powers combined!).
Computer and information scientists have developed some pretty amazing tools for identifying patterns in complex, meaningful data. When these tools are applied carefully, with a full understanding of the assumptions behind them, and while staying firmly grounded in the established epistemology of our discipline, they can move us closer to our epistemological goals. We should let computers do what they’re good at, so that humans can do what we’re best at.
What is C-H-C? Start with computer-assisted inductive analysis (or deductive analysis, depending on your question). Follow that step by interpreting these patterns: go back to the data, read the context, and provide the rich narrative and detail that is so important in qualitative analysis. And then return to computational analysis to confirm that what you have found indeed holds across your data. Boom. We achieve three things that are important to qualitative methods:
- Meaning – the same action does not always mean the same thing to different people. We don’t need to assume it does, and we can now look for differences in meaning across many more groups and people.
- Potential for iteration – if you find your particular categories don’t work well for the data, change them. Start over. Computers are fast and they don’t care about doing the same thing over and over.
- Transparency – go ahead, run my code to reproduce my results as many times as you want.
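The C-H-C loop can be sketched in miniature. Everything below is illustrative and invented for the example: the tiny corpus, the stop-word list, and the “money” codebook are stand-ins, and a real project would use far larger data and real inductive tools (topic models, clustering, and the like). The shape of the loop, though, is the point: computer surfaces candidates, human interprets and codes, computer confirms.

```python
from collections import Counter

# Hypothetical miniature corpus; a real study would use thousands of documents.
docs = [
    "budget cuts forced the lab to delay the experiment",
    "new funding let the team expand the survey",
    "the committee debated the grant budget all afternoon",
    "students celebrated the unexpected fellowship funding",
]

# Step 1 (Computer): inductively surface candidate patterns.
# Here, just the most frequent content words across the corpus.
stopwords = {"the", "to", "all", "let"}
counts = Counter(w for d in docs for w in d.split() if w not in stopwords)
candidates = [w for w, _ in counts.most_common(5)]

# Step 2 (Human): interpret the patterns in context and build a codebook.
# A researcher who goes back and reads the documents might group terms
# into a hand-crafted category (the label and terms are illustrative).
codebook = {"money": {"budget", "funding", "grant", "fellowship"}}

# Step 3 (Computer): confirm the category holds across the full data —
# and if it doesn't, revise the codebook and iterate.
def matches(doc, terms):
    return bool(terms & set(doc.split()))

coverage = sum(matches(d, codebook["money"]) for d in docs) / len(docs)
print(f"'money' category appears in {coverage:.0%} of documents")
```

Because the whole loop is code, iteration is cheap (change the codebook and rerun) and the analysis is transparent (anyone can rerun it end to end).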
I call on all researchers, computational or otherwise: let’s put an end to this computation/context trade-off nonsense. Whenever the computation/context trade-off is repeated, whenever it rears its ugly, false head, gently correct your interlocutor: there is no trade-off. It doesn’t exist. Computation can reveal context. It can enhance our contextual understanding. And it can lead to theoretical and narrative richness. Go forth, add context to your computation.