Text mining refers to digital social research methods that involve the collection and analysis of unstructured textual data, generally from internet-based sources such as social media and digital archives.
In the webinar presented below, Gabe Ignatow and Rada Mihalcea discuss the fundamentals of text mining for social scientists, covering topics including research design, research ethics, natural language processing, the intersection of text mining and text analysis, and tips on teaching text mining to social science students. The pair have written two books on text mining for SAGE: this year’s An Introduction to Text Mining and 2017’s Text Mining.
Ignatow is an associate professor of sociology at the University of North Texas, where he has taught since 2007. His research interests are in the areas of sociological theory, text mining and analysis methods, new media, and information policy. His current research involves working with computer scientists and statisticians to adapt text mining and topic modeling techniques for social science applications. Ignatow has been working with mixed methods of text analysis since the 1990s.
Mihalcea is a professor of computer science and engineering at the University of Michigan. Her research interests are in computational linguistics, with a focus on lexical semantics, multilingual natural language processing, and computational social sciences. She was a general chair for the Conference of the North American Chapter of the Association for Computational Linguistics and a program co-chair for the Conference of the Association for Computational Linguistics (2011) and the Conference on Empirical Methods in Natural Language Processing (2009).
In addition to the webinar, Ignatow and Mihalcea teamed up once again to answer some of the questions they didn’t get to during the hour-long webinar. Their answers can be found beneath the recording.
Are you familiar with any text mining research on education or the experiences of teachers?
There are a few school- and teaching-related studies covered in our books. I’d look up Anne O’Keeffe at the University of Limerick. She published a co-authored 2007 book with Cambridge University Press, and she and Steve Walsh published a 2012 article that may be of interest. Walsh has a 2002 article and a 2006 book. All of these studies use corpus linguistics tools rather than text mining per se, but there is major overlap.
O’Keeffe, A., & Walsh, S. (2012). Applying corpus linguistics and conversation analysis in the investigation of small group teaching in higher education. Corpus Linguistics and Linguistic Theory, 8(1), 159-181.
Walsh, S. (2002). Construction or obstruction: Teacher talk and learner involvement in the EFL classroom. Language Teaching Research, 6(1), 3-23.
See also work on exploring academic performance with text mining:
Pennebaker, J.W., Chung, C.K., Frazee, J., Lavergne, G.M., & Beaver, D.I. (2014). When small words foretell academic success: The case of college admissions essays. PLoS ONE 9(12): e115844. doi:10.1371/journal.pone.0115844.
Is text mining an appropriate approach to use to analyze recorded interviews?
Yes. Researchers working in many diverse areas have used text mining and text analysis tools on transcribed recorded interviews. Here are a few examples.
Busanich, Rebecca, McGannon, Kerry and Robert Schinke. (2014). “Comparing Elite Male and Female Distance Runners’ Experiences of Disordered Eating Through Narrative Analysis.” Psychology of Sport and Exercise 15(6): 705–712.
Andersen, Ditte. (2015). Stories of change in drug treatment: A narrative analysis of ‘whats’ and ‘hows’ in institutional storytelling. Sociology of Health & Illness, 37(5), 668–682.
Laird, Elizabeth A., McCance, Tanya, McCormack, Brendan and Bernadette Gribben. (2015). Patients’ Experiences of In-hospital Care When Nursing Staff Were Engaged in a Practice Development Programme to Promote Person-centredness: A Narrative Analysis Study. International Journal of Nursing Studies 52(9), 1454-1462.
Halberstadt, A., Langley, H., Hussong, A., Rothenberg, W., Coffman, J., Mokrova, I. & P. Costanzo. (2016). Parents’ Understanding of Gratitude in Children: A Thematic Analysis. Early Childhood Research Quarterly 36: 439–451.
Strachan, J., Yellowlees, Gill and April Quigley. (2015). General practitioners’ assessment of, and treatment decisions regarding, common mental disorder in older adults: Thematic analysis of interview data. Ageing and Society 35(1), 150-168.
Fereday, J. & E. Muir-Cochrane. (2006). Demonstrating Rigor Using Thematic Analysis: A Hybrid Approach of Inductive and Deductive Coding and Theme Development. International Journal of Qualitative Methods 5(1), 80-92.
I will be teaching a master’s-level course in technical communication this summer. The students are working professionals, and I want them to build a study that reveals something important about their workplaces. Any ideas? The constraints are the workplace setting and the short 10-week term.
Tons of ideas…too many to share in a blog post. A starting point might be the syllabus for Gabe’s hybrid grad text mining course: https://sites.google.com/site/classes390e/contentanalysis
As we mentioned in the webinar, we recommend a flexible course design that allows students to learn the basics of text mining and NLP, become familiar with multiple text mining approaches, and then choose an approach and develop their own project with regular feedback from the instructor and classmates.
Do you have any examples of teaching this in healthcare?
There are plenty of health care-related studies that use text mining tools. Here are a few examples:
Bell, E., S. Campbell, & Lynette R. Goldberg. (2015). Nursing Identity and Patient-centredness in Scholarly Health Services Research: A Computational Text Analysis of PubMed Abstracts, 1986–2013. BMC Health Services Research 15(3):1-16.
Hakimnia, R., Holmström, I., Carlsson, M., & A. Höglund. (2014). Exploring the Communication Between Telenurse and Caller—A Critical Discourse Analysis. International Journal of Qualitative Studies on Health and Well-being 9: 1-9.
Schuster, J., Beune, E., & K. Stronks. (2011). Metaphorical Constructions of Hypertension Among Three Ethnic Groups in the Netherlands. Ethnicity and Health 16(6), 583-600.
Shepherd, A., Sanders, C., Doyle, M. & Shaw, J. (2015). Using social media for support and feedback by mental health service users: Thematic analysis of a Twitter conversation. BMC Psychiatry 15(29).
There is also related work on analyzing counseling conversations; see, for instance, our own research:
“Understanding and Predicting Empathic Behavior in Counseling Therapy,” Verónica Pérez-Rosas, Rada Mihalcea, Kenneth Resnicow, Satinder Singh, Lawrence An. 55th Annual Meeting of the Association for Computational Linguistics (ACL), 2017.
“Predicting Counselor Behaviors in Motivational Interviewing Encounters,” Verónica Pérez-Rosas, Rada Mihalcea, Kenneth Resnicow, Satinder Singh, Lawrence An, Kathy J. Goggin, Delwyn Catley. Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2017.
I am in a humanities-oriented communication studies program and teach media studies to MA and PhD students. Can you give any examples of small- or large-scale projects (or titles) that would use text mining as a method?
There are dozens of media-related studies discussed in the book. Here are a few examples:
Bail, Chris. (2012). “The Fringe Effect: Civil Society Organizations and the Evolution of Media Discourse about Islam since the September 11th Attacks.” American Sociological Review 77(6): 855-879.
Bastin, Gilles and Milan Bouchet-Valat. (2014). “Media Corpora, Text Mining, and the Sociological Imagination: A Free Software Text Mining Approach to the Framing of Julian Assange by Three News Agencies.” Bulletin de Méthodologie Sociologique 122: 5-25.
Bednarek, Monika and Helen Caple. (2014). “Why Do News Values Matter? Towards a New Methodological Framework for Analyzing News Discourse in Critical Discourse Analysis and Beyond.” Discourse & Society 25(2): 135-158.
Bickes, Hans, Otten, Tina and Laura Chelsea Weymann. (2014). “The Financial Crisis in the German and English Press: Metaphorical Structures in the Media Coverage on Greece, Spain and Italy.” Discourse & Society 25(4): 424-445.
Eshbaugh-Soha, Matthew. (2010). “The Tone of Local Presidential News Coverage.” Political Communication 27(2): 121-140.
Koenig, Thomas, Mihelj, Sabina, Downey, John and Mine Gencel-Bek. (2006). “Media Framings of the Issue of Turkish Accession to the EU.” Innovation: The European Journal of Social Science Research 19(2): 149-169.
Should text mining be inserted in applied research methods in public administration? What are the applications of text mining in public administration and nonprofit management?
In the book we cover a few studies that are indirectly related to public administration and nonprofit management. A better starting point might be Jason Anastasopoulos at the University of Georgia. Here is one study by Jason, Tima Moldogaziev, and Tyler Scott:
“Computational Text Analysis for Public Management Research: An Annotated Application to County Budgets” | https://scholar.harvard.edu/files/janastas/files/final-draft-2017-v4_tm_ts_la_edits.pdf
And here are a couple of possibly pertinent studies cited by Anastasopoulos, Moldogaziev and Scott:
Pandey, Sheela, Sanjay K. Pandey, and Larry Miller. 2017. “Measuring Innovativeness of Public Organizations: Using Natural Language Processing Techniques in Computer-Aided Textual Analysis.” International Public Management Journal 20 (1): 78–107.
Grimmelikhuijsen, Stephan, Lars Tummers, and Sanjay K. Pandey. 2017. “Promoting State-of-the-Art Methods in Public Management Research.” International Public Management Journal 20 (1): 7-13.
What tools (software) would you recommend?
The answer depends on the category of software tools: web scraping tools, QDAS (Qualitative Data Analysis Software), visualization software, data cleaning tools, or languages and environments for programming and statistics such as Python and R. We recommend looking at the appendices in An Introduction to Text Mining, as well as the online companion. Internet searches for specific types of software tools can be productive as well, if you know where to start your search.