Text mining refers to digital social research methods that involve the collection and analysis of unstructured textual data, generally from internet-based sources such as social media and digital archives.

In the webinar presented below, Gabe Ignatow and Rada Mihalcea discuss the fundamentals of text mining for social scientists, covering research design, research ethics, natural language processing, the intersection of text mining and text analysis, and tips on teaching text mining to social science students. The pair have written two books on text mining for SAGE: this year’s An Introduction to Text Mining and 2017’s Text Mining.

Ignatow is an associate professor of sociology at the University of North Texas, where he has taught since 2007. His research interests are in the areas of sociological theory, text mining and analysis methods, new media, and information policy. His current research involves working with computer scientists and statisticians to adapt text mining and topic modeling techniques for social science applications. Ignatow has been working with mixed methods of text analysis since the 1990s.

Mihalcea is a professor of computer science and engineering at the University of Michigan. Her research interests are in computational linguistics, with a focus on lexical semantics, multilingual natural language processing, and computational social sciences. She was a general chair for the Conference of the North American Chapter of the Association for Computational Linguistics and a program cochair for the Conference of the Association for Computational Linguistics (2011) and the Conference on Empirical Methods in Natural Language Processing (2009).

After the webinar, Ignatow and Mihalcea teamed up once again to answer some of the questions they didn’t get to during the hour-long session. Their answers can be found beneath the recording.

Are you familiar with any text mining research on education or the experiences of teachers?

There are a few school- and teaching-related studies covered in our books. I’d look up Anne O’Keeffe at the University of Limerick. She published a co-authored 2007 book with Cambridge, and she and Steve Walsh published a 2012 article that may be of interest. Walsh has a 2002 article and a 2006 book. All of these studies use corpus linguistic tools rather than text mining per se, but there is major overlap.

See also work on exploring academic performance with text mining.

Is text mining an appropriate approach to use to analyze recorded interviews?

Yes. Researchers working in many diverse areas have used text mining and text analysis tools on transcribed recorded interviews. Here are a few examples.

I will be teaching a master’s level course in technical communication this summer. The students are working professionals, and I want them to build a study that reveals something important about their workplaces. Any ideas? The constraints are a workplace setting and a short 10-week term.

Tons of ideas…too many to share in a blog post. A starting point might be the syllabus for Gabe’s hybrid grad text mining course:

As we mentioned in the webinar, we recommend a flexible course design that allows students to learn the basics of text mining and NLP, become familiar with multiple text mining approaches, and then choose an approach and develop their own project with regular feedback from the instructor and classmates.

Do you have any examples of teaching this in healthcare?

There are plenty of health care-related studies that use text mining tools. Here are a few examples:

There is also related work on analyzing counseling conversations; see, for instance, work from our own research:

I am in a humanities-oriented communication studies program, and teach media studies to MA and PhD students – can you give any examples of small- or large-scale projects (or titles) that would be based in text mining as a method?

There are dozens of media-related studies discussed in the book. Here are a few examples:

Should text mining be inserted in applied research methods in public administration? What are the applications of text mining in public administration and nonprofit management? 

In the book we cover a few studies that are indirectly related to public administration and nonprofit management. A better starting point might be Jason Anastasopoulos at the University of Georgia. Here is one study by Jason, Tima Moldogaziev, and Tyler Scott:

“Computational Text Analysis for Public Management Research: An Annotated Application to County Budgets”

And here are a couple of possibly pertinent studies cited by Anastasopoulos, Moldogaziev and Scott:

What tools (software) would you recommend?

The answer depends on the category of software tools: web scraping tools, qualitative data analysis software (QDAS), visualization software, data cleaning tools, or languages and environments for programming and statistics such as Python and R. We recommend looking at the appendices in An Introduction to Text Mining, as well as the online companion. Internet searches for specific types of software tools can be productive as well, if you know where to start your search.
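
For readers who choose the programming route, here is a minimal sketch of what a first step in Python might look like: basic cleaning followed by a word-frequency count. It is an illustration only, not a method taken from the books or the webinar. The stopword list, the function names (tokenize, top_terms), and the sample sentences are all assumptions made up for this example, and it uses only the Python standard library.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for illustration;
# a real project would use a fuller list from an NLP library.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it",
             "we", "our", "i", "my"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def top_terms(documents, n=5):
    """Count non-stopword tokens across all documents and return the n most common."""
    counts = Counter()
    for doc in documents:
        counts.update(t for t in tokenize(doc) if t not in STOPWORDS)
    return counts.most_common(n)


# Invented sample responses standing in for collected text data.
sample_docs = [
    "The workload in our department keeps growing.",
    "My workload is fine, but meetings take too long.",
    "Meetings and email take most of the day.",
]

print(top_terms(sample_docs, n=3))
```

Even a simple count like this can suggest which themes recur in a small collection of texts. Dedicated libraries in Python or R handle tokenization, stopwords, and more advanced analyses such as topic modeling far more robustly.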

