What’s afoot in the Qualitative AI space?

Data Analysis

May 9

by Christina Silver, Ph.D.

Christina has been using and teaching qualitative data analysis software packages for over 20 years. She has conducted many different research projects, big and small, with and without the use of qualitative software – including her own studies and many other academic and applied research projects. As well as being director of Qualitative Data Analysis Services Ltd, she leads and directs the CAQDAS Networking Project at the University of Surrey.

This post originally appeared on her blog.

There’s a lot of discussion happening right now about the potential implications of advances in AI for qualitative data analysis. The cynical might even say there’s a whole bunch of jumping on the bandwagon going on, heralding shortcuts to analysis (mainly around time-saving) and new “insights” supposedly afforded by these technologies. Others have their literal and metaphorical arms in the air in horror at the idea of computer assistance for qualitative analysis in this form, aghast at the idea of Qualitative-AI and its potential to harm our “craft”. As is often the case, it’s the more measured middle ground where the most meaningful and useful debates are to be had.

AI is here to stay and clearly has an impact on the practice of qualitative data analysis, but before the qualitative research community dismisses it out of hand or embraces it without critical thinking, let’s remember and reflect on two things:

1. AI, and in particular machine learning, likely the most useful form for qualitative data analysis in my opinion, is actually not new in the CAQDAS-field;
2. the way AI is viewed and the uses to which it can usefully be put are contingent on several factors, not least methodological paradigm, amounts and kinds of qualitative materials being analysed, and the purposes to which the AI tools are put within our workflow.

This post looks briefly at the history of AI in qualitative software (CAQDAS-packages), giving a quick overview of options and the topics being discussed right now in this space. I’ll make some comments about point 2. in a later post - The methodological piece: the role of AI in qualitative analysis – coming soon.

It’s not new, folks

When you look at the trajectory of the Computer Assisted Qualitative Data AnalysiS (CAQDAS) field, AI actually has a relatively long history. The first qualitative software programs were available from the late 1980s; more than 30 years ago now. Debate about the use of computers in qualitative analysis has ensued ever since the beginning, so the discussions we’re now having are themselves not new (this article by Kristi Jackson, Trena Paulus & Nick Woolf is a good one to read if you’re interested in these debates and how unsubstantiated criticisms of CAQDAS are propagated).

Qualrus: the first ‘intelligent CAQDAS’ (no longer available)

The first CAQDAS-package to include real assistance - i.e. beyond searching for words/phrases and auto-coding the hits - came around 15 years into the CAQDAS-history, with Qualrus, developed by Prof Ed Brent (University of Missouri and Ideaworks Inc.) and available from 2002. Although no longer available, this CAQDAS-package used case-based reasoning, natural language understanding, machine learning and semantic networks to suggest codes based on patterns in qualitative data. The user could accept or reject these suggestions as appropriate, and the program learnt from those decisions, using them to inform its subsequent suggestions.
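
To make the accept/reject idea concrete, here is a minimal sketch in Python. It is not how Qualrus worked internally (that relied on case-based reasoning and semantic networks); it is a toy word-weighting scheme, and the class name `CodeSuggester` and its methods are hypothetical names chosen purely for illustration.

```python
from collections import defaultdict


class CodeSuggester:
    """Toy suggester (hypothetical): learns word-to-code weights from user decisions."""

    def __init__(self):
        # weights[code][word] is a running score; unseen pairs start at 0.0
        self.weights = defaultdict(lambda: defaultdict(float))

    @staticmethod
    def _words(segment):
        return set(segment.lower().split())

    def suggest(self, segment, threshold=0.0):
        """Return codes whose total weight for this segment exceeds the threshold."""
        words = self._words(segment)
        scores = {
            code: sum(word_weights.get(word, 0.0) for word in words)
            for code, word_weights in self.weights.items()
        }
        ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
        return [code for code, score in ranked if score > threshold]

    def feedback(self, segment, code, accepted):
        """Strengthen (accept) or weaken (reject) the link between words and a code."""
        delta = 1.0 if accepted else -1.0
        for word in self._words(segment):
            self.weights[code][word] += delta


suggester = CodeSuggester()
suggester.feedback("I felt isolated working from home", "isolation", accepted=True)
suggester.feedback("working from home saved commuting time", "isolation", accepted=False)
print(suggester.suggest("she felt isolated after the move"))
```

The point of the sketch is the loop itself: every decision the researcher makes changes what is suggested next, so the human stays in charge of what counts as a good code.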

If you want to know more about what Qualrus did, you can check out the review of Qualrus that Ann Lewins and I wrote back in 2010 for the CAQDAS Networking Project.

Discovertext: balancing what humans and computers do best (free for academics)

Discovertext, an online app developed by Dr. Stuart Shulman and colleagues under the auspices of U.S. National Science Foundation-funded academic research, has been balancing human interpretation and the power of machine learning since 2009, almost fifteen years now. It developed out of the Coding Analysis Toolkit (CAT), a tool for qualitative measurement and adjudication that was available from 2007-2020. Amongst the key sets of AI tools embedded in Discovertext are automatic duplicate detection and near-duplicate clustering (analogous to plagiarism detection), and machine-learning coding tools. The latter start from initial human coding (undertaken collaboratively by ‘peers’) that is adjudicated by humans and then used to train a machine classifier. The classifier scores the likelihood that additional data falls into each category, and the data are then coded on that basis.
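
The human-coding-then-classifier workflow can be sketched in a few lines of Python. This is an assumption-laden illustration, not Discovertext's own classifier: I've used a small Naive Bayes model, and the function names `train` and `score` and the example data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train(coded_items):
    """coded_items: (text, code) pairs that human coders have agreed on."""
    word_counts = defaultdict(Counter)  # code -> word frequencies
    code_counts = Counter()             # code -> number of training items
    vocabulary = set()
    for text, code in coded_items:
        words = text.lower().split()
        word_counts[code].update(words)
        code_counts[code] += 1
        vocabulary.update(words)
    return word_counts, code_counts, vocabulary


def score(text, model):
    """Return {code: likelihood} for one uncoded item; the values sum to 1."""
    word_counts, code_counts, vocabulary = model
    total_items = sum(code_counts.values())
    log_scores = {}
    for code in code_counts:
        total_words = sum(word_counts[code].values())
        log_p = math.log(code_counts[code] / total_items)
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # skip words never seen in the human-coded data
            # add-one smoothing so an unseen word/code pair never scores zero
            log_p += math.log(
                (word_counts[code][word] + 1) / (total_words + len(vocabulary))
            )
        log_scores[code] = log_p
    highest = max(log_scores.values())
    unnormalised = {code: math.exp(s - highest) for code, s in log_scores.items()}
    total = sum(unnormalised.values())
    return {code: value / total for code, value in unnormalised.items()}


adjudicated = [
    ("the new policy is unfair and harmful", "oppose"),
    ("this rule will hurt small farms", "oppose"),
    ("i support the new policy", "support"),
    ("great rule that will help farms", "support"),
]
model = train(adjudicated)
likelihoods = score("this policy is unfair", model)
best_code = max(likelihoods, key=likelihoods.get)
print(best_code, round(likelihoods[best_code], 2))
```

In practice a researcher would only auto-code items whose likelihood clears a threshold they are comfortable with, and send the rest back for human review.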

If you want to know more about Discovertext, you can check out:

the Discovertext website (where you can access it for free if you’re an academic),
the review of Discovertext I wrote in 2019 for the CAQDAS Networking Project,
the #CAQDASchat podcast episode I did with Stu Shulman recently where he discusses the tool and describes its core features,
the scholarly and other published mentions of Discovertext where you can see how others have been using it, and
a webinar Stuart Shulman gave for the CAQDAS Networking Project on humans and machines learning together.

Provalis Research tools (WordStat and QDA Miner)

Provalis Research develops several analytic products, among them QDA Miner and WordStat, which provide tools for the qualitative analysis and data mining of textual material. These tools include both unsupervised machine learning models, such as topic extraction using clustering (available in WordStat since 1999), clustered coding (available in QDA Miner since 2011) and topic modelling (available in WordStat since 2014), and supervised machine learning models, such as automatic document classification (available in WordStat since 2005), query-by-example (available in QDA Miner since 2007) and code similarity searching (available in QDA Miner since 2011).
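
As a rough illustration of the query-by-example idea (give the software a passage and ask it to find similar ones), here is a minimal Python sketch. It assumes a plain bag-of-words cosine similarity, which is my simplification and not a description of what QDA Miner does under the hood; the function names are hypothetical.

```python
import math
from collections import Counter


def vectorise(text):
    """Represent a segment as word counts (a simple bag of words)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def query_by_example(example, segments, top_n=1):
    """Rank segments by similarity to the example and return the closest ones."""
    example_vector = vectorise(example)
    ranked = sorted(
        segments,
        key=lambda segment: cosine(example_vector, vectorise(segment)),
        reverse=True,
    )
    return ranked[:top_n]


segments = [
    "waiting times at the clinic were long",
    "the clinic staff were friendly",
    "parking was expensive",
]
matches = query_by_example("long waiting times", segments)
print(matches)
```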

If you want to know more about these tools, you can check out:

The Provalis website (where a search for ‘machine learning’ will bring up several resources)
The recording of a webinar presented by Provalis Research CEO Normand Péladeau on Automatic Document Classification
A webinar Normand Péladeau gave for the CAQDAS Networking Project on The black box of sentiment analysis: What's in it, and how to do it better.
A review of QDA Miner that Ann Lewins and I wrote for the CAQDAS Networking Project in 2020

Leximancer

Developed by Andrew Smith and Michael Humphreys in 2000, Leximancer includes unsupervised machine learning tools for automatic content analysis that generate concept models of textual material, presented in visualisations of categories and relationships. The language models used are based on the data being analysed and, using semi-supervised learning, can be trained with user-specified variables.

If you want to know more about these tools check out:

A review of Leximancer that Ann Lewins and I wrote for the CAQDAS Networking Project in 2020
A webinar given by Andrew Smith for the CAQDAS Networking Project on managing patterns of meaning latent in text using Leximancer
This article about unsupervised semantic mapping of natural language with Leximancer concept mapping, by Andrew Smith and Michael Humphreys
The Leximancer website

Other forms of assistance for qualitative data analysis: automated transcription

It’s also worth mentioning that AI has been infiltrating the qualitative research field in the form of automated transcription for several years, and the Covid-pandemic accelerated its normalisation as qualitative researchers, many of whom may previously have balked at the idea of gathering qualitative data online, were forced to do so. Suddenly it became the norm to interview participants and conduct focus-group discussions via video-conferencing platforms such as Zoom, Microsoft Teams and Google Hangouts. With the recordings came the automated transcripts, and their use began to become more widely accepted in our community of practice. Note also that some CAQDAS packages had previously developed AI-driven automated transcription tools, such as NVivo Transcription and the more recently developed Quirkos Transcribe.

Newer Qual-AI tools

Okay, so now we’ve established that computer-assistance in qualitative analysis isn’t actually “new”, what’s causing all the hullabaloo right now? It seems to me there are three genres of newer Qual-AI tools on the deck that qualitative researchers are variously embracing, exploring and resisting:

using chatbot tools like ChatGPT alongside CAQDAS-packages to facilitate different aspects of qualitative data analysis;
the integration of these newer AI capabilities into existing CAQDAS-packages; and
the development of new apps designed specifically to harness new AI capabilities for qualitative analysis.

Let’s check each out…

ChatGPT in qualitative analysis

The release of OpenAI’s ChatGPT is what prompted much of the hullabaloo we’re seeing in the qualitative community; it is what many researchers are responding to – some in delight, some in interest, some in despair! Many of us have experimented with and commented on the use of ChatGPT alongside CAQDAS-packages, coming to different conclusions…here are three…

Philip Adu discusses using ChatGPT as a data summarization tool, advocating this when working with large amounts of data that it is “not humanly possible” to analyse. The example Philip gives in this presentation is extracting transcripts from YouTube and having ChatGPT summarize the content as a precursor to analysis that is then undertaken using NVivo. Summaries of different lengths are generated by ChatGPT using prompts such as ‘Summarize the following’ (producing a short paragraph) and ‘Continue with 1000 words of summary’ to generate a more detailed summary (see circa 30 minutes into the presentation).
Andreas Muller discusses in this video using ChatGPT to summarize coded data, generate ideas for developing a coding framework and define codes, in addition to summarizing data itself. He shows these uses in combination with MAXQDA, including prompts for using ChatGPT to summarize an interview transcript, summarize coded data segments relating to a topic, suggest potential codes from qualitative material and generate summaries of coded segments that can be used as code definitions (including examples).
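
To give a feel for the prompt-based workflow described above, here is a minimal Python sketch that splits a long transcript into chunks and wraps each one in the ‘Summarize the following’ prompt. The chunk size and function names are my own assumptions, and actually sending the prompts to a chatbot is deliberately left out.

```python
def chunk_words(text, max_words=500):
    """Split a transcript into consecutive chunks of at most max_words words."""
    words = text.split()
    return [
        " ".join(words[start:start + max_words])
        for start in range(0, len(words), max_words)
    ]


def build_summary_prompts(transcript, max_words=500):
    """Wrap each chunk in the short summarization prompt quoted in the text."""
    return [
        f"Summarize the following:\n\n{chunk}"
        for chunk in chunk_words(transcript, max_words)
    ]


# A stand-in transcript of 1,200 words, purely for demonstration.
transcript = " ".join(f"word{i}" for i in range(1200))
prompts = build_summary_prompts(transcript)
print(len(prompts))
```

Chunking like this is only one way to cope with long material; the summaries still need to be read critically against the original data.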

The integration of OpenAI into existing CAQDAS-packages

We’ve recently seen two of the pioneer CAQDAS-packages integrate OpenAI tools into their products (subsequent posts will look at these in more detail):

ATLAS.ti has recently released a beta version of its OpenAI-powered “open coding” feature that will automatically suggest codes for, and code, selected textual transcripts. The user is presented with the suggested codes and has the option to accept or reject them (and also to add their own codes to quotations) before choosing to implement the coding. The coding happens at the level of paragraphs (from which the coded quotations are created).
MAXQDA has also recently released a beta version of its “AI Assist” tool, which offers the creation of different levels of summary (standard, shorter or text in bullet points) based on segments of text that have already been coded. These summaries can be added to Code Memos or generated within Summary Grids and then used in the same way as any summaries of coded data that researchers had written themselves. The MAXQDA AI Assist tool is also powered by OpenAI.

CoLoop - a new app designed for Qualitative-AI

In addition, new apps are emerging that are designed specifically to facilitate qualitative analysis using OpenAI tools. One example, currently in beta, is CoLoop, developed by Genei.io (keep an eye out for a subsequent post from me detailing more about this tool).

CoLoop is an “AI Copilot” for qualitative research that allows the user to ask questions of the data via prompts in a “chat” function. This can be used to summarize textual material that you upload into the system (e.g. transcripts of interviews and focus-groups or other textual material). The supporting evidence – i.e. the qualitative data that the summaries are based on – is accessible and navigable within the system. In addition, users specify at the outset what the project objectives are, and can optionally upload an overview, interview questions etc. that are used to teach the system what you’re trying to achieve with the analysis. This is used by CoLoop to generate an editable project description. Speakers labelled in a customary way are automatically identified, so you can ask questions about what certain speakers say about a topic in the transcripts.

Once data are uploaded, CoLoop will automatically generate codes based on the content of the uploaded material and allow you to review them. You can also add your own codes to the system. This produces a high-level overview of the material, which can be refined by using prompts to ask specific questions (a series of suggestions is provided based on the content of the material uploaded). A series of separate ‘conversations’ can be held in CoLoop without losing earlier content, by resetting the chat, which prevents getting stuck in loops. CoLoop also includes AI-generated transcription, allowing audio files to be uploaded and transcribed, and an AI-generated ‘analysis grid’ that allows the overviews and verbatim segments to be compared across speakers.

The key difference for qualitative data analysis between CoLoop and the use of OpenAI-powered tools like ChatGPT is that what CoLoop generates is based on the transcripts you upload, rather than on understanding generated from online sources. In addition, the algorithm has been instructed against hypothesising or offering suggestions unless explicitly instructed, and all information is referenced to its sources in the transcripts. Therefore, where there is no relevant answer in the transcripts to a prompt given by the researcher, CoLoop will say so.
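
The principle of answering only from the uploaded material, with references, and saying so when nothing relevant is found, can be sketched in Python as below. This is emphatically not CoLoop's implementation (which uses OpenAI tools); it is a toy keyword-overlap search, and the function names, stopword list and example transcripts are all hypothetical.

```python
import re

# A tiny, hypothetical stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "of", "about", "what", "do", "did", "say", "is", "to", "and", "in"}


def keywords(text):
    """Lower-case the text and keep the words that are not stopwords."""
    return {w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS}


def find_evidence(question, transcripts, min_overlap=2):
    """Return (source, segment) pairs sharing enough keywords with the question."""
    wanted = keywords(question)
    evidence = []
    for source, segments in transcripts.items():
        for segment in segments:
            if len(wanted & keywords(segment)) >= min_overlap:
                evidence.append((source, segment))
    return evidence


def respond(question, transcripts):
    """Answer only with referenced transcript segments, or say nothing was found."""
    evidence = find_evidence(question, transcripts)
    if not evidence:
        return "No relevant answer found in the transcripts."
    return "\n".join(f"[{source}] {segment}" for source, segment in evidence)


transcripts = {
    "interview_01": ["Remote work made childcare easier for me.", "I miss the office coffee."],
    "interview_02": ["Childcare costs went up when remote work ended."],
}
print(respond("What do participants say about remote work and childcare?", transcripts))
print(respond("What do participants say about pensions?", transcripts))
```

Whatever the underlying technology, the design choice is the same: every statement can be traced back to a named source, and the absence of evidence is reported rather than papered over.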

There’s much afoot

So it’s clear there’s a lot going on in the Qual-AI space – and there has been for a while. You might not have been aware of the use of AI in CAQDAS before the more recent developments, and if you weren’t then it’s definitely worth exploring them in more depth – as well as those newer ones on the block in the past few months.
