The validity problem with automated content analysis

By Dr. Chung-hong Chan, Fellow at the Mannheim Centre for European Social Research (MZES)

There’s a validity problem with automated content analysis tools. For years, researchers have recognized the need for models developed for one specific context to undergo a process of revalidation before they are applied to other studies. Yet the question of how to do this has until now remained unanswered, meaning that many papers that use these tools are published without such revalidation, and thus are probably invalid. 

Now, a solution is in sight. A new R package - oolong - provides a set of simple and standardized tests for frequently used text analytic tools. In this blog post, the creator of oolong, Dr. Chung-hong Chan, gives an overview of the tool's capabilities, with examples of validity tests you can apply to your research right away.


Content analysis is a summarizing, quantitative analysis of messages that relies on the scientific method (including attention to objectivity-intersubjectivity, a priori design, reliability, validity, generalizability, replicability, and hypothesis testing) and is not limited as to the types of variables that may be measured or the context in which the messages are created or presented.
— Neuendorf (2002) - The Content Analysis Guidebook

Just as an automated teller machine (ATM) is still a teller machine, an automated content analysis is still a content analysis. Content analysis always involves annotation of materials (coding). In a traditional content analysis, this annotation is done manually. In an automated content analysis, as you can guess, this step is automated. This is usually done using topic modeling or off-the-shelf dictionaries (e.g. the hugely popular Linguistic Inquiry and Word Count). Although these methods are popular, their validity has been called into question, and many have called for researchers to properly validate them for their own applications.

But how can we go about validating these methods? The most important part of the process is semantic validation: checking whether the model's results make semantic sense. Since the human brain is still the best device we have for deciphering meaning from text, semantic validation boils down to comparing a model's results with human annotations.

I developed the R package oolong to make it easy to conduct semantic validation of automated content analysis. R is not intended to be an annotation interface; as its official website states: “R is a free software environment for statistical computing and graphics”. One of the main innovations of oolong is to integrate the creation, administration, and analysis of human-in-the-loop tests.

Validating topic models

Suppose you have a topic model named abstracts_stm. With oolong, the whole procedure of creating, administering, and analyzing a validation test can be completed with only four lines of code:

```r
## Create a word intrusion test from the fitted topic model
oolong_test <- create_oolong(abstracts_stm)

## Administer the test to a human rater in an interactive interface
oolong_test$do_word_intrusion_test()

## Lock the test so the answers can no longer be changed
oolong_test$lock()

## Print the object to see the result
oolong_test
```

This test, called a “word intrusion test”, is administered through an easy-to-use graphical user interface:

Fig. 1: Word intrusion test interface in oolong

These words are generated from one topic of the topic model abstracts_stm. Their meanings should be very similar; however, one random word (the intruder) has been inserted into the set. The goal of this test is for human raters to pick out the intruder (e.g. “coverag”). If the topic model makes semantic sense, our raters should be able to do so. The likelihood of our human raters identifying these intruders can be quantified as the precision of the model: the higher the precision, the more semantic sense the model makes.
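In practice you will want more than one human rater. Here is a minimal sketch of how that might look, assuming the package's clone_oolong() and summarize_oolong() helpers and two hypothetical raters:

```r
## Create one test and make an independent copy for a second (hypothetical) rater
oolong_rater1 <- create_oolong(abstracts_stm)
oolong_rater2 <- clone_oolong(oolong_rater1)

## Each rater completes and locks their own copy
oolong_rater1$do_word_intrusion_test()
oolong_rater1$lock()

oolong_rater2$do_word_intrusion_test()
oolong_rater2$lock()

## Pool the locked tests to get precision and inter-rater agreement
summarize_oolong(oolong_rater1, oolong_rater2)
```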

Oolong also supports a “topic intrusion test”. For more details, please refer to the overview of the package.
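For illustration, here is a minimal sketch of how such a test might be set up, assuming the corpus the model was fitted on is available as abstracts$text (as in the package's example data):

```r
## A topic intrusion test also needs the original documents, so that raters
## can judge which of the displayed topics does not fit a given document
oolong_test <- create_oolong(input_model = abstracts_stm,
                             input_corpus = abstracts$text)
oolong_test$do_topic_intrusion_test()
oolong_test$lock()
oolong_test
```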

Validating dictionary-based methods

Suppose you want to annotate the sentiment of tweets in a dataset called trump2k. It is as simple as:

```r
## Create a gold standard test for the construct "positive"
oolong_test <- create_oolong(input_corpus = trump2k, construct = "positive")

## Ask a human rater to score each tweet in an interactive interface
oolong_test$do_gold_standard_test()

## Lock the test once the annotation is complete
oolong_test$lock()

## Print the object to inspect the test
oolong_test
```

The test interface in oolong looks like this:

Fig. 2: Sentiment annotation test in oolong

In this case, our human raters pick a score from 1 to 5 to denote how positive each tweet is. After the human annotation is complete, oolong can generate a diagnostic figure like so:

Fig. 3: Graphs showing distribution of sentiment annotations in oolong

This figure shows how well the target value (in this case, the sentiment score generated from AFINN) correlates with human judgement of sentiment. The figure also displays other information about the validity and reliability of the dictionary-based method. Once again, for more information, please refer to the overview of the package.
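For reference, here is a sketch of the post-annotation workflow that produces such a figure, assuming the package's turn_gold() and summarize_oolong() helpers; score_with_afinn() is a hypothetical stand-in for whatever pipeline you use to apply your dictionary (e.g. AFINN via quanteda or tidytext):

```r
## Extract the human-annotated tweets as a gold-standard corpus
gold_standard <- oolong_test$turn_gold()

## Hypothetical scoring step: replace score_with_afinn() with your own
## dictionary pipeline returning one sentiment score per tweet
target_value <- score_with_afinn(gold_standard)

## Compare the dictionary scores with the human judgement and plot diagnostics
res <- summarize_oolong(oolong_test, target_value = target_value)
plot(res)
```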

Further development

Oolong has developed a lot over the last few months. For example, it now supports all topic model packages in the R ecosystem (topicmodels, stm, text2vec, BTM, keyATM, etc.). We are also experimenting with a “two-finger” keyboard interface for super speedy human annotation of text. In the video below, a human rater is annotating a large collection of texts for their relevance to the research question. She can annotate 5 articles in 20 seconds: all she needs to do is read each article and press `q` or `w` on her keyboard.

With support from the SAGE Concept Grant, we will extend oolong to support large-scale annotation over the internet. For example, we are developing an internet-deployable version of oolong, so that human raters don't need to install R on their computers. We are also considering the use of crowd-sourcing platforms such as Prolific.

 

Oolong is available at: https://github.com/chainsawriot/oolong

Find out more about the SAGE Concept Grants here.

 
