# Discovering Statistics

## Problem with factors in line graphs (Chapter 4 of DSUR)

Viewing 2 posts - 1 through 2 (of 2 total)
• #589
Gavin Revie
Member

I’ve been working through Discovering Statistics Using R, using R version 3.2.0 run through the latest version of RStudio. I’ve noticed a few errors which I imagine you’re already aware of (e.g. there is no package called DSUR, the opts command no longer works, etc.). None of these caused me serious problems, since a little background reading allowed me to work around them (with the exception of the outlierSummary command :-()

Today I came across something that left me scratching my head though.  I’m in the section of chapter 4 dealing with line graphs.

I enclose my syntax below:

setwd("C:/Users/Gavin/Dropbox/R Stuff/Field Chapter 4")
library(ggplot2)
hiccupsData <- read.delim("Hiccups.dat", header = TRUE)
View(hiccupsData)
hiccups <- stack(hiccupsData)
names(hiccups) <- c("Hiccups", "Intervention")
hiccups$Intervention_Factor <- factor(hiccups$Intervention, levels = hiccups$Intervention)
summary(hiccups$Intervention_Factor)
summary(hiccups$Intervention)

If you run this syntax you’ll find that the console output looks like this:

> setwd("C:/Users/Gavin/Dropbox/R Stuff/Field Chapter 4")
> library(ggplot2)
> hiccupsData <- read.delim("Hiccups.dat", header = TRUE)
> View(hiccupsData)
> hiccups <- stack(hiccupsData)
> names(hiccups) <- c("Hiccups", "Intervention")
> hiccups$Intervention_Factor <- factor(hiccups$Intervention, levels = hiccups$Intervention)
Warning message:
In `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels) else paste0(labels, :
duplicated levels in factors are deprecated
> summary(hiccups$Intervention_Factor)
Baseline Baseline Baseline Baseline Baseline Baseline Baseline Baseline Baseline
15 0 0 0 0 0 0 0 0
Baseline Baseline Baseline Baseline Baseline Baseline Tongue Tongue Tongue
0 0 0 0 0 0 15 0 0
Tongue Tongue Tongue Tongue Tongue Tongue Tongue Tongue Tongue
0 0 0 0 0 0 0 0 0
Tongue Tongue Tongue Carotid Carotid Carotid Carotid Carotid Carotid
0 0 0 15 0 0 0 0 0
Carotid Carotid Carotid Carotid Carotid Carotid Carotid Carotid Carotid
0 0 0 0 0 0 0 0 0
Rectum Rectum Rectum Rectum Rectum Rectum Rectum Rectum Rectum
15 0 0 0 0 0 0 0 0
Rectum Rectum Rectum Rectum Rectum Rectum
0 0 0 0 0 0
Warning message:
In `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels) else paste0(labels, :
duplicated levels in factors are deprecated
> summary(hiccups$Intervention)
Baseline Carotid Rectum Tongue
15 15 15 15

What the summary command is telling me is that hiccups$Intervention_Factor has 60 levels rather than 4, and 56 of them have zero cases. The “Intervention” variable, by contrast, was never explicitly declared as a factor, yet R has correctly set it up as a factor with 4 levels. I spent a long time scratching my head trying to figure out why this was. Eventually I found a way of “fixing” this, either by creating the factor from scratch with the following commands (after pulling everything out of the dataframe into individual objects and putting it back again):

interventionfactor <- c(rep(1, 15), rep(2, 15), rep(3, 15), rep(4, 15))
interventionfactor <- factor(interventionfactor, levels = 1:4, labels = c("Baseline", "Tongue", "Carotid", "Rectum"))
hiccups <- data.frame(Participant = participant, Hiccups = hiccupsdata, Intervention = intervention, Intervention_Factor = interventionfactor)

Or much more simply by using the droplevels command to get rid of the “empty” levels as in here:

hiccups$Intervention_Factor <- droplevels(hiccups$Intervention_Factor)
summary(hiccups$Intervention_Factor)
summary(hiccups$Intervention)

If you do this you’ll find the factor variables have the correct number of levels:

> summary(hiccups$Intervention_Factor)
Baseline Tongue Carotid Rectum
15 15 15 15
> summary(hiccups$Intervention)
Baseline Carotid Rectum Tongue
15 15 15 15
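To illustrate what droplevels is doing here, the following is a minimal sketch on toy data rather than the actual Hiccups.dat file. The extra "Placebo" level is hypothetical and exists only to create an empty level; the vector and variable names are likewise made up for the example.

```r
# Toy stand-in for the intervention column (3 rows per condition, illustrative only).
intervention <- rep(c("Baseline", "Tongue", "Carotid", "Rectum"), each = 3)

# Declare one level ("Placebo", hypothetical) that never occurs in the data,
# so the factor carries an empty level.
f <- factor(intervention,
            levels = c("Baseline", "Tongue", "Carotid", "Rectum", "Placebo"))
nlevels(f)   # 5 levels, one of them with zero cases
table(f)

# droplevels() removes levels with no cases and keeps the others in order.
f_clean <- droplevels(f)
nlevels(f_clean)   # 4
levels(f_clean)
```

Note that droplevels only tidies up after the fact; it does not change how the factor was created.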

All this suffices as a workaround, but I still don’t understand why it happened in the first place. I’ve been advised in the past that it is best to use numeric codes to identify the levels of a factor in R, since this can avoid problems. Is this an example of why? Can anyone provide me with a single command for creating hiccups$Intervention_Factor that doesn’t result in 60 levels?

Thank you for any help you can provide.

#590
Gavin Revie
Member

After a little more reading I discovered that the following syntax works:

hiccups$Intervention_Factor <- factor(hiccups$Intervention, levels = c("Baseline", "Tongue", "Carotid", "Rectum"))

This creates a factor with only 4 levels without any further processing needed.
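As for why the original call misbehaved: levels = hiccups$Intervention hands factor() all 60 values of the column as candidate levels, repeats included, rather than the 4 distinct labels. The sketch below uses a toy stand-in for the stacked data (the hiccup counts are placeholders, not the book's data) and builds the level set with unique(), as one possible alternative to typing the labels out. As far as I know, newer R releases treat duplicated levels as an error rather than a warning, so the sketch does not reproduce the original call.

```r
# Toy stand-in for stack(hiccupsData): 4 conditions x 15 rows.
# The Hiccups values are placeholders, not the real data.
conditions <- c("Baseline", "Tongue", "Carotid", "Rectum")
hiccups <- data.frame(
  Hiccups = rep(0, 60),
  Intervention = factor(rep(conditions, each = 15))
)

# The whole column has 60 entries, but only 4 distinct labels.
length(hiccups$Intervention)                        # 60
length(unique(as.character(hiccups$Intervention)))  # 4

# Using the distinct labels, in order of first appearance, as the levels
# gives one level per condition.
hiccups$Intervention_Factor <- factor(
  hiccups$Intervention,
  levels = unique(as.character(hiccups$Intervention))
)
levels(hiccups$Intervention_Factor)
summary(hiccups$Intervention_Factor)
```

The unique() version relies on the rows already being in the desired order; spelling out the four labels explicitly, as in the command above, does not depend on row order.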
