I’ve been working through Discovering Statistics Using R. I’m using R version 3.2.0, run through the latest version of RStudio. I’ve noticed a few errors, which I imagine you’re already aware of (e.g. there is no package called DSUR, the opts command no longer works, etc.). None of these caused me serious problems, since a little background reading allowed me to work around them (with the exception of the outlierSummary command :-()
Today I came across something that left me scratching my head though. I’m in the section of chapter 4 dealing with line graphs.
I enclose my syntax below:
setwd("C:/Users/Gavin/Dropbox/R Stuff/Field Chapter 4")
library(ggplot2)
hiccupsData <- read.delim("Hiccups.dat", header = TRUE)
View(hiccupsData)
hiccups <- stack(hiccupsData)
names(hiccups) <- c("Hiccups", "Intervention")
hiccups$Intervention_Factor <- factor(hiccups$Intervention, levels = hiccups$Intervention)
summary(hiccups$Intervention_Factor)
summary(hiccups$Intervention)
If you run this syntax you’ll find that the console output looks like this:
> setwd("C:/Users/Gavin/Dropbox/R Stuff/Field Chapter 4")
> library(ggplot2)
> hiccupsData <- read.delim("Hiccups.dat", header = TRUE)
> View(hiccupsData)
> hiccups <- stack(hiccupsData)
> names(hiccups) <- c("Hiccups", "Intervention")
> hiccups$Intervention_Factor <- factor(hiccups$Intervention, levels = hiccups$Intervention)
Warning message:
In `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels) else paste0(labels, :
duplicated levels in factors are deprecated
> summary(hiccups$Intervention_Factor)
Baseline Baseline Baseline Baseline Baseline Baseline Baseline Baseline Baseline
15 0 0 0 0 0 0 0 0
Baseline Baseline Baseline Baseline Baseline Baseline Tongue Tongue Tongue
0 0 0 0 0 0 15 0 0
Tongue Tongue Tongue Tongue Tongue Tongue Tongue Tongue Tongue
0 0 0 0 0 0 0 0 0
Tongue Tongue Tongue Carotid Carotid Carotid Carotid Carotid Carotid
0 0 0 15 0 0 0 0 0
Carotid Carotid Carotid Carotid Carotid Carotid Carotid Carotid Carotid
0 0 0 0 0 0 0 0 0
Rectum Rectum Rectum Rectum Rectum Rectum Rectum Rectum Rectum
15 0 0 0 0 0 0 0 0
Rectum Rectum Rectum Rectum Rectum Rectum
0 0 0 0 0 0
Warning message:
In `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels) else paste0(labels, :
duplicated levels in factors are deprecated
> summary(hiccups$Intervention)
Baseline Carotid Rectum Tongue
15 15 15 15
What the summary command is telling me here is that hiccups$Intervention_Factor has 60 levels rather than 4, and that 56 of them have zero cases. The “Intervention” variable, on the other hand, has been set up correctly as a factor with 4 levels, even though we never told R to make it a factor. I spent a long time scratching my head trying to figure out why this was. Eventually I found a way of “fixing” this, either by pulling everything out of the data frame into individual objects, creating the factor from scratch with the following commands, and putting it all back again:
interventionfactor <- c(rep(1,15), rep(2,15), rep(3,15), rep(4,15))
interventionfactor <- factor(interventionfactor, levels = c(1:4), labels = c("Baseline", "Tongue", "Carotid", "Rectum"))
hiccups <- data.frame(Participant = participant, Hiccups = hiccupsdata, Intervention = intervention, Intervention_Factor = interventionfactor)
Or, much more simply, by using the droplevels command to get rid of the “empty” levels, like this:
hiccups$Intervention_Factor <- droplevels(hiccups$Intervention_Factor)
summary(hiccups$Intervention_Factor)
summary(hiccups$Intervention)
If you do this you’ll find the factor variables have the correct number of levels:
> summary(hiccups$Intervention_Factor)
Baseline Tongue Carotid Rectum
15 15 15 15
> summary(hiccups$Intervention)
Baseline Carotid Rectum Tongue
15 15 15 15
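For anyone who wants to see what droplevels does without having Hiccups.dat to hand, here is a minimal sketch. The data are made up for illustration (the vector intervention and the extra level "Unused" are my own inventions, not anything from the book), and the sketch only shows the general behaviour of dropping levels with zero cases, not the exact 60-level situation above.

```r
# Toy data, made up for illustration (not the Hiccups.dat values):
# four interventions, 15 cases each
intervention <- rep(c("Baseline", "Tongue", "Carotid", "Rectum"), each = 15)

# Declare one more level than actually occurs; "Unused" is hypothetical
f <- factor(intervention,
            levels = c("Baseline", "Tongue", "Carotid", "Rectum", "Unused"))
nlevels(f)   # 5 levels, one of them with zero cases
table(f)

# droplevels() removes the levels that have no cases and keeps
# the remaining levels in their existing order
f2 <- droplevels(f)
nlevels(f2)  # 4 levels
levels(f2)
```

This would also seem to account for the two summaries above listing the levels in different orders: droplevels keeps the order the levels already had, rather than sorting them.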
All this suffices as a workaround, but I still don’t understand why it happened in the first place. I’ve been advised in the past that it is best to use numeric variables to identify the levels of a factor in R since it can avoid problems. Is this an example of why? Can anyone provide me with a single command for creating the hiccups$Intervention_Factor which doesn’t result in 60 levels?
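One possible lead, offered as a guess rather than a confirmed answer: the levels argument of factor appears to expect the set of distinct values, whereas my command handed it the whole 60-row column. The sketch below uses a made-up character vector in place of hiccups$Intervention (so the name intervention and its contents are assumptions), and tries two candidate single commands that pass only the four distinct values.

```r
# Stand-in for hiccups$Intervention; made up for illustration
intervention <- rep(c("Baseline", "Tongue", "Carotid", "Rectum"), each = 15)

# The whole column has 60 entries but only 4 distinct values
length(intervention)
length(unique(intervention))

# Candidate 1: take the levels from the distinct values,
# in order of first appearance
f_unique <- factor(intervention, levels = unique(intervention))

# Candidate 2: spell the levels out explicitly in the desired order
f_explicit <- factor(intervention,
                     levels = c("Baseline", "Tongue", "Carotid", "Rectum"))

summary(f_unique)
summary(f_explicit)
```

If this reasoning is right, the equivalent for my data frame would presumably be factor(hiccups$Intervention, levels = unique(hiccups$Intervention)), though I would welcome confirmation.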
Thank you for any help you can provide.