Discovering Statistics

Problem with factors in line graphs (Chapter 4 of DSUR)

  • #589
    Gavin Revie
    Member

    I’ve been working through Discovering Statistics Using R. I’m using R version 3.2.0, run through the latest version of RStudio. I’ve noticed a few errors which I imagine you’re already aware of (e.g. there is no package called DSUR, the opts command no longer works, etc.). None of these caused me serious problems, since a little background reading allowed me to work around them (with the exception of the outlierSummary command :-()

    Today I came across something that left me scratching my head though.  I’m in the section of chapter 4 dealing with line graphs.

    I enclose my syntax below:

    setwd("C:/Users/Gavin/Dropbox/R Stuff/Field Chapter 4")
    library(ggplot2)
    hiccupsData <- read.delim("Hiccups.dat", header = TRUE)
    View(hiccupsData)
    hiccups <- stack(hiccupsData)
    names(hiccups) <- c("Hiccups", "Intervention")
    hiccups$Intervention_Factor <- factor(hiccups$Intervention, levels = hiccups$Intervention)
    summary(hiccups$Intervention_Factor)
    summary(hiccups$Intervention)

    If you run this syntax you’ll find that the console output looks like this:

    > setwd("C:/Users/Gavin/Dropbox/R Stuff/Field Chapter 4")
    > library(ggplot2)
    > hiccupsData <- read.delim("Hiccups.dat", header = TRUE)
    > View(hiccupsData)
    > hiccups <- stack(hiccupsData)
    > names(hiccups) <- c("Hiccups", "Intervention")
    > hiccups$Intervention_Factor <- factor(hiccups$Intervention, levels = hiccups$Intervention)
    Warning message:
    In `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels) else paste0(labels, :
    duplicated levels in factors are deprecated
    > summary(hiccups$Intervention_Factor)
    Baseline Baseline Baseline Baseline Baseline Baseline Baseline Baseline Baseline
    15 0 0 0 0 0 0 0 0
    Baseline Baseline Baseline Baseline Baseline Baseline Tongue Tongue Tongue
    0 0 0 0 0 0 15 0 0
    Tongue Tongue Tongue Tongue Tongue Tongue Tongue Tongue Tongue
    0 0 0 0 0 0 0 0 0
    Tongue Tongue Tongue Carotid Carotid Carotid Carotid Carotid Carotid
    0 0 0 15 0 0 0 0 0
    Carotid Carotid Carotid Carotid Carotid Carotid Carotid Carotid Carotid
    0 0 0 0 0 0 0 0 0
    Rectum Rectum Rectum Rectum Rectum Rectum Rectum Rectum Rectum
    15 0 0 0 0 0 0 0 0
    Rectum Rectum Rectum Rectum Rectum Rectum
    0 0 0 0 0 0
    Warning message:
    In `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels) else paste0(labels, :
    duplicated levels in factors are deprecated
    > summary(hiccups$Intervention)
    Baseline Carotid Rectum Tongue
    15 15 15 15

    What the summary command is telling me here is that hiccups$Intervention_Factor has 60 levels rather than 4, and 56 of them have zero cases. The “Intervention” variable, by contrast, has been correctly set up as a factor with 4 levels even though we never told R to make it a factor. I spent a long time trying to figure out why this was. Eventually I found a way of “fixing” this. One option is to pull everything out of the dataframe into individual objects, create the factor from scratch using the following commands, and put it all back together again:

    interventionfactor <- c(rep(1,15), rep(2,15), rep(3,15), rep(4,15))
    interventionfactor <- factor(interventionfactor, levels = c(1:4), labels = c("Baseline", "Tongue", "Carotid", "Rectum"))
    hiccups <- data.frame(Participant = participant, Hiccups = hiccupsdata, Intervention = intervention, Intervention_Factor = interventionfactor)

    Or, much more simply, use the droplevels command to get rid of the “empty” levels:

    hiccups$Intervention_Factor <- droplevels(hiccups$Intervention_Factor)
    summary(hiccups$Intervention_Factor)
    summary(hiccups$Intervention)

    If you do this you’ll find the factor variables have the correct number of levels:

    > summary(hiccups$Intervention_Factor)
    Baseline Tongue Carotid Rectum
    15 15 15 15
    > summary(hiccups$Intervention)
    Baseline Carotid Rectum Tongue
    15 15 15 15
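
    As a minimal sketch of what droplevels does in isolation, the toy factor below (made up for illustration, not taken from Hiccups.dat) declares a level that never occurs and then drops it:

```r
# Toy factor: level "c" is declared but no observation uses it
f <- factor(c("a", "b", "a"), levels = c("a", "b", "c"))
nlevels(f)    # 3

# droplevels() removes levels that have zero cases
f2 <- droplevels(f)
levels(f2)    # "a" "b"
```

    The observations themselves are unchanged; only the unused level is removed.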

    All this suffices as a workaround, but I still don’t understand why it happened in the first place.  I’ve been advised in the past that it is best to use numeric variables to identify the levels of a factor in R since it can avoid problems.  Is this an example of why?  Can anyone provide me with a single command for creating the hiccups$Intervention_Factor which doesn’t result in 60 levels?

    Thank you for any help you can provide.

    #590
    Gavin Revie
    Member

    After a little more reading I discovered that the following syntax works:

    hiccups$Intervention_Factor <- factor(hiccups$Intervention, levels = c("Baseline", "Tongue", "Carotid", "Rectum"))

    This creates a factor with only 4 levels without any further processing needed.
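
    A minimal sketch of the same idea without typing the labels by hand, assuming the values already appear in the wanted order: the levels argument expects each distinct value once, and unique() returns exactly that, in order of first appearance. The vector x below is a made-up stand-in for hiccups$Intervention, not the real data:

```r
# Hypothetical stand-in for hiccups$Intervention: 4 groups of 15 rows each
x <- rep(c("Baseline", "Tongue", "Carotid", "Rectum"), each = 15)

# Pass each distinct value once as a level, instead of all 60 values
f <- factor(x, levels = unique(x))

nlevels(f)    # 4
levels(f)     # "Baseline" "Tongue" "Carotid" "Rectum"
table(f)      # 15 cases in each level
```

    On the real data frame the equivalent call would presumably be factor(hiccups$Intervention, levels = unique(hiccups$Intervention)), though that line has not been run against Hiccups.dat here.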
