Reshaping data in R

One of the frustrating things about using R (although it can be a positive thing too) is that there are often 25 different ways to do the same, or almost the same, thing. So you want to do something, you have a look around, and you find something that looks like it does what you want, but it doesn’t. It almost does, but it doesn’t. A few dead ends later, you hit on something that works. The relief is tangible.

As I use R more (I’m supposed to be writing a bloody book on it, but seriously it’ll be the blind leading the blind), I find out how to do more stuff. I’m going to share things here. Mainly this is so I’ll remember to stick it in the book.

So, I had a data set that looked a bit like this:

percent  animal
75       Cat
48       Cat
72       Cat
58       Cat
78       Cat
68       Cat
62       Cat
65       Human
68       Human
58       Human
65       Human
65       Human
78       Snail
68       Snail
58       Snail
48       Snail
65       Snail
68       Snail

Actually it looked exactly like that, except I’ve ignored most of the data because no-one likes scrolling through data. I’ve also changed the example because it was tedious. A cat was recently asked to do jury service (http://www.telegraph.co.uk/news/newstopics/howaboutthat/8264782/Cat-ordered-to-do-jury-service.html) … hopefully the trial wasn’t for a cat burglar. Imagine we wanted to test cats’ abilities as jurors. We gave them several trial cases on a video screen, then asked them to decide guilty or not guilty by pressing a button. Their score is the number of decisions that they got correct. As a control we had some humans and snails. Not sure what the snails control for, but I like them (their eyes are cute when they stick out).

So, I’m doing a one-way ANOVA and I have three groups (helpfully labeled cat, human, snail) and an outcome: number of felons correctly identified as guilty. The data above are in the format SPSS expects them to be in. Then I decide that I want to do a robust ANOVA because the data have a weird distribution. To do this, R needs to see my data in columns like so:

Cat      Human    Snail
75       65       78
48       68       68
72       58       58
58       65       48
78       65       65
68                68
62

Took me a while to work out how to do this very easy task. Basically, if your original data are saved in a data frame called ‘jury’ that has two variables (percent and animal), then you create a new object (I’ve called it newData) using unstack():

newData <- unstack(jury, percent ~ animal)

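This will break up the variable ‘percent’ into columns based on the variable ‘animal’ in the data frame called ‘jury’. Strictly speaking, because the groups here have different numbers of scores (7 cats, 5 humans and 6 snails), unstack() returns a list with one vector per animal rather than a data frame; you only get a data frame back when every group is the same size.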
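
If you want to try this on the example, here is a minimal sketch. It assumes you type the scores from the first table in by hand (building ‘jury’ with data.frame() is my assumption, not how the original file was read in), and it uses str() to show what unstack() actually hands back:

```r
# Assumption: the example scores are typed in by hand from the long table
jury <- data.frame(
  percent = c(75, 48, 72, 58, 78, 68, 62,   # Cat
              65, 68, 58, 65, 65,           # Human
              78, 68, 58, 48, 65, 68),      # Snail
  animal  = factor(rep(c("Cat", "Human", "Snail"), times = c(7, 5, 6)))
)

# Split 'percent' into one set of scores per level of 'animal'
newData <- unstack(jury, percent ~ animal)

# The groups have 7, 5 and 6 scores, so the result is a named list, not a data frame
str(newData)
```

Each element of newData (newData$Cat, newData$Human, newData$Snail) holds the scores for one group, which is the column-per-group layout shown in the second table.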

There’s also a package called reshape that has the functions melt() and cast(), which took me ages to get my head around but which I ended up using a lot in the book. They’re better for reshaping jobs that aren’t as simple as this example.
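
Going back the other way doesn’t need reshape for a case this simple: base R’s stack() turns a set of columns back into the long, SPSS-style layout, putting the scores in a column called ‘values’ and the group labels in a factor called ‘ind’. A minimal sketch, assuming the wide data are held as a named list (which is what unstack() gives you when the groups are different sizes); the name ‘long’ is just an illustrative choice:

```r
# Wide data as a named list: one vector of scores per animal (unequal lengths are fine)
newData <- list(
  Cat   = c(75, 48, 72, 58, 78, 68, 62),
  Human = c(65, 68, 58, 65, 65),
  Snail = c(78, 68, 58, 48, 65, 68)
)

# stack() is the base-R inverse of unstack(): it returns a long data frame
# with columns 'values' (the scores) and 'ind' (a factor of group names)
long <- stack(newData)

# Hypothetical renaming step so the columns match the original 'jury' variables
names(long) <- c("percent", "animal")
head(long)
```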