One of the frustrating things about using R (although it can be a positive thing too) is that there are often 25 different ways to do the same, or almost the same, thing. So you want to do something, you have a look around, and you find something that looks like it does what you want, but it doesn’t. It almost does, but it doesn’t. A few dead ends later you hit on something that works. The relief is tangible.
As I use R more (I’m supposed to be writing a bloody book on it, but seriously it’ll be the blind leading the blind), I find out how to do more stuff. I’m going to share things here. Mainly this is so I’ll remember to stick it in the book.
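So, I had a data set that looked a bit like this: two columns, one holding each participant’s score (percent) and one saying which group they belonged to (animal), with one row per participant.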
Actually it looked exactly like that except I’ve ignored most of the data because no-one likes scrolling through data. I’ve also changed the example because it was tedious. A cat was recently asked to do jury service (http://www.telegraph.co.uk/news/newstopics/howaboutthat/8264782/Cat-ordered-to-do-jury-service.html) … hopefully the trial wasn’t for a cat burglar. Imagine we wanted to test cats’ abilities as jurors. We gave them several trial cases on a video screen, then asked them to decide guilty or not guilty by pressing a button. Their score is the number of decisions that they got correct. As a control we had some humans and snails. Not sure what the snails control for, but I like them (their eyes are cute when they stick out).
So, I’m doing a one-way ANOVA and I have three groups (helpfully labeled cat, human, snail) and an outcome: number of felons correctly identified as guilty. The data above are in the format SPSS expects them to be in: one row per participant, with the group in its own column. Then I decide that I want to do a robust ANOVA because the data have a weird distribution. To do this, R needs to see my data in columns, one per group, like so:
Cat Human Snail
75 65 78
48 68 68
72 58 58
58 65 48
78 65 65
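It took me a while to work out how to do this very easy task. Basically, if your original data are saved in a data frame called ‘jury’ that has two variables (percent and animal), then you create a new data frame (I’ve called it newData) using unstack(), giving it the data frame and a formula with the outcome on the left and the grouping variable on the right (percent ~ animal).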
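As a rough sketch, here is what that could look like from start to finish. The values are the ones from the table above; the row order of the long version is my reconstruction (an assumption), and the group labels are the lowercase ones, so the new columns come out as cat, human and snail.

```r
# Long ("SPSS-style") data: one row per participant.
# Scores are taken from the wide table above; the row order is assumed.
jury <- data.frame(
  percent = c(75, 48, 72, 58, 78,   # cats
              65, 68, 58, 65, 65,   # humans
              78, 68, 58, 48, 65),  # snails
  animal  = rep(c("cat", "human", "snail"), each = 5)
)

# Split 'percent' into one column per level of 'animal'.
# unstack() is in base R (the utils package), so there is nothing to install.
newData <- unstack(jury, percent ~ animal)

# Have a look at the result: three columns, five rows.
print(newData)
```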
This will break up the variable ‘percent’ into columns based on the variable ‘animal’ in the data frame called ‘jury’.
There’s also a package called reshape that has functions cast() and melt() which took me ages to get my head around, but that I ended up using a lot in the book. They’re better for things that aren’t as simple as this example.
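For what it’s worth, here is a minimal sketch of melt() and cast() on the same toy data, assuming the reshape package is installed. The ‘id’ column is something I’ve made up purely so that cast() has something to put in the rows; it isn’t part of the original data, and with independent groups it is nothing more than a row number.

```r
# Assumes install.packages("reshape") has already been run.
library(reshape)

# Wide data: one column per group, plus a hypothetical row identifier.
wide <- data.frame(
  id    = 1:5,                      # made up for illustration only
  cat   = c(75, 48, 72, 58, 78),
  human = c(65, 68, 58, 65, 65),
  snail = c(78, 68, 58, 48, 65)
)

# melt() stacks the group columns into two columns, 'variable' and 'value',
# keeping 'id' alongside them: wide to long.
long <- melt(wide, id.vars = "id")

# cast() goes the other way: rows defined by 'id', one column per 'variable'.
wideAgain <- cast(long, id ~ variable)
```

For a job as simple as this one unstack() is less typing, but the melt-then-cast pattern carries over to data with several grouping variables.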