By Dave Collingridge
When I was first introduced to the R statistics program a few years ago, I was reluctant to get into the R scene. I was already proficient with my point-and-click SPSS software. I saw little reason to learn R. But my R-user colleagues encouraged me to give it a try. They raved about R’s versatility, graphics, and cost—it’s free!
I was curious. What did R offer that SPSS did not? As more and more people around me learned R, I decided to join them. “Everybody was doing it, so why not me?” I thought. That’s right, I succumbed to the old “everyone’s doing it” mentality we were all warned about as kids.
Methodspace blogger Dave Collingridge is a senior research statistician for a large healthcare organization located in Utah, USA. He has published several quantitative and qualitative research articles in healthcare, psychology, and statistics and has been a member of Methodspace for several years.
R is fairly easy to learn if you are the kind of person who enjoyed playing around with MS-DOS. For the rest of us, R has a steep learning curve: it is challenging to learn at first, but once you get comfortable with its command-line interface, it becomes easy to use. The secret is sticking with R until you reach the top of the hill. At the base of the learning curve you’ll find the crushed egos of countless people who gave R a try but quit. My ego ended up there more than a few times. Every time I quit R, I picked up my ego and started up the hill again.
After a few years of self-study, a few R books, and a few R classes, I reached the top of the learning curve. My climb probably took longer than most people’s, but only because I had other statistical tools to turn to when R got confusing. Now I am a proficient R user, whatever that means. Like a calculator with more functions than anyone knows what to do with, R still has many features that I do not understand.
This post is about taking in the view from the top of the learning curve and telling you what I see. Hopefully this information will help some of you make an informed decision about whether to learn R, assuming you have not already done so.
1. Was it worth it? Yes, but only because I learned that R could do things that SPSS could not do. This point is essential in determining whether someone becomes a full-fledged R convert. R must have something to offer that goes beyond point-and-click programs like SPSS, and people must find these benefits useful.
What does R offer that is unique? A lot, but to learn about the special things R can do, you need to read a few journal articles and books on advanced R analyses. That is what I did. As I read about novel applications in R, I began to discover new and more useful ways of analyzing data. The result has been articles featuring analyses such as recursive partitioning, GAMLSS spline regression, permutation tests, and bivariate density plots. Recently I tried to cluster a dataset with mixed (categorical and continuous) variables using the TwoStep cluster algorithm in SPSS. The analysis failed because of missing data. I ended up running the analysis in R with a package that allows for mixed variables and missing data. Even something as simple as a test of proportions is a lot easier in R because you don’t have to enter the raw data; you just enter the number of successes and the total for each group.
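In R, a test of proportions from summary counts is a one-liner with the built-in prop.test(), for example prop.test(x = c(45, 30), n = c(100, 100)). To show why the totals are all you need, here is a minimal sketch in Python (standard library only) of the pooled two-sample z-test for proportions. The function name and the counts are hypothetical, chosen only for illustration. Squaring z gives the Pearson chi-square statistic that prop.test should report when its continuity correction is switched off (correct = FALSE); by default prop.test applies a Yates continuity correction, so its statistic will usually be slightly smaller.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-sample z-test for proportions, using only counts and totals.

    x1, x2 -- number of "successes" in each group (hypothetical inputs)
    n1, n2 -- total size of each group
    Returns (z, two-sided p-value). No continuity correction is applied.
    """
    p1, p2 = x1 / n1, x2 / n2
    # Under H0 (equal proportions) both groups share one pooled proportion.
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal: P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical summary counts: 45 of 100 vs. 30 of 100 patients.
z, p = two_proportion_z_test(45, 100, 30, 100)
print(f"z = {z:.3f}, chi-square = {z**2:.3f}, p = {p:.4f}")
```

No raw, case-by-case data is needed at any step: four numbers fully determine the test.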
If you want to read about some of the interesting things R can do, the CRAN page for HSAUR, the companion package to the book A Handbook of Statistical Analyses Using R, is a good place to start: http://cran.r-project.org/web/packages/HSAUR/index.html
2. Are there any things that SPSS is better at? Yes. In my opinion, the SPSS spreadsheet is top-notch for ease of use and reliability. It is my preferred platform for cleaning, organizing, and transforming data because SPSS lets me see changes as they happen. When I am happy with how a dataset looks in SPSS, I import it into R.
R, on the other hand, runs data manipulations “out of view,” which is a bit spooky. Sure, you can view the data, but only after you’ve run a command. I’ve also found R’s data viewers less aesthetically appealing than SPSS’s (pronounced something like /es▪p▪ess▪ess▪es/). Pure R users don’t seem to complain about this, however; I think it is a matter of personal preference. Note that some data manipulations are easier in R, but that alone is not a reason to learn R if you are already competent with SPSS.
3. Are basic analyses like t-tests and ANOVA better in R? Not really. SPSS can run a t-test, regression, and ANOVA just as well as R. This means that if you work with basic analyses and you already have something like SPSS, there is little point in learning R.
4. Are graphs and figures better in R? It depends. For some basic graphs, SPSS output looks nicer. R’s graphics capabilities gain the advantage only when you get into advanced analyses that call for flexible, specialized figures. This means that if you only work with basic figures, there is little reason to learn R.
5. What if I can’t afford an expensive program like SPSS or Stata? Should I learn R? Yes. Once you learn R, you’ll likely never go back to anything else.
6. Is there a certain amount of “coolness” that comes with using R? Maybe a little, but not enough to get you a date (although a potential employer might like the fact that he or she doesn’t have to spend thousands of dollars on new software if you’re hired). The R user community is not an elitist group. Generally speaking, the R community is very helpful and welcoming to newcomers, as evidenced by the number of online R resources, instructional blogs, and Q&A discussion websites.
Bottom line: R is highly recommended for the following people.
1. Those who want to broaden their statistical horizons by exploring new and exciting ways of analyzing data.
2. Those who can’t afford or don’t want to spend a lot of money on a statistics package.
3. Those who enjoy programming and things like MS-DOS.