SAS v. R: Ease of learning

April 19, 2010 10:01 am

Two days ago, I wrote an introduction to this series.

Today, I will discuss ease of learning. Unlike the earlier post (and, I hope, most of the ones to come), this one is inherently subjective. “Ease of learning” is not the same for everyone – indeed, one thing I’d like to explore here and in the comments is why some people find SAS easier to learn, while others find R easier to learn. (Note that I am only discussing ease of use for statistical analysis and the data management necessary to do that analysis.)

Personally, I have found and continue to find SAS much easier to learn. There may be several reasons for this.

Happenstantial reasons
First, I learned SAS before I learned R. I’ve been using SAS for nearly 20 years (scary thought!) and R for about 5. Second, I did a lot of learning of SAS in graduate school, where many of my professors gave us the programs they had written. I learn best by example. With R, though, I have mostly learned on my own. Third, I’ve been to a lot of conferences about SAS, and none about R – purely because there have been SAS conferences that were convenient for me to go to, and there have not been such R conferences (although one is coming up this summer). Those three reasons might be lumped together into a “happenstance” group – none of them is inherently about the software.

Reasons that are due to my own traits
Next, I consider some of my own traits that might relate to my feelings about ease of learning:
First: I am not a programmer. I took one programming course way back when I was an undergraduate (in 1977). We learned a bit of ALGOL. But this was in the days of punchcards and frequent machine outages, and freshmen were on the bottom of the priority stack. I didn’t learn that much (although I got an A) and some of what I learned was about binary arithmetic and AND gates and stuff. Not that useful for programming in R.

Second: I like voluminous help files. In a later post, I will discuss getting help, but there is little doubt that SAS documentation is lengthier than R documentation.

Reasons due to differences in the software
Finally, some reasons that I think relate to differences in software.
First, R is object oriented (amended per comments: R is a functional language that is object oriented); SAS is more of a procedural language (amended for clarity). I’m not too sure about this, but I think that some people may have brains that prefer object oriented code, while others have brains that prefer procedural code. I find procedural code easier to understand. I also think people who are trained as programmers may find object oriented code easier – but that is just a guess based on casual observation.

Second, all of SAS is written by one group of people. There are, to be sure, some inconsistencies in how things work, but there is an overall style. The base packages of R were also written by one group, but they are supplemented by a huge number of other packages, written by different people – often, these packages duplicate each other or base R in terms of their basic goals, but differ in how they approach getting to those goals.

Questions for commenters
I’d like this to be a conversation. So … to start us off:
1. Do you find SAS easier or R easier?
2. Why? Does it have to do with your background, or differences in the packages, or both?
3. Do you prefer terse help files, or lengthy ones?
4. What might make R easier to learn?
5. What might make SAS easier to learn?

23 Responses to “SAS v. R: Ease of learning”

  1. Alex says:

    Hi Peter,
    Thank you for this post. I think another thing to consider is the helpfulness of the error messages in each. I have a much harder time understanding the error messages in R and then correcting my mistakes.

  2. Peter Flom says:

    I do as well. I plan to write a post about error messages. Thanks for your interest.

  3. Paul Miller says:

    Hi Peter,

    I started learning R a couple of months ago.

    Here are my 2 cents concerning your questions:

    1) Do you find SAS easier or R easier?

    Sometimes SAS. Sometimes not.

    2) Why?

    It depends on the material I’m using to learn. The first book I read on R programming was “R for SAS and SPSS Users.” The book is excellent and I found myself learning a great deal very quickly. If you are able to learn what you’re interested in using a book like this one, then I think R is no harder to learn than SAS.

    The problem is that the books available don’t cover all the topics you might like to learn. For example, I’ve been learning to use the RODBC package in R to read SQL Server tables. The package itself works extremely well. In fact, my initial assessment is that it does a better job of reading in the tables than SAS does. Unfortunately, the documentation is not very detailed and so I’ve found learning to use it a little difficult.

    3) Do you prefer terse help files, or lengthy ones?

    I prefer lengthy help files with lots of examples.

    4) What might make R easier to learn?

    More user friendly documentation. One of my R books says that R help files are written for intermediate to advanced users and not for beginners.

    5) What might make SAS easier to learn?

    More access to the program itself.

    A rather clever person I corresponded with said “SAS is like a well-manicured mansion, while R is more like a natural prairie. Each is beautiful in its own way.” I would say SAS is like a mansion with many rooms, a great many of which I am not allowed to enter. For example, a while ago, I bought a book titled “Analysis of Clinical Trials using SAS…” Much to my dismay, I discovered that I couldn’t run a good deal of the sample code because I don’t have Proc IML. I often find that I’m unable to learn new things in SAS because I don’t have access to things that I need. I often feel that these are things that should be included in the base version of the program. Proc IML would be a case in point. So I’m forced to look elsewhere.

  4. Michael Tuchman says:

    You can write functional code in R very easily. Additionally, your functions exist on an equal status to R’s internally defined functions. They are just as much a part of the language.

    By contrast, until 9.2, you couldn’t even DEFINE your own functions in SAS. I suspect we use the term “functional programming” very differently, however.

    I use the term “functional programming” to mean the design of a solution using only mathematically defined functions. Executing a program is essentially just evaluating a large composite function.
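As an illustration of the definition above, here is a minimal sketch of a program built as one composite function. It is written in Python rather than R or SAS, and every name in it (`compose`, `drop_missing`, `mean`, `round_to_one_decimal`, `pipeline`) is hypothetical and chosen only for this example.

```python
from functools import reduce


def compose(*funcs):
    """Return a single function that applies funcs from left to right."""
    def composed(value):
        return reduce(lambda acc, f: f(acc), funcs, value)
    return composed


def drop_missing(values):
    # Keep only the observations that are actually present.
    return [v for v in values if v is not None]


def mean(values):
    return sum(values) / len(values)


def round_to_one_decimal(x):
    return round(x, 1)


# The whole "program" is one composite function; running it is
# nothing more than evaluating that function on the input data.
pipeline = compose(drop_missing, mean, round_to_one_decimal)
result = pipeline([2.0, None, 3.5, 4.0])
```

Each step is an ordinary function with no side effects, so anyone checking the result has to trust (or read) each small function, which is the documentation burden the comment goes on to describe.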

    R comes much closer to the way programmers define the term ‘functional programming’ than SAS does.

    However, the disadvantage is that when you’re done, for somebody else to trust your code, they have to trust all of your functions, which you must clearly document.

    This is where the advantage of SAS’s pre-packaged solutions shines. If I want my boss to trust my results, I can rely on SAS’s credibility rather than my first generation code’s credibility.

    Both have their place.

  5. Michael Tuchman says:

    I find R easier. It took me two years to adjust my mind to the fact that I cannot define my own functions in SAS. (You can as of 9.2. I started my SAS career somewhere in v.7.) If I want to multiply matrices or apply a function over a matrix, I definitely prefer R.

    When I am learning a new area of Stats, though, I find SAS much easier. I’d rather make sure I can exhaust what a particular PROC can do before I go off and write my own code.

    I tend to prefer lengthy help files, but both R and SAS have that. I still keep a copy of Aster’s book nearby for when I just want to jog my memory.

    What makes SAS easier to learn is that when you know what model you want to run, you simply have to find the parameters to do what you need to do.

    I’ve found R’s graphics easier to learn, despite having a programming interface. It is much easier for me to annotate and develop a sophisticated graph using R than with SAS PROC ANNOTATE. It just flows nicer.

  6. Peter Flom says:

    Hi Michael

    I did know you could write your own functions in R; I didn’t know you could do that in SAS 9.2

    I was using the term “functional programming” as a contrast to “object oriented” programming. I got the term from Wikipedia. Is there a better term?

    Peter

  7. Peter Flom says:

    Hi Michael
    I agree that R graphics are easier to modify, although, in 9.2, SAS has made enormous strides – PROC ANNOTATE will, I think, mostly fade away. The default ODS graphics for a lot of SAS PROCS are quite nice, although modifying them is still something of a pain.

    For writing your own code, R is definitely easier.

    But, I, like you, also find SAS easier for learning new stuff.

    One place where I disagree with you is documentation – I’ll cover that in more detail in a later post, but if you compare, say, what you get when you type ?glm in R to what you get in the documentation for PROC GENMOD in SAS – well, the R output isn’t paged, but it looks like a maximum of 4 pages or so, whereas PROC GENMOD’s documentation is nearly 100 pages.

    Peter

  8. Michael Tuchman says:

    And it is actually even worse than that because there is no editorial consistency to the documentation. Everybody documents their own stuff. Caveat emptor.

  9. Michael Tuchman says:

    For access to the program itself, I see R standing out. I can get the body of almost any function except the stuff written natively. Most of R is written in R.

    Still, if I want somebody to get me an analysis of a data set and report their insights, I’d still recommend SAS first.

    I see SAS providing absolutely no access to the program itself, and I think that is by design. It’s not a bad design decision, and they’ve stuck to it consistently.

    SAS and R both have an army of mutually supporting users. That will mitigate some of the problems with the documentation.

    Screw up in SAS, and you can at least hold SAS accountable, assuming you’re running the right model with the right assumptions. Screw up in R, and it’s likely your fault for writing bad code. There’s an element of risk control that, for me, favors SAS.

    However, the schemer in me recognizes the cool features in R.

  10. Matt Revelle says:

    Hi Peter,

    I’m new to SAS, but after looking over some code examples I’d say it’s a mix of declarative and procedural.

    I haven’t dug into the object system in R, but it just seems to be single dispatch (meaning you call a function like “plot” and the actual plot function used depends on the type of the first argument; e.g., an igraph graph, an hclust tree, etc.). R is very functional programming friendly with its lexical scoping and anonymous functions; I imagine that’s courtesy of Scheme’s influence on the language.
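The three ideas named in this comment (single dispatch, lexical scoping, anonymous functions) can be sketched briefly. The example below is in Python, not R, and uses the standard library's `functools.singledispatch` only as an analogy for how a generic such as `plot` picks an implementation; the names `describe`, `make_scaler`, and `double` are hypothetical, and the details of R's own object system differ.

```python
from functools import singledispatch


@singledispatch
def describe(obj):
    # Fallback used when no more specific implementation is registered.
    return f"generic object: {obj!r}"


@describe.register
def _(obj: list):
    # Chosen when the first argument is a list.
    return f"list of {len(obj)} items"


@describe.register
def _(obj: dict):
    # Chosen when the first argument is a dict.
    return f"mapping with keys {sorted(obj)}"


def make_scaler(factor):
    # Lexical scoping: the anonymous function returned here keeps
    # access to `factor` from the scope in which it was defined.
    return lambda x: x * factor


double = make_scaler(2)
```

The caller always writes `describe(x)`; which body runs depends only on the type of that first argument, which is the behaviour the comment attributes to `plot`.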

    Even more than R, SAS is more a statistics language than a programming language. Sort of how SQL isn’t a programming language, it’s a relational data language.

  11. moclanmomo says:

    Hey Peter,

    great Blog. I find this a particularly nice topic to share my experiences.

    I graduated in Statistics in 2007 and am currently wrapping up my PhD thesis, so I come from a different position than you. Namely, I learned R from the start; it permeated everything in my studies. Everything was done with R. Later, I was introduced to SPSS and even later to SAS (in a survival analysis course). I won’t share my SPSS experience, but SAS was figuratively excruciating for me to learn. It was this pile of (to me) illogical statements and options that you could only handle with the user’s manual. It would actually have taken me less time to implement a Kaplan-Meier estimation procedure in R myself than using PROC SURV (or whatever it was called). Also, I didn’t know what had been calculated; you just get output. But that output was literally meters of paper if printed (I am sure there are options where you can regulate that). However, you just have the options SAS gives you.

    This is what R means to me, you have the freedom and chance to do everything yourself and check what others have done. Also, current developments in statistics seem to happen only in R, I couldn’t have done my PhD papers with anything else.

    I suppose I am from a younger generation that grows up with R and sees this as the natural and “best” way to do statistics. A practitioner may think otherwise, I remember a (medical) colleague once saying “We use SAS because if people die due to a bug in the program, they are liable and not we.” This sounds cynical but I think this is very deep down to the core. R never provides such warranties.

    Concerning help, I think actually R has an amazing body of introductory texts and help. If you just look on documentation on the R homepage, there are many freely available documents. Then there is the UseR! series of books. Then there is the Journal of Statistical Software. I doubt that “SAS documentation is lengthier than R documentation”.

    But one thing is clear to me: R aims at people who know what they are doing. Absolutely. You can see this with standard output in R which is very minimalistic. You must ASK R what you want from it. SAS and SPSS put everything out. And therefore you need to know how to program in R to use it, really. But if you do, you feel bound and limited with SAS or SPSS.

    I also need to be a bit of a smartarse: SAS is, if I remember correctly, a declarative way of programming and R is imperative (object-oriented). From Wikipedia:
    declarative: “declarative programming is a programming paradigm that expresses the logic of a computation without describing its control flow. Many languages applying this style attempt to minimize or eliminate side effects by describing what the program should accomplish, rather than describing how to go about accomplishing it”
    imperative: “imperative programming is a programming paradigm that describes computation in terms of statements that change a program state. In much the same way that imperative mood in natural languages expresses commands to take action, imperative programs define sequences of commands for the computer to perform.”

    These definitions highlight what is the main difference to me. This is also why I can’t really see SAS being anywhere near a competitive product with R (or, for that matter and in my opinion, even comparable). SAS is competitive with SPSS, like Windows is competitive with Mac, not with Linux (that may be with, say, FreeBSD).
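The imperative and declarative definitions quoted in this comment can be made concrete with a small sketch. It is in Python rather than SAS or R, the data are made up for illustration, and the second version is only "declarative-leaning": it states which values are wanted and leaves the looping to the language.

```python
scores = [72, 88, 95, 61, 79]  # made-up data for illustration

# Imperative style: spell out each change to program state.
total = 0
count = 0
for s in scores:
    if s >= 70:
        total += s
        count += 1
imperative_mean = total / count

# Declarative-leaning style: describe the result that is wanted
# (the mean of the scores of at least 70) without managing counters.
passing = [s for s in scores if s >= 70]
declarative_mean = sum(passing) / len(passing)
```

Both versions compute the same number; they differ in how much of the control flow the writer has to state explicitly.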

    Regards

  12. Peter Flom says:

    Hi Moclanmomo
    A few responses -
    1) Re documentation – in this post, I meant the documentation that is native to the program. That is, what you get by typing ?XXX in R vs. the SAS documentation. On that, there is no contest – SAS is longer by an order of magnitude.

    2) You can get the details of what SAS produces in the SAS documentation – in the details section. You can’t get code or algorithms, but you can get the math.

    3) The rest of your points are correct, but could just as well be used as arguments why R will never be competitive with SAS – namely, it does whatever you tell it, however lame-brained (or brilliant) that may be. Of course, you can do dumb things in any program, but in SAS, you can be reasonably sure that the algorithm isn’t at fault. With a contributed R package – well – who knows?

    But I’d rather say that R and SAS don’t compete with each other, but serve different needs and different audiences with some overlap in each

    Peter

  13.

    1. Do you find SAS easier or R easier?

    SAS is easier when the PROC I need already exists. R is easier when the function I need does not already exist. In general, I find Stata easier than either R or SAS for most of my applications. It is more extensible than SAS but less extensible than R, but its documentation is superior to either. YMMV.

    2. Why? Does it have to do with your background, or differences in the packages, or both?

    I learned SAS first, then S-plus/R, then Stata. (I also know MATLAB.) I use Stata most currently; for years I programmed almost exclusively in SAS. When SAS didn’t have a procedure I needed, though, I was stuck.

    I often have to do analyses that require custom programming. The extensibility of Stata or R really helps here. Better extensibility also means that Stata and R are more likely to have more up-to-date procedures than SAS; the user communities can extend the software without waiting for a vendor.

    3. Do you prefer terse help files, or lengthy ones?

    Lengthy ones with a terse abstract.

    4. What might make R easier to learn?

    Better documentation. A less newbie-hostile mailing list. Those who blog or tweet about R are typically much more genial than the denizens of the mailing list.

    5. What might make SAS easier to learn?

    Better documentation! It’s all there, but the last I used it, the search functionality was abysmal.
    For me, Stata typically hits the sweet spot of extensibility, documentation, features, and community.

  14. Peter Flom says:

    Hi Michael
    I’ve never used Stata, but I agree with all your other points. Perhaps I should investigate Stata.

  15. Liang Xie says:

    The first statistical language I learnt was SAS and I’ve been using it for a long time, i.e. my mind has totally adjusted to the way SAS programming goes. It did take me some time to get comfortable with it, though. On the other hand, the first programming language I learnt was Pascal, hence I had no difficulty getting used to R at the very beginning. I used R for part of my dissertation and I very much enjoy the flexibility and ease of implementing new algorithms in R.

    In my work, SAS is used for formal statistical analysis due to production concerns and R is utilized for visualization/ EDA/ validating my new ideas. I find both of them pretty handy, and I believe I’ve already developed parallel brains to work with either of these two.

  16. Alfred says:

    Both languages are easy enough for most people to learn; otherwise they could not do analytical work. It is not 0 or 1.
    Choose one for your specific purpose.
    Compare the results that come from both languages to identify the differences.
    Learn SAS for your career. SAS is not just statistical algorithms.
    For a safe business, huge data and big projects, use SAS.
    In the 1990s, many people learned Oracle instead of Sybase and SQL Server because they could make more money. R is still a hobby for most professional analysts.

  17. Alfred says:

    I would not say “SAS is more a statistics language…”.
    It is a programming language, similar to C, used to call statistical procedures. Actually, we could use C to do the same job if we developed many statistical functions.

  18. Loren Tauer says:

    I need to use R to take advantage of some nice routines written by someone else, but I am finding the language very frustrating. Can someone direct me to a document that lists the various error messages? The R language gives cryptic messages that are almost as bad as Fortran 30+ years ago. Also, a manual that explains how to construct input and output files would be very helpful. The manuals on the R website and downloaded with R are not particularly helpful in this regard. I need to write batch files that I can call which also read and write the data to files. I do not want to offend anyone, but I find it very difficult to write routines in R, and I have used GAUSS, Matlab and other econometric packages.

    Thanks

  19. rishika says:

    Could you please tell me how to run a regression with autocorrelated and heteroscedastic errors in SAS? I am a very new user and have not been able to figure it out.

  20. Peter Flom says:

    For autocorrelated errors, you want to look into PROC MIXED, probably. If errors are heteroscedastic, you might want to transform the variable, or you might need NLMIXED or GLIMMIX.

  21. sgill says:

    Having used both R and SAS, I feel both are good.

    Now my 2 pence:
    1) SAS and R are capable enough for most tasks.
    2) SAS is easier to learn than R.
    3) R has a lot of documentation (but hazy:-)) and user base.
    4) SAS and R have poor web integration.

    R is worst for any mouse events on generated web pages. I don’t want to name some packages which have no choice for new functionality in R. Everything is limited in these statistical packages and u just can’t play with R and SAS for any beautification.

    After all my experience in statistics, i feel that I will use only perl for statistical purposes from now on.

    To hell with these softwares, I want flexibility.
    a) Java usage is tedious for data mining work,
    b) R is complicated for humans,
    c) C++ packages and debugging …aah not any more,
    d) python – Run and sleep for 10 hours->still no output
    ->no clarity on any package properly though the
    language is very clean and Scipy and Numpy very
    capable

    Finally, I will use perl with MOOSE. I successfully used perl to do TTest, PCA and also wrote code for heatmap, clusters using GD module with fantastic mouse events to lead to a desired output page or a pop-up legend. Also circles or whatever shape you want as clusters from output in few lines of code.

    Long live perl!!!!! I love perl!!!!! Full power in hand!!!!!

  22. Peter Flom says:

    Thanks for your post. Any references or links for learning more about perl for statistics?

  23. sgill says:

    Hi Peter,

    As you know, there are 3 gr8 steps: Data preparation, Data analysis and Data presentation which form the heart of any stat tool.

    All these can be handled in perl nicely and there is very little requirement from your end. You just need to know what you want to do and I bet you are better than any one in this site in using statistics.

    simple way to do stat with perl.

    step 1: Read “beginning perl” book by Simon Cozens

    step 2: Read chapter 15 of an old book “Algorithms in Perl”.
    (It wud be gr8 to read the whole book)

    step 3: Search for any module you want to use in CPAN
    (As you know what type of statistical process you want
    to use)
    http://search.cpan.org

    Eg: Type “kmeans” in text box provided and you get

    —-> PDL::Stats::Kmeans

    Install module and use for your needs

    *** If u understand how to use CPAN, then u can surely
    do any task with ease.

    step 4: Learn GD (Graphic Display) for any visualization you
    need. Since in real world everything you need to
    deliver is via web, GD can help u for ur type of
    visualization apart from routine things for graphics.
    Also, lot of 3rd party visualization tools can be
    interfaced by perl. This is where perl can kick all
    standalone softwares.

    step 5: Once u know a route, automate the process….next time
    its just cake walk. Also, u can use web services,
    system cron for ur tasks.

    Apart from the above, check these for some basic insight:

    http://www.jstatsoft.org/v11/i01/paper
    (has basic perl statistics info and also R with perl discussion)

    http://pdl.perl.org/?page=screenshots/index
    (for handling N-dimensional arrays/scientific computing)
