I got into a conversation on Twitter (find me there as @peterflomstat) about the user-friendliness of statistical software. I have heard R described (appropriately, I think) as being “expert friendly”. This led to a conversation about whether and when that is good or bad. But we agreed that it would be hard to discuss that in 140-character blocks.

Many statistical procedures are complex. Even ones that are often regarded as relatively simple (e.g. ordinary least squares regression) have some oddities and make assumptions and so on. If statistical software is so easy to use that anyone can simply point-and-click and get results, that can easily lead to really silly statistics being done. If there is some form of review by someone who knows statistics then this silliness may be caught (or it may not). But sometimes there is no such review. If software is a little harder to use, then perhaps some of this can be avoided. “I did multiple regression” “Did you check for outliers?” “Huh?” is not a good conversation.
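To make the outlier question concrete, here is a minimal Python sketch. The data are made up, and `ols_fit` is a hypothetical helper written only for this illustration, not part of any statistics package. It fits a simple least-squares line twice: once with a single extreme point and once without it.

```python
def ols_fit(x, y):
    """Fit y = intercept + slope * x by ordinary least squares (one predictor)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squares about the means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data: the first five points lie exactly on y = 2x; the last one does not.
x = [1, 2, 3, 4, 5, 6]
y = [2, 4, 6, 8, 10, 30]

slope_all, intercept_all = ols_fit(x, y)
slope_trimmed, intercept_trimmed = ols_fit(x[:-1], y[:-1])

print(f"with the extreme point:    slope = {slope_all:.2f}")
print(f"without the extreme point: slope = {slope_trimmed:.2f}")
```

In this toy example a single point more than doubles the fitted slope, and nothing in the regression output itself says so. Someone has to know to look.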

Of course, it is also possible to do really silly things with statistical software that is harder to use. But there is at least a little bit of a barrier to entry.

But I think that “user-friendliness” really has two aspects:

- How much do you have to know about **statistics** in order to use the software?
- How much do you have to know about **programming** in order to use the software?

My view is that a fairly high requirement on the first of these is a good thing, while a high requirement on the second is not. Many statisticians are good at programming, but that should not be a prerequisite.

It’s nice when the program makes it easier to get the information if you just know a little about statistics. It’s nicer if you don’t have to do a lot of complex programming to get results.

For this reason, I suspect Excel will always be popular.

I fear Excel. It’s a spreadsheet. It’s not designed for statistical analysis. It’s hard to save code. It’s hard to check assumptions. And it is very limited in what it can do.

Hi Peter,

If you could make a t-shirt with “I fear Excel” printed on it, I’d buy a dozen. Maybe to make clear that I’m not afraid of numbers it could be “I’m a [programmer|statistician] … I fear Excel”.

Peter,

I absolutely agree about the programming and Excel.

A few years ago I had a client who was using Excel for data manipulation (not even analysis). Months into his analysis he discovered that he had sorted one column of data in Excel but not the others, so the data for that one variable now belonged to all the wrong individuals.

He had to redo everything.

The real danger with Excel is that there’s no warning and no way to tell that you did something like this.

I love Excel and use it all the time, just not for statistics.

Karen

I wouldn’t say I *love* Excel. It has a lot of flaws, such as requiring an = sign before what should obviously be a formula. And the ways it formats cells can often be … troublesome. But it serves a purpose.

Of course, you can do dumb things (like sorting one column) in any program; but with R or SAS (or probably other programs too) you have a record. This can also be useful for getting help.
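Here is a minimal sketch of that sorting mistake in Python. The data are made up, and both function names are hypothetical, chosen only for this illustration. The point is that in a script the mistake is at least written down, so you or someone helping you can find it later.

```python
# Made-up toy data: each row is one individual.
rows = [
    {"id": "A", "age": 52, "score": 7.1},
    {"id": "B", "age": 34, "score": 5.4},
    {"id": "C", "age": 41, "score": 6.3},
]


def sort_one_column(rows, column):
    """The mistake: sort a single column and leave every other column in place."""
    sorted_values = sorted(row[column] for row in rows)
    return [{**row, column: value} for row, value in zip(rows, sorted_values)]


def sort_whole_rows(rows, column):
    """The intended operation: reorder entire rows by one column."""
    return sorted(rows, key=lambda row: row[column])


def id_age_pairs(rows):
    """Which age belongs to which individual, ignoring row order."""
    return {(row["id"], row["age"]) for row in rows}


broken = sort_one_column(rows, "age")
correct = sort_whole_rows(rows, "age")

# Sorting whole rows keeps each age attached to the right person.
print(id_age_pairs(correct) == id_age_pairs(rows))
# Sorting one column silently reassigns ages to the wrong people.
print(id_age_pairs(broken) == id_age_pairs(rows))
```

Neither version raises an error or a warning. The difference is that the scripted version leaves a line of code that can be read, questioned, and fixed.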

Many parallels with GIS (and presumably other technical software) as well.

“Which map projection did you use?” “Huh?” “Oh dear..” is a conversation I’ve had many times…

I wonder whether it would help at all if software simply didn’t include defaults and instead made you choose from a list of options that non-users wouldn’t be familiar with. Even the presence of a default often gives the impression that it is the option most often correct.

I think good defaults can be useful, but they can also be a crutch. Not sure what would be best.

I vote for providing the best crutch one can find. Whether we like it or not, I think it would be safe to say that most statistical analyses are carried out by non-statisticians. I, for one, would prefer to increase the likelihood of the non-statisticians obtaining and disseminating correct answers. And the more of the required analytical settings that already exist as defaults, the less work (and chance of error) for those who do know what they are doing.

Of course if one corrupts their data, doesn’t ensure that statistical assumptions have been met, or simply uses inappropriate techniques, their results would be incorrect regardless of the ease or difficulty involved in doing an analysis.