In any form of regression model, we often think of the effects as *additive*. That is, we suppose that the effect of one variable can simply be added to the effect of another to get an accurate model. This is never strictly true, but how true is it? Is it true enough? How can we tell?
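To make "additive" concrete, here is a minimal sketch for the simplest case: two yes/no variables and the mean outcome in each of the four cells. If the effects really add, the effect of one variable is the same at both levels of the other, so the "difference of differences" is zero. The cell means are made-up numbers for illustration, and `interaction_contrast` is a hypothetical helper name, not part of any library.

```python
def interaction_contrast(cell_means):
    """Difference of differences for a 2x2 table of cell means.

    cell_means maps (a, b) -> mean outcome, with a and b each 0 or 1.
    A result of 0 means the effects of a and b are exactly additive.
    """
    effect_a_when_b0 = cell_means[(1, 0)] - cell_means[(0, 0)]
    effect_a_when_b1 = cell_means[(1, 1)] - cell_means[(0, 1)]
    return effect_a_when_b1 - effect_a_when_b0


# Illustrative (made-up) cell means: here a adds 3 and b adds 5, always.
additive = {(0, 0): 10.0, (1, 0): 13.0, (0, 1): 15.0, (1, 1): 18.0}

# Same table, except that having both a and b gives an extra boost.
non_additive = {(0, 0): 10.0, (1, 0): 13.0, (0, 1): 15.0, (1, 1): 24.0}

additive_gap = interaction_contrast(additive)
non_additive_gap = interaction_contrast(non_additive)

print("additive table:", additive_gap)
print("non-additive table:", non_additive_gap)
```

With real data the contrast will essentially never be exactly zero, which is the point of asking "is it true enough?": the size of the gap has to be judged against the sizes of the main effects and against sampling noise.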

Suppose that you are running a regression and one of your independent variables is the hour of the day (or day of the year) that something happened. Using time as a linear variable doesn’t make much sense: 23:59 is close to 00:01, but a linear scale puts them almost a full day apart. You could categorize time (e.g. into morning, midday, evening, night), but that throws away information and invokes magical thinking. So, what to do?
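One common way to handle a cyclic variable (offered here as a sketch, not necessarily the approach the full post takes) is to place each time on a circle and use its sine and cosine as two predictors in place of the raw value. The helper names `encode_cyclic` and `cyclic_distance` below are hypothetical, chosen for illustration.

```python
import math


def encode_cyclic(value, period):
    """Map a cyclic value (e.g. minute of the day) to a point on the unit circle."""
    angle = 2 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def cyclic_distance(a, b, period):
    """Straight-line distance between two encoded values (between 0 and 2)."""
    sin_a, cos_a = encode_cyclic(a, period)
    sin_b, cos_b = encode_cyclic(b, period)
    return math.hypot(sin_a - sin_b, cos_a - cos_b)


MINUTES_PER_DAY = 24 * 60

late = 23 * 60 + 59   # 23:59
early = 1             # 00:01
noon = 12 * 60        # 12:00

# On the raw scale 23:59 and 00:01 are 1438 minutes apart;
# on the circle they are neighbours, while noon is nearly opposite.
d_near = cyclic_distance(late, early, MINUTES_PER_DAY)
d_far = cyclic_distance(late, noon, MINUTES_PER_DAY)

print(f"23:59 vs 00:01: {d_near:.4f}")
print(f"23:59 vs 12:00: {d_far:.4f}")
```

Both the sine and the cosine term are needed: either one alone gives the same value to two different times of day. In a regression, the pair of coefficients then describes a single smooth daily (or yearly) cycle.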

I will be giving a 4-hour course at SESUG in Savannah this fall.

The course is titled: Lies, damn lies and…. SAS to the Rescue!

It is designed for people who don’t know a lot of statistics but have to read statistics, interpret statistics and/or supervise statisticians and data analysts.

Cluster analysis is a set of methods for finding subjects (people, corporations, drugs, whatever) that “go together” in terms of some set of variables. There are a lot of different methods, and it can be hard to know when you have good clusters. Various statistical measures attempt to assess cluster quality, but they aren’t very intuitive.

Rather than use one of these, I prefer the following:

Run a lot of different clusterings. Look at the clusters from each. Try to name them. Now, if your colleagues say “Yeah! That’s right!” to the naming scheme, you have a good clustering. If they say something like “well… I dunno… that doesn’t seem right, somehow” then you still have work to do.

One question that sometimes arises in doing statistical analysis is whether to use a sophisticated method that is (in one way or another) more appropriate than a more typical method. The reason for its appropriateness might be that the usual method violates assumptions (e.g. we should use robust regression rather than OLS regression in some cases), answers the question better (e.g. we might use quantile regression instead of OLS regression in some cases), or is more efficient.

But the reviewers and editors at a journal may not know the newer method and may have issues with it. Using it might even lead to the paper being rejected.

What are your thoughts on this?