If you are a PhD student, you are probably aware that it is possible to buy a dissertation. Lots of companies offer a finished dissertation for a price. I’m not going to link to them, but you can find them. Buying one is a bad idea. But hiring a consultant can be a good idea. Why? Read more!

In any form of regression model, we often think of the effects as *additive*. That is, we suppose that the effect of one variable can be added to the effect of another to get an accurate model. This is never strictly true, but how true is it? Is it true enough? How can we tell? Read more!
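
One simple way to see how far from additive things are, in the smallest possible case: take two yes/no variables and compare the effect of one at each level of the other. This is only a sketch with made-up numbers, and the function name `interaction_contrast` is mine, not something from a library. If the effects really add, the "difference of differences" is zero (up to noise).

```python
from statistics import mean

def interaction_contrast(cells):
    """Difference of differences for a 2x2 table of cell means.

    `cells` maps (a, b), with a and b each 0 or 1, to a list of outcomes.
    Under a purely additive model the result is zero, up to noise.
    """
    m = {key: mean(values) for key, values in cells.items()}
    effect_of_a_when_b0 = m[(1, 0)] - m[(0, 0)]
    effect_of_a_when_b1 = m[(1, 1)] - m[(0, 1)]
    return effect_of_a_when_b1 - effect_of_a_when_b0

# Made-up data where the effect of `a` is the same at both levels of `b`.
additive = {
    (0, 0): [10, 12], (1, 0): [15, 17],
    (0, 1): [20, 22], (1, 1): [25, 27],
}

# Made-up data where `a` matters much more when `b` is 1.
non_additive = {
    (0, 0): [10, 12], (1, 0): [15, 17],
    (0, 1): [20, 22], (1, 1): [35, 37],
}

print(interaction_contrast(additive))
print(interaction_contrast(non_additive))
```

In a real regression the same idea shows up as an interaction term; whether a nonzero contrast is "big enough to matter" is a judgment about the subject matter, not just a significance test.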

Suppose that you are running a regression and one of your independent variables is the hour of the day (or day of the year) that something happened. Using time as a linear variable doesn’t make much sense: 23:59 is close to 00:01, yet as numbers they are almost as far apart as they can be. You could categorize time (e.g. into morning, midday, evening, night) but that throws away information and invokes magical thinking. So, what to do? Read more!
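
One common approach (not necessarily the only sensible one) is to put time on a circle and use the sine and cosine of the angle as two predictors instead of the raw hour. Here is a minimal sketch; the helper names `encode_hour` and `circular_distance` are made up for illustration.

```python
import math

def encode_hour(hour, period=24.0):
    """Map a time of day (in hours) to a point on the unit circle."""
    angle = 2 * math.pi * (hour % period) / period
    return math.sin(angle), math.cos(angle)

def circular_distance(hour_a, hour_b, period=24.0):
    """Straight-line distance between two encoded times."""
    sin_a, cos_a = encode_hour(hour_a, period)
    sin_b, cos_b = encode_hour(hour_b, period)
    return math.hypot(sin_a - sin_b, cos_a - cos_b)

just_before_midnight = 23 + 59 / 60   # 23:59
just_after_midnight = 1 / 60          # 00:01

# On the raw scale these two times are almost 24 hours apart.
print(abs(just_before_midnight - just_after_midnight))

# On the circle they are neighbors, while midnight and noon are opposite.
print(circular_distance(just_before_midnight, just_after_midnight))
print(circular_distance(0, 12))
```

For day of the year, the same functions work with `period=365` (or 365.25, if you care about leap years). Both the sine and the cosine columns go into the model together; either one alone cannot tell 06:00 from 18:00 or 00:00 from 12:00.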

I will be giving a four-hour course at SESUG in Savannah this fall.

The course is titled: Lies, damn lies and…. SAS to the Rescue!

It is designed for people who don’t know a lot of statistics but have to read statistics, interpret statistics and/or supervise statisticians and data analysts.

Cluster analysis is a set of methods for finding subjects (people, corporations, drugs, whatever) that “go together” in terms of some set of variables. There are a lot of different methods and it can be hard to know when you have good clusters. There are various statistical measures that attempt to tell you how good a clustering is, but they aren’t very intuitive.

Rather than use one of these, I prefer the following:

Run a lot of different clusterings. Look at the clusters from each. Try to name them. Now, if your colleagues say “Yeah! That’s right!” to the naming scheme, you have a good clustering. If they say something like “well... I dunno... that doesn’t seem right, somehow” then you still have work to do.
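
To make that concrete, here is a rough sketch of the workflow: run several clusterings (here, a bare-bones k-means with different numbers of clusters) and print a profile of each cluster so that you can try to put a name on it. The data, the variable meanings, and the `kmeans` helper are all invented for illustration; in practice you would use your favorite software and more than one clustering method.

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """A bare-bones k-means: returns (centers, clusters)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k randomly chosen points
    for _ in range(iterations):
        # Assign every point to its nearest current center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])),
            )
            clusters[nearest].append(p)
        # Move each center to the mean of its points (keep it if empty).
        centers = [
            tuple(sum(col) / len(cluster) for col in zip(*cluster))
            if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
    return centers, clusters

# Made-up subjects measured on two made-up variables,
# say (hours of exercise per week, cups of coffee per day).
points = [
    (1.0, 5.0), (1.5, 4.5), (0.5, 5.5),
    (6.0, 1.0), (6.5, 0.5), (7.0, 1.5),
    (3.5, 3.0), (4.0, 2.5), (3.0, 3.5),
]

# Several clusterings; print a profile of each so a human can try to name it.
for k in (2, 3, 4):
    centers, clusters = kmeans(points, k)
    print(f"k = {k}")
    for center, cluster in zip(centers, clusters):
        profile = ", ".join(f"{value:.1f}" for value in center)
        print(f"  size {len(cluster)}, mean profile ({profile})")
```

The code does not decide which clustering is good. That part is still the conversation with your colleagues.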