When learning statistics, you will probably meet ANOVA (analysis of variance), ANCOVA (analysis of covariance) and ordinary least squares (OLS) regression. The way these are taught in many fields leaves people confused. Indeed, many people do not realize that all three are the same model.
In matrix algebra terms (don’t worry if you don’t know what that is), all three models can be written as follows:
Y = XB + e
where Y is a vector holding the values of your dependent variable, X is a matrix holding your independent variables, B is a vector of parameters to be estimated and e is a vector of errors.
If ANOVA, ANCOVA and OLS regression are all the same model, why do they seem so different? Why is the output different?
The main reason is that they developed in different fields. ANOVA developed in agriculture, where people were trying to see how plots could be made more productive. Regression developed when people were trying to figure out the size of the Earth. This separate development continued for quite a while.
The other main reason is the type of independent variable each method was built around. In ANOVA, all the independent variables are categorical. In ANCOVA, most are categorical and some are continuous. In regression (despite what some think), the independent variables can be of any type, but the method developed mostly to deal with continuous variables. Thinking about origins, this makes sense. In agriculture, the main independent variable is categorical: plot A or plot B or plot C. There is no plot 1.5. In measuring the Earth, everything is numeric.
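To make the "same model" claim concrete, here is a minimal sketch in plain Python. It computes the F statistic for a one-way layout twice: once with the textbook ANOVA sums of squares, and once by fitting Y = XB + e by least squares, where X holds an intercept plus 0/1 indicator columns for the plots. The data are made up for illustration, and the function names (`solve`, `ols`, `regression_f`, `anova_f`) are my own, not taken from any statistics package.

```python
# Toy data, invented for illustration: yields from three plots.
plots = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]
yields = [4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 1.0, 2.0, 3.0]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols(X, y):
    """Least-squares estimate of B in Y = XB + e via the normal equations."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


def regression_f(X, y):
    """Overall F statistic of the regression (X must include an intercept column)."""
    b = ols(X, y)
    fitted = [sum(bi * xi for bi, xi in zip(b, row)) for row in X]
    mean_y = sum(y) / len(y)
    ss_model = sum((f - mean_y) ** 2 for f in fitted)
    ss_resid = sum((yi - f) ** 2 for yi, f in zip(y, fitted))
    df_model = len(X[0]) - 1
    df_resid = len(y) - len(X[0])
    return (ss_model / df_model) / (ss_resid / df_resid)


def anova_f(groups, y):
    """One-way ANOVA F statistic from between- and within-group sums of squares."""
    levels = sorted(set(groups))
    grand_mean = sum(y) / len(y)
    ss_between = 0.0
    ss_within = 0.0
    for level in levels:
        vals = [yi for g, yi in zip(groups, y) if g == level]
        group_mean = sum(vals) / len(vals)
        ss_between += len(vals) * (group_mean - grand_mean) ** 2
        ss_within += sum((v - group_mean) ** 2 for v in vals)
    return (ss_between / (len(levels) - 1)) / (ss_within / (len(y) - len(levels)))


# Design matrix: intercept, indicator for plot B, indicator for plot C.
# Plot A is the reference level (an arbitrary choice for this sketch).
X = [[1.0, float(g == "B"), float(g == "C")] for g in plots]

print(ols(X, yields))           # intercept = mean of A; the rest = differences from A
print(anova_f(plots, yields))   # F from the ANOVA formulas
print(regression_f(X, yields))  # F from the regression; same value up to rounding
```

The two F statistics agree because the fitted values from the regression are just the group means, so the model and residual sums of squares are the same quantities that ANOVA calls "between" and "within".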
Nowadays, though, programs such as R or SAS can automatically code categorical variables as dummy variables, so categorical and continuous independent variables can go into the same model. The dummy variables can be parameterized in different ways, but that’s a topic for another post.
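For readers who have not seen dummy coding, here is a minimal sketch of one common scheme, often called treatment or reference-level coding: pick one level as the reference and create a 0/1 column for each of the others. The function `dummy_code` is a hypothetical helper written for this post; it is not how R or SAS do it internally, and it is only one of the possible parameterizations.

```python
def dummy_code(values, reference=None):
    """Return (column_names, rows) of 0/1 indicators, omitting the reference level."""
    levels = sorted(set(values))
    if reference is None:
        reference = levels[0]  # assumption: first level in sorted order is the reference
    columns = [lvl for lvl in levels if lvl != reference]
    rows = [[1 if v == lvl else 0 for lvl in columns] for v in values]
    return columns, rows


columns, rows = dummy_code(["A", "B", "C", "B"])
print(columns)  # ['B', 'C']
print(rows)     # [[0, 0], [1, 0], [0, 1], [1, 0]]
```

Plot A gets all zeros, so it is absorbed into the intercept, and the parameter for each remaining column is that plot's difference from plot A.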