The simplest case is when there is only one IV, and it is continuous. In this case, we can make a scatterplot of the DV and the IV.
Above is a scatterplot of the heights and weights of a group of young adults in Brooklyn, New York. (It’s from a project I worked on long ago.)
It is traditional to put the IV along the X axis and the DV along the Y axis. Just by looking, it is clear that there is some relationship between height and weight.
There are various ways to model this relationship, and these can be represented by various lines; see below.
OLS regression assumes that the relationship is linear; that is, it fits a straight line to represent the relationship. Algebra tells us that any straight line can be represented as an equation like
$$y = a + bx$$
Here, y is height, x is weight, and a and b are parameters which we attempt to estimate (hence, simple linear regression, and regression generally, is a parametric method). Various lines might be fit to these points; we need a method to choose one of them or, in other words, to select a and b. Ideally, the points would lie exactly on the line, but there is no such line for these points. Any line will miss most of the points; we need a method to say how badly a line misses the points. The most common way is ordinary least squares (OLS), which measures the miss by the sum of the squared vertical distances from the points to the line and chooses the a and b that make that sum as small as possible.
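To make the idea of "how badly a line misses the points" concrete, here is a minimal sketch in Python with NumPy. The numbers are invented for illustration and are not the Brooklyn data; the function names `sum_squared_distances` and `fit_ols` are my own. It computes the sum of squared distances for any candidate line and then the a and b that make that sum smallest, using the standard closed-form solution for one IV.

```python
import numpy as np

# Made-up values for illustration only (not the Brooklyn sample).
# Following the text: x is weight (kg), y is height (cm).
x = np.array([55.0, 62.0, 70.0, 78.0, 85.0, 93.0])
y = np.array([160.0, 165.0, 171.0, 174.0, 180.0, 184.0])

def sum_squared_distances(a, b, x, y):
    """Sum of squared vertical distances from the points to the line y = a + b*x."""
    return float(np.sum((y - (a + b * x)) ** 2))

def fit_ols(x, y):
    """Closed-form OLS estimates of the intercept a and slope b for one IV."""
    x_mean, y_mean = x.mean(), y.mean()
    b = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    a = y_mean - b * x_mean
    return float(a), float(b)

a_hat, b_hat = fit_ols(x, y)
best = sum_squared_distances(a_hat, b_hat, x, y)

print(f"a = {a_hat:.3f}, b = {b_hat:.3f}, sum of squares = {best:.3f}")
```

Any other choice of a and b, such as nudging the intercept up by 1, gives a larger sum of squares than `best`. That is all "least squares" means.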
When there is more than one IV, the method is quite similar, but instead of a scatterplot in two dimensions, we have to imagine a space with as many dimensions as there are variables, and then minimize the distances in that space. Fortunately, the computer takes care of all this, and gives us output. The only difference that need concern us is that now, if there are p IVs, the equation looks like $y = a + b_1 x_1 + b_2 x_2 + \dots + b_p x_p$. That is, each of the IVs has an associated parameter.
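As a rough illustration of what "the computer takes care of all this" looks like, the sketch below fits a regression with p = 3 IVs using NumPy's `np.linalg.lstsq`. The data are simulated, and the "true" parameter values are assumptions chosen only so that we can see the estimates land near them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: n cases, p IVs (all values here are invented).
n, p = 200, 3
X = rng.normal(size=(n, p))
true_a = 2.0
true_b = np.array([1.5, -0.5, 3.0])
y = true_a + X @ true_b + rng.normal(scale=0.1, size=n)

# Add a column of ones so the intercept is estimated along with the slopes.
design = np.column_stack([np.ones(n), X])

# Least squares: minimizes the sum of squared residuals over all parameters at once.
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
a_hat, b_hat = coef[0], coef[1:]

print("intercept:", round(float(a_hat), 3))
print("one parameter per IV:", np.round(b_hat, 3))
```

The output has one intercept and one parameter for each IV, as the text describes.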
How multiple linear regression controls for the effects of other variables
One interesting feature of multiple linear regression is that the effect of each IV is “controlled” for the other IVs. That is, the parameter for variable $x_1$ accounts for the effect of $x_1$ on $y$, assuming that $x_2$, $x_3$, and so on stay the same. If, for example, we were interested in people’s weights as effects of their age, sex, and height, then the resulting equation would show how men and women of a given age and height differ; how age is related to weight, if sex and height are kept constant; and how height is related to weight, if age and sex are kept constant.
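A small simulation may help show what "controlling" does. In the made-up world below (every number is an assumption, not real data), men are taller on average, and weight depends on height only, not on sex directly. Regressing weight on sex alone gives a large sex difference; adding height to the model shrinks the sex parameter toward zero, because it now compares men and women of the same height.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000

# Invented population: sex coded 0/1, men about 13 cm taller on average.
male = rng.integers(0, 2, size=n).astype(float)
height = 162.0 + 13.0 * male + rng.normal(scale=7.0, size=n)   # cm
# By construction, weight depends on height only (no direct effect of sex).
weight = -60.0 + 0.8 * height + rng.normal(scale=5.0, size=n)  # kg

def ols(columns, y):
    """Least-squares parameters for y on the given IVs, intercept first."""
    design = np.column_stack([np.ones(len(y))] + list(columns))
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

sex_only = ols([male], weight)          # [intercept, male]
adjusted = ols([male, height], weight)  # [intercept, male, height]

print("sex parameter, sex alone:          ", round(float(sex_only[1]), 2))
print("sex parameter, height held constant:", round(float(adjusted[1]), 2))
print("height parameter, sex held constant:", round(float(adjusted[2]), 2))
```

This is only a sketch of the idea; with real data we do not know the true structure, and the controlled estimates are only as good as the model.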
Assumptions of multiple linear regression
Multiple linear regression (and simple linear regression as well) makes certain assumptions about the data.
1. Linearity. As discussed in the previous diary, the model assumes that the relationship between the DV and the IVs can be well-estimated by a straight line.
2. Normality of residuals.
Residuals are the distances between the line and the points. Multiple linear regression assumes that these distances are normally distributed with a mean of 0.
3. Homoscedasticity and independence of residuals
Not only must the residuals be normally distributed, they must have equal variance (that’s called homoscedasticity) and they must not be related to the IVs.
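These assumptions are about the residuals, so they can only be examined after a model has been fit. The sketch below, on simulated data that satisfy the assumptions by construction, computes a few rough numerical summaries: the mean of the residuals, their skewness and excess kurtosis (both near 0 for a normal distribution), and the ratio of residual spread between cases with high and low fitted values (near 1 under homoscedasticity). These are informal checks that I chose for illustration, not formal tests, and in practice plots of the residuals are usually more informative.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated data that meet the assumptions: linear, normal errors, equal variance.
n = 1000
x = rng.uniform(0.0, 10.0, size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=1.0, size=n)

design = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
fitted = design @ coef
residuals = y - fitted

# Mean of residuals (zero by construction when the model has an intercept).
mean_resid = residuals.mean()

# Shape of the residual distribution: both values are near 0 for normal data.
z = (residuals - mean_resid) / residuals.std()
skewness = np.mean(z ** 3)
excess_kurtosis = np.mean(z ** 4) - 3.0

# Equal variance: compare residual spread for low vs. high fitted values.
order = np.argsort(fitted)
low, high = residuals[order[: n // 2]], residuals[order[n // 2:]]
spread_ratio = high.std() / low.std()

print("mean residual:", round(float(mean_resid), 6))
print("skewness:", round(float(skewness), 3))
print("excess kurtosis:", round(float(excess_kurtosis), 3))
print("spread ratio (high/low):", round(float(spread_ratio), 3))
```

A spread ratio far from 1, or skewness and kurtosis far from 0, would be a hint to look more closely at the residuals.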