Most regression techniques assume that the observations are independent. Often, this is reasonable. For example, if I collect data on the heights, weights, and other variables from a random sample of American adults, the observations are independent: what I weigh does not depend on what you weigh. But there are times when it is not a reasonable assumption. Two types of data where this is the case are repeated measures and clustered data.
Repeated measures refers to cases where the same subjects are measured more than once. For example, if we were interested in the effect of different diets on weight loss, we would likely weigh the same people repeatedly. What I weigh next month does depend on what I weigh today. The dependence isn’t perfect, but the measurements are clearly not independent. Clustered data refers to subjects that are grouped into clusters, typically in space. The classic example is students, who are nested within classes, which are nested within schools. My score on some standardized test is probably related to my classmates’ scores, because students from the same class share many characteristics: most obviously, they have the same teacher, but they are also likely to be similar in other ways.
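To make the classroom example concrete, here is a minimal simulation sketch in Python with NumPy. Everything in it is an assumption chosen for illustration, not real data: the function name `simulate_class_scores`, the mean score of 70, and the two standard deviations (a shared class effect of 5 points, standing in for things like the teacher, and individual variation of 10 points). Under those assumptions, the correlation between two students in the same class should come out near 25 / (25 + 100) = 0.2, while two students from different classes should be close to uncorrelated.

```python
import numpy as np


def simulate_class_scores(n_classes=2000, n_students=25,
                          class_sd=5.0, student_sd=10.0, seed=0):
    """Simulate test scores for students nested within classes.

    All parameter values are hypothetical. Each score is an overall mean,
    plus an effect shared by everyone in the class, plus individual noise.
    Returns an array of shape (n_classes, n_students).
    """
    rng = np.random.default_rng(seed)
    # One shared effect per class (column vector, broadcast across students).
    class_effect = rng.normal(0.0, class_sd, size=(n_classes, 1))
    # Independent individual variation for every student.
    noise = rng.normal(0.0, student_sd, size=(n_classes, n_students))
    return 70.0 + class_effect + noise


scores = simulate_class_scores()

# Correlation between two students drawn from the SAME class.
same_class = np.corrcoef(scores[:, 0], scores[:, 1])[0, 1]

# Correlation between students from DIFFERENT classes: pair the first
# student of each class with the second student of a neighboring class.
different_class = np.corrcoef(scores[:, 0], np.roll(scores[:, 1], 1))[0, 1]

print(f"same class:      {same_class:.3f}")
print(f"different class: {different_class:.3f}")
```

The shared class effect is the only thing that links classmates here, and it is enough to break independence. A regression that treats all of these scores as independent observations would ignore that link.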