Often, when reading a statistics book, you will see some variation on the phrase “independent data”. Many models assume that the data are independent. Sometimes this appears as part of the abbreviation iid, which stands for independent and identically distributed.
You may get confused between this and the case of independent and dependent variables, which I discussed here. But the two ideas are quite different.
When we say data are independent, we mean that the data for different subjects do not depend on each other. When we say a variable is independent, we mean that it does not depend on another variable for the same subject.
For instance, if we are trying to predict the weight of adult humans, we might gather a sample of adults, and collect various bits of information – height, weight, sex, age, and perhaps many others. Weight is a dependent variable because it depends on the other variables – taller people tend to be heavier; men tend to be heavier than women, and so on. But the data are independent if the weight and other variables for one person aren’t related to those for another.
Sometimes, though, the data are dependent. One example is if we measured some variables on a bunch of children, but chose kids who were in particular classes in particular schools: kids in a class are likely to be more similar to each other than kids in different classes.
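To make the classroom example concrete, here is a small simulation sketch. Everything in it is an assumption made for illustration: the function names `simulate_scores` and `intraclass_correlation`, the baseline score of 70, and the standard deviations are all made up, and no real data are involved. Each class gets a shared effect (same teacher, same room), and the intraclass correlation estimates how much of the variation in scores is due to which class a kid is in.

```python
import random
import statistics


def simulate_scores(n_classes, n_per_class, class_sd, student_sd, seed=0):
    """Return a list of classes, each a list of simulated student scores."""
    rng = random.Random(seed)
    classes = []
    for _ in range(n_classes):
        # One draw per class: every kid in the class shares this effect.
        class_effect = rng.gauss(0, class_sd)
        classes.append(
            [70 + class_effect + rng.gauss(0, student_sd) for _ in range(n_per_class)]
        )
    return classes


def intraclass_correlation(classes):
    """One-way ANOVA estimate of the intraclass correlation (equal class sizes)."""
    k = len(classes)
    n = len(classes[0])
    grand_mean = statistics.fmean(x for c in classes for x in c)
    class_means = [statistics.fmean(c) for c in classes]
    ms_between = n * sum((m - grand_mean) ** 2 for m in class_means) / (k - 1)
    ms_within = sum(
        (x - m) ** 2 for c, m in zip(classes, class_means) for x in c
    ) / (k * (n - 1))
    return (ms_between - ms_within) / (ms_between + (n - 1) * ms_within)


# Dependent data: classes differ from each other (class_sd > 0).
clustered = simulate_scores(200, 25, class_sd=5, student_sd=10)
# Independent data: no shared class effect at all (class_sd = 0).
independent = simulate_scores(200, 25, class_sd=0, student_sd=10)

icc_clustered = intraclass_correlation(clustered)
icc_independent = intraclass_correlation(independent)
print(f"ICC with a class effect:    {icc_clustered:.3f}")
print(f"ICC without a class effect: {icc_independent:.3f}")
```

With these made-up settings, the class effect accounts for 25 / (25 + 100) = 0.2 of the total variance, so the first estimate should land near 0.2 and the second near 0. An intraclass correlation clearly above zero is one sign that the data are not independent.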
Another example is when we measure the same person (or other subject) more than once. If I give a bunch of students a midterm and a final, their final grade is likely to depend on their midterm grade, not just because of a general relationship between the two grades, but because it is the same person.
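The midterm and final example can be sketched the same way. Again, this is only an illustration under stated assumptions: the `ability` term, the helper `pearson`, and all the numbers are hypothetical. Each simulated student has a persistent ability that feeds into both exams. Pairing each midterm with the same student's final gives a strong correlation; pairing it with a randomly chosen other student's final does not.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(1)
n_students = 5000

midterms, finals = [], []
for _ in range(n_students):
    ability = rng.gauss(0, 10)  # persistent trait of this one student
    midterms.append(70 + ability + rng.gauss(0, 5))  # exam-day noise
    finals.append(70 + ability + rng.gauss(0, 5))    # separate exam-day noise

# Same student measured twice: the two scores share the ability term.
r_same_person = pearson(midterms, finals)

# Shuffle the finals so each midterm is paired with some other student's final.
shuffled_finals = finals[:]
rng.shuffle(shuffled_finals)
r_different_people = pearson(midterms, shuffled_finals)

print(f"midterm vs. same student's final:  {r_same_person:.3f}")
print(f"midterm vs. other student's final: {r_different_people:.3f}")
```

In this sketch the only thing linking a midterm to a final is that the same person took both. Once that link is broken by shuffling, the correlation drops to roughly zero, which is what independence between subjects looks like.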