Often, when reading a statistics book, you will see some variation on the phrase “**independent data**”. Many models assume that the data are independent. Sometimes this appears as part of the abbreviation i.i.d., which stands for independent and identically distributed.

You may get confused between this and the case of **independent and dependent variables**, which I discussed here. But the two ideas are quite different.

When we say **data** are independent, we mean that the data for different subjects do not depend on each other. When we say a **variable** is independent we mean that it does not depend on another variable for the same subject.

For instance, if we are trying to predict the weight of adult humans, we might gather a sample of adults, and collect various bits of information – height, weight, sex, age, and perhaps many others. Weight is a **dependent variable** because it depends on the other variables – taller people tend to be heavier; men tend to be heavier than women, and so on. But the **data are independent if the weight and other variables for one person aren’t related to those for another.**

**Sometimes, though, the data are dependent. One example is if we measured some variables on a bunch of children, but chose kids who were in particular classes in particular schools: Kids in a class are likely to be more similar to each other than kids in different classes.**
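To make the classroom example concrete, here is a minimal simulation sketch in Python. Everything in it is an assumption made for illustration: the function names, the number of classes, and the normally distributed "class effect" are invented, not taken from any real study. It estimates the intraclass correlation (how similar kids in the same class are) using the usual one-way ANOVA formula for equal-sized groups.

```python
import random
import statistics


def simulate_classes(n_classes=50, kids_per_class=20,
                     class_sd=1.0, kid_sd=1.0, seed=0):
    """Hypothetical scores: each kid's score = shared class effect + own noise."""
    rng = random.Random(seed)
    classes = []
    for _ in range(n_classes):
        class_effect = rng.gauss(0, class_sd)  # shared by every kid in the class
        classes.append([class_effect + rng.gauss(0, kid_sd)
                        for _ in range(kids_per_class)])
    return classes


def intraclass_correlation(groups):
    """One-way ANOVA estimate of the ICC, assuming equal group sizes."""
    n = len(groups)       # number of classes
    k = len(groups[0])    # kids per class
    all_scores = [x for g in groups for x in g]
    grand_mean = statistics.fmean(all_scores)
    group_means = [statistics.fmean(g) for g in groups]
    ms_between = k * sum((m - grand_mean) ** 2 for m in group_means) / (n - 1)
    ms_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means)
                    for x in g) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# With a class effect, kids in the same class resemble each other (dependent data).
clustered = simulate_classes(class_sd=1.0)
# With no class effect, class membership carries no information (independent data).
unclustered = simulate_classes(class_sd=0.0)

print("ICC with class effect:   ", round(intraclass_correlation(clustered), 3))
print("ICC without class effect:", round(intraclass_correlation(unclustered), 3))
```

An intraclass correlation near zero is what independent data would look like here; a value well above zero signals that kids in the same class cannot be treated as independent observations.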

**Another example is when we measure the same person (or other subject) more than once. If I give a bunch of students a midterm and a final, their final grade is likely to depend on their midterm grade, not just because of a general relationship between the two grades, but because it is the same person.**

I specialize in helping graduate students and researchers in psychology, education, economics and the social sciences with all aspects of statistical analysis. Many new and relatively uncommon statistical techniques are available, and these may widen the field of hypotheses you can investigate. Graphical techniques are often misapplied, but, done correctly, they can summarize a great deal of information in a single figure. **I can help with writing papers, writing grant applications, and doing analysis for grants and research.**

**Specialties:** Regression, logistic regression, cluster analysis, statistical graphics, quantile regression.


So for example, I am doing a study at school and I am looking at information from a leadership survey and an employee engagement survey. I want to know how the employee ranks their supervisor on the leadership survey and separately how they rank themselves on the employee engagement survey. Then I want to see if there is a relationship (correlation). I wanted to use Kendall’s Tau because I understand it can report the strength of the relationship and handle independent ordinal data.

Am I thinking about this correctly?

Sounds right to me!

Many thanks ^_^

Hello, can you explain independent data with an example involving plants? For example, you have 2 fields, each divided into 3 sections: 2 control and 1 GMO treatment. From each section we’ll take random samples from different spots at different times. In my opinion the data points within a section are dependent because the plants live in the same conditions. But if I take 10 spots in 1 section, and from each spot take 1 individual and measure it, will the data be independent between these 10 spots or not? I also understand that all 3 sections have dependent data, because the plants are growing in the same conditions, but if I separate them, can the data be independent?

I am not an expert on plants, by any means, but it seems to me that data from a single section will be independent if considered alone – then it would be as if the other sections and fields did not exist. But if you are considering multiple sections, then data within one section is dependent.

Hi, can you please explain: if we are comparing the calories per slice of two types of pizza, one cheese and the other pepperoni, is this independent or paired data?

That is independent data since the calories in any particular slice of pizza are not dependent on the calories in another.
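The choice matters because independent and paired data call for different calculations. Here is a rough sketch in Python; the calorie figures and exam grades are made up for illustration, and the helper names `welch_t` and `paired_t` are my own, not from any library. It computes a two-sample (Welch) t statistic for the independent pizza slices, and a paired t statistic for a case where the same subject is measured twice.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    standard_error = math.sqrt(var_a / len(a) + var_b / len(b))
    return (statistics.fmean(a) - statistics.fmean(b)) / standard_error


def paired_t(a, b):
    """Paired t statistic: a[i] and b[i] must come from the same subject."""
    diffs = [x - y for x, y in zip(a, b)]
    return statistics.fmean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))


# Invented calories per slice; each slice is a different, unrelated slice.
cheese = [272, 285, 290, 265, 280, 277]
pepperoni = [298, 310, 305, 289, 315, 300]
print("Independent slices, Welch t:", round(welch_t(cheese, pepperoni), 2))

# Invented grades; position i in both lists is the same student.
midterm = [71, 85, 64, 90, 78, 82]
final = [75, 88, 70, 91, 80, 89]
print("Same students, paired t:", round(paired_t(final, midterm), 2))
```

Using `paired_t` on the pizza data would be wrong, since there is no meaningful way to match the first cheese slice with the first pepperoni slice.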

Hi, could you please tell me if this data is dependent or independent? The energy expenditure of a basking shark whilst feeding, and whilst not feeding, with the two activities occurring intermittently throughout a single 2 hour period. I’m assuming dependent but I’m struggling to apply what you have said to my data set. If you could expand on it at all I’d really appreciate it. Also if possible what statistical test would you recommend? Thanks

It’s dependent data because it’s the same shark. But you will need something more complex – probably a multilevel model or some sort of time series model.

Hello, are shoe sizes dependent or independent data?

On the same person? Dependent. On random people? Independent.

Hi

I have frequency counts of the same words from 2 documents. How can I test whether the 2 distributions are significantly different?

For each word you could do a Poisson or negative binomial regression with the count as the DV and the name of the document as the IV.

For the whole distribution, you could do a chi-square test.
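As a sketch of the whole-distribution approach, the Pearson chi-square statistic for a table with 2 rows (documents) and one column per word can be computed by hand. The word counts and the function name `chi_square_homogeneity` below are hypothetical, and the code assumes every word has a nonzero total count across the two documents.

```python
def chi_square_homogeneity(counts_a, counts_b):
    """Pearson chi-square statistic and degrees of freedom for a 2 x K table.

    counts_a and counts_b map each word to its count in one document.
    """
    words = sorted(set(counts_a) | set(counts_b))
    table = [[counts_a.get(w, 0) for w in words],
             [counts_b.get(w, 0) for w in words]]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(len(words))]
    total = sum(row_totals)

    statistic = 0.0
    for i in range(2):
        for j in range(len(words)):
            # Expected count if both documents shared one word distribution.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (table[i][j] - expected) ** 2 / expected

    degrees_of_freedom = len(words) - 1  # (rows - 1) * (columns - 1)
    return statistic, degrees_of_freedom


# Invented word counts for two documents.
doc_a = {"data": 40, "model": 25, "error": 10}
doc_b = {"data": 30, "model": 35, "error": 20}

stat, dof = chi_square_homogeneity(doc_a, doc_b)
print("chi-square:", round(stat, 2), "degrees of freedom:", dof)
```

The statistic is then compared with a chi-square distribution with the given degrees of freedom to get a p-value. If SciPy is available, `scipy.stats.chi2_contingency` does the whole calculation, including the p-value, in one call.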

Peter

What is the data collected from the independent variable called?

Data is not collected from variables. The independent variable is called the independent variable.

Hi, could you please tell me how to collect data for board size as an independent variable?

I don’t have any idea.

Hi, suppose I want to dig into education in Egypt, for example. So the number of enrolled students is the dependent variable, and governmental expenditure is the independent variable? Could you please give me hints on what other independent variables I could include?

Since I don’t know anything about education in Egypt, I don’t really know what other IVs would be good.

For dependent data, how do you go about building regression models?

You can use either multilevel models or generalized estimating equations.

Hi, I have a dataset of online interactions for bartered goods. We are interested in language used when people reject offers, and have categorized instances of such language as either courteous or discourteous. We have two samples from different countries, and would like to know if there is a significant difference for each type of language between the samples. However, there are often numerous instances per rejected offer, such that the total number of say, discourteous instances is larger than the total number of interactions. Would those instances then be dependent, since some come from the same interaction? I assume this would rule out a chi square test. Would a good analysis strategy then be to calculate the mean number of instances per interaction for each country sample and then run a t-test to test for significant differences? Could this be done separately for courteous and discourteous instances?

Yes, they would be dependent. I think you need some kind of multilevel model.

Peter

Thank you for your kind answer. I just want to know how they can be dependent if the LST of the landfill is affected by the subsurface activity and gets much higher than the air temperature regardless of the session. Running a correlation shows they are strongly correlated, as they generally follow the same trend of going up and down except for some cold days. Please also elaborate more on multilevel tests, if you can, and give the name of this test if possible. Thank you very much, Peter.

That’s not what dependent means. It’s not the same as correlated. A multilevel model is needed because the errors in an ordinary regression will not be independent, and independent errors are an assumption of regression.