Graphics: The good, the bad, and the ugly
This is a talk that I gave at NDRI. I also gave a version of this talk at Yale and at BrainScope
This is a talk that I gave at NDRI. I also gave a version of this talk at Yale and at BrainScope
The t-test is a statistical test of whether two sample means (averages) or proportions are equal. It was invented by William Sealy Gosset, who wrote under the pseudonym “student” to avoid detection by his employer (the Guinness Brewing Company). Guinness prohibited publications by employees, because another employee had divulged trade secrets in writing.
There are also one-sample versions of a the t-test, to tell if a sample has a mean equal to some fixed value, but these are relatively little used. Continue reading 'The t-test'»
When we have quantitative data, one thing we often want to know is where the center is, and, for that, we can look at the mean, median, mode, trimmed mean, and other measures. But we also want to know how spread out the numbers are. Are they all clustered near the median? Or are they all over the place? This can be very important. For example, if you were a 9th grade math teacher, then you would have very different classes if one had scores on a previous test like this:
9.1 9.0 8.9 8.9 9.1 9.0 9.1 8.9 9.0 8.9 9.2 8.8 8.8 9.1
And another had
10.0 8.0 9.0 10.2 7.8 9.0 9.0 7.2 10.8 10.0 8.0 7.5 10.5
Even though both have means right around 9.0.
Most generally, a dependent variable (DV) is something which we think depends on one or more independent variables (IV). That is, we think that, if the IVs change, the DV will change. There is a causal element here – but statistics cannot test the causation, it can only see if there is a relationship.
The proper interpretation of IVs and DVs depends on whether we are doing an experiment, or conducting an observational study. In an experiment, the IV is under the control of the experimenter. In an observational study, it is not. Thus, the common notion that an independent variable is something which the researcher varies is not completely correct. Continue reading 'What are dependent and independent variables?'»
In an earlier article, we looked at simple linear regression, which involves one independent variable (IV) and one dependent variable (DV).
When there are more than one IVs, the method is quite similar, but instead of a scatterplot in two dimensions, we have to imagine a space with as many dimensions as there are variables, and then minimize the distances in that space. Fortunately, the computer takes care of all this, and gives us output. The only difference that need concern us is that now if there are p IVs, the equation looks like
. That is, each of the IVs has an associated parameter.
One interesting feature of multiple linear regression is that the effect of each IV is “controlled” for the other IVs. That is, the parameter for variable
accounts for the effect of
on
, assuming that
,
and so on stay the same. If, for example, we were interested in people’s weights as effects. Continue reading 'What is multiple linear regression?'»
Panorama Theme by
Themocracy