Today, we think of probability as an intensely mathematical subject. But the mathematics took a long time to develop, and people were thinking about probability long before it did. Read more!

On this site I have written quite a lot about regression analysis.

But what *is* regression analysis?

Briefly, regression analysis is a set of methods for relating dependent variables to independent variables. In the vast majority of regressions there is one dependent variable and any number of independent variables. There are many types of regression analysis. But, you may ask:

What is regression analysis *for*?

The main thing that makes regression analysis special is the idea of controlling for other variables. The mathematics behind this is interesting to geeks like me, but the basic idea is simple: regression lets you look at the relationship between the dependent variable and each independent variable while holding the values of the other independent variables constant.
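To make "holding other variables constant" concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made up for illustration: the variable names (`age`, `experience`, `income`), the simulated data, and the true coefficients of 1.0 and 2.0. It fits income on age alone, and then on age and experience together, using the closed-form least-squares solution for two predictors.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    """Population covariance of two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def simple_slope(x, y):
    """Least-squares slope of y on x alone (nothing held constant)."""
    return cov(x, y) / cov(x, x)

def two_predictor_slopes(x1, x2, y):
    """Least-squares slopes of y on x1 and x2 jointly.

    Each slope describes the relationship with y while the other
    predictor is held constant.
    """
    s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    s1y, s2y = cov(x1, y), cov(x2, y)
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2

# Hypothetical simulated data: experience rises with age, and income
# depends on both (true coefficients: 1.0 for age, 2.0 for experience).
rng = random.Random(0)
n = 5000
age = [rng.gauss(40, 10) for _ in range(n)]
experience = [0.8 * (a - 20) + rng.gauss(0, 3) for a in age]
income = [1.0 * a + 2.0 * e + rng.gauss(0, 5)
          for a, e in zip(age, experience)]

slope_age_alone = simple_slope(age, income)
slope_age, slope_experience = two_predictor_slopes(age, experience, income)

print(f"age alone:               {slope_age_alone:.2f}")
print(f"age, experience fixed:   {slope_age:.2f}")
print(f"experience, age fixed:   {slope_experience:.2f}")
```

In this setup the slope for age alone comes out near 2.6, because it also absorbs the effect of experience, which rises with age. With experience in the model, the slope for age drops to roughly its true value of 1.0.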

“OK,” you say.

But what is regression analysis *used for*?

And the answer is: Almost everything!

How do voting preferences relate to sex?

How do college grades relate to age?

How does income relate to age, sex, ethnicity and race?

How does the likelihood of getting cancer relate to smoking?

Almost anything!

The term "chi-square test" can refer to several different tests. Here I will discuss the one-way and two-way tests. The two-way test generalizes to multi-way tests in a natural way. These are tests for **nominal** variables (for a discussion of what a nominal variable is, see this post). The one-way test checks whether a variable is distributed according to proportions that you specify beforehand. The two-way and multi-way tests check whether two (or more) variables are associated.
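As a rough sketch of what the two tests compute, the plain-Python functions below return the chi-square statistic and its degrees of freedom. The function names and the counts are made up for illustration. Turning the statistic into a p-value needs the chi-square distribution, which a statistics package would normally supply.

```python
def chi_square_one_way(observed, expected_props):
    """One-way test: do the counts match proportions specified beforehand?"""
    total = sum(observed)
    expected = [total * p for p in expected_props]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return stat, df

def chi_square_two_way(table):
    """Two-way test: are the row and column variables associated?

    Expected counts come from the row and column totals, which is what
    the counts would look like if the two variables were independent.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / total
            stat += (o - e) ** 2 / e
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# Hypothetical counts in three categories, tested against 25% / 50% / 25%.
one_way_stat, one_way_df = chi_square_one_way([30, 50, 20], [0.25, 0.5, 0.25])

# Hypothetical 2 x 2 table of counts (rows: one variable, columns: the other).
two_way_stat, two_way_df = chi_square_two_way([[20, 30], [30, 20]])

print(one_way_stat, one_way_df)
print(two_way_stat, two_way_df)
```

A multi-way test follows the same pattern: compute the counts expected under independence, then sum the squared differences from the observed counts, each divided by its expected count.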

Read more!

When you have bivariate data – that is, data on two variables – either or both may be categorical or continuous. When there is one of each, and you want to compare the distribution of one across levels of the other, a parallel box plot is a good option. Suppose, for example, you want to compare the heights of people across ethnic groups. Read more!

Regression to the mean is a well-known statistical artifact affecting data that are correlated, but not perfectly so. It was first noticed by Sir Francis Galton in the late 19th century. He noted that the tallest fathers tend to have sons who are not as tall, and, similarly, the shortest fathers tend to have sons who are not as short. But this is not because of any general tendency toward mediocrity: indeed, the range of heights of people shows no signs of diminishing. How can this be? Read more!
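One way to see that both observations can hold at once is a small simulation. This is only a sketch under assumptions I made up for illustration: heights with a mean of 175 cm and a standard deviation of 7 cm in both generations, and a father-son correlation of 0.5. None of these numbers come from Galton's data.

```python
import random
import statistics

rng = random.Random(1)
n = 20000
mean_height, sd_height, r = 175.0, 7.0, 0.5  # assumed values, not real data

fathers = [rng.gauss(mean_height, sd_height) for _ in range(n)]

# Each son's height is partly predicted by his father's (correlation r)
# and partly independent noise, scaled so that sons have the same
# overall spread as fathers.
noise_sd = sd_height * (1 - r ** 2) ** 0.5
sons = [mean_height + r * (f - mean_height) + rng.gauss(0, noise_sd)
        for f in fathers]

# Look at the tallest and shortest 10% of fathers and at their sons.
pairs = sorted(zip(fathers, sons))
tenth = n // 10
shortest, tallest = pairs[:tenth], pairs[-tenth:]

tall_fathers = statistics.mean(f for f, _ in tallest)
tall_sons = statistics.mean(s for _, s in tallest)
short_fathers = statistics.mean(f for f, _ in shortest)
short_sons = statistics.mean(s for _, s in shortest)
sd_fathers = statistics.stdev(fathers)
sd_sons = statistics.stdev(sons)

print(f"tallest fathers:  {tall_fathers:.1f}, their sons: {tall_sons:.1f}")
print(f"shortest fathers: {short_fathers:.1f}, their sons: {short_sons:.1f}")
print(f"sd of fathers: {sd_fathers:.2f}, sd of sons: {sd_sons:.2f}")
```

In this setup the sons of the tallest fathers are, on average, shorter than their fathers but still taller than average, and the reverse holds for the shortest fathers. The overall spread of heights stays about the same from one generation to the next.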