The chi-square test can refer to several different types of tests. Here I will discuss the one-way and two-way tests. The two-way test generalizes to multi-way tests in a natural way. These tests are tests for **nominal** variables (for a discussion of what a nominal variable is, see this post). The one-way test tests whether a variable is distributed according to some proportions that you specify beforehand. The two-way and multi-way tests test whether two (or more) variables are associated.

The basic idea of these tests is to compare the **observed** frequencies to the frequencies that would be expected under a null hypothesis. In the one-way test the null hypothesis is that the sample you have is distributed in the same way as the proportions you specified. In the two-way tests the null hypothesis is that the variables are not associated.

Intuitively, if the null is true then the observed frequencies should be close to the expected frequencies. The test statistic makes "close" precise. It is:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where $O$ is the observed frequency in a particular cell, $E$ is the expected frequency, and the summation is over all the cells. We can then compare this value to those in a chi-square table with degrees of freedom $= (r - 1)(c - 1)$, where $r$ is the number of rows and $c$ is the number of columns (for a one-way test, just use one of these: the number of categories minus 1).
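For readers who like to see a formula as code, here is a minimal sketch in plain Python. The function names `chi_square_statistic` and `degrees_of_freedom` are my own, not from any library, and the counts at the bottom are made up purely for illustration.

```python
def chi_square_statistic(observed, expected):
    """Sum (O - E)^2 / E over all cells, given flat lists of counts."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of cells")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def degrees_of_freedom(n_rows, n_cols=None):
    """(r - 1)(c - 1) for a two-way table; categories - 1 for a one-way test."""
    if n_cols is None:
        return n_rows - 1
    return (n_rows - 1) * (n_cols - 1)


# Hypothetical counts (not from this post): three categories, 10 expected in each.
stat = chi_square_statistic([8, 12, 10], [10, 10, 10])
df = degrees_of_freedom(3)
print(stat, df)
```

The statistic is then looked up in a chi-square table using `df`, exactly as described above.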

This may be clearer with some examples.

** Example of a one-way test **

Suppose you wish to tell if a die is “honest”. If it were honest then the proportion of each number would be equal to $1/6$. You roll the die 24 times and get the following:

1 – 3

2 – 5

3 – 4

4 – 6

5 – 3

6 – 3

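These are the observed frequencies. The expected frequencies are $24 \times 1/6 = 4$ for each number. So

$$\chi^2 = \frac{(3-4)^2}{4} + \frac{(5-4)^2}{4} + \frac{(4-4)^2}{4} + \frac{(6-4)^2}{4} + \frac{(3-4)^2}{4} + \frac{(3-4)^2}{4} = \frac{8}{4} = 2$$

and the degrees of freedom would be $6 - 1 = 5$. This is not significant at any reasonable level.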
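If you would rather check the die example by computer, one way is SciPy's `chisquare` function. This is only a sketch, and it assumes you have SciPy installed. The 5% level used for the critical value is my choice for illustration.

```python
from scipy.stats import chi2, chisquare

# Observed counts for faces 1 through 6 from the 24 rolls above.
observed = [3, 5, 4, 6, 3, 3]

# With no f_exp argument, chisquare assumes equal expected counts (here 24/6 = 4).
statistic, p_value = chisquare(observed)

# 5% critical value for 6 - 1 = 5 degrees of freedom, as you would read it from a table.
critical_value = chi2.ppf(0.95, df=5)

print(f"chi-square = {statistic:.3f}, p = {p_value:.3f}, critical value = {critical_value:.2f}")
```

The statistic falls well below the critical value, which agrees with the conclusion that the result is not significant.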

** Example of a two-way test **

Suppose you wish to see if men and women voted equally for Obama, Romney or someone else in the last presidential election. You sample 100 male voters and 100 female voters and get the following results:

[table]

Candidate, Men, Women

Romney, 53, 45

Obama, 45, 53

Other, 2, 2

[/table]

These are the observed frequencies. To get the expected frequencies if there were no difference, we first have to find the marginal totals, which just means the sum across each row and down each column. Then, for each cell, the expected frequency is $E = \frac{\text{row total} \times \text{column total}}{\text{grand total}}$. In our example, this is

[table]

Candidate, Men, Women

Romney, 98*100/200, 98*100/200

Obama, 98*100/200, 98*100/200

Other, 4*100/200, 4*100/200

[/table]

or

[table]

Candidate, Men, Women

Romney, 49, 49

Obama, 49, 49

Other, 2, 2

[/table]

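and, finally, we have

$$\chi^2 = \frac{(53-49)^2}{49} + \frac{(45-49)^2}{49} + \frac{(45-49)^2}{49} + \frac{(53-49)^2}{49} + \frac{(2-2)^2}{2} + \frac{(2-2)^2}{2} = \frac{64}{49} \approx 1.31$$

with $(3 - 1)(2 - 1) = 2$ degrees of freedom.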
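The hand calculation for the voting table can be sketched in plain Python as follows. The variable names are mine. The code only mirrors the steps described here: marginal totals, then expected counts, then the sum over all cells.

```python
# Observed counts: rows are Romney, Obama, Other; columns are Men, Women.
observed = [
    [53, 45],
    [45, 53],
    [2, 2],
]

row_totals = [sum(row) for row in observed]        # 98, 98, 4
col_totals = [sum(col) for col in zip(*observed)]  # 100, 100
grand_total = sum(row_totals)                      # 200

# Expected count for each cell = row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Chi-square statistic: sum of (O - E)^2 / E over every cell.
statistic = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Degrees of freedom: (rows - 1) * (columns - 1).
dof = (len(observed) - 1) * (len(observed[0]) - 1)

print(expected)
print(round(statistic, 3), dof)
```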

Well, you ** could ** do all these calculations but, luckily, you don’t have to! The computer does them for you.
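As one illustration of letting the computer do it, here is a short sketch using SciPy's `chi2_contingency`. I am assuming SciPy is installed; any statistics package has an equivalent.

```python
from scipy.stats import chi2_contingency

observed = [
    [53, 45],  # Romney: men, women
    [45, 53],  # Obama: men, women
    [2, 2],    # Other: men, women
]

# correction=False turns off the Yates continuity correction. SciPy only applies
# it when there is one degree of freedom, so it has no effect on this 3x2 table.
statistic, p_value, dof, expected = chi2_contingency(observed, correction=False)

print(f"chi-square = {statistic:.3f}, df = {dof}, p = {p_value:.3f}")
print(expected)
```

The function returns the expected counts along with the statistic, so you can check them against the table worked out by hand. The p-value comes out at roughly 0.52, so there is no evidence here that men and women voted differently.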

** Logic of the test **

What should be clear is that the farther the observed counts deviate from the expected counts, the larger the chi-square statistic and the more significant the result.
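To see this logic in action, here is a small sketch in plain Python. The counts are hypothetical: I start from a perfectly even 24 rolls of a die (4 per face) and move `k` rolls from face 1 to face 6, so the deviation from the expected counts grows with `k`.

```python
expected = [4] * 6  # 24 rolls of an honest die: 4 expected per face

statistics = []
for k in range(5):
    # Hypothetical observed counts: k rolls shifted from face 1 to face 6.
    observed = [4 - k, 4, 4, 4, 4, 4 + k]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    statistics.append(stat)
    print(k, observed, stat)
```

Each step away from the expected counts makes the statistic larger, and a larger statistic corresponds to a smaller p-value for the same degrees of freedom.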

“multi-way tests test whether two (or more) variables are associated.”

Associated? You mean independent. As usual great stuff and thanks for share