In a previous article I looked at how to go wrong with the mean. Today, I will look at a set of alternative measures that deal with some problems of the mean; in particular, they deal well with data that are highly skewed or have outliers. These measures are the trimmed means, the best known of which is the median.
What is the trimmed mean?
The trimmed mean is the mean with some extreme values taken out. For example, in the 10% trimmed mean the largest and smallest 10% of the values are removed and then the mean is taken on the remaining 80%. This strikes some people as odd, almost as “cheating”. But these same people are often perfectly comfortable reporting the median, which is simply the 50% trimmed mean. This is strange. People are fine with chucking nearly all the data, but not with chucking some of it. Still, it is true that the mean and median are very often reported, while other trimmed means are not.
What is the Winsorized mean?
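The Winsorized mean is similar to the trimmed mean, except that rather than deleting the extreme values, each one is replaced by the nearest value that was not cut: the largest extremes are set equal to the largest remaining value, and the smallest extremes to the smallest remaining value.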
Example of the trimmed mean
Suppose you are taking an introductory statistics class. The professor decides to collect data on the heights of all the students in the class. However, she is unaware that the coach of the men’s basketball team has recommended the class to his team. Each student writes his or her height on a card and passes it in; each student also writes an M or F for male or female.
The heights, in inches, are, for the men: 70, 72, 74, 68, 69, 87, 82, 73, 67, 70, and for the women: 62, 67, 66, 61, 69, 70, 69, 68, 62, 63.
She first takes the two averages. For the men, the mean is 73.2 inches, or 6 feet 1.2 inches. For the women, she gets 65.7 inches, or 5 feet 5.7 inches. The professor knows that the average height of men is not close to 6’1″. She looks at the values, sees that some are odd, and decides to calculate the 20% trimmed means. That is, she removes the 2 tallest and 2 shortest men and the 2 tallest and 2 shortest women; then she calculates the mean of the remaining 6 in each group. It is easier to do this if you first sort the data. For the men, change
70, 72, 74, 68, 69, 87, 82, 73, 67, 70
to
67, 68, 69, 70, 70, 72, 73, 74, 82, 87
and then remove the two smallest (67 and 68) and two largest (82 and 87).
The results? For the men, she gets 71.3 inches (much less than 73.2) and for the women she gets 65.8 inches (almost exactly 65.7). Then she uses that information to investigate and writes a note to the coach thanking him for sending students to her class.
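The sort-and-remove procedure can also be written out in a few lines of `R`. This is only a sketch of the hand calculation: `trimmed_by_hand` is a made-up helper name, not a built-in function, and it assumes the number of values cut from each end is the group size times the trimming fraction, rounded down.

```r
# Hand-style trimmed mean: sort, drop k values from each end, average the rest.
# `trimmed_by_hand` is a hypothetical helper, not part of base R.
trimmed_by_hand <- function(x, trim) {
  n <- length(x)
  k <- floor(n * trim)             # how many values to drop from each end
  sorted <- sort(x)
  kept <- sorted[(k + 1):(n - k)]  # the middle values that survive
  mean(kept)
}

men   <- c(70, 72, 74, 68, 69, 87, 82, 73, 67, 70)
women <- c(62, 67, 66, 61, 69, 70, 69, 68, 62, 63)

men_trimmed   <- trimmed_by_hand(men, 0.2)
women_trimmed <- trimmed_by_hand(women, 0.2)

round(c(mean(men), men_trimmed), 1)      # 73.2 71.3
round(c(mean(women), women_trimmed), 1)  # 65.7 65.8
```

For these data the helper gives the same answers as the hand calculation above.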
Example of the Winsorized mean
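To calculate the Winsorized mean we would again order the data, but then change the extreme values rather than remove them. For the men, the two smallest values (67 and 68) become 69 and the two largest (82 and 87) become 74, so
67, 68, 69, 70, 70, 72, 73, 74, 82, 87
becomes
69, 69, 69, 70, 70, 72, 73, 74, 74, 74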
The 20% Winsorized mean for the men is 71.4 inches. This is very close to the 20% trimmed mean of 71.3 inches; it is common for the two to be close.
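The same replacement can be sketched in `R`. As before, this is an illustration of the hand method described here, not a packaged routine: `winsorized_by_hand` is a hypothetical name, and the count of values changed at each end is assumed to be the group size times the fraction, rounded down.

```r
# Hand-style Winsorized mean: sort, then overwrite the k smallest values with
# the (k+1)th smallest and the k largest with the (k+1)th largest.
# `winsorized_by_hand` is a hypothetical helper, not part of base R.
winsorized_by_hand <- function(x, trim) {
  n <- length(x)
  k <- floor(n * trim)
  sorted <- sort(x)
  if (k > 0) {
    sorted[1:k] <- sorted[k + 1]            # pull the k smallest up
    sorted[(n - k + 1):n] <- sorted[n - k]  # pull the k largest down
  }
  mean(sorted)
}

men <- c(70, 72, 74, 68, 69, 87, 82, 73, 67, 70)
men_winsorized <- winsorized_by_hand(men, 0.2)
men_winsorized  # 71.4
```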
What about the median?
She decides to also show the median height for each group. The median is defined as the value that splits the group in two: half are higher, half lower. With an even number of values, as here, it is the average of the two middle values. For the men it is 71.0 inches and for the women it is 66.5 inches. These values are, in this example, quite close to the 20% trimmed means.
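A short `R` check of these medians follows, along with the earlier point that the median is the 50% trimmed mean. The variable names are just for illustration; the one behavior relied on is that base `R`'s `mean()` returns the median when `trim` is 0.5.

```r
men   <- c(70, 72, 74, 68, 69, 87, 82, 73, 67, 70)
women <- c(62, 67, 66, 61, 69, 70, 69, 68, 62, 63)

sort(men)[5:6]  # the two middle values: 70 72

men_median   <- median(men)    # 71
women_median <- median(women)  # 66.5

# Trimming 50% from each end leaves only the middle, which is the median
men_half_trimmed <- mean(men, trim = 0.5)  # 71
```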
How to calculate these numbers?
With only 10 students in each group, it is easy to do the calculations by hand. But if the groups were larger, it would be a pain. Fortunately, statistical software packages exist to do the work for you. In `R` you can do all the above calculations with the following code:
```r
men <- c(70, 72, 74, 68, 69, 87, 82, 73, 67, 70)    # Heights in inches
women <- c(62, 67, 66, 61, 69, 70, 69, 68, 62, 63)

mean(men)                # Ordinary mean
mean(men, trim = .2)     # 20% trimmed mean
mean(women)
mean(women, trim = .2)
quantile(men)            # Quantiles, including the median (50% quantile)
quantile(women)
```
In `R`, the `psych` package has `winsor` and `winsor.means` functions. These do Winsorization slightly differently: instead of the next value in the sorted data, they use the values at the exact quantiles (here, the 20th and 80th percentiles), which is easy for a computer but not that easy in a hand calculation. The result for the men is 71.68.
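To see where 71.68 comes from without installing anything, the quantile-based version can be approximated in base `R`. This is a sketch under the assumption that the limits are the 20th and 80th percentiles as computed by `quantile()` with its default settings; it is not the `psych` source code, and the variable names are made up for the example.

```r
men <- c(70, 72, 74, 68, 69, 87, 82, 73, 67, 70)

# 20th and 80th percentiles with quantile()'s default interpolation
limits <- unname(quantile(men, probs = c(0.2, 0.8)))  # 68.8 75.6

# Pull anything below the lower limit up to it, and anything above the
# upper limit down to it
men_clamped <- pmin(pmax(men, limits[1]), limits[2])

men_quantile_winsorized <- mean(men_clamped)  # 71.68
```

Because the limits (68.8 and 75.6) fall between observed heights, the answer differs a little from the hand-calculated 71.4.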
In articles to come, I’ll look at other measures of central tendency.