The average, or mean, is one of the simplest statistics there is. You have a bunch of numbers, you add them up, divide by how many there are, and … that’s it! How could you go wrong with the mean? Well, it’s surprisingly easy to do so.
First, a good example, to set the stage. If you weigh 5 people (Peter, John, Mary, Ed and Sally) and they weigh 180, 190, 130, 186 and 100 pounds, you can just add those up (786), divide by 5, and you’ve got the average weight for those 5 people: 157.2 pounds. That’s the mean.
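Here is that arithmetic as a short Python sketch. It assumes the weights are listed in the same order as the names; the variable names are mine, chosen just for illustration.

```python
from statistics import mean

# Weights in pounds from the example; the name-to-weight pairing
# assumes the two lists in the text are in the same order.
weights = {"Peter": 180, "John": 190, "Mary": 130, "Ed": 186, "Sally": 100}

values = list(weights.values())
total = sum(values)              # add them up
average = total / len(values)    # divide by how many there are

# statistics.mean does the same add-and-divide in one call.
library_average = mean(values)

print(f"total = {total} lb, mean = {average:.1f} lb")
```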
Now, what can go wrong?
Averaging rates is a bad idea. For example, suppose I drive to work at a constant speed of 60 miles per hour and drive home at 40 miles per hour, over the same route. What’s my average speed? EASY! The mean of the two numbers is (60+40)/2 = 50. So, my average speed is 50, right? Wrong. Let’s say it’s 60 miles to work. Then the trip to work takes me 1 hour, the trip home takes me 1.5 hours, and the total time is 2.5 hours to drive 120 miles. 120/2.5 = 48, not 50. I spend more time driving at the slower speed, so it counts for more. Or let’s say that Bob is a professional baseball player. He bats .200 for the first half of the season, and .400 for the second half. So, his average for the whole season must be .300, right? WRONG. In fact, there is not enough information: you can’t find his overall average from the two half-season averages alone. For example, maybe in the first half he comes to bat 100 times and gets 20 hits; in the second half, he comes to bat 500 times and gets 200 hits. Then, for the season, he has 600 at-bats and 220 hits, and his average is .367.
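Both examples come down to the same fix: go back to the raw totals (miles and hours, hits and at-bats) and divide once at the end, instead of averaging the rates. A minimal Python sketch of that idea follows; the function names are hypothetical, and the 60-mile distance is the one assumed in the example above.

```python
def average_speed(legs):
    """Overall speed for a trip given (distance_miles, speed_mph) legs."""
    total_distance = sum(distance for distance, _ in legs)
    total_time = sum(distance / speed for distance, speed in legs)
    return total_distance / total_time


def batting_average(periods):
    """Overall batting average given (hits, at_bats) for each period."""
    total_hits = sum(hits for hits, _ in periods)
    total_at_bats = sum(at_bats for _, at_bats in periods)
    return total_hits / total_at_bats


# Commute: 60 miles each way, 60 mph there and 40 mph back.
commute_speed = average_speed([(60, 60), (60, 40)])

# Bob's season: 20 hits in 100 at-bats, then 200 hits in 500 at-bats.
season_average = batting_average([(20, 100), (200, 500)])

print(f"commute: {commute_speed:.1f} mph (naive mean of speeds: 50)")
print(f"season:  {season_average:.3f} (naive mean of averages: .300)")
```

Notice that neither function can be called with the rates alone. Without the distances or the at-bats there is nothing to weight by, which is exactly why the batting question has no answer as first stated.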
Averaging times is also problematic. Let’s say you want to find out the average time you went to bed in the last week, and you record: 10 PM, 10 PM, 11 PM, 1 AM, 2 AM, 10 PM and 10 PM. How to find the average? (10 + 10 + 11 + 1 + 2 + 10 + 10)/7 = 7.71? Huh? Between 7 and 8 o’clock? Maybe the problem is AM and PM. Let’s go to a 24-hour clock. (22 + 22 + 23 + 1 + 2 + 22 + 22)/7 = 16.29 … around 4 PM !?!?
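The right way to solve this is to measure from a reference point that none of the bedtimes cross, for example hours past the previous noon. Then 10 PM is 10, 11 PM is 11, 1 AM is 13 and 2 AM is 14, so the times become 10, 10, 11, 13, 14, 10, 10 and the average is 11.14, or a little before 11:10 PM. That makes sense.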
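The hours-past-noon trick can be sketched in a few lines of Python, taking 1 AM as 13 and 2 AM as 14 hours past noon. The function name is hypothetical, and the approach assumes every bedtime falls between one noon and the next; it is not a general method for averaging times that are spread around the whole clock.

```python
def mean_bedtime(hours_24):
    """Average bedtimes given as hours on a 24-hour clock.

    Assumes every bedtime falls between noon and the following noon.
    Returns (mean hours past noon, clock time as 'HH:MM').
    """
    # Re-express each time as hours past the previous noon:
    # 22 -> 10, 23 -> 11, 1 -> 13, 2 -> 14.
    past_noon = [(hour - 12) % 24 for hour in hours_24]
    mean_past_noon = sum(past_noon) / len(past_noon)

    # Convert back to a 24-hour clock time, rounded to the nearest minute.
    total_minutes = round(((mean_past_noon + 12) % 24) * 60)
    hours, minutes = divmod(total_minutes, 60)
    return mean_past_noon, f"{hours % 24:02d}:{minutes:02d}"


bedtimes = [22, 22, 23, 1, 2, 22, 22]  # 10 PM, 10 PM, 11 PM, 1 AM, 2 AM, 10 PM, 10 PM

naive_mean = sum(bedtimes) / len(bedtimes)  # the misleading 24-hour-clock answer
mean_hours, clock_time = mean_bedtime(bedtimes)

print(f"naive 24-hour mean: {naive_mean:.2f}")
print(f"hours past noon:    {mean_hours:.2f}  ->  {clock_time}")
```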
If there are extreme values, often called outliers, then the mean can be, if not exactly wrong, then certainly misleading. If you are figuring the average height of a group of college students, and your sample happens to include the center on the basketball team, who is 7’2″ tall, then your average won’t be a very good representation of the real average height at your school.
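To get a feel for how much one outlier can move the mean, here is a small Python sketch. The nine student heights are made up purely for illustration; only the 7’2″ center (86 inches) comes from the example above.

```python
# Hypothetical heights in inches for nine students (invented for
# illustration; they are not data from any real sample).
students = [64, 66, 67, 68, 68, 69, 70, 71, 72]

center = 7 * 12 + 2  # the 7'2" basketball center, in inches

mean_without = sum(students) / len(students)
mean_with = (sum(students) + center) / (len(students) + 1)

print(f"mean without the center: {mean_without:.1f} in")
print(f"mean with the center:    {mean_with:.1f} in")
print(f"shift from one person:   {mean_with - mean_without:.1f} in")
```

With a sample this small, a single very tall person pulls the mean up by well over an inch, even though nobody else in the group got any taller.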
So, even with the mean, you can go wrong.