The average, or mean, is one of the simplest statistics there is. You have a bunch of numbers, you add them up, divide by how many there are, and that’s it! How could you go wrong with the mean? Well, it’s surprisingly easy to do so.

First, a good example, to set the stage. If you weigh 5 people (Peter, John, Mary, Ed and Sally) and they weigh 180, 190, 130, 186 and 100 pounds, you can just add that up, divide by 5 and you’ve got the average weight for those 5 people. That’s the mean.
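As a quick check of that arithmetic, here is a minimal Python sketch using the five weights from the example; the variable names are only illustrative.

```python
from statistics import mean

# Weights in pounds, taken from the example above.
weights = {"Peter": 180, "John": 190, "Mary": 130, "Ed": 186, "Sally": 100}

# Add them up (786) and divide by how many there are (5).
average_weight = mean(list(weights.values()))
print(average_weight)  # 157.2
```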

Now, what can go wrong?

Averaging rates is a bad idea. For example, suppose I drive to work at a constant speed of 60 miles per hour and drive home at 40 miles per hour, over the same route. What’s my average speed? EASY! The mean of the two numbers is (60+40)/2 = 50. So, my average speed is 50, right? Wrong. Let’s say it’s 60 miles to work. Then the trip to work takes me 1 hour, the trip home takes me 1.5 hours, and the total time is 2.5 hours to drive 120 miles. 120/2.5 = 48, not 50. The simple mean is too high because I spend more time driving at the slower speed.

Or let’s say that Bob is a professional baseball player. He bats .200 for the first half of the season, and .400 for the second half. So, his average for the whole season must be .300, right? WRONG. In fact, there is not enough information. You can’t find his overall average from the information given. For example, maybe in the first half he comes to bat 100 times and gets 20 hits; in the second half, he comes to bat 500 times and gets 200 hits. Then, for the season, he has 600 at bats and 220 hits, and his average is .367.
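Both examples come down to the same fix: add up the numerators and the denominators separately, then divide once. Here is a rough Python sketch of that idea. The function names `average_speed` and `batting_average` are made up for illustration, and the 60-mile distance and the at-bat counts are the ones assumed in the text above.

```python
from statistics import harmonic_mean

def average_speed(legs):
    """legs is a list of (distance_in_miles, speed_in_mph) pairs."""
    total_distance = sum(distance for distance, _ in legs)
    total_time = sum(distance / speed for distance, speed in legs)
    return total_distance / total_time

def batting_average(halves):
    """halves is a list of (hits, at_bats) pairs."""
    total_hits = sum(hits for hits, _ in halves)
    total_at_bats = sum(at_bats for _, at_bats in halves)
    return total_hits / total_at_bats

# 60 miles to work at 60 mph, 60 miles home at 40 mph.
commute = [(60, 60), (60, 40)]
print(average_speed(commute))  # 48.0

# When both legs cover the same distance, this matches the harmonic mean
# of the two speeds (about 48), not the arithmetic mean (50).
print(harmonic_mean([60, 40]))

# 20 hits in 100 at bats, then 200 hits in 500 at bats.
season = [(20, 100), (200, 500)]
print(round(batting_average(season), 3))  # 0.367
```

The harmonic mean shortcut only applies when each rate covers the same distance; the totals-based functions work either way.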

Averaging times is also problematic. Let’s say you want to find out the average time you went to bed in the last week, and you record: 10 PM, 10 PM, 11 PM, 1 AM, 2 AM, 10 PM and 10 PM. How to find the average? (10 + 10 + 11 + 1 + 2 + 10 + 10)/7 = 7.71? Huh? Between 7 and 8 o’clock? Maybe the problem is AM and PM. Let’s go to a 24 hour clock. (22 + 22 + 23 + 1 + 2 + 22 + 22)/7 = 16.29 … a little after 4 PM !?!?

The right way to solve this is to take (e.g.) hours past the previous noon, so that 1 AM becomes 13 and 2 AM becomes 14. That gives 10, 10, 11, 13, 14, 10, 10, and now the average is 11.14 hours past noon, or about 11:09 PM. That makes sense.
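Here is a small Python sketch of the hours-past-noon approach, with 1 AM counted as 13 and 2 AM as 14 hours past the previous noon. The helper names are invented for this example, and the approach assumes every bedtime falls between one noon and the next, so it is not a general recipe for averaging clock times.

```python
def hours_past_noon(hour_24):
    """Convert a 24-hour clock hour to hours since the previous noon."""
    return (hour_24 - 12) % 24

def mean_bedtime(hours_24):
    """Return (mean hours past noon, clock time as an 'HH:MM' string)."""
    offsets = [hours_past_noon(h) for h in hours_24]
    average_offset = sum(offsets) / len(offsets)
    total_minutes = round(average_offset * 60)
    clock_hour = (12 + total_minutes // 60) % 24
    clock_minute = total_minutes % 60
    return average_offset, f"{clock_hour:02d}:{clock_minute:02d}"

# 10 PM, 10 PM, 11 PM, 1 AM, 2 AM, 10 PM, 10 PM on a 24-hour clock.
bedtimes = [22, 22, 23, 1, 2, 22, 22]

average_offset, clock = mean_bedtime(bedtimes)
print(round(average_offset, 2), clock)  # 11.14 23:09
```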

If there are extreme values, often called outliers, then the mean can be, if not exactly wrong, then certainly misleading. If you are figuring the average height of a group of college students, and your sample happens to include the center on the basketball team, who is 7’2″ tall, then your average won’t be a very good representation of the real average height at your school.
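To see how much one tall student can move the mean, here is an illustrative Python sketch. The five "ordinary" heights are made-up numbers, not real data; only the 7’2″ (86 inch) center comes from the example above. The median is printed alongside as one common point of comparison.

```python
from statistics import mean, median

# Hypothetical heights in inches for five students (invented for illustration).
students = [66, 68, 69, 70, 71]

# The basketball center from the example: 7'2" = 86 inches.
center = 7 * 12 + 2

with_center = students + [center]

print(mean(students))                 # 68.8
print(round(mean(with_center), 1))    # 71.7
print(median(students))               # 69
print(median(with_center))            # 69.5
```

In this made-up sample, one person raises the mean by almost three inches, while the median moves by only half an inch.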

So, even with the mean, you can go wrong.

Your comment on averaging rates extends to fuel economy (e.g., miles per gallon), which is why the EPA uses harmonic (not arithmetic) means to compute fuel economy of vehicles.

It is ironic that a harmonic mean, which the average American has never heard of, is used to compute fuel economy, which EVERY American over age 16 is familiar with!

See p. A-ii and A-iii of http://www.nytimes.com/packages/pdf/business/20050728_EPA/trend-05a1.pdf

Rick

Yes, that’s another good example. And the harmonic mean is cool.

In a weird coincidence, this topic came up in real life in a discussion with a roofer about estimating the area of my roof:

http://blogs.sas.com/iml/index.php?/archives/43-Statistics-Can-Save-You-Money-Estimates,-Areas,-and-Arithmetic-Means.html

We get this wrong every day in the wind industry. Every place on the planet has an “average wind speed,” right? Well, no, because wind speeds don’t follow a normal distribution; they are closer to a Weibull distribution. So even if the “annual average wind speed” is the same at two sites, the estimated energy output could be dramatically different depending on the shape of the Weibull function.
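A rough Python sketch of that point, under simplifying assumptions: the power available in the wind is taken to be proportional to the cube of the wind speed, and turbine details such as cut-in and rated speeds are ignored. The 7 m/s mean and the two shape values are hypothetical, chosen only for illustration. The sketch uses the standard Weibull moment formulas, mean = scale × Γ(1 + 1/k) and E[v³] = scale³ × Γ(1 + 3/k).

```python
from math import gamma

def weibull_scale_for_mean(mean_speed, shape):
    """Scale parameter that gives a Weibull distribution the requested mean."""
    return mean_speed / gamma(1 + 1 / shape)

def mean_cubed_speed(mean_speed, shape):
    """E[v^3] for a Weibull distribution with the given mean and shape."""
    scale = weibull_scale_for_mean(mean_speed, shape)
    return scale ** 3 * gamma(1 + 3 / shape)

MEAN_SPEED = 7.0  # m/s, hypothetical annual average shared by both sites

# Two hypothetical sites: same mean wind speed, different Weibull shape.
for shape in (1.5, 3.0):
    print(shape, round(mean_cubed_speed(MEAN_SPEED, shape), 1))

ratio = mean_cubed_speed(MEAN_SPEED, 1.5) / mean_cubed_speed(MEAN_SPEED, 3.0)
print(round(ratio, 2))
```

With these made-up parameters, the site with the broader distribution (shape 1.5) has nearly double the mean cubed wind speed of the other, even though the two averages are identical.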

A related article about going there and back …

http://www.datagenetics.com/blog/september22013/index.html