Graphics for univariate data – Pie is delicious but not nutritious

February 8, 2011 11:00 am

When you have univariate data, that is, a single measure on a variety of units, the most common statistical graphic is a pie chart.  But pie charts should not be used.  Ever.  When there are a lot of units, pie charts are unreadable.  When there are only a few units, pie charts waste space.  And research shows that, even with a moderate number of units, pie charts can distort the data (for example, using different colors leads to different estimates of the size of the wedges).  Fortunately, there are better methods.

Suppose we have data on the population of the 50 states plus Puerto Rico.  A pie chart of this data is a mess:

A pie chart with lots of units

Nothing more need be said.  That’s unreadable.

The US Census groups the states into nine divisions and four regions.  A pie chart of the population of the nine divisions looks like this:

A pie chart with a moderate number of units

This is at least readable.  If I were really going to use this, I would drop the numbers on the wedges, or perhaps round them.  But there are better methods.

A pie chart of the regions looks like this:

A pie chart with few units

Here I told SAS to use a legend rather than label each wedge, but that’s minor.  The real problem is that this chart uses a huge amount of space to show four numbers.  It would be better to just give the populations in text, or maybe a table.
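As a rough sketch of the table option: assuming the region populations sit in a data set called regionpop (a hypothetical name, not shown in this post) with the same geographical_area and pop08 variables as the division data, something like this would print them as a plain table:

```sas
/* regionpop is a hypothetical data set name: one row per Census region, */
/* assumed to have the same variables as the division data in this post  */
proc print data = regionpop noobs;
  var geographical_area pop08;
  format pop08 comma12.;   /* show thousands separators in the populations */
run;
```

Four rows of text take up far less room than the pie, and the reader gets the exact values.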

If pie charts are bad, what should be used instead?  One good option is the dot plot.

Here is a dot plot for divisions:

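The SAS code to do this is:

ods html;
/* Dot plot: one row per division, with the dot placed at its pop08 value */
proc sgplot data = divisionpop;
  dot geographical_area / response = pop08 nostatlabel;
  yaxis discreteorder = data;   /* keep the divisions in data set order */
run;
ods html close;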

But there are ways to improve this. First, we should order the divisions by population size; second, we should make the numbers easier to read by changing to millions (we do that in a data step). Then we get:

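The SAS code for this is:

ods html;
/* divisionpop3 holds the divisions ordered by size, with population in millions */
proc sgplot data = divisionpop3;
  dot geographical_area / response = pop08millions nostatlabel;
  yaxis discreteorder = data;   /* plot the rows in the order they were sorted */
run;
ods html close;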
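The step that builds divisionpop3 is not shown above. A minimal sketch of one way to do it, assuming divisionpop holds the raw counts in pop08 (the intermediate data set name divisionpop2 is made up for illustration), might be:

```sas
/* divisionpop2 is a hypothetical intermediate data set name */
data divisionpop2;
  set divisionpop;
  pop08millions = pop08 / 1e6;   /* people -> millions of people */
run;

/* Largest division first; DISCRETEORDER = DATA then plots the rows in this order */
proc sort data = divisionpop2 out = divisionpop3;
  by descending pop08millions;
run;
```

Sorting the data set itself, rather than asking the plot to sort, works because the YAXIS statement above uses DISCRETEORDER = DATA.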

Finally, we might want to use a log scale:

The SAS code for this is:

ods html;
proc sgplot data = divisionpop3;
  dot geographical_area / response = pop08millions nostatlabel;
  yaxis discreteorder = data;
  /* Base-10 log axis for the population values */
  xaxis type = log logbase = 10 logstyle = logexpand;
run;
ods html close;

When we have a lot of units, we need to do more to make it all fit on the page.  Here we shrink the fonts and make the graph taller:

/* Reduce the font size of the GraphValueFont */
proc template;
  define style styles.mystyle;
    parent = styles.default;
    style GraphFonts from GraphFonts
      "Fonts used in graph styles" /
      'GraphTitleFont' = ("<sans-serif>, <MTsans-serif>",10pt,bold)
      'GraphFootnoteFont' = ("<sans-serif>, <MTsans-serif>",8pt)
      'GraphLabelFont' = ("<sans-serif>, <MTsans-serif>",8pt)
      'GraphValueFont' = ("<sans-serif>, <MTsans-serif>",6pt)
      'GraphDataFont' = ("<sans-serif>, <MTsans-serif>",8pt);
  end;
run;

/* Increase the height of the graph. */
ods graphics / height=600px;

Then we can apply these settings and run:

ods graphics on;
ods html style=styles.mystyle;
proc sgplot data = statepop3;
  dot geographical_area / response = pop08million nostatlabel;
  yaxis discreteorder = data;
  xaxis type = log logbase = 2 logstyle = logexpand;
run;
ods html close;
ods graphics off;

This creates a dot plot of the state populations on a log (base 2) scale.

7 Responses to “Graphics for univariate data – Pie is delicious but not nutritious”

  1. Jon Peltier says:

    Oops! That last chart omits half the state name labels.

  2. Peter Flom says:

    Uhoh. I gotta fix that. I think it’s just the image size. Thanks for pointing that out

  3. Peter Flom says:

    I think I fixed it now.

  4. Rick Wicklin says:

    Because the populations of the geographical areas (the 2nd dot plot) do not span several orders of magnitude, some people might prefer to view those data in the original scale (millions of people). If so, I recommend adding x=zero to the axis:
    XAXIS MIN=0;

  5. Peter Flom says:

    Hi Rick

    Yes, that’s a hotly debated issue in this sort of graph. I can see both sides of the issue. With a 0, the distances of the dots are on a ratio scale; without the 0, they cover more of the plot. Good to know about that option, so people have a choice

  6. Basil says:

    I think one of the best methods is a box plot. Combine the data as you did for regions and show a boxplot. You know more about the region than just a single number.

  7. Peter Flom says:

    Hi Basil

    Box plots can be good. They show quite a bit of information, but they delete the names of the individual points. I think they are best for cases where there are more units (say, 50 or more, roughly) or where there is no interest in the particular values.
