When you have bivariate data – that is, data on two variables – either or both may be categorical or continuous. When there is one of each, and you want to compare the distribution of one across levels of the other, a parallel box plot is a good option. Suppose, for example, you want to compare the heights of people across ethnic groups.
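The post never shows how the `hypo` data set was created. To make the examples runnable, here is a minimal sketch that builds a hypothetical stand-in: simulated heights in inches for three made-up groups. The group names, means, standard deviations, sample sizes, and seed are all assumptions chosen for illustration. They are not the original data.

```sas
/* Hypothetical stand-in for the HYPO data set: simulated heights (inches)
   for three made-up groups. All parameters are illustrative assumptions. */
data hypo;
   call streaminit(2024);                 /* fixed seed so results repeat */
   length group $ 8;
   do g = 1 to 3;
      group = cats("Group", g);           /* Group1, Group2, Group3 */
      do i = 1 to 100;
         /* normal heights with mean 64 + 2*g and standard deviation 3 */
         variable = rand("Normal", 64 + 2*g, 3);
         output;
      end;
   end;
   drop g i;
run;
```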

The very simplest box plot looks like this:

and the code to produce it is equally simple:

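```sas
/* Send the output to an HTML file in the given folder */
ods html path="c:\personal\Graphics" (url=none) file="boxplot1.html";

/* One vertical box of VARIABLE for each level of GROUP */
proc sgplot data = hypo;
   vbox variable / category = group;
run;

ods html close;
```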

Here, the line near the middle of each box shows the median, the diamond shows the mean, and the box spans the interquartile range (IQR), from the 25th to the 75th percentile. The whiskers extend to the most extreme observations that lie within 1.5 IQRs above the 75th percentile and 1.5 IQRs below the 25th percentile; any points beyond those limits are plotted individually as outliers.
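To see where the box, whiskers, and outlier cutoffs come from, the underlying statistics can be computed directly. This is a sketch assuming the same `hypo` data; the output data set names `box_stats` and `box_fences` and the `ht_` variable names are hypothetical. PROC MEANS uses percentile definition 5 by default, so if SGPLOT is set to use a different percentile method, the quartiles may differ slightly from what the plot draws.

```sas
/* Per-group statistics behind each box.
   NWAY keeps only the by-group rows and drops the overall summary row. */
proc means data=hypo noprint nway;
   class group;
   var variable;
   output out=box_stats
          mean=ht_mean median=ht_median q1=ht_q1 q3=ht_q3 qrange=ht_iqr;
run;

/* Outlier fences: observations beyond these limits are drawn as outlier
   markers, and the whiskers stop at the most extreme point inside them. */
data box_fences;
   set box_stats;
   lower_fence = ht_q1 - 1.5*ht_iqr;
   upper_fence = ht_q3 + 1.5*ht_iqr;
run;

proc print data=box_fences noobs;
   var group ht_mean ht_median ht_q1 ht_q3 ht_iqr lower_fence upper_fence;
run;
```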

But there are some problems. For one thing, the y-axis label "variable" is not informative, and the x-axis label should be capitalized. Both are easy to fix:

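```sas
/* Write to a new file so the first plot is not overwritten */
ods html path="c:\personal\Graphics" (url=none) file="boxplot2.html";

proc sgplot data = hypo;
   vbox variable / category = group;
   /* Replace the default axis labels with informative ones */
   xaxis label = "Group";
   yaxis label = "Height (inches)";
run;

ods html close;
```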

A more substantive problem is that some of the outliers overlap, making it hard to tell how many there are. The SPREAD option on the VBOX statement would solve this if the overlapping outliers had exactly the same value, but here they are close together rather than identical. One solution, at least when there are not many outliers, is the DATALABEL option, which labels each outlier, producing the following:

But this is not ideal. Even with only a relatively small number of outliers, the graph is cluttered.
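The DATALABEL version described above can be sketched as follows, together with the SPREAD alternative for outliers that share identical values. Both assume the `hypo` data and the axis labels used earlier; when DATALABEL is given without a variable, the outlier markers are labeled with values of the analysis variable. SPREAD may not be available in older SAS releases.

```sas
/* Label every outlier marker (values of VARIABLE by default) */
proc sgplot data=hypo;
   vbox variable / category=group datalabel;
   xaxis label="Group";
   yaxis label="Height (inches)";
run;

/* Alternative for outliers with identical values:
   SPREAD shifts coincident outlier markers sideways so each is visible */
proc sgplot data=hypo;
   vbox variable / category=group spread;
   xaxis label="Group";
   yaxis label="Height (inches)";
run;
```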

We can do better using the Graph Template Language.
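As a starting point, a minimal GTL sketch that reproduces the labeled box plot might look like the following. The template name `HeightBox` is hypothetical, and this version does not yet deal with the overlapping outliers; it only shows the PROC TEMPLATE / PROC SGRENDER structure that GTL customizations build on. Default box styling in GTL may differ slightly from SGPLOT.

```sas
/* Define a graph template (HeightBox is a hypothetical name) */
proc template;
   define statgraph HeightBox;
      begingraph;
         layout overlay / xaxisopts=(label="Group")
                          yaxisopts=(label="Height (inches)");
            /* one box of VARIABLE for each level of GROUP */
            boxplot x=group y=variable;
         endlayout;
      endgraph;
   end;
run;

/* Render the template with the HYPO data */
proc sgrender data=hypo template=HeightBox;
run;
```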

I specialize in helping graduate students and researchers in psychology, education, economics and the social sciences with all aspects of statistical analysis. Many new and relatively uncommon statistical techniques are available, and these may widen the field of hypotheses you can investigate. Graphical techniques are often misapplied, but, done correctly, they can summarize a great deal of information in a single figure. **I can help with writing papers, writing grant applications, and doing analysis for grants and research.**

**Specialties:** Regression, logistic regression, cluster analysis, statistical graphics, quantile regression.
