Scatterplots and enhancements
When you have two numeric variables and are interested in the relationship between them, the basic statistical graph is the scatterplot. Scatterplots are often effective, but they can be enhanced, and in some circumstances they are problematic and an alternative works better. In this post, I show SAS code to create a basic scatterplot and some enhanced versions.
I found a data set of unemployment rate and infant mortality for each of the 50 states in the United States, plus the District of Columbia. A very simple scatterplot can be generated with the following SAS commands:
<code>
ods html;
proc sgplot data = UnempIM;
xaxis label = "Unemployment (%)";
yaxis label = "Infant Mortality (%)";
scatter x = Unemployment y = InfantMortality;
run;
ods html close;
</code>
This creates the following plot:

This is fine, but it’s possible to add substantially more information. For instance, the following code adds two loess fits (the default one and a much smoother one with smooth = 1) and an ellipse:
<code>
ods html;
proc sgplot data = UnempIM;
xaxis label = "Unemployment (%)";
yaxis label = "Infant Mortality (%)";
/* No SCATTER statement is needed: the first LOESS statement draws the markers */
loess x = Unemployment y = InfantMortality;
/* Second, smoother fit; NOMARKERS avoids drawing the points a second time */
loess x = Unemployment y = InfantMortality /smooth = 1 nomarkers;
ellipse x = Unemployment y = InfantMortality;
run;
ods html close;
ods graphics off;
</code>
This creates the following graph:

That’s fairly easy to do in SAS. But we can make the graph more complex and more informative by using the Graph Template Language (GTL). For example, we can add density plots to show the distribution of each variable. This code
<code>
*More control;
proc template;
define statgraph scatdens2;
begingraph;
entrytitle "Scatter plot with density plots";
layout lattice/columns = 2 rows = 2 columnweights = (.8 .2) rowweights = (.8 .2) columndatarange = union rowdatarange = union;
columnaxes;
columnaxis /label = 'Unemployment (%)' griddisplay = on;
columnaxis /label = '' griddisplay = on;
endcolumnaxes;
rowaxes;
rowaxis /label = 'Infant Mortality (%)' griddisplay = on;
rowaxis /label = '' griddisplay = on;
endrowaxes;
layout overlay;
/* SCATTERPLOT draws the points; in GTL, LOESSPLOT draws only the fitted curve */
scatterplot x = unemployment y = infantmortality;
loessplot x = unemployment y = infantmortality;
/* NOMARKERS belongs to the SGPLOT LOESS statement and is not accepted by LOESSPLOT */
loessplot x = unemployment y = infantmortality/smooth = 1;
ellipse x = unemployment y = infantmortality/type = predicted;
endlayout;
densityplot infantmortality/orient = horizontal;
densityplot unemployment;
endlayout;
endgraph;
end;
run;
ods html;
proc sgrender data = UnempIM template = scatdens2;
run;
ods html close;
</code>
produces the following:

Nice. I like it! It’s great that you’re discovering PROC SGPLOT and the GTL. A few comments:
1) By default, the LOESS statement adds points AND a curve, so you really don’t need the first SCATTER statement. Notice that your SGPLOT image has markers which look strange. The SCATTER statement displays circles, the first LOESS statement overlays the same points but with a different shape, and the third overlays yet another shape. You can use the NOMARKERS option to prevent this:
loess x = Unemployment y = InfantMortality;
loess x = Unemployment y = InfantMortality /smooth = 1 nomarkers;
ellipse x = Unemployment y = InfantMortality;
2) I think the loess curve with smooth=1 is more easily accomplished with the REG statement.
3) For people who don’t like to program, the %sgdesign macro brings up a GUI interface that allows you to create the second image using drag-and-drop and menus. For details and examples, see
http://support.sas.com/documentation/cdl/en/grstatdesignug/62589/HTML/default/viewer.htm
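As a rough illustration of points 1) and 2) above, here is a minimal sketch that swaps the second LOESS statement for a REG statement. It assumes the same UnempIM data set and variable names used in the post. A straight-line least-squares fit is similar in spirit to a loess fit with smooth = 1, but the two curves are not guaranteed to be identical.

```sas
proc sgplot data = UnempIM;
  xaxis label = "Unemployment (%)";
  yaxis label = "Infant Mortality (%)";
  /* Default loess fit; this statement also draws the data markers */
  loess x = Unemployment y = InfantMortality;
  /* Straight-line fit used here in place of loess with smooth = 1;
     NOMARKERS keeps the points from being drawn a second time */
  reg x = Unemployment y = InfantMortality / nomarkers;
  ellipse x = Unemployment y = InfantMortality;
run;
```

Only one statement draws markers in this version, so the overlapping marker shapes described in point 1) should not appear.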
Hi Rick
Glad you like it and thanks for the pointers.
I’ll fix it up in a little while.
GTL is fantastic.
Peter
Mister Flom,
I’m not fluent in SAS, and I have run into a problem with your syntax that I’m not able to fix. Do you have any idea what’s going on? I’d really like to be able to create this kind of graph.
Thanks for this great post!
18 loessplot x = zfeffev y = wd14151/smooth = 1 nomarkers;
———
22
202
ERROR 22-322: Syntax error, expecting one of the following: ;, ALPHA, CLM, CURVELABEL,
CURVELABELATTRS, CURVELABELLOCATION, CURVELABELPOSITION, DATATRANSPARENCY, DEGREE,
GROUP, INCLUDEMISSINGGROUP, INDEX, INTERPOLATION, LEGENDLABEL, LINEATTRS, MAXPOINTS,
NAME, PRIMARY, REWEIGHT, SMOOTH, TIPFORMAT, TIPLABEL, WEIGHT, XAXIS, YAXIS.
ERROR 202-322: The option or parameter is not recognized and will be ignored.
19 ellipse x = zfeffev y = wd14151;
20 endlayout;
21 densityplot wd14151/orient = horizontal;
22 densityplot zfeffev;
23 endlayout;
24 endgraph;
25 end;
WARNING: Object will not be saved.
26 run;
NOTE: PROCEDURE TEMPLATE used (Total process time):
real time 0.46 seconds
cpu time 0.27 seconds
WARNING: Errors were produced.
NOTE: The SAS System stopped processing this step because of errors.
27
28 ods html;
29 proc sgrender data = MaiaD.poster2 template = scatdens2;
30 run;
ERROR: Unable to restore 'scatdens2' from template store!
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE SGRENDER used (Total process time):
real time 0.01 seconds
cpu time 0.00 seconds
31 ods html close;
Hi,
I am pretty sure you are missing a semicolon on the line before the first one you posted (the one that starts with LOESSPLOT).
Peter
Actually, no… I don’t understand what’s happening. Thanks anyway for the reply.
Hmmm. Well, if you find out, let me know. You might try asking on SAS-L, there are lots of experts there, including some from SAS. Or you could call tech support.
Thanks! Actually, I have everything… except the scatterplot. But according to the SAS help, nomarkers is an option of the loess statement (SGPLOT) but not of the loessplot statement (GTL). I’ll ask on SAS-L.
Hi again,
Everything is OK: I thought that “*scatterplot x = unemployment y = infantmortality;” was a comment, but actually this is the statement that creates the scatterplot (once the leading asterisk is removed), and not the line below it.
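For anyone who hits the same error, here is a minimal sketch of just the overlay part in GTL, consistent with what this thread found: SCATTERPLOT draws the points, LOESSPLOT draws only the curve, and NOMARKERS is not a LOESSPLOT option. The template name scatfit is hypothetical, and using REGRESSIONPLOT for the straight-line fit is one possible choice rather than the code from the post.

```sas
proc template;
  define statgraph scatfit;  /* hypothetical template name */
    begingraph;
      layout overlay;
        /* The points come from SCATTERPLOT, not from LOESSPLOT */
        scatterplot x = unemployment y = infantmortality;
        /* Default loess curve (curve only, no markers) */
        loessplot x = unemployment y = infantmortality;
        /* Straight-line fit; no NOMARKERS option is needed here */
        regressionplot x = unemployment y = infantmortality;
        ellipse x = unemployment y = infantmortality / type = predicted;
      endlayout;
    endgraph;
  end;
run;

proc sgrender data = UnempIM template = scatfit;
run;
```

Replace UnempIM and the two variable names with your own data set and columns.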
I’m concerned about using Loess in conditions like this. Seems the variation found is insignificant.