Scatterplots and enhancements

By Peter Flom, January 21, 2011 7:19 am

When you have two numeric variables and are interested in the relationship between them, the basic statistical graph is the scatterplot.  Scatterplots can be good, but there are ways to enhance them, and in some circumstances they are problematic or an alternative graph works better.  In this post, I show SAS code to create a basic scatterplot and some enhanced versions.


I found a data set of unemployment rate and infant mortality for each of the 50 states in the United States, plus the District of Columbia.  A very simple scatterplot can be generated with the following SAS commands:

<code>

ods html;
proc sgplot data = UnempIM;
xaxis label = "Unemployment (%)";
yaxis label = "Infant Mortality (%)";
scatter x = Unemployment y = InfantMortality;
run;
ods html close;
</code>

which creates the following plot:

Simple scatterplot

This is fine, but it’s possible to add substantially more information.  For instance, the following code adds two loess curves (the default fit and a smoother one with smooth = 1) and an ellipse:

ods html;
proc sgplot data = UnempIM;
xaxis label = "Unemployment (%)";
yaxis label = "Infant Mortality (%)";
*scatter x = Unemployment y = InfantMortality;
loess x = Unemployment y = InfantMortality;
loess x = Unemployment y = InfantMortality /smooth = 1 nomarkers;
ellipse x = Unemployment y = InfantMortality;
run;

ods html close;
ods graphics off;

creating the following graph.

Add loess line and ellipse
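
A loess fit with smooth = 1 uses all of the points in every local fit, so it tends to come out close to a straight line. As suggested in the comments below, one possible variation is to swap that second LOESS statement for a REG statement, which fits an ordinary least-squares line. This is only a sketch, assuming the same UnempIM data set and variable names as above; the two fits should look similar but are not guaranteed to be identical.

```sas
ods html;
proc sgplot data = UnempIM;
xaxis label = "Unemployment (%)";
yaxis label = "Infant Mortality (%)";
/* default loess fit; this statement also draws the data markers */
loess x = Unemployment y = InfantMortality;
/* straight-line fit in place of the smooth = 1 loess curve;     */
/* NOMARKERS avoids drawing the points a second time             */
reg x = Unemployment y = InfantMortality / nomarkers;
ellipse x = Unemployment y = InfantMortality;
run;
ods html close;
```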

That’s fairly easy to do in SAS. But we can get more complex and more informative by using the Graph Template Language (GTL). We can add density plots to show the distribution of each variable. This code


*More control;
proc template;
define statgraph scatdens2;
begingraph;
entrytitle "Scatter plot with density plots";
layout lattice/columns = 2 rows = 2 columnweights = (.8 .2) rowweights = (.8 .2) columndatarange = union rowdatarange = union;
columnaxes;
columnaxis /label = 'Unemployment (%)' griddisplay = on;
columnaxis /label = '' griddisplay = on;
endcolumnaxes;
rowaxes;
rowaxis /label = 'Infant Mortality (%)' griddisplay = on;
rowaxis /label = '' griddisplay = on;
endrowaxes;
layout overlay;
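/* GTL LOESSPLOT draws only the curve, so SCATTERPLOT supplies the markers */
scatterplot x = unemployment y = infantmortality;
loessplot x = unemployment y = infantmortality;
/* NOMARKERS is an option of the SGPLOT LOESS statement, not of GTL LOESSPLOT */
loessplot x = unemployment y = infantmortality/smooth = 1;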
ellipse x = unemployment y = infantmortality/type = predicted;
endlayout;
densityplot infantmortality/orient = horizontal;
densityplot unemployment;
endlayout;
endgraph;
end;
run;

ods html;
proc sgrender data = UnempIM template = scatdens2;
run;
ods html close;

produces the following:

Scatterplot with density plots
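
The template above hard-codes the variable names, so it has to be edited before it can be used with another data set. A possible way around that, sketched here rather than tested against this data, is to declare dynamic variables in the template and supply their values in PROC SGRENDER. The template name scatdens_dyn and the dynamic names XVAR, YVAR, XLABEL and YLABEL are made up for this illustration.

```sas
proc template;
define statgraph scatdens_dyn;
/* hypothetical dynamic names; their values are supplied at render time */
dynamic XVAR YVAR XLABEL YLABEL;
begingraph;
entrytitle "Scatter plot with density plots";
layout lattice/columns = 2 rows = 2 columnweights = (.8 .2) rowweights = (.8 .2) columndatarange = union rowdatarange = union;
columnaxes;
columnaxis /label = XLABEL griddisplay = on;
columnaxis /label = '' griddisplay = on;
endcolumnaxes;
rowaxes;
rowaxis /label = YLABEL griddisplay = on;
rowaxis /label = '' griddisplay = on;
endrowaxes;
layout overlay;
/* markers, loess curve and prediction ellipse in the main cell */
scatterplot x = XVAR y = YVAR;
loessplot x = XVAR y = YVAR;
ellipse x = XVAR y = YVAR/type = predicted;
endlayout;
/* marginal density plots: Y variable on the right, X variable below */
densityplot YVAR/orient = horizontal;
densityplot XVAR;
endlayout;
endgraph;
end;
run;

ods html;
proc sgrender data = UnempIM template = scatdens_dyn;
dynamic XVAR = "Unemployment" YVAR = "InfantMortality"
        XLABEL = "Unemployment (%)" YLABEL = "Infant Mortality (%)";
run;
ods html close;
```

To draw the same layout for other variables, only the DYNAMIC statement in PROC SGRENDER needs to change.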

10 Responses to “Scatterplots and enhancements”

  1. [...] This post was mentioned on Twitter by David Napoli, Peter Flom. Peter Flom said: Scatterplots and enhancements in #SAS with code on my blog http://ow.ly/3HLyx #graphics [...]

  2. Rick Wicklin says:

    Nice. I like it! It’s great that you’re discovering PROC SGPLOT and the GTL. A few comments:
    1) By default, the LOESS statement adds points AND a curve, so you really don’t need the first SCATTER statement. Notice that your SGPLOT image has markers which look strange. The SCATTER statement displays circles, the first LOESS statement overlays the same points but with a different shape, and the third overlays yet another shape. You can use the NOMARKERS option to prevent this:
    loess x = Unemployment y = InfantMortality;
    loess x = Unemployment y = InfantMortality /smooth = 1 nomarkers;
    ellipse x = Unemployment y = InfantMortality;

    2) I think the loess curve with smooth=1 is more easily accomplished with the REG statement.

    3) For people who don’t like to program, the %sgdesign macro brings up a GUI interface that allows you to create the second image using drag-and-drop and menus. For details and examples, see
    http://support.sas.com/documentation/cdl/en/grstatdesignug/62589/HTML/default/viewer.htm

  3. Peter Flom says:

    Hi Rick

    Glad you like it and thanks for the pointers.

    I’ll fix it up in a little while.

    GTL is fantastic.

    Peter

  4. EpiFunky says:

    Mister Flom,

    I’m not fluent in SAS, and I’ve run into a problem with your syntax that I’m not able to fix. Do you have any idea what’s going on? I’d really like to be able to create this kind of graph.

    Thanks for this great post!

    18 loessplot x = zfeffev y = wd14151/smooth = 1 nomarkers;
    ———
    22
    202
    ERROR 22-322: Syntax error, expecting one of the following: ;, ALPHA, CLM, CURVELABEL,
    CURVELABELATTRS, CURVELABELLOCATION, CURVELABELPOSITION, DATATRANSPARENCY, DEGREE,
    GROUP, INCLUDEMISSINGGROUP, INDEX, INTERPOLATION, LEGENDLABEL, LINEATTRS, MAXPOINTS,
    NAME, PRIMARY, REWEIGHT, SMOOTH, TIPFORMAT, TIPLABEL, WEIGHT, XAXIS, YAXIS.

    ERROR 202-322: The option or parameter is not recognized and will be ignored.

    19 ellipse x = zfeffev y = wd14151;
    20 endlayout;
    21 densityplot wd14151/orient = horizontal;
    22 densityplot zfeffev;
    23 endlayout;
    24 endgraph;
    25 end;
    WARNING: Object will not be saved.
    26 run;
    NOTE: PROCEDURE TEMPLATE used (Total process time):
    real time 0.46 seconds
    cpu time 0.27 seconds

    WARNING: Errors were produced.
    NOTE: The SAS System stopped processing this step because of errors.
    27
    28 ods html;

    29 proc sgrender data = MaiaD.poster2 template = scatdens2;
    30 run;

    ERROR: Unable to restore 'scatdens2' from template store!
    NOTE: The SAS System stopped processing this step because of errors.
    NOTE: PROCEDURE SGRENDER used (Total process time):
    real time 0.01 seconds
    cpu time 0.00 seconds

    31 ods html close;

  5. Peter Flom says:

    Hi,
    I am pretty sure you are missing a semicolon on the line before the first one you posted (the one that starts with LOESSPLOT).

    Peter

  6. EpiFunky says:

    Actually no… Don’t understand what’s happening. Thanks anyway for the reply.

  7. Peter Flom says:

    Hmmm. Well, if you find out, let me know. You might try asking on SAS-L, there are lots of experts there, including some from SAS. Or you could call tech support.

  8. EpiFunky says:

    Thanks! Actually, I have everything… except the scatterplot. But in the SAS help, nomarkers is an option of the loess statement but not of the loessplot one. I’ll ask on SAS-L.

  9. EpiFunky says:

    Hi again,

    Everything is OK: I thought that “*scatterplot x = unemployment y = infantmortality;” was a comment, but actually this is the syntax which creates the scatterplot, and not the line below.

  10. Nitpicker_mft says:

    I’m concerned about using Loess in conditions like this. Seems the variation found is insignificant.
