The title of this post is a quote from baseball great Yogi Berra. Yogi was famous for saying things that sounded a little strange but held some wisdom, and I bet he never thought he’d be quoted in a statistics blog!
I am not a baseball fan, and I am not sure exactly what Yogi meant by that quote; perhaps he meant that if you watch what’s going on, you’ll play better.
In statistics, though, I use it to mean that the first step in any analysis ought to be looking at the data. In particular, we ought to look at the frequencies of categorical variables and the distributions of continuous ones. This helps catch odd patterns, data entry errors, coding problems and impossible values. For example, a common (though dangerous!) practice is to use some combination of 9’s for “missing”. If you collect data on the weight of adult humans (in kilograms) and some values are 999, they are almost certainly missing-value codes, since no adult weighs 999 kg. If you don’t notice this, you will certainly get some odd results: even a few 999s will drag the mean weight far upward!
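To make the 999 example concrete, here is a minimal Python sketch using only the standard library. The weights, the `SENTINELS` set of all-nines codes, the plausible range of 30 to 250 kg, and the helper names `summarize` and `flag_suspicious` are all hypothetical choices for illustration, not a standard recipe.

```python
from collections import Counter
import statistics

# Hypothetical adult weights in kilograms; 999 is being used as a "missing" code.
weights = [72.5, 68.0, 999, 81.3, 55.2, 999, 90.1, 63.4, 77.8, 70.0]

# Hypothetical set of "all nines" codes worth checking for.
SENTINELS = {9, 99, 999, 9999}


def summarize(values):
    """Return a few basic descriptive statistics for a list of numbers."""
    return {
        "n": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }


def flag_suspicious(values, low, high, sentinels=SENTINELS):
    """Count values that look like missing-data codes, and list the other
    values that fall outside the plausible range [low, high]."""
    codes = Counter(v for v in values if v in sentinels)
    out_of_range = [v for v in values if v not in sentinels and not (low <= v <= high)]
    return codes, out_of_range


print(summarize(weights))
codes, out_of_range = flag_suspicious(weights, low=30, high=250)
print("suspected missing codes:", dict(codes))
print("other impossible values:", out_of_range)

# Drop the sentinel codes (in a real analysis, recode them as missing instead).
cleaned = [w for w in weights if w not in SENTINELS]
print(summarize(cleaned))
```

With the two 999s left in, the mean of this made-up sample is about 258 kg; with them removed it is about 72 kg. The median moves much less (75.15 versus 71.25), which is one reason to look at several summaries rather than the mean alone.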
You also ought to make graphs of the variables. For continuous variables, univariate graphs such as density plots are certainly useful. Bivariately, you should look at any relationships you think are important. For two continuous variables, scatter plots (perhaps with some enhancements) are good, though a mean difference plot may also be useful. For one categorical variable and one continuous variable, there are parallel box plots, quantile-quantile plots and, again, mean difference plots. For two categorical variables, one underused method is the mosaic plot.
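One way to see how the quantile-quantile plot and the mean difference plot relate, for a continuous variable split by a two-level categorical variable, is to compute the plotted points directly. The sketch below assumes Python 3.8+ for `statistics.quantiles`; the two groups and the helper names `qq_points` and `mean_difference` are made up for illustration, and it stops short of drawing anything (a plotting library would take it from there). The mean difference form used here is Tukey's, which essentially rotates the QQ plot by 45 degrees; a Bland-Altman plot for paired measurements applies the same transform to the raw pairs rather than to quantiles.

```python
import statistics

# Hypothetical measurements of one continuous variable in two groups
# (the two levels of a categorical variable).
group_a = [4.1, 5.0, 5.3, 5.9, 6.2, 6.8, 7.1, 7.5, 8.0, 9.4]
group_b = [3.2, 3.9, 4.4, 4.8, 5.1, 5.6, 6.0, 6.3, 7.2, 8.8, 9.9]


def qq_points(x, y, n=10):
    """Pair the 1/n, 2/n, ..., (n-1)/n sample quantiles of x and y.

    Plotting these pairs gives an empirical quantile-quantile plot;
    points near the line y = x mean the two distributions look alike.
    The groups may have different sizes.
    """
    qx = statistics.quantiles(x, n=n)
    qy = statistics.quantiles(y, n=n)
    return list(zip(qx, qy))


def mean_difference(pairs):
    """Tukey mean-difference transform of QQ pairs.

    Each (a, b) becomes ((a + b) / 2, a - b): points scattered around 0
    suggest no difference, and a constant offset shows up as a level shift.
    """
    return [((a + b) / 2, a - b) for a, b in pairs]


pairs = qq_points(group_a, group_b)
for m, d in mean_difference(pairs):
    print(f"mean={m:.2f}  diff={d:+.2f}")
```

Plotting the difference against the mean makes departures from "no difference" easier to judge against a horizontal reference line than against the diagonal of a QQ plot.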
Even once you start formal modeling, you should keep looking. Software such as R or SAS makes it relatively easy to produce graphs, such as residual plots, that help you diagnose how a statistical method is performing, whether its assumptions hold, and what problems it may have.
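As a small illustration of why diagnostics matter, the following Python sketch fits a straight line by ordinary least squares and looks at scaled residuals. The data are hypothetical, with one weight entered as the 999 missing code; the helpers `fit_line` and `standardized_residuals` are illustrative names, and the residual scaling is a simple approximation rather than the leverage-adjusted standardized residuals that statistical packages provide (for example, `rstandard()` in R). In R, calling `plot()` on an `lm` fit produces a standard set of residual plots.

```python
import statistics

# Hypothetical data: adult height (cm) and weight (kg). The last weight was
# recorded as the "missing" code 999 instead of being left blank.
height = [160, 165, 170, 175, 180, 185, 190, 172]
weight = [55.0, 60.5, 66.0, 70.5, 76.0, 80.5, 86.0, 999]


def fit_line(x, y):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b


def standardized_residuals(x, y):
    """Residuals divided by their sample standard deviation.

    A rough scaling only: it ignores leverage, unlike the standardized
    or studentized residuals reported by statistical packages.
    """
    a, b = fit_line(x, y)
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    s = statistics.stdev(resid)
    return [r / s for r in resid]


z = standardized_residuals(height, weight)
worst = max(range(len(z)), key=lambda i: abs(z[i]))
print(f"largest |residual| at index {worst}: height={height[worst]}, weight={weight[worst]}")

# Compare the fitted slope with and without the suspicious point.
_, b_all = fit_line(height, weight)
keep = [i for i in range(len(height)) if i != worst]
_, b_clean = fit_line([height[i] for i in keep], [weight[i] for i in keep])
print(f"slope with the bad value: {b_all:.2f}, without it: {b_clean:.2f}")
```

With the bad value included, the fitted slope comes out negative (taller people appear lighter), and the 999 entry has by far the largest residual; dropping it restores a positive slope. Because the fitted line is itself pulled toward the bad point, residual checks complement, rather than replace, looking at the raw data first.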
The worst thing you can do is just use the computer as a black box and assume it knows what it’s doing.