At first, it seems obvious that the answer to this question is yes. After all, who wants a biased estimator? But … sometimes, the answer is no. Sometimes a biased estimator is better.
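Deciding whether one estimator is "better" than another needs a criterion. A common one is mean squared error (MSE); the question itself does not fix a criterion, so MSE is an assumption here. MSE splits into squared bias plus variance. An estimator with a little bias but much less variance can therefore have a smaller MSE than an unbiased one:

```latex
\operatorname{MSE}(\hat\theta)
  = \mathbb{E}\big[(\hat\theta - \theta)^2\big]
  = \underbrace{\big(\mathbb{E}[\hat\theta] - \theta\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat\theta - \mathbb{E}[\hat\theta])^2\big]}_{\text{variance}}
```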

In statistics, there is often a trade-off between bias and variance: we can often reduce the variance of an estimate only by accepting some bias, and vice versa. One example of this is using ridge regression to deal with collinearity. Collinearity occurs when one independent variable is close to being a linear combination of some set of other independent variables. It causes many problems, including high variance in the coefficient estimates. For instance, the R package ridge provides a simulated genetics data set, GenCont, that looks like this (first six rows shown):

     Phenotypes SNP1 SNP2 SNP3 SNP4 SNP5 SNP6 SNP7 SNP8 SNP9 SNP10 SNP11 SNP12
[1,]  1.4356316    1    1    1    0    0    1    0    0    0     0     1     0
[2,]  2.9226960    1    0    0    0    1    0    0    1    0     0     1     0
[3,]  0.5669319    0    0    0    0    0    0    0    0    0     0     2     0
[4,]  4.8515051    1    0    0    0    1    0    0    0    0     0     1     0
[5,]  0.1525582    1    1    1    0    0    1    0    0    0     0     1     0
[6,]  2.9522701    1    0    0    0    1    0    0    0    0     0     0     0

The GenCont data are highly collinear; even in the six rows above, SNP2 and SNP3 are identical. If we use ordinary least squares regression, e.g. with:

library(ridge)  # provides the GenCont data and linearRidge()
data(GenCont)
linmod <- lm(Phenotypes ~ ., data = as.data.frame(GenCont))
summary(linmod)

then the coefficient estimates are very imprecise, with standard errors that are large relative to the estimates themselves:

Coefficients: (1 not defined because of singularities)
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.97258    0.11631   8.362 6.51e-16 ***
SNP1         0.35468    0.30497   1.163    0.245
SNP2         0.19159    0.60079   0.319    0.750
SNP3              NA         NA      NA       NA
SNP4         0.28496    0.46292   0.616    0.538
SNP5         1.72750    0.30362   5.690 2.20e-08 ***
SNP6        -0.50793    0.51379  -0.989    0.323
SNP7        -0.46772    1.02390  -0.457    0.648
SNP8        -0.15980    0.11236  -1.422    0.156
SNP9         0.41477    1.02258   0.406    0.685
SNP10        0.61043    0.65749   0.928    0.354
SNP11        0.08350    0.08847   0.944    0.346
SNP12       -0.86977    1.02455  -0.849    0.396

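and only one SNP coefficient, SNP5, is significant. SNP3 is not estimated at all: the note "1 not defined because of singularities" means that, in the full data, SNP3 is an exact linear combination of the other columns.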
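The NA for SNP3 is what lm() does when one column is an exact linear combination of the others. A minimal sketch on simulated data reproduces that behaviour; the variables a, b and z are hypothetical and not taken from GenCont:

```r
# Simulated data (hypothetical variables, not GenCont):
# b is an exact copy of a, so OLS cannot separate their effects.
set.seed(2)
n <- 50
a <- rbinom(n, size = 1, prob = 0.5)   # a binary, SNP-like predictor
b <- a                                  # exact duplicate of a
z <- rbinom(n, size = 2, prob = 0.3)   # an unrelated 0/1/2 predictor
y <- 1 + 0.5 * a + 0.3 * z + rnorm(n)

fit <- lm(y ~ a + b + z)
print(coef(fit))  # the coefficient for b is NA, like SNP3 above

# The design matrix (intercept, a, b, z) has 4 columns but rank 3,
# which is what "1 not defined because of singularities" reports.
print(c(columns = ncol(model.matrix(fit)), rank = fit$rank))
```

Because lm() processes columns in formula order, the later duplicate (b) is the one reported as NA, just as SNP3 is dropped after SNP2.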

But if we use ridge regression with:

# linearRidge() is provided by the ridge package
ridgemod <- linearRidge(Phenotypes ~ ., data = as.data.frame(GenCont))
summary(ridgemod)

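then the estimates are biased (ridge deliberately shrinks them toward zero), but their variance is much smaller. Note that linearRidge reports the standard errors and t values for the scaled estimates: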

Coefficients:
             Estimate Scaled estimate Std. Error (scaled) t value (scaled) Pr(>|t|)
(Intercept)  1.533386              NA                  NA               NA       NA
SNP1         0.277296        4.045409            0.266120           15.201  < 2e-16 ***
SNP2        -0.110458       -1.256154            0.216332            5.807 6.38e-09 ***
SNP3        -0.110458       -1.256154            0.216332            5.807 6.38e-09 ***
SNP4         0.005230        0.011635            0.371693            0.031  0.97503
SNP5         0.531173        6.323006            0.315368           20.050  < 2e-16 ***
SNP6        -0.119164       -1.373227            0.223047            6.157 7.43e-10 ***
SNP7         0.113844        0.113730            0.372181            0.306  0.75993
SNP8        -0.099149       -1.028581            0.355807            2.891  0.00384 **
SNP9        -0.008321       -0.008312            0.372386            0.022  0.98219
SNP10        0.058562        0.101128            0.371567            0.272  0.78549
SNP11       -0.096526       -1.495699            0.329250            4.543 5.55e-06 ***
SNP12       -0.334279       -0.333945            0.372248            0.897  0.36966

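and seven SNP coefficients are significant at the 5% level, six of them at the 0.1% level. SNP2 and SNP3 receive identical estimates, since ridge splits the effect of two identical columns equally instead of dropping one.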
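Ridge regression replaces the OLS solution with β̂ = (XᵀX + λI)⁻¹Xᵀy for a penalty λ > 0, with the predictors centred so that the intercept is not penalised. The sketch below is a minimal base-R version of that formula, not the linearRidge implementation: it does not reproduce how linearRidge scales predictors or chooses λ (here λ = 5 is fixed by hand), and all data are simulated. It compares the mean squared error of the coefficient estimates from OLS and ridge when two predictors are nearly collinear:

```r
# Minimal closed-form ridge regression (not linearRidge) and a
# simulation comparing its coefficient MSE with OLS.
set.seed(3)

ridge_fit <- function(X, y, lambda) {
  # Centre X and y so the intercept is left unpenalised
  Xc <- scale(X, center = TRUE, scale = FALSE)
  yc <- y - mean(y)
  p  <- ncol(Xc)
  # Solve (X'X + lambda * I) beta = X'y; lambda = 0 gives the OLS slopes
  drop(solve(crossprod(Xc) + lambda * diag(p), crossprod(Xc, yc)))
}

true_beta <- c(1, 1, 0.5)
n      <- 60
n_reps <- 500
err_ols <- err_ridge <- numeric(n_reps)

for (r in seq_len(n_reps)) {
  x1 <- rnorm(n)
  x2 <- x1 + rnorm(n, sd = 0.05)  # nearly collinear with x1
  x3 <- rnorm(n)
  X  <- cbind(x1, x2, x3)
  y  <- drop(X %*% true_beta) + rnorm(n)

  b_ols   <- ridge_fit(X, y, lambda = 0)
  b_ridge <- ridge_fit(X, y, lambda = 5)  # hand-picked penalty

  # Squared estimation error of the whole coefficient vector
  err_ols[r]   <- sum((b_ols - true_beta)^2)
  err_ridge[r] <- sum((b_ridge - true_beta)^2)
}

mse <- c(ols = mean(err_ols), ridge = mean(err_ridge))
print(mse)
```

In this setup the ridge estimates are biased, but their much smaller variance should give a far lower total squared error than OLS, which is the bias-variance trade-off this example is about.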