In a recent article in Sociological Methodology entitled “How to impute interactions, squares, and other transformed variables”, Paul T. von Hippel shows that, when you have missing data and are using interactions, squares, or other transformed variables in a regression, it is better to transform first, and then impute.
In multiple imputation, the problem of missing data is dealt with by imputing multiple complete data sets and then combining the results. When there are no interactions or quadratics, the process is well understood, but relatively little is known about the proper procedure when you do have transformations. von Hippel shows, using both mathematics and example data, that it is better to first transform the data that you do have, and then impute. Although this leads to the odd situation that, for example, the imputed values for X^2 are not equal to the square of the imputed values for X, doing it in the reverse order (that is, imputing and then transforming) yields biased estimates of the regression coefficients.
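To make the "odd situation" concrete, here is a minimal sketch in Python with made-up data. Simple mean imputation stands in for a full multiple-imputation procedure (which would draw several imputed data sets from a model), but it is enough to show why transforming first means the imputed square need not equal the square of the imputed value.

```python
# Hypothetical data: x with one missing value (None).
x = [1.0, 2.0, None, 4.0]

# Step 1 (transform first): square x wherever it is observed.
x_sq = [v * v if v is not None else None for v in x]

# Step 2 (then impute): fill each column separately.
# Mean imputation is a stand-in for a real multiple-imputation model.
def impute_mean(col):
    observed = [v for v in col if v is not None]
    mean = sum(observed) / len(observed)
    return [v if v is not None else mean for v in col]

x_imp = impute_mean(x)        # observed mean of x is 7/3
x_sq_imp = impute_mean(x_sq)  # observed mean of x^2 is (1 + 4 + 16)/3 = 7

# The imputed square (7.0) is not the square of the imputed value
# ((7/3)^2 ≈ 5.44) -- the "odd situation" the article describes.
print(x_sq_imp[2], x_imp[2] ** 2)
```

The mismatch is the point: treating X^2 as its own variable at imputation time preserves its relationship to the outcome, which is what keeps the regression coefficients unbiased.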
This holds for ordinary least squares regression as well as for other regression models.
I found the article fascinating and accessible.
[learn_more caption="Author Bio"] I specialize in helping graduate students and researchers in psychology, education, economics and the social sciences with all aspects of statistical analysis. Many new and relatively uncommon statistical techniques are available, and these may widen the field of hypotheses you can investigate. Graphical techniques are often misapplied, but, done correctly, they can summarize a great deal of information in a single figure. I can help with writing papers, writing grant applications, and doing analysis for grants and research.
Specialties: Regression, logistic regression, cluster analysis, statistical graphics, quantile regression.