Partial Least Squares

If you often face regression problems with a great many independent variables, partial least squares is a technique you should know about.

What is partial least squares?

The Cambridge Dictionary of Statistics defines partial least squares as

An alternative to multiple regression which instead of using the original explanatory variables directly, constructs a set of k regressor variables as linear combinations of the original variables. The linear combinations are chosen sequentially in such a way that each new regressor has maximal sample covariance with the response variable subject to being uncorrelated with all previously constructed regressors. (p. 245)

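Thus, like principal components analysis, partial least squares is a data reduction method: it replaces a large set of original variables with a smaller set of constructed ones.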
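To make the definition more concrete, here is a minimal sketch in Python with NumPy of how such regressors can be built one at a time for a single response variable. It is an illustration under stated assumptions, not the implementation used by any particular package: the function name pls1_components, the simulated data, and the deflation-based approach (a NIPALS-style scheme) are all choices made here for the example.

```python
import numpy as np

def pls1_components(X, y, k):
    """Build k PLS regressors (score vectors) for a single response y.

    Hypothetical helper written for illustration only.
    """
    Xk = X - X.mean(axis=0)        # center the explanatory variables
    yc = y - y.mean()              # center the response
    scores = []
    for _ in range(k):
        # Unit-length weights that maximize sample covariance with y
        w = Xk.T @ yc
        w /= np.linalg.norm(w)
        # New regressor: a linear combination of the (deflated) columns,
        # and therefore of the original centered variables
        t = Xk @ w
        # Deflate X so the next regressor is uncorrelated with this one
        p = Xk.T @ t / (t @ t)
        Xk = Xk - np.outer(t, p)
        scores.append(t)
    return np.column_stack(scores)

# Simulated data (an assumption for the example): 50 cases, 10 variables
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))
y = X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=50)

T = pls1_components(X, y, k=3)

# Cross-products between different regressors should be essentially zero
gram = T.T @ T
off_diagonal = gram - np.diag(np.diag(gram))
print("largest cross-product between regressors:", np.abs(off_diagonal).max())
```

The k columns of T would then be used as the regressors in an ordinary least squares fit in place of the original variables.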

Why use partial least squares?

There are several routes to solving the problem of having too many independent variables. Sometimes, you do not care about the original variables in themselves. When this is the case, many people first perform principal components analysis (PCA) and then use the first few components as regressors. Unfortunately, there is no assurance that the first few components capture the parts of the data that matter for regression. PCA is designed so that the first components capture as much of the variance in the independent variables as possible; it takes no account of the dependent variable. Partial least squares remedies this by choosing its components for their covariance with the dependent variable.
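A small simulation can illustrate the difference. The sketch below, in Python with NumPy, uses toy data invented for this example: one independent variable has high variance but is unrelated to the dependent variable, and the other has low variance but drives it. The variable names, sample size, and scales are all assumptions, and the result depends on how the data were constructed.

```python
import numpy as np

# Toy data (assumed for illustration): a noisy high-variance variable
# and a low-variance variable that actually determines y
rng = np.random.default_rng(1)
n = 5000
nuisance = rng.normal(scale=3.0, size=n)   # high variance, unrelated to y
signal = rng.normal(scale=1.0, size=n)     # low variance, drives y
X = np.column_stack([nuisance, signal])
y = 3.0 * signal + rng.normal(scale=0.1, size=n)

Xc = X - X.mean(axis=0)
yc = y - y.mean()

# PCA: the first component is the direction of greatest variance in X
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
pca_score = Xc @ Vt[0]

# PLS: the first component is the direction of greatest covariance with y
w = Xc.T @ yc
w /= np.linalg.norm(w)
pls_score = Xc @ w

def r_squared(score, response):
    """Squared correlation between one regressor and the response."""
    return np.corrcoef(score, response)[0, 1] ** 2

pca_r2 = r_squared(pca_score, yc)
pls_r2 = r_squared(pls_score, yc)
print("R^2 using first PCA component:", round(pca_r2, 3))
print("R^2 using first PLS component:", round(pls_r2, 3))
```

In this setup the first principal component follows the high-variance nuisance variable, so it explains almost none of the dependent variable, while the first partial least squares component follows the variable that matters.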

Performing Partial Least Squares
In SAS®, there is PROC PLS. In R, there is the pls package.

Further reading

An Introduction to Partial Least Squares

Overview and Recent Advances in Partial Least Squares
