Linear Discriminant Analysis (LDA)
Introduction to Discriminant Analysis
Originally developed in 1936 by R.A. Fisher, Discriminant
Analysis is a classic method of classification that has stood the test of
time. Discriminant analysis often produces models whose accuracy
approaches (and occasionally exceeds) that of more complex modern methods.
Discriminant analysis can be used only for classification (i.e., with a
categorical target variable), not for regression. The target variable may
have two or more categories.
To explain discriminant analysis, let's consider a classification involving
two target categories and two predictor variables. The following figure
by Balakrishnama and Ganapathiraju shows a plot of the two categories with
the two predictors on orthogonal axes:
A visual inspection shows that category 1 objects (open circles) tend to
have larger values of the predictor on the Y axis and smaller values on the
X axis. However, there is overlap between the target categories on both
axes, so we can't perform an accurate classification using only one of the
predictors.
Linear discriminant analysis finds a linear transformation ("discriminant
function") of the two predictors, X and Y, that yields a new set of
transformed values that provides a more accurate discrimination than either
predictor alone:
TransformedTarget = C1*X + C2*Y
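As a minimal sketch of how such a discriminant function is applied, the Python snippet below computes the transformed value for a few observations. The coefficients C1 and C2 and the data points are hypothetical values chosen only for illustration; in practice the coefficients are estimated from training data.

```python
def discriminant_score(x, y, c1, c2):
    """Return the transformed value C1*X + C2*Y for one observation."""
    return c1 * x + c2 * y

# Hypothetical coefficients (not fitted to any real data).
c1, c2 = -1.0, 1.0

# Hypothetical observations as (X, Y) pairs.
points = [(1.0, 4.0), (2.0, 5.0), (4.0, 2.0), (5.0, 3.0)]

# One transformed value per observation.
scores = [discriminant_score(x, y, c1, c2) for x, y in points]
print(scores)
```

Each two-dimensional observation is reduced to a single number, and classification is then carried out on that number alone.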
The following figure (also from Balakrishnama and Ganapathiraju) shows the
partitioning done using the transformation function:
The transformation function is chosen to maximize the ratio of
between-class variance to within-class variance, as illustrated by this
figure produced by Ludwig Schwardt and Johan du Preez:
The transformation seeks to rotate the axes so that when the categories are
projected on the new axes, the differences between the groups are
maximized. The following figure (also by Schwardt and du Preez) shows two
rotated axes. Projection to the lower right axis achieves the maximum
separation between the categories; projection to the lower left axis yields
the worst separation.
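The effect of choosing different axes can be sketched in Python by projecting two categories onto candidate directions and comparing a separation score. The score used here is the squared distance between the projected category means divided by the summed within-category scatter, which is one common way of writing the between-class to within-class ratio for two categories. The data points and the two angles are hypothetical and are not taken from the figure.

```python
import math

def project(points, angle_deg):
    """Project (X, Y) points onto a unit axis at the given angle."""
    t = math.radians(angle_deg)
    ux, uy = math.cos(t), math.sin(t)
    return [x * ux + y * uy for x, y in points]

def separation(class1, class2, angle_deg):
    """Squared gap between projected means over summed within-class scatter."""
    p1 = project(class1, angle_deg)
    p2 = project(class2, angle_deg)
    m1 = sum(p1) / len(p1)
    m2 = sum(p2) / len(p2)
    s1 = sum((v - m1) ** 2 for v in p1)
    s2 = sum((v - m2) ** 2 for v in p2)
    return (m1 - m2) ** 2 / (s1 + s2)

# Hypothetical data: the categories overlap on both the X and Y axes.
class1 = [(1, 2), (2, 4), (3, 4), (4, 6)]
class2 = [(2, 1), (3, 1), (4, 3), (5, 3)]

good = separation(class1, class2, 135)  # axis across the two clusters
poor = separation(class1, class2, 45)   # axis along the two clusters
print(good, poor)
```

For this made-up data the 135-degree axis gives a ratio of about 4.5, while the 45-degree axis gives about 0.02, so the same points look well separated or almost indistinguishable depending on the axis chosen.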
The following figure by Randy Julian of Lilly Labs illustrates a
distribution projected on a transformed axis. Note that the
projected values produce complete separation on the transformed axis,
whereas there is overlap on both the original X and Y axes.
In the ideal case, a projection can be found that completely separates the
categories (as shown above). However, in most cases there is no
transformation that provides complete separation, so the goal is to find
the transformation that minimizes the overlap of the transformed
distributions. The following figure by Alex Park and Christine Fry
illustrates a distribution of two categories ("switch" in blue and
"non-switch" in red). The black line shows the optimal axis found by
linear discriminant analysis that maximizes the separation between the
groups when they are projected on the line.
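For two categories, the axis that maximizes the ratio of between-class to within-class variance has a well-known closed form: it is proportional to the inverse of the pooled within-class scatter matrix multiplied by the difference of the category means. The Python sketch below applies that formula to two predictors using only the standard library. The data are hypothetical, and this is an illustration of the textbook calculation, not a description of how DTREG computes its solution.

```python
def mean(points):
    """Mean of a list of (X, Y) points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def scatter(points, m):
    """Entries (sxx, sxy, syy) of the 2x2 scatter matrix about mean m."""
    sxx = sum((p[0] - m[0]) ** 2 for p in points)
    sxy = sum((p[0] - m[0]) * (p[1] - m[1]) for p in points)
    syy = sum((p[1] - m[1]) ** 2 for p in points)
    return sxx, sxy, syy

def fisher_direction(class1, class2):
    """Return (C1, C2) proportional to inverse(S_W) * (mean1 - mean2)."""
    m1, m2 = mean(class1), mean(class2)
    a1, b1, d1 = scatter(class1, m1)
    a2, b2, d2 = scatter(class2, m2)
    a, b, d = a1 + a2, b1 + b2, d1 + d2  # pooled within-class scatter S_W
    det = a * d - b * b
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    dx, dy = m1[0] - m2[0], m1[1] - m2[1]
    # The inverse of [[a, b], [b, d]] is [[d, -b], [-b, a]] / det.
    return (d * dx - b * dy) / det, (-b * dx + a * dy) / det

# Hypothetical data: the categories overlap on both the X and Y axes.
class1 = [(1, 2), (2, 4), (3, 4), (4, 6)]
class2 = [(2, 1), (3, 1), (4, 3), (5, 3)]

c1, c2 = fisher_direction(class1, class2)
scores1 = [c1 * x + c2 * y for x, y in class1]
scores2 = [c1 * x + c2 * y for x, y in class2]
print(c1, c2)
```

With this made-up data the coefficients come out to about -1.6 and 1.5, and every category 1 score is larger than every category 2 score, even though neither X nor Y separates the categories on its own.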
The following figure (also by Alex Park and Christine Fry) shows the
distribution of the switch and non-switch categories as projected on the
transformed axis (i.e., the black line shown in the figure above):
Note that even after the transformation there is overlap between the
categories, but setting a cutoff point around -1.7 on the transformed axis
yields a reasonable classification of the categories.
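Applying a cutoff of this kind can be sketched in a few lines of Python. The cutoff value -1.7 comes from the text above; which category lies below the cutoff is an assumption made for illustration, as are the example scores.

```python
def classify(score, cutoff=-1.7, below="switch", above="non-switch"):
    """Assign a category by comparing a transformed value to the cutoff.

    Which label falls below the cutoff is an assumption in this sketch.
    """
    return below if score < cutoff else above

# Hypothetical transformed values for four observations.
scores = [-3.2, -1.9, -1.2, 0.4]
labels = [classify(s) for s in scores]
print(labels)
```

Because the projected distributions overlap, some observations near the cutoff will be misclassified regardless of where it is placed; moving the cutoff trades errors in one category against errors in the other.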
The DTREG Discriminant Analysis Property Page
Controls for discriminant analysis are provided on a DTREG property page,
shown in the following image: