Matrix plots are used to display all pairs of X-Y plots for a set of quantitative variables. They are a good method for detecting pairs of variables that are strongly correlated. It is also possible to detect cases that appear to be outliers.
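The numerical counterpart of scanning a matrix plot for strongly related pairs can be sketched in a few lines of Python. This is only an illustration: the variable names, the data values, and the 0.8 cutoff are all hypothetical, and Pearson correlation captures only linear relationships.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def strong_pairs(data, threshold=0.8):
    """Return (name1, name2, r) for every variable pair with |r| >= threshold."""
    found = []
    for (name1, x), (name2, y) in combinations(data.items(), 2):
        r = pearson(x, y)
        if abs(r) >= threshold:
            found.append((name1, name2, r))
    return found

# Hypothetical data: three quantitative variables measured on five cases.
data = {
    "weight": [1.0, 2.0, 3.0, 4.0, 5.0],
    "length": [2.1, 3.9, 6.2, 8.0, 9.8],
    "noise": [3.0, 1.0, 4.0, 1.0, 5.0],
}
pairs = strong_pairs(data)
for name1, name2, r in pairs:
    print(f"{name1} vs {name2}: r = {r:.3f}")
```

A listing like this complements the plot rather than replacing it, since outliers and curved relationships are easier to spot visually.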
The matrix plot at the right has two additions:
1. A box-and-whisker plot for each variable in the diagonal locations.
2. A robust LOWESS smooth for each plot, which highlights the estimated relationships between the variables.
A radar or spider plot is used to display the values of several quantitative variables on a case-by-case basis. The plot at the left compares characteristics of three different brands.
A principal components or factor analysis derives linear combinations of multiple quantitative variables that explain the largest percentage of the variation amongst those variables. These types of analyses are used to reduce the dimensionality of the problem in order to better understand the underlying factors affecting those variables. In many cases, a small number of components may explain a large percentage of the overall variability. Proper interpretation of the factors can provide important insights into the mechanisms that are at work.
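As a rough sketch of the idea, the first principal component can be found as the dominant eigenvector of the sample covariance matrix. The example below uses power iteration, which is one simple way to obtain it and is not necessarily what any particular package does; the data and function names are hypothetical, and the variables are left unstandardized for brevity.

```python
from math import sqrt

def covariance_matrix(rows):
    """Sample covariance matrix of a list of observations (rows)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
            for i in range(p)]

def first_component(cov, iterations=200):
    """Dominant eigenvector and eigenvalue of cov via power iteration."""
    p = len(cov)
    v = [1.0] * p  # arbitrary starting direction
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient gives the variance along the unit direction v.
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(p))
                     for i in range(p))
    return v, eigenvalue

# Hypothetical data: the first two variables move together, the third barely varies.
rows = [
    [2.0, 1.9, 0.5],
    [4.0, 4.2, 0.4],
    [6.0, 5.8, 0.6],
    [8.0, 8.1, 0.5],
    [10.0, 9.9, 0.4],
]
cov = covariance_matrix(rows)
direction, eigenvalue = first_component(cov)
total_variance = sum(cov[i][i] for i in range(len(cov)))
explained = eigenvalue / total_variance
print(f"first component explains {explained:.1%} of the variance")
```

Here a single component accounts for nearly all of the variability, which is the situation the paragraph above describes. In practice the variables are often standardized first so that no single scale dominates.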
A cluster analysis groups observations or variables based on similarities between them. The dendrogram at the left shows the results of a hierarchical clustering procedure, which begins with separate observations and groups them together based upon the distance between them in a multivariate space.
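The agglomerative procedure can be sketched as follows. This minimal version assumes Euclidean distance and single linkage (the distance between two clusters is the distance between their closest members); other linkage rules are equally common, and the points are hypothetical. The merge history it returns is the information a dendrogram displays.

```python
from math import dist  # available in Python 3.8+

def single_linkage(points):
    """Agglomerative clustering; returns merges as (cluster_a, cluster_b, distance)."""
    clusters = [(i,) for i in range(len(points))]  # every observation starts alone
    merges = []
    while len(clusters) > 1:
        best = None
        # Find the two clusters whose closest members are nearest to each other.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return merges

# Hypothetical observations in a two-dimensional space.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.5), (10.0, 0.0)]
merges = single_linkage(points)
for left, right, d in merges:
    print(f"merge {left} + {right} at distance {d:.2f}")
```

Each merge distance corresponds to the height at which two branches join in the dendrogram.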
The Discriminant Analysis procedure is designed to help distinguish between two or more groups of data based on a set of p observed quantitative variables. It does so by constructing discriminant functions that are linear combinations of the variables. The objective of such an analysis is usually one or both of the following:
1. to be able to describe observed cases mathematically in a manner that separates them into groups as well as possible.
2. to be able to classify new observations as belonging to one or another of the groups.
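Both objectives can be illustrated with a small sketch of a linear discriminant function. It is deliberately restricted: it assumes exactly two groups, p = 2 variables, a pooled within-group covariance matrix, and equal prior probabilities. The function names and the data are hypothetical.

```python
def mean_vector(rows):
    """Mean of each of the two variables."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]

def scatter(rows, m):
    """2x2 matrix of summed squares and cross-products about the mean m."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - m[0], r[1] - m[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

def fit_lda(group_a, group_b):
    """Return the discriminant coefficients w and the cutoff score."""
    ma, mb = mean_vector(group_a), mean_vector(group_b)
    sa, sb = scatter(group_a, ma), scatter(group_b, mb)
    dof = len(group_a) + len(group_b) - 2
    s = [[(sa[i][j] + sb[i][j]) / dof for j in range(2)] for i in range(2)]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det],
           [-s[1][0] / det, s[0][0] / det]]
    diff = [ma[0] - mb[0], ma[1] - mb[1]]
    # w = S^-1 (mean_a - mean_b): the linear combination that best separates the groups.
    w = [inv[i][0] * diff[0] + inv[i][1] * diff[1] for i in range(2)]
    mid = [(ma[0] + mb[0]) / 2, (ma[1] + mb[1]) / 2]
    cutoff = w[0] * mid[0] + w[1] * mid[1]  # score at the midpoint of the two means
    return w, cutoff

def classify(x, w, cutoff):
    """Assign x to group A if its discriminant score exceeds the cutoff."""
    return "A" if w[0] * x[0] + w[1] * x[1] > cutoff else "B"

# Hypothetical training data for two groups.
group_a = [(1.0, 2.0), (2.0, 3.0), (3.0, 3.0), (2.0, 1.0)]
group_b = [(6.0, 5.0), (7.0, 8.0), (8.0, 6.0), (7.0, 7.0)]
w, cutoff = fit_lda(group_a, group_b)
print(classify((1.5, 2.0), w, cutoff), classify((7.5, 6.0), w, cutoff))
```

The coefficients in w describe the separation between the groups, and classify applies the same function to a new observation.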
The Neural Network Classifier implements a nonparametric method for classifying observations into one of g groups based on p observed quantitative variables. Rather than making any assumption about the nature of the distribution of the variables within each group, it constructs a nonparametric estimate of each group’s density function at a desired location based on neighboring observations from that group. The estimate is constructed using a Parzen window that weights observations from each group according to their distance from the specified location.
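The Parzen-window estimate can be sketched in a few lines. This version assumes a Gaussian weighting function with a single smoothing parameter sigma shared by all groups, and equal prior probabilities; those choices, the function names, and the data are illustrative assumptions rather than a description of any specific implementation.

```python
from math import exp

def parzen_score(x, group, sigma):
    """Average Gaussian weight of the group's observations at location x.

    The usual normalizing constant is omitted because it is identical
    for every group when sigma is shared, so it does not affect the ranking.
    """
    total = 0.0
    for obs in group:
        squared_distance = sum((a - b) ** 2 for a, b in zip(x, obs))
        total += exp(-squared_distance / (2.0 * sigma ** 2))
    return total / len(group)

def classify(x, groups, sigma=1.0):
    """Return the label with the largest estimated density at x, plus all scores."""
    scores = {label: parzen_score(x, members, sigma)
              for label, members in groups.items()}
    return max(scores, key=scores.get), scores

# Hypothetical training observations for g = 2 groups and p = 2 variables.
groups = {
    "low": [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0)],
    "high": [(6.0, 6.0), (7.0, 6.5), (6.5, 7.0)],
}
label, scores = classify((1.2, 1.4), groups)
print(label, scores)
```

A smaller sigma makes the estimate depend more heavily on the nearest observations, while a larger sigma smooths over a wider neighborhood.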
Partial Least Squares (PLS) is designed to construct a statistical model relating multiple independent variables X to multiple dependent variables Y. The procedure is most helpful when there are many predictors and the primary goal of the analysis is prediction of the response variables. Unlike other regression procedures, estimates can be derived even when the predictor variables outnumber the observations. PLS is widely used by chemical engineers and chemometricians for spectrometric calibration.
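To show why estimates remain available when predictors outnumber observations, here is a deliberately simplified sketch: a single response variable and a single PLS component. A full procedure would handle several responses and extract further components; the function names and the data are hypothetical.

```python
from math import sqrt

def fit_pls1(X, y):
    """One-component PLS fit for a single response."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Weight vector: the predictor-space direction with maximal covariance with y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    # Scores: each observation projected onto that direction.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    # Regress the centered response on the scores.
    q = sum(t[i] * yc[i] for i in range(n)) / sum(v * v for v in t)
    return x_mean, y_mean, w, q

def predict(model, x):
    """Predict the response for a new predictor vector x."""
    x_mean, y_mean, w, q = model
    score = sum((x[j] - x_mean[j]) * w[j] for j in range(len(x)))
    return y_mean + q * score

# Hypothetical calibration data: 3 observations but 5 predictors.
X = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.0, 6.0, 8.0, 10.0],
    [3.0, 6.0, 9.0, 12.0, 15.0],
]
y = [10.0, 20.0, 30.0]
model = fit_pls1(X, y)
print(predict(model, [4.0, 8.0, 12.0, 16.0, 20.0]))
```

Ordinary least squares has no unique solution for this data because there are more coefficients than observations, whereas the single latent component still yields a prediction. The toy data is exactly linear, so the prediction for the new row works out to 40.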