The late Prof. John Tukey
had a major impact on statistical data analysis. In his
classic book entitled Exploratory Data Analysis, he
introduced many techniques for discovering unique features
contained in data. STATGRAPHICS Centurion contains several
of his procedures, plus other methods designed to help
extract information:
1. Box-and-Whisker Plots - five-number summaries of data samples,
   with optional indicators for outside points.
2. Stem-and-Leaf Displays - data tabulation created by building a
   graphic from the numeric values.
3. Median Polish of Two-Way Tables - a technique for discovering a
   common type of pattern in two-way tables.
4. Resistant Method for Fitting a Straight Line - alternative method
   for fitting a straight line which is resistant to the potential
   presence of outliers.
5. Nonlinear Smoothers for Time Series Data - resistant smoothers
   based on running medians.
6. Rootograms - similar to histograms but based on the square roots
   of class frequencies.
7. Bubble Charts - coded X-Y scatterplots where the symbol size
   represents the value of an additional quantitative variable.
8. Radar/Spider Plots - technique for comparing several samples of
   multivariate data.
9. Scatterplot Matrices - organized arrays of 2-variable
   scatterplots.
10. Coded Maps - maps in which states are color-coded according to
    the value of a selected variable.
Box-and-Whisker Plots
A box-and-whisker plot is
a schematic diagram that displays a five-number summary of a
data set: the minimum, lower quartile, median, upper
quartile, and maximum. It is drawn with a central box
that covers the middle half of the data values, a line at
the median, and whiskers extending to the most extreme
values. Values that appear to be far away from the center
are instead shown as separate outside points.
If desired, notches can be added to the boxes to display the
uncertainty in the location of the true population medians.
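As a rough illustration of the quantities behind such a plot, the following Python sketch computes a five-number summary and flags outside points. The quartile convention (`method="inclusive"`) and the 1.5 x interquartile-range rule for outside points are assumptions made for this example; the text above does not state which conventions STATGRAPHICS Centurion uses, and the sample values are made up.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    data = sorted(values)
    # Assumption: "inclusive" is only one of several quartile conventions.
    q1, med, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, med, q3, data[-1]

def outside_points(values, k=1.5):
    """Values lying more than k box lengths beyond the box (assumed rule)."""
    _, q1, _, q3, _ = five_number_summary(values)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]

sample = [2, 4, 4, 5, 6, 7, 8, 9, 30]  # made-up illustrative data
summary = five_number_summary(sample)
outliers = outside_points(sample)
print(summary, outliers)
```

With this sample, the value 30 falls outside the upper fence and would be drawn as a separate point rather than as the end of a whisker.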
Stem-and-Leaf Displays
Tukey's stem-and-leaf
display illustrates the distribution of the data values in a
sample by using the leading digits from each data value to
create stems and the following digits to create leaves. The
digits to the right of the vertical line each represent one
observation. Any unusual outside points are shown on
special HI and LO stems.
Stem-and-Leaf Display for Temperature: unit = 0.1
1|2 represents 1.2
        LO|96.3 96.4
    2   96|
    6   96|7789
   19   97|0111222344444
   40   97|556666777888888899999
  (38)  98|00000000000111222222222233333444444444
   52   98|555666666666677777777888888888899
   19   99|000001112223344
    4   99|59
    2  100|0
        HI|100.8
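The construction of stems and leaves can be sketched in a few lines of Python. This is a simplified illustration only: the function name `stem_and_leaf` is hypothetical, the values are made up, non-negative data are assumed, and the sketch omits the depth column, the split stems, and the HI and LO stems that appear in the display above.

```python
from collections import defaultdict

def stem_and_leaf(values, unit=0.1):
    """Return display lines of the form 'stem|leaves' for a given leaf unit."""
    stems = defaultdict(list)
    for v in sorted(values):
        scaled = round(v / unit)          # express the value in leaf units
        stem, leaf = divmod(scaled, 10)   # last digit is the leaf
        stems[stem].append(str(leaf))
    lo, hi = min(stems), max(stems)
    # Empty stems are kept so that gaps in the data remain visible.
    return [f"{s:>4}|{''.join(stems.get(s, []))}" for s in range(lo, hi + 1)]

# Made-up illustrative temperatures, not the sample shown above.
lines = stem_and_leaf([96.7, 96.7, 96.8, 96.9, 97.0, 97.1, 98.6, 100.0])
print("\n".join(lines))
```

Each digit to the right of the bar stands for one observation, so the length of a row shows how many values share that stem.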
Median Polish of Two-Way Tables
The Median Polish
procedure constructs a model for the data in a two-way table
by sweeping out column and row medians. The resulting model
for the data consists of a typical value common to all cells
in the table, plus specific row and column effects.
Polished Table
Sweeping 3 times.
Cause               None     Grams 1_14  Grams 15_24  Grams 25  Row effect
Lung cancer         0.5      0.2025      0.2          0.86      0.1175
Upper resp. cancer  0.0      0.0275      0.0          0.02      0.4525
Stomach cancer      0.24     0.0875      0.16         0.09      0.2825
Colon cancer        0.0025   0.0         0.1575       0.0725    0.015
Prostate cancer     0.405    0.0125      0.015        0.035     0.3075
Other cancer        0.015    0.0375      0.015        0.135     0.2025
TB                  0.06     0.0025      0.03         0.0       0.3925
Bronchitis          0.125    0.0575      0.055        0.245     0.2075
Other respiratory   0.24     0.0025      0.0          0.28      0.0025
Thrombosis          0.305    0.0125      0.015        1.235     4.073
Cardiovascular      0.0925   0.09        0.2425       0.1175    1.685
Hemorrhage          0.0875   0.085       0.1525       0.1775    1.47
Ulcer               0.0175   0.02        0.0525       0.0275    0.435
Violence            0.125    0.1725      0.185        0.125     0.0925
Other               0.035    0.2925      0.035        0.075     0.9625
Column effect       0.09375  0.00875     0.00375      0.1362    0.5462
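The sweeping operation can be sketched in Python as follows. This is a minimal illustration of a median polish under stated assumptions, not the STATGRAPHICS implementation: the function name, the order of the sweeps (rows first), and the small additive example table are all made up for this sketch. The default of three sweeps mirrors the "Sweeping 3 times" note above.

```python
from statistics import median

def median_polish(table, sweeps=3):
    """Decompose a two-way table into common + row + column + residual."""
    resid = [[float(x) for x in row] for row in table]
    nrows, ncols = len(resid), len(resid[0])
    common = 0.0
    row_eff = [0.0] * nrows
    col_eff = [0.0] * ncols
    for _ in range(sweeps):
        # Sweep the median out of each row into the row effects.
        for i in range(nrows):
            m = median(resid[i])
            resid[i] = [x - m for x in resid[i]]
            row_eff[i] += m
        # Move the median of the column effects into the common value.
        m = median(col_eff)
        col_eff = [c - m for c in col_eff]
        common += m
        # Sweep the median out of each column into the column effects.
        for j in range(ncols):
            m = median(resid[i][j] for i in range(nrows))
            for i in range(nrows):
                resid[i][j] -= m
            col_eff[j] += m
        # Move the median of the row effects into the common value.
        m = median(row_eff)
        row_eff = [r - m for r in row_eff]
        common += m
    return common, row_eff, col_eff, resid

# Made-up additive table: 10 + row effect (-1, 0, 1) + column effect (-2, 0, 2).
example = [[7, 9, 11], [8, 10, 12], [9, 11, 13]]
common, row_eff, col_eff, resid = median_polish(example)
print(common, row_eff, col_eff)
```

For any input, each original cell equals the common value plus its row effect, column effect, and residual; for the purely additive example all residuals are zero.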
Resistant Methods for Fitting a Straight Line
When fitting a straight
line, outliers can have a large impact on the fit. Tukey
devised a fitting method that is more resistant to
their presence. In his method, the data are divided into
three groups according to their X values and the fitted
line is determined from the group medians.
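A simplified, non-iterated version of the three-group idea is sketched below in Python. The group sizes, the use of the outer groups for the slope, and the median-residual intercept are assumptions chosen for illustration; Tukey's full procedure and the STATGRAPHICS implementation may differ in these details. The function name and data are hypothetical.

```python
from statistics import median

def resistant_line(xs, ys):
    """Fit y = intercept + slope * x from the medians of the outer thirds."""
    pts = sorted(zip(xs, ys))
    n = len(pts)
    if n < 3:
        raise ValueError("need at least three points")
    k = (n + 1) // 3                      # size of each outer group (assumed rule)
    left, right = pts[:k], pts[-k:]
    x_left = median(x for x, _ in left)
    y_left = median(y for _, y in left)
    x_right = median(x for x, _ in right)
    y_right = median(y for _, y in right)
    # Assumes the outer groups have different median X values.
    slope = (y_right - y_left) / (x_right - x_left)
    # Intercept taken as the median residual after removing the slope.
    intercept = median(y - slope * x for x, y in pts)
    return slope, intercept

xs = list(range(1, 10))
ys = [2 * x + 1 for x in xs]
ys[4] = 100                               # one gross outlier at x = 5
slope, intercept = resistant_line(xs, ys)
print(slope, intercept)
```

Because only medians enter the calculation, the single outlier does not move the fitted line away from y = 1 + 2x in this example.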
Nonlinear Smoothers for Time Series Data
Tukey's resistant
nonlinear smoothers are very useful for displaying the trend
in noisy time series data. In the Time Series Smoothing
procedure, the smoothers are often used as preprocessors
before application of a weighted moving average.
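One of the simplest smoothers of this kind takes running medians of three consecutive values and repeats the pass until nothing changes. The Python sketch below illustrates that idea only; copying the end values unchanged is an assumption made for brevity, and the function names and series are made up rather than taken from the Time Series Smoothing procedure.

```python
from statistics import median

def running_median3(series):
    """One pass of running medians of 3; end values are copied (assumption)."""
    if len(series) < 3:
        return list(series)
    middle = [median(series[i - 1:i + 2]) for i in range(1, len(series) - 1)]
    return [series[0]] + middle + [series[-1]]

def smooth_3r(series, max_passes=100):
    """Repeat running medians of 3 until the series stops changing."""
    current = list(series)
    for _ in range(max_passes):
        nxt = running_median3(current)
        if nxt == current:
            break
        current = nxt
    return current

noisy = [1, 2, 50, 3, 4, 5, -40, 6, 7]    # made-up series with two spikes
smoothed = smooth_3r(noisy)
print(smoothed)
```

The two isolated spikes are removed completely, which a moving average would only spread across neighboring points.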
Rootograms
When assessing how
closely a probability distribution matches a sample of data,
standard histograms suffer from the fact that the longer
bars are subject to greater sampling variability than the
shorter bars. By plotting the square roots of the
frequencies rather than the frequencies themselves, it is
easier to see where any significant discrepancies are
occurring. The visual comparison can be made even easier by
suspending the bars from the curve, so that deviations
between observed and expected frequencies can be judged by
comparing the bars to a horizontal rather than a curved
line.
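The arithmetic behind a suspended (hanging) rootogram is small enough to sketch in Python. In this illustration each bar hangs from the fitted curve at the square root of the expected frequency and extends downward by the square root of the observed frequency, so its lower end shows the discrepancy relative to a horizontal line at zero. The function name and the frequencies are made up for the example.

```python
import math

def hanging_rootogram(observed, expected):
    """Return (top, bottom) of each suspended bar on the square-root scale."""
    bars = []
    for obs, exp in zip(observed, expected):
        top = math.sqrt(exp)              # bar hangs from the fitted curve
        bottom = top - math.sqrt(obs)     # zero means observed matches expected
        bars.append((top, bottom))
    return bars

# Made-up class frequencies chosen so the square roots are exact.
bars = hanging_rootogram(observed=[4, 16, 9], expected=[9, 16, 4])
print(bars)
```

A positive bottom value indicates fewer observations than expected in that class, and a negative value indicates more.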
Bubble Charts
A bubble chart can be
used to display four variables simultaneously: one on each
of the X and Y axes, one defining the size of the bubbles,
and one defining the colors.
Radar/Spider Plots
When a relatively small
number of samples need to be compared and the number of
variables is large, a radar or spider plot can be very
effective. The magnitude of each variable is shown along one
of the spokes.
Scatterplot Matrices
A great way to display
multiple quantitative variables is by creating a scatterplot
matrix. Each cell of the matrix contains a plot for a
selected pair of variables. All plots in any given row have
the same variable on the Y axis, while all plots in a given
column have the same variable on the X axis. Adding a
smoother to each cell helps illustrate any relationships.
Coded Maps
Special types of plots
can also be useful for displaying geographical data. The map
below illustrates the results of a poll taken several months
before the last U.S. presidential election.
