William Bruce Cameron
The Elements of Statistical Confusion, Or:
What Does the Mean Mean?1
Scientific writers assure us that mathematics is rapidly becoming
the language of all the sciences. In my own field, sociology, a casual
survey of the journals shows that it already competes strongly with
sociologese, which is an argot singularly difficult to displace. In any
field which strives for impartiality and objectivity in its descriptions of
nature, the cool and dispassionate language of numbers has its appeals, but
statistics, that promising young daughter of mathematics, is constantly
threatened with seductions into easy virtue hardly matched since the
Perils of Pauline.
The basic value and potential fault of numbers is that they are
remote from reality, abstract, and aloof from the loose qualitative
differences which immediately impinge upon our senses. Numerous selections,
generalizations, and discriminations take place before any aspect of sense
experience can be reduced to a number, and most of the time we are hardly
aware of these abstractions even as we make them. The simplest and most
basic statistical operation is counting, which means that we can identify
something clearly enough that we can recognize it when we meet it again,
and keep track of the number of such events which occur. This sounds simple
enough until we actually try to count objects, such as, let us say,
students in various colleges in the university. It is easy enough to simply
count everyone who enrolls, but deans, board members, and newspaper
reporters want to know how many there are in various divisions. Suppose a
student is finishing his undergraduate work and taking a few graduate
courses as well. Is he one undergraduate, one graduate, or one of each? If
someone takes a single course in evening college, is he then one evening
student, or only one-fifth of a student? (Remember, we are trying to keep
our private passions out of this description!) How many times he should be
counted obviously depends on what it is we are trying to count, and for
administrative purposes it may be best to count his appearance in
each of these divisions; but unfortunately, any public listing of 5000
appearances is very likely to be interpreted as 5000 skinsful of student
body, whereas we might find only 3000 epidermal units, or if you prefer
clichés, 3000 noses. Equally obvious, 100 evening college students taking
one two-hour course each are in no meaningful way equivalent to 100 day
students, each with a sixteen-hour load. The moral is: Not everything that
can be counted counts.
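The difference between appearances, noses, and full loads can be put in a few lines of Python. What follows is only a sketch: the enrollment records, the student labels, and the choice of sixteen hours as one full load are all invented for illustration, not drawn from any real registrar.

```python
# Hypothetical enrollment records: (student, division, credit hours).
# Every name and number here is invented for illustration.
enrollments = [
    ("s1", "undergraduate", 12),
    ("s1", "graduate", 4),
    ("s2", "evening", 2),
    ("s3", "undergraduate", 16),
    ("s3", "evening", 2),
    ("s4", "evening", 2),
]

FULL_LOAD_HOURS = 16  # assumption: a sixteen hour load counts as one full student

# One count per appearance in a division.
appearances = len(enrollments)

# One count per distinct person, however many divisions he appears in.
noses = len({student for student, _, _ in enrollments})

# Total hours expressed as a number of full loads.
full_loads = sum(hours for _, _, hours in enrollments) / FULL_LOAD_HOURS

print(appearances, noses, full_loads)
```

The same six records give three different answers to "how many students?", and each is correct for some purpose and misleading for the others.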
RATIOS, RATES AND PERCENTAGES
If we have counted things to our satisfaction, we can express the
numerical value of one class of objects in terms of the number of some
other, as a fraction or rate or
ratio (e.g., one teacher to each twenty-five students). The
meaning of this, of course, depends first of all on how we counted teachers
and students. To avoid argument with academics, we might better redefine
our units as people who meet classes, and enrollees. Also we must remind
ourselves that the real persons do not necessarily, if indeed ever,
confront each other in the frequencies the ratio suggests. The ratio is
merely a casual guess as to the most likely arrangement to expect by
chance, and contrary to the opinion of some people, academic affairs rarely
proceed entirely by chance.
One of the most useful modifications of the ratio is a statement of
relationships in percentage, or a ratio standardized to a base of one
hundred. A minimum of four mathematical operations have been performed to
obtain a percentage: two classes of events have been counted, the frequency
of one has been divided into the frequency of the other, and the result
multiplied by one hundred. Considered in this way, it is obvious that there
is plenty of room for simple errors, but the simplest of all is the bland
acceptance of the end figure as a kind of real object having a life of its
own. In other words, people tend to treat percentages like match sticks, or
houses, or dollar bills, rather than high-powered abstractions.
A parable: A teacher took a job as instructor at X college, and the
second year he received a raise of ten per cent. The third year enrollment
fell off, and the college was forced to cut everyone’s salary ten per cent.
"Oh well," he said philosophically, "easy come, easy go. I’m right back
where I started." Not if he was a math teacher, he didn’t! If this example
trapped you, figure it out on paper with a starting salary for the
instructor of, say $30,000, which is just as realistic as thinking that ten
per cent equals ten per cent, if you have not first made certain that the
two percentages are computed from the same, and reasonable, base. Even
comparing figures as percentages of the same base is misleading if the
base figure is not understandably related.… The sober, unhappy point is
that both of these two kinds of errors are offered constantly in
newspapers, journals, speeches, and elsewhere, and often the author blandly
omits any definition of the base whatsoever, viz: "Things are
looking better. Business volume is up ten per cent!"
Moral: 400 per cent is better in baseball than in taxes.
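For readers who would rather not figure the instructor's parable out on paper, here is a minimal sketch of the arithmetic. The $30,000 starting salary is the one suggested above; the variable names are mine.

```python
start = 30_000  # starting salary suggested in the parable

# Second year: a raise of ten per cent of the ORIGINAL salary.
after_raise = start * 110 // 100

# Third year: a cut of ten per cent of the RAISED salary, a larger base.
after_cut = after_raise * 90 // 100

shortfall = start - after_cut

# The cut that would really have put him back where he started,
# expressed as a percentage of the raised salary.
break_even_cut = (after_raise - start) / after_raise * 100

print(after_raise, after_cut, shortfall, round(break_even_cut, 2))
```

The raise is ten per cent of 30,000 and the cut is ten per cent of 33,000, so the instructor ends at 29,700, some 300 dollars short. A cut of roughly 9.09 per cent, not ten, would have returned him to his starting point.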
AVERAGES
Our society has so often eulogized man’s best friend that only the most
obtuse statistician would conclude that a typical man-and-his-dog average
three legs, but every day good average people make errors just as
gratuitous on the average in using averages. To speak of the average height
of a group of men and women, or the average age of the audience at a grade
school play, may yield results which, while less shocking, are fully as
bizarre. Here again, as with most common statistical devices, few people
really understand mathematically what the formulas mean, and yet they
develop a kind of mystical feel for their use. "Average man" calls up an
image of the man who lives across the alley. "Average day" means one
distinguished from the rest neither by drama nor by excessive monotony. In
fact, most people’s approach to the whole business of averages is so
intuitive that when the statistician writes "mean" they automatically
translate it to "feel," because the mean is meaningless.
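The three-legged average is easy to reproduce. In the sketch below the ages of the audience at the grade school play are invented; the point is only that the mean of a mixed group may describe no member of it.

```python
from statistics import mean

# One man and one dog.
legs = [2, 4]
mean_legs = mean(legs)

# Hypothetical audience at a grade school play: children and their parents.
ages = [7, 7, 8, 8, 35, 36, 38, 40]
mean_age = mean(ages)

# How far is the "average" age from the nearest real person in the room?
nearest_gap = min(abs(age - mean_age) for age in ages)

print(mean_legs, mean_age, nearest_gap)
```

With these made-up ages the mean is a little over twenty-two, and nobody in the room is within twelve years of it.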
To be sure, the sophisticated … have learned that "average" includes
medians and modes, and many even know that for some reason salaries are
better discussed in terms of the median …, but very few people have
learned that there are times when you should not "take an average" at all.
Most of us go ahead and take ’em
on general principles, just like Grandpa took physic. Of course, when
Grandpa had appendicitis, the physic killed him. You can’t go against
nature (or God) that way. But nature (or God) is less prompt in punishing
statistical errors, with the result that many folks develop a real talent
for sin.
Moral: "How mean can you get?"
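One common reason for preferring the median when discussing salaries can be sketched as follows. The payroll is hypothetical: nine people at one figure and a single person at ten times that figure.

```python
from statistics import mean, median

# Hypothetical payroll: nine modest salaries and one large one.
salaries = [30_000] * 9 + [300_000]

mean_salary = mean(salaries)
median_salary = median(salaries)

# How many people earn less than the "average" salary?
below_mean = sum(1 for s in salaries if s < mean_salary)

print(mean_salary, median_salary, below_mean)
```

Nine of the ten earn less than the mean, which a single large salary has pulled upward. The median stays at what the typical person is actually paid.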
CORRELATION
This is one of the handiest devices yet devised, and correspondingly,
one of the least understood. Unless you have had a course in statistics,
you probably do not know the formulas for this one, which may be just as
well, considering how many people take means, and how popular a catch-word
correlation has become. Most people think it is a high-powered word for
cause. Actually, it is not. In fact, "it" is not anything, because "it" is
a "they." While correlation customarily refers to Pearsonian r
(because this is an easy formula for people with easy consciences), there
are numerous ways of computing correlations, each with subtly different
meanings, but all with one thing in common: correlations are simply
mathematical statements about the degree to which some varying things tend
(or don’t tend) to vary together. J. S. Mill painstakingly explained, a
long time ago, that even when causes were somehow involved, you could not
safely infer that one of the variables in the correlation was causing the
other; but Mill is out of fashion these days, and correlations are popular.
Perhaps a good example of spurious causal reasoning might be the very high
positive correlation between the number of arms and the number of legs in
most human populations, which clearly proves what I have claimed all along,
that arms cause legs.
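For the curious, Pearsonian r can be written out in a dozen lines. The sketch below uses invented figures for seven schoolchildren, whose shoe sizes and reading scores both rise with age. The data and the choice of variables are assumptions made purely for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical children aged six to twelve; both columns grow with age.
shoe_size = [1, 2, 2, 3, 4, 4, 5]
reading_score = [20, 31, 38, 52, 58, 71, 80]

r_forward = pearson_r(shoe_size, reading_score)
r_backward = pearson_r(reading_score, shoe_size)

print(round(r_forward, 3), round(r_backward, 3))
```

The coefficient comes out very high, and it is the same whichever variable is named first. The formula has no way of saying that big feet cause reading or that reading causes big feet, and here neither does: both are doing what children of increasing age tend to do.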
There is no point in the math-fearing layman’s even trying to grasp when
and how to use the various correlation formulas. You simply must study some
mathematics to gain even a hint of the restrictions, because the
restrictions grow in part out of the kind of data with which you deal, and
in part out of the mathematical assumptions you make in trying to get the
job done. If the mathematical assumptions are not met reasonably well by
the data (and they almost never are!), the resulting statement about
relationships among the data is in greater or lesser part grounds for
libel. But data, like nature and God, are slow to respond to statistical
calumny; so let us only seek to protect the reader.
Two other forms of correlation are beginning to appear in public, with
their own characteristic misinterpretations. These are multiple and partial
correlation.
If correlation means the mathematical relation between two sets of
variables, then multiple correlation means relationships between three sets
or more. Fair enough? This is especially handy when trying to describe a
complex set of interactions, such as rush hour traffic or the stock market,
or many human behaviors in which opposing and cooperating forces are
working, pushing and shoving, not working in any clearcut simple direction,
but nonetheless producing some kind of result. The "feel" most people have
for correlation carries over into multiple correlation, with probably not
much greater inaccuracy. Instead of feeling one thing affecting another,
they can go on feeling several things affecting another.
The real fun comes with partials. Multiples are confusing "because of"
(or correlated strongly with) the fact that they describe complex
situations. Partials are confusing because with them we symbolically do
what we can’t do in actual practice (but would love to!): we simplify the
situation by making everything hold still except the one thing we wish to
examine.
"Now," says the layman, "you’re getting somewhere. I knew there
was a simple
answer to all this if you’d just produce it. What was that partial
correlation for income and juvenile delinquency again?" Alas, we are worse
off than before, because with multiple correlation we convinced him the
problem was complicated (although not for exactly the reasons he supposed);
but now we have inadvertently proven to him that it is all very simple, and
that all effects may be understood in terms of simple, discrete causes. If
I become inarticulate here, it is because in my town a layman (nice average
sort of man) published a statement in which he said income had virtually no
relation to juvenile delinquency, and cheerfully cited a partial
correlation to prove it.
What he did not know, and I failed to explain to him, was that partials
rule out the joint effects of several variables mathematically,
although these effects may be present and important empirically.
For example (and here my analogies really strain their mathematical
bonds!), in samples of water, the multiple correlation between hydrogen and
oxygen and the phenomenon called wetness is high. The partial correlation
for hydrogen and wetness, holding oxygen constant, is near zero. The same
goes for the partial between oxygen and wetness, with hydrogen held
constant. At this point I hope the readers bellow in a chorus, "You idiot,
it takes both hydrogen and oxygen together to produce water!" Amen,
and it probably takes low income, broken homes, blighted residential
property, and a host of other things, all intricately intertwined, to
produce juvenile delinquency. To say that the partial correlation with low
income, all other factors held mathematically constant, is near zero, does
not mean we can forget it in real life. It more probably means that this
one factor is the constant companion of all the rest.
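The "constant companion" effect can be imitated with made-up numbers. In the sketch below three measured variables all follow one shared background condition, which I have called hardship. The variable names, the noise levels, and the seed are assumptions, and the partial is the ordinary first-order partial correlation, which may or may not be the one the layman cited.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def partial_r(xs, ys, zs):
    """First-order partial correlation of x and y, holding z constant."""
    r_xy = pearson_r(xs, ys)
    r_xz = pearson_r(xs, zs)
    r_yz = pearson_r(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))

rng = random.Random(0)  # fixed seed so the sketch is repeatable
n = 5000

# Hypothetical shared background that every measured factor follows closely.
hardship = [rng.gauss(0, 1) for _ in range(n)]
low_income = [h + rng.gauss(0, 0.05) for h in hardship]
blight = [h + rng.gauss(0, 0.05) for h in hardship]
delinquency = [h + rng.gauss(0, 0.5) for h in hardship]

simple = pearson_r(low_income, delinquency)
partial = partial_r(low_income, delinquency, blight)

print(round(simple, 2), round(partial, 2))
```

In this invented town low income tracks delinquency closely, yet the partial with blight held constant comes out far smaller, because blight has already accounted for nearly everything income could say. The small partial reports that income and blight travel together, not that income can be forgotten.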
Clearer illustration of multiple and partial correlation may be seen in
the State Fair mince pie, to which each member of the family
surreptitiously added brandy. Each did just a little, but the whole effect
on the judge was a lulu. To attribute some portion of the binge to any
single person’s brandy contribution would have only symbolic meaning, and
hardly be identifiable empirically, but it could not be ruled out. Camels
may ultimately collapse under straws.
CURVES, PROBABILITIES AND STATISTICAL SIGNIFICANCE
Most teachers have been exposed to the Normal Curve, usually in the form
of an edict from the administration concerning the proper distribution of
grades to hand out. In fact, in one institution some misguided
administrator computed the percentage distribution of grades for my class
of six students and compared it to the proposed institutional curve. The
curve is what you might expect to find if the frequencies of events ranged
around some mid-point purely by chance, like the impact points of artillery
shells fired as exactly as possible at a given target. The mathematical
specifications of the curve are complicated, but the basic point to
remember is that this is a curve of chance occurrences; in fact, some
people call it the curve of error. If any factor, however small,
consistently biases the possibilities of events, they will not group
themselves in this sort of curve, and it is sheer tyranny for us to insist
that they should do so. It is true that over a large number of cases (say
ten thousand) of students taking a given test with a similar general
background of ability and interest, the grades will approximate this
sort of curve. But the principle on which the curve is predicated says
explicitly in fine print that any given small portion (sample) of those ten
thousand (universe) might pile up at either end or in the middle, or be
found scattered all over it from here to Hoboken. This small sample,
colleague, is your class and mine, and it may not be
just your imagination: it is perfectly possible, statistically, that
they really are all F’s this year! Another year they may be all A’s.
Moral: The normal curve will never replace the Esquire
calendar.
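The fine print about small samples can be tried out by simulation. The sketch below invents a universe of ten thousand scores scattered by chance around 75 and then draws a thousand classes of six from it. The mean, the spread, the seed, and the number of classes are all assumptions of mine.

```python
import random
from statistics import mean

rng = random.Random(1957)  # fixed seed so the sketch is repeatable

# Hypothetical universe: ten thousand scores ranging by chance around 75.
universe = [rng.gauss(75, 10) for _ in range(10_000)]
universe_mean = mean(universe)

# Draw one thousand classes of six students and average each class.
class_means = [mean(rng.sample(universe, 6)) for _ in range(1_000)]

weakest = min(class_means)
strongest = max(class_means)

print(f"universe mean:   {universe_mean:.1f}")
print(f"weakest class:   {weakest:.1f}")
print(f"strongest class: {strongest:.1f}")
```

The universe as a whole sits almost exactly where it was told to, while individual classes of six wander a long way above and below it, by chance alone and with no help from the instructor.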
The theory of sampling is a beautiful and fearful thing to behold, and
none but the statistical priesthood should be trusted to gaze upon it. But
the laity should at least become pious and agree to some key points in the
creed. First of all, size of sample is much, underline much, less
important than almost everything else about the sample. A carefully
designed sample of two hundred cases can tell more than a sloppily
collected sample of two thousand. The basic problem in sampling is to get
a sample which faithfully represents the whole population, or universe,
from which it was drawn. All the elaborate machinery of sampling is set up
to serve this purpose, and if the rules are not followed, the sample might
as well not be drawn at all. Good sampling is neither cheap nor easy, while
bad sampling is sometimes both. The casual layman who wants to know how
to make a sample is best advised as was the man who asked a doctor at a
dance what he would advise in a hypothetical case of illness. You will
recall that the M. D. seriously said, "I would advise that man to
see a doctor." The best advice before trying to draw a sample is to see
your local statistician. Otherwise, don’t do it yourself unless you are
sure you know how.
Moral: A free sample may be good for a disease you don’t have.
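The claim that two hundred careful cases can beat two thousand sloppy ones is also easy to imitate. In this sketch the universe, its two groups, and their rates of saying yes are invented, and a simple random draw stands in for a carefully designed sample, which in practice is a more elaborate affair.

```python
import random

rng = random.Random(200)  # fixed seed so the sketch is repeatable

# Hypothetical universe of 50,000 people answering yes (1) or no (0).
# The 15,000 who are easy to reach say yes far more often than the rest.
easy_to_reach = [1] * 12_000 + [0] * 3_000
hard_to_reach = [1] * 14_000 + [0] * 21_000
universe = easy_to_reach + hard_to_reach
true_rate = sum(universe) / len(universe)

# Sloppy: two thousand cases, all taken from whoever was handy.
sloppy = rng.sample(easy_to_reach, 2_000)

# Careful: two hundred cases drawn at random from the whole universe.
careful = rng.sample(universe, 200)

sloppy_error = abs(sum(sloppy) / len(sloppy) - true_rate)
careful_error = abs(sum(careful) / len(careful) - true_rate)

print(f"true rate:     {true_rate:.2f}")
print(f"sloppy error:  {sloppy_error:.2f}")
print(f"careful error: {careful_error:.2f}")
```

Ten times as many cases do not rescue the sloppy sample, because every one of them comes from the same unrepresentative corner of the universe.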
The question which must be answered about most information derived from
sample surveys is: "Is this statistically significant?" What this means is:
"Could the kind of frequencies of events we have discovered have occurred
purely by chance?" On this kind of answer rests our confidence in the Salk
vaccine, radar, strategy in sales campaigns, and many other kinds of events
where the improvement or change we seek is not total but is nevertheless
desirable. In some cases, as small a change as two or three per cent may be
significant—that is to say, is not likely to have occurred merely by
chance; while in others, a twenty or thirty per cent change may not be
significant. The techniques of determining significance are a serious study
in themselves, but the common sense cautions in using them may be summed up
in two statements: a difference that does not make a difference is not a
difference; and: there is a vast difference between something’s being
statistically significant and something’s being important.
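One conventional way of asking whether a difference could have occurred by chance is the two-proportion z test, sketched below. The choice of test, the counts, and the customary five per cent threshold are my assumptions, and the normal approximation is rough for samples as small as ten. The sketch is offered only to show how sample size changes the answer.

```python
import math

def two_proportion_p_value(hits_a, n_a, hits_b, n_b):
    """Two-sided p-value for a difference between two proportions,
    using the pooled normal approximation."""
    p_a, p_b = hits_a / n_a, hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_error
    return math.erfc(abs(z) / math.sqrt(2))

# A two point change (50 to 52 per cent) observed in ten thousand cases each.
small_change_large_sample = two_proportion_p_value(5_000, 10_000, 5_200, 10_000)

# A thirty point change (20 to 50 per cent) observed in ten cases each.
large_change_small_sample = two_proportion_p_value(2, 10, 5, 10)

print(f"{small_change_large_sample:.4f}  {large_change_small_sample:.4f}")
```

The two point change falls well under the five per cent threshold and the thirty point change does not. Neither figure says anything about whether the change is important, which is the second of the two cautions.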
1 From the Bulletin of the American Association of University
Professors, 1957, 43:33–39. By permission.