William Bruce Cameron

The Elements of Statistical Confusion, Or: What Does the Mean Mean?1

Scientific writers assure us that mathematics is rapidly becoming the language of all the sciences. In my own field, sociology, a casual survey of the journals shows that it already competes strongly with sociologese, which is an argot singularly difficult to displace. In any field which strives for impartiality and objectivity in its descriptions of nature, the cool and dispassionate language of numbers has its appeals, but statistics, that promising young daughter of mathematics, is constantly threatened with seductions into easy virtue hardly matched since the Perils of Pauline.

The basic value and potential fault of numbers is that they are remote from reality, abstract, and aloof from the loose qualitative differences which immediately impinge upon our senses. Numerous selections, generalizations, and discriminations take place before any aspect of sense experience can be reduced to a number, and most of the time we are hardly aware of these abstractions even as we make them.

The simplest and most basic statistical operation is counting, which means that we can identify something clearly enough that we can recognize it when we meet it again, and keep track of the number of such events which occur. This sounds simple enough until we actually try to count objects, such as, let us say, students in various colleges in the university. It is easy enough to simply count everyone who enrolls, but deans, board members, and newspaper reporters want to know how many there are in various divisions. Suppose a student is finishing his undergraduate work and taking a few graduate courses as well. Is he one undergraduate, one graduate, or one of each? If someone takes a single course in evening college, is he then one evening student, or only one-fifth of a student? (Remember, we are trying to keep our private passions out of this description!) How many times he should be counted obviously depends on what it is we are trying to count, and for administrative purposes it may be best to count his appearance in each of these divisions; but unfortunately, any public listing of 5000 appearances is very likely to be interpreted as 5000 skinsful of student body, whereas we might find only 3000 epidermal units, or if you prefer clichés, 3000 noses. Equally obviously, 100 evening college students taking one two-hour course each are in no meaningful way equivalent to 100 day students, each with a sixteen-hour load. The moral is: Not everything that can be counted counts.
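
A minimal sketch in Python, with invented enrollment records (the student IDs, divisions, and credit hours are hypothetical), shows how one honest ledger yields three different answers to "how many students?":

```python
# A sketch of the counting problem, using invented enrollment records.
# Each record is (student_id, division, credit_hours); nothing here is real data.
from collections import Counter

enrollments = [
    ("s01", "undergraduate", 12),
    ("s01", "graduate", 4),        # the same person finishing up and taking graduate courses
    ("s02", "evening", 2),         # one two-hour evening course
    ("s03", "undergraduate", 16),
]

appearances = len(enrollments)                             # every divisional appearance counts
heads = len({student for student, _, _ in enrollments})    # "epidermal units"
per_division = Counter(division for _, division, _ in enrollments)
credit_hours = sum(hours for _, _, hours in enrollments)   # a load-weighted count

print(appearances, heads, dict(per_division), credit_hours)
# Three different "how many students?" answers, each right for a different question.
```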

RATIOS, RATES AND PERCENTAGES

If we have counted things to our satisfaction, we can express the numerical value of one class of objects in terms of the number of some other, as a fraction or rate or ratio (e.g., one teacher to each twenty-five students). The meaning of this, of course, depends first of all on how we counted teachers and students. To avoid argument with academics, we might better redefine our units as people who meet classes, and enrollees. Also we must remind ourselves that the real persons do not necessarily, if indeed ever, confront each other in the frequencies the ratio suggests. The ratio is merely a casual guess as to the most likely arrangement to expect by chance, and contrary to the opinion of some people, academic affairs rarely proceed entirely by chance.

One of the most useful modifications of the ratio is a statement of relationships in percentage, or a ratio standardized to a base of one hundred. A minimum of four mathematical operations have been performed to obtain a percentage: two classes of events have been counted, the frequency of one has been divided into the frequency of the other, and the result multiplied by one hundred. Considered in this way, it is obvious that there is plenty of room for simple errors, but the simplest of all is the bland acceptance of the end figure as a kind of real object having a life of its own. In other words, people tend to treat percentages like match sticks, or houses, or dollar bills, rather than high-powered abstractions.

A parable: A teacher took a job as instructor at X college, and the second year he received a raise of ten per cent. The third year enrollment fell off, and the college was forced to cut everyone’s salary ten per cent. "Oh well," he said philosophically, "easy come, easy go. I’m right back where I started." Not if he was a math teacher, he didn’t! If this example trapped you, figure it out on paper with a starting salary for the instructor of, say, $30,000, which is just as realistic as thinking that ten per cent equals ten per cent, if you have not first made certain that the two percentages are computed from the same, and reasonable, base. Even comparing figures as percentages of the same base is misleading if the base figure is not understandably related. … The sober, unhappy point is that both of these kinds of errors are offered constantly in newspapers, journals, speeches, and elsewhere, and often the author blandly omits any definition of the base whatsoever, viz.: "Things are looking better. Business volume is up ten per cent!"
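
Worked out in a few lines of Python, the parable’s arithmetic settles the matter; the $30,000 salary is the one suggested above, and everything else follows from it:

```python
# The parable's arithmetic: a ten per cent raise and a ten per cent cut
# are computed from different bases, so they do not cancel.
salary = 30_000
after_raise = salary * 1.10        # +10% of 30,000 -> 33,000
after_cut = after_raise * 0.90     # -10% of 33,000 -> 29,700
print(after_raise, after_cut)      # 33000.0  29700.0 — not back where he started

# "Business volume is up ten per cent!" means nothing until the base is stated:
# ten per cent of last month's slump and ten per cent of last year's boom differ.
```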

Moral: 400 per cent is better in baseball than in taxes.

AVERAGES

Our society has so often eulogized man’s best friend that only the most obtuse statistician would conclude that a typical man and his dog average three legs apiece, but every day good average people make errors just as gratuitous, on the average, in using averages. To speak of the average height of a group of men and women, or the average age of the audience at a grade school play, may yield results which, while less shocking, are fully as bizarre. Here again, as with most common statistical devices, few people really understand mathematically what the formulas mean, and yet they develop a kind of mystical feel for their use. "Average man" calls up an image of the man who lives across the alley. "Average day" means one distinguished from the rest neither by drama nor by excessive monotony. In fact, most people’s approach to the whole business of averages is so intuitive that when the statistician writes "mean" they automatically translate it to "feel," because the mean is meaningless.
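
A small Python sketch, with made-up heights and salaries, shows how the average of a lopsided mixture can describe nobody at all, and why the median is kinder to salary talk:

```python
# Averages of mixtures, with invented figures throughout.
from statistics import mean, median

legs = [2, 4]                      # one man, one dog
print(mean(legs))                  # 3.0 — the three-legged companionship of the joke

# Heights (cm) of a mixed audience of adults and grade-schoolers at the school play:
heights = [175, 180, 168, 120, 115, 125]
print(mean(heights))               # about 147 — a height belonging to no one in the room

# Why salaries are better discussed with the median: one large value drags the mean.
salaries = [30_000, 32_000, 34_000, 35_000, 250_000]
print(mean(salaries), median(salaries))   # 76200 vs 34000
```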

To be sure, the sophisticated … have learned that "average" includes medians and modes, and many even know that for some reason salaries are better discussed in terms of the median …, but very few people have learned that there are times when you should not "take an average" at all. Most of us go ahead and take ’em on general principles, just like Grandpa took physic. Of course, when Grandpa had appendicitis, the physic killed him. You can’t go against nature (or God) that way. But nature (or God) is less prompt in punishing statistical errors, with the result that many folks develop a real talent for sin.

Moral: "How mean can you get?"

CORRELATION

This is one of the handiest devices yet devised, and correspondingly, one of the least understood. Unless you have had a course in statistics, you probably do not know the formulas for this one, which may be just as well, considering how many people take means, and how popular a catch-word correlation has become. Most people think it is a high-powered word for cause. Actually, it is not. In fact, "it" is not anything, because "it" is a "they." While correlation customarily refers to Pearsonian r (because this is an easy formula for people with easy consciences), there are numerous ways of computing correlations, each with subtly different meanings, but all with one thing in common: correlations are simply mathematical statements about the degree to which some varying things tend (or don’t tend) to vary together. J. S. Mill painstakingly explained, a long time ago, that even when causes were somehow involved, you could not safely infer that one of the variables in the correlation was causing the other; but Mill is out of fashion these days, and correlations are popular. Perhaps a good example of spurious causal reasoning might be the very high positive correlation between the number of arms and the number of legs in most human populations, which clearly proves what I have claimed all along, that arms cause legs.
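
For the skeptical, here is a Python sketch with wholly invented accident rates: a hidden common cause lowers both arm and leg counts, so the two vary together handsomely without either causing the other:

```python
# Correlation without causation, simulated with invented rates: a shared third
# factor (a severe accident) tends to cost both an arm and a leg, so the two
# counts rise and fall together even though neither causes the other.
import random

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

random.seed(1)
arms, legs = [], []
for _ in range(10_000):
    severe_accident = random.random() < 0.10          # the hidden common cause
    lost_arm = severe_accident and random.random() < 0.8
    lost_leg = severe_accident and random.random() < 0.8
    arms.append(2 - lost_arm)
    legs.append(2 - lost_leg)

print(round(pearson_r(arms, legs), 3))
# Strongly positive (around 0.8 with these invented rates) — yet arms do not cause legs.
```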

There is no point in the math-fearing layman’s even trying to grasp when and how to use the various correlation formulas. You simply must study some mathematics to gain even a hint of the restrictions, because the restrictions grow in part out of the kind of data with which you deal, and in part out of the mathematical assumptions you make in trying to get the job done. If the mathematical assumptions are not met reasonably well by the data (and they almost never are!), the resulting statement about relationships among the data is in greater or lesser part grounds for libel. But data, like nature and God, are slow to respond to statistical calumny; so let us only seek to protect the reader.

Two other forms of correlation are beginning to appear in public, with their own characteristic misinterpretations. These are multiple and partial correlation.

If correlation means the mathematical relation between two sets of variables, then multiple correlation means relationships between three sets or more. Fair enough? This is especially handy when trying to describe a complex set of interactions, such as rush hour traffic or the stock market, or many human behaviors in which opposing and cooperating forces are working, pushing and shoving, not working in any clearcut simple direction, but nonetheless producing some kind of result. The "feel" most people have for correlation carries over into multiple correlation, with probably not much greater inaccuracy. Instead of feeling one thing affecting another, they can go on feeling several things affecting another.

The real fun comes with partials. Multiples are confusing "because of" (or correlated strongly with) the fact that they describe complex situations. Partials are confusing because with them we symbolically do what we can’t do in actual practice (but would love to!): we simplify the situation by making everything hold still except the one thing we wish to examine.

"Now," says the layman, "you’re getting somewhere. I knew there was a simple answer to all this if you’d just produce it. What was that partial correlation for income and juvenile delinquency again?" Alas, we are worse off than before, because with multiple Correlation we convinced him the problem was complicated (although not for exactly the reasons he supposed); but now we have inadvertently proven to him that it is all very simple, and that all effects may be understood in terms of simple, discrete causes. If I become inarticulate here, it is because in my town a layman (nice average sort of man) published a statement in which he said income had virtually no relation to juvenile delinquency, and cheerfully cited a partial correlation to prove it.

What he did not know, and I failed to explain to him, was that partials rule out the joint effects of several variables mathematically, although these effects may be present and important empirically. For example (and here my analogies really strain their mathematical bonds!), in samples of water, the multiple correlation between hydrogen and oxygen and the phenomenon called wetness is high. The partial correlation for hydrogen and wetness, holding oxygen constant, is near zero. The same goes for the partial between oxygen and wetness, with hydrogen held constant. At this point I hope the readers bellow in a chorus, "You idiot, it takes both hydrogen and oxygen together to produce water!" Amen, and it probably takes low income, broken homes, blighted residential property, and a host of other things, all intricately intertwined, to produce juvenile delinquency. To say that the partial correlation with low income, all other factors held mathematically constant, is near zero, does not mean we can forget it in real life. It more probably means that this one factor is the constant companion of all the rest.

Clearer illustration of multiple and partial correlation may be seen in the State Fair mince pie, to which each member of the family surreptitiously added brandy. Each did just a little, but the whole effect on the judge was a lulu. To attribute some portion of the binge to any single person’s brandy contribution would have only symbolic meaning, and hardly be identifiable empirically, but it could not be ruled out. Camels may ultimately collapse under straws.
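
For readers who want to see the wetness argument in numbers, here is a Python sketch with invented variables standing in for the two ingredients and the outcome: the ingredients travel together almost perfectly, the pair of them predicts the outcome well, and yet the partial correlation of either ingredient alone nearly vanishes:

```python
# A sketch (invented numbers, not real data) of how a partial correlation can
# vanish when two jointly necessary factors are each other's constant companions.
import random

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def residuals(ys, xs):
    """What is left of ys after the straight-line effect of xs is removed."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return [y - (my + slope * (x - mx)) for x, y in zip(xs, ys)]

random.seed(2)
x1 = [random.gauss(0, 1) for _ in range(5_000)]
x2 = [a + random.gauss(0, 0.05) for a in x1]                 # x2 is x1's constant companion
y  = [a + b + random.gauss(0, 1) for a, b in zip(x1, x2)]    # y needs both together

r1, r2, r12 = pearson_r(x1, y), pearson_r(x2, y), pearson_r(x1, x2)
multiple_r2 = (r1**2 + r2**2 - 2 * r1 * r2 * r12) / (1 - r12**2)   # two-predictor formula
partial = pearson_r(residuals(x1, x2), residuals(y, x2))           # x1 vs y, holding x2 still

print(round(multiple_r2, 2), round(partial, 2))
# The multiple R-squared comes out high (about 0.8 here) while the partial is only a
# few hundredths — and still x1 cannot be forgotten in real life.
```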

CURVES, PROBABILITIES AND STATISTICAL SIGNIFICANCE

Most teachers have been exposed to the Normal Curve, usually in the form of an edict from the administration concerning the proper distribution of grades to hand out. In fact, in one institution some misguided administrator computed the percentage distribution of grades for my class of six students and compared it to the proposed institutional curve. The curve is what you might expect to find if the frequencies of events ranged around some mid-point purely by chance, like the impact points of artillery shells fired as exactly as possible at a given target. The mathematical specifications of the curve are complicated, but the basic point to remember is that this is a curve of chance occurrences; in fact, some people call it the curve of error. If any factor, however small, consistently biases the possibilities of events, they will not group themselves in this sort of curve, and it is sheer tyranny for us to insist that they should do so. It is true that over a large number of cases (say ten thousand) of students taking a given test with a similar general background of ability and interest, the grades will approximate this sort of curve. But the principle on which the curve is predicated says explicitly in fine print that any given small portion (sample) of those ten thousand (universe) might pile up at either end or in the middle, or be found scattered all over it from here to Hoboken. This small sample, colleague, is your class and mine, and it may not be just your imagination: it is perfectly possible, statistically, that they really are all F’s this year! Another year they may be all A’s.
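
A short Python simulation, with an invented grading scale and an invented universe of ten thousand scores, makes the point about small classes:

```python
# Small samples from a comfortably bell-shaped universe, with invented cutoffs:
# classes of six drawn at random rarely look anything like the curve itself.
import random
random.seed(3)

def letter(score):
    return "A" if score >= 90 else "B" if score >= 80 else "C" if score >= 70 \
        else "D" if score >= 60 else "F"

universe = [random.gauss(75, 10) for _ in range(10_000)]   # the ten thousand cases

for _ in range(5):
    klass = random.sample(universe, 6)                     # your class and mine
    print(sorted(letter(s) for s in klass))
# No class of six traces out the bell; chance alone lets a handful of students
# pile up toward one end or the other.
```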

Moral: The normal curve will never replace the Esquire calendar.

The theory of sampling is a beautiful and fearful thing to behold, and none but the statistical priesthood should be trusted to gaze upon it. But the laity should at least become pious and agree to some key points in the creed. First of all, size of sample is much, underline much, less important than almost everything else about the sample. A carefully designed sample of two hundred cases can tell more than a sloppily collected sample of two thousand. The basic problem in sampling is to get a sample which faithfully represents the whole population, or universe, from which it was drawn. All the elaborate machinery of sampling is set up to serve this purpose, and if the rules are not followed, the sample might as well not be drawn at all. Good sampling is neither cheap nor easy, while bad sampling is sometimes both. The casual layman who wants to know how to make a sample is best advised as was the man who asked a doctor at a dance what he would advise in a hypothetical case of illness. You will recall that the M. D. seriously said, "I would advise that man to see a doctor." The best advice before trying to draw a sample is to see your local statistician. Otherwise, don’t do it yourself unless you are sure you know how.
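
Here is a Python sketch of that claim, with an invented student population in which evening students earn more and the sloppy interviewer finds only them:

```python
# Design beats size, illustrated with an invented population and an invented bias.
import random
random.seed(4)

# A made-up universe of 10,000 students: about 30% evening students with higher incomes.
population = [("evening", random.gauss(52_000, 8_000)) if random.random() < 0.3
              else ("day", random.gauss(28_000, 6_000)) for _ in range(10_000)]
true_mean = sum(income for _, income in population) / len(population)

# Sloppy but big: 2,000 interviews, all taken where the evening students congregate.
sloppy = random.sample([inc for kind, inc in population if kind == "evening"], 2_000)

# Careful but small: 200 drawn at random from the whole universe.
careful = random.sample([inc for _, inc in population], 200)

print(round(true_mean), round(sum(sloppy) / 2_000), round(sum(careful) / 200))
# The big biased sample typically misses the true mean by many thousands of dollars;
# the small random one usually lands within a thousand or two.
```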

Moral: A free sample may be good for a disease you don’t have.

The question which must be answered about most information derived from sample surveys is: "Is this statistically significant?" What this means is: "Could the kind of frequencies of events we have discovered have occurred purely by chance?" On this kind of answer rests our confidence in the Salk vaccine, radar, strategy in sales campaigns, and many other kinds of events where the improvement or change we seek is not total but is nevertheless desirable. In some cases, as small a change as two or three per cent may be significant—that is to say, is not likely to have occurred merely by chance; while in others, a twenty or thirty per cent change may not be significant. The techniques of determining significance are a serious study in themselves, but the common sense cautions in using them may be summed up in two statements: a difference that does not make a difference is not a difference; and: there is a vast difference between something’s being statistically significant and something’s being important.
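
A closing Python sketch, with invented trial figures and a plain two-proportion z-test, shows both halves of the caution: a two per cent gain can be "significant" in an enormous sample, and a twenty per cent gain can fail to be in a tiny one:

```python
# Statistical significance versus importance, with invented trial figures.
from math import erf, sqrt

def two_proportion_p_value(success_a, n_a, success_b, n_b):
    """Two-sided p-value for the difference between two proportions (normal approximation)."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

# Huge trial, two-point gain (50% -> 52%): highly significant, though possibly unimportant.
print(two_proportion_p_value(25_000, 50_000, 26_000, 50_000))
# Tiny trial, twenty-point gain (50% -> 70%): not significant at the usual 5% level.
print(two_proportion_p_value(10, 20, 14, 20))
```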

1 From the Bulletin of the American Association of University Professors, 1957, 43:33–39. By permission.