An excerpt from: Stein, Z. (2014). Tipping the scales: social justice and educational measurement (doctoral dissertation). Harvard University Graduate School of Education, Cambridge, MA.
Testing in the name of the least well-off: Binet’s vision of testing and justice
If the impression takes root that these tests really measure intelligence, that they constitute a sort of last judgment on the child’s capacity, that they reveal “scientifically” his predestined ability, then it would be a thousand times better if all the intelligence testers and all their questionnaires were sunk without warning into the Sargasso Sea. One has only to read around in the literature…to see how easily the intelligence test can be turned into an engine of cruelty, how easily in the hands of blundering or prejudiced men it could turn into a method of stamping a permanent sense of inferiority upon the soul of a child.
-Walter Lippmann (1922)
Binet is justly famous as the inventor of the IQ test, which was an ingenious solution to a difficult problem put to him by the French minister of public education in 1904: devising a means for identifying students who could not be placed in normal classrooms because they required special education. (While accounts of Binet’s invention and its subsequent mutations can be found in almost every psychology textbook, the one offered here draws from several sources: Brown, 1992; Gould, 1996; Lagemann, 2000; Sokal, 1990.) His approach to this problem would change the face of psychology and education, with ramifications affecting an incredible array of institutions and cultural practices. Importantly, Binet’s work came to be used and understood in ways that Binet himself would have strongly opposed, so it is worth looking more closely at the original procedures and ideas surrounding the birth of standardized testing. As will become clear, Binet intended his instrument for use as a part of justice-oriented testing practices, but his intention was lost in the enthusiasm for efficiency that dominated the contexts of its American importation.
Before Binet’s invention there was a wide array of competing approaches to psychological measurement, most of which involved commitments to a kind of faculty psychology tracing its lineage to phrenology and other physicalistic means of determining individual differences. As early as the 1880s the pioneering American psychologist James Cattell (who coined the term “mental test” in 1889) began importing the instruments and techniques of Francis Galton’s “anthropometry,” which consisted of standardized physical apparatuses for detecting minute differences of sensory and motor capabilities, such as “reaction time to sound” and “least noticeable difference in weight” (Sokal, 1990). The results of the various physical tests were often taken as a proxy for a variety of psychological faculties, such as perceptiveness and perseverance. Cattell was instrumental in putting together an exposition of the new science at the 1893 World’s Fair in Chicago. Thousands of individuals were tested, greatly increasing a growing public fascination with the standardized and scientific measurement of minds. But the popularity of this approach would be short-lived, due in part to growing awareness of the work being conducted by Binet. Emerging criticisms focused on the limits of physiological measures as indices of meaningful psychological differences. A growing desire spread through the profession for “giving tests as psychological a character as possible” (James Mark Baldwin, quoted in Sokal, 1990, p. 35).
Binet’s tests offered just that. They made no use of complex physical apparatuses and involved linguistically mediated tasks that clearly elicited the so-called “higher mental processes.” When compared to the ‘mad-scientist’ laboratory of anthropometric testing instruments, Binet’s tests appeared much more similar to the examinations given in schools for centuries to determine the knowledge and skills possessed by students. Yet Binet’s tests were fundamentally different from traditional forms of academic evaluation. These differences revolve around the requirements of standardization and objectivity.
For one, he was not interested in “learned skills” such as reading and mathematics, nor was he interested in the knowledge associated with traditional academic subject matter. Instead he aimed to bring together a large series of seemingly everyday tasks, such as counting coins or determining which of four female faces was “prettier.” The tasks were thought to get at more general processes of reasoning. The idea was that mixing together a wide range of tasks would allow for an inference to the child’s general ability.
The various tasks were administered one-on-one by trained examiners in a sequence scaled by their order of difficulty. Each order of difficulty was assigned an age level, defined as the youngest age at which a child of normal intelligence should be able to complete the tasks. The child began with tasks for the youngest age and proceeded up the scale until they could no longer get them right. During the first decade of its use there was a variety of ways in which the results were quantified, but researchers eventually settled on a common method. The child’s “mental age” was indicated by the last task in the age-graded scale they could complete. Their “general intellectual level” was then determined by dividing this test-determined mental age by their actual chronological age (multiplying the result by 100 to eliminate the decimal point). Thus the intelligence quotient, or IQ, was invented.
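The ratio computation just described can be written compactly. What follows is a sketch in modern notation, not a formula quoted from Binet or from the sources cited here; the symbols MA (mental age) and CA (chronological age) and the ages in the worked example are illustrative assumptions.

```latex
% Ratio IQ as described in the text: mental age divided by
% chronological age, multiplied by 100.
\[
  \mathrm{IQ} = \frac{\mathrm{MA}}{\mathrm{CA}} \times 100
\]
% Hypothetical example: a child of chronological age 8 whose last
% completed tasks sit at the age-6 level.
\[
  \mathrm{IQ} = \frac{6}{8} \times 100 = 75
\]
```

On this scheme a child whose mental age equals his or her chronological age scores exactly 100, and a score below 100 indicates a mental age that lags behind chronological age.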
Binet was interested in the degree of the discrepancy between a child’s mental age and his or her actual age. Knowing that a child’s mental age was greatly behind his or her chronological age allowed the child to be identified as in need of special educational accommodations. Indeed, this was the only reason the test was invented. Binet consistently stressed the pragmatic and empirical nature of the scale and “consistently declined to award any theoretical interpretation to his scale of intelligence…. [He also] declined to define or speculate upon the meaning of the score he assigned to each child” (Gould, 1996, p. 180). He argued that:
intelligence is too complex to capture with a single number…. The scale properly speaking, does not permit the measure of intelligence, because intellectual qualities are not a single scalable thing like height…. We feel it is necessary to insist on this fact, because later, for the sake of simplicity of statement, we will speak of a child of 8 years having the intelligence of a child of 7 or 9; these expressions, if accepted arbitrarily, may give place to illusions.
(Binet, quoted in Gould, 1996, p. 181)
Even more important than Binet’s theoretical reservations about the interpretation of scores awarded by his test were his ethical and pedagogical concerns about its possible and preferable uses. In fact, Binet’s ideas about the use of his instrument touch on all three principles of just institutionalized measurement, and thus represent a profound alternative in educational measurement, the road not taken.
Binet understood the essential need for objectivity in an instrument designed to serve such critical institutional purposes and built the test accordingly. But he worried about the potential harm that could be done when the scores took on an institutional life of their own. Binet pleaded passionately on behalf of the learning-disabled and protested against the use of his tests in ways that stigmatized the child:
If we do nothing, if we don’t intervene actively and usefully, he [the learning-disabled child] will continue to lose time…. and will finally become discouraged. The situation is very serious for him, and since his is not an exceptional case (since children with defective comprehension are legion), we might say that it is a serious question for all of us and for all of society…. [Shame on those] teachers who are not interested in students who lack intelligence. They have neither sympathy nor respect for them, and their intemperate language leads them to say such things in their presence as ‘This is a child who will never amount to anything… he is poorly endowed… he is not intelligent at all.’ How often have I heard these imprudent words… Some recent thinkers seem to have given their moral support to these deplorable verdicts by affirming that an individual’s intelligence is a fixed quantity, a quantity that cannot be increased. We must protest and react against this brutal pessimism; we must try to demonstrate that it is founded upon nothing. (Binet, 1909, pp. 100-101)
Binet believed the tests should be used only as a means for helping the least well-off children in ways that were most relevant and beneficial to them. He even developed and implemented a program of “mental orthopedics” intended to supplement the use of the tests and aid children identified as needing special attention and guidance. Gould best summarizes Binet’s “three cardinal principles for the use of his tests… all of which were later disregarded by the American hereditarians who translated his scale into written form as a routine device for testing all children:
- The scores are a practical [objective] device; they do not buttress any theory of intellect. They do not define anything innate or permanent. We may not designate what they measure as “intelligence” or any other reified entity.
- The scale is a rough, empirical guide for identifying mildly retarded and learning-disabled children who need special help. It is not a device for ranking normal children.
- Whatever the cause of difficulty in children identified for help, emphasis shall be placed upon improvement through special training. Low scores shall not be used to mark children as innately incapable. (Gould, 1996, p. 185)
These were the principles intended to guide the use of the first scientifically refined standardized tests. They are fully congruent with the principles of just institutionalized measurement, positing each child’s right to be objectively measured in ways that are both relevant and beneficial. IQ tests implemented in schools according to these principles would be insulated from co-optation as part of efficiency-oriented testing practices, such as those discussed immediately below, where whole student bodies were ranked and sorted so that resources could be funneled away from the least well-off and towards those with greater “innate” abilities. It bears repeating: this is the opposite of the test’s intended use. If Binet had had his way his tests would have been instruments used only to identify and help the least well-off, period. Moreover, having only this pragmatic use, with no accompanying theoretical meaning, the test score would not have served as an enduring label for the child, being best forgotten by child and teacher alike.
This last point—that the test score must not become a permanent label—is directly related to discussions in Chapter 3. Recall the potential for the terms of the education commodity proposition and efficiency-oriented testing to obscure how students understand themselves and are understood by others. The benefits of objectivity and quantification (and as discussed below, the benefits of mass administration and efficiency) must be weighed against the direct effect of testing on the social relations constituting the educational process. Testing practices, like other forms of measurement, set the terms of mutual understanding and facilitate coordinated interpersonal activities. The meaning of the test for the student, teacher, and administrator conditions their relationship; it establishes a shared sense of “what is the case.” Because of this inevitability stemming from the nature of testing, tests should be designed and used in ways that assure they do not create mutual understandings that are systematically distorted by the meaning of the test—as when, for example, the test scores are understood as predominately markers of “innate” or “inherited” differences.
Gould clarifies the issue in terms that reflect the difference between justice-oriented and efficiency-oriented testing:
The differences between strict hereditarians and their opponents is not, as some caricatures suggest, the belief that a child’s performance is all inborn or all a function of environment and learning. I doubt that even the most committed antihereditarians have ever denied the existence of innate variation among children. The differences are more a matter of social policy and educational practice. Hereditarians view their measures of intelligence as markers of permanent, inborn limits. Children, so labeled, should be sorted, trained according to their inheritance and channeled into professions appropriate for their biology. Mental testing becomes a theory of limits. Antihereditarians, like Binet, test in order to identify and help. Without denying the evident fact that not all children, whatever their training, will enter the company of Newton and Einstein, they emphasize the power of creative education to increase the achievements of all children, often in extensive and unanticipated ways. Mental testing becomes a theory for enhancing potential through proper education. (Gould, 1996, p. 183; emphasis added)
It is one of the great ironies in the history of testing that the man who invented the first and most widely used standardized test understood the social justice implications of his invention and articulated a vision to assure its appropriate use, only to be completely ignored by his most enthusiastic and ambitious followers. The individuals who so drastically repurposed Binet’s invention, all of them Americans, were convinced of an extremely consequential theoretical commitment, namely that what the IQ test measured was a fixed, inherited trait. Their idea was that a person’s intelligence is an inalterable inherited property of the mind that is best thought of as akin to a strictly biological trait, such as height, and that this trait could be measured objectively by an IQ test. This idea, which has supporters to this day (Herrnstein & Murray, 1994), was invented by American psychologists during a very specific historical epoch. Wedding this idea to Binet’s invention allowed for a much wider range of ostensibly valid theoretical and institutional applications and turned what was potentially an instrument of justice into an instrument of injustice. The idea that IQ tests measured an innate ability would have a massive impact on the shape of American education during the first half of the twentieth century, as the proliferation of scientific racism backed advances in the mass administration and institutionalization of standardized testing. These developments in institutionalized measurement would, to take a phrase from Condorcet (1785), “make nature herself an accomplice in the crime of political inequality.”