An excerpt from: Stein, Z. (2014). Tipping the scales: social justice and educational measurement. (doctoral dissertation). Harvard University Graduate School of Education. Cambridge, MA.
Methods: reflective equilibrium, provisional justifiability, and the need to make sense of history
Ethical frameworks, like the theory of just educational measurement being built here, cannot be directly confirmed or disconfirmed in the way scientific hypotheses can. But there are systematic methods for building and justifying them. Rawls and others (Daniels, 1996) argue for a methodological approach to building and justifying ethical frameworks, which aims to make them suitable for guiding reform and policy. The components of ethical frameworks—principles, judgments, and empirical generalizations—must be explicated and then “tested” against the varied experiences and competing accounts already available on the topic. The ethical framework thus undergoes a process of iterative revision until it is brought into a state of broad reflective equilibrium (Rawls, 2001). This is a state of “provisional justifiability,” occurring when an ethical framework is shown upon reflection to be internally coherent, empirically tenable, and consistent with considered moral experience. Justifying an ethical framework thus requires demonstrating its ability to maintain a broad reflective equilibrium, its ability to handle a wide variety of particular cases while still maintaining its logical consistency, its ability to account for accepted facts, and its capacity to make sense of our most assured moral judgments.
To clarify, reasonable individuals typically work to achieve a narrow reflective equilibrium whenever a novel occurrence forces them to reconsider their beliefs, a process by which long-held beliefs are opened to revision in light of new experience. For example, someone believing that standardized testing practices promote discipline, higher standards, and accountability is likely to revise this belief, or at least qualify it, when confronted with the details of large-scale faculty-organized cheating in urban school districts (discussed in Chapter 4). Revising this belief requires revising other beliefs that are related to it, but not necessarily letting go of commitments to discipline, standards, and accountability. On an individual level, reflective equilibrium is the process through which ethical reasoning leads to learning and conceptual change, as the integration of new experiences reshapes existing beliefs (Kohlberg, 1984; Habermas, 1990).
Philosophers work to achieve a broad reflective equilibrium. This is a process during which philosophical principles and judgments are systematically “tested” against the best of our knowledge and the various convictions and realities of our lived experience. For example, a philosophical principle that would exclude all standardized testing from educational practices (e.g., “categorizing students is unethical”) must be revised, or at least qualified, if it is to reflectively accommodate arguments and data concerning the use of diagnostics with special populations, the fair organization of large-scale social benefit programs, or the importance of advancing the learning sciences. Revising this principle would have ramifications throughout the problematized ethical framework, as demands for internal coherence and consistency set off a cascade of conceptual revisions. For an ethical framework to maintain a broad reflective equilibrium it must be in dynamic contact with, and learn from, a variety of potentially dis-equilibrating realities, difficult case studies, and provocative thought experiments. Philosophers have used this method to build and justify a variety of ethical frameworks, most recently for bioethics (Buchanan, Brock, Daniels, & Wikler, 2000), disability advocacy (Nussbaum, 2006), and international law (Hayden, 2002).
An ethical framework concerning educational assessment must be able to account for our considered judgments about a wide variety of testing practices. Therefore, in Chapters 4 and 5 the ethical framework developed in Chapters 1 through 3 is used to analyze a set of exemplary case studies that are rich with ethical complexities: the IQ-testing practices of the 1920s, the national testing infrastructure built by the Educational Testing Service (ETS) in the 1950s, and the ascendancy of test-based accountability in the twenty-first century. Organizing the available historical materials in a consistent way, specifically around testing practices and their related justifications, scaffolds the systematic application (and evaluation) of the proposed ethical framework, showing it to be reflectively equilibrated, and thus demonstrating its provisional justifiability. These historical sections also provide an overview of some of the most important episodes in the history of testing in the US.
Justice as the view from everywhere
John Rawls’s (1971; 1996) philosophical methods also involve the use of complex representational devices, which can be thought of as structured models or thought experiments. More broadly, model-based reasoning in the sciences involves the deployment of a variety of “useful fictions” that simplify phenomena and exemplify the properties or processes of interest (Elgin, 1996). One of the most common kinds of models is the miniature, such as the scale-models used in engineering and systems biology that represent large structures or long timelines in ways that “shrink” them down so they can be seen at a glance. The rest of this introduction is just such a miniature; it aims to bring a large and complex work into view by shrinking it down and distilling its most important properties. This means stepping back from the particular arguments to gain a view of the whole; it is a map of the forest, whereas each chapter thereafter deals with individual trees.
The simplest way to miniaturize my overall argument is to consider the design of standardized testing infrastructures as if from behind a Rawlsian “veil of ignorance.” The “original position” is the central representational device deployed by Rawls. It is intended to clarify the objectivity and universality of the “moral point of view.” The original position is basically a set of decision-making constraints that support reasoning about the nature of justice; it simply asks us to consider the basic institutions of a society as if ignorant of our eventual place in them. The archetypal case is drafting a constitution without knowledge of who or where you will be in the society it creates. Thus it would be irrational to draft a constitution supporting slavery or limiting voting rights to landowning males because there is no guarantee you would not end up enslaved or disenfranchised. Engaging Rawls’s thought experiment means that instead of viewing social structures from my perspective (i.e., that of a well-educated white male), I am forced to consider society from everyone else’s perspective as well (e.g., that of a woman, of a minority, etc.). A social system that can be viewed as reasonable from this meta-perspective is one that provides justice for all.
It is worth quoting Rawls (2001, pp. 14-17) at length summarizing the motives and design of his famous thought experiment:
We start with the organizing idea of [a just] society as a fair system of cooperation between free and equal persons. Immediately the question arises as to how the fair terms of cooperation are specified…. They [are to be] settled by an agreement reached by free and equal citizens engaged in cooperation, and made in view of what they regard as their reciprocal advantage, or good…. The difficulty then is this: we must specify a point of view from which a fair agreement between free and equal persons can be reached. This point of view must be removed from and not distorted by the particular features and circumstances of the existing basic structures [of society]. The “original position,” with the feature I have called the “veil of ignorance,” specifies this point of view. In the original position, the parties are not allowed to know the social positions or the particular comprehensive doctrines [worldviews] of the persons they represent. They also do not know persons’ race and ethnic group, sex, or various native endowments such as strength and intelligence…. We express these limits on information figuratively by saying the parties are behind a veil of ignorance…. The significance of the original position lies in the fact that it is a device of representation or, alternatively, a thought-experiment for the purpose of public- and self-clarification….
This is not the place to get into the complexities surrounding the original position and its various formulations (see: Freedman, 2007). Rawls intended this thought experiment for use only in adjudicating between different philosophical principles of justice, and thus not for thinking about more specific social structures and institutions. However, for the purpose of miniaturizing the overall argument, the thought experiment serves as a valuable heuristic and allows us to cut directly to the chase.
The overarching themes of this work can be distilled into a single question: what kind of standardized testing infrastructure could be agreed to in the original position? This question is at the center of the theory of just educational measurement that is the overall focus of the work. Of course, there is much more to it than that. As will be explained shortly, a theory of just educational measurement requires related theories about the nature of institutionalized measurement in general, as well as a supplemental philosophy of education, and a system of distinctions concerning the dynamics of testing-intensive educational reforms, a system which would center on the difference between justice-oriented testing and efficiency-oriented testing. But before adding these details to the miniature being built here, it is worth exploring the simplest way of thinking about testing infrastructures and social justice.
Several issues are clarified immediately through the “ideal role-taking” exercise of imagining that one could end up anywhere in the systems affected by testing infrastructures. Key stakeholder groups emerge, each with their own institutionalized relationship to testing and each containing a range of individuals (from least well off to most well off). Students and their parents are one group, and their range can be viewed both in terms of socio-economic factors and in terms of learning abilities. Teachers are another group, again including a range of individuals who vary according to their skills and socio-economic positions. Then there are the administrators at various levels within the school system (from principal to superintendent) who are also differentially positioned. Policy makers and politicians constitute another group, as do psychometricians and others representing the interests of testing companies. There are of course other stakeholders (e.g., educational researchers, college admissions officers, etc.), but the heart of the argument resides with these main groups, including students and their parents, teachers, administrators, policy-makers, and test providers.
The most vulnerable individuals in the social structures created by testing are the least-well-off students and their teachers (e.g., learning-disabled students in an inner city school and their special education teacher); the least vulnerable are the politicians and those representing the interests of testing companies (e.g., Arne Duncan and Educational Testing Service executives). As in many cases where injustices occur, the most vulnerable, those who are potentially most seriously affected, are also the least empowered and the furthest away from influence over the systems that profoundly shape their lives. The task of building a just standardized testing infrastructure (as with any basic social structure viewed from the original position) requires taking as primary the perspectives of the least well off because this is the social position that is of greatest concern from behind the veil of ignorance (i.e., it is the place you would least like to end up in the system). Justice requires maximizing benefits to the least well off while maintaining overall fairness within the system.
Broadly speaking, this means a testing infrastructure that rewards those who are already advantaged while punishing those who are already disadvantaged is unjust. The distribution of benefits resulting from such a testing infrastructure simply further disadvantages the least well off. As discussed below, the history of testing from the early IQ-testing movement to recent policies supporting test-based accountability has mostly fit this unjust pattern of differential reward and punishment.
Social justice is also implicated at the level of test design and administration. For example, objectivity and standardization are required by justice—this is the moment of truth expressed by those who tout the social justice benefits of testing. Indeed, many of the most egregious cases of injustice due to testing have involved a pretense of objectivity that disguises the existence and impact of overt biases and errors in test design and scoring.
As discussed below, all individuals have a right to objective measurement. On the other hand, even truly objective tests that are used in high-stakes contexts or that focus on a narrow range of constructs and item formats result in injustice. Beyond objectivity, individuals have a right to measures that are relevant and beneficial. Arguments from the original position suggest the plausibility of these metrological rights, but a full understanding requires bringing in the rest of the theory of just educational measurement.