Not Just a Test

Why we must rethink the paradigm we use for judging human ability.


In the summer of 1966 the Office of Education published the now famous “Coleman Report,” which assessed the nation’s progress in achieving the school integration mandated by the 1954 Brown v. Board of Education decision. The report was based on a survey conducted by the Educational Testing Service under the direction of sociologist James Coleman, and it included a brief test of cognitive skills. As reported in Nicholas Lemann’s superb book on the history of ETS, The Big Test, these researchers hoped to find that the black-white gap in scores on this test would be smaller in the better-funded Northeastern schools than in the less-well-funded Southern schools. This would justify the use of federal funds to bolster underfunded public schools–an unprecedented policy at the time.

What they found, however, was a black-white score gap virtually as large in better-funded schools as in poorly funded schools. The immediate reaction was dramatic. Coleman dropped the idea of federal support for underfunded public schools, apparently presuming that it wouldn’t help much. The idea was later reinvigorated and passed into law as “Title I” funding for low-achieving, low-income schools. But this policy too seemed to lack a certain confidence. It targeted funds primarily at basic skills remediation, not at a broad, high-quality education for low-income students.

These events turned out to be a prophetic episode in a story that continues to this day. For perhaps the first time, the ETS survey revealed that the racial gap in test scores would be difficult to eliminate. But as important, the episode revealed a certain paradigm for using test scores in educational decision-making. With roots all the way back to the beginning of standardized testing in the early twentieth century, the paradigm is familiar: Based on tests taken early in life, lower-scoring people and groups get less educational attention, or more of a basic-skills education aimed at bringing them to minimal levels of competence, whereas higher-scoring people and groups get a richer education supported by more resources–better-trained teachers, more academically challenging curriculums, better opportunities, etc. The rationale for this “ability paradigm,” as I will call it, has always been a kind of meritocratic efficiency: maximizing the return on society’s investment by investing the most resources in those who, as indicated by test scores, have the ability needed to benefit from those resources.

But in the spirit of reflection occasioned by this anniversary of Brown, one might ask a difficult, two-part question: Has this paradigm all along been a major cause of the racial gap in test scores, and is it now, through this effect, a major remaining barrier to the full integration envisioned in Brown?

The paradigm has always involved a daunting set of assumptions: that there is a core intellectual ability; that there is a level of that ability that is indispensable to benefit from high-quality education; that the level of this ability that one has is fairly stable across the life span; that this ability can be accurately and reliably measured in people from virtually all backgrounds by a cognitive test in a single sitting at almost any time in a person’s development; and that, therefore, scores from these tests can be used to triage students efficiently into ability-appropriate educational tracks early in life. When you look at it, this is a lot to believe in. But like most assumptions, these beliefs are more implicit than explicit. We endorse them largely by using the paradigm they support.

Moreover, there are cultural frameworks that make these assumptions sensible to us. As the Lemann book describes, James Bryant Conant, the president of Harvard University from 1933 to 1953, endorsed this paradigm and spearheaded the broad use of the Scholastic Aptitude Test as a means of making admissions to Harvard more inclusive and less dependent on the family connections that otherwise dominated admissions in his day. He believed the SAT could find people with academic talent in unlikely places. It was to be an instrument of meritocracy. And it has had a democratic effect. I have many distinguished colleagues who, because they scored well on standardized tests early in life, got academic opportunities they would never have enjoyed based on family resources or connections.

In the past several decades, and especially in the Bush years, there has also been an intensifying ideology of school accountability: holding schools and their financing accountable to test scores. This movement usually stresses achievement tests, covering specific curriculums, over ability tests like the SAT. (Achievement–more than ability, perhaps–might be thought of as improvable through better education. Thus an emphasis on achievement might be expected to direct better education to lower-scoring students in order to bring up their scores. But the ability ethos still seems to shape people’s thinking about the kind of education that low-scoring students need.) These systems also divert high-quality resources and teachers away from low-scoring students in favor of remediation efforts, or efforts to enhance their test scores.

The sheer volume of students to be processed sustains the ability paradigm and its tests. The volume of applications to colleges, especially prestigious ones, has increased dramatically in recent years. Graduate and professional schools also confront large numbers of applications. And often grade averages and letters of recommendation do not discriminate well among applicants. In the context of this superheated competition, the hard number of a test score takes on more weight–even when it adds little information.

For these reasons, then, standardized tests have become a virtual truism in our society, the only means we can imagine for rationally meting out educational opportunity.

But what happens when this paradigm–especially its intensified use in the “standards” movement–meets the race gap in test scores that Coleman reported nearly forty years ago, and that every newspaper tells us persists to this day? The use of early-in-life test scores to make educational decisions, such as who gets into enriched reading groups and classes, who graduates from junior high school, who gets into college preparatory tracks in high school, who graduates from high school, etc., consistently channels African-Americans into a lower-grade education that sustains their lower test scores, alienates many of them from their education, contributes importantly to their high dropout rates and puts their lives on a course of restricted opportunity. When one looks upriver from the high dropout rates, the high incarceration rates, the high teenage-pregnancy rates and the high unemployment that disproportionately afflict African-American youth, one sees something systematic at work: an ability/testing paradigm that uses their early low scores to steer them (as well as many Latino and lower-income students more generally) into a low-expectation education as reliably as their parents and grandparents were steered into segregated schools. We have again set up two educational systems, and they are again separate but unequal.

Thus in assessing the nation’s progress toward the integrated society envisioned in Brown, the remaining barriers may be not so much lawsuits against busing as a paradigm we use for thinking about human ability.

In writing the recent decision that upheld the University of Michigan Law School’s use of affirmative action, Justice Sandra Day O’Connor said that the use of this policy in college admissions should end in twenty-five years. This is tantamount to saying that the race gap in test scores should end in twenty-five years. There is admirable optimism in this statement. But it is important to understand where the low test scores of African-Americans and other disenfranchised groups come from.

Consider, for example, what research shows about the experiences of African-Americans in school. They are more likely to go to poorly funded schools in run-down buildings, and more likely to be taught by uncertified and poorly trained teachers. Observational studies show that they often experience differential treatment even from well-intended teachers, such as being called on less in class and being invited less to special activities. They experience more corporal punishment and more frequent and longer suspensions from school for the same infractions as other students. They are more likely to be tracked into lower academic and special-education classes than other students. In junior high school and high school, they are likely to encounter an especially distracting peer group culture. They are counseled with lower expectations. They are more likely to go to schools with few or no Advanced Placement courses, and they are likely to have less access to test-prep courses and related tutorials. Much of this follows from their still living in substantially segregated communities with fewer resources. And much of this holds for middle-income African-Americans as well as for lower-income and working-class African-Americans. (Here it is important to stress that middle-income black families have 10 percent of the wealth of middle-income white families in the United States–reflecting differences in the intergenerational transfer of wealth and housing segregation’s depression of black home values. And wealth plays a significant role in a family’s educational decision-making.)

My own research and that of my colleagues reveal another identity-linked contextual pressure that affects especially the academic vanguard of African-American students: stereotype threat. Joshua Aronson and I first demonstrated the effect of this threat in a series of experiments conducted at Stanford University. One at a time, we gave academically strong black and white students a difficult section of the advanced Graduate Record Exam in literature. This situation alone, we reasoned, would be enough to put black students under a special pressure. For them, but not for white students, frustration on this test can confirm the negative stereotype about their group’s lower intellectual ability–or at least raise the possibility that they might be seen that way. And because these are strong students who care about performing well, the prospect of confirming or being seen to confirm this stereotype could be upsetting and distracting enough to undermine their performance right there on the test. It did. Under this pressure–a pressure that is likely to be present for black students on the difficult sections of any important academic test, during academic performances more generally and even during interpersonal interactions in an academic context–black students performed worse than white students, even though we had statistically matched the two groups of students in skills.

Yet consider what happened when we gave another set of black and white students the same exam. This time, we told them it was an instrument we used to study problem-solving but that it did not measure intellectual ability. This simple instruction changed the meaning of the testing situation for black students. No matter how they did on this test–defined this way–it would not reflect on their ability and thus could not confirm or disconfirm the negative stereotype about blacks. With the pressure off, black students’ performance went up to match that of statistically matched white students. This tells us that what depressed their performance when they understood the test to be a normal test of intelligence was stereotype threat.

Taking a step backward, one can see that our social identities–like being African-American–have contingencies attached to them, specific things that a person with that identity has to cope with in specific settings. Black students face a particularly daunting set of identity contingencies in school–from having teachers with less training to being treated with low expectations to stereotype threat. This makes the experience of school different for them than for some other groups, even when they are in the same classroom, with the same teacher and with the same pictures on the wall.

As a cause of lower black test scores, these contingencies are bad enough. But the ability paradigm expands their effect. Treating their lower scores as if they were caused by low ability rather than by these contingencies, this paradigm puts African-American students on a track that ensures they will not get the education they need to rise to the level of the other students. It seals the test-score gap in place.

But the wisdom of the ability paradigm and its use of early test scores would also seem to depend on how good its tests are. If they do a good job of measuring abilities that are indispensable to later school and life functioning, and if they can predict the level of that functioning well, then this paradigm could have considerable value, its flaws in relation to minority and low-income students notwithstanding. But if they measure a difficult-to-specify set of abilities and predict future performance only weakly, that would be another story.

In this regard, several features of standardized tests are worth keeping in mind. They are not designed to test some known, agreed-upon conception of intelligence. They are built bottom-up, by empirically identifying a set of items that predict school performance, not top-down, from some understood conception of what human intelligence is. When an item turns out to predict, we don’t know exactly why. It could be measuring a critical cognitive ability, or it could be measuring something more incidental about a person that affects, for example, his or her cultural fit at the next level of schooling. Thus it is very difficult to know what abilities or skills ability tests test.

Nor do they predict particularly well. The SAT, for example, correlates .42 with freshman grades (it correlates considerably less well with subsequent grades and life outcomes). This means that it measures about 18 percent of the characteristics, whatever they are, that determine freshman grades. It also means that even large point differences on the test–say, 100 to 200 points–do not tell you much about underlying differences in skills, and have very little predictive value. As my colleague Jay Rosner and I have said before, standardized tests are to real school performance what free-throw shooting is to basketball playing–not unrelated, but capturing only a small set of relevant skills.
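The 18 percent figure follows from squaring the correlation: the square of a correlation coefficient gives the share of variation in one measure that a linear relationship with the other accounts for. A minimal sketch of that arithmetic follows; the function and variable names are illustrative assumptions, and only the .42 correlation comes from the text.

```python
def variance_explained(r: float) -> float:
    """Share of variance in one variable accounted for by a linear
    relationship with another, given their correlation r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation must lie between -1 and 1")
    return r * r


# Correlation between SAT scores and freshman grades cited in the text.
sat_freshman_r = 0.42

share = variance_explained(sat_freshman_r)
print(f"{share:.1%} explained, {1 - share:.1%} unexplained")
# prints "17.6% explained, 82.4% unexplained"
```

On this reading, more than four-fifths of the variation in freshman grades is left to everything the test does not capture.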

Most of the testing companies say all of this in their guidelines, much like the caveats that accompany drug advertisements on television. But our need for a system to allocate opportunity is such that, like people who need the advertised drug, we tend to ignore the caveats.

After a talk I gave recently, an African-American school administrator from a suburb of New York City cautioned me. He said the black-white test-score gap was the only leverage he had for focusing more of his district’s resources on minority students. I could see his point. He had been living with inattention to these problems until news of the gap hit the local newspapers. But I have a larger fear: that whatever funds he wins to address this gap will be applied through the ability paradigm. Minority students will not be given a richer, more compelling education. They will likely be given a skills-focused, remedial education that will itself become a contingency of their identity, virtually guaranteeing the persistence of the race gap long beyond the Sandra Day O’Connor deadline. And the process of downwardly constituting them in this way will be understood not as a problem in the system of allocating educational opportunity but as a perhaps natural outgrowth of group differences in ability measured early in life.

In the years leading up to the Brown decision, the challenges to achieving an integrated society were legal. Today they are educational: to loosen the grip of the ability paradigm on the academic fate of African-American, Latino and poorer students. To this end, I hazard a few recommendations:

1. As much as possible, replace the word “ability” in our educational lexicon with words like “skill level” and “educational readiness.” These terms say no more than we know, and thus keep us self-conscious about assuming more than we know.

2. When placements are made to accommodate lower skill levels, do not allow them to become life sentences. Provide clear curricular pathways to upward mobility and see to it that some students ascend that pathway as role models.

3. As much as possible, especially early in schooling, focus high-expectation, demanding and enriched schooling on lower-scoring students. Emphasize getting them to identify with and be excited about their schooling.

4. Discourage the use of ability and aptitude tests in favor of tests based on specific curriculums to which all students have access.

5. Develop and use multiple, low-stakes, cumulative, curriculum-based assessments rather than single-sitting, high-stakes tests–including for use in college, graduate school and professional school admissions.

6. Develop and use additional metrics for such signs of student readiness as motivation and desire, breadth of life experience, degree of experience in the relevant domain, work discipline, maturity, etc.

7. Much of the current test-score gap comes from high-scoring students’ use of supplementary education: tutoring, after-school and weekend programs, test-prep courses, etc. Coalitions of school, church, community and civil rights organizations should develop and extend these “shadow” educational resources to minority and low-income communities–extending Head Start programs into supplementary school programs that serve older children, for example.

Major changes in society and in organizations happen when everyone starts working on the same thing. Then things tip. This was true of the Brown decision itself. It finally happened when lawyers, social scientists, judges and educators all came together to make it happen. To get rid of test-score gaps, the same coming together is necessary. So in this year of commemorating Brown, let us remember the resolve that brought it about.

Claude M. Steele, the Lucie Stern Professor in the Social Sciences at Stanford University, is the author, with Asa Hilliard III and Theresa Perry, of Young, Gifted, and Black: Promoting High Achievement Among African-American Students (Beacon).
