Last month, economists at Harvard and Columbia released the largest-ever study of teachers’ “value-added” ratings, a controversial mathematical technique that measures a teacher's effectiveness by looking at the change in his students' standardized test scores from one year to the next, while controlling for student demographic traits such as poverty and race.
Raj Chetty, John Friedman, and Jonah Rockoff analyzed the test scores and family tax returns of 2.5 million Americans over a 20-year period, from 1989 to 2009. The team concluded that students who have teachers with high value-added ratings are more likely to attend college and earn higher incomes, and are less likely to become pregnant teens.
In a rare instance of edu-wonk consensus, both friends and critics of standardized tests are praising the study as reliable and groundbreaking. Indeed, these findings raise several interesting questions about how to evaluate and pay teachers, one of the most controversial topics in American urban politics. In his annual state-of-the-city speech last Wednesday, New York Mayor Mike Bloomberg cited the new research as he promised annual bonuses of up to $20,000 for teachers rated “highly-effective,” based partially on value-added measures and partially on principals’ judgments. In a move that befuddled many casual observers of the education debate, the New York City teachers’ union, the United Federation of Teachers, immediately opposed the proposal.
If we now know teacher effectiveness has a real, measurable impact on both student academic achievement and life outcomes like teen pregnancy, why aren’t teachers’ unions supporting plans to pay teachers with high value-added ratings more money? Pundits like Nick Kristof and the Daily News editorial page have jumped in to claim the new research justifies merit pay plans like Bloomberg’s and the one instituted by former chancellor Michelle Rhee in Washington, D.C.
The policy implications of the Chetty, Friedman, and Rockoff paper are, however, far from clear. As the researchers note in their conclusion, their study was conducted in a low-stakes setting, one in which student test scores were used neither to evaluate nor pay teachers. In a little-noted footnote (#64) on page 50, the economists write:
even in the low-stakes regime we study, some teachers in the upper tail of the VA [value-added] distribution have test score impacts consistent with test manipulation. If such behavior becomes more prevalent when VA is actually used to evaluate teachers, the predictive content of VA as a measure of true teacher quality could be compromised.
The importance of this caveat cannot be overstated. As I’ve written in the past, there is evidence of increased teaching-to-the-test, curriculum-narrowing, and outright cheating nationwide since the implementation of No Child Left Behind, which put an unprecedented focus on the test scores of disadvantaged children.
Despite these concerns about testing, the United Federation of Teachers has agreed in principle to a new evaluation system that depends in part on value-added; a similar system, after all, is already in place for determining whether teachers earn tenure. Negotiations between the union and the city are stalled not because, in the words of the Daily News, the union has “placed protecting the jobs of incompetents over the future financial well-being of children,” but because the union would like teachers who receive an “unsatisfactory” rating under the new system to have the right to file an appeal to a neutral arbitrator. Currently, the city Department of Education determines whether to hear appeals of teacher evaluations, and it rejects 99.5 percent of the appeals filed.
Given the widespread, non-ideological worries about the reliability of standardized test scores when they are used in high-stakes ways, it makes good sense for reform-minded teachers’ unions to embrace value-added as one measure of teacher effectiveness, while simultaneously pushing for teachers’ rights to a fair-minded appeals process. What’s more, just because we know that teachers with high value-added ratings are better for children, it doesn’t necessarily follow that we should pay such teachers more for good evaluation scores alone. Why not use value-added to help identify the most effective teachers, but then require these professionals to mentor their peers in order to earn higher pay? That’s the sort of teacher “career ladder” that has been so successful in high-performing nations like South Korea and Finland, and that would guarantee that excellent teachers aren’t just reaching 25 students per year, but are truly sharing their expertise in a way that transforms entire schools and districts.