What’s the best way to evaluate the performance of public school teachers? The answer just got more complicated — and for good reason.
Many factors — curriculum, leadership, class size — matter in schools, but high-quality instruction is critical to children’s learning, especially in high-poverty communities. In recognition of this, and to finally reform a schooling system resistant to change, the state Legislature and governor developed a statewide teacher evaluation system to hold educators accountable for their performance.
Though the details have changed substantially from year to year, there have been two animating principles: an effective teacher contributes to her students’ measured academic performance, and an effective teacher demonstrates professional practices in her classroom. And if a teacher is not deemed effective, she should swiftly be provided support to get better — and, if that intervention fails, be terminated.
But an Albany Supreme Court justice has just made it far harder for New York to use student test scores to assess teacher performance using mathematical formulas.
Last week, Acting Justice Roger McDonough ruled that the labeling of Long Island teacher Sheri Lederman as ineffective, based on the state’s 2013-14 statistical growth model, was “arbitrary and capricious” — and hence her rating was vacated and set aside.
This is thought to be the first time a judge has invalidated a teacher’s data-based rating, and it could have massive ramifications for the future.
What happened? Lederman, a much-admired teacher in Great Neck, got a state growth score of 14 out of 20 in the 2012-13 school year, during which 68% of her students met or exceeded state standards in both English Language Arts and mathematics.
The following year, she received a 1 out of 20, even though 61% of her students met or exceeded the English Language Arts standards, and 72% met the math standards.
(Disclosure: I wrote two affidavits for the case.)
In his ruling, McDonough cited evidence that the statistical method unfairly penalizes teachers with either very high-performing students or very low-performing students. He found that Lederman’s small class size made the growth model less reliable.
He found that, on the current tests, high-performing students are unable to show the same growth as lower-performing students.
He was troubled by the state’s inability to explain the wide swing in Lederman’s score from year to year, even though her students performed at similar levels.
He was perplexed that the growth model rules define a fixed percentage of teachers as ineffective each year, regardless of whether student performance across the state rose or fell.
This wasn’t a final defeat for test score-based evaluations. Though he set aside Lederman’s growth score and rating, McDonough declined to issue a permanent injunction against the evaluation system. He reasoned that the old regulations had been superseded by a four-year moratorium on the use of state assessments in the growth model.
But the arguments that persuaded him in Lederman’s case will apply to a great many teachers in New York, and perhaps in other states and districts around the country. If a judge who listens carefully to both sides and takes the time to reflect before issuing a decision concludes that the state’s growth model of evaluation is arbitrary and capricious, is it any wonder that teachers, parents and citizens across the state do not view it as legitimate?
Reform champions should step back and acknowledge that, for the time being, we’re much better off using these systems for understanding teachers’ classroom practices, and how to improve them, than for firing those branded subpar.
Pallas is a professor of sociology and education at Teachers College, Columbia University.