社区黑料


What Education Can Learn from Major League Baseball

Hartney: Baseball has come up with an objective way to hold umpires accountable. Why can’t education leaders do the same for teachers?

Eamonn Fitzmaurice/社区黑料


America’s pastime is back, but with a notable change. This season, Major League Baseball has introduced an Automated Ball-Strike Challenge System (ABS) that allows players to challenge umpire decisions in real time and overturn clearly wrong calls.

The change has been highly controversial. Purists see umpiring as more art than science and worry that technology strips the game of its human element, holding umpires to near-impossible standards. Supporters counter that ABS encourages data-driven decisions that improve fairness and accountability.

If this debate sounds familiar to folks in K-12 education, it should.

More than a decade ago, policymakers tried to fix a broken teacher evaluation system that failed to distinguish between high and low performers and rarely used measures of actual effectiveness in deploying, rewarding and retaining teacher talent. Most controversially, reformers embraced a new technology known as “value-added” measures of teacher effectiveness, which were ultimately abandoned for reasons ranging from political resistance to problems with how they were used.

In many ways, the ABS system for grading umpires offers a useful lens for revisiting what teacher evaluation reform got right, where it went wrong, and what reformers and critics missed. 

Five lessons stand out.

First, start with where value-added provided useful information, and where it did not. These measures were never equally informative for all teachers. Instead, they were strongest at the extremes, where the signal is largest. They were also available only for a subset of teachers, since only some subjects and grade levels are tested. As Cory Koedel has noted, value-added measures are most useful for identifying the highest- and lowest-performing teachers, while distinguishing among those in the middle is much more difficult.

Baseball has built an evaluation system grounded in that insight. ABS is not trying to get every call right or perfectly rank umpires from first to worst. Rather, it is designed to catch the most obvious mistakes, identifying consistently poor umpiring. 

Consider one embattled umpire whose record shows he misses calls, including ones where the stakes are simply too high to ignore. That is what ABS is built to detect, and the same logic applies to teacher evaluation. Even if value-added measures could not perfectly rank teachers, they could identify clear cases where students are being shortchanged by consistently ineffective instructors. Reformers sometimes pushed these measures too far, but critics were too quick to dismiss information that, in some contexts, clearly meant something.

A second lesson comes from how unions responded to the reform moment. There are only two basic ways to evaluate performance: subjective judgment or objective measures. Value-added was an imperfect attempt to introduce more objective information into a system long dominated by subjective evaluation. Rejecting it without offering a meaningful alternative with real consequences for poor performance was not a defense of good evaluation. It was a rejection of evaluation altogether.

That shift is especially striking when contrasted with how unions approached the issue in baseball. MLB umpires are unionized, and their association did not block ABS outright. Instead, the union agreed to a hybrid system that preserves their role while using the new data to ensure umpiring excellence.

There was a time when teacher-union leaders spoke in similar terms. As one union leader once put it, unions should be willing to “identify excellence and not simply be concerned with protecting jobs and defending due process.” But this mindset often failed to take root. After the National Education Association briefly showed some openness to evaluation reform in 2011, it reversed course, holding that “standardized tests, even if deemed valid and reliable, may not be used to support any employment action against a teacher.” Then-NEA President Lily Eskelsen García went so far as to describe value-added as “the mark of the devil.”

Some local unions have embraced peer review as an alternative: having teachers evaluate other teachers. In principle, this makes sense. A study of Cincinnati’s evaluation system, which relied heavily on peer observation, found that teachers became more effective after being evaluated. But that is a different question from whether such systems meaningfully differentiate performance or remove persistently low performers at scale. There, the evidence is less clear, and evaluation systems designed to support improvement are not always well suited to making high-stakes personnel decisions.

Third, measurement matters only if it carries consequences. In baseball, it does. Umpires are graded using performance data, and those evaluations directly affect playoff game assignments and bonuses. In other words, measurement is not symbolic; it is tied directly to outcomes. In education, that link is often missing. Many districts still rely on last-in, first-out rules that remove newer teachers first, regardless of their classroom effectiveness. Here again, baseball offers a clear illustration of why that misses the mark. When analysts compared retiring veteran umpires with the younger group replacing them, the younger cohort was significantly more accurate.

Fourth, these debates remind us just how far actors with a vested interest will go to defend the status quo when reform threatens their careers. Consider the suggestion from veteran pitcher Walker Buehler that experienced players should receive a more generous strike zone. The logic is familiar. In education, similar arguments are made to insulate veteran teachers from performance-based decisions. In both cases, the argument is less about getting it right than about protecting incumbents.

Finally, this is about getting it right for the people the system is meant to serve. In baseball, the league operates under what is essentially a fiduciary obligation to the integrity of the game. That is, it has a duty to act in the best interests of the game itself, which includes striving for fairness and accuracy in how the game is played. As one recent analysis put it, MLB’s governing rules and “best interests of baseball” authority create a fiduciary-like duty to minimize preventable errors that could undermine trust in outcomes. That obligation has real implications. It does not disappear because new technology is uncomfortable for employees or changes how the job is done. If better tools can reduce clear errors, ignoring them risks undermining the game itself.

Education rarely operates this way. Student learning is often treated as one goal among many, balanced against competing priorities. Courts have even declined to recognize a basic right to effective teaching, as in one 2014 ruling. But if schools exist for anything, they exist to educate students. That should be the north star. Evaluation systems are not about satisfying adults. They are about ensuring that students are not consistently shortchanged.

Baseball is showing that there is a better way. Imperfect tools can still improve decision-making when they are used where they are strongest. The question is not whether a measure captures everything. It is whether it helps us avoid the most obvious mistakes.
