A recent pattern in Oklahoma education policy has been major education reforms passed in earlier years becoming highly controversial just as they are about to go into effect. A strong pushback from parents and educators has led to the rollback or modification of numerous reforms, from Common Core Standards to 3rd grade retention, A-F school grades, and end-of-instruction exams.
Another way to put it is that many of yesterday’s solutions have become today’s problems. Now another major reform is scheduled to be implemented next year, but lawmakers are working to head it off before this solution turns into the next problem.
Beginning in the 2015-2016 school year, current law says that all evaluations of teachers and administrators in Oklahoma public schools will be conducted using a “Teacher Leader Effectiveness” (TLE) model. Under the TLE, 50 percent of the evaluation score will be based on qualitative measures, such as classroom observations, and the other 50 percent will be based on student test scores. The qualitative portion of TLE is already being used, but next year would be the first time that test scores become part of the evaluation.
However, two bills working their way through the Legislature (HB 1290 and SB 706) would delay the quantitative evaluations until 2018-2019 and instruct the State Board of Education to continue studying how to implement this system. Former state Superintendent Janet Barresi had called for a two-year delay of TLE, and the State Board of Education under current Superintendent Joy Hofmeister has also recommended delay.
The “value-added” model for assessing teachers
Under current law, the quantitative portion of TLE would primarily use a “value-added” model (VAM) to evaluate educators (VAM would be 35 percent of an educator’s total score, with another 15 percent from other quantitative achievement data and 50 percent from qualitative assessments). The value-added model tries to account for students’ different starting points and different challenges at home. A value-added score looks at the test score gain a student achieves over the course of the year and compares it to the gain by “peer” students — those with similar backgrounds and similar scores on previous tests.
Mathematica Policy Research, a private firm that has been contracted by the state to develop Oklahoma’s value-added assessments, describes it like this:
The basic approach of value-added models is to compare two test score averages for each teacher: (1) the average actual scores that the students obtained with the teacher and (2) the average estimated scores that the same students would have obtained with an average teacher. The difference in these two average scores—how the students actually performed with a teacher versus how they would have performed with the average Oklahoma teacher—represents a teacher’s value added to student achievement.
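The core comparison Mathematica describes can be reduced to a simple difference of averages. The sketch below is purely illustrative: the scores are invented, and in a real model the “predicted with an average teacher” values would come from a statistical estimate based on prior test scores and peer comparisons, not from a hand-entered list.

```python
# Hypothetical illustration of the basic value-added comparison.
# All numbers are invented for the example.

actual_scores = [72, 65, 81, 58]     # students' actual end-of-year scores
predicted_scores = [70, 68, 76, 60]  # estimated scores with an "average" teacher

avg_actual = sum(actual_scores) / len(actual_scores)        # 69.0
avg_predicted = sum(predicted_scores) / len(predicted_scores)  # 68.5

# The teacher's "value added" is the gap between the two averages.
value_added = avg_actual - avg_predicted
print(round(value_added, 2))  # prints 0.5
```

The simplicity here is exactly the point of the next paragraph: everything difficult lives inside producing those predicted scores.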
[pullquote]Teachers with smaller classes aren’t “more average” than other teachers, but the value-added score will make them that way just because they are harder to statistically quantify.[/pullquote]The premise sounds simple, but in practice it becomes far more complicated. For example, students who spend time with multiple teachers would have their contribution to each teacher’s score weighted by the proportion of the time the student spent with each teacher. Because a single pre-test may not fairly measure a student’s ability, the value-added score is adjusted based on test reliability data provided by the test developers. Because tests are not designed to be compared across school grades, student scores are translated into a common metric via a complicated, multi-step process. Because statistical significance is hard to achieve when teachers work with just a few students, a technique called “shrinkage” is used to pull the scores of teachers with fewer students further toward the overall average.
In other words, each value-added score passes through a jungle of complex adjustments and statistical techniques before it shows up on a teacher’s assessment. Teachers with smaller classes aren’t “more average” than other teachers, but the value-added score will make them that way just because they are harder to statistically quantify.
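To make the “more average” point concrete, here is a hypothetical sketch of how shrinkage works. The weighting formula `n / (n + k)` is one common simple form of shrinkage toward a mean, not necessarily the formula Mathematica uses, and the stabilizing constant `k` and all scores below are invented for illustration.

```python
# Hypothetical sketch of "shrinkage": a teacher's raw value-added score is
# pulled toward the statewide average, and the pull is stronger when the
# score rests on fewer students. The constant K and all numbers are invented.

STATE_AVERAGE = 0.0  # value-added scores are centered on the average teacher
K = 30               # assumed stabilizing constant (larger = more shrinkage)

def shrink(raw_score, n_students, k=K):
    """Blend a raw score with the state average, weighted by class size."""
    weight = n_students / (n_students + k)
    return weight * raw_score + (1 - weight) * STATE_AVERAGE

# Two teachers with the identical raw score of 5.0, different class sizes:
print(round(shrink(5.0, 10), 2))   # small class:  prints 1.25
print(round(shrink(5.0, 150), 2))  # large class:  prints 4.17
```

Both teachers produced the same raw result, but the small-class teacher’s score is dragged most of the way to zero — made “more average” purely because fewer students were available to measure.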
What value-added leaves out
Other factors that may contribute to student achievement are even harder to account for in a value-added score, such as whether the classroom has air conditioning and up-to-date textbooks, the availability of specialists like counselors, tutors, and social workers within a school, the effectiveness of principal leadership, or the fact that teachers may be intentionally paired with hard-to-teach students. Mathematica proposes to deal with this problem by ignoring it, arguing that “experimental and quasi-experimental empirical studies suggest that these factors do not play a large role in determining teacher value added.”
However, other studies suggest that these factors can make a big difference. Value-added scores have also been shown to be highly unstable from year to year and between different classrooms taught by the same teacher. Since it’s unlikely that a teacher’s skills would vary so widely between two years or two classes in the same year, the evidence points to outside influences distorting the value-added measure.
Fundamentally, it is very difficult or impossible to boil down into a single number a phenomenon as varied and complex as quality teaching. As Oklahoma already struggles with a dire teacher shortage, we can’t afford to reduce teachers’ morale even more by evaluating their work using an arbitrary and unreliable formula. Test scores may be one piece of evidence used to evaluate educators, but over-reliance on the formula could drive even some of our best teachers away from the profession.