A recent pattern in Oklahoma education policy has been major education reforms passed in earlier years becoming highly controversial just as they are about to go into effect. A strong pushback from parents and educators has led to the rollback or modification of numerous reforms, from Common Core Standards to 3rd grade retention, A-F school grades, and end-of-instruction exams.

Another way to put it is that many of yesterday’s solutions have become today’s problems. Now another major reform is scheduled to be implemented next year, but lawmakers are working to head it off before this solution turns into the next problem.

Beginning in the 2015-2016 school year, current law says that all evaluations of teachers and administrators in Oklahoma public schools will be conducted using a “Teacher Leader Effectiveness” (TLE) model. Under the TLE, 50 percent of the evaluation score will be based on qualitative measures, such as classroom observations, and the other 50 percent will be based on student test scores. The qualitative portion of TLE is already being used, but next year would be the first time that test scores become part of the evaluation.

However, two bills working their way through the Legislature (HB 1290 and SB 706) would delay the quantitative evaluations until 2018-2019 and instruct the State Board of Education to continue studying how to implement this system. Former state Superintendent Janet Barresi had called for a two-year delay of TLE, and the State Board of Education under current Superintendent Joy Hofmeister has also recommended delay.

## The “value-added” model for assessing teachers

Under current law, the quantitative portion of TLE would primarily use a “value-added” model (VAM) to evaluate educators (VAM would be 35 percent of an educator’s total score, with another 15 percent from other quantitative achievement data and 50 percent from qualitative assessments). The value-added model tries to account for students’ different starting points and different challenges at home. A value-added score looks at the test score gain a student achieves over the course of the year and compares it to the gain by “peer” students — those with similar backgrounds and similar scores on previous tests.

Mathematica Policy Research, a private firm that has been contracted by the state to develop Oklahoma’s value-added assessments, describes it like this:

> The basic approach of value-added models is to compare two test score averages for each teacher: (1) the average actual scores that the students obtained with the teacher and (2) the average estimated scores that the same students would have obtained with an average teacher. The difference in these two average scores—how the students actually performed with a teacher versus how they would have performed with the average Oklahoma teacher—represents a teacher’s value added to student achievement.
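In rough outline, the comparison Mathematica describes can be sketched in a few lines of code. This is a hypothetical illustration only — the function name, data, and numbers are invented, and the state’s actual model involves many more adjustments (some of which are described below):

```python
def value_added(actual_scores, predicted_scores):
    """Basic value-added comparison: the average of the students' actual
    scores minus the average score the same students were predicted to
    earn with an 'average' teacher."""
    if len(actual_scores) != len(predicted_scores) or not actual_scores:
        raise ValueError("need exactly one prediction per student")
    avg_actual = sum(actual_scores) / len(actual_scores)
    avg_predicted = sum(predicted_scores) / len(predicted_scores)
    return avg_actual - avg_predicted

# One teacher's (invented) class: students scored slightly above what a
# peer-based model predicted for them, so the teacher's score is positive.
actual = [72, 85, 64, 90, 78]
predicted = [70, 82, 66, 88, 74]
print(round(value_added(actual, predicted), 1))  # 1.8
```

A positive score means the class outperformed its predictions; a negative score means it underperformed them. Everything difficult about VAM lives in producing the `predicted` numbers, not in this final subtraction.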

[pullquote]Teachers with smaller classes aren’t “more average” than other teachers, but the value-added score will make them that way just because they are harder to statistically quantify.[/pullquote]The premise sounds simple, but in practice it becomes far more complicated. For example, students who spend time with multiple teachers would have their contribution to each teacher’s score weighted by the proportion of the time the student spent with each teacher. Because a single pre-test may not fairly measure a student’s ability, the value-added score is adjusted based on test reliability data provided by the test developers. Because tests are not designed to be compared across school grades, student scores are translated into a common metric via a complicated, multi-step process. Because statistical significance is hard to achieve when teachers work with just a few students, a technique called “shrinkage” is used to pull the scores of teachers with fewer students toward the overall average.

In other words, each value-added score passes through a jungle of complex adjustments and statistical techniques before it shows up on a teacher’s assessment. Teachers with smaller classes aren’t “more average” than other teachers, but the value-added score will make them that way just because they are harder to statistically quantify.
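The shrinkage adjustment mentioned above is a good example of how these statistical techniques play out. The sketch below uses a simple empirical-Bayes-style weighting; the constant `k` and all the data are invented for illustration, and the state’s actual formula differs:

```python
def shrink(raw_score, n_students, grand_mean=0.0, k=20):
    """Pull a teacher's raw value-added score toward the overall average.

    The weight on the raw score grows with class size, so teachers with
    few students are pulled hardest toward the mean (here, 0.0)."""
    weight = n_students / (n_students + k)
    return weight * raw_score + (1 - weight) * grand_mean

# Two teachers with the same raw score of +4.0 but different class sizes:
print(round(shrink(4.0, n_students=100), 2))  # 3.33 -- barely shrunk
print(round(shrink(4.0, n_students=5), 2))    # 0.8  -- mostly erased
```

This is the mechanism behind the pullquote: two teachers with identical raw performance end up with very different official scores purely because of how many students they happen to teach.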

## What value-added leaves out

Other factors that may contribute to student achievement are even harder to account for in a value-added score, such as whether the classroom has air-conditioning and up-to-date textbooks, the availability of specialists like counselors, tutors, and social workers within a school, the effectiveness of principal leadership, or the fact that teachers may be intentionally paired with hard-to-teach students. Mathematica proposes to deal with this problem by ignoring it, asserting that “experimental and quasi-experimental empirical studies suggest that these factors do not play a large role in determining teacher value added.”

However, other studies suggest that these factors can make a big difference. Value-added scores have also been shown to be very unstable from year to year and between different classrooms of the same teacher. Since it’s unlikely that a teacher’s skills would vary so widely between two years or two classes in the same year, the evidence points to outside influences distorting the value-added measure.

Fundamentally, it is very difficult or impossible to boil down into a single number a phenomenon as varied and complex as quality teaching. As Oklahoma already struggles with a dire teacher shortage, we can’t afford to reduce teachers’ morale even more by evaluating their work using an arbitrary and unreliable formula. Test scores may be one piece of evidence used to evaluate educators, but over-reliance on the formula could drive even some of our best teachers away from the profession.
