Newsroom: The latest CSBA news, blog posts, publications, research and resources for members and the news media

Value added: Do new teacher evaluation methods make the grade?

By: Kristi Garrett
Published: June 21, 2011

We used to know what a good teacher was—someone who got you hooked on reading, someone who listened to your personal crises and threw you a lifeline that kept you in school; someone who actually knew how to solve for x and showed you how to do it again and again.

But in a world accustomed to analyzing just about everything from nutritional values to Web statistics, it’s not surprising that legislators would demand ever-greater accountability to make sure they’re getting a good return on their educational investment. In that context, measuring a teacher’s effectiveness in quantifiable ways is a logical next step in a society driven by the SMART goals (specific, measurable, attainable, relevant and timely objectives) that pervade modern management.

The idea of using student performance on standardized tests to judge a teacher’s effectiveness really picked up steam after the Obama administration’s Race to the Top contest required states to ramp up teacher evaluation. In a wave of legislative action last year, nearly a dozen states took steps to require teacher evaluation systems to consider evidence of students’ academic growth.

What is ‘value-added’?

All over the country, school leaders are looking at so-called value-added ways of measuring teacher performance. Most methods take a student’s score on standardized tests from one year and, adjusting for various personal and school characteristics, predict a score for the next year. The teacher is then rated by how well students meet those predictions.
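The predict-then-compare idea described above can be illustrated with a deliberately simplified sketch. It assumes a single straight-line prediction from the prior-year score and skips the adjustments for personal and school characteristics that real models include; the function names, teacher labels and scores are all hypothetical and are not drawn from any district's actual formula.

```python
from collections import defaultdict
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def value_added(records):
    """Average gap between actual and predicted scores, per teacher.

    records: list of (teacher, prior_year_score, current_year_score).
    Assumption: the prediction uses only the prior-year score.
    """
    priors = [r[1] for r in records]
    currents = [r[2] for r in records]
    intercept, slope = fit_line(priors, currents)

    gaps = defaultdict(list)
    for teacher, prior, current in records:
        predicted = intercept + slope * prior
        gaps[teacher].append(current - predicted)
    return {teacher: mean(values) for teacher, values in gaps.items()}


# Made-up scores for two hypothetical teachers.
records = [
    ("A", 300, 320), ("A", 350, 372), ("A", 400, 421),
    ("B", 300, 305), ("B", 350, 352), ("B", 400, 398),
]
scores = value_added(records)
```

In this toy data, teacher A's students finish above their predicted scores on average and teacher B's finish below, so A receives a positive rating and B a negative one. The ratings are relative to the group used to build the prediction, not to any absolute standard.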

Using student test scores to judge teachers is, of course, fraught with trouble. Many teachers don’t like the idea of being held accountable for the many out-of-school factors that impact their students’ performance—issues like hunger, homelessness and learning disabilities. No matter how well those factors are controlled for, such teachers argue, there’s no perfect way to distill a teacher’s worth from a student’s test score.

So what’s wrong with the way teachers are evaluated now?

Why evaluations need to change

Consensus is growing that traditional teacher evaluations serve little useful purpose.

Typically, teachers are evaluated by a principal or other designee who visits their classroom and watches them teach for up to an hour. The teacher is often uncomfortable being watched, the evaluator has no specific criteria to look for, and the only feedback most teachers get is a sign-off on an evaluation form. The chore is over for another year.

That serves no one well, say Stanford University researchers who identified several problems with current evaluation methods:

Most evaluations are conducted for compliance purposes and do not help the teacher learn how to be more effective.
The time principals have to observe teachers is extremely limited, and evaluations generally don’t delve into evidence that students are mastering the content.
The schedule for evaluations is based on local bargaining agreements and not on the needs of teachers.
Evaluators rarely use their observations to suggest professional development that would help the teacher.

This set of conditions, the researchers said in “A Quality Teacher in Every Classroom,” “further contributes to teachers’ views that evaluation is not about developing mastery of professional standards, but rather a routine designed to ensure that an administrator is performing his job.”

“The Widget Effect,” a report by the New Teacher Project, a national teacher-led organization focused on teacher recruitment and quality, found that performance was rarely used in decisions about hiring, pay, or tenure, yet it was always a factor in disciplinary actions. Overall, more than 90 percent of teachers got the highest ratings at evaluation time, even while excellence often went unrecognized and unrewarded. At the same time, novice teachers got little attention and professional development opportunities were few and far between.

Principles for better evaluations

Teachers unions generally agree with education policymakers about the principles of creating an effective teacher evaluation system. Proposals by the American Federation of Teachers, the California Teachers Association, the Center for Public Education, Stanford University’s National Board Resource Center, the Economic Policy Institute and others share similar objectives:

evaluations based on multiple measures
observations conducted regularly by trained evaluators
useful feedback linked to appropriate professional development
observations and assessments based on professional teaching standards with clear expectations
evaluations that consider all of the teacher’s contributions to student learning and outcomes
student learning, as revealed in test scores, as one factor

American Federation of Teachers President Randi Weingarten proposed a new system for overhauling teacher evaluation in an address to AFT’s Teacher Evaluation Conference last February.

“Way too often, teacher evaluations are superficial. They’re subjective,” she said then. “They miss a prime opportunity to improve teacher practice and, thereby, increase student learning. And that’s what it’s all about, isn’t it?”

She called for a comprehensive system to help teachers improve their practice and to evaluate them objectively.

“Such a system would help determine who is and who is not an effective teacher—something that neither drive-by nor test-score-driven evaluations do,” she said. “Our aim is to have a comprehensive, fair, transparent and expedient process that identifies, improves and—if necessary—removes ineffective teachers.”

The Economic Policy Institute, in a policy brief written by nearly a dozen noted education analysts including Linda Darling-Hammond, Diane Ravitch and Richard Rothstein, recommends that if student scores are used in an evaluation system, they should be just one of many measures, and that classroom observations should be based on professional teaching standards grounded in research on teaching and learning.

Those systems, EPI says, “use systematic observation protocols with well-developed, research-based criteria to examine teaching, including observations or videotapes of classroom practice, teacher interviews, and artifacts such as lesson plans, assignments, and samples of student work. Quite often, these approaches incorporate several ways of looking at student learning over time in relation to the teacher’s instruction.”

The National Comprehensive Center for Teacher Quality cautions that school leaders should be extremely careful when using value-added data in making high-stakes decisions about their teachers. Test scores and other statistics are not enough to determine the impact of specific teaching practices, the center says, and it’s difficult to separate the effect of one teacher from the impact of colleagues. The standardized tests used to provide the data were not designed for that purpose, and it’s tricky to create ironclad formulas and methodology for measuring teacher quality.

Existing models a good start

Several models already exist that could be helpful in developing a new teacher evaluation system: the California Standards for the Teaching Profession, the Performance Assessment for California Teachers, the Beginning Teacher Support and Assessment Program for those in their first few years, and National Board Certification for more experienced teachers.

“Structured performance assessments of teachers like those offered by the National Board for Professional Teaching Standards and the beginning teacher assessment systems in Connecticut and California have also been found to predict teacher’s effectiveness on value-added measures and to support teacher learning,” according to another EPI brief, “Problems with the Use of Student Test Scores to Evaluate Teachers.”

This summer, CSBA’s Task Force on Teacher and Administrator Evaluation began to study existing models of teacher and administrator evaluation and will recommend ways governance teams can help support more meaningful evaluations.

(See “Do Pay Incentives Have Merit?” in California Schools, Summer 2006.)

CSBA’s Delegate Assembly also supports changing the state’s education policy to focus on teacher effectiveness as well as qualifications, and it backs efforts to use performance data in making placement and layoff decisions.

The Center for Public Education—an arm of the National School Boards Association—suggests several policy, technical and design questions for board members to ask when implementing a teacher evaluation system that includes the use of value-added data. (See “Building a Better Evaluation System” at www.centerforpubliceducation.org.)

California district ‘TAPs’ into national model

The Lucia Mar Unified School District, on California’s Central Coast, is the first in the state to implement a 10-year-old model called the System for Teacher and Student Advancement, also known as TAP.

A few years ago, Superintendent Jim Hogeboom was reading about performance-based compensation, then all the rage, and learned about TAP, which was developed by the Milken Family Foundation, known for recognizing outstanding teachers with cash awards.

“The more research I did into it, the more I got excited about not just the performance-based pay, but that they had a whole system that all tied together,” he says.

Evaluation is just one portion of the TAP system, which focuses on four areas: multiple career paths, ongoing professional development, classroom observation and feedback based on clear standards, and performance-based compensation.

“All the research shows that the single most important variable to affect student learning is the teacher in the classroom. Everyone knows that,” says Hogeboom. “What we haven’t done a good job of is supporting our teachers. How do we help our teachers get the feedback and coaching to be the best they can be? … Most teacher evaluation systems are a waste of time.”

Kristan Van Hook is senior vice president of public policy and development at the National Institute for Excellence in Teaching, which now runs TAP. The Milken Educator Awards offer recognition to top teachers, but the foundation started to see the need for a more comprehensive approach.

“What is it about a job that makes it a desirable job? The opportunity to interact with peers, the opportunity to grow professionally, the opportunity to be recognized for excellence,” she says. “TAP really grew out of that desire.”

Because the system is so all-encompassing, TAP requires three-quarters of a school’s teachers to approve it. Votes of 90 percent in favor are common—but not before teachers see the system in action.

What really helped convince his teachers, says Hogeboom, was visiting TAP schools in Arizona, Louisiana and Texas. The district used a planning grant to send 80 teachers and administrators to experience TAP for themselves.

“When they had a chance to sit in on those cluster meetings being led by master teachers,” he says, “and the teachers were saying, ‘Finally, we’re getting some support. Finally we’re getting weekly coaching’—that’s what sold TAP.”

Four to six times a year, TAP teachers are observed by trained evaluators from their school’s leadership team, which consists of the principal and mentor and master teachers. The evaluators score the teacher’s lessons on a scale of 1 to 5 on a comprehensive rubric with 19 elements that have proven to be good indicators of effective teaching. Evaluators get eight days of training on the rubric and how to implement it. A score of 3 represents good, solid teaching, with 5 being exemplary, producing student growth that’s well above the district average.

The evaluator follows up the classroom visit in a conference with the teacher to share specific commendations and suggestions for improvement.

The value-added teacher evaluation component of TAP counts the classroom observations and the teacher’s skills and responsibilities as 50 percent of the performance score, with the students’ achievement at 30 percent and schoolwide achievement growth at 20 percent. For teachers whose subjects do not produce student test scores, schoolwide growth counts for 50 percent of the rating.
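The weighting described above amounts to a simple weighted average, sketched here for illustration. The function name is hypothetical, and two assumptions go beyond the article: that all three components are expressed on the same scale, and that for teachers without student test scores the other 50 percent comes from the observation component.

```python
def composite_score(observation, school_growth, student_growth=None):
    """Weighted performance score using the percentages cited for TAP.

    Assumption: every component is on a common scale (for example 1 to 5).
    Assumption: without student test scores, observation supplies the
    half of the rating not covered by schoolwide growth.
    """
    if student_growth is None:
        # Teachers in untested subjects: 50% observation, 50% schoolwide.
        return 0.5 * observation + 0.5 * school_growth
    # Teachers in tested subjects: 50% / 30% / 20%.
    return 0.5 * observation + 0.3 * student_growth + 0.2 * school_growth


# Hypothetical component scores.
tested = composite_score(observation=4.0, school_growth=3.0, student_growth=5.0)
untested = composite_score(observation=4.0, school_growth=3.0)
```

With these made-up inputs, the teacher in a tested subject scores about 4.1 and the teacher in an untested subject scores 3.5, which shows how much the result can hinge on whether individual student data exist.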

Because the evaluators are trained to be objective and follow a detailed and specific rubric, the score distribution across a school is a statistically reasonable bell curve. That contrasts with typical evaluation results wherein most teachers get the highest rating unless there are significant concerns, according to NIET.

The Lucia Mar district won a federal Teacher Incentive Fund grant of $7.2 million that will be used to support the master teachers and provide a fund for performance bonuses and stipends at six sites. A separate foundation grant will support the program at a seventh school. The grants also fund weekly “cluster” meetings led by the master and mentor teachers.

In several years, after the grant funding runs out, Hogeboom says, the district will need to negotiate a new contract with its teachers union that allows for performance-based compensation instead of (or perhaps as an optional alternative to) the standard step-and-column structure. He hopes teachers will be the ones to advocate for performance pay, since typically 75 to 80 percent of TAP teachers get a bonus of some kind.

“If people feel comfortable that the criteria that you’re using to give teachers the bonuses is meaningful and worthwhile,” he says, “then I think they’re more willing to venture into performance-based compensation.”

That said, Hogeboom learned that bonuses did not drive teachers to try harder at the TAP schools they visited, a finding confirmed by TAP officials nationwide.

“That doesn’t mean teachers don’t want to be well compensated for doing a very difficult job,” Hogeboom says. “It just means that that’s not their primary motivation.”

Challenges of implementation

Because it’s so extensive, TAP is a pretty challenging reform, NIET’s Van Hook admits. “It does align many different pieces which need to be aligned, but it’s also challenging. Your compensation, your evaluation system, your professional development, your career ladder—there’s a lot of different moving pieces.”

Van Hook estimates the cost of implementing TAP at between $250 and $400 a student, depending on which resources the district can repurpose, such as creating the master teacher position.

Having the board’s support is vital, says Lucia Mar’s superintendent, so he sent three board members with the staff as they made site visits.

“They need to see it in action too,” Hogeboom says. “Our board has been probably the biggest fan of TAP from Day 1, and that’s been very helpful.”

Los Angeles USD and value-added

The governance team at the Los Angeles Unified School District put together a task force in the spring of 2009 to look into better ways to identify and support excellent teaching.

“The task force and our school board universally agreed that the current evaluation system was not accomplishing what we wanted it to,” says Noah Bookman, the program and policy adviser for the district’s Supporting All Employees Program. “Our current system did not help us identify those who were truly stellar performers … We wanted a system in which all teachers, all leaders, all the time, were focused on learning and developing their practice.”

To guide the work, the board issued a set of core principles calling for evaluations to include multiple measures, do a better job of differentiating teaching quality, seek evidence that students are learning, and provide useful feedback to teachers and administrators. The new ratings will also inform decisions about tenure, hiring and assignment, and will help parents make informed decisions about their child’s education.

For the value-added portion, the district ultimately adopted a model called Academic Growth over Time. Derived from LAUSD’s work with the University of Wisconsin’s Value-Added Research Center, AGT takes a multiple-measure, multiple-perspective approach, with an objective rubric for determining a teacher’s contribution to student learning.

“They are the go-to [people] for large systems like ours,” Bookman says.

Besides observing teachers regularly and considering student test scores, the new evaluations will rely on parent and student feedback and the teacher’s contributions to the school community. The results of the evaluation will help determine what types of support or training the teacher needs, as well as what recognition and bonuses might be appropriate.

AGT designers haven’t yet decided how much weight to assign to test scores, Bookman said. “Our belief is it would be inappropriate to judge a teacher’s effectiveness using only student outcomes data, but it would also be inappropriate not to use student outcomes data in the process. We sought to develop a model that, based on a lot of different information, helps us see how we’re doing as school teams and as individuals at taking students from Point A to Point B.”

LAUSD’s actual value-added formula is a perplexing string of doctorate-level math that attempts to control student test scores for race, ethnicity, mobility, English proficiency and special education status, among other factors, to determine the contribution a teacher made to each student’s achievement. Ideally, the analysis will identify groups of students who are progressing more slowly than the district average, allowing educators to identify less effective teaching practices and intervene so students can catch up.

“We envision every teacher participating in an ongoing development cycle that starts with the performance review process with multiple measures,” Bookman says, which “then involves the development of a personalized, individualized growth plan.”

The AGT model also provides a way to keep top teachers energized about their careers and provides a career path that does not require them to leave the classroom to be an administrator. Teachers with a track record of effective teaching can aspire to be model and mentor teachers who regularly coach and support their peers.

After piloting the new evaluation system this fall, the district would like to implement AGT in all 800 schools in 2012. In April the district began releasing the first results from the evaluation phase of the new process as part of its school report cards.

Next steps

California has removed obstacles to using test data in evaluations but has yet to pass legislation requiring schools to implement some form of value-added teacher evaluation. The State Board of Education is studying the issue.

Ultimately, the biggest challenge is getting teachers to trust that a district’s governance team will implement a new evaluation system with fidelity, says Lucia Mar Superintendent Hogeboom.

“And if it includes performance-based compensation, the trust has to be even higher,” he says. “I think that’s the biggest hurdle—teachers trusting that the system is going to work.”

Kristi Garrett (kgarrett@csba.org) is a staff writer for California Schools.
