It looks like my query went out far and wide. I have received responses from all over the country about teacher evaluations, particularly as they pertain to the Washington State House and Senate bills that would base teacher evaluations on test scores.

For information regarding these bills, please see “Washington State Bill Proposals HB 2427/SB 6203 and HB 2451: Teacher evaluations based on test scores. Parents around the country respond with their own experiences” and “Part 2: Washington State Bill Proposals HB 2427/SB 6203 and HB 2451: Teacher evaluations based on test scores: The studies.”

One of the responses that I received was from the Washington Education Association. They provided me with a link to Washington’s Teacher/Principal Evaluation Pilot program, a PowerPoint of the program, and a short paper on the subject, A System of Evaluation. Their response regarding teacher evaluations based on test scores follows.

Is Connecting State-wide Student Test Scores with Teacher Evaluation an Educationally Sound Solution?

Linking teacher evaluation with student achievement is a controversial idea in educational circles. Part of the controversy arises when we ask whether a teacher’s job security should be partially determined by how well their students learn. Teachers recognize that their primary responsibility is to help students learn. As a result, it’s not unrealistic to expect that a teacher should be able to demonstrate that students in their classroom have grown in both knowledge and skills. Let’s be clear: teachers are not afraid of linking student achievement with part of the evaluation process. They will, however, become incensed if the achievement data is unreliable and not valid for the purposes of evaluation.

When most people think about this idea, they default to state-wide test scores for this purpose. It’s not hard to understand why people tend to go in this direction when you have President Obama, Secretary of Education Arne Duncan, and a multitude of agencies claiming that this is the solution that will bring about improved teacher quality in our schools. This wave of trendy conversation, fashionable opinion, and political posturing has taken hold across the nation. Tethering state-level assessment scores to teacher evaluation is now proposed by policy makers, administrators, and other stakeholders on a daily basis. It appears that the main reason for moving in this direction is not to develop an evaluation system that promotes improved instructional practice but rather to get rid of teachers.

We cannot deny that there exists a misinformed chorus somewhere chanting “identify the weak and throw them to the lions.” But we have to speak up and point out that this makes no more sense than setting the grading curve to ensure that the bottom 10% always fail. This doesn’t aid in teaching or learning; it merely builds a cannibalistic feeding frenzy for those perceived to always be near the bottom of the bell-shaped curve.

The primary point of a good evaluation system is to develop human growth and capacity and to motivate staff to improve the skills, knowledge, and craft of good teaching and, in turn, student learning. Hiring and firing are only a very small part of the whole aim of evaluation in the realm of developing human capacity. We are focusing 90% of our energy on 5% or less of the teaching staff. Shouldn’t we develop an evaluation system that actually focuses on improving teacher quality? Unfortunately, the initiative of connecting student test scores with teacher evaluation is being touted without reflective thought or a deep understanding of its inherent flaws.

We need to take a step back and ask ourselves if connecting state-wide assessment data with teacher evaluation in Washington State is an educationally sound solution. We also need to ask if making this connection is even logistically possible given the current assessment system.

The answer to both questions is a resounding no, and here’s why:

Using state-wide test scores is not an educationally sound solution

  • Reliable and Valid Data: The Washington State Institute for Public Policy (an independent research center) conducted a study of Washington’s state assessment system and concluded that the strand-level data was so unreliable that it shouldn’t be used even for large-scale, building-level program decisions. If test data shouldn’t be used for program-level decisions, it shouldn’t be used for individual teacher evaluation.
  • Test Scores Show Patterns Over Time: While three to four or more years of test scores might reliably show student growth or lost educational ground against a particular standard, a single year of scores, which is often suggested as the yardstick in teacher evaluations, is not a reliable measure of teacher effectiveness.
  • Calibration for Growth: For three to four years of test scores to be compared, the assessment itself needs to be calibrated so educators can compare one year to the next. Washington’s assessment is not calibrated for this purpose. In other words, the assessment is not designed to show student growth from year to year. Until our state assessment is appropriately calibrated, using state test scores for this purpose is an invalid and reckless use of data, marked by defiant disregard for what is educationally sound.
  • Context Matters. Many factors influence student achievement and, therefore, affect student test scores. These include home support, school attendance, family income level, and parents’ level of education. The result is that teachers in wealthier communities are likely to “look” better because their students are likely to score higher on tests. But this is often more a measure of students’ home environments than of teachers’ instructional effectiveness.
  • Multiple Points of Data: Test scores do not speak for themselves. Any judgment of a teacher must take a variety of factors into consideration. Richard Rothstein’s latest book, “Teachers, Performance Pay, and Accountability,” examines the claim that quantitative measures are widely used for performance evaluation in the private sector and finds that they are not. When educators designed a rigorous evaluation system in Montgomery County, Maryland, they specifically kept test scores from appearing on a teacher’s evaluation. The scores must not speak for themselves because too many factors other than the teacher affect the data, and too much of what the teacher does produces outcomes that are not reflected in the data.
  • Value-Added Models: Though value-added systems are touted as a valid way to demonstrate student improvement, recent research has called into question whether these models can generate sound causal estimates of teacher performance and accurate measures of student growth. Questions remain about how well they account for uneven student learning trajectories and the nonrandom assignment of students to classrooms. In practice, such measurements fluctuate too much from year to year, are neither precise nor accurate, require testing in every grade, and exceed the capacity of most districts to carry them out.

Using state-wide test scores is not a logistically possible solution

  • Limited Tested Grades: The state-level assessment is given in grades 3-8 and 10. How is it logistically possible to connect state assessment scores with educators who teach kindergarten, first grade, second grade, ninth grade, eleventh grade, and twelfth grade? How can state-level assessment results be attached to teachers who have never administered the assessment?
  • Limited Tested Subjects: The state assessment addresses selected subject areas and does not cover many areas in the school curriculum. Reading, mathematics, science, and writing are the only subject areas assessed on the state-wide tests, and even they are not consistently assessed in every grade. For example, students in grade 3 are tested only in reading and math. Science is tested only in grades 5, 8, and 10. How do you assign test scores to educators who teach social studies, health, physical education, visual art, choir, band, orchestra, technology, career and technical education courses, etc.? What do you do with librarians, counselors, occupational therapists, physical therapists, certificated school nurses, etc.? Connecting state assessments to teacher evaluation is a disjointed, non-systems approach to determining teacher effectiveness.
  • Over 70% of Teaching Staff Are Not Connected to State Assessments: Combining the grades where the state assessments are not given with the non-tested subject areas, over 70% of certificated teachers are not even connected with state-wide test scores, which makes a comprehensive system using state test scores for teacher evaluations logistically impossible.

Possible ramifications

  • Teaching to the Assessment: If student test scores on a single assessment become the basis for teacher evaluation, then the test will become the major focus, crowding out broader learning to an even greater extent than it already has.
  • The Campbell Effect: Generally speaking, the Campbell effect states that when test scores become the goal of the teaching process, they lose their value as indicators of educational status and distort the educational process in undesirable ways. In other words, the pressure to have students score well on a single test for teacher evaluation becomes so intense that it leads to perverse and unscrupulous practices, including:
    • Cheating on the test by both students and teachers
    • Data manipulation (think about the fuzzy math from the previous state assessment reports)
    • Distortion of education through narrowing the curriculum
    • Distortion of education through teaching to the test

The Campbell effect has been demonstrated in both the public and private sectors, where it demoralizes the workforce charged with carrying out the assessments.

  • Less Collaboration and More Competition: If teachers are being evaluated based on state test scores, they are less likely to collaborate and help their colleagues. A growing body of research suggests that teacher collaboration is one of the best forms of professional development. Why would we want an educational system that doesn’t promote collaboration?

What are some possible solutions?

In designing a way to measure teacher effectiveness, it is important to keep a couple of things in mind. First, the goal of a new evaluation model is growth and improvement. Yes, good teacher evaluation systems should weed out the small number of teachers who shouldn’t be in the classroom, but the main goal of evaluation is to point out areas of strength and areas where teachers need improvement and then to provide support so they can become more effective. We all have things we can improve on, and developing a system that promotes growth is an educationally sound solution to increasing teacher quality across the state. It is supported by research and connected to good practice.

Second, a comprehensive teacher evaluation system includes multiple forms of data and information to determine teacher effectiveness. Examples include observations, artifacts, reflective practice, self-assessment goal setting, professional contributions, and the impact on student learning. Multiple measures allow the data to be triangulated and provide a much stronger, more holistic representation of teacher effectiveness. Other considerations in a good teacher evaluation system may include:

  • A strong evaluator training system and professional development component. Certifying evaluators may be an option to strengthen the process.
  • A comprehensive beginning teacher support program
  • A comprehensive mentoring system
  • Mentor release time for collaboration with mentees
  • Ongoing professional development focused on improving instructional practice and student learning
  • Focused professional development and training on the analysis and use of student achievement data to improve instruction
  • Transparent access to data
  • Multiple opportunities for classroom observations by administrators
  • Opportunities for teachers to observe instructional practice in other classes
  • Peer collaborative observation protocols focused on growth, improvement, and teacher quality
  • Collaboration time with colleagues and principal focused on instructional practice and student achievement
  • Release time for employees to observe mentor(s) and other accomplished colleagues
  • Clear standards used in the evaluation criteria
  • Consideration of a differentiated evaluation rubric that measures growth of teaching practices against teaching standards
  • Additional administrator support to help implement a growth-oriented evaluation system