Minding Our Health: The Nudge, Part Two

Reposted with permission from Wrench in the Gears.

Center for Health Incentives

This piece expands upon my prior post about digital nudging and behavioral economics. Disruption in the healthcare industry mirrors the ed-tech takeover that is well underway in public education. If you explore the webpage for Catalyst, the innovation PR outlet for the New England Journal of Medicine (remember, social impact policy makers and many investors are based in Boston), you’ll notice the language being used to direct health care providers towards big-data, tech-centered solutions is eerily similar to the language being used on educators and school administrators.

The FCC’s “Connecting America: The National Broadband Plan” of 2010 outlined seven “national purposes” for broadband expansion. Healthcare and education were the first two topics covered in that report. Both chapters focus on “unlocking the value of data.” Who will the big winners be as we further digitize our lives? My assessment is the telecommunications industry and national security/police state will come out on top. Locally, Comcast and Verizon are key players with interests in both sectors.

Education and healthcare fall under the purview of Lamar Alexander’s Senate HELP (Health, Education, Labor and Pensions) Committee, so the similarities in tactics shouldn’t come as a surprise. In researching the $100 million federal Social Impact Partnerships Pay for Results Act (SIPPRA) launch I attended in Washington, DC last month, I noticed that one of the Republican Senators who presented, Todd Young of Indiana, had attended the MBA program at the University of Chicago’s Booth School of Business. Richard Thaler, a recent Nobel Prize winner for his work in behavioral economics, teaches there, and I was curious to see if Thaler’s thinking had influenced Young.

I located C-SPAN coverage of a Senate hearing on healthy lifestyle choices in which Young participated on October 19, 2017 (an excerpted transcript appears at the end of this post). Chairman Lamar Alexander and ranking member Patty Murray, who inserted Pay for Success provisions into ESSA, led that hearing. Behavioral economics was discussed extensively. Young’s remarks start at timestamp 34:00.


The topic of the hearing was reducing healthcare expenditures on chronic illness, which they claimed would amount to hundreds of billions of dollars in “savings.” Given the amount of money on the table, it seems clear this sector is ripe for outsourced, outcomes-based contracts that will deploy emerging technologies like healthcare wearables. Six measures of good health were identified during testimony: blood pressure, cholesterol level, body mass index, blood sugar, smoking status, and either the ability to meet the physical requirements of your job or, according to the Cleveland Clinic witness, unmanaged stress.

The claim was that if an insured person met four of the six measures, saw a doctor regularly, and had their vaccinations up to date they would avoid chronic illness 80% of the time. Of course the conversation was entirely structured around individual “choice” rather than economic and racial systems that make it difficult for people to maintain a healthy lifestyle.

This neoliberal approach presumes people have free time for regular exercise, not considering they may be cobbling together several gigs to make ends meet. It presumes the availability of healthy food choices, when many black and brown communities are food deserts with limited access to fresh produce. It presumes the stress in people’s lives can be managed through medicalized interventions and does not address root causes of stress in communities steeped in trauma. It presumes ready access to a primary care physician in one’s community.

It is a gross simplification to push responsibility for chronic health conditions solely onto the individual, giving a free pass to social systems designed to harm large subsets of our communities. By adopting a data-driven approach to health outcomes, as would seem to be the case with the above six measures (check-the-box health), the federal government and health care systems appear to be setting health care consumers up to become vehicles for data generation in ways that are very much like what is happening to public education students forced to access instruction via digital devices. Imagine standards-based grading, but with health measures.

The people who provided testimony at the October 19 hearing included Steve Burd, former CEO of Safeway, now at Burd Health; Michael Roizen of the Cleveland Clinic; David Asch, director of the Wharton School’s Center for Health Care Innovation; and Jennifer Mathis of the Bazelon Center for Mental Health Law, representing the Consortium for Citizens with Disabilities. Mathis was the only one who testified strongly on behalf of the rights of the insured to withhold personal information, and she was very concerned about the discriminatory nature of incentivized medical insurance programs, particularly with regard to people with disabilities.

In his testimony, David Asch, director of the Center for Health Care Innovation based in the University of Pennsylvania’s Wharton School, described effective designs for health incentive programs, noting that, from the insurer’s point of view, the fear of losing money was a more effective motivator than the prospect of receiving financial rewards. For that reason, Asch said, taking away money from someone should be considered before offering a reward. Asch also noted that effective programs included emotional engagement, frequent rewards (tweaked to people’s psychological foibles so they didn’t have to be too large), contests, and social norming, including the use of public leaderboards.

The date of the hearing is interesting, because right around the same time, public employees (including teachers) in West Virginia were facing dramatic changes to their insurance plans. These changes included compulsory participation in Go365, an app-based health incentive program that required completing intrusive surveys, wearing a Fitbit (if you didn’t, a $25 fee was imposed each month), and meeting a certain daily step count. At the end of this post, I include a transcription of testimony that one of these teachers, Brandon Wolford, gave at this spring’s Labor Notes conference.

The incorporation of mHealth (mobile health) technologies is a key element of the healthcare disruption process. Increasingly, wearable technologies will transmit real-time data, surveilling the bodies of the insured. mHealth solutions are being built into healthcare protocols, so private investors will be able to track which treatments offer “high-value care.” The use of wearables and health apps also permits corporate health systems to insert digital “nudges,” derived from calculated behavioral economic design, into the care provided, and to monitor which patients comply and which do not.

At the moment, the tech industry is working intently to integrate Blockchain technology with Internet of Things (IoT) sensors like Fitbits and with health apps on smartphones. Many anticipate Blockchain will become a tool for securing IoT transmissions, enabling the creation of comprehensive and supposedly immutable health data logs, which could be key to mHealth expansion. Last summer the Medical Society of Delaware (Delaware touts itself as a Blockchain innovator) announced a partnership with Symbiont to develop healthcare records on Blockchain. Symbiont’s website claims it is the “market-leading smart contracts platform for institutional applications of Blockchain technology.” According to its Crunchbase profile, the company’s initial seed round of funding took place in 2014, with a second round raising an additional $15 million in May 2017.

The July/August 2018 issue of the Pennsylvania Gazette, the alumni magazine for the University of Pennsylvania, features Blockchain as its cover story, “Blockchain Fever.” The extensive article outlines use cases being considered for Blockchain deployment, including plans by a recent Wharton graduate to develop an application that would certify interactions between healthcare agencies and Medicare/Medicaid recipients for reimbursement. The University of Pennsylvania Health System is deep into innovative technologies, and David Asch, director of Penn’s Center for Health Care Innovation, testified at the October 2017 hearing. The Penn Medicine integrated health system was created in 2001 by former UPenn president Judith Rodin in collaboration with Comcast executive David Cohen. Rodin went on to head the Rockefeller Foundation, and in the years that followed, the foundation spearheaded the creation of the Global Impact Investing Network (GIIN). GIIN fostered growth of the social impact investing sector at the same time healthcare began to transition away from fee-for-service reimbursement toward a value-based model predicated on outcomes met.

Below is a relationship map showing the University of Pennsylvania’s involvement in “innovative” healthcare delivery, which I believe stems from Rodin and Cohen’s connections to Comcast. It is important to note that the Center for Health Care Innovation claims to have the first “nudge unit” embedded within a health system. Asch is an employee of Wharton, and Wharton is leading initiatives in people analytics, behavior change via tech, and Blockchain technologies.

New types of employer-based health insurance systems have started to emerge over the past six months. Based on a recent New York Times article, it seems employees of Amazon, JPMorgan and Berkshire Hathaway will have a front-row seat as these technological manipulations unfold. Last fall Sidewalk Labs, the “smart cities” initiative of Alphabet (parent company of Google), announced an expansion into managed healthcare. City Block (read: Blockchain) will tackle “urban health” and populations with “complex health needs.”

Reading between the lines, it appears Alphabet aims to use poor black and brown communities that have experienced generations of trauma as profit centers. Structural racism has created a massive buildup of negative health outcomes over generations. Now, with innovative financial and technological infrastructures being rapidly put into place, these communities are highly vulnerable. Ever wonder why ACEs (Adverse Childhood Experiences) come with scores? I expect those numbers are about to be fed into predictive profiles guiding social impact investment metrics.

How convenient that the “smart city” solutions Sidewalk Labs is likely to promote will come with IoT sensors embedded in public spaces. How convenient that healthcare accelerators are developing emerging technologies to track patient compliance down to IoT-enabled pill bottle caps, sensors that allow corporate and government interests to track a person’s actions with precision while assessing their health metrics in excruciatingly profitable detail. Technology platforms are central to City Block’s healthcare program. Many services will take place online, including behavioral health interventions, with the aim of consolidating as much data as possible to build predictive profiles of individuals and facilitate the evaluation of impact investing deals.

As an interesting aside, two friends of mine had emergency room visits at Jefferson Hospital this summer and were “seen” by doctors on a screen, with an in-room facilitator wielding a camera for examination purposes. This is in a major East Coast city served by numerous research hospitals. Philadelphia is not Alaska. Where is that data going? Where were those doctors anyway?

As these surveillance technologies move full steam ahead, it would be wise for progressive voices invested in the “healthcare for all” conversation to begin considering strategies to address the serious ethical concerns surrounding wearable technologies, tele-health / tele-therapy, and value-based patient healthcare contracting. If guardrails are not put in place that guarantee humane delivery of care without data profiling, the medical establishment may very well be hijacked by global fin-tech interests.

As someone who values the essence of the platform put forward by Alexandria Ocasio-Cortez, I worry her supporters may not understand that several key elements of her platform have already been identified as growth sectors for Pay for Success. If public education, healthcare, housing and justice reform are channeled by global financial interests into outsourced, outcomes-based contracts tied to Internet of Things tracking, we will end up in an even worse place than we are now. So, if you care about progressive causes, please, please get up to speed on these technological developments. You can be sure ALEC already has, and remember that Alibaba (Sesame Credit) joined ALEC in December. It’s not too much of a stretch to imagine patient rating systems regulating healthcare access down the road if we’re not careful.

Senator Todd Young was the first person to respond to witness testimony during the hearing, and his line of questioning revealed he is a strong advocate of Thaler’s “nudge” strategies. The “nudge” is a key feature of “what works,” “Moneyball” government, which deploys austerity to push outsourcing and data-driven “solutions” built on digital platforms that gather the data required to prove “impact” and reap financial returns. See the related post by fellow researcher Carolyn Leith, “A Close Reading of Moneyball for Government and Why You Should Be Worried.”

Young asked David Asch of Penn’s Center for Health Care Innovation what employers could learn from behavioral economists. He also proposed several specific ways to scale such programs within the federal government, namely: embedding units charged with experimenting with behavioral economics into federal government programs; developing a clearinghouse of best practices; and bringing behavioral scientists into the Congressional Budget Office.

Asch, a doctor employed by the Wharton Business School, runs UPenn’s Center for Health Care Innovation, created in 2011 to test and implement “new strategies to reimagine health care delivery for dramatically better VALUE and patient OUTCOMES” (emphasis added). The 28,000-square-foot facility houses simulation learning labs and an accelerator where research on the use of “smart” hospital systems, social media, and emerging technologies in healthcare is conducted. The accelerator aims to rapidly prototype and scale “high impact solutions,” read: Pay for Success.

Besides the Acceleration Lab, the Center also contains the Nudge Unit, which according to its website is the world’s first behavioral design team embedded within a health system. The goal of the unit is to “steer medical decision making towards HIGHER VALUE and improved patient outcomes” (emphasis added). Sample healthcare nudges include embedded prompts in digital platforms (for screenings), changing default settings (to generic prescriptions), framing information provided to clinicians (not sure what this means), and framing financial incentives as a loss.

This is longer than intended and hopefully provides some food for thought. This life-datafying impact investing machine we are up against isn’t just coming for public education; it’s coming for ALL human services. We need to begin to understand the depth and breadth of this threat. I’m still mulling over a lot of this myself, and my knowledge base in healthcare is much shallower than my expertise in education. I’d love to hear what folks think in the comments, and if you know of others writing on Blockchain and IoT in medicine with a critical lens, please send me some links. Below are transcripts from West Virginia teacher Brandon Wolford about Go365, followed by the Senator Young / David Asch exchange from the hearing.

-Alison McDowell

Go365 Transcript

Brandon Wolford, West Virginia Teacher: When I first began teaching in 2012 the insurance, in my opinion, was excellent, because I had worked for one year in Kentucky and I had known that the premiums were, although they were being paid five to seven thousand more than we were, they still had to pay much more for their insurance. So it balanced out. However, after the first year or two I was there, that was when they started coming after us with the tax on our insurance. First of all the premiums, we started to see slight increases for one, and another was they started to enforce this “Healthy Tomorrows” policy.

So, the next thing you know, we get a paper in the mail that says, you know, you have to go to the doctor by such and such a date. It must be reported. Your blood glucose levels must be at a certain amount. Your waist size must be a certain amount, and if it is not, if you don’t meet all of these stipulations then you get a $500 penalty on your out-of-pocket deductible. So, luckily for me, I eat everything I want, but I was healthy. My wife on the other hand, who eats much better than I do, salads at every meal, has high cholesterol, so she gets that $500 slapped on her just like that.

Okay so, that was how they started out. In the meantime, we have been filling these out for a year or two, and they keep saying you know you have to go back each year and be checked. And then comes the event that awoke the sleeping giants. The PEIA Board, which is the Public Employees Insurance Agency that represents the state of West Virginia, they, um it’s just a board of four to five individuals that are appointed by the governor, they are not elected. They have no one they answer to; they just come up with these things on their own.

So they come to us and they say we’re raising your premiums. This was somewhere between November and December of last year. We’re raising your premiums. You’re going to need to be enrolled in a program called Go365, which means that you have to wear a Fitbit, as well as record all of your steps. You have to check in with them, and it included private questions like how much sexual activity do you perform, and is it vigorous? All of these things that they wanted us to report on our personal lives, and that was all included. In addition to that we had to report all of those things, and if we refused to wear that Fitbit and record all of our steps, or if we didn’t make our steps, we were going to be charged an additional $25 per month.

So, when everyone sees this along with the increased premiums, then they’ve also introduced a couple more bills to go along with that, because the PEIA Board, they have the final say. Whatever they do, it’s not voted upon by the legislature. It’s basically just law, once they decide it. But in the meantime our legislature was presenting these bills. We were currently on a plan of sixty, uh excuse me, eighty/twenty we were paying out of pocket. Well, they had proposed a bill that would double that and make us pay sixty/forty.

So, they presented that along with charter school bills and a couple of other things that were just direct attacks on us. We had been going by a process of seniority for several years; and they also introduced a bill to eliminate seniority to where it was up to the superintendent whether or not you got to stay in your position. It was up to the principal and regardless if you were there thirty years or you were there for your first or your second year…they were trying to tell us you know, it’s just up to your principal to decide. The superintendent decides. They don’t want you to go, you’ve been there for thirty years and you have a master’s degree plus forty-five hours, you’re gone. It’s up to them. Your seniority no longer matters. So those things combined with the insurance is actually what got things going in our state.

Excerpted Testimony Healthy Lifestyle Choices, Senate HELP Committee 10/19/17

Lamar Alexander: We’ll now have a round of five-minute questions. We’ll start with Senator Young.

Senator Todd Young: Thank you Chairman. I’m very excited about this hearing, because I know a number of our witnesses have discussed in their testimonies behavioral economics and behavioral decision-making. I think it’s really important that we as policy makers incorporate how people really behave. Not according to an economist per se, or according to other policy experts, but based on observed behaviors. Often times we behave in ways that we don’t intend to. It leads us to results that we don’t want to end up in.

So, Mr. Asch, I’ll start with you, with your expertise in this area. You’ve indicated behavioral economics is being used to help doctors and patients make better decisions and you see opportunities for employers to help Americans change their behaviors in ways they want from tobacco mitigation to losing weight to managing blood pressure and you indicate those changes are much less likely to come from typical premium-based financial incentives and much more likely to come from approaches that reflect the underlying psychology of how people make decisions, encouraged by frequent rewards, emotional engagement, contests, and social acceptance and so forth. And you said in your verbal testimony you haven’t seen much of this new knowledge applied effectively by employers, but there’s no reason why it cannot be. So, my question for you sir is what might employers learn from behavioral economists. Just in summary fashion.

David Asch, Wharton Center for Health Care Innovation: Sure. Thank you senator. I think I’ll start by saying there is a misunderstanding often about behavioral economics and health. Many people believe that if you use financial incentives to change behavior you’re engaged in behavioral economics, and I would say no, that’s just economics. It becomes behavioral economics when you use an understanding of our little psychological foibles and pitfalls to sort of supercharge the incentives and make them more potent so that you don’t have to use incentives that are so large.

So I think that there are a variety of approaches that come from behavioral economics that can be applied in the employment setting and elsewhere. I mentioned one, which is capitalizing on the notion that losses loom larger than gains, might be a new way to structure financial incentives in the employment setting in ways that might make it more potent and more palatable and easier for all employees to participate in programs to advance their health. The delivery of incentives more frequently for example. Or using contests or using certain kinds of social norming where it’s acceptable to show people on leader boards in contests and get people engaged in fun for their health. All of these are possibilities.

Senator Todd Young: Thank you very much. You really need to study these different phenomena individually. I think to have a sense of the growing body of work that is behavioral economics. Right, so we need the increased awareness, and I guess the education of many employers about some of these tics we have. That seems to be part of the answer. In fact, Richard Thaler who just won the Nobel Prize for his ground-breaking work in this area indicated that we as policy makers ought to have on a regular basis not just lawyers and economists at the tables where we’re drafting legislation, but ought to have a behavioral scientist as well.

And the UK, they have the Behavioral Insights Team. The United States, our previous administration, had a similar sort of team that did a number of experiments to figure out how policies would actually impact an individual’s health and wellness and a number of other things. Some of the ideas that I think we might incorporate into the government context, and tell me if any of these sort of pop for you; if you think they make sense?

We need to continue to have a unit or units embedded within government that do a lot of these experiments. We need to have a clearinghouse of best practices that others, employers included, might draw on. This doesn’t have to be governmental, but it could certainly be. We on Capitol Hill might actually consider, aside from having a Congressional Budget Office, an official budget office, we might have an entity or at least some presence within the CBO or individuals that understand how people would actually respond to given proposals. Do any or all of those make sense to you?

David Asch: Thank you for your remarks. Yes, I think they all make sense. And one of the lessons that I guess I have repeatedly learned is that seemingly subtle differences in design can make a huge difference in how effective a program can be and how it is perceived and that will ultimately care about the impact of these programs. So, I am very much in favor of the use of these programs, but in addition, greater study of these programs, because I think we need an investment in the science that will help all of us in delivering these activities, not just in healthcare, but in other parts of society.

Senator Young: That makes sense. I am out of time. Thank you.



Step-by-step Privatization and Profit: ESSA Delivers Schools to Wall Street with a Bow on Top

Reposted with permission from Educationalchemy.


“Social impact bond projects are very definitely privatisation. PFI/PPP projects have effectively privatised the design, finance, construction and maintenance of much public infrastructure. Now social impact bond projects potentially privatise the design, finance, service delivery, management, monitoring and evaluation of early intervention and prevention policies.”

Step One – Curriculum: The Common Core created one set of standards (modules), originating from a global agenda circa 1985.

According to a promotional flyer created by the Bill and Melinda Gates Foundation:

“Education leaders have long talked about setting rigorous standards and allowing students more or less time as needed to demonstrate mastery of subjects and skills. This has been more a promise than a reality, but we believe it’s possible with the convergence of the Common Core State Standards, the work on new standards-based assessments, the development of new data systems, and the rapid growth of technology-enabled learning experiences.” 

So that…

Step Two – Testing: There can be one consistent numerical metric by which to measure student outcomes (PARCC).

So that…

Step Three – We can have modularized, competency-based assessment: instruction and ongoing testing can be delivered via technology….

“Competency-based education has been part of Achieve’s strategic plan for a few years, … states and national organizations that have made this topic a priority: Nellie Mae Education Foundation, iNACOL, Digital Learning Now, CCSSO and NGA.”

Pearson: “With competency-based education, institutions can help students complete credentials in less time, at lower cost.”

So that…

Step Four – We can have Pay for Success, or Social Impact Bonds (evaluated for their “success” via the competency/outcomes-based model), replace the funding infrastructure of public schools….

CTAC, the Boston-based Institute for Compensation Reform and Student Learning at the Community Training and Assistance Center, partners with departments of education to develop and promote student learning outcomes (SLOs). William Slotnik is executive director of CTAC. He advocates for value-added measures (VAM) and merit pay schemes. “William Slotnik,… has argued that performance-based compensation tied directly to the educational mission of a school district can be a lever to transform schools.”

According to the National Governors Association (NGA): “CBE can be a way for states to pay for the outcomes they want if supported by a funding formula that allocates dollars based on student learning, not simply time spent in a classroom or full-time equivalency” http://www.nga.org/files/live/sites/NGA/files/pdf/2015/1510ExpandingStudentSuccess.pdfm

ESSA was designed to open the floodgates for neoliberal profiteers not only to profit from public education services (i.e., tests or curriculum) but to completely own it. See Fred Klonsky, who concurs with Mercedes Schneider that “these bonds are an open door for the exploitation of children who do not score well on tests.” Social Impact Bonds have been criticized as a central piece of ESSA, as noted by BATS: “‘Pay for Success’ from Every Student Succeeds Act as it is located in Title 1, Part D, Section 4108, page 485. Social Impact Bonds favor financial investors and NOT KIDS! In Title IV, A in the section titled Safety and Healthy Students, page 797, Social Impact Bonds are defined as ‘Pay for Success.’ Investors are paid off when a student IS NOT referred to special education. ”

The entire system of reforms over the last three decades has been a step-by-step sequence of actions designed to privatize public education as a for-profit enterprise of Wall Street investments.

“Social impact bonds are a development in the mutation of privatization … The new emphasis on financialising and personalising services to create new pathways for the mutation of privatisation recognised that health, education and social services could not be sold off in the same way as state owned corporations. It ensured marketisation and privatisation were permanent and not dependent on outsourcing, which could be reversed by terminating or not renewing contracts (Whitfield, 2012a and 2012b).”

Again, the NGA: “In addition, leadership, promotion, and pay structures might look different in a CBE system that asks educators to take on new, specialized roles. Underpinning many current policies are labor contracts, which specify the educator’s role based on specified amounts of class time. Such policies would not only be unnecessary in a CBE system but would significantly impede the adoption of such a system.”

You dismantle labor unions on a global scale, which was the goal of ALEC and the World Bank back when they began devising these policies. The following is an outline from the World Bank’s Global Education Reform page, summarizing what it considers key issues:

  1. Decentralization & School-Based Management Resource Kit
    Directions in Development: Decentralization Series

Financing Reform

  1. Vouchers
  2. Contracting
  3. Private Sector
  4. Charter Schools
  5. Privatization
  6. Private Delivery of Services

Teacher Reform

  1. On-line resources related to teacher career development
  2. Teacher Evaluation as part of Quality Assurance

Curriculum Reform

  1. Country Examples of Curriculum Reforms
  2. Accountability in Education
  3. Standard in Education

Does any of this sound familiar to you?

One report I found, by Pauline Lipman (2012), summarizes all of this quite nicely:

“Under the Global Agreement on Trade in Services, all aspects of education and education services are subject to global trade. The result is the global marketing of schooling from primary school through higher education. Schools, education management organizations, tutoring services, teacher training, tests, curricula, online classes, and franchises of branded universities are now part of a global education market. Education markets are one facet of the neoliberal strategy to manage the structural crisis of capitalism by opening the public sector to capital accumulation. The roughly $2.5 trillion global market in education is a rich new arena for capital investment …and testing is a prominent mechanism to steer curriculum and instruction to meet these goals efficiently and effectively.”

The 2011 ALEC Annual Conference Substantive Agenda on Education shows their current interests:

“…the Task Force voted on several proposed bills and resolutions, with topics including: digital learning, the Common Core State Standards, charter schools, curriculum on free enterprise, taxpayers’ savings grants, amendments to the existing model legislation on higher education accountability, and a comprehensive bill that incorporates many components of the landmark school reforms Indiana passed this legislative session. Attendees will hear a presentation on the National Board for Professional Teaching Standards’ initiative to grow great schools, as well as one on innovations in higher education.”

According to one European white paper: “Philanthrocapitalism is the embedding of neoliberalism into the activities of foundations and trusts. It is a means of marketising and privatising social development aid in the global south. It has also been described as Philanthropic Colonialism … It’s what I would call ‘conscience laundering’ — feeling better about accumulating more than any one person could possibly need to live on by sprinkling a little around as an act of charity. But this just keeps the existing structure of inequality in place. The replacement of public finance and grants from public/foundations/trusts to community organisations, voluntary organisations and social enterprises with ‘social investment’, requiring a return on investment, means that all activities must be profitable. This will have a profound impact on the ability to regenerate to meet social and community needs. The merging of PPPs, impact investing and philanthrocapitalism would be complete!”

-Morna McDermott

A look at the MAP test and Value Added Measures (VAM)

Students and teachers at Ballard High School in Seattle

“Beginning this fall, teachers who have been on this system for two school years prior to 2012-2013, and who teach tested subjects and grades, will receive a student growth rating based upon two assessments and a two year rolling average of student assessment data.”

– José Banda, Superintendent Seattle Public Schools

There is a push underway to judge the performance of a teacher, principal, school, district or superintendent based on student test scores.

Someone had the idea to determine how well a teacher or a school was faring based on how a student did on a standardized test over a certain period of time. From what I’ve seen, that length of time has been decided rather haphazardly from state to state.

“Value added measures,” or VAM, is the term used when a student’s performance is measured over that particular length of time.

No other factors are included in this measure, such as the socioeconomic situation of the student, family or health factors, or the academic status of the student, for example whether the student is an English Language Learner (ELL) or has an Individualized Education Program (IEP).

That is my layman’s explanation of VAM; it should help readers who are not steeped in statistics and mathematics understand the following post.
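To make the layman’s description concrete, here is a minimal Python sketch of the simplest possible “value added” calculation: average the score gains of one teacher’s students between a fall test and a spring test. The function name and the scores are hypothetical, made up for illustration. Systems actually in use, such as SAS EVAAS, rely on considerably more elaborate statistical models; this sketch only mirrors the description above, in which nothing about a student except the two test scores enters the number.

```python
from statistics import mean


def simple_value_added(scores):
    """Average score gain for one teacher's students (hypothetical helper).

    `scores` holds one (fall_score, spring_score) pair per student.
    Nothing else about a student, such as income, ELL status, an IEP,
    health or family circumstances, enters the calculation.
    """
    gains = [spring - fall for fall, spring in scores]
    return mean(gains)


# A hypothetical class of four students: (fall score, spring score)
class_scores = [(200, 210), (195, 199), (210, 222), (188, 190)]
print(simple_value_added(class_scores))  # 7
```

Two students with very different circumstances count exactly the same in this calculation, which is the point made in the paragraph about ELL and IEP students.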

What the superintendent in Seattle wants to do, as is the fashion these days thanks to Race to the Top, is determine a teacher’s performance based on VAM. This would then lead to determining a grade for a school, a principal and even the superintendent himself if he’s not careful.

The unfortunate effect is that a school grade can then determine if a school remains open, is closed or turned into a charter school.

With that introduction, I would like to share a response that I received from a math teacher in Seattle to a query about the MAP test, VAM, and their use in Seattle.

Here is their response:

Why we should Scrap the MAP

No experts anywhere in the field of K-12 educational assessment consider just two years of this type of data as valid.

The idea that the leaders of our LMO (Labor Management Organization, the Seattle Education Association) would agree to this just shows how little concern they have for science, mathematics, statistics, or teachers in classrooms.

Seattle Public Schools should not release the two year Value Added Measures (VAM) data on its teachers as planned.

“A study examining data from five separate school districts [over three years], found for example, that of teachers who scored in the bottom 20% of rankings in one year, only 20-30% had similar ratings the next year, while 25-45% of these teachers moved to the top part of the distribution, scoring well above average.”

– National Research Council, Board on Testing and Assessment (2009). Report to the U.S. Department of Education.
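The kind of instability the National Research Council describes can be illustrated with a toy simulation. The sketch below assumes, purely for illustration, that each teacher has a fixed “true” effect and that each year’s measured score adds random noise, standing in for class composition, test error and everything else. It then checks what share of teachers ranked in the bottom 20% one year land there again the next year. The model, its parameters and the function name are hypothetical; the numbers it prints depend entirely on the assumed noise level and do not reproduce the NRC study.

```python
import random


def bottom_quintile_persistence(n_teachers=1000, noise_sd=1.0, seed=0):
    """Toy model (hypothetical): share of teachers in the bottom 20% in
    year 1 who are also in the bottom 20% in year 2, when each year's
    measured score = fixed true effect + random noise."""
    rng = random.Random(seed)
    true_effect = [rng.gauss(0, 1) for _ in range(n_teachers)]

    def measured_year():
        # Same teachers, same true effects, fresh noise each year.
        return [t + rng.gauss(0, noise_sd) for t in true_effect]

    def bottom_set(scores):
        cutoff = n_teachers // 5
        ranked = sorted(range(n_teachers), key=lambda i: scores[i])
        return set(ranked[:cutoff])

    year1 = bottom_set(measured_year())
    year2 = bottom_set(measured_year())
    return len(year1 & year2) / len(year1)


for sd in (0.0, 0.5, 1.0, 2.0):
    print(sd, bottom_quintile_persistence(noise_sd=sd))
```

With `noise_sd=0.0` the rankings are identical from one year to the next; as the assumed noise grows, the share of teachers who stay in the bottom group tends to shrink, even though no teacher’s true effect has changed.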




The list below of peer-reviewed, academically sound research and reports on the use and abuse of VAM in K-12 is long and compelling.

We don’t understand how anyone whose job within a school system is to collect and meaningfully apply teacher and student assessments to improve student learning can keep that job without doing the due diligence needed to inform themselves about the core facts of their work. Absurd, really.

Virtually all the research on VAM as applied to teacher evaluation indicates that the planned Seattle Public Schools (SPS) action will seriously mislead the public rather than inform it, as has apparently been falsely assumed.

Economic Policy Institute:

“Analyses of VAM results show that they are often unstable across time, classes and tests; thus, test scores, even with the addition of VAM, are not accurate indicators of teacher effectiveness. Student test scores, even with VAM, cannot fully account for the wide range of factors that influence student learning, particularly the backgrounds of students, school supports and the effects of summer learning loss…. Furthermore, VAM does not take into account nonrandom sorting of teachers to students across schools and students to teachers within schools.”


Annenberg Institute: this is an excellent, recent and major review of current principles and practices of VAM as relevant to K-12 educational reform.

“At least in the abstract, value-added assessment of teacher effectiveness has great potential to improve instruction and, ultimately, student achievement. However, the promise that value-added systems can provide such a precise, meaningful, and comprehensive picture is not supported by the data.”


The Kappan: PDK International

What stands out in this piece from “The Kappan,” the excellent magazine of PDK International (in my view a must-read subscription for SPS board members and administrators), is that after reviewing the critical problems with VAM, it does not abandon the idea of improving teacher evaluations as part of the effort to improve K-12 education. Instead, it presents practices that are more likely to actually accomplish those goals.

1. Value-added models of teacher effectiveness are inconsistent…

2. Teachers’ value-added performance is affected by the students assigned to them…

3. Value-Added Ratings Can’t Disentangle the Many Influences on Student Progress…


National Bureau of Economic Research: Student Sorting and Bias in Value Added Estimation:

“The results here suggest that it is hazardous to interpret typical value added estimates as indicative of causal effects… assumptions yield large biases…. More evidence, from studies more directly targeted at the assumptions of value added modeling, is badly needed, as are richer VAMs that can account for real world assignments. In the meantime, causal claims will be tenuous at best.”


Test Score Ceiling Effects of Value Added Measures of School Quality

From: U. of California, U. of Missouri, and the American Statistical Association

This is pure research that is often cited by experts but is not an easy read for a non-educator or lay person. Its critical findings concern test score ceilings and non-random populations of students (think Roosevelt vs. Rainier Beach), which create statistical problems and misconceptions when student and teacher test-score data are aggregated or disaggregated.
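The ceiling problem is easy to see with a few numbers. In this hypothetical sketch, the test cannot report a score above 100, and every student in two classes gains the same 8 points of real achievement. Students who start near the top cannot show their full gain, so their teacher’s measured growth comes out lower. The ceiling, the class scores and the growth figure are all invented for illustration and are not taken from the study.

```python
CEILING = 100  # hypothetical maximum score the test can report


def observed(score):
    """A test cannot report a score above its ceiling."""
    return min(score, CEILING)


def mean_measured_gain(fall_true, true_growth):
    """Average measured gain when every student truly grows by the same amount."""
    gains = [observed(s + true_growth) - observed(s) for s in fall_true]
    return sum(gains) / len(gains)


high_class = [90, 94, 96, 98]  # hypothetical high-achieving class
low_class = [50, 55, 60, 65]   # hypothetical lower-achieving class
growth = 8                     # identical true growth for every student

print(mean_measured_gain(high_class, growth))  # 5.0
print(mean_measured_gain(low_class, growth))   # 8.0
```

Both classes learned exactly the same amount, yet a growth-based rating would make the first teacher look less effective.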


The Problems with Value-Added Assessment – Diane Ravitch

Written from her perspective as an education historian, this is a recent, thoughtful and fact-based review of VAM use.

“I concluded that value-added assessment should not be used at all. Never. It has a wide margin of error. It is unstable. A teacher who is highly effective one year may get a different rating the next year depending on which students are assigned to his or her class. Ratings may differ if the tests differ. To the extent it is used, it will narrow the curriculum and promote teaching to tests. Teachers will be mislabeled and stigmatized. Many factors that influence student scores will not be counted at all.”


Research Calls Data-Driven Education Reforms into Question

Recent reports by the National Academies’ National Research Council and by the National Center on Education and the Economy.

“Both organizations are respected for their high quality, comprehensive, and non-ideological research. Together, they reach the undeniable conclusion that today’s array of testing and choice fails to meet the instructional needs of American students and the national goal of broadly-based high academic achievement.”


Why Teacher Evaluation Shouldn’t Rest on Student Test Scores

FairTest.Org has a clearly stated agenda, but that does not discount this excellent list of the practical problems with applying VAM (as currently used) to teacher evaluation, which concludes with a list of solid, academically sound research references.



Edutopia: an excellent, unbiased resource on educational issues and the relevant research. The George Lucas Educational Foundation is dedicated to improving the K-12 learning process by documenting, disseminating, and advocating for innovative, replicable, and evidence-based strategies that prepare students to thrive in their future education, careers, and adult lives. Edutopia’s byline is, “What Works in Education.”

“‘Value-added modeling’ is indeed all the rage in teacher evaluation: The Obama administration supports it, and the Los Angeles Times used it to grade more than 6,000 California teachers in a controversial project. States are changing laws in order to make standardized tests an important part of teacher evaluation. Unfortunately, this rush is being done without evidence that it works well.”



And this letter from a Seattle Public Schools parent to our superintendent:

Dear Superintendent Banda,

I am writing to express my support for the teachers boycotting the MAP
test and to urge you to take NO disciplinary action against them.

I have been reflecting on standardized test scores and the uses to which
they have been put. At the same time as these kids are boycotting, I
have received a response to my advanced learning applications for my
kids. I have several years of experience with attempting to test my two
children into the program at the same time – in the hopes that they could
attend the same program at the same school.

You may be pleased to hear that they have been successful. One actually
was rejected, but upon appeal she will qualify based on her spring MAP

Although my children are quite bright and do require some kind of
ALO, the success was based just as much on LUCK. Why
do I say this? Well, for my two children to test into advanced learning,
both of them had to meet a threshold for four standardized tests (two
Cog-AT and two MAP tests). My son is receiving special education
services for a processing delay – he works slowly. So this year’s Cog-AT
scores would have disqualified him for both Spectrum and APP.
Last year, though, he qualified for APP. Why the difference? He’s the
same kid!!! If anything, he’s smarter this year, on account of the
excellent teachers he has.

Meanwhile, my daughter was disqualified despite having Cog-AT scores
in the 98th and 99th percentile, because her spring MAP for reading is in
the 75th percentile. But her more recent winter score is higher so I have
been assured that she will get in based on that.

Once again, this is the same kid. Two wildly different scores – which is
of course the norm for the MAP test.

I’m lucky she didn’t have a bad day for this winter test, aren’t I?

But education shouldn’t be based on luck. Teacher evaluations shouldn’t
be based on luck.

Now, a big community of parents, teachers, and students have been saying
this all along. The district hasn’t listened, and that’s why we don’t trust
your task force. Deal justly with the teachers, and we’ll listen then.



Dora Taylor

Part 1: High stakes testing: A little history

I will be posting a series on high stakes testing and opting out on this blog. Why? In other states where high stakes testing has become the norm, parents and educators are discovering that what is happening is harmful to their children and to education in general, and they are pushing back by opting out.

Unfortunately, high stakes testing has officially begun in Seattle.

It started with the following memo, issued this school year by Seattle Public Schools Superintendent José Banda:

“Beginning this fall, teachers who have been on this system for two school years prior to 2012-2013, and who teach tested subjects and grades, will receive a student growth rating based upon two assessments and a two year rolling average of student assessment data.”

– José Banda, Superintendent Seattle Public Schools
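The memo does not spell out how the “two year rolling average” is calculated. One plausible reading, sketched here purely as an assumption, is that each year’s rating averages that year’s growth score with the previous year’s. The function name and the growth numbers are hypothetical. Averaging two years smooths out some year-to-year swings, but the result still rests on just two noisy numbers, which is why critics argue that two years of this kind of data is too little to judge a teacher.

```python
def two_year_rolling_average(yearly_growth):
    """Hypothetical reading of the memo: average each year's growth score
    with the previous year's.

    `yearly_growth` maps school year -> one teacher's growth score.
    Returns school year -> two-year average, starting with the second
    year on record.
    """
    years = sorted(yearly_growth)
    return {
        later: (yearly_growth[earlier] + yearly_growth[later]) / 2
        for earlier, later in zip(years, years[1:])
    }


growth = {"2010-11": 4.0, "2011-12": -2.0, "2012-13": 6.0}
print(two_year_rolling_average(growth))
# {'2011-12': 1.0, '2012-13': 2.0}
```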

What follows is the first in a series of posts describing what high stakes testing is, how it began, the ramifications of this type of testing and how it will affect every student in the Seattle Public Schools system.

High Stakes Testing: A Little History

– Dora Taylor

“…I discovered then, in my early teaching career, that learning is best driven by ideas, challenges, experiences, and activities that engage students. My experience over the past 45 years has confirmed this.

We have come far from that time in the ’60s. Now the mantra is high expectations and high standards. Yet, with all that zeal to produce measurable learning outcomes we have lost sight of the essential motivations to learn that moved my students. Recently I asked a number of elementary school students what they were learning about and the reactions were consistently, “We are learning how to do good on the tests.” They did not say they were learning to read.

It is hard for me to understand how educators can claim that they are creating high standards when the substance and content of learning is reduced to the mechanical task of getting a correct answer on a manufactured test.”

-An excerpt from An Open Letter to Arne Duncan by Herb Kohl.

As this quote by Herb Kohl illustrates, the emphasis on testing has created a culture of test takers and test givers in our public schools. This focus on testing puts unnecessary pressure on students to perform well on a standardized test. It causes teachers to emphasize a narrow scope of material. It leaves students little opportunity to develop their creative and critical thinking skills, and it is used incorrectly as a tool to evaluate teachers, principals and schools, with serious consequences for students and their communities.

My experience as a parent and a teacher 

The results of high stakes testing did not become apparent to me until my daughter and I moved to another state and a new school.

From pre-school through 7th grade, my daughter attended a small private school. The school emphasized understanding that there are different paths to solving a problem, and that approach extended to math.

My daughter excelled in math. She found it fun and challenging and had accelerated to discussing areas of physics and astronomy with her teachers. The staff appreciated her interest and enjoyed spending time talking to her on related subjects.

There were no standardized tests given, just the tests that the teachers developed based on the material provided in the class.

Then we moved. I selected a public school based on state test scores. My reasoning was that if the overall test scores were excellent, the teaching staff and school must be excellent as well.

My daughter started school, ready to take on all her subjects with confidence and a joy of learning that she had developed over her years at the previous school.

Then she started to have trouble with math. She found it boring and seemed to be struggling with it. She would come home and tell me that the other students knew how to take a test but she didn’t. Her joy in exploring math began to diminish. She had learned that there were different ways to solve a problem, but at her new school there was only one way. Instead of discussing string theory, she was learning baseball scores. Apparently the teacher thought the class could relate better to the Mariners’ Ichiro and his stats than to other math concepts. My daughter had no interest in baseball, and therefore no interest in the information the teacher was providing.

The Vice Principal intimated to me at a PTA meeting that, with the No Child Left Behind Act, there was little room for experimentation or for deviating from the prescribed curriculum. The school needed to perform at a high level on test scores; that was expected by the Federal Government and by the parents.

In other words, the math and science material had been dumbed down to match the simplified questions that would be on the test. This is called “teaching to the test.” My daughter and I would have none of that.

Fortunately, we discovered a progressive alternative school in Seattle, Nova High School. The extraordinary principal and staff once again made her comfortable with her own intellect and methods of exploration. There she thrived. The emphasis at Nova is not on test scores but on student-oriented, project-based learning.

My other experience has been as a teacher introducing architecture to students in grades 3 through high school.

What I discovered over the years is that students who are inculcated in test taking expect me to give them the answer to a design challenge. They think there is only one way to approach a problem. During my time with these students, my goal is to help them understand that there are as many ways to develop a solution as there are people in this world.

In general, I can guess what type of school a student has been a part of, whether it is test oriented, alternative, private or British. When a child is raised understanding that a problem can be solved in more than one way and that many times there is more than one correct answer to a question, there is also a joy in creating, building and problem solving.

What is high stakes testing?

In 1965, President Lyndon B. Johnson signed into Federal law the Elementary and Secondary Education Act (ESEA) as part of the “War on Poverty” program. This bill ensured that children in poverty would receive additional funding for their school programs. The funding allocated was to include the professional development of teachers, class materials, and support for parent involvement. This federal money is referred to as “Title I” funding. (Note: Part of the original ESEA agreement ensured there would not be a national curriculum decided by the Federal government, but rather that each state would determine its own curriculum.) ESEA was to continue for five years, but Congress has periodically reauthorized the bill, and each time it is reauthorized, members of Congress, along with the President, have made changes to it.

Originally the ESEA was based on need and not on test score results. NCLB changed that.

In 2001, during George W. Bush’s administration, Congress made the first significant changes to ESEA and renamed it the No Child Left Behind Act (NCLB). Every child in a public school that received Federal funding was tested every year from grade 3 through grade 8. Each state would determine what actions to take if a school was deemed “failing”. These schools judged as “low performing” would receive additional assistance from the Federal government and students who attended chronically “low performing” schools would be able to transfer to another school.

Under the NCLB Act, all public schools in all states are to reach 100% “proficiency” in reading and math by 2014. If a school does not meet its Adequate Yearly Progress (AYP) goals based on test scores, a series of steps is taken.

This process begins with the school being placed on notice and ends with the restructuring of the school. Restructuring can mean converting the school to a charter school, firing the principal and staff, or turning the school over to the state.

The No Child Left Behind Act was never fully funded by the Federal government. School districts have had to take on the cost of creating the curriculum and pay for the development and implementation of the required standardized tests. These costs can rise to millions of dollars.

In 2010, Secretary of Education Arne Duncan introduced A Blueprint for Reform, a proposal to reauthorize ESEA. Right now the bill is in the House Committee on Education and the Workforce and has been amended to read Student Success Act.

Because Congress has failed to reach an agreement on the reauthorization of ESEA, the Obama administration is now offering a waiver to the requirements of NCLB. The requirements for this waiver include agreeing to accept the Common Core State Standards, a national curriculum in math and English. This is illegal according to the original ESEA. States also must agree to evaluate teachers and principals based in large part on test scores.

With this waiver, tests and testing become even more significant in the life of a student and teacher. This focus on testing is referred to as high stakes testing because so much is at stake: a teacher’s career, the tenure of a principal, and even the life of a school or community.

This waiver can be expensive as well. The California State Department of Instruction determined that it would cost $2.5 billion to $3.1 billion to comply with the requirements to receive the NCLB waiver. California decided not to apply.

When President Obama was sworn into office, his Secretary of Education, Arne Duncan, unveiled the Race to the Top (RTTT) program. The program included financial incentives for states if they agreed to particular requirements which were similar to the NCLB waivers. The requirements included providing alternative routes for teacher certification as well as evaluating teacher and principal “effectiveness” and merit pay based on test scores.

Arne Duncan’s Race to the Top program also requires the “lowest achieving schools,” defined as schools that 1) were in the bottom 5% of student performance based on test scores and 2) were receiving Title 1 funds, to agree to undergo an “intervention.” Such an intervention could include one of the following:

  • Closing the school.
  • Firing the principal.
  • Firing half of the teaching staff.
  • Closing the school and converting it into a charter school.

Additionally, the Race to the Top program requires the state to remove the cap on charter schools and adopt K-12 common core national standards.

This program was a competition between states with the winners taking a portion of the $4.35 billion appropriated for the Race to the Top program. With all states feeling the financial pinch, 40 states and the District of Columbia submitted proposals for Race to the Top funding. Twelve states were awarded a portion of the $4.35 billion based on their description of how they would meet the Race to the Top requirements. All states that submitted their plans were to follow through with their programs whether they received Race to the Top funding or not. For some states, the funding they received was not enough to cover the costs of developing and implementing both the new curriculum and the testing that was to be designed to accompany the Common Core Standards.

With this emphasis on test results, whether by way of Race to the Top or the No Child Left Behind waiver, the scope of instruction has narrowed to math and reading, within the tight confines of the Common Core Standards.

Part 2: High stakes testing and opting out: The Types of Tests


Video: The nonsense of Value Added Measures (VAM)

A peer-reviewed study on VAM: The SAS Education Value-Added Assessment System (SAS® EVAAS®) in the Houston Independent School District (HISD): Intended and Unintended Consequences. Education Policy Analysis Archives, 20(12)

The introduction:

The SAS Educational Value-Added Assessment System (SAS® EVAAS®) is the most widely used value-added system in the country. It is also self-proclaimed as “the most robust and reliable” system available, with its greatest benefit to help educators improve their teaching practices. This study critically examined the effects of SAS® EVAAS® as experienced by teachers, in one of the largest, high-needs urban school districts in the nation — the Houston Independent School District (HISD). Using a multiple methods approach, this study critically analyzed retrospective quantitative and qualitative data to better comprehend and understand the evidence collected from four teachers whose contracts were not renewed in the summer of 2011, in part given their low SAS® EVAAS® scores. This study also suggests some intended and unintended effects that seem to be occurring as a result of SAS® EVAAS® implementation in HISD. In addition to issues with reliability, bias, teacher attribution, and validity, high-stakes use of SAS® EVAAS® in this district seems to be exacerbating unintended effects.



Value Added Measures: What they are and what they’re not

Value Added Measures, also referred to as VAM and sometimes as Value Added Modeling or Value Added Assessments, have made their way to Seattle. They have been pushed by Bill Gates and other ed reformers who are not educated in the field of education or, for that matter, in mathematical formulas and their meanings.

In our state, this evaluation system was heralded by the League of Education Voters (LEV) and the Washington State PTA (WSPTA) as a way to judge “teacher effectiveness,” a big item on the agenda of the ed reformers. Legislators in Olympia pushed the legislation through with the lobbying efforts of LEV, Stand for Children (SFC) and WSPTA, and LEV praised its passage as a success.

The reason for this push is to devalue the idea of seniority and to place the responsibility for the success or failure of our schools squarely on the shoulders of teachers, rather than on an entire set of circumstances that are not in their control. What it has done is dumb down the curriculum to the point where the focus in the classroom is on test preparation in a narrow scope of knowledge in math and English.

These same zealots, including Mayor Bloomberg at the time, began to publish test score evaluations of teachers in newspapers in Los Angeles and New York with disastrous results. The publishing of these test scores was even applauded by the Secretary of Education, Arne Duncan. One teacher in New York City was called out as the worst teacher in the city with her photo published on the cover of one of these rags. She was highly regarded by her principal and colleagues but humiliated publicly. One young teacher in Los Angeles committed suicide after his test scores were published. His family and friends believe that much of it had to do with his evaluation based on student test scores. See A teacher pushed to the edge.

For other articles on the subject of these witch hunts see: Carolyn Abbott, The Worst 8th Grade Math Teacher In New York City, Victim Of Her Own Success, These Are The Worst Teachers In New York City and The True Story of Pascale Mauclair.

For a simple breakdown of VAM, I would recommend this video.

For a more scholarly description there is Mathematical Intimidation: Driven by the Data written by mathematician John Ewing:

Mathematicians occasionally worry about the misuse of their subject. G. H. Hardy famously wrote about mathematics used for war in his autobiography, A Mathematician’s Apology (and solidified his reputation as a foe of applied mathematics in doing so). More recently, groups of mathematicians tried to organize a boycott of the Star Wars [missile defense] project on the grounds that it was an abuse of mathematics. And even more recently some fretted about the role of mathematics in the financial meltdown.

But the most common misuse of mathematics is simpler, more pervasive, and (alas) more insidious: mathematics employed as a rhetorical weapon—an intellectual credential to convince the public that an idea or a process is “objective” and hence better than other competing ideas or processes. This is mathematical intimidation. It is especially persuasive because so many people are awed by mathematics and yet do not understand it—a dangerous combination.

The latest instance of the phenomenon is value-added modeling (VAM), used to interpret test data. Value-added modeling pops up everywhere today, from newspapers to television to political campaigns. VAM is heavily promoted with unbridled and uncritical enthusiasm by the press, by politicians, and even by (some) educational experts, and it is touted as the modern, “scientific” way to measure educational success in everything from charter schools to individual teachers.

Yet most of those promoting value-added modeling are ill-equipped to judge either its effectiveness or its limitations. Some of those who are equipped make extravagant claims without much detail, reassuring us that someone has checked into our concerns and we shouldn’t worry. Value-added modeling is promoted because it has the right pedigree — because it is based on “sophisticated mathematics.” As a consequence, mathematics that ought to be used to illuminate ends up being used to intimidate. When that happens, mathematicians have a responsibility to speak out.


Value-added models are all about tests—standardized tests that have become ubiquitous in K–12 education in the past few decades. These tests have been around for many years, but their scale, scope, and potential utility have changed dramatically.

Fifty years ago, at a few key points in their education, schoolchildren would bring home a piece of paper that showed academic achievement, usually with a percentile score showing where they landed among a large group. Parents could take pride in their child’s progress (or fret over its lack); teachers could sort students into those who excelled and those who needed remediation; students could make plans for higher education.

Today, tests have more consequences. “No Child Left Behind” mandated that tests in reading and mathematics be administered in grades 3–8. Often more tests are given in high school, including high-stakes tests for graduation.

With all that accumulating data, it was inevitable that people would want to use tests to evaluate everything educational—not merely teachers, schools, and entire states but also new curricula, teacher training programs, or teacher selection criteria. Are the new standards better than the old? Are experienced teachers better than novice? Do teachers need to know the content they teach?

Using data from tests to answer such questions is part of the current “student achievement” ethos—the belief that the goal of education is to produce high test scores. But it is also part of a broader trend in modern society to place a higher value on numerical (objective) measurements than verbal (subjective) evidence. But using tests to evaluate teachers, schools, or programs has many problems. (For a readable and comprehensive account, see [Koretz 2008].) Here are four of the most important problems, taken from a much longer list.

1. Influences. Test scores are affected by many factors, including the incoming levels of achievement, the influence of previous teachers, the attitudes of peers, and parental support. One cannot immediately separate the influence of a particular teacher or program from all those other variables.

2. Polls. Like polls, tests are only samples. They cover only a small selection of material from a larger domain. A student’s score is meant to represent how much has been learned on all material, but tests (like polls) can be misleading.

3. Intangibles. Tests (especially multiple-choice tests) measure the learning of facts and procedures rather than the many other goals of teaching. Attitude, engagement, and the ability to learn further on one’s own are difficult to measure with tests. In some cases, these “intangible” goals may be more important than those measured by tests. (The father of modern standardized testing, E. F. Lindquist, wrote eloquently about this [Lindquist 1951]; a synopsis of his comments can be found in [Koretz 2008, 37].)

4. Inflation. Test scores can be increased without increasing student learning. This assertion has been convincingly demonstrated, but it is widely ignored by many in the education establishment [Koretz 2008, chap. 10]. In fact, the assertion should not be surprising. Every teacher knows that providing strategies for test-taking can improve student performance and that narrowing the curriculum to conform precisely to the test (“teaching to the test”) can have an even greater effect. The evidence shows that these effects can be substantial: One can dramatically increase test scores while at the same time actually decreasing student learning. “Test scores” are not the same as “student achievement.”
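The second problem, that a test is only a sample, can be made concrete with a little arithmetic. The sketch below is a hedged illustration of our own, not something drawn from Koretz or the testing literature: it assumes a student has mastered a fraction `p` of a domain and that a test draws `n_items` questions independently at random from that domain, so the fraction correct behaves like a binomial proportion with standard error √(p(1 − p)/n). The function names and numbers are hypothetical.

```python
import math
import random


def standard_error(p: float, n_items: int) -> float:
    """Standard error of the fraction correct when a test samples n_items
    questions at random from a domain in which the student has mastered a
    fraction p (a binomial assumption made up for this illustration)."""
    return math.sqrt(p * (1 - p) / n_items)


def simulate_scores(p: float, n_items: int, n_tests: int, seed: int = 0) -> list[float]:
    """Give the same hypothetical student n_tests independently sampled tests
    and return the fraction correct on each one."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_tests):
        # Each sampled item is answered correctly with probability p.
        correct = sum(rng.random() < p for _ in range(n_items))
        scores.append(correct / n_items)
    return scores


if __name__ == "__main__":
    p, n = 0.7, 40  # hypothetical mastery level and test length
    print(f"standard error: {standard_error(p, n):.3f}")
    scores = simulate_scores(p, n, n_tests=1000)
    print(f"same student, 1000 tests: lowest {min(scores):.2f}, highest {max(scores):.2f}")
```

With `p = 0.7` and 40 items the standard error is about 0.072, so the same student could plausibly score anywhere from roughly 55% to 85% (two standard errors either side) on different forms of the test, before any of the other three problems come into play.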

This last problem plays a larger role as the stakes increase. This is often referred to as Campbell’s Law: “The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to measure” [Campbell 1976]. In its simplest form, this can mean that high-stakes tests are likely to induce some people (students, teachers, or administrators) to cheat … and they do [Gabriel 2010].

But the more common consequence of Campbell’s Law is a distortion of the education experience, ignoring things that are not tested (for example, student engagement and attitude) and concentrating on precisely those things that are.

Value-Added Models

In the past two decades, a group of statisticians has focused on addressing the first of these four problems. This was natural. Mathematicians routinely create models for complicated systems that are similar to a large collection of students and teachers with many factors affecting individual outcomes over time.

Here’s a typical, although simplified, example, called the “split-plot design.” You want to test fertilizer on a number of different varieties of some crop. You have many plots, each divided into subplots. After assigning particular varieties to each subplot and randomly assigning levels of fertilizer to each whole plot, you can then sit back and watch how the plants grow as you apply the fertilizer. The task is to determine the effect of the fertilizer on growth, distinguishing it from the effects from the different varieties. Statisticians have developed standard mathematical tools (mixed models) to do this.
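A toy simulation may help make the split-plot layout concrete. Everything in it is hypothetical: three made-up varieties with assumed fixed effects, an assumed true fertilizer effect of 3 units, and a random effect shared by all subplots of a whole plot. Rather than the mixed-model machinery the text refers to, it uses the simplest valid estimate for this design: because fertilizer is randomized to whole plots and every plot contains every variety, comparing whole-plot means separates the fertilizer effect from the variety effects.

```python
import random
import statistics

VARIETIES = ["A", "B", "C"]                       # hypothetical crop varieties
VARIETY_EFFECT = {"A": 0.0, "B": 2.0, "C": -1.0}  # assumed fixed effects
FERTILIZER_EFFECT = 3.0                           # assumed true effect to recover


def simulate_split_plot(n_plots: int, seed: int = 1) -> list[dict]:
    """Simulate a split-plot trial: fertilizer is randomized at the whole-plot
    level, and every whole plot is split into one subplot per variety."""
    rng = random.Random(seed)
    rows = []
    for plot in range(n_plots):
        fertilized = rng.random() < 0.5   # whole-plot treatment, random assignment
        plot_effect = rng.gauss(0, 1.5)   # random effect shared by the whole plot
        for variety in VARIETIES:         # subplot factor
            yield_ = (10.0 + VARIETY_EFFECT[variety]
                      + (FERTILIZER_EFFECT if fertilized else 0.0)
                      + plot_effect + rng.gauss(0, 0.5))
            rows.append({"plot": plot, "fertilized": fertilized,
                         "variety": variety, "yield": yield_})
    return rows


def fertilizer_estimate(rows: list[dict]) -> float:
    """Estimate the fertilizer effect from whole-plot means. Every plot holds
    every variety, so averaging within a plot cancels the variety effects;
    the whole plot is the unit that received the fertilizer."""
    plot_yields: dict[int, list[float]] = {}
    plot_fertilized: dict[int, bool] = {}
    for r in rows:
        plot_yields.setdefault(r["plot"], []).append(r["yield"])
        plot_fertilized[r["plot"]] = r["fertilized"]
    treated = [statistics.fmean(v) for p, v in plot_yields.items() if plot_fertilized[p]]
    control = [statistics.fmean(v) for p, v in plot_yields.items() if not plot_fertilized[p]]
    return statistics.fmean(treated) - statistics.fmean(control)


if __name__ == "__main__":
    rows = simulate_split_plot(n_plots=400)
    print(f"{len(rows)} subplots; estimated fertilizer effect: {fertilizer_estimate(rows):.2f}")
```

The point that carries over to classrooms is that subplots within a plot share a random effect, so they are not independent observations of the treatment; handling that shared structure properly is what mixed models are for.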

Does this situation sound familiar? Varieties, plots, fertilizer … students, classrooms, teachers?

Dozens of similar situations arise in many areas, from agriculture to MRI analysis, always with the same basic ingredients—a mixture of fixed and random effects—and it is therefore not surprising that statisticians suggested using mixed models to analyze test data and determine “teacher effects.”

This is often explained to the public by analogy. One cannot accurately measure the quality of a teacher merely by looking at the scores on a single test at the end of a school year. If one teacher starts with all poorly prepared students while another starts with all excellent ones, we would be misled by scores from a single test given to each class.

To account for such differences, we might use two tests, comparing scores from the end of one year to the next. The focus is on how much the scores increase rather than the scores themselves. That’s the basic idea behind “value added.” But value-added models (VAMs) are much more than merely comparing successive test scores.
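The basic idea of comparing successive scores can be shown with a few hypothetical numbers, invented for this illustration and not taken from any study. Class X starts with weaker students and ends the year with lower scores than class Y, yet its students gain far more over the year.

```python
from statistics import fmean


def mean_gain(prior: list[float], current: list[float]) -> float:
    """Average year-over-year change, pairing each student's prior score with
    the current score at the same position in the lists."""
    if len(prior) != len(current):
        raise ValueError("each student needs both a prior and a current score")
    return fmean([c - p for p, c in zip(prior, current)])


# Hypothetical scores for two classes of three students each.
class_x = {"prior": [40, 45, 50], "current": [55, 58, 62]}
class_y = {"prior": [80, 85, 90], "current": [84, 88, 95]}

for name, cls in [("X", class_x), ("Y", class_y)]:
    print(f"class {name}: end-of-year mean {fmean(cls['current']):.1f}, "
          f"mean gain {mean_gain(cls['prior'], cls['current']):.1f}")
```

A single end-of-year test puts class Y far ahead (89.0 against 58.3), while the mean gain reverses the picture (4.0 against 13.3). Real value-added models, as the text goes on to say, go well beyond this subtraction.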

Given many scores (say, grades 3–8) for many students with many teachers at many schools, one creates a mixed model for this complicated situation. The model is supposed to take into account all the factors that might influence test results: the student’s past history, socioeconomic status, and so forth. The aim is to predict, based on all these past factors, the growth in test scores for students taught by a particular teacher. The difference between the actual growth and the predicted growth represents this more sophisticated “value added”: good when the actual growth is larger than expected, bad when it is smaller.
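The step from “predicted growth” to “value added” can be sketched in a few lines. This is a drastic simplification, assumed for illustration only: a single predictor (the prior score), one ordinary least-squares line fitted across all students, and each teacher’s value added taken as the average residual (actual minus expected) of that teacher’s students. Real models include many more factors, and the ones discussed here are mixed models rather than a single regression line. The teacher labels and scores are hypothetical.

```python
from statistics import fmean


def fit_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def value_added(records: list[tuple[str, float, float]]) -> dict[str, float]:
    """records holds (teacher, prior_score, current_score) for each student.
    Predict each current score from the prior score with one line fitted to
    all students, then average each teacher's residuals (actual - expected)."""
    a, b = fit_line([r[1] for r in records], [r[2] for r in records])
    residuals: dict[str, list[float]] = {}
    for teacher, prior, current in records:
        expected = a + b * prior
        residuals.setdefault(teacher, []).append(current - expected)
    return {t: fmean(res) for t, res in residuals.items()}


# Hypothetical data: both teachers start with the same mix of students.
records = [
    ("T1", 50, 62), ("T1", 60, 72), ("T1", 70, 82),
    ("T2", 50, 58), ("T2", 60, 68), ("T2", 70, 78),
]
print(value_added(records))  # {'T1': 2.0, 'T2': -2.0}
```

Because least-squares residuals sum to zero, these scores are relative: with equal class sizes, one teacher’s positive value added is exactly offset by the other’s negative value added. Each teacher is rated against the fitted expectation for the whole group, not against an absolute standard.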

The best-known VAM, devised by William Sanders, is a mixed model (actually, several models), which is based on Henderson’s mixed-model equations, although mixed models originate much earlier [Sanders 1997]. One calculates (a huge computational effort!) the best linear unbiased predictors for the effects of teachers on scores. The precise details are unimportant here, but the process is similar to all mathematical modeling, with underlying assumptions and a number of choices in the model’s construction.
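The phrase “best linear unbiased predictor” hides one behavior worth seeing directly. The sketch below is not Sanders’s model, which is far more elaborate; it assumes the simplest one-way random-effects model, y_ij = μ + t_j + e_ij, with teacher effects t_j of variance σ_t² and student-level noise e_ij of variance σ_e², and it treats μ and both variances as known. Under those assumptions the BLUP of a teacher’s effect is the class’s mean deviation from μ multiplied by the shrinkage factor n·σ_t² / (n·σ_t² + σ_e²), where n is the number of students. The variance values in the example are hypothetical.

```python
from statistics import fmean


def blup_teacher_effects(
    scores_by_teacher: dict[str, list[float]],
    grand_mean: float,
    var_teacher: float,
    var_error: float,
) -> dict[str, float]:
    """BLUP of each teacher's random effect t_j in the one-way model
    y_ij = mu + t_j + e_ij, treating mu (grand_mean) and both variances as
    known. Small classes get a shrinkage factor near 0, so their estimates
    are pulled strongly toward the overall mean."""
    effects = {}
    for teacher, scores in scores_by_teacher.items():
        n = len(scores)
        shrink = n * var_teacher / (n * var_teacher + var_error)
        effects[teacher] = shrink * (fmean(scores) - grand_mean)
    return effects


# Hypothetical residual gains: both classes average 6 points above the mean,
# but one class has a single student and the other has 25.
scores = {"small_class": [6.0], "large_class": [6.0] * 25}
print(blup_teacher_effects(scores, grand_mean=0.0, var_teacher=1.0, var_error=9.0))
```

Both classes have the same raw average, but the one-student class is credited with only about 0.6 points and the 25-student class with about 4.4. This is the small-class adjustment raised among the concerns about VAMs: the shrinkage is built into the method, and how much it pulls depends entirely on the assumed variances.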


When value-added models were first conceived, even their most ardent supporters cautioned about their use [Sanders 1995, abstract]. They were a new tool that allowed us to make sense of mountains of data, using mathematics in the same way it was used to understand the growth of crops or the effects of a drug. But that tool was based on a statistical model, and inferences about individual teachers might not be valid, either because of faulty assumptions or because of normal (and expected) variation.

Such cautions were qualified, however, and one can see the roots of the modern embrace of VAMs in two juxtaposed quotes from William Sanders, the father of the value-added movement, which appeared in an article in Teacher Magazine in the year 2000. The article’s author reiterates the familiar cautions about VAMs, yet in the next paragraph seems to forget them:

Sanders has always said that scores for individual teachers should not be released publicly. “That would be totally inappropriate,” he says. “This is about trying to improve our schools, not embarrassing teachers. If their scores were made available, it would create chaos because most parents would be trying to get their kids into the same classroom.”

Still, Sanders says, it’s critical that ineffective teachers be identified. “The evidence is overwhelming,” he says, “that if any child catches two very weak teachers in a row, unless there is a major intervention, that kid never recovers from it. And that’s something that as a society we can’t ignore” [Hill 2000].

Over the past decade, such cautions about VAM slowly evaporated, especially in the popular press. A 2004 article in The School Administrator complains that there have not been ways to evaluate teachers in the past but excitedly touts value added as a solution:

“Fortunately, significant help is available in the form of a relatively new tool known as value-added assessment. Because value-added isolates the impact of instruction on student learning, it provides detailed information at the classroom level. Its rich diagnostic data can be used to improve teaching and student learning. It can be the basis for a needed improvement in the calculation of adequate yearly progress. In time, once teachers and administrators grow comfortable with its fairness, value-added also may serve as the foundation for an accountability system at the level of individual educators.” [Hershberg 2004, 1]

And newspapers such as The Los Angeles Times get their hands on seven years of test scores for students in the L.A. schools and then publish a series of exposés about teachers, based on a value-added analysis of test data, which was performed under contract [Felch 2010]. The article explains its methodology:

“The Times used a statistical approach known as value-added analysis, which rates teachers based on their students’ progress on standardized tests from year to year. Each student’s performance is compared with his or her own in past years, which largely controls for outside influences often blamed for academic failure: poverty, prior learning and other factors.

Though controversial among teachers and others, the method has been increasingly embraced by education leaders and policymakers across the country, including the Obama administration.”

It goes on to draw many conclusions, including:

“Many of the factors commonly assumed to be important to teachers’ effectiveness were not. Although teachers are paid more for experience, education and training, none of this had much bearing on whether they improved their students’ performance.”

The writer adds the now-common dismissal of any concerns:

“No one suggests using value-added analysis as the sole measure of a teacher. Many experts recommend that it count for half or less of a teacher’s overall evaluation.

“Nevertheless, value-added analysis offers the closest thing available to an objective assessment of teachers. And it might help in resolving the greater mystery of what makes for effective teaching, and whether such skills can be taught.”

The article goes on to do exactly what it says “no one suggests” — it measures teachers solely on the basis of their value-added scores.

What Might Be Wrong with VAM?

As the popular press promoted value-added models with ever-increasing zeal, there was a parallel, much less visible scholarly conversation about the limitations of value-added models. In 2003 a book with the title Evaluating Value-Added Models for Teacher Accountability laid out some of the problems and concluded:

“The research base is currently insufficient to support the use of VAM for high-stakes decisions. We have identified numerous possible sources of error in teacher effects and any attempt to use VAM estimates for high-stakes decisions must be informed by an understanding of these potential errors.” [McCaffrey 2003, xx]

In the next few years, a number of scholarly papers and reports raising concerns were published, including papers with such titles as “The Promise and Peril of Using Value-Added Modeling to Measure Teacher Effectiveness” [RAND 2004], “Re-Examining the Role of Teacher Quality in the Educational Production Function” [Koedel 2007], and “Methodological Concerns about the Education Value-Added Assessment System” [Amrein-Beardsley 2008].

What were the concerns in these papers? Here is a sample that hints at the complexity of issues.

• In the real world of schools, data is frequently missing or corrupt. What if students are missing past test data? What if past data was recorded incorrectly (not rare in schools)? What if students transferred into the school from outside the system?

• The modern classroom is more variable than people imagine. What if students are team-taught? How do you apportion credit or blame among various teachers? Do teachers in one class (say mathematics) affect the learning in another (say science)?

• Every mathematical model in sociology has to make rules, and they sometimes seem arbitrary. For example, what if students move into a class during the year? (Rule: Include them if they are in class for 150 or more days.) What if we have only a couple of years of test data, or possibly more than five years? (Rule: The range three to five years is fixed for all models.) What’s the rationale for these kinds of rules?

• Class sizes differ in modern schools, and the nature of the model means there will be more variability for small classes. (Think of a class of one student.) Adjusting for this will necessarily drive teacher effects for small classes toward the mean. How does one adjust sensibly?

• While the basic idea underlying value-added models is the same, there are in fact many models. Do different models applied to the same data sets produce the same results? Are value-added models “robust”?

• Since models are applied to longitudinal data sequentially, it is essential to ask whether the results are consistent year to year. Are the computed teacher effects comparable over successive years for individual teachers? Are value-added models “consistent”?
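The last question can be explored with a small simulation. It is a hedged sketch under made-up assumptions, not a result from the papers cited: each hypothetical teacher has a fixed “true” effect, and each year’s estimate adds independent estimation noise of the same size. Under that assumption the expected year-to-year correlation is σ_true² / (σ_true² + σ_noise²), so equal true and noise spreads give a correlation of about 0.5. The code also counts how many teachers ranked in the bottom fifth one year land there again the next.

```python
import math
import random
from statistics import fmean


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate_two_years(
    n_teachers: int, sd_true: float, sd_noise: float, seed: int = 2
) -> tuple[list[float], list[float]]:
    """Return estimated teacher effects for two years. Each teacher keeps the
    same true effect; each year's estimate adds independent noise (assumed)."""
    rng = random.Random(seed)
    true_effects = [rng.gauss(0, sd_true) for _ in range(n_teachers)]
    year1 = [t + rng.gauss(0, sd_noise) for t in true_effects]
    year2 = [t + rng.gauss(0, sd_noise) for t in true_effects]
    return year1, year2


def bottom_repeat_share(year1: list[float], year2: list[float], frac: float = 0.2) -> float:
    """Share of teachers in the bottom `frac` in year 1 who are also in the
    bottom `frac` in year 2."""
    k = int(len(year1) * frac)
    bottom1 = set(sorted(range(len(year1)), key=lambda i: year1[i])[:k])
    bottom2 = set(sorted(range(len(year2)), key=lambda i: year2[i])[:k])
    return len(bottom1 & bottom2) / k


if __name__ == "__main__":
    y1, y2 = simulate_two_years(n_teachers=2000, sd_true=1.0, sd_noise=1.0)
    print(f"year-to-year correlation: {pearson(y1, y2):.2f}")
    print(f"bottom fifth again next year: {bottom_repeat_share(y1, y2):.0%}")
```

Changing `sd_noise` relative to `sd_true` shows how quickly rankings churn as estimation noise grows; the values used here are placeholders, not estimates of the noise in any real value-added system.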

To read the complete paper, go to Mathematical Intimidation: Driven by the Data.

John Ewing is president of Math for America, a nonprofit organization dedicated to improving mathematics education in U.S. public high schools by recruiting, training and retaining great teachers. The article originally appeared in the May Notices of the American Mathematical Society.

Another take on the subject comes from Diane Ravitch, in a piece titled “The Problems With Value-Added Assessment.”