Why test scores aren’t valid for teacher evaluation

From Diane Ravitch’s Bridging Differences blog on Education Week — the best explanation I’ve read yet for why it’s not valid to use student test scores in teacher evaluation:

I received an email from Dr. Harry Frank, an emeritus professor of psychology at the University of Michigan who has written textbooks about testing and measurement. Dr. Frank wrote that the first principle for valid assessment is that “no assessment can be used at the same time for both counseling and for administrative decisions (retention, increment, tenure, promotion). … All this does is promote cheating and teaching to the exam. … This principle is so basic that it’s often covered in the very first chapter of introductory texts on workplace performance evaluation.” [The full text of Dr. Frank’s email is posted on my Web site, www.dianeravitch.com, in a section called “comments.”] I asked Dr. Frank to explain the word “counseling,” and he said that this meant “feedback on performance for purposes of skills development,” what we might think of as the diagnostic use of an assessment. Dr. Frank also added: “Assessments should be a counseling resource, not a source of extrinsic motivation, i.e., rewards and punishments for teachers, administrators, and school districts.”

Put simply, tests and assessments should inform teachers about student progress and their own teaching, i.e., what can be learned from the test results. But it is inappropriate to use the same test results to hand out bonuses and punishments, promotions and tenure.


6 responses to “Why test scores aren’t valid for teacher evaluation”

  1. Shawn, interesting you bring up the idea of formative assessments. Developing these assessments is a major reform effort being spearheaded by Deputy Supt Carranza as we speak. In last night’s consent calendar the Board approved a $200k contract with a service that will develop these assessments for every grade by the beginning of next year.

  2. Shawn (Educator)

    It’s amazing to me that so much focus is put on the CST–it is a basic recall, fill in the bubble biased test. White kids do well on the CST and English Language Learners and Standard English Language Learners generally don’t (unless you look at schools that are using home language–like Ebonics–as a way to empower and teach kids to code-switch). It does not show if kids know how to read, write, speak, perform, problem solve or use any of the other multiple intelligences. Teachers should certainly not be judged by the CST scores, but rather should be evaluated through formative means. If the District didn’t ask principals to combine summative and formative assessments into one classroom observation then we would see more progress with our teachers. I hear all the time on these blogs–“I heard a principal say that this teacher is a weak link…” What about the District placing inexperienced principals into schools that should have the more experienced ones? I got news for you–the upper echelon of the District are fighting in-house right now over positions and have no idea how to run a school system. Many of them are former teachers enamored by power who do not have the training to deal with politics. I don’t know how many times I’ve seen the central office cruise through the school with their clipboards and suits without a smile on their face.

  3. If they are more serious about “counseling”, the next grade’s teacher should get the class’s answer sheets from the previous year’s test.

  4. I’ve been told by more than a few principals that, year over year within a school, it’s pretty clear who is making progress with their students and who isn’t. In fact, one even noted that while he (of course) couldn’t share who those teachers were, they were the teachers that any student, fellow teacher or parent could (and usually do) name as weak.

  5. Hi Marcia – I certainly wouldn’t have implied that the CST would be useful as a way to provide any kind of feedback loop to teachers – you’re right that the cycle is way too long. And Ms. Ravitch’s article was written for a national audience so I’m sure she wasn’t referring to the CST either.

  6. Marcia Lomneth

    Your article and Diane Ravitch’s imply that California STAR Testing (CST tests) could be used for “counseling.” But they can’t really be used for that kind of “feedback on performance for purposes of skills development” because the results don’t come back soon enough. The kids are already in the next grade when their scores arrive.
    Additionally, teachers and students never know which questions were missed. They only get the summary score for a topic. For California Math standards, the topic groupings are essentially meaningless.
    Since the CST data can’t be used for counseling, it seems reasonable that student growth (as measured by those tests) should be used for teacher evaluations.
    It is generally agreed that the Math CST test results are a very good indicator of what students have learned in a school year. Granted, for English/Language Arts or Science, CST tests may not be quite as useful, because multiple choice tests work better for math problems (with one correct answer) than they do for evaluating reading, writing or science concepts.

    As for cheating, that would be hard to do in Middle and High Schools, because the tests are not given in the regular classrooms. They are generally given in homeroom. The subject teacher is rarely, if ever, testing the students he or she teaches.
    Good teachers regularly move students up a level, and sadly struggling teachers often have a class that comes in at or above grade level, and the kids sink to Basic or Below Basic in one year.
    For judging a teacher, perhaps we should look at an average of two or three years of data to look for consistent student growth trends.
    The tests we have in California are not perfect, but they are good enough: they are useful enough to identify classes and teachers where the best learning is going on.
    One principal I know looked for classes with a large percentage of what he called “sliders” to identify where many kids lost ground after a year with a particular teacher. Teachers who have that problem should be put on a program, and either be helped to improve, or moved to a job that better matches that person’s talents.