Wednesday, August 3, 2011

Scoring Clicker Data

I was recently presented with some clicker data to examine (GMW11). It had been scored by traditional right count scoring. There were a number of scores below 20%. There were even four students with a score of zero, well below the roughly 20% expected from blind guessing on questions with five options each.

A different score distribution was produced by scoring the data for both knowledge and judgment. This distribution looks very much like what one would expect from students on their first introduction to Knowledge and Judgment Scoring. Students earn the same scores by both methods of scoring when they fail to exercise their own judgment (mark an answer to every question). The top three students therefore obtained the same score with both methods of scoring.

Here is an opportunity to compare Right Mark Scoring (RMS) and Knowledge and Judgment Scoring (KJS) when used on any multiple-choice test. There is one catch: both methods of scoring are being applied to one and the same set of answer sheets.

Normally students would elect which method they felt comfortable using (and if time permits, on the first or second test, they may fill out two answer sheets, one for each method of scoring). The same test data can support a number of different stories. This story will assume that the test was presented with a choice of RMS and KJS, and further, that this was the first such test for the class. Most would be expected to select what they are most familiar with: RMS.

When quantity and quality are scatterplotted from RMS data, the result is a straight line. Only one dimension is being measured: a count of right marks.
KJS data are two-dimensional: a range of quality scores can yield the same test score. The test score of 46%, for example, was earned by students with quality scores ranging from zero to 44%.

Higher quality students are found above 50%. Lower quality students are found below 50%. Higher quality students get higher scores by marking more right answers. Only one student marked a perfect 100% quality score (no wrong marks).

Lower quality students get lower scores by marking more wrong answers. Four marked a zero quality score (every one of their two to ten marks on the 23-question test was wrong).

Quantity and quality have been given equal value. The active test score then starts at 50%: 1 point for right, 1 point for good judgment (omit or right), and zero for wrong (poor judgment). [Back in the 1970s, when this work first began, the active test score started with zero. It was called net yield scoring: right minus wrong. The discovery of the quality score produced the second dimension that assesses student performance rather than defaulting to luck on test day.]
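The scoring rule above can be sketched in a few lines of code. This is a minimal illustration, not the author's actual scoring software: it assumes each answer sheet is a list of marked options, with None standing for an omitted question, and the function names and data layout are my own.

```python
def rms_score(marks, key):
    """Right Mark Scoring: a simple count of right marks."""
    right = sum(1 for m, k in zip(marks, key) if m == k)
    return right / len(key)

def kjs_score(marks, key):
    """Knowledge and Judgment Scoring: 1 point for a right mark,
    1 point for good judgment (a right mark or an omit), and zero
    for a wrong mark.  An all-omit sheet therefore scores 50%."""
    n = len(key)
    right = sum(1 for m, k in zip(marks, key) if m == k)
    omitted = sum(1 for m in marks if m is None)
    marked = n - omitted
    score = (right + right + omitted) / (2 * n)   # knowledge + judgment points
    quality = right / marked if marked else 1.0   # right marks / marks made
    quantity = right / n                          # right marks / questions
    return score, quantity, quality
```

On the 23-question test described above, a student who marks only two answers, both wrong, and omits the rest earns (0 + 0 + 21) / 46 ≈ 46% with a quality score of zero, which matches the cluster of zero-quality students at the 46% test score.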

The end result of training students to accurately report what they trust they know or can do is shown in the Fall88 scatterplot. After an initial test (such as the clicker data) where most students elect RMS, they change study habits, and voluntarily switch to KJS. Here most of the class show a quality score about one letter grade higher than their test score. There is a bit of a disconnect at the pass/fail line of 60% (70% C, 80% B, and 90% A). Experienced students feel more comfortable reporting what they know than guessing at answers on all items on the test. They are on the path to being independent learners (self-correcting scholars).

This is in contrast to traditional right mark scoring, where any score can be one letter grade higher (with good luck) or one letter grade lower (with bad luck) than the student's actual ability. A grade of B one day may be a D on another day. And no one, including the student, knows what the student actually knows and can do as the basis for further learning and instruction.

Grading has an important effect on which scoring method students select: RMS (“I mark, you score”) or KJS (self-assessment, “I tell you”). RMS students tend to cram and to match. KJS students bring a rich web of relationships (from learning by questioning, answering, and verifying) that they can apply to questions they have not seen before. There is an operational difference between remembering and understanding that can be measured (RMS vs. KJS).
