Monday, April 26, 2010

Multiple-Choice Lucky Scores

This past year, the news headlines could have read “Cheat or Chance” or “Trick or Teach.” The cut score for passing a multiple-choice test, scored by counting only right marks, continued to fall. The traditional multiple-choice scoring method was being pushed past its credibility limit.

Aug 11: “City students are passing standardized tests just by guessing”
Aug 17: “Guessing My Way to Promotion”
Sep 14: “Botched Most Answers on New York State Math Test? You Still Pass”
Sep 16: “Is any test reliable? CRCT? SAT? NAEP? ACT? Pick one”
Oct 31: “Duncan: States ‘set bar too low’”
Jan 11: “As School Exit Tests Prove Tough, States Ease Standards”

The 100-point 2009 Arkansas Algebra I (AAI) end-of-course test, mentioned in the last article, is a good example to examine to see how standardized testing actually works:

  1. Items for new AAI versions are trial-tested, in a current operational test, rather than field-tested on a selected sub-sample at a different time.
  2. A statewide Uniform Grading Scale is monitored for inflation by comparing the pass rate in school with the pass rate on the AAI.
  3. Arkansas has had a nearly perfect yearly increase in the AAI test score for the past nine years (see page 24 of 28).

The multiple-choice portion of the test is played on the traditional field of varying quality. At the high end, everyone knows what the examinee knows or can do, including the examinee. The scoring in Confidence Based Learning (CBL) plays in this region, as do the SAT and ACT when used to pick top quality winners.

Traditional Right Marked Scoring (RMS), used on the AAI, is played at the other, lower, end of the field. The examinee guesses and waits for the test score, and even then no one knows what the student knows or can do, including the examinee.

Knowledge and Judgment Scoring (KJS) permits students to individualize their test to match their preparation. They can opt for RMS or for KJS. They can opt for the teacher to tell them what they have right, or for reporting what they know and trust is right. They can opt for lower or higher-order thinking.

Chance plays almost no part in CBL. Chance is the main determiner of lucky scores. [YouTube]  This holds for any test using RMS, including the SAT, ACT, and end-of-course tests. 

The effects of unaltered pure chance can be seen on tests such as the AAI when:

  1. The answer sheets are marked randomly without looking at the test booklet.
  2. The answer sheets have no erasures.
  3. No marking pattern is used, such as wallpapering. Wallpapering reduces test anxiety by having students agree, before the test, on how they will mark forced-choice guesses (when they have finished reporting what they know and trust, but are not allowed to omit answers or leave blanks).
  4. Student judgment is absent or is given no value (RMS).

There are several ways to demonstrate the effects of chance on multiple-choice tests:

  1. Randomly mark 100 AAI answer sheets for the 60 multiple-choice questions.
  2. Use a quincunx board.
  3. Use the Excel function: BINOMDIST.

The quincunx board allows you to see chance in action: the force behind what is called creativity in Arts, Letters, and Politics, and error in Science, Math, and Engineering. The quincunx board works well for normal classroom tests with about 25 students (balls) and 8 questions (9 bins). (Number each student. Run slowly. Have each student follow his/her ball as it falls into a bin. Repeat and compare results for an added effect.)
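If you do not have a quincunx board handy, a few lines of Python can stand in for one. This is only a rough sketch: the function name `run_quincunx` and the seed values are made up for illustration, and it assumes each ball bounces left or right with equal chance at every row of pegs (which corresponds to guessing on two-option questions, not four-option ones).

```python
import random
from collections import Counter

def run_quincunx(balls=25, rows=8, seed=None):
    """Drop `balls` balls through `rows` rows of pegs.

    Returns a list of ball counts for bins 0..rows (rows + 1 bins).
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(balls):
        # At each row the ball bounces right (1) or left (0) with equal chance.
        bin_index = sum(rng.randint(0, 1) for _ in range(rows))
        counts[bin_index] += 1
    return [counts[b] for b in range(rows + 1)]

# Two runs with different seeds, to compare results as suggested above.
for trial in (1, 2):
    print(f"run {trial}: {run_quincunx(seed=trial)}")
```

Each run puts 25 balls into 9 bins. Repeating with a different seed shows how much the pattern changes from one run to the next with so few balls.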

The Excel function BINOMDIST can be set for almost any number of students and questions. A set of 100 answer sheets produces a surprisingly uniform distribution even though the right answer is expected by chance only one-fourth of the time.
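For readers without Excel, the same numbers can be approximated in Python. The sketch below assumes BINOMDIST is being used in its non-cumulative form (the probability of exactly k right marks), and the helper names `binom_pmf` and `expected_counts` are my own, not part of any library.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k right marks out of n questions by guessing."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def expected_counts(n_questions, p_right, n_students=100):
    """Expected number of students (out of n_students) at each raw score."""
    return [n_students * binom_pmf(k, n_questions, p_right)
            for k in range(n_questions + 1)]

# 60 four-option questions, 100 randomly marked answer sheets.
counts = expected_counts(60, 0.25)
for score in range(5, 26):
    bar = "#" * round(counts[score])
    print(f"{score:2d} right: {counts[score]:5.1f} students  {bar}")
```

The printed bars give a rough text version of the distribution. Changing `0.25` to another value shows the effect of a different number of answer options.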

The graph of 4-option questions shows that no student can expect to pass the AAI by guessing. Classroom passing is set equal to 24 raw score points out of 100 points in Arkansas. The maximum lucky score on the sixty 4-option questions was 23, and that happened for only about 1 out of 100 students. The required passing cut score of 37 points for graduation in Arkansas is far beyond the reach of lucky scores. [YouTube]
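The first method in the list above, randomly marking 100 answer sheets, can also be imitated in code. This is a sketch under assumptions: the answer key is generated at random, the function name `simulate_sheets` and the seed are illustrative, and each question is taken to be worth one raw score point.

```python
import random

def simulate_sheets(n_students=100, n_questions=60, n_options=4, seed=None):
    """Score randomly marked answer sheets against a random answer key."""
    rng = random.Random(seed)
    key = [rng.randrange(n_options) for _ in range(n_questions)]
    scores = []
    for _ in range(n_students):
        # Each mark is made without looking at the test booklet.
        marks = [rng.randrange(n_options) for _ in range(n_questions)]
        scores.append(sum(m == k for m, k in zip(marks, key)))
    return scores

scores = simulate_sheets(seed=2010)
print("lowest:", min(scores), "highest:", max(scores))
print("at or above 24:", sum(s >= 24 for s in scores))
print("at or above 37:", sum(s >= 37 for s in scores))
```

Different seeds give different lucky scores, so any single run may land a point or two above or below the graph. The count at or above 37 should stay at zero.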

But students can alter these results by exercising higher-order thinking skills. If students can, on average, discard one option on each question, they are then working with a 3-option question test. The classroom test equivalent of 24 raw score points can be passed with lucky scores. Some 17 (6 + 4 + 3 + 2 + 1 + 1) out of 100 students passed by guessing from the remaining three options. Students who do this are often referred to as “test wise.”
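The shift from four options to three can be checked by adding up the upper tail of the binomial distribution. The sketch below uses a hypothetical helper, `prob_at_least`, and assumes every question is a pure guess among the remaining options.

```python
from math import comb

def prob_at_least(cut, n, p):
    """Probability of cut or more right marks out of n by guessing."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(cut, n + 1))

for options in (4, 3):
    p = 1 / options
    for cut in (24, 37):
        expected = 100 * prob_at_least(cut, 60, p)
        print(f"{options} options, cut score {cut}: "
              f"about {expected:.2f} of 100 students")
```

With three options, the figure for a cut score of 24 should come out close to the 17 students counted above. With four options it stays under one student per hundred. The cut score of 37 remains out of reach in both cases.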

Students, teachers, test makers, and administrators can manipulate the effects of chance, for their benefit, in other ways.
