Wednesday, May 19, 2010

Wallpapering Traditional Multiple-Choice Tests

Wallpapering is preparing, in advance of the test, a mark pattern to be used when students do not have answers they can verify and trust. After marking all the questions that can be used to report what is known or can be done, students have three options:

  1. Turning in the answer sheet as-is yields an accurate, honest, but unfair score unless omit (judgment) is given a value equal to, or higher than, knowledge, as is done with Knowledge and Judgment Scoring (KJS) and Confidence Based Learning (CBL).

  2. Randomly marking the remaining questions gives judgment a value of zero. The lower the score falls, the less accurate, honest, and fair it becomes, until it reflects only answer-sheet-marking ability. At the lowest levels (orders) of thinking, the test is a high-anxiety academic casino game.

  3. Wallpapering is a defensive measure. It reduces test anxiety, increases fairness and test security, and lets everyone share the same good luck.

Being prepared reduces test anxiety. This includes knowing how to make a forced-choice mark when you do not have a trusted answer. The age-old advice is to pick one option, such as C. Wallpapering adds one more step: everyone in the class makes the same mark (with KJS and CBL, everyone just omits).

A fair test requires a fair starting score (which exists with KJS and CBL). The active starting score on traditional multiple-choice tests is about 33%, on average, and the spread of independent starting scores covers about two letter grades. Wallpapering reduces this range.
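To picture that spread, here is a minimal simulation sketch in Python. It assumes a 100-question, four-option test (so the average lucky score is 25%; the 33% above reflects a mix of 2- to 5-option questions); the student count and percentile cuts are illustrative:

```python
# Minimal sketch: the spread of "lucky" starting scores when a class of
# guessers marks every question on a 100-question, 4-option test.
import random

OPTIONS = 4        # answer options per question (assumed)
QUESTIONS = 100    # one point per question, so a score is a percent
STUDENTS = 10_000  # simulated guessers

scores = sorted(
    sum(1 for _ in range(QUESTIONS) if random.randrange(OPTIONS) == 0)
    for _ in range(STUDENTS)
)
mean = sum(scores) / STUDENTS
low, high = scores[int(0.025 * STUDENTS)], scores[int(0.975 * STUDENTS)]
print(f"average lucky score: {mean:.1f}%")
print(f"middle 95% of lucky scores: {low}% to {high}%")
# Typically prints an average near 25% with scores from about 17% to 34%:
# a spread of roughly two letter grades, from pure luck alone.
```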

Wallpapering produces a security code. The wallpaper marking pattern can be made as elaborate as needed. Over half of the marks on an answer sheet can come from wallpapering when test scores drop below 50%. A set of answer sheets on which every mark is either right or wallpaper, with no erasures, indicates no tampering.
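A minimal sketch of that security check, assuming a shared A-B-C-D wallpaper; the function name and the five-question key are illustrative:

```python
# Flag marks that are neither the keyed right answer nor the agreed
# wallpaper mark for that question -- possible tampering or erasure.
def suspicious_marks(sheet, key, pattern="ABCD"):
    return [i + 1 for i, (mark, right) in enumerate(zip(sheet, key))
            if mark != right and mark != pattern[i % len(pattern)]]

key   = ["B", "D", "A", "C", "A"]
sheet = ["B", "D", "C", "B", "A"]   # question 4 matches neither
print(suspicious_marks(sheet, key)) # [4]
```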

NCLB raw scores below 40% are now listed as Proficient in several states. The distribution of scores from marginal students of equal ability follows the normal curve of error, and the distribution widens as test scores descend. It is gambling: some pass, some fail. This is not fair.

Wallpapering reduces this unfairness. All students in the group (class) mark the same answer when they cannot trust making a right mark. They do the same thing at the same time rather than individually trusting to luck. On average, this does not change their individual test scores.

Wallpaper is a marking pattern created BEFORE seeing the test. Individual variation is markedly reduced. The simplest example is for everyone in the group to agree to mark the same letter when in doubt. More variable patterns can be created using mnemonics for easy recall. Short patterns can repeat every few questions: the Christmas tree repeats every 4 questions (A, B, C, D) on a 4-option test. Longer patterns can use poetry and music.
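A minimal sketch of filling a sheet with the Christmas-tree pattern; the function name is illustrative, and the pattern is keyed to the question number so every sheet in the class lines up:

```python
# Fill each omitted answer (None) with the wallpaper mark for that
# question, using a pattern that repeats every len(pattern) questions.
def wallpaper(answers, pattern="ABCD"):
    return [a if a is not None else pattern[i % len(pattern)]
            for i, a in enumerate(answers)]

# A student trusts answers 1, 2, and 5; questions 3 and 4 get wallpaper.
print(wallpaper(["B", "D", None, None, "A"]))  # ['B', 'D', 'C', 'D', 'A']
```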

Doing the same thing at the same time evolved in birds as a means of protecting individual members of the flock from predators. The tight formation protects each bird and decreases the energy needed to fly. The same protection and energy savings apply in schools of fish.

Wallpapering has this effect for marginal students taking tests scored with RMS. It reduces the random, lucky-score variation in individual test scores. Wallpapering lets students do the same thing at the same time with equal ease, whether marking a trusted right answer or marking the equivalent of omit under KJS or CBL. A few minutes of planning equal a few millennia of evolution in protecting marginal students from the vagaries of NCLB testing.



Wednesday, May 12, 2010

My Score Quality

Examiners can tell students, parents, and employers how a score relates to other examinees on a test. But how does it relate to everything else?

What does my score mean other than I passed the Arkansas Algebra I (AAI) end of course test? Am I ready for Algebra II? Have I mastered the general lifetime skills supported by learning Algebra? Did I take a lower-order thinking appreciation course or a higher-order thinking skills course? Did I just pass a graduation requirement and get a grade? Are the newspapers right that the course is not tough enough, that the passing cut score is too low?

Arkansas is one of five states with a Statewide Uniform Grading Scale for classroom tests. This is one way of indicating quality. The final determiner is how students perform in their next unit, next semester, or next job assignment.

Quality varies between states. The letter grade of “C” ranges from 70% to 77%. A classroom “D” is 60% in Arkansas and Florida, and 70% in South Carolina and Tennessee. The quality of a test score depends upon a number of factors, including scale scores. The AAI raw score equivalent to a classroom pass is 24%.


If the AAI test were all multiple-choice, every score would fall in the shadow of the lucky scores. A score of 25 would be nonsense. The cut scores of 21, 24, and 37 could be obtained by just marking the answer sheet without looking at the test. All cut scores would be shady, “no quality” scores.

Replacing 40 of the multiple-choice questions with five open-response questions toughens up the test. The lucky scores on the AAI 4-option questions now cast a shadow over just half of the playing field, from the 15% line to the 60% line. A score of 15 can be expected from lucky scores alone, down from 25, on average. Both 24 and 37 fall within the shaded half; they have a quality score of less than 50%. Any score below 50% is a low-quality score. Right mark scoring (RMS) holds students accountable for their luck on test day as much as, or more than, for what they know or can do.
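The lucky-score arithmetic here is simple enough to check in a line of Python (question counts taken from the text):

```python
# 60 remaining 1-point, 4-option multiple-choice questions marked blindly.
print(60 / 4)    # 15.0 expected lucky points, down from 100 / 4 = 25.0
```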

Psychometricians were not on the side of the students when they included the five open-response questions. However, these questions are, in general, non-functional. The test designed for 100 points actually functions as a test based on 60 points. The functional passing scores are 40% (24) and 60% (37) out of 60, even though the designed passing scores are 24 and 37 out of 100. Few multiple-choice tests using RMS function as designed.
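A minimal sketch of the designed-versus-functional arithmetic, using the cut scores from the text (the 60% above is rounded; 37/60 is nearer 62%):

```python
# Compare each cut score against the designed and functional point totals.
DESIGNED, FUNCTIONAL = 100, 60  # only 60 multiple-choice points function
for cut in (24, 37):
    print(f"cut {cut}: designed {cut / DESIGNED:.0%}, "
          f"functional {cut / FUNCTIONAL:.1%}")
# cut 24: designed 24%, functional 40.0%
# cut 37: designed 37%, functional 61.7%
```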
 
The AAI is designed for students to mark their best guess at the “best answer” on each question. Individual student test scores below 50% only have meaning after being averaged into a class or school score for ranking. (RMS remains the least expensive way to obtain school rankings.) This research technique fails to apply to individual students. A test score of 37% on a crippled multiple-choice test (no omit) is also a quality score of 37%. The test is not designed for students to report what they trust they know and can use as the basis for further learning and instruction. That requires the option missing from tests using RMS: omit (“I have yet to learn this”).

RMS and knowledge and judgment scoring (KJS) can be combined on the same test as a means of gently nudging students out of the habit of guessing and toward reporting what they actually know. The test scores and student counseling matrices guide students on the path from passive pupil to self-correcting high achiever. An additional dimension of information becomes available that is not obtainable with RMS, even when using the same test questions.

(Wallpapering has a third use with RMS. Along with reducing test anxiety and the variation in lucky starting scores, it allows KJS to extract ¾ of the quality information lost with RMS. A wallpaper key is added to the answer key and weight key.)

The learning cycle shortens as passive pupils become self-correcting, high-quality achievers. Boring classes become exciting adventures. A multiple-choice test that randomly passes and fails low-performing students of equal abilities with RMS becomes, with KJS and Confidence Based Learning (CBL), a seek-and-find task to report what is meaningful and useful for each student.

When students elect to report what they know and trust with KJS or CBL, they receive a quantity score, a quality score, and a test score. High-quality students obtain individual confirmation that they do know what they know, and that they are skilled at using this knowledge, regardless of the quantity of right marks. Success is doing more of what each student is good at doing. This is in contrast to RMS, where doing more of what low-scoring students are doing (guessing right answers) is a continuation of failure (a practice in continually failing schools).
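A minimal sketch of the three scores, assuming KJS weights consistent with the description elsewhere in these posts (a right mark earns a knowledge point plus a judgment point, an omit earns the judgment point alone, a wrong mark earns nothing); the function name is illustrative:

```python
# Quantity, quality, and test scores for one KJS answer sheet.
def kjs_scores(right, wrong, omitted):
    total = right + wrong + omitted
    quantity = right / total                     # share of questions right
    marked = right + wrong
    quality = right / marked if marked else 1.0  # right marks / all marks
    test = (2 * right + omitted) / (2 * total)   # 50% start for judgment
    return quantity, quality, test

# A student trusts and marks 20 of 60 questions, all of them right:
quantity, quality, test = kjs_scores(right=20, wrong=0, omitted=40)
print(f"quantity {quantity:.0%}, quality {quality:.0%}, test {test:.0%}")
# quantity 33%, quality 100%, test 67%
```

Note how the quality score of 100% confirms the student's judgment even though the quantity of right marks is low.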

Assessment should produce high-quality scores and promote the development of high-quality students. CBL differentiates questions into informed, uninformed, misinformed, and good judgment (to omit, to question, and to not make a serious error). KJS sorts questions into expected, difficult, misconception, and good judgment (to not make a wrong mark, and thus report what has yet to be learned). Quality is independent of quantity.
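One way to picture the CBL sort, as a hedged sketch; the mapping rule and the confidence flag are assumptions, since CBL implementations differ:

```python
# Classify one response into the four CBL categories named above.
def cbl_category(mark, key, confident=True):
    if mark is None:
        return "good judgment (omit)"
    if not confident:
        return "uninformed (a guess, right or wrong)"
    return "informed" if mark == key else "misinformed"

print(cbl_category("A", "A"))                   # informed
print(cbl_category("B", "A"))                   # misinformed
print(cbl_category("B", "A", confident=False))  # uninformed
print(cbl_category(None, "A"))                  # good judgment (omit)
```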

Secretary of Education Arne Duncan’s opinion: “At a time when we should be raising standards to compete in the global economy, more states are lowering the bar than raising it. We're lying to our children when we tell them they're proficient but they're not achieving at a level that will prepare them for success once they graduate.”



Thursday, May 6, 2010

Three Multiple-Choice Games



Three multiple-choice games can be played on the same field. Each has its own rules for scoring and grading. [YouTube]

The 2009 Arkansas Algebra I (AAI) end-of-course test has a game field designed with 100 points, the same number as yards on a football field. The field slopes from a swamp down at the left end, where the guessers play, up to dry land where the 100% goal posts stand.

The number of answer options for each multiple-choice question controls how much of play is luck. The more options per question, the more skilled the players must be to win, and the fewer lucky winners there are. Anyone can play when right mark scoring (RMS) is used: students, employees, and animals (the targets of the original, complete multiple-choice test that included omit).


The Arkansas Uniform Grading Scale rules set the letter grades of D to A at 60 to 90 for traditional right mark scoring (RMS) on classroom tests. The static starting score is set to zero. The hidden active starting score is 25, on average.

The Arkansas Algebra I (AAI) end-of-course test replaces 40 multiple-choice questions with five 8-point open-response questions. The hidden active starting score is reduced from 25 to 15, on average. The test is now ten points, or one letter grade, more difficult. A student cannot pass the test by guessing.

The active starting score, the lucky score, is hidden at the left end of the playing field in the foggy swamp where the guessers play among the lucky-score trees. The traditional classroom game starts here, with lower-order thinking skills. Students are encouraged to guess from 5, 4, 3, or 2 options. Only right marks count; blank and omit have no value with RMS.
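A minimal sketch of the Uniform Grading Scale cut-offs, assuming the usual 10-point steps between the 60 (D) and 90 (A) named above:

```python
# Map a raw percent score to an Arkansas classroom letter grade.
def arkansas_grade(score):
    for cut, grade in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if score >= cut:
            return grade
    return "F"

print(arkansas_grade(59), arkansas_grade(60), arkansas_grade(92))  # F D A
```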

Confidence Based Learning (CBL) plays only on dry ground near the goal posts. It uses 3-option questions. It starts play at the 75% (25-yard) line for good judgment, far away from the swamp of shady scores. Mastery players receive points for both knowledge and their skill in using their knowledge (their judgment). They attempt to reach the 100% goal posts. They make few, if any, wrong marks.

Knowledge and Judgment Scoring (KJS) starts play at the 50% (50-yard) line for good judgment. Students functioning at lower levels of thinking can mark every question (which may put them back in the swamp with RMS). Students and employees functioning at higher (all) levels of thinking use the test to report what they trust. Their goal is to make the most right marks with the fewest, if any, wrong marks. [YouTube]
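A minimal sketch of the two starting lines, assuming omits carry the judgment value (1/2 under KJS, 3/4 under CBL, matching the 50% and 75% starts above):

```python
# Score a sheet where each omit earns the judgment value of a point.
def judged_score(right, wrong, omitted, judgment):
    total = right + wrong + omitted
    return (right + judgment * omitted) / total

# A blank sheet (perfect judgment, nothing marked yet) on 30 questions:
print(f"KJS start: {judged_score(0, 0, 30, judgment=0.50):.0%}")  # 50%
print(f"CBL start: {judged_score(0, 0, 30, judgment=0.75):.0%}")  # 75%
```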

A universal scoreboard summarizes the rules for the three methods of scoring. Scoring can be compared in a passive, static mode after the test is finished, and in an active, dynamic mode during the test. Scoring for KJS and CBL is usually expressed in the active, dynamic mode, as scoring starts with the value given to perfect judgment, 50% or 75% (no wrong marks have been made at the start of the test).

Scoring for RMS is usually expressed in the passive, static mode, after the test paper has been turned in. This allows resetting the starting score (and the value of judgment) to zero, which has deceptive consequences. Students like the apparent “no risk” feature. They also like the help from lucky marks. What they do not realize is that every wrong mark reduces their lucky score.
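The hidden active start can be made visible with a line of arithmetic: a minimal sketch, assuming a 100-question, 4-option test where a student knows some answers and guesses the rest:

```python
# Expected RMS score: known answers plus a 1-in-4 chance on each guess.
def expected_rms(known, questions=100, options=4):
    return known + (questions - known) / options

for known in (0, 20, 40, 60):
    print(f"{known} known -> {expected_rms(known):.0f} expected")
# 0 known -> 25 expected ... 60 known -> 70 expected: every wrong guess
# is a lucky point that failed to land.
```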

Changing from RMS to KJS or CBL is about the same as changing from a tricycle to a bicycle. It is changing from external control and correction to internal control and self-correction; from linear, lower-order thinking to thinking that also includes cyclical, higher-order thinking. It takes practice; about three experiences.

It is scary to do something new. Who ever heard of getting one point for a right mark and one point for the good judgment to not make a wrong mark (omit)? It is done on every essay test where students report what they know and trust, and omit what they have yet to learn.

Students quickly like KJS as it saves them time not having to come up with “the best answer” to a question they cannot read or understand. They like to see the quality score confirm what they trust; what they really know and can build on.

They like the freedom to customize the multiple-choice test to match their preparation (a 90% quality score), as they do on most other assessments. This is effective formative assessment as students learn to question, to answer, and to confirm as they are learning in preparation for assessment. They are in charge as they develop from passive pupil to self-motivated high achiever.

Teachers benefit too. KJS and CBL differentiate misconceptions, where students think they know the answer but do not, from mere guessing on difficult questions. Students are sorted by their level of thinking (teachable level) as well as by what they know. Each student presents a quantity, a quality, and a test score. You have accurate, honest, and fair numbers to support your classroom observations.

Since the three methods of scoring are based on different skills, the Universal Cut Point Raw Score Grade Equalizer or other methods can be used to assign grades (2009 Arkansas End-of-Course Raw to Scale Score Conversion Table and State Law).

All three methods produce the same raw score when examinees fail to exercise good judgment and mark all questions in hope of a lucky passing score. An accurate and honest performance produces the highest score, on average.
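A minimal check of that claim, reusing the KJS weights assumed in the earlier sketch (right = 2, omit = 1, wrong = 0, out of 2 per question):

```python
# With every question marked, the judgment term vanishes and the
# KJS test score collapses to the plain RMS percentage.
right, wrong, omitted, total = 40, 60, 0, 100
rms = right / total                        # 40%
kjs = (2 * right + omitted) / (2 * total)  # also 40%
print(rms == kjs)                          # True
```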