Wednesday, November 30, 2011

Smart Wallpaper Testing

The idea for wallpaper came from a simple fact: students need protection from predatory testing. Whether they know the answer or not, they must mark an answer to each question. Birds fly in flocks and fish swim in schools; they do the same thing at the same time to avoid predators. Wallpaper lets students mark the same option when they cannot use the test to report what they know.

Two wallpaper patterns can be used to extract higher levels of thinking (Smart Testing) information. Dumb wallpaper is based on one of the answer options. Smart wallpaper can be based, for example, on the most frequent wrong mark for each question. Dumb wallpaper pays no attention to student performance. Smart wallpaper is based on expected student performance.
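The two patterns can be sketched in a few lines of Python. This is only an illustration, not the procedure used to build the tables in this post: the function names, the choice of option "A" for Dumb wallpaper, and the `prior_responses` data standing in for "expected student performance" are all assumptions.

```python
from collections import Counter

def dumb_wallpaper(num_questions, option="A"):
    """Dumb wallpaper: the same answer option on every question (option is hypothetical)."""
    return [option] * num_questions

def smart_wallpaper(prior_responses, key):
    """Smart wallpaper: the most frequent WRONG mark for each question.

    prior_responses is assumed to be a list of answer strings (one per student)
    from an earlier, comparable group; key holds the right answers.
    """
    pattern = []
    for q, right in enumerate(key):
        wrong_marks = Counter(r[q] for r in prior_responses if r[q] != right)
        # If nobody missed the question there is no wrong mark to use.
        pattern.append(wrong_marks.most_common(1)[0][0] if wrong_marks else None)
    return pattern

# Made-up data: 4 students, 3 questions.
key = ["A", "B", "C"]
prior_responses = ["ABC", "BCD", "BCD", "CAD"]
dumb = dumb_wallpaper(len(key))
smart = smart_wallpaper(prior_responses, key)
```

With this made-up data the Smart pattern differs from question to question, while the Dumb pattern is the same option throughout.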

Wallpaper extracts higher levels of thinking (Smart Testing) information using Knowledge and Judgment Scoring (KJS). The assumption is that students omit or use the wallpaper pattern when not using the question to report what is known and trusted. This can be seen in the progression from KJS without wallpaper, Table 3bST, 

KJS with Dumb wallpaper, Table 3bSD,

and KJS with Smart wallpaper, Table 3bSS.

The student counseling mark matrix analysis (the test taker view of the test) changes from nonsense, to a better performance with Dumb wallpaper, to a typical Knowledge and Judgment Scoring (KJS) printout with Smart wallpaper.
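The assumption behind this progression, that a mark matching the wallpaper is treated as an omit, can be sketched as follows. The point weights (2 for right, 1 for omit, 0 for wrong) are an assumption of this sketch and are not stated in this post; the function and field names are hypothetical. Quality is taken as the percent right of the questions actually marked (%RT).

```python
def score_with_wallpaper(marks, key, wallpaper,
                         right_pts=2, omit_pts=1, wrong_pts=0):
    """Score one answer sheet, counting wallpaper marks and blanks as omits.

    The default point weights are assumed for illustration only.
    """
    right = wrong = omit = 0
    for mark, answer, paper in zip(marks, key, wallpaper):
        if mark == answer:
            # Sketch assumption: a right mark counts as right even if it
            # happens to match the wallpaper (possible with Dumb wallpaper).
            right += 1
        elif mark is None or mark == paper:
            omit += 1
        else:
            wrong += 1
    n = len(key)
    marked = right + wrong
    points = right * right_pts + omit * omit_pts + wrong * wrong_pts
    return {
        "right": right,
        "wrong": wrong,
        "omit": omit,
        "rms_percent": 100 * right / n,  # right-mark scoring
        "quality_percent": 100 * right / marked if marked else None,  # %RT
        "kjs_percent": 100 * points / (n * right_pts),
    }

# Made-up answer sheet: 2 right, 2 wallpaper marks, 1 wrong.
result = score_with_wallpaper(
    marks=["A", "C", "D", "B", "A"],
    key=["A", "B", "C", "B", "C"],
    wallpaper=["B", "C", "D", "A", "D"],
)
```

On this made-up sheet the right-mark score is 40%, while the quality of the marks the student chose to make is about 67%.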

Test scores increase as the simulated quality increases. The distributions (Standard Deviations) of scores and item difficulty decrease. Test reliability declines!  Oops!  “Houston, we have a problem!” Test companies optimize (brag about) their test reliability based on poor quality data. KJS optimizes student judgment to produce accurate, honest, and fair data.

This table clearly captures this conflict in numbers. High test reliability is needed to obtain similar consecutive average test scores. It follows that the lower the quality of student scores and the lower the average test score, the more chance determines the average test score. It is also known that the normal curve is highly reproducible by chance alone. High test reliability can become an artifact of test design rather than student performance.

To the fact that the starting score on a multiple-choice test is 1/(number of options) rather than zero, we can now add a second form of self-deception (psychometricians refer to these as simplifications). They made some sense when everything was done with paper and pencil. Today there is no need to still lock quality and quantity together on a multiple-choice test, especially now that one (KJS) can measure what students actually know and trust rather than just rank students (RMS).
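The starting-score point can be made concrete with a little arithmetic. The sketch below assumes a student who knows some answers and guesses blindly, with equal probability, on all the rest; the function name and the example numbers are hypothetical.

```python
def expected_rms_percent(known, num_questions, num_options):
    """Expected right-mark score when every unknown question is a blind guess."""
    guessed = num_questions - known
    expected_right = known + guessed / num_options
    return 100 * expected_right / num_questions

# A student who knows nothing on a 20-question, 4-option test
# still expects 1/4 of the marks to be right.
floor = expected_rms_percent(known=0, num_questions=20, num_options=4)

# A student who knows half the answers expects a score above 50%.
half = expected_rms_percent(known=10, num_questions=20, num_options=4)
```

Under these assumptions the first student expects 25% and the second 62.5%, so the lower the score, the larger the share of it that chance supplies.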

The misconceptions in Table 3bST are artifacts created by forcing students to mark when they have no answer of their own. They were not given the option to omit (to mark an accurate, honest and fair answer sheet). Table 3bSS, using Smart wallpaper, shows all four groups of questions (expected, discriminating, guessing, and misconception – EDGM). Higher quality students earn higher test scores that are more accurate, honest and fair.

The scores in Table 3bSS are only obtainable if students omitted instead of marking the most frequent wrong mark for each question. This simulation fails to capture what students would actually do if given the opportunity to mark only when marking reports something they know and trust (can confirm). Given that opportunity, some quality scores would be higher and some lower. Also, there is no way to know which wrong mark will be the most frequently marked for each question. Wallpaper must be created BEFORE the test, not after the test.

This simulation again demonstrates there is no way of equating RMS and KJS results from one set of data. To know what students actually know, they must be given the opportunity to report what they know that is meaningful and useful as the basis for further learning, instruction, and use on the job. Traditional RMS only does this when test scores are near 90%. Knowledge and Judgment Scoring (Smart Testing) yields a valid quality score (%RT) for every test score, and a valid test score for every high quality (%RT) student performance.
