The last post stated, “Lower individual PBR values result from mixing right and wrong marks in an item pattern. Wider score distributions make possible longer item mark patterns.” I was curious about just how this happens.
I marked Item 30 in Table 19 at five locations. The top location contained four right marks (1s). These were then changed to wrong marks (0s), and the four right marks were moved one count below. A visual education statistics engine (VESE) table was developed. This process was then repeated at each of the three lower locations.
The above process took an item with an unmixed mark pattern (14 right and 26 wrong) and mixed wrong marks into four lower locations, each one right count lower on the score scale. I moved four marks because it took that many to get a measurable result from all six statistics with the standard deviation (SD) set at 4, or 10%, on a test with 40 students and 40 items (Chart 40).
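Here is a minimal sketch in Python of this effect; the score values and the pbr() helper are made up for illustration and are not taken from Table 19. It computes an item's point-biserial r (PBR) against student total scores for an unmixed 14-right, 26-wrong pattern, then again after the top right marks have been handed down the score scale.

```python
# A minimal sketch with hypothetical numbers (not the Table 19 data) of how
# moving right marks down the student score scale lowers an item's PBR.
import numpy as np

def pbr(marks, totals):
    """Point-biserial r of a 0/1 item-mark vector with student total scores."""
    marks = np.asarray(marks, float)
    totals = np.asarray(totals, float)
    p = marks.mean()                      # item difficulty (proportion right)
    m1 = totals[marks == 1].mean()        # mean total of students marking right
    m0 = totals[marks == 0].mean()        # mean total of students marking wrong
    return (m1 - m0) / totals.std() * np.sqrt(p * (1 - p))

rng = np.random.default_rng(1)
# Hypothetical scores on the other 39 items for 40 students, listed high to low.
rest = np.sort(rng.binomial(39, 0.7, size=40))[::-1]

# Unmixed pattern: the 14 highest scorers mark the item right (14 right, 26 wrong).
unmixed = np.zeros(40, int)
unmixed[:14] = 1

# Mixed pattern: the top four right marks are handed to the four students just
# below the original block of right marks (a simplified version of the shuffle
# described above).
mixed = unmixed.copy()
mixed[:4] = 0
mixed[14:18] = 1

for label, marks in (("unmixed", unmixed), ("mixed", mixed)):
    totals = rest + marks                 # the total score includes this item
    print(label, round(pbr(marks, totals), 3))   # the mixed PBR comes out lower
```

The unmixed value prints higher than the mixed one; the exact numbers depend on the invented scores, not on Table 19.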
I did the same thing with the SD set at 2, or 5% (Chart 41), where the effect of mixing on lowering the item PBR is greater. But an SD of 5% is not a realistic value. The effect of mixing right and wrong marks would be even less with the SD set at 8, or 20%, with 40 students and 40 items. My assumption, at this point, is that the mixing of right and wrong marks will be of little concern on large tests such as standardized traditional multiple-choice (TMC) tests.
Chart 42 shows an interesting observation. Mixing just one count makes no change in the individual PBR for Item 30. The reason for this can be seen in Table 19. When a right mark with a related student raw score of 30 is mixed with the next lower location of 29, the math is 30 - 1 = 29 and 29 + 1 = 30. The student scores do not change. The students getting the scores do change.
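A tiny check of this swap, again with hypothetical numbers: student A (right mark, total 30) and student B (wrong mark, total 29) both scored 29 on the other items, so trading the right mark simply trades their totals. The set of (mark, total) pairs, and therefore the item PBR, is unchanged.

```python
# The one-count swap leaves the score pattern, and so the PBR, unchanged.
import numpy as np

def pbr(marks, totals):
    # The point-biserial r is the Pearson correlation of 0/1 marks with totals.
    return np.corrcoef(marks, totals)[0, 1]

rest   = np.array([35, 33, 31, 29, 29, 27, 25, 23])  # scores on the other items
before = np.array([1, 1, 1, 1, 0, 0, 0, 0])          # A is index 3, B is index 4
after  = np.array([1, 1, 1, 0, 1, 0, 0, 0])          # A and B trade the right mark

print(pbr(before, rest + before))   # identical values: the (mark, total) pairs
print(pbr(after,  rest + after))    # are the same, only the students changed places
```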
The deeper the mixing, that is, the further the right marks are moved down the student score scale, the lower the individual PBR. But the individual PBR increases the further an unmixed mark pattern descends or lengthens, up to a point.
Items 26 to 31 in Table 19 show how this happens. An S-shaped or sigmoid curve is etched into Table 19 with bold 1s. Each item is less difficult as you go from Item 31 to Item 26 (0.25 to 0.75). Each mark pattern lengthens linearly.
[The number of mark patterns was 10 at 5% student score SD
and 20 at 10% student score SD.]
The PBR and individual variance increase to a point and then decrease (Chart 43). That point is the 70% average student score set for the test. The average test score sets the limit for individual item PBRs. In this table, under optimum conditions, that limit is a PBR of 0.73, which leaves plenty of room for classroom tests that generally run from 0.10 to 0.50.
Item 29 shows a difficulty of 0.45 and a variance of 0.25. Item 28 shows a difficulty of 0.55 and a variance, again, of 0.25. They fall equidistant from the item difficulty mean of 20 counts, or 0.50. The junction of the mean student score and the mean item difficulty sets the PBR limit.
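A rough sketch of that rise and fall, using a hypothetical score distribution (40 students, 40 items, roughly a 70% average) rather than Table 19 itself: sweeping unmixed patterns of every possible length shows the PBR is smallest at the extremes of difficulty and largest at an intermediate difficulty. The peak value will not match the 0.73 in Table 19 because the scores are invented; the shape of the curve is the point.

```python
# For unmixed patterns of every possible length against one fixed, hypothetical
# score distribution, the item PBR is small at both extremes of difficulty and
# largest somewhere in between.
import numpy as np

def pbr(marks, totals):
    return np.corrcoef(marks, totals)[0, 1]   # point-biserial r

rng = np.random.default_rng(2)
totals = np.sort(rng.binomial(40, 0.7, size=40))[::-1]   # high scorers first

values = []
for k in range(1, 40):                # unmixed pattern k counts long
    marks = np.zeros(40)
    marks[:k] = 1                     # the top k scorers mark the item right
    values.append(pbr(marks, totals))

best = int(np.argmax(values)) + 1
print(best, best / 40, round(max(values), 2))   # pattern length, difficulty, and PBR at the peak
```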
This has practical implications. The further away the
average student score is from 50%, the lower the limit on item discrimination
(PBR).
In Table 19, an unmixed marking pattern can only be 12 counts long before the PBR decreases. If the average test score had been 50%, the marking pattern could have been 20 counts long and the PBR 100% (as shown in previous posts).
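One extreme, hypothetical construction that shows how a 50% test score can support a 100% PBR (not necessarily the construction used in the previous posts): half the class answers every item right and half answers every item wrong, so each item's marks line up exactly with the only two possible totals.

```python
# An extreme, hypothetical 50%-average test where an item reaches a 100% PBR:
# 20 students answer all 40 items right and 20 answer all 40 items wrong.
import numpy as np

marks  = np.array([1] * 20 + [0] * 20)   # one item: top half right, bottom half wrong
totals = 40 * marks                      # totals are 40 or 0 on the 40-item test

print(np.corrcoef(marks, totals)[0, 1])  # 1.0, i.e., a 100% PBR
```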
This all comes back to the need for discriminating items to produce efficient tests: tests that use the fewest items to rank students with TMC. The problem is that we do not create discriminating items. We can create items, but it is student performance that develops their PBR. This provides useful descriptive information from classroom tests. The development of PBR values is often distorted on standardized tests, under conditions that range from pure gambling to being severely stressful.
It does not have to be that way. By offering Knowledge and Judgment Scoring (KJS), or its equivalent, we let students report what they actually know and can do: what they trust as the foundation for further learning and instruction. The test then reveals student quantity and quality, misconceptions, the classroom level of thinking, and teacher effectiveness, not just a ranking.
Most students can function with high quality even though their quantity can vary greatly. The quality goal of the CCSS movement can be assessed with current, efficient technology once students are permitted to make an individualized, honest, and fair report of their knowledge and skills using multiple-choice, just as they do on most other forms of assessment.
- - - - - - - - - - - - - - - - - - - -
Free software to help you and your students experience and understand how to break out of traditional multiple-choice (TMC) and into Knowledge and Judgment Scoring (KJS) (tricycle to bicycle):