The bet in the title of Catherine
Gewertz’s article caught my attention: “One District’s Common-Core Bet: Results
Are In”.
As I read, I realized that the betting that takes place in traditional multiple-choice (TMC) testing was being given arbitrary valuations to justify the difference between a test score and a classroom observation. If the two agreed, that was good. If they did not, the standardized test score was dismissed.
TMC gives us the choice of one right mark and several wrong marks. Each is traditionally given a value of 1 or 0. This simplification, carried forward from paper-and-pencil days, hides the true value and the meanings that can be assigned to each mark.
The value and meaning of each mark change with the degree of completion of the test and the ability of the student. Consider a test in which each question has one right answer and three wrong answers; four options is now a popular design for standardized tests.
Consider a TMC test of 100 questions. The starting score, from guessing alone, averages 25. Every student knows this: just mark an answer to every question, then look back over the test and change the few marks you can trust you know to the right answers. With good luck on test day, the score may be high enough to pass.
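Here is the arithmetic behind that starting score, as a minimal sketch in Python (the 100 questions, four options, and pure-guessing behavior are assumptions for illustration, not data from any particular test):

QUESTIONS = 100
OPTIONS = 4

def expected_tmc_score(known: int) -> float:
    """Expected TMC score when `known` questions are answered from knowledge
    and the rest are pure one-in-four guesses."""
    unknown = QUESTIONS - known
    return known + unknown / OPTIONS

print(expected_tmc_score(0))    # 25.0 -- the starting score from guessing alone
print(expected_tmc_score(20))   # 40.0 -- trust 20 answers, guess the remaining 80
print(expected_tmc_score(47))   # 60.25 -- roughly a passing 60 with under half known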
If a student marked 60 questions correctly, the final score is 60. But the quality of this passing score is also 60%. Part of that 60% represents what a student knows and can do, and part is luck on test day. A passing score can be obtained by a student who knows or can do less than half of what the test is assessing: a quality below 50%. This is traditionally acceptable in the classroom. [TMC ignores quality. A right mark on a test with a score of 100 has the same value, but not the same meaning, as a right mark on a test with a score of 50.]
A wrong mark can also be assigned different meanings. As a rule of thumb (based on the analysis of variance, ANOVA, a time-honored method of data reduction), if fewer than five students mark a wrong answer to a question, the marks on the question can be ignored. If fewer than five students make the same wrong mark, the marks on that option can be ignored. This is why Power Up Plus (PUP) does not report statistics on wrong marks, but only on right marks. There is no need to clutter up the reports with potentially interesting, but useless and meaningless, information.
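The rule of thumb is easy to apply to a column of marks. A sketch (illustrative code with hypothetical marks, not PUP's actual routine):

from collections import Counter

marks = ["A"] * 22 + ["B"] * 4 + ["C"] * 3 + ["D"] * 1   # one hypothetical question
key = "A"                                                # its keyed right answer

for option, count in sorted(Counter(marks).items()):
    if option == key:
        status = "right answer"
    elif count >= 5:
        status = "interpretable wrong mark"
    else:
        status = "ignore (fewer than five marks)"
    print(option, count, status)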
PUP does include a fitness statistic not found in any other item analysis report that I have examined. This statistic shows how well the test fits student preparation. Students prepare for tests, but test makers also prepare for the abilities of test takers.
The fitness statistic estimates the score a student is expected to get if, on average, as many wrong options are eliminated as are non-functional on the test before guessing, with NO KNOWLEDGE of the right answer. This is the best-guess score. It is always higher than the design score of 25. The estimate ranged from 36% to 53%, with a mean of 44%, on the Nursing124 data. Half of these students were self-correcting scholars. The test was then a checklist of how they were expected to perform.
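My reading of that estimate, as a sketch (the per-question counts of functional options below are hypothetical, and this is not PUP's published algorithm): eliminate the non-functional wrong options on each question and guess blindly among what remains.

def best_guess_score(functional_options_per_question: list[int]) -> float:
    """Expected percent score from blind guessing among the options that
    students actually use on each question."""
    chances = [1 / n for n in functional_options_per_question]
    return 100 * sum(chances) / len(chances)

# Hypothetical 10-question test: some questions keep all 4 options working,
# others have only 2 or 3 options that any students actually chose.
functional = [4, 3, 2, 3, 4, 2, 3, 2, 3, 4]
print(round(best_guess_score(functional), 1))   # 35.8 -- above the design score of 25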
With the above in mind, we can understand how a single wrong mark can be devastating to a test score. But a single wrong mark not shared by the rest of the class can be taken seriously or ignored (just as can a right mark on a difficult question by a low-scoring student).
Making sense of TMC test results requires both a matrix of student marks and a distribution of marks for each question (Break Out Overview). Evaluating only an individual student report gives you no idea whether a student missed a survey question that every student was expected to answer correctly or a question that the class failed to understand.
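A small illustration with hypothetical marks (not the Break Out Overview itself) of why both views are needed: the same wrong mark reads differently when the rest of the class answered the question correctly than when the whole class missed it.

import numpy as np

# Rows = students, columns = questions; 1 = right mark, 0 = wrong mark.
marks = np.array([
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 1, 0, 1],   # this student alone missed Q1, which the class knew
])

student_scores = marks.sum(axis=1)        # one number per individual student report
question_right_rate = marks.mean(axis=0)  # distribution of right marks per question

print(student_scores)        # [3 2 2 3 3]
print(question_right_rate)   # [0.8 0.8 0.  0.8] -- Q3 was missed by the entire class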
Are we dealing with a misconception? Or with a lack of performance related to different levels of thinking in class and on the test, or to the limits of rote memory in matching an answer option to a question? [“It’s the test-taking.”] When does a right mark mean a right answer, and when is it just luck on test day? [“This guy scored advanced only because he had a lucky day.”]
Mikel Robinson, as an individual, failed the test by 1
point. Mikel Robinson, as one student in a group of students,
may not have failed. [We don’t really know.] His score just fell on the low
side of a statistical range (the conditional standard error of measurement; see
a previous post on CSEM). Within this range, it is not possible to
differentiate one student’s performance from another’s using current
statistical methods and a TMC test design (students are not asked whether they can use the question to report what they can trust they actually know or can do).
We can say that if he retook the test, the probability of passing might be as high as 50%, or more, depending upon the reliability and other characteristics of the test. [And for those who passed by 1 point, the probability of failing by 1 point on a repeat of the test would be the same.]
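A sketch of that retake probability, assuming (my assumption, not a property of this particular test) that the measurement error near the cut score is roughly normal with a standard deviation equal to the CSEM:

from statistics import NormalDist

def prob_pass_on_retake(observed: float, cut: float, csem: float) -> float:
    """Probability a retake lands at or above the cut score, treating the
    observed score as the best estimate of the student's true score."""
    return 1 - NormalDist(mu=observed, sigma=csem).cdf(cut)

# Failing by 1 point with a CSEM of 3 points: the retake is close to a coin flip.
print(round(prob_pass_on_retake(observed=59, cut=60, csem=3), 2))      # 0.37
# Passing by 1 point: the same chance of failing next time, by symmetry.
print(round(1 - prob_pass_on_retake(observed=61, cut=60, csem=3), 2))  # 0.37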
These problems are minimized with accurate, honest, and fair Knowledge and Judgment Scoring (KJS). You can know when a right mark is a right answer using KJS or partial-credit Rasch model IRT scoring. You can know the extent of a student’s development: the quality score. And, perhaps more important, your students can trust what they know and can do too, during the test as well as after the test. This is the foundation on which to build further long-lasting learning. This is student empowerment.
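As a sketch of how such scoring can work (the point values here, 2 for a right mark, 1 for the judgment to omit, and 0 for a wrong mark, are an assumed illustration rather than a quotation of PUP's exact rubric), guessing no longer pays, and the quality score reports how trustworthy the marks a student did make actually were:

def kjs_score(responses: list[str], key: list[str]) -> dict:
    """Score KJS responses; an empty string means the student chose to omit."""
    right = sum(1 for r, k in zip(responses, key) if r and r == k)
    wrong = sum(1 for r, k in zip(responses, key) if r and r != k)
    omitted = len(key) - right - wrong
    points = 2 * right + 1 * omitted                  # 0 points for a wrong mark
    marked = right + wrong
    return {
        "score_pct": 100 * points / (2 * len(key)),   # knowledge and judgment combined
        "quality_pct": 100 * right / marked if marked else None,  # right marks per mark made
    }

key = ["A", "C", "B", "D", "A"]
responses = ["A", "C", "", "D", "B"]     # three right marks, one omit, one wrong mark
print(kjs_score(responses, key))         # {'score_pct': 70.0, 'quality_pct': 75.0}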
Welcome to the KJS Group: Please register
at mailto:KJSgroup@nine-patch.com.
Include something about yourself and your interest in student empowerment (your name, school, classroom environment, LinkedIn, Facebook, email, phone, etc.).
Free anonymous download, Power Up
Plus (PUP), version 5.22 containing both TMC and KJS: PUP522xlsm.zip, 606 KB or PUP522xls.zip, 1,099 KB.
- - - - - - - - - - - - - - - - - - - - -
Other free software to help you and your students experience and understand how to break out of traditional multiple-choice (TMC) and into Knowledge and Judgment Scoring (KJS) (tricycle to bicycle):