8
The information that needed to be related in post 7 became
too long for one post. Post 7 contains the SEMEngine: all five of the related
statistics on one spreadsheet.
This post relates a collection of ideas that give those statistics
additional meaning: a bit of understanding needed to use them properly.
The SEMEngine, in the previous post, can produce the
unpredictable statistics relevant to classroom tests and standardized tests. But
a full understanding of these statistics requires a discussion of a second
standard error and the two methods of scoring multiple-choice (traditional
multiple-choice, TMC, and Knowledge and Judgment Scoring, KJS): partial and
full disclosure of what a student knows and can do.
The standard deviation (SD) of the group test score and the
standard error of measurement (SEM) of the average student test score provide
guidance in constructing standardized tests as predictive inputs. These
statistics are also helpful in describing classroom test results. The first
refers to test results from the class or group taking the test, the average group score; the second, to the average
student score in the class. They are
two different perspectives of the same average score. They have different uses.
There is a second standard error, the standard error of the
mean (SE), that permits comparison between group
test scores. [I am belaboring this topic as the two standard errors (of the
mean and of measurement), the abbreviation (SEM), and even the SD can get
confused (Standard
error and Standard error vs.
Standard error of measurement).]
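To keep the three quantities apart, here is a minimal sketch that computes each one from a list of student scores using the standard textbook definitions: SD as the spread of the scores, SE as SD divided by the square root of the number of students, and SEM as SD times the square root of one minus the test reliability. The scores and the reliability value are made up for illustration (they are not the Cantrell data), and the SEMEngine spreadsheet may organize the calculation differently.

```python
import math
import statistics

def standard_deviation(scores):
    """SD: spread of individual scores around the group mean (sample SD, N - 1)."""
    return statistics.stdev(scores)

def standard_error_of_mean(scores):
    """SE: uncertainty in the group (class) average score."""
    return standard_deviation(scores) / math.sqrt(len(scores))

def standard_error_of_measurement(scores, reliability):
    """SEM: uncertainty in one student's score, given the test reliability."""
    return standard_deviation(scores) * math.sqrt(1.0 - reliability)

# Hypothetical percent scores and a hypothetical reliability coefficient.
scores = [40, 50, 55, 60, 65, 70, 75, 85]
reliability = 0.75

sd = standard_deviation(scores)
se = standard_error_of_mean(scores)
sem = standard_error_of_measurement(scores, reliability)
print(f"SD={sd:.2f}  SE={se:.2f}  SEM={sem:.2f}")
```

Both standard errors start from the same SD. The SE shrinks as more students are tested; the SEM shrinks as the test becomes more reliable.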
Chart 18 shows how the SEM of the average student score is
reduced as more equivalent items are added to the Cantrell data of 14 items. A
50-item test is expected to yield an SEM of 5.15%. This is less than 1/3 the
range of the SD. But even this would require an improvement of 3 x 5.15 =
15.45% for a significant increase in performance from one year or one test to
the next. That is 1.5 times a traditional letter grade. To my knowledge, very
few standardized tests use 50 items in any topic or skill area.
Chart 19 shows how the SE of the classroom or group test
score is reduced as more equivalent items are added to the Cantrell data. The SE has a finer resolution than the
SEM. On a 50-item test, an improvement in class performance of 3 x 2.57 = 7.71%,
only about 3/4 of a letter grade, would be required to show a significant difference in
the two test scores from two different classes or one class at two different
times. This shows that it is easier to show a significant difference between the
average scores from two tests than it is between two scores from the same
student.
[The above can be generalized to support the traditional
score range of 10% per letter grade.]
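The arithmetic behind the two charts can be checked with a short sketch. It uses only the values quoted above (SEM of 5.15% and SE of 2.57% for a 50-item test, the factor of 3 for a significant difference, and 10% per letter grade). The function names are mine, chosen for illustration.

```python
LETTER_GRADE_WIDTH = 10.0  # percent per letter grade (traditional range)

def significant_difference(standard_error, multiplier=3):
    """Smallest score change (percent) treated as significant:
    multiplier x standard error, with the factor of 3 taken from the text."""
    return multiplier * standard_error

def in_letter_grades(percent_change):
    """Express a percent change in units of letter grades."""
    return percent_change / LETTER_GRADE_WIDTH

# Values quoted from Charts 18 and 19 for a 50-item test.
sem_50_items = 5.15  # SEM of the average student score, percent
se_50_items = 2.57   # SE of the group score, percent

for label, err in (("student (SEM)", sem_50_items), ("group (SE)", se_50_items)):
    change = significant_difference(err)
    print(f"{label}: {change:.2f}% = {in_letter_grades(change):.2f} letter grades")
```

The student-level threshold comes out at about one and a half letter grades and the group-level threshold at about three quarters of one, which is the contrast the two charts are making.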
I retitled this post as “Teacher Effectiveness” after
looking at the above two charts (18 and 19). These statistics provide a means
of measuring teacher effectiveness, or at least of ranking teacher effectiveness.
To measure teacher effectiveness, the portion of students electing TMC or KJS on
the test would also have to be included.
[A class selecting mostly TMC is in a
lower-level-of-thinking classroom environment populated with passive pupils conditioned
to mark an answer to every item. A class selecting mostly KJS is in a higher
(all) level-of-thinking classroom environment populated with self-motivated,
self-correcting, high-quality achievers who are mature enough to distinguish between
what they have yet to learn and what they know and can do that can serve as the basis for
further learning and instruction.]
Student
development is as important as knowledge or skill. The CCSS movement promotes
this idea too, but without the simplicity of multiple-choice (in time and money).
These visualized statistical models of the real world
have been found to have practical value in making predictions (a most expected
midpoint on a range of possibilities). However, what we feed into these
statistics determines the validity and usefulness of the results. The concrete
reality that you got a score of 50% on a classroom test becomes transformed
into an abstract prediction that, ± 1 SD, that score (and your next score on
an equivalent test) just might have been anywhere between 30% and 70% on an
equivalent standardized test. And further, using the SEM, the range may be
reduced to between 45% and 55% (generalized from Table 18).
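The two ranges in this example follow from adding and subtracting one SD or one SEM from the observed score. A minimal sketch, assuming the SD of 20% and SEM of 5% that those ranges imply (the helper name `score_band` is hypothetical):

```python
def score_band(score, error, width=1):
    """Return (low, high) for score +/- width * error, clipped to 0-100 percent."""
    low = max(0.0, score - width * error)
    high = min(100.0, score + width * error)
    return low, high

observed = 50.0  # classroom test score, percent
sd = 20.0        # SD implied by the 30% to 70% range in the text
sem = 5.0        # SEM implied by the 45% to 55% range in the text

print(score_band(observed, sd))   # (30.0, 70.0)
print(score_band(observed, sem))  # (45.0, 55.0)
```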
Test scores (and these first five reviewed statistics) are
easily manipulated by the selection of questions on the test and how the test
is scored. The traditional multiple-choice test (forced-choice test) is a game
with a built-in handicap of over 20%. This manipulation of scores is so
traditional (so hardened to change) that little thought is given to it, except when elementary
school students take their first multiple-choice tests.
Learning to lie is difficult for serious students; they
know a best guess is not a reflection of their abilities. It is just sugar coating
and a distraction from the ugly truth. Students with equal abilities, but
receiving lower test scores, rightly feel cheated by their poor luck on test
day. In time, these students just mark, finish the test, and then get back to
their world where they do have some control. Since there is no way of knowing if a right mark is a right
answer or a lucky answer, there is no need to take the test seriously except
for where their score falls in the class distribution (their rank).
[This practice is institutionalized when their class rank
is provided in college admission documents.]
The traditional multiple-choice test (TMC) is fast,
cheap, and marketed way beyond its valid ability to rank students IMHO. It is,
as my students put it, dumb testing. The statistics are not an accurate, honest,
and fair reflection of their individual abilities.
TMC IMHO drives students away from developing into self-motivated,
self-correcting, high-quality achievers.
Statistics will not change the outcome. There is a better (alternative) method
of multiple-choice assessment, KJS, at no additional cost that will guide their
development. An effective teacher motivates students to be ready to learn and
to want to learn.
A multiple-choice test can be used to permit students to
report what they actually know, understand, and find useful as the basis for
further learning and instruction. All that is required is an extraction of
student judgment (something that is considered an essential part of almost all
alternative and authentic assessments and soon the elaborate CCSS assessments).
Please check out Smart testing: Knowledge
and Judgment Scoring, partial
credit Rasch model, and Confidence
Based Assessment, for example. All three promote student development that
yields high test scores, long term, and with a minimum of review.
                   

Free software to help you and your students
experience and understand how to break out of traditional multiple-choice (TMC)
and into Knowledge and Judgment Scoring (KJS) (tricycle to bicycle):