The best test is a test that permits
you to accurately, honestly, and fairly report what you know and can do. You
know how to question, to get answers, and to verify. You know what you know and
what you have yet to learn. This self-knowledge operates at both the lower and
higher levels of thinking described below. It is a myth that a forced-choice
multiple-choice test measures what you trust you know and can do.
At the beginning of
any learning operation, you learn to repeat and to recall. Next you learn
to relate the bits you can repeat and recall. By the end of a learning
operation you have assembled a web of skills and relationships. You start at
lower levels of thinking and progress to higher levels of thinking. Practice
takes you from slow conscious operations to fast automatic responses
(multiplication or roller skating). It is a myth that learning occurs primarily
by responding to a teacher in a classroom.
Your attitude
during learning and testing is important. Your maturity is indicated by your
ability to get interested in new topics or activities your teacher recommends
(during the course). As a rule of thumb, a positive attitude is worth about one
letter grade on a test. It is a myth that you can easily learn when you have a
negative attitude.
Your expectations are important. You tend to get what you expect. A nine-year study of over 3,000 students indicated that students tend to get the grade they expected at the time they enrolled in the class, based on their lack of information, misinformation, and attitude. It is a myth that you cannot do better than your preconceived grade.
Learning and testing are one coordinated event when you can see the result of your practicing directly (target practice or skateboarding). This situation also occurs when you are directly tutored by a person or by a person’s software. It is a myth that you must always take a test separately from learning.
Complex learning
operations go through the same sequence of learning steps. The rule of three
applies here. Read or practice from one source to get the basic terms or
actions. Read or practice from a second set to add any additional terms or
actions. Read or practice from a third set to test your understanding, your web
of knowledge and skill relationships. It is a myth that you must always have
another person test your learning (but another person can be very helpful).
That other person is usually a teacher who cannot teach and test each pupil or student
individually. The teacher also selects what is to be learned rather than
letting you make the choice. The teacher also selects the test you will take.
It is a myth that your teachers have the qualities needed to introduce you to
the range of skills and knowledge required of an honest, self-supporting
citizen.
Teaching usually takes place during scheduled time periods. In extreme situations, only what is learned
in those scheduled time periods will be scored. This is one basis for assessing
teacher effectiveness. It is a myth that the primary goal of traditional schools
is student learning and development.
Traditional multiple-choice is defective. It was crippled when the no-response
option, "do not know", was eliminated, to make classroom scoring easier, as the
method was adapted from its use with animal experiments. It is a myth that you
should not have this option to permit accurate, honest, and fair assessment.
Traditional multiple-choice promotes selecting the best right answer: using the lowest levels
of thinking. The minimum requirement is making a mark for each question. It is
a myth that such a score measures what you know or can do. The score ranks you
on the test.
Your score may rank you above or below average. It is a myth
that you will always be safe with an above average score (passing).
The normal distribution of multiple-choice test scores is
based on your luck on test day. The
normal distribution is desired for classes in schools designed for failure. It
is a myth that a class should not have an average score of 90%.
Luck on test day will distribute 2/3 of your classmates' multiple-choice scores
within the bubble in the center of a normal distribution; that is, within one
standard deviation (SD) of the average. (Table 15 or Download)
[SD = SQRT(Variance), where Variance = SUM((Deviation from the Average)^2)/N =
Mean Sum of Squares = MSS]
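A minimal sketch in Python of these two formulas; the raw scores below are hypothetical, not data from the Table 15 download:

```python
# The document's formulas: Variance = SUM((Deviation from the Average)^2)/N
# (the Mean Sum of Squares, MSS) and SD = SQRT(Variance).

def variance(scores):
    n = len(scores)
    mean = sum(scores) / n
    return sum((x - mean) ** 2 for x in scores) / n  # MSS

def standard_deviation(scores):
    return variance(scores) ** 0.5  # SD = SQRT(Variance)

scores = [12, 15, 17, 18, 18, 19, 20, 21, 22, 25]  # hypothetical raw scores
mean = sum(scores) / len(scores)
sd = standard_deviation(scores)

# On a normal curve, about 2/3 of scores fall within 1 SD of the average.
within_1sd = sum(1 for x in scores if abs(x - mean) <= sd)
print(f"mean={mean:.1f}, SD={sd:.2f}, within 1 SD: {within_1sd}/{len(scores)}")
```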
Your grade (cut score) is set by marking off the distribution of classmate
scores in standard deviations: F (below -2); D (-2 to -1); C (-1 to +1);
B (+1 to +2); A (above +2). Your raw score grade is the sum of what you know
and can do, your luck on test day, and your set of classmates.
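A sketch of this grading scheme in Python, assuming the cut points reconstructed above; the class scores are hypothetical:

```python
# Assign letter grades from z-scores (standard deviations from the average),
# using the cut points F < -2, D (-2 to -1), C (-1 to +1), B (+1 to +2), A > +2.

def grade_from_z(z):
    if z < -2:
        return "F"
    if z < -1:
        return "D"
    if z <= 1:
        return "C"
    if z <= 2:
        return "B"
    return "A"

def assign_grades(scores):
    n = len(scores)
    mean = sum(scores) / n
    sd = (sum((x - mean) ** 2 for x in scores) / n) ** 0.5
    return [(x, grade_from_z((x - mean) / sd)) for x in scores]

print(assign_grades([12, 15, 17, 18, 18, 19, 20, 21, 22, 25]))
```

Note that the same raw score can earn different grades in different classes: the grade depends on the class mean and SD, not on the score alone.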
Raw scores can be adjusted
by shifting their distribution, higher or lower, and by stretching (or
shrinking) the distribution to get a distribution that “looks right”. It is a
myth that your teacher can only select the right mix of questions to get a
raw score distribution that "looks right".
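One way such an adjustment can be done is a linear transformation that shifts and stretches the raw scores to a chosen mean and SD; this Python sketch uses hypothetical target values:

```python
# Shift and stretch a raw score distribution to a target mean and SD
# (a linear transformation). The targets are whatever "looks right".

def adjust(scores, target_mean, target_sd):
    n = len(scores)
    mean = sum(scores) / n
    sd = (sum((x - mean) ** 2 for x in scores) / n) ** 0.5
    # Stretch (or shrink) about the mean, then shift to the target mean.
    return [round(target_mean + target_sd * (x - mean) / sd, 1) for x in scores]

print(adjust([12, 15, 17, 18, 19, 20, 25], target_mean=75, target_sd=10))
```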
Some questions
perform poorly. They can be deleted and a new, more accurate, scored
distribution created. It is a myth that every question must be retained.
Discriminating questions are marked right only by high-scoring
classmates and marked wrong by low-scoring classmates. (Table
15 or Download) It is a myth that all
questions should be discriminating.
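The document does not give a formula, but a common way to express this behavior is the upper-lower discrimination index: the proportion of high scorers marking a question right minus the proportion of low scorers doing so. A Python sketch with hypothetical mark patterns (1 = right, 0 = wrong):

```python
# Discrimination index: p(upper half right) - p(lower half right).
# Near +1: only high scorers mark it right (discriminating).
# Near 0: high and low scorers perform alike (non-discriminating).

def discrimination_index(marks, question):
    ranked = sorted(marks, key=sum, reverse=True)  # rank students by total score
    half = len(ranked) // 2
    upper, lower = ranked[:half], ranked[-half:]
    p_upper = sum(row[question] for row in upper) / half
    p_lower = sum(row[question] for row in lower) / half
    return p_upper - p_lower

marks = [  # one row per student, one column per question (hypothetical)
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
for q in range(4):
    print(f"question {q}: D = {discrimination_index(marks, q):+.2f}")
```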
Discriminating questions
produce your class raw score distribution. About 5 to 10 are needed to
create the amount of error that yields a range of five letter grades. It is a
myth that discriminating questions assess mastery.
The reliability (reproducibility, precision) of your raw score can be
predicted, but not your final (adjusted) score. Test reliability (KR20) is
based on the ratio of the variance between student scores (the external
column) to the variance within question difficulty mark patterns (the internal
columns). (Table 15 or Download)
This makes sense: The smaller the amount of error variance
within the question difficulty internal columns, with respect to the variance between
student scores in the external column, the
greater the test reliability. Discriminating, difficult questions spread
out student scores more (yield higher variance) than they increase the error
variance within the questions. If there were no error variance, a test would be
totally reliable (KR20 = 1). It is
a myth that a good informative test must maximize reliability.
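A Python sketch of the KR20 computation from a hypothetical student/question mark table, with all variances computed per the document's formula (divided by N):

```python
# KR20 = (k/(k-1)) * (1 - SUM(p*q per question) / variance of total scores),
# where p is a question's proportion right, q = 1 - p, and k is the
# number of questions. Only the mark pattern (1s and 0s) is needed.

def kr20(marks):
    n = len(marks)     # students (rows)
    k = len(marks[0])  # questions (columns)
    totals = [sum(row) for row in marks]
    mean = sum(totals) / n
    total_var = sum((t - mean) ** 2 for t in totals) / n  # between-student variance
    item_var = 0.0  # error variance within the question difficulty columns
    for q in range(k):
        p = sum(row[q] for row in marks) / n  # question difficulty
        item_var += p * (1 - p)
    return (k / (k - 1)) * (1 - item_var / total_var)

marks = [  # hypothetical mark patterns, one row per student
    [1, 1, 1, 1, 0],
    [1, 1, 1, 0, 1],
    [1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
print(f"KR20 = {kr20(marks):.2f}")
```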
The test reliability can
help predict the average test score your class would get if it were to take
another test over the same set of skills and knowledge. The Standard Error of Measurement (SEM) of
your test is the range of error (from all of the above effects) for the average
test score. (Table
15 or Download) The SD of the test and the test reliability are combined to
obtain the SEM: the test reliability extracts a portion of the SD
[SEM = SD x SQRT(1 - KR20)]. If the test reliability were 1 (totally
reliable), the SEM would be 0 (no error), and the class would be expected to
get the same class test score on a retest.
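A sketch of that combination, with hypothetical SD and reliability values:

```python
# SEM = SD * SQRT(1 - reliability): the reliability extracts a portion
# of the SD, leaving the SEM as the remaining range of error.

def sem(sd, reliability):
    return sd * (1 - reliability) ** 0.5

print(sem(sd=4.0, reliability=0.75))  # 2.0
print(sem(sd=4.0, reliability=1.0))   # 0.0 (totally reliable, no error)
```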
And finally, what can you expect about the precision of your score and your
retest score (provided you have not learned any more)? A retest is of critical
importance to students needing to reach a high-stakes cut score. If the SEM or
CSEM ranges widely enough, you do not need to study. Just retake the test a
couple of times and your luck on test day may get you a passing score. It is a
myth that a 2/3 probability of getting a passing grade will ensure you get the
passing grade if you need a second trial.
The Conditional [on
your raw score] Standard Error of
Measurement (CSEM) extracts the variance from only your mark pattern (Table 22). [CSEM = SQRT(Variance within your mark pattern X the number of questions)]
Your CSEM will be very small if you have a very high or low score. This limits the prospects of a passing score by retaking a test without studying.
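A sketch of the document's CSEM formula; a mark pattern of 1s and 0s with proportion p right has within-pattern variance p(1 - p), which shrinks toward 0 at very high or very low scores:

```python
# CSEM = SQRT(variance within your mark pattern * number of questions).

def csem(marks_row):
    n = len(marks_row)
    p = sum(marks_row) / n    # your proportion of right marks
    within_var = p * (1 - p)  # variance within your mark pattern
    return (within_var * n) ** 0.5

print(f"{csem([1] * 18 + [0] * 2):.2f}")   # very high score -> small CSEM (1.34)
print(f"{csem([1] * 10 + [0] * 10):.2f}")  # middling score -> largest CSEM (2.24)
```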
Now, before a retest: study, change testing habits, or trust to luck on test
day. Get a copy of the blueprint used in designing the test. A blueprint lists
in detail what will be covered and the types of questions. Question each topic
or skill. It is easier to answer questions other people have written if you
have already created and answered your own questions. Use the advice in the
first five paragraphs above and work up to higher levels of thinking and
meaning making (a web of relationships that makes sense to you; visualize,
sketch, or draw every term).
A change in testing
habits may also be in order. Many students who do not “test well” are
bright, fast memorizers, but lacking in meaningful relationships that make
sense to themselves. They are still learning for someone else: learning for the
test, scanning each question for the "one right answer". With meaningful
relationships in mind you have the information in hand to answer a number of
related questions. You are not limited to just matching what you recall to the
question answers. [Mark out wrong answers and guess from the remaining answers.]
And now for the “Hail
Mary” approach. First, as a rule of thumb, your score on a test written by
someone other than your teacher (a standardized test for example) will be one
to two letter grades below your classroom test scores. If your failing test score
is within 1 SEM of the cut score, you can expect a retest score within this
range 2/3 of the time. The same prediction is made with your CSEM value, which
can range above or below the SEM value. If your failing test score is more than
1 SEM or 1 CSEM below the cut score, you have no option other than to study. It is
a myth that students passing a few points above the cut score will also pass on
a retest. [Restating the myth: near passes are safe; near failures are not. In fact, both depend on your luck on test day.]
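A sketch of this retest arithmetic under an assumed normal error model; the score, cut score, and SEM values are hypothetical:

```python
# If retest scores are assumed normal around your current score with
# spread SEM, the chance of reaching the cut score follows directly.

from math import erf, sqrt

def p_retest_passes(score, cut, sem):
    z = (cut - score) / sem
    return 1 - 0.5 * (1 + erf(z / sqrt(2)))  # P(retest >= cut)

print(f"{p_retest_passes(score=68, cut=70, sem=2.0):.2f}")  # 1 SEM below: ~0.16
print(f"{p_retest_passes(score=70, cut=70, sem=2.0):.2f}")  # at the cut:  0.50
```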
Also please keep in mind that all of the math dealing with the
variation between and within columns and rows (the variance) can be done on the
student and question mark patterns with no
knowledge of the test questions or the students. It is a myth that good
statistical procedures can improve poor question or student performance.
Teacher and psychometrician judgment on the other hand can do wonders!
The standardized test
paradox: A good blueprint to guide calibrated question selection for the
test is the basis for low scores and a statistically reliable test. Good student
preparation is the basis for high scores (mastery) and a statistically
unreliable test (it cannot spread student scores out enough for the
distribution to “look right”).
The sciences, engineering, and manufacturing use statistics
to reduce error to a minimum (low maintenance cars, aircraft, computers, and
telephones). Only in traditional institutionalized education (schools designed
for failure) is error intentionally introduced to create a score range that
“looks right” for setting grades and ranking schools. This is all non-sense for
schools designed for mastery (who advance students after they are prepared for
the next steps). It is a myth (and an entrenched excuse for failure by the
school) that student score distributions must
fit a normal, bell-shaped curve of error.
Mastery schools
are now being promoted as the burden of record keeping is easily computerized.
The Internet makes mastery schools available everywhere and at any time. This
will produce a marked change in traditional schooling in the next few years. This
change can be seen in the “flipped” classroom (a modern version of assigned
[deep] reading before class discussion). It is a myth that the “flipped”
classroom is something new.
Current educational software removes the time lag in the question-answer-and-verify
learning cycle introduced by grouping students in classes, and then extended with standardized
tests. Learning and assessment are again joined to promote mastery of assigned
skills and knowledge. Students
advance when they are ready to succeed at the next levels. It is a myth that
“formative assessments” are actually functional when test results are not
available in an operational time frame (seconds to a few days).
Standardized tests will continue to rank students and
schools, as the tests mature to certifying
mastery for students who learn and excel anywhere and at any time. It is a
myth that current substantive standardized tests (that do not let students
report what they trust they know or can do) can “pin point exactly what a student
knows and needs to learn”.
- - - - - - - - - - - - - - - - - - - - -
Free software to help you and your students experience and understand how to
break out of traditional multiple-choice (TMC) and into Knowledge and
Judgment Scoring (KJS) (tricycle to bicycle):