Two consortia (PARCC and SBAC) are again working on tests that go beyond simple questions that can be answered at all levels of thinking. The questions will go through the usual calibration, equating, and bias-review processes. And, to the best of my knowledge, they will continue to be right-count scored, at the lowest levels of thinking.
Trying to assess 21st century skills (bicycling) with the same old tricycles (forced-choice tests) seems rather strange to me, and more so when the test is to assess college and job preparedness. These tests are to do more than create a ranked scale on which a predetermined portion will pass or fail, as in past years. They are supposed to actually measure something about students rather than just produce a ranked performance on a test.
Trying to raise the level of thinking required on a test at the beginning of NCLB resulted in a lot of very clever questions. I have no idea whether one could actually figure out why or how students answered the questions in relation to why they were on the test. On a forced-choice test you just mark. On a quantity and quality scored test, student responses fall into Expected, Guessing, Misconception, and Discriminating, because students only mark when they trust they know or can do. An accurate, honest, and fair test is obtained with no forced guessing required.
Higher levels (orders) of thinking involve metacognition: the
ability to think about one’s own thinking, the ability to question one’s own
work, and the ability to be self-correcting. These abilities are assessed with
quantity and quality scoring of multiple-choice tests. The quality score
indicates the degree of success each student has in developing these abilities
(when learning and when testing). The quantity score measures the degree of
mastery of knowledge and related skills. It is not that they know the answer
but that they have developed the sense of responsibility to function at all
levels of thinking and can therefore figure out the answers.
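To make the two scores concrete, here is a minimal sketch in Python of one plausible scoring rule. The formulas below are my own illustrative assumptions; this post does not give PUP's actual point values.

    # Sketch of quantity and quality scoring (illustrative, not PUP's formula).
    # A student marks an item only when trusting the answer; otherwise omits.

    def score(responses):
        """responses: a list of 'right', 'wrong', or 'omit', one per item."""
        marked = [r for r in responses if r != "omit"]
        right = responses.count("right")
        quantity = right / len(responses)                  # mastery of knowledge and skills
        quality = right / len(marked) if marked else 1.0   # judgment: accuracy on the items the student chose to mark
        return quantity, quality

    # A student who marks only what is trusted and omits the rest:
    print(score(["right", "right", "omit", "omit", "right", "wrong"]))
    # (0.5, 0.75): knows half the material; judgment was sound on 3 of 4 marks

The quantity score rises only with mastery; the quality score rises as the student learns to mark only what can be trusted.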
My own experience has been that students learn metacognitive test-taking skills quickly when routinely assessed with a test scored for knowledge and judgment. (Over 90% voluntarily switched from guessing at answers [traditional right-count scoring] to quantity and quality scoring after two experiences with both.) It took three to four times as long for them to apply these skills to learning: to reading or observing with questions; to building a set of relationships that permitted them to verify that they could trust what they knew; to applying their knowledge and skill to answering questions they had not seen before.
The two consortia have to choose between beefing up traditional forced-choice multiple-choice tests and simply changing the test instructions so students can either continue with multiple-guess or switch to reporting what they trust they know (quantity and quality scoring). I am not convinced that beefing up traditional forced-choice questions will produce the sought-after results. The new questions must still be guarded against guessing, as students are still forced to guess. The guessing problem is solved by letting students report what they trust they know using quantity and quality scoring: no guessing required.
Two sample items from SBAC show how attempts are being made to improve test items. A careful examination indicates that, once again, we are facing clever marketing.
“Which model below best represents the fraction 2/5?”
“Even if students don’t truly have a deep understanding of
what two-fifths means, they are likely to choose Option B over the others
because it looks like a more traditional way of representing fractions.
Restructuring this problem into a multipart item offers a clearer sense of how
deeply a student understands the concept of two-fifths.”
The word “best” is a red flag. Test instructions often read, “Mark the best answer for each item.” It means: guess whenever you do not know; do not leave an item unmarked. Your test score is a combination of what you know and your luck on test day. Low-ability students and test designers are well aware of this as they plan for each test.
“Best” in the item stem is also a traditional lazy way of
asking a question. A better wording would be “is the simplest representation
of”. There would then be just one right answer for the right reason: “the
simplest representation” rather than “a more traditional way of representing”. Marketing.
I agree that the item needs to be restructured or edited.
“For numbers
1a-1d, state whether or not each figure has 2/5 of its whole shaded.”
“This item is more complex because students now have to look
at each part separately and decide whether two-fifths can take different forms.
The total number of ways to respond to this item is 16. ‘Guessing’ the correct
combination of responses is much less likely than for a traditional four-option
selected-response item.”
The comment states that students must now “look at each part separately and decide” each of four yes/no answers. The item may be more complex to create with four answers, but the answering is simpler for the student. Marketing.
Grouping four yes/no answers together to avoid the chance score of 50% is clever. The 2x2x2x2 (16) ways would become 3x3x3x3 (81) ways using quantity and quality scoring (if students were to mark at the lowest levels of thinking)! The catch here is that the possible ways and the probability of those ways are not the same thing. It is the functional ways, the number of ways that draw at least 5% of the marks, that matters. If only four ways were functional on the test, then all of the above reduces to a normal four-option item. Scoring the test for quantity and quality eliminates the entire issue, as forced guessing is not required when students have the opportunity to report what they trust accurately, honestly, and fairly. If you do not force students to guess, you do not need to protect test results from guessing.
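The arithmetic behind those counts is quick to verify. A short Python sketch of the numbers above (not of any consortium's scoring engine):

    # Response patterns for a four-part yes/no item.
    forced_choice = 2 ** 4   # yes/no on each of four parts: 16 patterns
    with_omit = 3 ** 4       # yes/no/omit under quantity and quality scoring: 81 patterns

    print(forced_choice, with_omit)   # 16 81
    print(1 / forced_choice)          # 0.0625: blind-guess odds of the one correct pattern
    print(1 / 4)                      # 0.25: blind-guess odds on a plain four-option item

    # But if only four of the 16 patterns actually draw marks (the
    # "functional ways"), the item behaves like a normal four-option
    # question despite the 16 theoretical patterns.

So the multipart format does cut blind-guess odds from 1 in 4 to 1 in 16, but only if students actually spread their marks across the theoretical patterns.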
As I understand how this item will be scored, it condenses
four items into one for a reason that is not entirely valid: guessing control.
The statement that “students now have to look at each part separately” is
presented in such a way that it implies they would not “have to look at each
part separately” on the first example. Marketing again. Since there is no way
to predict how an item will perform, we need actual test data to support the
claims being made.
These two examples are not unique in striving to assess higher levels of thinking by combining two or more simple items into a more complex item. I dearly love the TAKS question that Maria Luisa Cesar included in her San Antonio Express-News article, 4 December 2011, 1B and 3B. Two simple questions along the lines of, “Is this figure: A) a hexagon, B) an octagon, C) a square, D) a rectangle?” have been combined.
I was faced with this kind of question on my first day in school: “Color each of the six circles with the correct color.” I did not know my colors. I had six circles and six crayons. I lined up the crayons on the left side of my desk. After coloring a bit of each circle with a crayon, I put it on the right side of my desk. I had colored each circle with the correct color.
The same reasoning would get a correct answer here without
knowing anything about hexagons or octagons: the figures are not the same. That
leaves 7 sides and 5 vertices. Seven sides is not correct. So 5 vertices must
be correct, whatever a “vertice” is.
The STAAR question figures are composed of vertices (4, 6,
5, 6), faces (5, 6, 5, 4), and edges (5, 9, 8, 9). A simple count of each
yields a match only with option C. No knowledge of the geometric figures is
required at the lowest levels of thinking.
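The elimination can be shown in a few lines of Python. I am assuming the stem asks for the figure with 5 vertices, 5 faces, and 8 edges (a square pyramid), since the stem itself is not reproduced here; only that combination picks out option C from the counts above.

    # Counts for options A-D, read from the figures: (vertices, faces, edges).
    options = {"A": (4, 5, 5), "B": (6, 6, 9), "C": (5, 5, 8), "D": (6, 4, 9)}

    # ASSUMPTION: the stem names 5 vertices, 5 faces, and 8 edges.
    target = (5, 5, 8)

    matches = [name for name, counts in options.items() if counts == target]
    print(matches)   # ['C']: simple counting finds the answer, no geometry vocabulary needed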
The problem here is that the question author was thinking
like a normal adult teacher. It took me a couple of years using quantity and
quality scoring (PUP) to make sense of
the thinking students use when faced with a test question. I divided the sources of information that students used into two parts. One part is what students
learned by just living: robins have red breasts and blue eggs. The other part
is what they have learned in a reasoned, rational manner. These are roughly
lower and higher levels of thinking, recall and formal, or passive and active
learning.
On top of this is the human behavior of acting on what one believes rather than on what one knows. Here we are at the source of misconceptions that are very difficult to correct in most students and adults. (Teachers and teacher advocates have a pathological bias against free enterprise even as it generates the funds for their employment and solves problems the educational bureaucracy fails to solve. They also have an inability to relearn to use a multiple-choice test to assess what students actually know rather than using it just to rank students.)
In summary, improving assessment by taking the old tricycle and adding dual wheels with deeper tread (multitasking and multiple-part items) is really not enough. It is time to move on to the bicycle, where the student is free to report what is trusted as the basis for further learning and instruction (spontaneous student judgment replaces that passive third wheel: waiting for the teacher to perform and correct).
Even more important is to create the environment in which students acquire the sense of responsibility needed to learn at higher levels of thinking. Scoring classroom tests for knowledge and judgment (PUP and the Partial Credit Rasch Model) does this: it promotes student development as well as knowledge and skill. Only when struggling students actually see, and can believe, that they are receiving credit for knowing what they know rather than for their luck on test day have I seen them change their study and test-taking habits.
Kaitlyn Steigler sums it up nicely in an article by Jane Roberts: “It used to be, I do, we do together, now you do.” “Now, the kids will take charge. The teaching will be based on what we figure they know or don’t know.” PUP scores multiple-choice tests both ways, so students can switch to reporting what they trust when they are ready. Then self-correcting students, as well as their teachers, will know what they know when they are learning, during the test, and as the basis for further learning.