It is time for psychometricians, teachers, and students to
get on the same track with the same unit of measurement (not motorcycles,
bicycles, and tricycles). Psychometricians have been top dog: feared, secretive,
and unquestioned in their judgment. Teachers have worked hard, but to my current
knowledge, only in a case like Nebraska has their judgment made a meaningful
improvement in test results. Students have been treated as inanimate commercial
commodities.
Optimum test results can only be obtained when the playing
field is leveled for all three stakeholders. It is currently optimized from the
view of psychometricians who have been strongly influenced, at times, by
political power, and more often silenced by golden handcuffs. The “anomalies”
that have become public and then been retracted (more than once in Florida) show us
the fruits of one-stakeholder rule in student performance assessment.
And now we have the Common Core State Standards tests.
Students would like an honest, accurate and fair test. Teachers and students
would like to know what each student knows and can do and what each one has yet
to learn. Psychometricians would like highly reproducible test results, which
do not require (present the opportunity for) equating test results (exposing
error in selecting test items of equal difficulty) from year to year, but do
present the appearance of equal difficulty.
And then we have the secondary level stakeholders who demand
(and who fund with millions of dollars) that the test results be only in the form
of a ranking that shows improvement each year. They also want to do this at
the lowest cost. To date the secondary level stakeholders have held the field.
Why things are as they are is then not too difficult to understand
if you ignore the marketing that often overstates what is actually being done.
Assessments carried out as forced
activities cannot produce a valid indicator of what students actually know and
can do. Such tests can produce a valid statistical ranking for satisfying a
state or federal law. And that is why and how the tests have been funded.
The Common Core State Standards movement suggests that the
judgment of all three primary stakeholders is included and respected. No one
party is to triumph over or manipulate the other two parties. This demands some
changes in the way they interact.
Students should be given the option of exercising their
judgment in responding to test elements. This is inherent in classroom folders.
It is also present when students have the option to respond to 5 essay items
out of 7 to 10 suggested on a test. And in the alternative form of
multiple-choice (quantity and quality scoring), students select questions to
report what, in their judgment, they trust they know or can do.
Teachers should be given the option of exercising their
judgment in writing test items that provide insight into what students are
learning from what they are teaching. This includes both subject matter and
skills, and student development. Teachers should be able to report, based on
their judgment, which group each student best fits, such as below, meets, or
exceeds standards, as in Nebraska. Taken together, these inputs capture in
numbers the climate of the classroom.
Psychometricians must respect the needs of the other two
stakeholders. The oversimplification of data collection and data reduction to
obtain the highest possible (but questionable) test reliability needs to become
a part of the history of a natural experiment (NCLB) that has gone on too long.
What works nicely in the safety of the research laboratory cannot be directly
applied to individual student performances and still yield meaningful results (other
than a ranking).
IMHO the Common Core State Standards movement demands the
inclusion of more of the classroom climate (instruction, learning, feedback)
than forced student test performances yield. The student must be given the
option to report what is meaningful, useful and empowering. The mechanics are
simple for the student: know and don’t know; can or can’t do. Mark an option,
select a question, or perform a task when in your mind you can trust what you
are doing (and that this can be used as the basis for further learning and
instruction).
Students want to succeed. Teachers want them to succeed.
Psychometricians need to capture what students and teachers have accomplished
by letting students report knowledge, skills, and judgment. Quantity and
quality scoring captures all three. Forced performances capture only part of
knowledge and skills.
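A minimal sketch of how the two scores can be kept separate. The formulas are my assumptions for illustration, not taken from any scoring program: quantity is the share of all questions answered correctly, and quality is the share of the questions the student chose to mark that are correct. The function name and the use of `None` for an omitted question are hypothetical.

```python
def quantity_quality(responses, key):
    """Return (quantity, quality) for one student.

    responses: the option the student marked for each question,
               or None where the student chose not to report.
    key:       the correct option for each question.
    Assumed definitions (illustrative only):
      quantity = right marks / all questions
      quality  = right marks / questions the student marked
    """
    total = len(key)
    marked = [(r, k) for r, k in zip(responses, key) if r is not None]
    right = sum(1 for r, k in marked if r == k)
    quantity = right / total if total else 0.0
    quality = right / len(marked) if marked else 0.0
    return quantity, quality


# A student who reports on only 5 of 10 questions and gets 4 of those right.
key = ["A", "B", "C", "D", "A", "B", "C", "D", "A", "B"]
responses = ["A", "B", "C", "D", "B", None, None, None, None, None]
quantity, quality = quantity_quality(responses, key)
print(f"quantity {quantity:.0%}, quality {quality:.0%}")
```

Under these assumed definitions the student above has a low quantity score but a high quality score, which is the separation that right count scoring alone cannot show.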
This has been a long introduction to three charts that
summarize the psychometrician’s view of a standardized test. The first view is
the result of oversimplifying the classroom environment. Only right marks are
counted on multiple-choice tests, or right stuff (generally restricted to
rubrics) is counted on other forms of assessment. A raw score distribution is
divided into three to five parts with cut scores. This is purely a statistical
concept that works with any sample of anything. Once you have it in hand, the
next job is to ascribe meaning to it based on each psychometrician’s judgment.
The data from Alaska indicate that about 1/4 of the time students of equal
abilities switch categories from year to year. This is a sizable measurement
error related to right mark scoring.
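The cut score idea can be sketched in a few lines. The cut values below (50 and 80) are hypothetical and chosen only for illustration; the labels are the three categories used in this post. The sketch shows the mechanics only and makes no claim about how any state sets its cuts.

```python
from bisect import bisect_right

LABELS = ["below", "meets", "exceeds"]
CUT_SCORES = [50, 80]  # hypothetical cut scores, in percent


def categorize(score, cuts=CUT_SCORES, labels=LABELS):
    """Place a raw score into a category.

    A score equal to a cut score falls in the higher category.
    """
    return labels[bisect_right(cuts, score)]


# Two scores on either side of a cut land in different categories,
# even though they differ by only two points.
for score in (49, 51, 79, 80):
    print(score, categorize(score))
```

Because the categories are defined only by position in the distribution, a small change in a raw score near a cut is enough to move a student from one category to another.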
The second view includes teacher judgment (see Nebraska
posts). The single distribution is now teased apart into three. The average
test score is no longer 50% but near 70%. The three score regions (below,
meets, and exceeds standards) now have meaning based on teacher judgment
(standard deviation of 20%, for example).
The third view includes student judgment to report what is
actually known and can be done that is the trusted basis for further learning
and instruction. This is what the Common Core State Standards movement states
is now needed. This chart is speculative. I have no actual data for it. I do
know from working with over 3000 students that the portion of a test score
distribution below 50% almost vanishes with quantity and quality scoring. Also
the variation (the standard deviation) is lower, giving better separation of
students grouped by performance (standard deviation of 10%, for example).
The psychometrician’s view is simple, cheap and often
illusionary. The teacher’s view becomes more meaningful. The student’s view
completes a balanced assessment system.
In summary, the Common Core State Standards movement now
demands far better test scoring and analysis than was used in the past. In the
case of multiple-choice tests, the switch from right count scoring to quantity
and quality scoring only involves a change in test instructions that permits
each student to elect which method should be used to score the test (see prior
posts). The test then yields results that students, teachers, and
psychometricians can all agree look right.
Software to do this has been in existence for over two
decades. Winsteps (partial credit Rasch model IRT) and Power Up Plus (Knowledge
and Judgment Scoring) are two examples. Winsteps has been a popular
program for state departments of education during the NCLB decade (they only
need to change test instructions to assess student judgment).
Power Up Plus (PUP)
is a classroom friendly program developed to provide students a means to
frequently report accurately, honestly, and fairly what they actually knew and
could do that was of value to themselves. They used the test results to guide further
learning. I used the test results to guide my instruction and their development
(passive pupil to self-correcting high achiever).
What all of this comes down to is an inversion of the
present hierarchy:
- Let students have the opportunity to earn a quality score of 80-90% regardless of the quantity score. Let students report what they really know and can do.
- Let teachers submit questions that have been shown in the classroom to meaningfully group students by their understanding, ability, skill, and development. These are questions that measure something important: mastery, misconceptions, reasoning errors, etc. Also let teachers estimate student test performance (below, meets, or exceeds standards) as a part of each standardized test.
- Let psychometricians do their best with counts that are based on real students and classrooms rather than conducting an academic game show. The current statistical concept for ranking students is IMHO an even less perfect match to the Common Core State Standards movement than to the NCLB standards.
This is one way to produce a balanced assessment system. The
standardized test items grow from all learning experiences. Students are free
to make an accurate, honest, and fair report. Psychometricians are free to
moderate a meaningful assessment process.
Please encourage Nebraska to allow students to report
what they trust they know and what they trust they have yet to learn. Blog.
Petition. We need to foster innovation wherever it may take hold.