Nebraska is
now poised for a fifth star. It has a standardized assessment that ranks, that
measures, that produces meaningful student test scores, and that is deeply
influenced by the judgment of experienced classroom teachers. Nebraska is the
first state I have found that has transparently documented that it is this close
to an accurate, honest, and fair test.
Critical features can be used to rate the standardized tests used by state departments of
education over the past ten years. These tests have evolved from just ranking
students, teachers, and schools based solely on the judgment of
psychometricians to including the judgment of teachers.
A five-star
rating would include the judgment of students, who would report what they know
accurately, honestly, and fairly instead of guessing at the best answer to each
multiple-choice question.
A test can
earn three stars based on the judgment of psychometricians, one star on the
judgment of teachers, and one star on the judgment of students. These are the
three main stakeholders in a standardized multiple-choice test.
There are
other stakeholders who make use of and market the test results. These secondary
stakeholders often do not market the true nature of the standardized test in
hand. Their claims may not match the test results.
Any standardized multiple-choice test earns one star. The norm-referenced test
compares one student with another. Raw test scores are plotted on a
distribution. A judgment is then made about where to set the cut scores. Many
factors can be used in making this judgment. It can be purely statistical. It
can attempt to match historical data. It can be a set portion for passing or
failing. It can be whatever looks right. The cut score is generally marketed
with exaggerated importance.
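
Two of the cut-score judgments listed above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function names, the raw scores, the 30% failing portion, and the "one standard deviation below the mean" rule are all made up for the example and do not come from any state's procedure.

```python
from statistics import mean, pstdev


def cut_by_portion(raw_scores, fail_portion):
    """Cut score for a set portion failing: scores below the cut fail.

    With tied scores the portion that actually fails can differ slightly.
    """
    ordered = sorted(raw_scores)
    index = min(int(fail_portion * len(ordered)), len(ordered) - 1)
    return ordered[index]


def cut_by_statistics(raw_scores, sd_below_mean=1.0):
    """Purely statistical cut: a chosen number of SDs below the mean."""
    return mean(raw_scores) - sd_below_mean * pstdev(raw_scores)


# Hypothetical raw scores (percent right) for ten students.
scores = [40, 50, 55, 60, 65, 70, 75, 80, 85, 90]

print("set-portion cut:", cut_by_portion(scores, 0.30))
print("statistical cut:", round(cut_by_statistics(scores), 1))
```

The two rules give different cut scores for the same ten students, which is the point: the cut is a judgment, not a property of the test.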

The much-decried right count scored multiple-choice test performs better with
higher test scores than with lower ones. Right marks on tests scored below
60% are questionable. Tests scored below 50% are as much a product of luck on
test day as they are of student ability. We can know what students do not know. Psychometricians like test
scores near 50% because they lend stability to the test data. Nebraska designed its
test for an average test score of 65% plus questions needed to cover the
blueprint requirements for a criterion-referenced test. The Nebraska
standardized 2010 Grade 3 Reading test produced an average score of 72%.
Nebraska can know what students do know
about 3/4 of the time: Three stars.
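
Why right marks become less trustworthy as scores fall can be illustrated with a simple guessing model. This is a rough sketch, not the calculation behind the figures above: it assumes four-option questions and a student who either knows an answer or guesses blindly, and the function name is hypothetical.

```python
def right_count_breakdown(score, options=4):
    """Split a right-count score into known answers and lucky guesses.

    Assumed model: score = known + (1 - known) / options, where `known`
    is the fraction of questions the student really knows and every
    other question is a blind guess.
    """
    chance = 1 / options
    known = max(0.0, (score - chance) / (1 - chance))
    lucky = score - known
    lucky_share = lucky / score if score > 0 else 0.0
    return {"known": known, "lucky": lucky, "lucky_share": lucky_share}


for score in (0.50, 0.60, 0.72):
    parts = right_count_breakdown(score)
    print(f"score {score:.0%}: "
          f"{parts['lucky_share']:.0%} of right marks are luck")
```

Under these assumptions the share of right marks that comes from luck grows as the score drops, and a score at the chance level (25% here) is all luck. Real students also use partial knowledge to rule out options, so the numbers are only indicative.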
FOUR STARS: Nebraska
earns a fourth star for including teacher judgment in writing questions, in
reviewing questions, and in setting the criterion-referenced standards. The
three regions (below, meets, and exceeds standards) have meaning beyond purely
statistical relationships. It was teacher judgment that moved the test design
from an average score of 50% to 72%. The scores now look very much like those
produced by any good classroom test. They can be interpreted and used in the
same way.
FIVE STARS: Nebraska has
yet to earn a fifth star. That requires student judgment to be included in the
assessment system. When that is done, Nebraska will have an accurate, honest,
and fair test that also meets the requirements of the Common Core State Standards.
Most right marks will also represent right answers instead of luck on test day (less churning of individual test scores from year to year). The level of thinking used by students on the test and in the classroom can also be obtained. All that is needed is giving students the option to continue guessing or to report what they trust they know.
* Mark every question even if you must guess. Your judgment of what you know and can do (what is meaningful, useful, and empowering) has no value.
** Only mark
to report what you trust you know or can do. Your judgment and what you know
have equal value (an accurate, honest, and fair assessment).
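
The difference between the two sets of instructions can be sketched by scoring one answer sheet both ways. This is a minimal illustration, not the code used by any of the programs named in this post. The point values (2 for a right mark, 1 for an omit, 0 for a wrong mark) are my assumption of one way to give knowledge and judgment equal value, and the function and field names are hypothetical.

```python
def score_answer_sheet(marks, key):
    """Score one answer sheet by right count and by quantity and quality.

    marks: the student's choices, with None for an omitted question.
    key:   the correct choices.
    """
    if len(marks) != len(key):
        raise ValueError("marks and key must be the same length")
    total = len(key)
    marked = sum(1 for m in marks if m is not None)
    right = sum(1 for m, k in zip(marks, key) if m is not None and m == k)
    omitted = total - marked
    return {
        # Traditional scoring: right marks over all questions.
        "right_count": right / total,
        # Quality: how often the student's marks are right.
        "quality": right / marked if marked else None,
        # Assumed values: right = 2, omit = 1, wrong = 0.
        "knowledge_and_judgment": (2 * right + omitted) / (2 * total),
    }


key = list("ABCDABCDAB")
# Marks every question, guessing at the last five (all wrong here).
guesser = list("ABCDACDABA")
# Marks only what is trusted: five right, one wrong, four omitted.
reporter = list("ABCDAC") + [None] * 4

print(score_answer_sheet(guesser, key))
print(score_answer_sheet(reporter, key))
```

Both students get the same traditional score, yet the second sheet shows which marks can be trusted. That is the information right count scoring throws away.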
Student judgment will add student development (the ability to use all levels of
thinking) to the Nebraska test. Students need to know and do, but they also
need experience using judgment in applying knowledge and skills in situations different from those in which they learned.
Routine use
of quantity and quality scoring in the classroom (be it multiple-choice, short
answer, essay, project, or report) promotes student development. It promotes the
sense of responsibility and reward needed to learn at all levels of thinking
(passive pupils become active self-correcting learners). IMHO, if students fail to develop this sense of responsibility, the Common Core State Standards movement will also fail.
Software to
do quantity and quality scoring has been available for over two decades.
Nebraska is already using Winsteps. Winsteps contains the partial credit Rasch model
routine that scores quantity and quality.
Power Up
Plus (PUP) scores multiple-choice tests by both methods: traditional right count
scoring and Knowledge and Judgment Scoring.
Students can elect which method they are most comfortable with in the classroom
and in preparation for standardized tests.
Since
2005, Knowledge Factor has had a patented learning system that guarantees student development. High-quality
students generally pass standardized tests. All three programs promote
the sense of responsibility and reward needed to learn at all levels of
thinking, a requirement to meet the Common Core State Standards.
Newsletter (posted 23 OCT 2012):
