Wednesday, October 24, 2012

Nebraska Student Assessment Four Star Rating

Nebraska is now poised for a fifth star. It has a standardized assessment that ranks, that measures, that produces meaningful student test scores, and is deeply influenced by the judgment of experienced classroom teachers. Nebraska is the first state I have found that has transparently documented that it is this close to an accurate, honest, and fair assessment.

Five critical features can be used to rate the standardized tests used by state departments of education over the past ten years. These tests have evolved from just ranking students, teachers, and schools based solely on the judgment of psychometricians to including the judgment of teachers.

A five star rating would include the judgment of students to report what they know accurately, honestly, and fairly instead of guessing at the best answer to each multiple-choice question.

A test can earn three stars based on the judgment of psychometricians, one star on the judgment of teachers, and one star on the judgment of students. These are the three main stakeholders in a standardized multiple-choice test.

There are other stakeholders who make use of and market the test results. These secondary stakeholders often do not market the true nature of the standardized test in hand. Their claims may not match the test results.

ONE STAR: Any standardized multiple-choice test earns one star. The norm-referenced test compares one student with another. Raw test scores are plotted on a distribution. A judgment is then made about where to set the cut scores. Many factors can be used in making this judgment. It can be purely statistical. It can attempt to match historical data. It can be a set portion for passing or failing. It can be whatever looks right. The cut score is generally marketed with exaggerated importance.

TWO STARS: A criterion-referenced test earns two stars. This test contains questions that measure what needs to be measured. It does not compare one student with another. It groups students with comparable abilities. Nebraska uses below standard, meets standard, and exceeds standard. This divides the score distribution into three regions. Cut scores fall at the point where a student has an equal chance of falling into either adjacent region. The messy nature of measuring student knowledge, skill, and judgment is transparent. Passing means preparing to meet the standard set for the median of the meets standard region, not preparing to be just one point above the cut score.

THREE STARS: The much-decried right count scored multiple-choice test performs better with higher test scores than with lower ones. Right marks on tests scored below 60% are questionable. Tests scored below 50% are as much a product of luck on test day as they are of student ability. We can know what students do not know. Psychometricians like test scores near 50% as they lend stability to the test data. Nebraska designed its test for an average test score of 65% plus questions needed to cover the blueprint requirements for a criterion-referenced test. The Nebraska standardized 2010 Grade 3 Reading test produced an average score of 72%. Nebraska can know what students do know about 3/4 of the time: Three stars.

FOUR STARS: Nebraska earns a fourth star for including teacher judgment in writing questions, in reviewing questions, and in setting the criterion-referenced standards. The three regions (below, meets, and exceeds standards) have meaning beyond purely statistical relationships. It was teacher judgment that moved the test design from an average score of 50% to 72%. The scores now look very much like those produced by any good classroom test. They can be interpreted and used in the same way.

FIVE STARS: Nebraska has yet to earn a fifth star. That requires student judgment to be included in the assessment system. When that is done, Nebraska will have an accurate, honest, and fair test that also meets the requirements of the Common Core State Standards.

Most right marks will also represent right answers instead of luck on test day (less churning of individual test scores from year to year). The level of thinking used by students on the test and in the classroom can also be obtained. All that is needed is giving students the option to continue guessing or to report what they trust they know.

*   Mark every question even if you must guess. Your judgment of what you know and can do (what is meaningful, useful, and empowering) has no value.
** Only mark to report what you trust you know or can do. Your judgment and what you know have equal value (an accurate, honest, and fair assessment).

Including student judgment will add student development (the ability to use all levels of thinking) to the Nebraska test. Students need to know and do, but they also need experience in exercising judgment when applying knowledge and skills in situations different from those in which they learned.

Routine use of quantity and quality scoring in the classroom (be it multiple-choice, short answer, essay, project, or report) promotes student development. It promotes the sense of responsibility and reward needed to learn at all levels of thinking (passive pupils become active self-correcting learners). IMHO, if students fail to develop this sense of responsibility, the Common Core State Standards movement will also fail.

Software to do quantity and quality scoring has been available for over two decades. Nebraska is already using Winsteps. Winsteps contains the partial credit Rasch model routine that scores quantity and quality. 

Power Up Plus (PUP) scores multiple-choice tests by both methods: traditional right count scoring and Knowledge and Judgment Scoring. Students can elect which method they are most comfortable with in the classroom and in preparation for standardized tests.
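
The difference between the two scoring methods can be sketched in Python. This is only an illustration, not the algorithm used by PUP or Winsteps: the weights (2 points for a right mark, 1 point for an omitted question, 0 points for a wrong mark) are an assumption made here, as are the function name score_test and the result field names.

```python
from typing import Optional, Sequence


def score_test(responses: Sequence[Optional[str]], key: Sequence[str]) -> dict:
    """Score one answer sheet by right count and by a quantity/quality rule.

    responses: one entry per question; None means the student chose to omit.
    key: the correct option for each question.
    The 2/1/0 weights below are an assumed rule for illustration only.
    """
    if len(responses) != len(key):
        raise ValueError("responses and key must have the same length")

    total = len(key)
    marked = sum(1 for r in responses if r is not None)
    right = sum(1 for r, k in zip(responses, key) if r is not None and r == k)
    wrong = marked - right
    omitted = total - marked

    # Traditional right count scoring: only right marks count.
    right_count_pct = 100 * right / total

    # Quality: share of the marks the student chose to make that are right.
    quality_pct = 100 * right / marked if marked else 0.0

    # Assumed knowledge-and-judgment rule: right = 2, omit = 1, wrong = 0.
    kj_pct = 100 * (2 * right + omitted) / (2 * total)

    return {
        "right": right,
        "wrong": wrong,
        "omitted": omitted,
        "right_count_pct": right_count_pct,
        "quality_pct": quality_pct,
        "kj_pct": kj_pct,
    }


if __name__ == "__main__":
    key = list("ABCDABCDAB")
    # Reports only what is trusted: 6 right, 1 wrong, 3 omitted.
    careful = ["A", "B", "C", "D", "A", "B", "A", None, None, None]
    # Marks every question: 6 right, 4 wrong.
    guesser = ["A", "B", "C", "D", "A", "B", "A", "A", "B", "C"]
    print(score_test(careful, key))
    print(score_test(guesser, key))
```

Under these assumed weights, both students earn the same 60% right count score, but the student who omitted three questions shows a quality of about 86% and a combined score of 75%, while the student who guessed stays at 60% on every measure.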

Since 2005, Knowledge Factor has had a patented learning system that guarantees student development. High quality students generally pass standardized tests. All three programs promote the sense of responsibility and reward needed to learn at all levels of thinking, a requirement to meet the Common Core State Standards.

