Wednesday, December 21, 2011

True Score Diviner


The previous few posts have listed weaknesses in traditional multiple-choice right mark scoring (RMS). Apart from a ranking, whose value becomes increasingly questionable as test scores decrease, RMS results are seriously flawed for use in current formative assessments. Under RMS, quality and quantity are still linked. They are not linked in projects, reports, and essay tests. Even on a failing project, there can still be a note, “Great use of color”; “Great idea, another bit of editing and a great paper”.

Knowledge and Judgment Scoring (KJS) does the same thing with multiple-choice tests: “You got a quality score of 90% on the questions you selected to mark. Now make the same preparation on more of the assignment and you will have a passing score. We know you can do it!”

RMS test scores are always suspect and often meaningless. The True Score Diviner can help you find your true score or, if your score is your true score, the range of test scores you might have gotten with the same preparation.

At 100%, your test score and true score are one and the same. With a test score of 25% on a test of 4-option questions, your true score could range from 25 - 25, or zero, to 25 + 25, or 50%. Half of the time RMS cheats you and half of the time it teases or lies to you. You have a lucky day or an unlucky day. There is no way to know which, or by how much, from a single test. Statistical procedures say very little about single events strongly related to luck. They could help if you took about five versions of the test and calculated an average test score. You do not do that.
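The role of luck can be illustrated with a small simulation. This is only a sketch under an assumed model, not the True Score Diviner itself: the student truly knows some questions and marks every other question at random, with each of the 4 options equally likely. The function names and the 40-question test length are hypothetical.

```python
import random


def expected_score(n_questions, n_known, n_options=4):
    """Average percent score when unknown questions are marked at random."""
    guessed = n_questions - n_known
    return 100.0 * (n_known + guessed / n_options) / n_questions


def simulate_scores(n_questions, n_known, n_options=4, trials=10_000, seed=0):
    """Percent scores from repeated sittings with the same preparation.

    Assumption: each guessed question is right with probability 1/n_options.
    """
    rng = random.Random(seed)
    guessed = n_questions - n_known
    scores = []
    for _ in range(trials):
        # Count the guessed questions that happened to land on the right option.
        lucky = sum(1 for _ in range(guessed) if rng.randrange(n_options) == 0)
        scores.append(100.0 * (n_known + lucky) / n_questions)
    return scores


if __name__ == "__main__":
    # A student who knows nothing on a hypothetical 40-question test.
    scores = simulate_scores(40, 0)
    print("expected:", expected_score(40, 0))
    print("lowest:", min(scores), "highest:", max(scores))
```

Under this model the average sits at 25% for a student who knows nothing, but any single sitting can land well above or below it. A student who knows every question always scores 100%, which matches the point that test score and true score coincide only at the top.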

Knowledge and Judgment Scoring (KJS) solves this problem by letting you report what you know and trust. You are, in effect, scoring your own test based on your own preparation. Each student gets a customized test. Guessing is not required.

Now both student and teacher know what every high quality student knows and can do that can be trusted as the basis for further instruction, learning, and application regardless of the test score. Quantity and quality each generate separate scores.

We need to promote Knowledge and Judgment Scoring (KJS). Power Up Plus (PUP) does this by offering students both RMS and KJS. They make the switch when they have matured enough in a supportive classroom that places equal emphasis on knowing and on the skills required by the successful independent achiever. I know. I don’t know. I know how to know.

RMS today makes as much sense as selling gasoline at $3 per gallon from a pump that averages one gallon for each $3. It may deliver less than ½ gallon to over two gallons for each $3. But it does deliver an average of $3 per gallon if you sum all the customers for the day. That is a range of less than $1.50 to over $6.00 per gallon. Such a situation in academic measurement still goes, for the most part, unquestioned.
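The arithmetic behind the pump analogy can be checked directly. This is a minimal sketch; the three deliveries are made-up figures chosen only so that they average one gallon per $3 sale.

```python
def price_per_gallon(dollars_paid, gallons_delivered):
    """Effective price a customer pays for each gallon actually delivered."""
    return dollars_paid / gallons_delivered


DOLLARS_PER_SALE = 3.0
# Hypothetical deliveries for three customers who each paid $3.
deliveries = [0.5, 2.0, 0.5]

# Price each customer actually paid per gallon.
individual = [price_per_gallon(DOLLARS_PER_SALE, g) for g in deliveries]

# Price per gallon over the whole day: total dollars over total gallons.
overall = price_per_gallon(DOLLARS_PER_SALE * len(deliveries), sum(deliveries))
```

Here `individual` runs from $1.50 to $6.00 per gallon while `overall` is exactly $3.00, so the pump looks accurate on average even though no single customer paid the posted price.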

Wednesday, December 14, 2011

Is Student Debriefing Hacking?


“How did the test go?”  “Fine.” This common exchange is heard after every standardized test. It does not disclose the content of the test, the questions on the test, or the score. It is more wish than fact. It reveals nothing that is meaningful to student or teacher, a frequent end result of NCLB standardized testing.

The current trend in revising the Elementary and Secondary Education Act (ESEA) is to add tests within the course to the final test. This is promoted as formative testing. Unfortunately, formative testing requires timely feedback. Computers can provide non-judgmental, timely feedback. This gave rise to the non-profit Educational Software Cooperative, Inc. Learning at higher levels of thinking (question/answers/verify) provides effective self-motivating feedback. A standardized test that only returns a test score several weeks later has little if any formative testing content.

The new within-course tests are actually an expansion of predatory testing. Predatory testing crowds out instructional/learning time. Unfortunately, it encourages lengthy test preparation at lower levels of thinking by the very schools that most need instructional/learning time at higher levels of thinking. It encourages a short-term fix rather than a long-term solution (rote over understanding).

The classroom teacher has several options:
  1. Devote little, if any, time to test preparation. Conduct the classroom in such a manner that the standardized test is, as knowledgeable students put it, “No big deal.”
  2. Prepare students to take the test at higher levels of thinking by using Knowledge and Judgment Scoring (KJS) on projects and classroom essay and multiple-choice tests.
  3. Continue lengthy test preparation at lower levels of thinking (which in my opinion should be outlawed; recognized as a trait of incompetent school administration).

One way of making ESEA standardized tests function as formative assessments is to debrief students shortly after the test. High scoring classes can do this very informally for the first teacher option above.

Less successful classes, at higher levels of thinking, can collect the topics students find puzzling. High quality students have good judgment in determining what they know and what they have yet to learn.

At lower levels of thinking, students and teachers are most interested in the right answer for each question: A or B or C or D. Debriefing at this level, in my opinion, is as meaningless as reading off the answers to an in-class test.

Each of the above levels penetrates closer to the actual question stem and answer options. The concept of “fair use”, when applied to standardized test questions, requires that whatever is done must not reduce the market value of the test. It must not be for profit. It must only be of benefit to the participating students. The actual test questions must not be discussed. They must remain secret. Debriefing is then restricted to a one-time affair. The value of debriefing decreases as students move from higher levels of thinking down to lower levels of thinking.

Student debriefing is hacking:

  1. It is a violation of copyright. (Fair use of copyrighted material does not include disclosing or direct copying of a standardized test question. A standardized test question is used to make comparative assessments [the common items must be protected]. By its very nature, it must be kept secret or its market value is affected. What portion can be copied or referenced is open to interpretation*.)
  2. It promotes the sale of test question answers. (Informal and higher levels of thinking debriefing do not require the exact question stem nor the question answers. Any attempt to recall exact question stems and answers is of limited use as good standardized tests scramble the answer options, edit the question stems, and replace a portion of the questions between each test. Computer adaptive tests [CAT] do much of this during each student application – no two students even get the same test.)

Student debriefing is not hacking:

  1. It makes a formative assessment out of predatory testing.
  2. Debriefing with a test company provided summary lesson plan, listing topics with model test questions, would not be hacking.  For a test of 30 questions covering 6 topics, the 6 topics could be listed with a model question for each topic.  The model questions could be ones released from past tests. In-class scoring of this summary test would provide immediate feedback for students and teachers. This formative assessment lesson plan would increase the test’s market value.

*At one extreme, the Georgia Professional Standards Commission bans any mention, reference to, or discussion of test questions. Students take the test and close the booklets. The closed booklets are collected and returned.

At the other extreme, parents of students who have learning problems can view the test booklets. This is justified as “fair use” as it gives parents some idea of what the student should have been able to do. It is of help in educating the student. It is not for profit. This one-time use applies to no one outside the parent/school/student relationship. It is therefore not a breach of security.

Wednesday, December 7, 2011

Is Wallpapering Hacking?

Hacking, in the beginning, was an honorable tradition of learning how to control and use a computer, for something useful, without having access to machine and language manuals. It was playing (question/answers/verify; just as is done in putting a puzzle together). It was pioneering. It was empowering. It was fun. Over time, “hacking” became all of the above, but with malicious intent. A few bad apples tarnished the image of the bright and the bold.

Dumb wallpapering, marking the same option, “C” for example, if you do not know, does not improve test results or student scores. Smart wallpapering, creating a unique answer pattern PRIOR to seeing the test, yields improved KJS results. It can rather uniformly alter student scores.


When the wallpaper contains a right answer, everyone who uses the wallpaper mark gets a right answer. This holds for low quality and high quality students. This is fair. The class, the team, wins or loses together. This is the same level at which standardized Dumb Testing data make sense for ranking classes and schools.
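A small sketch of how a shared wallpaper plays out. The answer key, the wallpaper pattern, and the sets of questions each student knows are all invented for illustration, and the scoring simply counts right marks, as in RMS.

```python
def right_marks(key, known, wallpaper):
    """Count right marks for one student.

    key       -- correct option for each question
    known     -- set of question indices the student can actually answer
    wallpaper -- option pattern fixed BEFORE seeing the test, used as the
                 forced mark on every question the student does not know
    """
    right = 0
    for q, correct in enumerate(key):
        if q in known or wallpaper[q] == correct:
            right += 1
    return right


# Hypothetical 8-question test and a wallpaper agreed on before the test.
key = list("ABCDABCD")
wallpaper = list("AACCBBDD")

# Questions where the wallpaper happens to match the key.
lucky_questions = {q for q, c in enumerate(key) if wallpaper[q] == c}

low_quality = right_marks(key, set(), wallpaper)    # knows nothing
high_quality = right_marks(key, {1, 3}, wallpaper)  # knows two questions
```

Every student who falls back on the wallpaper collects the same right marks on the lucky questions, so the luck is shared by the class rather than spread unevenly across individuals.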

Wallpapering reduces test stress by reducing the time and effort wasted on trying to find the “best answer” to a question you cannot read or understand, let alone have anything in mind for an answer.

Wallpapering is hacking:

  1. It restricts a wrong mark to one option per question. (The mathematical model for Dumb Testing assumes that a student randomly marks wrong answers. This is not true. The model also assigns the starting test value to zero. This is not true. On a 4-option question test, the starting value is 25%, on average.)
  2. Students are acting in collusion. (It makes no difference whether individual students decide before the test or during the test which option to use for a forced mark. Wallpapering requires the selection to be made BEFORE seeing the test.)

Wallpapering is not hacking:

  1. It only formalizes the advice students have been given for decades: “Mark ‘C’ if you cannot select a ‘best’ answer”.
  2. It does not change Dumb Testing standardized test scores.