tag:blogger.com,1999:blog-66767249967714682672024-03-05T14:10:43.954-08:00Multiple-Choice RebornThis blog explores the advantages and disadvantages of classical right mark scoring (RMS), Knowledge and Judgment Scoring (KJS) and Confidence Based Learning (CBL) when used to set grades (cut points) and to promote student and employee development.Richard Harthttp://www.blogger.com/profile/04962997526156185761noreply@blogger.comBlogger83125tag:blogger.com,1999:blog-6676724996771468267.post-3972665742689540672018-05-11T07:03:00.000-07:002018-05-11T07:03:48.911-07:00Student-Centered Learning<div class="MsoNormal">
I have spent over 40 years preaching the need for schools
designed for success rather than for failure. Yesterday I happened upon an
article by <a href="http://www.google.com/search?q=don+fix+high+schools,+transform+them">Nicholas
Donohue</a> that presents convincing evidence that this is being done by
transforming high schools in the New England states. It is called <a href="http://www.google.com/search?q=wikipedia+student-centered+learning">student-centered
learning</a>.</div>
<div class="MsoNormal">
My attempt in 1981-1989 used a campus computer system at <a href="http://www.nwmissouri.edu/">NWMSU</a>, textbook, lecture, laboratory, AND
voluntary student presentations, research, and projects. This work has been
further developed in <a href="http://richard-hart.blogspot.com/2009/10/multiple-choice-bubbling.html">Multiple-Choice
Reborn</a> and summarized in <a href="http://knowledgeandjudgmentscoring.blogspot.com/2016/03/copy-detector-kjs.html">Knowledge
and Judgment Scoring - 2016</a>. In 1995, Knowledge Factor patented an online confidence
based learning system (now in <a href="http://knowledgefactor.com/">amplifier</a>).
<a href="http://www.google.com/search?q=wikipedia+Polytomous+Rasch+model">Masters</a>,
1982, developed <a href="http://www.richard-hart.blogspot.com/2010/11/rasch-model-irt-demystified.html">Rasch</a>
partial credit scoring (PCS).</div>
<div class="MsoNormal">
All three put the student in the position of being in charge
of learning and reporting; at all levels of thinking. They approached
evaluating an apple from the skin, as traditional multiple-choice (guess) testing
is done.</div>
<div class="MsoNormal">
PCS just polished the apple skin. The emphasis was still on
the surface, the score, at that time. <a href="http://knowledgefactor.com/">Knowledge
Factor</a> made the transition from the concrete level of thinking to understanding
(skin to core), and provided the meat between in amplifier. Nuclear power plant
operators and doctors were held to a much higher responsibility (self-judgment)
standard (far over 75%, over 90% mastery) than is customary in a traditional
high school classroom (60% for passing).</div>
<div class="MsoNormal">
My students voted to give knowledge and judgment equal value
(1:1 or 50%:50%). Voluntary activities replaced one letter grade (10% each).
The students were then responsible for reporting what they knew or could do.
They could mix several ways of learning and reporting.</div>
<div class="MsoNormal">
A student with a knowledge score of 50% and a quality score
of 100% would end up with about the same test score as a student who marked
every question (guessed) for a quality, quantity, and test score of 75% (with
no judgment).</div>
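<div class="MsoNormal">
The comparison above can be sketched in a few lines. This is only an illustration: the point scheme (right = 2, omitted = 1, wrong = 0) is my assumption of one way to give knowledge and judgment equal 1:1 value, and the 20-question test and the function name <code>kjs_scores</code> are hypothetical.</div>

```python
def kjs_scores(right, wrong, omitted):
    """Return (quality, quantity, test_score) as fractions.

    Assumed point scheme: right = 2, omitted = 1, wrong = 0,
    which weights knowledge and judgment 1:1.
    """
    total = right + wrong + omitted
    marked = right + wrong
    quality = right / marked if marked else 0.0   # share right among marked items
    quantity = right / total                      # share right among all items
    test_score = (2 * right + omitted) / (2 * total)
    return quality, quantity, test_score


# Hypothetical 20-question test.
scholar = kjs_scores(right=10, wrong=0, omitted=10)   # marks only what is known
tourist = kjs_scores(right=15, wrong=5, omitted=0)    # marks every question
print(scholar)  # (1.0, 0.5, 0.75)
print(tourist)  # (0.75, 0.75, 0.75)
```

<div class="MsoNormal">
Both students land on the same test score of 75%, but only the first reports a quality score of 100%.</div>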
<div class="MsoNormal">
These two students are very different. One is at the core of
being educated (scholar). The other is only viewing the skin (tourist). The
first one has a solid basis for self-instruction and further learning; is ready
for independent scholarship. The apple seeds germinate (raise new questions) and
produce more fruit (without the tree). </div>
<div class="MsoNormal">
We know much less about the second student, and about what
must be “re-taught”. The apple may just be left on the tree in what is often a
vain effort to ripen it. Such is the fate of students in schools designed for
failure (grades A to F).</div>
<div class="MsoNormal">
In extreme cases, courses are classified by difficulty or
assigned PASS/FAIL grades. My General Biology students were even “protected” so
I could not know which student was in the course for a grade or pass/fail.</div>
<div class="MsoNormal">
Students assess the level of thinking required in a course by
asking on the first day, “Are your tests cumulative?” If so, they leave. This
is a voluntary choice to stay at the lowest levels of thinking. Memory care residents
do not have that choice.</div>
<div class="MsoNormal">
There is a frightening parallel between creating a happy
environment for memory care residents here at <a href="http://provisionliving.com/locations/columbia/">Provision Living</a> at
Columbia, and creating an academic environment (national, state, school, and
classroom) that yields a happy student course grade. Both end up at the end of
the day pretty much where they started, at the lowest levels of thinking. </div>
<div class="MsoNormal">
Many students made the transition from memorizing nonsense
for the next test to questioning, answering, and verifying; learning for
themselves and knowing they were “right”. This is self-empowering. They started
getting better grades in all of their courses. They had experienced the joy of
scholarship, an intrinsic reward. “I do know what I know.” The independent quality
score in knowledge and judgment scoring directed their path.</div>
<div class="MsoNormal">
Student-centered learning is not new. The title is. This is
important in marketing to institutionalized education. What is new is that at
last entire high schools are now being transformed for the right reason:
student development rather than standardized test scores based on lower levels
of thinking instruction and testing.<span style="mso-spacerun: yes;">
</span></div>
<div class="MsoNormal">
These students should be ready for college or other post
high school programs. They should not be the under-prepared college students we
worked with. The General Biology course was to last for only a few years; until
the high schools did all of this work. In practice, the course became
permanent. Biology did not become a required course in all high schools.</div>
<div class="MsoNormal">
My interest in this project was to find a way to know what
each student really knew, believed, could do, and was interested in, when a new
science building was constructed in 1980 with 120 seat lecture halls. The unexpected
consequence of promoting student development, based on the independent quality and
quantity scores, was not only a bonus but appropriately needed for
under-prepared college students. Over 90% of students voluntarily switched from
guessing right answers to reporting what they actually knew and could do.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
Richard Harthttp://www.blogger.com/profile/04962997526156185761noreply@blogger.com0tag:blogger.com,1999:blog-6676724996771468267.post-22553975045405309782015-05-13T03:00:00.000-07:002015-05-13T03:00:07.911-07:00Information and Reliability<div>
#15</div>
<div>
<div class="MsoNormal">
How does IRT information replace CTT reliability? Can this
be found on the audit tool (Table 45)?</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
This post relates my audit tool, Table 45, Comparison of
Conditional Error of Measurement between Normal [CTT] Classroom Calculation and
the IRT Model to a quote from <a href="http://en.wikipedia.org/wiki/item_response_theory">Wikipedia</a>
(Information). I am confident that the math is correct. I need to
clarify the concepts for which the math is making estimates. </div>
<div class="MsoNormal">
<br /></div>
<table cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: right; text-align: right;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg3bGV-MJDSUquRiQedQ2WExtjviIQXmXcX33VNoxfRoCAuddwMJd7jaPAfK_gNWQipdH3sJfGQCVwhb1fqNIUu4TY0t2pfcLg45goya2O6T8mqjbHl5i701A-1n2ZUN2ahO6vXON9A_xc/s1600/Table+45.jpg" imageanchor="1" style="clear: right; margin-bottom: 1em; margin-left: auto; margin-right: auto;"><img border="0" height="200" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg3bGV-MJDSUquRiQedQ2WExtjviIQXmXcX33VNoxfRoCAuddwMJd7jaPAfK_gNWQipdH3sJfGQCVwhb1fqNIUu4TY0t2pfcLg45goya2O6T8mqjbHl5i701A-1n2ZUN2ahO6vXON9A_xc/s200/Table+45.jpg" width="164" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Table 45</td></tr>
</tbody></table>
<div class="MsoNormal">
<span style="background-color: #ffe599;">“One of the major contributions of item response theory is
the extension of the concept of reliability. Traditionally, reliability refers
to the precision of measurement (i.e., the degree to which measurement is free
of error). And traditionally, it is measured using a single index defined in various
ways, such as the ratio of true and observed score variance.”</span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
See test reliability (a ratio), KR20, True/Total Variance,
0.29 (Table 45a). </div>
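<div class="MsoNormal">
For readers who want to see how a KR20 value is calculated, here is a minimal sketch using the standard KR20 formula with population variance. The mark matrix <code>toy_marks</code> is made-up data, not the classroom data behind Table 45, so it does not reproduce the 0.29 value.</div>

```python
from statistics import pvariance


def kr20(marks):
    """KR20 for a list of student rows, each a list of 0/1 item marks."""
    k = len(marks[0])                        # number of items
    n = len(marks)                           # number of students
    totals = [sum(row) for row in marks]     # raw score for each student
    total_variance = pvariance(totals)       # population variance of the scores
    sum_pq = 0.0
    for item in range(k):
        p = sum(row[item] for row in marks) / n   # item difficulty (share right)
        sum_pq += p * (1 - p)                     # item variance, p*q
    return (k / (k - 1)) * (1 - sum_pq / total_variance)


# Hypothetical data: 4 students, 3 items.
toy_marks = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(kr20(toy_marks))  # 0.75
```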
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="background-color: #ffe599;">“This index is helpful in characterizing a test’s average
reliability, for example in order to compare two tests.”</span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The test reliabilities for CTT and IRT are also comparable in
Tables 45a and 45c: 0.29 and 0.27.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="background-color: #ffe599;">“But IRT makes it clear that precision is not uniform across
the entire range of test scores. Scores at the edges of the test’s range, for example,
generally have more error associated with them than scores closer to the middle
of the range.”</span></div>
<div class="MsoNormal">
<br /></div>
<table cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: left; margin-right: 1em; text-align: left;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhZVCMzwIJqHEF_f3aOQETKvo_OYvROepi-VQ-_-rp-TGjGy3cBjXpNicbbs2SHJnIjGCp8uukh6WEcqhgHNXMnXQu3dp2iDEStQaT49nBNW1fJcIeab0IWwQGTGkI6N7DT42tm7suNHNw/s1600/Table+46.jpg" imageanchor="1" style="clear: left; margin-bottom: 1em; margin-left: auto; margin-right: auto;"><img border="0" height="123" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhZVCMzwIJqHEF_f3aOQETKvo_OYvROepi-VQ-_-rp-TGjGy3cBjXpNicbbs2SHJnIjGCp8uukh6WEcqhgHNXMnXQu3dp2iDEStQaT49nBNW1fJcIeab0IWwQGTGkI6N7DT42tm7suNHNw/s200/Table+46.jpg" width="200" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Table 46</td></tr>
</tbody></table>
<table cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: right; margin-left: 1em; text-align: right;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiy7dpQ0J376p9L8TsDUOnxXTa65KKi1xgmp5UbM4RL1LhQXkmKUbIhyphenhyphen6Wo6fingPpmLwuMG-KwmoQXcRa3N0Zr4U04nN-CKTZun3Z53eNte649iv6yM9gESCTvkflU_AiS4fBQ2T6wztE/s1600/Chart+82.jpg" imageanchor="1" style="clear: right; margin-bottom: 1em; margin-left: auto; margin-right: auto;"><img border="0" height="142" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiy7dpQ0J376p9L8TsDUOnxXTa65KKi1xgmp5UbM4RL1LhQXkmKUbIhyphenhyphen6Wo6fingPpmLwuMG-KwmoQXcRa3N0Zr4U04nN-CKTZun3Z53eNte649iv6yM9gESCTvkflU_AiS4fBQ2T6wztE/s200/Chart+82.jpg" width="200" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Chart 82</td></tr>
</tbody></table>
<div class="MsoNormal">
See Table 45c (classroom data) and Table 46, col 9-10 (dummy
data). For CTT the values are inverted (Chart 82,
classroom data and Chart 89, dummy data).</div>
<div class="MsoNormal">
<br /></div>
<br />
<br />
<table cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: left; margin-right: 1em; text-align: left;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhu7CIsrDrqz3tiERr5148HjwzR_h76peT7DwjPWUIHteSJGGZXCn_ADiJlpRe_ht0nHEzFzMyZJze9HQE32NF0-TFpR6CeuaYt_cnO-ehWS1j_YPQacrbO7kvbQyOwYrKHhpi5B7OMLzY/s1600/Chart+89.jpg" imageanchor="1" style="clear: left; margin-bottom: 1em; margin-left: auto; margin-right: auto;"><img border="0" height="138" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhu7CIsrDrqz3tiERr5148HjwzR_h76peT7DwjPWUIHteSJGGZXCn_ADiJlpRe_ht0nHEzFzMyZJze9HQE32NF0-TFpR6CeuaYt_cnO-ehWS1j_YPQacrbO7kvbQyOwYrKHhpi5B7OMLzY/s200/Chart+89.jpg" width="200" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Chart 89</td></tr>
</tbody></table>
<div class="MsoNormal">
<span style="background-color: #ffe599;">“Item response theory advances the concept of item and test
information. Information is also a <i>function</i>
of the model parameters. For example, according to Fisher information theory,
the item information supplied in the case of the 1PL for dichotomous response
data is simply the probability of a correct response multiplied by the
probability of an incorrect response, or,
. . .” [I = pq]. </span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
See Table 45c, p*q CELL INFORMATION (classroom data). Also
on Chart 89, the cell variance (CTT) and cell information (IRT) have identical
values (0.15) from Excel =VAR.P and from pq (Table 46, col 7, dummy data).</div>
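<div class="MsoNormal">
The identity between cell variance and cell information can be checked directly. A small sketch, assuming a single student with 17 right marks out of 21 items (the dummy-data case); <code>statistics.pvariance</code> stands in for Excel =VAR.P.</div>

```python
from statistics import pvariance

marks = [1] * 17 + [0] * 4            # 17 right and 4 wrong on 21 items
p = sum(marks) / len(marks)           # probability of a right mark
q = 1 - p                             # probability of a wrong mark

cell_information = p * q              # IRT view: I = pq
cell_variance = pvariance(marks)      # CTT view: population variance of the marks

print(round(cell_information, 2), round(cell_variance, 2))  # 0.15 0.15
```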
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="background-color: #ffe599;">“The standard error of estimation (SE) is the reciprocal of
the test information of at a given trait level, is the . . .” [1/SQRT(pq)].</span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Is the “test information … at a given trait level” the Score
Information (3.24, red, Chart 89, dummy data) for 17 right out of 21 items?
Then the reciprocal of 3.24 is 0.31, the error variance (green, Chart 89 and
Table 46, col 9) in measures on a logit scale. And the IRT conditional error of
estimation (SE) would be the square root: SQRT(0.31) = 0.56 in measures. And
this inverted would yield the CTT CSEM: 1/0.56 = 1.80 in counts.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
[[Or the SQRT(SUM(p*q)) = SQRT(0.154 * 21) = SQRT(3.24) =
1.80 (in counts) and the reciprocal is 1/1.80 = 0.56 in measures. The cell value is 0.154 before rounding to 0.15.]]</div>
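<div class="MsoNormal">
The chain of values for 17 right out of 21 items can be reproduced step by step. This is a sketch of the arithmetic only; the function name <code>score_statistics</code> and its dictionary keys are my own labels.</div>

```python
import math


def score_statistics(n_right, n_items):
    """Follow the chain: information -> error variance -> SE -> CSEM."""
    p = n_right / n_items
    score_information = n_items * p * (1 - p)    # sum of p*q over all items
    error_variance = 1 / score_information       # in measures (logits)
    se_measures = math.sqrt(error_variance)      # IRT standard error of estimation
    csem_counts = math.sqrt(score_information)   # CTT CSEM, equal to 1 / se_measures
    return {
        "score_information": score_information,
        "error_variance": error_variance,
        "se_measures": se_measures,
        "csem_counts": csem_counts,
    }


for name, value in score_statistics(17, 21).items():
    print(name, round(value, 2))
# score_information 3.24
# error_variance 0.31
# se_measures 0.56
# csem_counts 1.8
```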
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The IRT (CSEM) in Chart 89 is really the IRT standard error
of estimation (SE or SEE). On Table 45c, the CSEM (SQRT) is also the SE
(conditional error of estimation) obtained from the square root of the error
variance for that ability level (17 right, 1.73 measures, or 0.81 or 81%).</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="background-color: #ffe599;">“Thus more information implies less error of measurement.”</span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
See Table 45c, CSEM, green, and Table 46, col 9-10.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="background-color: #ffe599;">“In general, item information functions tend to look
bell-shaped. Highly discriminating items have tall, narrow information
functions; they contribute greatly but over a narrow range. Less discriminating
items provide less information but over a wider range.”</span></div>
<div class="MsoNormal">
<br /></div>
<table cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: left; text-align: left;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgIhTLfxho9dKv-SxRjSEg7Nf1ANNU5mdmaFMJZq8WGKq-z_kiCZgm7oyHXlZgYlQNZEYb9TyOWWZa8zf1S2G5C4lJJzeK1KOvCCIYHw6f1dQgf5RHOPaQCoXKLPRlhxyF65YRW3_LJCt8/s1600/Chart+92.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="158" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgIhTLfxho9dKv-SxRjSEg7Nf1ANNU5mdmaFMJZq8WGKq-z_kiCZgm7oyHXlZgYlQNZEYb9TyOWWZa8zf1S2G5C4lJJzeK1KOvCCIYHw6f1dQgf5RHOPaQCoXKLPRlhxyF65YRW3_LJCt8/s200/Chart+92.jpg" width="200" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Chart 92</td></tr>
</tbody></table>
<table cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: right; text-align: right;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjqkTek8sxegoMbUd9ImwV0FeBGC76jKjt3RRV5n7wtPgmowKqzTj1RhNRaHxVqcxjLOI2UjlnxXXvKXx8BhZ-T5B7twPkpENCCHzi3W9QdQtsuaSiB1ORsIuIs7ay9IxaiXxwcaAE2ko8/s1600/Table+47.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="50" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjqkTek8sxegoMbUd9ImwV0FeBGC76jKjt3RRV5n7wtPgmowKqzTj1RhNRaHxVqcxjLOI2UjlnxXXvKXx8BhZ-T5B7twPkpENCCHzi3W9QdQtsuaSiB1ORsIuIs7ay9IxaiXxwcaAE2ko8/s200/Table+47.jpg" width="200" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Table 47</td></tr>
</tbody></table>
<div class="MsoNormal">
The same generality applies to the item information
functions (IIFs) in Chart 92, but it is not very evident. The item with a
difficulty of 10 (IIF = 1.80, Table 47) is also highly discriminating. The two easiest items had negative discrimination; they show
an increase in information as student ability decreases toward zero measure. The generality applies best near the
average test raw score of 50% or zero measure; which is not on the chart (no student got a score of 50% on this test). </div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
This test had an average test score of 80%. This has spread the item information
function curves out (Chart 92). They are not centered on the raw score of 50%
or the measures zero location. <b>However,
each peaks near the point where item difficulty in measures is close to student
ability in measures.</b> This observation is critical in establishing the
value of IRT item analysis and how it is used. This makes sense in measures (a natural
log scale of the ratio of right to wrong marks) but not in raw scores (a normal
linear scale), as I first posted in Chart 75 with only count and percent scales.</div>
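<div class="MsoNormal">
The location of the peak can be illustrated with the dichotomous Rasch model. This is a sketch under stated assumptions: the item difficulty of 1.4 measures is hypothetical (it is not taken from Table 47), and the ability grid from -4 to +4 measures is an arbitrary choice.</div>

```python
import math


def rasch_p(theta, b):
    """Probability of a right mark for ability theta and difficulty b (in measures)."""
    return 1 / (1 + math.exp(-(theta - b)))


def item_information(theta, b):
    """Item information I = p * q at ability theta."""
    p = rasch_p(theta, b)
    return p * (1 - p)


# Ability grid from -4.0 to +4.0 measures in steps of 0.1.
abilities = [x / 10 for x in range(-40, 41)]


def peak_location(b):
    """Ability on the grid where the item gives the most information."""
    return max(abilities, key=lambda theta: item_information(theta, b))


print(peak_location(1.4))            # 1.4
print(item_information(1.4, 1.4))    # 0.25
```

<div class="MsoNormal">
The information peaks at 0.25, where ability equals difficulty and the probability of a right mark is 50%, wherever the item sits on the scale.</div>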
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="background-color: #ffe599;">“Plots of item information can be used to see how much
information an item contributes and to what portion of the scale score range.”</span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
This is very evident in Table 47 and Chart 92.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="background-color: #ffe599;">“Because of local independence, item information functions
are additive.”</span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
See <span style="color: red;">Test SEM (in Measures)</span>, Winsteps
Table 17.1 MODEL S.E. MEAN (identical) = 0.64 (Table 45c).</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="background-color: #ffe599;">“Thus, the test information function is simply the sum of
the information functions of the items on the exam. Using this property with a
large item bank, test information functions can be shaped to control
measurement error very precisely.”</span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="background-color: #ffe599;"> “Characterizing
the accuracy of test scores is perhaps the central issue in psychometric theory
and is a chief difference between IRT and CTT. IRT findings reveal that the CTT
concept of reliability is a simplification.” </span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
At this point my audit tool, Table 45, falls silent. These
two mathematical models are a means for only estimating theoretical values;
they are not the theoretical values nor are they the reasoning behind them. CTT
starts from observed values and projects into the general environment. IRT can start with the perfect Rasch model and select observations that fit the model. The
two models are looking in opposite directions. CTT uses a linear scale with the
origin at zero counts. IRT sets its log ratio point-of-origin (zero) at the 50%
CTT point. I must accept the concept that CTT is a simplification of IRT on the
basis of authority at this point.</div>
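<div class="MsoNormal">
The difference in origin between the two scales can be shown with the plain log-ratio transform. This is only the simple conversion between a proportion right and a log ratio; it does not reproduce the measures that Winsteps estimates, which also adjust for the item difficulties on the test.</div>

```python
import math


def logit(p):
    """Natural log of the ratio of right to wrong for a proportion right p."""
    return math.log(p / (1 - p))


def inv_logit(x):
    """Convert a log ratio back to a proportion right."""
    return 1 / (1 + math.exp(-x))


for percent in (20, 50, 80):
    print(percent, round(logit(percent / 100), 2))
# 20 -1.39
# 50 0.0
# 80 1.39
```

<div class="MsoNormal">
The 50% point on the linear scale becomes zero on the log ratio scale, and scores the same distance above and below 50% become equal and opposite.</div>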
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="background-color: #ffe599;">“In the place of reliability, IRT offers the test
information function which shows the degree of precision at different values of
theta, [student ability].”</span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
I would word this, “In ADDITION to reliability,” (Table 45a,
CTT = 0.29 and 45c, IRT = 0.27). Also the “IRT offers the ITEM information
function which shows the degree of precision at different values . . .”</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="background-color: #ffe599;">“These results allow psychometricians to (potentially)
carefully shape the level of reliability for different ranges of ability by
including carefully chose items. For example, in a certification situation in
which a test can only be passed or failed, where there is only a single
“cutscore,” and where the actually passing score is unimportant, a very
efficient test can be developed by selecting only items that have high
information near the cutscore. These items generally correspond to items whose
difficulty is about the same as that of the cutscore.” </span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The eleven items in Table 47 and Chart 92 each peak near the
point where item difficulty in measures is close to student ability in
measures. The discovery or invention of this relationship is the key advantage
of IRT over CTT.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<b>These data show that a test item need not have the commonly
recommended average score near 50% for useable results.</b> Any cutscore from 50%
to 80% would produce useable results on this test with an average score of 80%
and cutscore (passing) of 70%.<br />
<br />
<span style="background-color: #ffe599;">"IRT is sometimes called <i>strong true score theory or modern mental test theory</i> because it is a more recent body of theory and makes more explicit the hypotheses that are implicit within CTT."</span><br />
<br />
My understanding is that with CTT an item may be 50% difficult for the class without revealing how difficult it is for each student (no location). With IRT every item is 50% difficult for any student with a comparable ability (difficulty and ability at the same location). </div>
<br />
<span style="font-family: Calibri; font-size: 11.0pt; line-height: 115%; mso-ansi-language: EN-US; mso-ascii-theme-font: minor-latin; mso-bidi-font-family: "Times New Roman"; mso-bidi-theme-font: minor-bidi; mso-fareast-font-family: Calibri; mso-fareast-language: EN-US; mso-fareast-theme-font: minor-latin; mso-hansi-theme-font: minor-latin;">
I do not know what part of IRT is invention and what part is discovery
on the part of some ingenious people. Two basic parts had to be fit together:
information and measures by way of an inversion. Then a story had to be created
to market the finished product; the Rasch model and <a href="http://www.winsteps.com/">Winsteps</a> (full and partial
credit) are the limit of my experience. The unfortunate name choice of “partial
credit” rather than knowledge or skill and judgment may have been a factor in the Rasch
partial credit model not becoming popular. The name, partial credit, falls into
the realm of psychometrician tools. The name, Knowledge and Judgment, falls
into the realm of classroom tools needed to guide the development of scholars
as well as obtain maximum information from paper standardized tests; where
students individually customized their tests (accurately, honestly, and fairly) rather than
CAT where the test is tailored to fit the student; using best-guess, dated, and questionable
second hand information.</span><!--EndFragment--></div>
<div>
<span style="font-family: Calibri; font-size: 11.0pt; line-height: 115%; mso-ansi-language: EN-US; mso-ascii-theme-font: minor-latin; mso-bidi-font-family: "Times New Roman"; mso-bidi-theme-font: minor-bidi; mso-fareast-font-family: Calibri; mso-fareast-language: EN-US; mso-fareast-theme-font: minor-latin; mso-hansi-theme-font: minor-latin;"><br /></span></div>
<div>
<span style="font-family: Calibri;"><span style="font-size: 15px; line-height: 17px;">IRT makes CAT possible. Please see "<a href="http://www.edweek.org/dd/articles/2012/10/17/01adaptive.h06.html">Adaptive Testing Evolves to Assess Common-Core Skills</a>" for current marketing, use, and a list of comments, including two of mine. The exaggerated claims of test makers to assess and promote developing students by the continued use of forced-choice, lower-level-of-thinking tests continue to be ignored in the marketing of these tests to assess Common Core skills. Increased precision of nonsense still takes precedence over an assessment that is compatible with and supports the classroom and scholarship.</span></span><br />
<span style="font-family: Calibri;"><span style="font-size: 15px; line-height: 17px;"><br /></span></span>
<span style="font-family: Calibri;"><span style="font-size: 15px; line-height: 17px;">Serious mastery: <a href="http://www.knowledgefactor.com/">Knowledge Factor</a>.</span></span><br />
<span style="font-family: Calibri;"><span style="font-size: 15px; line-height: 17px;">Student development:<a href="http://www.nine-patch.com/"> Knowledge and Judgment Scoring</a> (Free Power Up Plus) and <a href="http://www.winsteps.com/">IRT Rasch partial credit</a> (Free Ministep).</span></span><br />
<span style="font-family: Calibri;"><span style="font-size: 15px; line-height: 17px;">Ranking: Forced-choice on paper or CAT.</span></span></div>
Richard Harthttp://www.blogger.com/profile/04962997526156185761noreply@blogger.com2tag:blogger.com,1999:blog-6676724996771468267.post-53333005965961560642015-04-08T03:00:00.000-07:002015-04-08T03:00:09.385-07:00CTT and Rasch IRT Item Analysis Paradox<div class="MsoNormal">
14</div>
<div class="MsoNormal">
[The solution is in Chart 89, Item Analysis flow sheet.]</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
“An <a href="http://www.winsteps.com/winman/reliability.htm">apparent
paradox</a> is that extreme scores have perfect precision, but extreme measures
have perfect imprecision” in “Reliability and separation of measures.” A more
complete <a href="http://www.rasch.org/rmt/rmt204f.htm">discussion</a> is given
under the title, “Standard Errors and Reliabilities: Rasch and Raw Score”.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<table cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: right; margin-left: 1em; text-align: right;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEivkaJGor2khm4YrO-WyWl8Qsw6Ht15XWS8y65legBnXazrymlvYHIokWkY7w__cuwOPNkYWzKom0WV-qToefDHrigdpA_b1t1XT1IlOl0d8RXycNCdIbYCP-4dPimcCc6cLvc6BgDrHF8/s1600/Chart+82.jpg" imageanchor="1" style="clear: right; margin-bottom: 1em; margin-left: auto; margin-right: auto;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEivkaJGor2khm4YrO-WyWl8Qsw6Ht15XWS8y65legBnXazrymlvYHIokWkY7w__cuwOPNkYWzKom0WV-qToefDHrigdpA_b1t1XT1IlOl0d8RXycNCdIbYCP-4dPimcCc6cLvc6BgDrHF8/s1600/Chart+82.jpg" height="228" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Chart 82</td></tr>
</tbody></table>
The apparent paradox is graphed in Chart 82. Precision on
one scale is the inverse or reciprocal of the other: 1/0.44 = 2.27 and 1/2.27 =
0.44.<br />
<br />
<br /></div>
<div class="separator" style="clear: both; text-align: left;">
<table cellpadding="0" cellspacing="0" class="tr-caption-container" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em; text-align: right;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhZfIXaQJOwa3YUeO3HcoNYtmZ6jW_Vp7D2t2VtInCNZY6ui6hmHlYQwAY2nmDdxbbZjoSU3sqC7FHAvN3xUjtzKRS-j07ZwTzNVygyJMtWUGcEbz2eqomEkDdP3f5rDz8rjfqmImB6Tk8/s1600/Table+45.jpg" imageanchor="1" style="clear: right; margin-bottom: 1em; margin-left: auto; margin-right: auto;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhZfIXaQJOwa3YUeO3HcoNYtmZ6jW_Vp7D2t2VtInCNZY6ui6hmHlYQwAY2nmDdxbbZjoSU3sqC7FHAvN3xUjtzKRS-j07ZwTzNVygyJMtWUGcEbz2eqomEkDdP3f5rDz8rjfqmImB6Tk8/s1600/Table+45.jpg" height="320" width="263" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Table 45</td></tr>
</tbody></table>
I edited Table 32 to disclose a full development of a comparison between CTT and IRT using real classroom data (Table 45). This first view is too complicated.</div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi9pFKDpZ2rZIrKMcZT4gX0NKoemXzSOI7VP25PU9RnBPwyIV2N-3DgHVOEQiBSjA2Gm0kgd7sFyM7XKMAgfoV06aRHJH5yEruzxlcRav2QSTPvUVTTxvttDXzba9601gqHgcmOs0Sn1js/s1600/Chart+83.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi9pFKDpZ2rZIrKMcZT4gX0NKoemXzSOI7VP25PU9RnBPwyIV2N-3DgHVOEQiBSjA2Gm0kgd7sFyM7XKMAgfoV06aRHJH5yEruzxlcRav2QSTPvUVTTxvttDXzba9601gqHgcmOs0Sn1js/s1600/Chart+83.jpg" height="217" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Chart 83</td></tr>
</tbody></table>
<div class="separator" style="clear: both; text-align: left;">
Chart 83 (CTT) and Chart 84 (IRT) summarize the statistics behind Table 45.</div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEioWMyfvf_MHx_GJxr1-rlAvp6XiBcRXiiaNTPPsBX5sB8RwYTgJkU78IFXziJLu63iEYOJ2zlOET30xwpCixWY3XRi9LXHezr-dSlKx_vQS1vAeGZSkLZNq4HAg65VOIssouhyXDpDi80/s1600/Chart+84.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEioWMyfvf_MHx_GJxr1-rlAvp6XiBcRXiiaNTPPsBX5sB8RwYTgJkU78IFXziJLu63iEYOJ2zlOET30xwpCixWY3XRi9LXHezr-dSlKx_vQS1vAeGZSkLZNq4HAg65VOIssouhyXDpDi80/s1600/Chart+84.jpg" height="217" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Chart 84</td></tr>
</tbody></table>
<div class="MsoNormal">
Table 45 includes the process of combining student
scores and item difficulties onto one logit scale.</div>
<div class="MsoNormal">
<br /></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgan3e5nGJ1rAUg8Je7OuOBW7zzBCZcL1bAuZxKww5CwwNKm5a_7ei5Rd5ATnPE8ki3yx__l2fVrWxk0Ep3XzRSBS47_yv6JM7LdpqmLAkShUH_Eit8ZeGPhQhqWaVXBPHRAe5tS05DjWU/s1600/Table+46.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgan3e5nGJ1rAUg8Je7OuOBW7zzBCZcL1bAuZxKww5CwwNKm5a_7ei5Rd5ATnPE8ki3yx__l2fVrWxk0Ep3XzRSBS47_yv6JM7LdpqmLAkShUH_Eit8ZeGPhQhqWaVXBPHRAe5tS05DjWU/s1600/Table+46.jpg" height="198" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Table 46</td></tr>
</tbody></table>
<div class="MsoNormal">
I then isolated the item analysis from the complete development
above by skipping the formation of a single scale from real classroom data.
Instead, I fed the IRT item analysis a percent (dummy) data set (Table 46)
with the same number of items as in the classroom test (21 items). I then
graphed the data strings in Table 46 as a second, simpler, view of IRT item
analysis.</div>
<div class="MsoNormal">
<br /></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjRXJ6uiWLdNx2LS4uixWlYmvy90n1CR-kU0D150pHbQdnQeYF_wkcAzWnp9Otj1lGBFP94dR2ySe1yGSQjIJsLTheHynaA0WK9LGn2JzmSTYsRLWKSrTiXjvpquZJUZm_YZ-mi9I1wiGI/s1600/Chart+85.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjRXJ6uiWLdNx2LS4uixWlYmvy90n1CR-kU0D150pHbQdnQeYF_wkcAzWnp9Otj1lGBFP94dR2ySe1yGSQjIJsLTheHynaA0WK9LGn2JzmSTYsRLWKSrTiXjvpquZJUZm_YZ-mi9I1wiGI/s1600/Chart+85.jpg" height="320" width="291" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Chart 85</td></tr>
</tbody></table>
<div class="MsoNormal">
Turning right counts (Chart 85, blue) into a <b style="mso-bidi-font-weight: normal;">right/wrong ratio</b> string (red) yields a
very different shape from the straight line of the right-mark counts. We now have the <b style="mso-bidi-font-weight: normal;">rate</b> at which each mark completes a
perfect score of 21 or 100%. It starts slow (1/20), and the last mark races at
20 times (20/1) the average rate (10/11 or 11/10, near 1, in Table 46, col 2). </div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Taking the natural log of the ratio (a logit, Table 46, col
3) creates the Rasch model IRT characteristic curve (Chart 85, purple) with the
zero logit point of origin positioned at the 50% normal value. [Ratios and log
ratios have no dimensions.] </div>
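<div class="MsoNormal">
The ratio and logit steps can be sketched in a few lines of Python. This is only an illustration under my own assumptions: a 21-item test, raw scores from 1 to 20 right, and the hypothetical helper names <code>right_wrong_ratio</code> and <code>logit</code>. It is not the spreadsheet behind Table 46.</div>

```python
import math

N_ITEMS = 21  # same test length as the classroom test in the text


def right_wrong_ratio(right, n_items=N_ITEMS):
    """Right/wrong ratio for a raw score of `right` marks."""
    wrong = n_items - right
    return right / wrong


def logit(right, n_items=N_ITEMS):
    """Natural log of the right/wrong ratio (a dimensionless value)."""
    return math.log(right_wrong_ratio(right, n_items))


# Scores of 0 and 21 are skipped: the ratio is zero or undefined there.
rows = [(r, right_wrong_ratio(r), logit(r)) for r in range(1, N_ITEMS)]

for right, ratio, lg in rows:
    print(f"{right:2d} right  ratio {ratio:6.2f}  logit {lg:+.2f}")
```

<div class="MsoNormal">
With 21 items the 50% point falls between 10 and 11 right, so the logits for those two scores sit symmetrically on either side of zero.</div>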
<div class="MsoNormal">
<br /></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgTLl2E5gR00u4pes4Z2O10I5MA5FMIjcvqIYYhG8c4J-7TNrZ9NcbNGLMl6pksXa7TLTqm4pYCDX2F7wnkW3iKtIZZKlIX9yKdYoogefPvqQraRfgT6vPdNbtDcZqv3JOGfn0DOoJY5ho/s1600/Chart+86.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgTLl2E5gR00u4pes4Z2O10I5MA5FMIjcvqIYYhG8c4J-7TNrZ9NcbNGLMl6pksXa7TLTqm4pYCDX2F7wnkW3iKtIZZKlIX9yKdYoogefPvqQraRfgT6vPdNbtDcZqv3JOGfn0DOoJY5ho/s1600/Chart+86.jpg" height="204" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Chart 86</td></tr>
</tbody></table>
<div class="MsoNormal">
Winsteps, at this point, has reduced student raw scores and
item difficulties (in <b style="mso-bidi-font-weight: normal;">counts</b>) into
one logit scale of student ability and item difficulty with the dimension of a <b style="mso-bidi-font-weight: normal;">measure</b>. These are then combined into
the probability of a right answer to start the item analysis. The percent
(dummy) input (Table 46, col 6) replaces this operation (Chart 86). This
simplifies the current discussion to just item analysis and precision. </div>
<div class="MsoNormal">
<br /></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiLjb_6AksohX_fqtMofHsTxgdUqHfhlD7r46JYEYSuX2WoMmNdTgWz82iwrTwquHQj7G16CpUrAjuBXKEsKHo6Ntco6KMZ93XgbNAfgvHEFguRnJOxPYhPuXb0dKJf_JNomuBszc9nVpo/s1600/Chart+87.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiLjb_6AksohX_fqtMofHsTxgdUqHfhlD7r46JYEYSuX2WoMmNdTgWz82iwrTwquHQj7G16CpUrAjuBXKEsKHo6Ntco6KMZ93XgbNAfgvHEFguRnJOxPYhPuXb0dKJf_JNomuBszc9nVpo/s1600/Chart+87.jpg" height="311" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Chart 87</td></tr>
</tbody></table>
<div class="MsoNormal">
Percent input and information for one central cell are
plotted in Chart 87. Cell information (p*q) is limited to a maximum of 0.25 at a student
raw score of 50% (Table 46, col 7): 0.50 * 0.50 = 0.25.
The next step is to adjust the cell information for the 21 items on the test
(Column 8).</div>
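<div class="MsoNormal">
A minimal sketch of the cell information step, assuming every item on the test shares the same probability p of a right mark (which is what a percent dummy input implies). The function names and the 5% grid are my own choices for illustration.</div>

```python
N_ITEMS = 21  # test length used in the text


def cell_information(p):
    """Information in one cell: p * q, where q = 1 - p."""
    return p * (1.0 - p)


def total_information(p, n_items=N_ITEMS):
    """Cell information adjusted for the number of items on the test."""
    return n_items * cell_information(p)


# Hypothetical percent inputs from 5% to 95% in steps of 5%.
percents = [i / 20 for i in range(1, 20)]
peak = max(percents, key=cell_information)

print(peak, cell_information(peak), total_information(peak))
```

<div class="MsoNormal">
The peak lands at 50%, where the cell information is 0.25. Scores on either side of 50% carry less information.</div>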
<div class="MsoNormal">
<br /></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjCyOsr7vOFlUGqIW0x6iIgEnofOkZ0ykLDdVGeCTFERJGRUMWAQnBx-Fd22Ya8eObMTr49wd-q-bVL0l5TtRT69eBQTvfDKx1EeYpAz7xN6yJnTQHLx3rwPL7gU75yuOxWxBcrpwBa07g/s1600/Chart+88.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjCyOsr7vOFlUGqIW0x6iIgEnofOkZ0ykLDdVGeCTFERJGRUMWAQnBx-Fd22Ya8eObMTr49wd-q-bVL0l5TtRT69eBQTvfDKx1EeYpAz7xN6yJnTQHLx3rwPL7gU75yuOxWxBcrpwBa07g/s1600/Chart+88.jpg" height="320" width="267" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Chart 88</td></tr>
</tbody></table>
<div class="MsoNormal">
Chart 88 completes the comparison of CTT and IRT
calculations on Table 46. The inversion of Information (col 9) yields the error
variance that aligns with student score measures such that the greatest
precision (smallest error variance) is at the point of origin of the logit
scale. The square root of the error variance (col 10) yields the CSEM
equivalent for IRT measures. And then, by a second inversion these measure
values are transformed into the identical normal CSEM values (col 11 - 12) for
a CTT item analysis. The total view in Table 45 was too complicated. Charts 85
– 88 are also. </div>
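<div class="MsoNormal">
The chain of steps described for Chart 88 can be written out as a short Python sketch. I am assuming the percent input applies to all 21 items alike; the function name, the dictionary keys and the 10% grid are hypothetical and are not taken from Table 46.</div>

```python
import math

N_ITEMS = 21  # test length used in the text


def irt_precision_chain(p, n_items=N_ITEMS):
    """Follow one percent score p through the steps described for Chart 88."""
    information = n_items * p * (1.0 - p)   # information for the whole test
    error_variance = 1.0 / information      # first inversion
    se_logit = math.sqrt(error_variance)    # standard error on the logit scale
    csem_normal = 1.0 / se_logit            # second inversion, back to counts
    return {
        "information": information,
        "error_variance": error_variance,
        "se_logit": se_logit,
        "csem_normal": csem_normal,
    }


# Hypothetical percent inputs from 10% to 90%.
grid = [i / 10 for i in range(1, 10)]
chains = {p: irt_precision_chain(p) for p in grid}
most_precise = min(grid, key=lambda p: chains[p]["error_variance"])

print(most_precise)
for p in grid:
    c = chains[p]
    print(f"{p:.0%}  error variance {c['error_variance']:.3f}  "
          f"SE {c['se_logit']:.3f} logits  CSEM {c['csem_normal']:.2f} counts")
```

<div class="MsoNormal">
In this sketch the smallest error variance falls at 50%, the point of origin of the logit scale, and the second inversion returns the square root of n*p*q, the same value a CTT calculation in counts would give.</div>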
<div class="MsoNormal">
<br /></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhJFUOwT9jgGEQ1aMqR-xjlSMvgCCNVblg2eTE6ijNTdoa-vNQGuZOpnrE-vMHXR02nEfUa2ZwUwuXqWusb6Mwgcka1FqJxEX1tPHNDojiX2OWXnLecMq08VG5y7f6bihepfaPvuduMzec/s1600/Chart+89.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhJFUOwT9jgGEQ1aMqR-xjlSMvgCCNVblg2eTE6ijNTdoa-vNQGuZOpnrE-vMHXR02nEfUa2ZwUwuXqWusb6Mwgcka1FqJxEX1tPHNDojiX2OWXnLecMq08VG5y7f6bihepfaPvuduMzec/s1600/Chart+89.jpg" height="222" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Chart 89</td></tr>
</tbody></table>
<div class="MsoNormal">
My third, simple, and last view is a flowchart (Chart 89)
constructed from the above charts and tables.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The percent (dummy) data produce identical conditional standard
error of measurement (CSEM) results (1.80) with CTT and IRT item analysis (Table 46,
col 11 - 12 and Chart 89), even though the CTT item analysis starts with a raw score count (17)
and skips the score mean (0.81), while the IRT item analysis starts with the score
mean (0.81). </div>
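<div class="MsoNormal">
As a check on the arithmetic, here is the calculation for a score of 17 right on 21 items, worked both ways. I am assuming the CTT value is the square root of n*p*q; that assumption reproduces the 0.15, 3.24 and 1.80 quoted in this post, but the exact spreadsheet formulas behind Table 46 may differ.</div>

```latex
\begin{align*}
p &= \tfrac{17}{21} \approx 0.81, \qquad q = 1 - p \approx 0.19, \qquad pq \approx 0.154 \\[4pt]
\text{CTT:}\quad \mathrm{CSEM} &= \sqrt{n\,p\,q} = \sqrt{21 \times 0.154} \approx \sqrt{3.24} \approx 1.80 \text{ counts} \\[4pt]
\text{IRT:}\quad I &= n\,p\,q \approx 3.24, \qquad
\frac{1}{I} \approx 0.309, \qquad
\sqrt{\frac{1}{I}} \approx 0.556 \text{ logits}, \qquad
\frac{1}{0.556} \approx 1.80 \text{ counts}
\end{align*}
```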
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
CTT captures the variation (in marks) within a student score
in the variance (0.15); IRT captures the variation (in probabilities) as
information (0.15). In all cases the score variance and score information are
treated with the square root (SQRT, pink) to yield standard errors (estimates
of precision): CTT CSEM on a normal scale in counts, and IRT (CSEM) on a logit
scale in measures.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
In summary, as CTT score variance and IRT score information
(red) increase, CSEM increases on a normal scale (Chart 89). Precision
decreases. At the same time IRT
error variance (green) and IRT (CSEM) decrease on a logit scale. Precision
increases with respect to the Rasch model point of origin zero (50% on a normal
scale). This inversion aligns the IRT (CSEM) to student scores in measures on a
logit scale.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
It appears that the meaning of this depends upon what is
being measured and how well it is being measured. CTT measures in counts and
sets error (based on the score variance, Chart 89, red) about the student score
count on a normal scale (CSEM). IRT converts counts to “measures”. IRT then
measures in “measures” and sets error (based on the error variance, Chart 89,
green) about the point of origin (zero) on a logit scale that corresponds to
50% on a normal scale.</div>
<div class="MsoNormal">
<br /></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiGbdmsfexEPX4-xW8oNIptvcYWvG0HnLolkDD4zMuNBEny8SDHQZluCTSQJBj8aT2moVf3vuwBQb7IOV9rTt1ZW4WX57M99MmprrBrcU8mcEY0sbgf7M7z_7Zp6I9lXDLrqo290DI3WWs/s1600/Chart+90.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiGbdmsfexEPX4-xW8oNIptvcYWvG0HnLolkDD4zMuNBEny8SDHQZluCTSQJBj8aT2moVf3vuwBQb7IOV9rTt1ZW4WX57M99MmprrBrcU8mcEY0sbgf7M7z_7Zp6I9lXDLrqo290DI3WWs/s1600/Chart+90.jpg" height="320" width="260" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Chart 90</td></tr>
</tbody></table>
<div class="MsoNormal">
The two methods of feeding an item analysis use two
different reference points. This was easier to see when I took the core out of
Chart 88 and plotted it in a more common form in Chart 90. Precision on both
scales is shown in solid black. This line intersects the Rasch model IRT
characteristic curve where normal is 50% and IRT is zero. At a count of 17
right, the normal scale shows higher precision; the logit scale shows lower
precision with respect to the perfect Rasch model. </div>
<div class="MsoNormal">
<b style="mso-bidi-font-weight: normal;"><br /></b></div>
<div class="MsoNormal">
<b style="mso-bidi-font-weight: normal;">The characteristic curve is a collection of points where student ability and item
difficulties match, so that students with this ability get 50% right answers on items with matching difficulties. This situation exists for CTT only at the average test score (mean).</b><br />
<b style="mso-bidi-font-weight: normal;"><br /></b></div>
<div class="MsoNormal">
[The <b style="mso-bidi-font-weight: normal;">slope of the
test characteristic curve</b> is given as the inverse of the raw score error
variance (3.24, red, Chart 88 - 89, and Table 46).]</div>
<div class="MsoNormal">
<br /></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhCHE1Iq41qpN5wvN_BSWysDXQwKxP5Yjz8QpNoujvFcIUC3wrbkPDjr1Cpr3ejOWyE9dpHZhz966wUmhW3CqeWa1PKvQbzAntTiZgFKGpAcToWivS966AFFMHb-q9szaDhiiWXeJ4a_XI/s1600/Chart+91.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhCHE1Iq41qpN5wvN_BSWysDXQwKxP5Yjz8QpNoujvFcIUC3wrbkPDjr1Cpr3ejOWyE9dpHZhz966wUmhW3CqeWa1PKvQbzAntTiZgFKGpAcToWivS966AFFMHb-q9szaDhiiWXeJ4a_XI/s1600/Chart+91.jpg" height="320" width="259" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Chart 91</td></tr>
</tbody></table>
<div class="MsoNormal">
Chart 91 applies the above thinking to real classroom data
(Table 45c). This time the average score was not at 50% but at 81%. The lowest
student score on Table 45c was 12 (57%). </div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
In a reference I can no longer find, I read that at the 50% point
students do not know anything; it is all chance. I can see that for true-false.
That could put CTT and IRT in conflict. A student must know something to earn a
score of 50% when there are four options to each item. There is a free 25%. The
student must supply the remaining 25%. Also, few CTT tests are filled with items
that have maximum discrimination and precision. A high-quality CTT test can
look very much like a high-quality IRT test. The difference is that the IRT
test item analysis takes more into the calculations than the CTT test when
offered as forced-choice (a cheap way to rank students) or with <a href="http://www.nine-patch.com/">knowledge and judgment scoring</a> (where students report what they actually know and find
meaningful and useful; the basis for effective teaching). </div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Historically, test reliability was the chief marketing point
of standardized tests. In the past decade the precision of individual student
scores has replaced test reliability. <a href="http://www.winsteps.com/winsteps.htm">IRT (CSEM)</a> provides a more marketable
product along with promoting the sale of equipment and related CAT services.
Again, psychometricians on the back end continue to support and lend
credibility to the claims from the sales office on the front end.</div>
<br />
<div class="MsoNormal">
<br /></div>
Richard Harthttp://www.blogger.com/profile/04962997526156185761noreply@blogger.com0tag:blogger.com,1999:blog-6676724996771468267.post-69160186253128822012015-03-11T03:00:00.000-07:002015-03-11T03:00:05.950-07:00Modernizing Standardize Test Scores<div class="MsoNormal">
#13</div>
<div class="MsoNormal">
A single standardized right-count score (RCS) has little
meaning beyond a ranking. A knowledge and judgment score (KJS) from the same
set of questions tells us not only how much the student may know or can do but
also whether the student has the judgment to make use of that knowledge and skill. A student with an RCS
must be told what he/she knows or can do. A student with a KJS tells the
teacher or test maker what he/she knows. An RCS becomes a token in a federally
sponsored political game. A KJS is a base onto which students build further
learning and teachers build further instruction.</div>
<div class="MsoNormal">
<br /></div>
<table cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: left; text-align: left;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiQjIaz1NZaa9HI036iRvonzFSOdCOzY0VCb47t9a2BJ2yGD8Zh2ES0FMM7YCL4X_w1Vixf_McSFc9LaqGMCYy1Y8aqyeIvirHfM4NCGY4ZDr1LW8xnZpLkiwxxrhHFCPEuU6L8gzGGWSg/s1600/Table+40.jpg" imageanchor="1" style="clear: left; margin-bottom: 1em; margin-left: auto; margin-right: auto;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiQjIaz1NZaa9HI036iRvonzFSOdCOzY0VCb47t9a2BJ2yGD8Zh2ES0FMM7YCL4X_w1Vixf_McSFc9LaqGMCYy1Y8aqyeIvirHfM4NCGY4ZDr1LW8xnZpLkiwxxrhHFCPEuU6L8gzGGWSg/s1600/Table+40.jpg" height="184" width="200" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Table 40. RCS</td></tr>
</tbody></table>
<table cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: right; text-align: right;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgYRZFgTru19WnMmtMRKEp1iUI061pjtcywEnJLxpUxIbjQQlvFvlq9xnY1BijVkHpinronE5KVeG7skz6RRxp1pSilpq7Bw9nVrcxjZ5jOTgdr4-904eC9MDZggK-syDT7Hm3lFGS5QK0/s1600/Table+41.jpg" imageanchor="1" style="clear: right; margin-bottom: 1em; margin-left: auto; margin-right: auto;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgYRZFgTru19WnMmtMRKEp1iUI061pjtcywEnJLxpUxIbjQQlvFvlq9xnY1BijVkHpinronE5KVeG7skz6RRxp1pSilpq7Bw9nVrcxjZ5jOTgdr4-904eC9MDZggK-syDT7Hm3lFGS5QK0/s1600/Table+41.jpg" height="184" width="200" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Table 41. KJS</td></tr>
</tbody></table>
<div class="MsoNormal">
The previous two posts dealt with student ability <b style="mso-bidi-font-weight: normal;">during</b> the test. This one looks at the
score <b style="mso-bidi-font-weight: normal;">after</b> the test. I developed
four runs of the Visual Education Statistics Engine: Table 40. RCS, Table 41. KJS
(simulated), and after maximizing item discrimination, Table 42. RCSmax, and Table
43. KJSmax. </div>
<div class="MsoNormal">
<br /></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: left; margin-right: 1em; text-align: left;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhxw2Y8WPTL1y6TASJy8FZLFDntAbhzayEfYLWxlq8TSP7nvdRWpSmvBRjDXYtdsn6K82TGW4CAST6b5J2aLfPPVilJBPoV6ZXo1kTvHaEqsKtRI7yJalQ52pO0WXfVpxUaUypyu7rGij0/s1600/Table+42.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhxw2Y8WPTL1y6TASJy8FZLFDntAbhzayEfYLWxlq8TSP7nvdRWpSmvBRjDXYtdsn6K82TGW4CAST6b5J2aLfPPVilJBPoV6ZXo1kTvHaEqsKtRI7yJalQ52pO0WXfVpxUaUypyu7rGij0/s1600/Table+42.jpg" height="184" width="200" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Table 42. RCSmax</td></tr>
</tbody></table>
<table cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: right; text-align: right;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjGnO6uX8EPVaEJGEu4hoOSaG3vnxdvQIIRqLp62B54WFcdTTabZQQvKhFJJpXGM9N6UPQ2sPu43bJa7aKbwDjJtE9qO8MBIC9XOwCvolsJZvJ6VOtSPYrJe9tYHGGNncwXEveSFcWZaaY/s1600/Table+43.jpg" imageanchor="1" style="clear: right; margin-bottom: 1em; margin-left: auto; margin-right: auto;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjGnO6uX8EPVaEJGEu4hoOSaG3vnxdvQIIRqLp62B54WFcdTTabZQQvKhFJJpXGM9N6UPQ2sPu43bJa7aKbwDjJtE9qO8MBIC9XOwCvolsJZvJ6VOtSPYrJe9tYHGGNncwXEveSFcWZaaY/s1600/Table+43.jpg" height="184" width="200" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Table 43. KJSmax</td></tr>
</tbody></table>
<div class="MsoNormal">
Test reliability and the standard error of measurement (SEM) with
some related statistics are gathered into Table 44. The reliability and SEM
values are plotted on Chart 81 below.</div>
<div class="MsoNormal">
<br /></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjQ5RSDhKVbjVML06kvPnt95Bd71wc86QOndViM1trnKs1370_RWiw2P9xTinGtrHACmgF-VNIs-kSfWUoisl6hOO7tLgDcFwetGn_QOIktN5JCmCQ0sQFNxKZGvXRwOPMjqrmvessnkbo/s1600/Table+44.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjQ5RSDhKVbjVML06kvPnt95Bd71wc86QOndViM1trnKs1370_RWiw2P9xTinGtrHACmgF-VNIs-kSfWUoisl6hOO7tLgDcFwetGn_QOIktN5JCmCQ0sQFNxKZGvXRwOPMjqrmvessnkbo/s1600/Table+44.jpg" height="96" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Table 44</td></tr>
</tbody></table>
<div class="MsoNormal">
Students, on average, can reduce their wrong marks by about
one half when they first switch to knowledge and judgment scoring. The most
obvious effect of changing 24 of 48 zeros to a value of 0.5 to simulate Knowledge
and Judgment Scoring (KJS) was to reduce test reliability (0.36, red). Scoring both
quantity and quality also increased the average test score from 64% to 73%.</div>
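<div class="MsoNormal">
The simulation step (changing half of the zeros to 0.5) can be sketched as below. The mark list is hypothetical: the 48 zeros come from the text, but the 85 right marks are a made-up count chosen only so that the toy averages land near the 64% and 73% reported above. It is not the data in Table 40, and converting the first zeros found is my own simplification.</div>

```python
def simulate_kjs(marks, n_convert):
    """Return a copy of `marks` with the first `n_convert` zeros set to 0.5."""
    converted = []
    remaining = n_convert
    for mark in marks:
        if mark == 0 and remaining > 0:
            converted.append(0.5)  # a wrong mark becomes an omit worth 0.5
            remaining -= 1
        else:
            converted.append(mark)
    return converted


def mean_score(marks):
    """Average mark across the whole list."""
    return sum(marks) / len(marks)


# Hypothetical flat list of marks: 85 right (1) and 48 wrong (0).
rcs_marks = [1] * 85 + [0] * 48
kjs_marks = simulate_kjs(rcs_marks, 24)

print(round(mean_score(rcs_marks) * 100), round(mean_score(kjs_marks) * 100))
```

<div class="MsoNormal">
Which zeros are converted does not change the average score, but it does change each student's score, and so it would change the reliability and SEM in a full analysis.</div>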
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Psychometricians do not like the reduction in test
reliability. Standardized paper tests were marketed as “the higher the
reliability the better the test”. Marketing has now moved to “the lower the
standard error of measurement (SEM), the better the test”, using computers, CAT
and online testing (green). The simulated KJS shows a better SEM (10%) than
the 12% for RCS. With the current emphasis switched from test reliability to
precision (SEM), KJS now shows a slight advantage to test makers over RCS.</div>
<div class="MsoNormal">
<br /></div>
<table cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: left; margin-right: 1em; text-align: left;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEixj74J4YwkXl0lYW8yy_HAcDep30h87rGLuKN_OKHG2i0tY7hDlPqP3aNvV4jjJhm43npq4eeCuv0oJ52ifBYU_4pJfCMf0-Nq6BvI3X_kvtH7p_U9ZZvYwgW2yvzuG29IM5nxv6KbF2s/s1600/Chart+80.jpg" imageanchor="1" style="clear: left; margin-bottom: 1em; margin-left: auto; margin-right: auto;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEixj74J4YwkXl0lYW8yy_HAcDep30h87rGLuKN_OKHG2i0tY7hDlPqP3aNvV4jjJhm43npq4eeCuv0oJ52ifBYU_4pJfCMf0-Nq6BvI3X_kvtH7p_U9ZZvYwgW2yvzuG29IM5nxv6KbF2s/s1600/Chart+80.jpg" height="320" width="301" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Chart 80</td></tr>
</tbody></table>
<div class="MsoNormal">
Chart 80 shows the general relationships between a
right-count score and a KJS. This is Chart 4/4 from the previous post tipped on
its side with the 60% passing performance replaced with the average scores of
64% RMS and 73% KJS. Again, KJS is not a giveaway. There is an increase in the
score, if the student elects to use his/her judgment. There is also an increase
in the ability to know what a student actually knows because the student is
given the opportunity to report what is known, not just to mark an answer to
every question (even before looking at the test).</div>
<div class="MsoNormal">
<br /></div>
<div style="text-align: right;">
</div>
<div class="MsoNormal">
<table cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: right; margin-left: 1em; text-align: right;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjgcMNTzmY_BAGg_X55da06_TP9WtD-pkE7FcQOqH-DcYzjGhsZdWFmXlO7ijZcSxMmiAXRGjcCmhhJYFSq-8_xSABOFkzvgYd_p0IFT7l9PqQGRhrKifQrOUEoz8vt2xBhpLmGrpAW0do/s1600/Chart+81a.jpg" imageanchor="1" style="clear: right; margin-bottom: 1em; margin-left: auto; margin-right: auto;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjgcMNTzmY_BAGg_X55da06_TP9WtD-pkE7FcQOqH-DcYzjGhsZdWFmXlO7ijZcSxMmiAXRGjcCmhhJYFSq-8_xSABOFkzvgYd_p0IFT7l9PqQGRhrKifQrOUEoz8vt2xBhpLmGrpAW0do/s1600/Chart+81a.jpg" height="320" width="306" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Chart 81</td></tr>
</tbody></table>
Chart 81 expands Chart 80 using the statistics in Table 44. In
general there is little difference between a right-count score and a KJS,
statistically. What is different is what is known about the student; the full
meaning of the score. Right-count scoring delivers a score on a test carefully
crafted to deliver a desired on-average test score distribution and cut score. THE
TEST IS DESIGNED TO PRODUCE THE DESIRED SCORE DISTRIBUTION. The KJS adds to
this the ability to assess what students actually know and can do that is of
value to them. The knowledge and judgment score assesses the complete student
(quantity and quality).</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Knowledge and Judgment Scoring requires appropriate
implementation for the maximum effect on student development. In my experience,
the switch from RCS must be voluntary to promote student development. It must
result in a change in the level of thinking and related study habits where the
student assumes responsibility for learning and reporting. At that time
students feel comfortable changing scoring methods. They like the quality
score. It reassures them that they really can learn and understand.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
KJS no longer has a totally negative effect on current attempts by psychometricians
to sharpen their data reduction tools. But there are still the effects
of tradition and project size. The NCLB movement demonstrated this: it failed in part
because low-performing schools mimicked the standardized tests rather than
tending to teaching and learning. Their attempt to succeed was
counterproductive. Doing more of the same does not produce different results.
These schools could also be expected to mimic standardized tests offering KJS.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The current CCSS movement is based on the need for one test
for all in an attempt to get valid comparisons between students, teachers,
schools and states. The effect has been gigantic contracts that only a few
companies have the capacity to bid on and little competition to modernize their
test scoring. </div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
KJS is then a supplement to RCS. It can be offered on
standardized tests. As such, it updates the multiple-choice test to its maximum
potential, IMHO. KJS can be implemented in the classroom, by testing companies,
and by entrepreneurs who see the mismatch between instruction and assessment.</div>
<br />
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Knowledge Factor has already done this with their patented
learning/assessment system, <a href="http://www.knowledgefactor.com/">Amplifire</a>.
It can prepare students online for current standardized tests. <a href="http://www.teacherspayteachers.com/Product/Smarter-Test-Scoring-and-Item-Analysis-1528371">Power
Up Plus</a> is free for paper classroom tests. (Please see the two preceding
posts for more details related to student ability during the test).</div>
Richard Harthttp://www.blogger.com/profile/04962997526156185761noreply@blogger.com0tag:blogger.com,1999:blog-6676724996771468267.post-27072000649198754372015-02-11T03:30:00.000-08:002015-02-11T03:30:02.869-08:00Learning Assessment Responsibilities<div class="MsoNormal">
Students, teachers, and test makers each
have responsibilities that contribute to the meaning of a multiple-choice test
score. This post extracts the responsibilities from the four charts in the
prior post, Meaningful Multiple-Choice Test Scores, which compares short answer,
right-count traditional multiple-choice, and knowledge and judgment scoring
(KJS) of both.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Testing looks simple: learn, test, and evaluate. Short
answer, multiple-choice, or both with student judgment. Lower levels of thinking,
higher levels of thinking, or both as needed. Student ability below, on level,
or above grade level. There are many more variables for standardized test
makers to worry about in a nearly impossible situation. By the time these have
been sanitized from their standardized tests, all that remains is a ranking on
the test that is of little if any instructional value (unless student judgment
is added to the scoring).</div>
<div class="MsoNormal">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjC5FkogZwV1YytzvytqE4_5v8_cnFr2SvwTrs7mc1b6OQYl8bSeCp5mNoLQZqYyrC97UiLsaqWi_49J77BYZeVVHILYyg4YjzDfVyFDTKwv6goyJk97XwiwfO5-7pKoj0phvF7JXFCdaU/s1600/Slide1.JPG" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjC5FkogZwV1YytzvytqE4_5v8_cnFr2SvwTrs7mc1b6OQYl8bSeCp5mNoLQZqYyrC97UiLsaqWi_49J77BYZeVVHILYyg4YjzDfVyFDTKwv6goyJk97XwiwfO5-7pKoj0phvF7JXFCdaU/s1600/Slide1.JPG" height="320" width="295" /></a></div>
<div class="MsoNormal">
Chart 1/4 compares a short answer and a right-count
traditional multiple-choice test. The teacher has the most responsibility for
the test score when working with pupils at lower levels of thinking (60%). A
high quality student functioning at higher levels of thinking could take the
responsibility to report what is known or can be done in one pass and then just
mark the remainder for the same score (60%). The teacher’s score is based on
the subjective interpretation of the student’s work. The student’s score is
based on a matching of the subjective interpretation of the test questions with
test preparation. [The judgment needed to do this is not recorded in
traditional multiple-choice scores.]</div>
<div class="MsoNormal">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiLKIBDnSb4bgwHXKpp1I9mYLo_uVIFI9CjLQLKUngjUFOe2y9-OHU9S7S4qADgJStwYX28u4-a8PRB8A8RDrC-K5UluJW7qA4guA4D7jcj3Se2Bcpob71j_u2N1ADaVJI2PHsU6W-6qo4/s1600/Slide2.JPG" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiLKIBDnSb4bgwHXKpp1I9mYLo_uVIFI9CjLQLKUngjUFOe2y9-OHU9S7S4qADgJStwYX28u4-a8PRB8A8RDrC-K5UluJW7qA4guA4D7jcj3Se2Bcpob71j_u2N1ADaVJI2PHsU6W-6qo4/s1600/Slide2.JPG" height="320" width="317" /></a></div>
<div class="MsoNormal">
Chart 2/4 compares what students are told about
multiple-choice tests and what actually takes place. Students are told the
starting score is zero. One point is added for each right mark. Wrong or blank answers
add nothing. There is no penalty. Mark an answer to every question. As a classroom
test, this makes sense if the results are returned in a functional formative
assessment environment. Teachers have the responsibility to sum several scores
when ranking students for grades.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
As a standardized test, the single score is very unfair.
Test makers place great emphasis on the right-mark after-test score and the
precision of their data reduction tools (for <b style="mso-bidi-font-weight: normal;">individual</b> questions and for <b style="mso-bidi-font-weight: normal;">groups</b>
of students). They have a responsibility to point out that the student on
either side of you has an unknowable, different, starting score from chance,
let alone your luck on test day. The forced-choice test actually functions as a
lottery. Lower scoring students are well aware of this and adjust their sense
of responsibility accordingly (in the absence of a judgment or quality score to
guide them).</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEivrosrXsUI0T__nfVP4zwaceJ08ZLloASa42jsp4P2-OiDWTVhdq0dP3fmVxKEmjwy3uUSutb7-X_4PgUq1rmVYbopBm_KvmRfbqNSDH0JWho4FzF28tUj-_4Wqpp3Vh20pbDbpxzMay0/s1600/Slide3.JPG" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEivrosrXsUI0T__nfVP4zwaceJ08ZLloASa42jsp4P2-OiDWTVhdq0dP3fmVxKEmjwy3uUSutb7-X_4PgUq1rmVYbopBm_KvmRfbqNSDH0JWho4FzF28tUj-_4Wqpp3Vh20pbDbpxzMay0/s1600/Slide3.JPG" height="320" width="287" /></a></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Chart 3/4 compares student performance by quality. Only a
student with a well-developed sense of responsibility, or a comparable innate
ability, can be expected to function as a high quality, high scoring, student
(100% but reported as 60%). A less self-motivated student, or one with less ability,
can perform two passes at 100% and 80% to also yield 60%. The typical student,
facing a multiple-choice test, will make one pass, marking every question as it
comes to earn a quantity, quality, and test score of 60%; a rank of 60%. <b style="mso-bidi-font-weight: normal;">No one knows which right mark is a right
answer.</b> </div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Teachers and test makers have a responsibility to assess and
report individual student quality on multiple-choice tests just as is done on
short-answer, essay, project, research, and performance tests. These notes of
encouragement and direction provide the same “feel good” effect found in a
knowledge and judgment scored quality score when accompanied with <b style="mso-bidi-font-weight: normal;">a list of what was known or could be done
(the right-marked questions).</b></div>
<div class="MsoNormal">
<b style="mso-bidi-font-weight: normal;"><br /></b></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhbnpbUiQjA7DZNWs709okPKPex6OjYjZYJvs9uPVkJCCPyo5LhEpojCTaf4u-llL-deBXazDnVXb72jeXhi-JgSP4vK1KyjuldQ6y9IGGZmJQMmOW1xxXJzAtqRqcnpCnra2xchjjrayA/s1600/Slide4.JPG" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhbnpbUiQjA7DZNWs709okPKPex6OjYjZYJvs9uPVkJCCPyo5LhEpojCTaf4u-llL-deBXazDnVXb72jeXhi-JgSP4vK1KyjuldQ6y9IGGZmJQMmOW1xxXJzAtqRqcnpCnra2xchjjrayA/s1600/Slide4.JPG" height="320" width="309" /></a></div>
<div class="MsoNormal">
Chart 4/4 shows knowledge and judgment scoring (KJS) with a
five-option question made from a regular four-option question plus omit. Omit
replaces “just marking”. A short answer question scored with KJS earns one
point for judgment and +/-1 point for right or wrong. An essay question
expecting four bits of information (short sentence, relationship, sketch, or chart)
earns 4 points for judgment and +/-4 points for an acceptable or not acceptable
report. (All fluff, filler, and snow are ignored. Students quickly learn to not
waste time on these unless the test is scored at the lowest level of thinking
by a “positive” scorer.)</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Each student starts with the same multiple-choice score:
50%. Each student stops when each student has customized the test to that
student’s preparation. This produces an accurate, honest and fair test score. The
quality score provides judgment guidance for students at all levels. It is the
best that I know of when operating with paper and pencil. <a href="http://www.teacherspayteachers.com/Product/Smarter-Test-Scoring-and-Item-Analysis-1528371">Power
Up Plus</a> is a free example. <a href="http://www.knowledgefactor.com/">Amplifire</a>
refines judgment into confidence using a computer, and now on the Internet. It
is just easier to teach a high quality student who knows what he/she knows.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Most teachers I have met question the score of 60% from KJS.
How can a student get a score of 60% and only mark 10% of the questions right?
Easy. Sum 50% for <b>perfect judgment</b>, 10% for right answers, and <b style="mso-bidi-font-weight: normal;">NO wrong</b>. Or sum 10% right, 10% right
and <b style="mso-bidi-font-weight: normal;">10% wrong</b>, and omit 20%. If the
student in the example chose to mark 10% right (a few well mastered facts) and
then just marked the rest (had no idea how to answer) the resulting score falls
below 40% <b style="mso-bidi-font-weight: normal;">(about 25% wrong)</b>. <b>With no
judgment, the two methods of scoring (smart and dumb) produce identical test
scores.</b> KJS is not a give-away. It is a simple, easy way to update currently
used multiple-choice questions to produce an accurate, honest, and fair test
score. KJS records what right-count traditional multiple-choice misses
(judgment) and what the CCSS movement <b>tries</b> to promote.</div>
Richard Harthttp://www.blogger.com/profile/04962997526156185761noreply@blogger.com0tag:blogger.com,1999:blog-6676724996771468267.post-69371155559908255262015-01-14T03:00:00.000-08:002015-01-14T03:00:04.056-08:00Meaningful Multiple-Choice Test Scores<div align="center" class="MsoNormal" style="text-align: center;">
<div style="text-align: left;">
The meaning of a multiple-choice test score is determined by
several factors in the testing cycle including test creation, test
instructions, and the shift from teacher to student being responsible for
learning and reporting. Luck-on-test-day, in this discussion, is considered to
have similar effects on the following scoring methods.</div>
</div>
<div class="MsoNormal">
<span style="font-size: 12.0pt; line-height: 115%; mso-bidi-font-size: 11.0pt;"><br /></span></div>
<div class="MsoNormal">
<span style="font-size: 12.0pt; line-height: 115%; mso-bidi-font-size: 11.0pt;">[Luck-on-test-day includes but is not limited to: test
blueprint, question author, item calibration, test creator, teacher,
curriculum, standards; classroom, home, and in between, environment; and a
little bit of random chance (act of God that psychometricians need to smooth
their data).]</span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Three ways of obtaining test scores are compared here: <b style="mso-bidi-font-weight: normal;">open ended short answer, closed ended right-count four-option
multiple-choice, and knowledge and judgment scoring (KJS)</b> for both short
answer and multiple-choice. These range from familiar manual scoring to what is
now easily done with KJS computer software. Each method of scoring has a
different starting score with a different meaning. The average customary classroom
score of 75% is assumed (60% passing).</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<table cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: right; margin-left: 1em; text-align: right;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjpVo8gAYjzGB0FyTn_mHG0P4GKCn6nLWfcWXdNYVXMCafHR8v3VfEZZrgdC4NzkwQjYw5e6O8DqLtfBozBr67JBxb1LweZDsCEtq2j6uSDewEmJaAivVuqzd7G_TyZcHAXC7-Vw6lnR9U/s1600/Slide1.JPG" imageanchor="1" style="clear: right; margin-bottom: 1em; margin-left: auto; margin-right: auto;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjpVo8gAYjzGB0FyTn_mHG0P4GKCn6nLWfcWXdNYVXMCafHR8v3VfEZZrgdC4NzkwQjYw5e6O8DqLtfBozBr67JBxb1LweZDsCEtq2j6uSDewEmJaAivVuqzd7G_TyZcHAXC7-Vw6lnR9U/s1600/Slide1.JPG" height="320" width="295" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Chart 1/4</td></tr>
</tbody></table>
<br /></div>
<div class="MsoNormal">
<b style="mso-bidi-font-weight: normal;">Open ended short
answer scores</b> start with zero and increase with each acceptable answer.
There may be several acceptable answers for a single short answer question. The
level of thinking required depends upon the stem of the question. There may be
an acceptable answer for a question both at lower and at higher levels of
thinking. These properties carry over into KJS below.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The teacher or test maker is responsible for scoring the
test (Mastery 60% + Wrong 0% = 60%, passing for quantity in Chart 1/4).
The quality of the answers can be judged by the scorer and may influence which
ones are considered right answers.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The open ended short answer question is flexible (multiple
right answers) and somewhat subjective, although different scorers are expected to
produce similar scores. The average test score is controlled by selecting a set
of items that is expected to yield an average test score of 75%. The student test
score is a rank based on three kinds of items included in the test: items that
survey what students were expected to master, items that separate students who
know from those who do not, and items that fail to show mastery or discrimination
(unfinished items, for a host of reasons including luck-on-test-day above).</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The open ended short answer question can also be scored as a
multiple-choice item. First tabulate the answers. Sort the answers from high to
low count. The most frequent
answer, on a normal question, will be the right answer option. The next three
ranking answers will be real student supplied wrong answer options (rather than
test writer created wrong answer options). This pseudo-multiple-choice item can
now be printed as a real question on your next multiple-choice test (with
answers scrambled).</div>
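<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The tabulation just described can be sketched in a few lines of Python. This is
only an illustration, not part of Power Up Plus: the function name build_item,
the light normalization of answers, and the sample responses are all assumptions
made for the example. The scorer still has to confirm that the most frequent
answer really is the right one before printing the item.</div>

```python
from collections import Counter


def build_item(responses, n_distractors=3):
    """Tabulate short-answer responses and propose a multiple-choice item.

    Returns (proposed_key, distractors, ranked_counts). The proposed key is
    simply the most frequent answer; a human scorer must verify it.
    """
    # Light normalization (an assumption): ignore case and surrounding spaces.
    counts = Counter(r.strip().lower() for r in responses if r.strip())
    if not counts:
        raise ValueError("no responses to tabulate")
    # Sort answers from high to low count, keeping the key plus the distractors.
    ranked = counts.most_common(n_distractors + 1)
    proposed_key = ranked[0][0]
    distractors = [answer for answer, _ in ranked[1:]]
    return proposed_key, distractors, ranked


# Hypothetical responses to one short-answer question.
sample = (["mitochondria"] * 12 + ["nucleus"] * 5 + ["ribosome"] * 3
          + ["chloroplast"] * 2 + ["golgi"])
key, wrong_options, table = build_item(sample)
print("proposed right answer:", key)
print("student-supplied wrong options:", wrong_options)
```

<div class="MsoNormal">
On a question where most students share a misconception, the most frequent
answer will be wrong, so the sketch reports the counts rather than deciding
anything on its own.</div>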
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
A high quality student could also mark only right answers on
the first pass using the above test (Chart 1/4) and then finish by just marking
on the second pass to earn a score of 60%. A lower quality student could just
mark each item in order, as is usually done on multiple-choice tests, mixing
right and wrong marks, to earn the same score of 60%. Using only a score after
the test we cannot see what is taking place during the test. Turning a short
answer test into traditional multiple-choice hides student quality, the very
thing that the CCSS movement is now promoting.</div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: right; margin-left: 1em; text-align: right;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgQtlT49pbMJAUKRtqdFMILzjO8Bzcwz_w41oXUhFVv5mrO-WLYylE05OhioNJmZk9DkWhyphenhyphenQbLJGRYqBqGM2XvuFstGNVMmOcLAPfPIPpDO5zi1PDpdjqWdIcmQZEi9fxnduEesDaY9jY4/s1600/Slide2.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgQtlT49pbMJAUKRtqdFMILzjO8Bzcwz_w41oXUhFVv5mrO-WLYylE05OhioNJmZk9DkWhyphenhyphenQbLJGRYqBqGM2XvuFstGNVMmOcLAPfPIPpDO5zi1PDpdjqWdIcmQZEi9fxnduEesDaY9jY4/s1600/Slide2.JPG" height="320" width="317" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Chart 2/4</td></tr>
</tbody></table>
<div class="MsoNormal">
<b style="mso-bidi-font-weight: normal;"><br /></b></div>
<div class="MsoNormal">
<b style="mso-bidi-font-weight: normal;">Closed ended
right-count four-option multiple-choice scores </b>start with zero and increase
with each right mark. Not really!! This is only how this method of scoring has
been marketed for a century by only considering a score based on right-counts
after the test is completed. In the first place traditional multiple-choice is
not multiple-choice, but forced-choice (it lacks one option discussed below).
This injects a 25% bonus (on average) at the start of the test (Chart 2/4). This
evil flaw in test design was countered, over 50 years ago, by a now defunct
“formula scoring”. After forcing students to guess, psychometricians wanted to remove
the effect of just marking! It took the SAT until March 2014 to
announce that it would drop this “score correction”.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
[Since there was no way to tell which right answer must be
changed for the correction, it made no sense to anyone other than psychometricians
wanting to optimize their data reduction tools, with disregard for the effect of
the correction on the students taking such a test. Now that 4-option questions
have become popular on standardized tests, a student who can eliminate one
option can guess from the remaining three for better odds on getting a right
mark (which is not necessarily a right answer that reflects recall,
understanding, or skill).]</div>
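<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
[For readers who have not met formula scoring: in its conventional textbook
form, a fraction of the wrong marks is subtracted from the right count, R - W/(k - 1)
for k answer options. The post above does not spell the formula out, so the
sketch below is an assumption about the usual form, with hypothetical function
names. It also shows the better odds a student gets after eliminating one option.]</div>

```python
from fractions import Fraction


def formula_score(right, wrong, n_options):
    """Conventional correction for guessing: R - W / (k - 1).

    Omitted items add nothing and subtract nothing.
    """
    if n_options < 2:
        raise ValueError("need at least two answer options")
    return right - Fraction(wrong, n_options - 1)


def guess_odds(n_options, eliminated=0):
    """Chance of a right mark when guessing among the options still standing."""
    remaining = n_options - eliminated
    if remaining < 1:
        raise ValueError("cannot eliminate every option")
    return Fraction(1, remaining)


# Blind guessing on 40 four-option items: about 10 right and 30 wrong expected.
print("corrected score:", formula_score(10, 30, 4))
print("odds, no elimination:", guess_odds(4))
print("odds, one option eliminated:", guess_odds(4, eliminated=1))
```

<div class="MsoNormal">
[The correction works only on the totals. It has no way of knowing which right
marks were lucky, which is the complaint made above.]</div>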
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The closed ended right-count four-option multiple-choice question
is inflexible (one right answer) and with no scoring subjectivity; all scorers
yield the same count of right marks. Again, the average test score is
controlled by selecting a set of items expected to yield 75% on-average (60%
passing). However, this 75% is not the same as that for the open ended short
answer test. As a forced-choice test, the multiple-choice test will be easier;
it starts with a 25% on-average advantage. (That means one student may start
with 15% and a classmate with 35%.) To further confound things, the level of
thinking used by students can also vary. A forced-choice test can be marked
entirely at lower levels of thinking.</div>
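<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The spread in starting scores can be estimated with a little binomial
arithmetic. The sketch below assumes a student who marks every item blindly on
a 20-item, four-option test; the test length and the function name chance_start
are assumptions for illustration only.</div>

```python
import math


def chance_start(n_items, n_options=4):
    """Mean and standard deviation of the score from blind marking alone.

    Scores are expressed as a fraction of the test (0.25 means 25%).
    """
    if n_items < 1 or n_options < 2:
        raise ValueError("need at least one item and two options")
    p = 1.0 / n_options                    # chance of a right mark per item
    mean = p                               # expected starting score
    sd = math.sqrt(p * (1.0 - p) / n_items)  # binomial spread of that score
    return mean, sd


mean, sd = chance_start(20)
print(f"average starting score: {mean:.0%}")
print(f"typical spread around it: about {sd:.0%}")
print(f"one student near {mean - sd:.0%}, a classmate near {mean + sd:.0%}")
```

<div class="MsoNormal">
On a test of this assumed length the spread is roughly ten percentage points
either side of 25%, which is in line with the 15% and 35% starting scores
mentioned above. Longer tests narrow the spread but never remove it.</div>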
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
[Standardized tests control part of the above problems by
eliminating almost all mastery and unfinished items. The game is to use the
fewest items that will produce a desired score distribution with an acceptable
reliability. A traditional multiple-choice scored standardized test score of 60%
is a much more difficult accomplishment than the same score on a classroom
test.]</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
A forced-choice test score is a rank of how well a student
did on a test. It is not a report of what a student actually knows or can do
that will serve as the basis for further instruction and learning. The
reasoning is rather simple: the forced-choice score is counted up AFTER the
test is finished; this is the final game score. How the game started (25%
on-average) and was played is not observed (but this is what sports fans pay
for). This is what students and teachers need to know so students can take
responsibility for self-corrective learning.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<table cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: left; text-align: left;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEixrdReJZVhuVki_RRCc4FsELFzRT1_y6fycFxjxMDXEQGD5sCM0rdZZT21bcwonobvPQ0jiimKUvYMqfYFYUJCMx5lU88dJdXRUxN2-tE_HtMJzXnmReUXNL7MvNXYiyPrinWltWZfpW8/s1600/Slide3.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEixrdReJZVhuVki_RRCc4FsELFzRT1_y6fycFxjxMDXEQGD5sCM0rdZZT21bcwonobvPQ0jiimKUvYMqfYFYUJCMx5lU88dJdXRUxN2-tE_HtMJzXnmReUXNL7MvNXYiyPrinWltWZfpW8/s1600/Slide3.JPG" height="200" width="179" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Chart 3/4</td></tr>
</tbody></table>
[Three student performances that all end up with a
traditional multiple-choice score of 60% are shown in Chart 3/4. The highest
quality student used two passes, “I know or can do this or I can eliminate all
the wrong options” and “I don’t have a clue”. The next lower quality student
used three passes, “I know or can do this”; “I can eliminate one or more answer
options before marking” and “I am just marking.” The lowest level of thinking
student just marks answers in one pass, right and wrong, as most low quality,
lower level of thinking students do. But what takes place during the test is
not seen in the score made after the test. The lowest quality student must
review all past work (if tests are cumulative) or continue on with an
additional burden as a low quality student. A high quality student needs only
to check on what has not been learned.]<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: right; margin-left: 1em; text-align: right;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg7PcjxLqejvUdhUgusGdmn6xPDg1JFxAAFRN2jxRe8Bppbmi-xEpdOKmZGJr4k7RbL5crLCKwZePOEfZ_tzXwNCT25_wDE1k2b__tmvNq75kIFjZyasfuqU822N9jJ_TdTDzlZHVuaMUY/s1600/Slide4.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg7PcjxLqejvUdhUgusGdmn6xPDg1JFxAAFRN2jxRe8Bppbmi-xEpdOKmZGJr4k7RbL5crLCKwZePOEfZ_tzXwNCT25_wDE1k2b__tmvNq75kIFjZyasfuqU822N9jJ_TdTDzlZHVuaMUY/s1600/Slide4.JPG" height="320" width="309" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Chart 4/4</td></tr>
</tbody></table>
<br /></div>
<div class="MsoNormal">
<b style="mso-bidi-font-weight: normal;">Knowledge and
Judgment scores start </b>at 50% for every student plus one point for
acceptable and minus one point for not acceptable (right/wrong on traditional multiple-choice).
(Lower level of thinking students prefer: Wrong = 0, Omit = 1, and Right = 2.) Omitting an answer is good
judgment to report what has yet to be learned or to be done (understood).
Omitting keeps the one point for good judgment. An unacceptable or wrong mark
is poor judgment. You lose one point for bad judgment. </div>
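<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The two descriptions of the scoring rule (start at 50% and add or subtract a
point, or count Wrong = 0, Omit = 1, Right = 2) give the same number. The sketch
below shows this under stated assumptions: the function names and the 20-question
example are hypothetical, and the quality score is taken to be right marks
divided by all marks made, which this post implies but does not define.</div>

```python
def kjs_score(right, wrong, omit):
    """KJS test score using Wrong = 0, Omit = 1, Right = 2 points per question."""
    total = right + wrong + omit
    if total == 0:
        raise ValueError("test has no questions")
    points = 2 * right + 1 * omit + 0 * wrong
    return points / (2 * total)


def kjs_score_from_start(right, wrong, omit):
    """Same score, described as a 50% start, +1 per right and -1 per wrong."""
    total = right + wrong + omit
    if total == 0:
        raise ValueError("test has no questions")
    return (total + right - wrong) / (2 * total)


def quality_score(right, wrong):
    """Assumed definition: share of marked answers that are right."""
    marked = right + wrong
    return right / marked if marked else None


# Hypothetical 20-question test: 8 right, 2 wrong, 10 omitted.
print(f"KJS score: {kjs_score(8, 2, 10):.0%}")
print(f"same rule from a 50% start: {kjs_score_from_start(8, 2, 10):.0%}")
print(f"quality: {quality_score(8, 2):.0%}")

# With nothing omitted, KJS gives the same score as counting right marks.
print(f"no omits: {kjs_score(12, 8, 0):.0%} versus right-count {12 / 20:.0%}")
```

<div class="MsoNormal">
The last line is the point made throughout this post: when the omit option is
not used, the score carries no judgment and matches the right-count score.</div>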
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Now what is hidden with forced-choice scoring is visible
with Knowledge and Judgment Scoring (KJS). Each student can show how the game is
played. There is a separate student score for quantity and for quality. A
starting score of 50% gives quantity and quality equal value (Chart 4/4). [Knowledge
Factor sets the starting score near 75%. Judgment is far more important than
knowledge in high risk occupations.]</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
KJS includes a fifth answer option: omit (good judgment to
report what has yet to be learned or understood). When this option is not used,
the test reverts to forced-choice scoring (marking one of the four answer
options for every question).</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
A high quality student marked 10 right out of 10 marked and
then omitted the remainder (in two passes through the test) or managed to do a
few of one right and one wrong (three passes) for a passing score of 60% in
Chart 4/4. A student of less quality did not omit but just marked for a score
of less than 50%. A lower level of thinking, low quality student marked 10
right and just marked the rest (two passes) for a score of less than 40%. KJS
yields a score based on student judgment (60%) or on the lack of that judgment
(less than 50%).</div>
<div class="MsoNormal">
<b style="mso-bidi-font-weight: normal;"><br /></b></div>
<div class="MsoNormal">
<b style="mso-bidi-font-weight: normal;">In summary</b>, the
current assessment fad is still oriented on right marks rather than on student
judgment (and development). Students with a practiced good judgment develop the
sense of responsibility needed to learn at all levels of thinking. They do not
have to wait for the teacher to tell them they are right. Learning is stimulating
and exhilarating. It is fun to learn when you can question, get answers, and
verify a right answer or a new level of understanding; when you can build on
your own trusted foundation. </div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Low quality students learn by repeating the teacher. High
quality students learn by making sense of an assignment. Traditional multiple-choice
(TMC) assesses and rewards lower-levels-of-thinking. KJS assesses and rewards
all-levels-of-thinking. TMC requires little sense of responsibility. KJS
rewards (encourages) the sense of responsibility needed to learn at all levels
of thinking.</div>
<div class="MsoNormal" style="margin-left: .5in; mso-list: l0 level1 lfo1; text-indent: -.25in;">
<span style="mso-bidi-font-family: Calibri;"><span style="mso-list: Ignore;"><br /></span></span></div>
<div class="MsoNormal" style="margin-left: .5in; mso-list: l0 level1 lfo1; text-indent: -.25in;">
<!--[if !supportLists]--><span style="mso-bidi-font-family: Calibri;"><span style="mso-list: Ignore;">1.<span style="font: 7.0pt "Times New Roman";"> </span></span></span><!--[endif]-->A
<b style="mso-bidi-font-weight: normal;">short answer, hand scored, test score</b>
is an indicator of student ability and class ranking based on the scorer’s
judgment. The scorer can make a subjective estimate of student quality.</div>
<div class="MsoNormal" style="margin-left: .5in; mso-list: l0 level1 lfo1; text-indent: -.25in;">
<span style="mso-bidi-font-family: Calibri;"><span style="mso-list: Ignore;"><br /></span></span></div>
<div class="MsoNormal" style="margin-left: .5in; mso-list: l0 level1 lfo1; text-indent: -.25in;">
<!--[if !supportLists]--><span style="mso-bidi-font-family: Calibri;"><span style="mso-list: Ignore;">2.<span style="font: 7.0pt "Times New Roman";"> </span></span></span><!--[endif]-->A
<b style="mso-bidi-font-weight: normal;">TMC score</b> is only a rank on a completed
test with increased confounding at lower scores. A score matching a short
answer score is easier to obtain in the classroom and much more difficult to
obtain on a standardized test. </div>
<div class="MsoNormal" style="margin-left: .5in; mso-list: l0 level1 lfo1; text-indent: -.25in;">
<span style="mso-bidi-font-family: Calibri;"><span style="mso-list: Ignore;"><br /></span></span></div>
<div class="MsoNormal" style="margin-left: .5in; mso-list: l0 level1 lfo1; text-indent: -.25in;">
<!--[if !supportLists]--><span style="mso-bidi-font-family: Calibri;"><span style="mso-list: Ignore;">3.<span style="font: 7.0pt "Times New Roman";"> </span></span></span><!--[endif]-->A
<b style="mso-bidi-font-weight: normal;">KJS</b><b> test score</b> is based on a
student’s self-reported estimate of what the student knows and can do on a
completed test (quantity) and an estimate of the student’s ability to make use of that
knowledge (judgment) during the test (quality). The score has student judgment
and quality, not scorer judgment and quality.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
In short, students who know that they can learn (get rapid
feedback on <b style="mso-bidi-font-weight: normal;">quantity and quality</b>), who
want to learn, enjoy learning (see Amplifire below). All testing methods fail
to promote these student development characteristics unless the test results
are meaningful, easy to use by students and teachers, and timely. Student
development requires student performance, not just talking about it or labeling
something formative assessment.</div>
<div class="MsoNormal">
<b style="mso-bidi-font-weight: normal;"><br /></b></div>
<div class="MsoNormal">
<b style="mso-bidi-font-weight: normal;">Power Up Plus (PUP or
PowerUP)</b> scores both TMC and KJS. Students have the option of selecting the
method of scoring they are comfortable with. Such standardized tests have the
ability to estimate the level of thinking used in the classroom and by each
student. Lack of information,
misinformation, misconceptions and cheating can be detected by school, teacher,
classroom, and student.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Power Up Plus is hosted at TeachersPayTeachers to share what
was learned in a nine-year period with 3000 students at NWMSU. The <b style="mso-bidi-font-weight: normal;">free download</b> below supports individual
teachers who want to upgrade their multiple-choice tests for formative,
cumulative, and exit ticket assessment. Good teachers, working within the
bounds of accepted standards, do not need to rely on expensive assessments.
They (and their students) do need fast, easy-to-use test results to develop
successful, high-quality students.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
I hope your students respond with the same positive
enthusiasm that over 90% of mine did. We need to assess students to promote
their abilities. We do not need to primarily assess students to promote the
development of psychometric tools that yield far less than what is marketed.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
A Brief History: </div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Geoff Masters (1950- ) A modification of traditional multiple-choice test performance.</div>
<div class="MsoNormal">
<b style="mso-bidi-font-weight: normal;"><br /></b></div>
<div class="MsoNormal">
<b style="mso-bidi-font-weight: normal;">Created partial
credit scoring for the Rasch model</b> (1982) as a scoring refinement for
traditional right-count multiple-choice. It gives partial credit for near-right
marks. It does not change the meaning of the right-count score: because wrong
marks and blanks are both counted as zeros, quantity and quality have the same
value by default, and only quantity is scored. The routine is free in <a href="http://www.winsteps.com/ministep.htm">Ministep</a> software.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Richard A. Hart (1930-<span style="mso-spacerun: yes;"> </span>)<span style="mso-spacerun: yes;">
</span>Promotes student development by student self-assessment of what each
student actually knows and can do, AFTER learning, with “next class period”
feedback.</div>
<div class="MsoNormal">
<b style="mso-bidi-font-weight: normal;"><br /></b></div>
<div class="MsoNormal">
<b style="mso-bidi-font-weight: normal;">Knowledge and
Judgment Scoring</b> was started as Net-Yield-Scoring in 1975. Later I used it to
reduce the time needed for students to write, and for me to score, short answer
and essay questions. I created software (1981) to score multiple-choice, both
right-count, and knowledge and judgment, to encourage students to take responsibility
for what they were learning at all levels of thinking in any subject area. Students
voted to give knowledge and judgment equal value. The right-count score retains
the same meaning (quantity of right marks) as above. The knowledge and judgment
score is a composite of the judgment score (quality, the “feel good” score
AFTER learning) and the right-count score (quantity). <a href="http://www.teacherspayteachers.com/Product/Smarter-Test-Scoring-and-Item-Analysis-1528371">Power
Up Plus</a> (2006) is classroom friendly (for students and teachers) and a free
download: <a href="http://www.teacherspayteachers.com/Product/Smarter-Test-Scoring-and-Item-Analysis-1528371">Smarter
Test Scoring and Item Analysis</a>.</div>
<div class="MsoNormal">
<b style="mso-bidi-font-weight: normal;"><br /></b></div>
<div class="MsoNormal">
<b style="mso-bidi-font-weight: normal;">Knowledge Factor</b>
(1995-<span style="mso-spacerun: yes;"> </span>) <span style="mso-spacerun: yes;"> </span><span style="mso-spacerun: yes;"> </span>Promotes student learning and retention by assessing student
knowledge and confidence, DURING learning, with “instant” feedback to develop “feeling
good” during learning. </div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<a href="http://www.knowledgefactor.com/">Knowledge Factor</a>
was built on the work of Walter R. Borg (1921-1990). The patented learning-assessment
program, Amplifire, places much more weight on confidence than on knowledge (a
wrong mark may reduce the score by three times as much as a right mark adds).
The software leads students through the steps needed to learn easily, quickly,
and at a depth that is easily retained for more than a year. Students do not
have to master the study skills and the sense of responsibility needed to learn
at all levels of thinking, as is needed for mastery with KJS. <a href="http://www.knowledgefactor.com/">Amplifire</a> is student friendly, online,
and so commercially successful in developed topics that it is not free.</div>
<br />
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
[Judgment and confidence are not the same thing. Judgment is
measured by performance (percent of right marks), AFTER learning, at any level
of student score. Confidence is a good feeling that Amplifire skillfully uses
to promote rapid learning, DURING learning and self-assessment, into a mastery
level. Students can take confidence in their practiced and applied
self-judgment. The KJS and Amplifire test scores reflect the complete student.
IMHO standardized tests should do this also, considering their cost in time and
money.]</div>
Richard Harthttp://www.blogger.com/profile/04962997526156185761noreply@blogger.com0tag:blogger.com,1999:blog-6676724996771468267.post-90354205741826641052014-12-10T03:00:00.000-08:002014-12-10T03:00:03.043-08:00Information Functions - Adding Unbalanced Items<div class="MsoNormal">
13</div>
<div class="MsoNormal">
Adding 22 balanced items to the 21 items of Table 33, in the
prior post, resulted in a similar average test score (Table 36) and the same
item information functions (the added items were duplicates of those in the
first Nurse124 data set of 21 items). What happens if an unbalanced set of 6
items is added? I just deleted the 16 high scoring additions from Table 36.
Both balanced additions (Table 36) and unbalanced additions (Table 39) had the
same extended range of item difficulties (5 to 21 right marks, or 23% to 95%
difficulty).</div>
<div class="MsoNormal">
<br /></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: left; margin-right: 1em; text-align: left;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEggRZYGBoP8L5ivyuwjUn4mSW0Do2P27e585-aT7hmwf-hezUOwgxYiAkWfWetGchTyzElO9asLSAqGi8vNh7Alww9L3IuDPmLNG-H5KN_eoOQUDTYwbcj77-26OeW8NYUvKoiEdJ6i7Uc/s1600/Table+33.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEggRZYGBoP8L5ivyuwjUn4mSW0Do2P27e585-aT7hmwf-hezUOwgxYiAkWfWetGchTyzElO9asLSAqGi8vNh7Alww9L3IuDPmLNG-H5KN_eoOQUDTYwbcj77-26OeW8NYUvKoiEdJ6i7Uc/s1600/Table+33.jpg" height="200" width="156" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Table 33</td></tr>
</tbody></table>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEinLDdranlunubN42RlfZyGr3NTFxIWGZZZXEQDyZ_QWtecPg04156zmFfJAIwYoUp-F9bX6-K0B77r3Eh21J80ELLhaqh-J7Ixt57kTO0s5Z6pvR6E1NmxQItDsypMurXr54S3hefe-yA/s1600/Table+36.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEinLDdranlunubN42RlfZyGr3NTFxIWGZZZXEQDyZ_QWtecPg04156zmFfJAIwYoUp-F9bX6-K0B77r3Eh21J80ELLhaqh-J7Ixt57kTO0s5Z6pvR6E1NmxQItDsypMurXr54S3hefe-yA/s1600/Table+36.jpg" height="106" width="200" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Table 36</td></tr>
</tbody></table>
<div class="MsoNormal">
<table cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: right; margin-left: 1em; text-align: right;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjPOASGb7D9cLP6scoZNCtvwaSebrq_3Agw8wCdcu5elzF75C9uplOnCjPLaWmKf9RDwgmlzu-L288d_LHW0VgjdXN9bhNnI-Dj-6VRKrxnRG-aJUE-H5BoHAS_37gzbdqlItD-kQ67S_Q/s1600/Table+39.jpg" imageanchor="1" style="clear: right; margin-bottom: 1em; margin-left: auto; margin-right: auto;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjPOASGb7D9cLP6scoZNCtvwaSebrq_3Agw8wCdcu5elzF75C9uplOnCjPLaWmKf9RDwgmlzu-L288d_LHW0VgjdXN9bhNnI-Dj-6VRKrxnRG-aJUE-H5BoHAS_37gzbdqlItD-kQ67S_Q/s1600/Table+39.jpg" height="150" width="200" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Table 39</td></tr>
</tbody></table>
<br /></div>
<div class="MsoNormal">
Adding a balanced set of items to the Nurse124 data set kept
the average score about the same: 80% and 79% (Table 36). Adding a set of more
difficult items to the Nurse124 data decreased the average score to 70% (Table
39) and decreased student scores. Traditionally, a student’s overall score is
then the average of the three test scores: 80%, 79%, and 70%, or 76% for an
average student (Tables 33, 36, and 39). An estimate of a student’s “ability”
is thus directly dependent upon his test scores, which are dependent upon the
difficulty of the items on each test. This score is accepted as a best estimate
of the student’s true score. This value is a best guess of future test scores.
This makes common sense: past performance is a predictor of future performance.</div>
<div class="MsoNormal">
<span style="mso-spacerun: yes;"><br /></span></div>
<div class="MsoNormal">
<span style="mso-spacerun: yes;"> </span>[Again a
distinction must be made between what is being measured by right mark scoring
(0,1) and by knowledge and judgment scoring (0,1,2). One yields a rank on a
test the student may not be able to read or understand. The other also
indicates the quality of each student’s knowledge; the ability to make meaningful
use of knowledge and skills. Both methods of analysis can use the exact same
tests. I continue to wonder why people are still paying full price but harvesting
only a portion of the results.] </div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The Rasch model IRT takes a very different route to
“ability”. The very same student mark data sets can be used. Expected IRT student
scores are based on the probability that half of all students with a given
ability location will correctly mark a question with a comparable difficulty
location on a single logit scale. (More at <a href="http://winsteps.com/winsteps.htl">Winsteps</a> and my <a href="http://raschmodelaudit.blogspot.com/">Rasch Model Audit</a> blog.)<span style="mso-spacerun: yes;"> </span>[The location starts from the natural
log of a ratio of right/wrong score and wrong/right difficulty. A convergence
of score and difficulty yields the final location. The 50% test score becomes
the zero logit location, the only point right mark scoring and IRT scores are
in full agreement.]</div>
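<div class="MsoNormal">
[A minimal Python sketch of the starting locations and the Rasch probability described above. The function names and the example counts are illustrative assumptions, not values taken from the tables, and the convergence step is left out.]</div>

```python
import math


def log_ratio(numerator: int, denominator: int) -> float:
    """Natural log of a count ratio; undefined when either count is zero."""
    if numerator <= 0 or denominator <= 0:
        raise ValueError("a logit is undefined for zero or perfect counts")
    return math.log(numerator / denominator)


def rasch_probability(ability: float, difficulty: float) -> float:
    """Dichotomous Rasch model: chance of a right mark, locations in logits."""
    return 1.0 / (1.0 + math.exp(difficulty - ability))


# Hypothetical student: 17 right and 4 wrong marks (right/wrong ratio).
start_ability = log_ratio(17, 4)

# Hypothetical item: 5 right and 17 wrong marks (wrong/right ratio).
start_difficulty = log_ratio(17, 5)

# A 50% score (equal right and wrong counts) sits at zero logits.
zero_point = log_ratio(10, 10)

if __name__ == "__main__":
    print(start_ability, start_difficulty, zero_point)
    print(rasch_probability(start_ability, start_difficulty))
```

<div class="MsoNormal">
[When ability and difficulty share the same location, the probability of a right mark is one half, which is the sense in which half of the students at that location are expected to mark the item correctly.]</div>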
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The Rasch model IRT converts student scores and item
difficulties [in the marginal cells of student data] into the probabilities of
a right answer (Table 33b). [The probabilities replace the marks in the central
cell field of student data.] It also yields raw student scores and their conditional
standard errors of measurement (CSEMs) (Table 33c, 34c, and 39c) based on the <b style="mso-bidi-font-weight: normal;">probabilities</b> of a right answer rather
than the <b style="mso-bidi-font-weight: normal;">count</b> of right marks. (For
more see my <a href="http://raschmodelaudit.blogspot.com/">Rasch Model Audit</a> blog.) </div>
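<div class="MsoNormal">
[As a sketch only: one common way to get a raw score and its CSEM from a row of probabilities is to sum the probabilities for the expected score and to sum the binomial variances p(1 - p) for the error. The row of probabilities below is made up for illustration and the function names are assumptions; they are not taken from Table 33b.]</div>

```python
import math


def expected_score(probabilities):
    """Expected raw score: the sum of right-answer probabilities in a row."""
    return sum(probabilities)


def raw_score_csem(probabilities):
    """Raw-score CSEM: square root of the summed p * (1 - p) variances."""
    return math.sqrt(sum(p * (1.0 - p) for p in probabilities))


# Hypothetical row of right-answer probabilities for one student.
row = [0.9, 0.8, 0.8, 0.6, 0.5]

if __name__ == "__main__":
    print(expected_score(row), raw_score_csem(row))
```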
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Student ability becomes fixed and separated from the student
test score; a student with a given ability can obtain a range of scores on
future tests without affecting his ability location. A calibrated item can yield a range
of difficulties on future tests without affecting its calibrated difficulty location. This makes
sense only in relation to the trust you can have in the person interpreting IRT
results; that person’s skill, knowledge, and (most important) experience at all
levels of assessment: student performance expectations, test blueprint, and politics.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
In practice, student data that do not fit well, that do not “look
right”, can be eliminated from the data set. Also the same data set (Table 33,
Table 36, and Table 39) can be treated differently if it is classified as a field
test, operational test, benchmark test, or current test.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
At this point states recalibrated and creatively
equilibrated test results to optimize federal dollars during the NCLB era by
showing <b style="mso-bidi-font-weight: normal;">gradual</b> <b style="mso-bidi-font-weight: normal;">continuing</b> improvement. <span style="mso-spacerun: yes;"> </span>It is time to end the ranking of students by right mark
scoring (0,1 scoring) and <b style="mso-bidi-font-weight: normal;">include</b> KJS,
or PCM (0,1,2 scoring) [which nearly every state education department already has in
<a href="http://winsteps.com/winsteps.htl">Winsteps</a>], so that standardized testing yields the results needed to guide
student development: the main goal of the CCSS movement.</div>
<br />
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The need to equilibrate a test is an admission of failure.
The practice has become “normal” because failure is so common. It opened the
door to cheating at state and national levels. [To my knowledge no one has been
charged and convicted of a crime for this cheating.] Current computer adaptive
testing (CAT) hovers about the 50% level of difficulty. This optimizes
psychometric tools. Having a disinterested party outside of the educational
community do the assessment analysis, and using online CAT,
reduce the opportunity to cheat. They do not IMHO optimize the usefulness of
the test results. End-of-course tests are now molding standardized testing into
an instrument to evaluate teacher effectiveness rather than assess student
knowledge and judgment (student development).<br />
<br />
- - - - - - - - - - - - - - - - - - - - -<br />
<div class="MsoNormal" style="-webkit-text-stroke-width: 0px; color: black; font-family: Times; font-size: medium; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px;">
<div class="MsoNormal">
<div style="margin: 0px;">
<b><span style="font-family: "Arial Bold";"><br /></span></b></div>
</div>
<div class="MsoNormal">
<div style="margin: 0px;">
<b><span style="font-family: "Arial Bold";">The Best of the Blog - FREE<o:p></o:p></span></b></div>
</div>
<div class="MsoNormal">
<div style="margin: 0px;">
<br /></div>
</div>
<div class="MsoNormal">
<div style="margin: 0px;">
The Visual Education Statistics Engine (VESEngine) presents the common education statistics on one traditional two-dimensional Excel spreadsheet. The post includes definitions. <a href="http://richard-hart.blogspot.com/2013/10/multiple-choice-test-analysis-summary.html">Download</a> as .xlsm or .xls.</div>
</div>
<div class="MsoNormal">
<div style="margin: 0px;">
<br /></div>
</div>
<div class="MsoNormal">
<div style="margin: 0px;">
This blog started five years ago. It has meandered through several views. The current project is <a href="http://richard-hart.blogspot.com/2014/02/test-scoring-mathematical-model.html">visualizing</a> the VESEngine in three dimensions. The observed student mark patterns (on their answer sheets) are on one level. The variation in the mark patterns (variance) is on the second level.</div>
</div>
<div style="margin: 0px;">
<span style="font-size: 12pt; line-height: 18px;"></span><br /></div>
<div class="MsoNormal">
<div style="margin: 0px;">
Power Up Plus (PUP) is classroom friendly software used to score and analyze what students guess (traditional multiple-choice) <b>and</b> what they report as the basis for further learning and instruction (knowledge and judgment scoring multiple-choice). This is a quick way to update your multiple-choice to meet Common Core State Standards (promote understanding as well as rote memory). Knowledge and judgment scoring originated as a classroom project, starting in 1980, that converted passive pupils into self-correcting highly successful achievers in two to nine months. Download as <a href="http://www.nine-patch.com/download/PUP522xlsm.zip">.xlsm</a> or <a href="http://www.nine-patch.com/download/PUP522xls.zip">.xls</a>.</div>
</div>
</div>
</div>
Richard Harthttp://www.blogger.com/profile/04962997526156185761noreply@blogger.com0tag:blogger.com,1999:blog-6676724996771468267.post-20155851710572284632014-11-12T03:30:00.000-08:002014-11-12T03:30:01.164-08:00Information Functions - Adding Balanced Items<div class="MsoNormal">
<span style="mso-spacerun: yes;"> </span>12</div>
<div class="MsoNormal">
I learned in the prior post that <b style="mso-bidi-font-weight: normal;">test</b> precision can be adjusted by selecting the needed set of <b style="mso-bidi-font-weight: normal;">items</b> based on their item information
functions (IIF). This post makes use of that observation to improve the
Nurse124 data set that generated the set of IIFs in Chart 75.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
I observed that Tables 33 and 34, in the prior post,
contained no items with difficulties below 45%. The item information functions
(IIF) were also skewed (Chart 75). This is not the symmetrical display
associated with the Rasch IRT model. I reasoned that adding a balanced set of
items would increase the number of IIFs without changing the average item
difficulty.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Table 36a shows the addition of a balanced set of 22 items
to the Nurse124 data set of 21 items. As each lower ranking item was added, one
or more high ranking items were added to keep the average test score near 80%.
This table added six lower ranking items and 16 higher scoring items resulting
in an average score of 79% and 43 items total.</div>
<div class="MsoNormal">
<br /></div>
<table cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: right; text-align: right;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjxEFJ7_ahRgn2XJaHCXKs4JyTFejl_BJKVAltN-Kca6FFPh8If-kMPydbCBru8F6lK6KpNyLMbjnqBwnxyJ9JGrpuMKs3BaCEBvl5sVR2QphFyQ6Uok7jwRv9eyGoIJHoogZTM1nthDrQ/s1600/Table+36.jpg" imageanchor="1" style="clear: right; margin-bottom: 1em; margin-left: auto; margin-right: auto;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjxEFJ7_ahRgn2XJaHCXKs4JyTFejl_BJKVAltN-Kca6FFPh8If-kMPydbCBru8F6lK6KpNyLMbjnqBwnxyJ9JGrpuMKs3BaCEBvl5sVR2QphFyQ6Uok7jwRv9eyGoIJHoogZTM1nthDrQ/s1600/Table+36.jpg" height="106" width="200" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Table 36</td></tr>
</tbody></table>
<div class="MsoNormal">
The average item difficulty for the Nurse124 data set was
17.57 and for the expanded set was 17.28. The average test score of 80% came in as
79%. [I did not take the
time to tweak the additions for a better fit.] Both item difficulty and student
score (ability) remained about the same.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The conditional standard error of measurement (CSEM) did
change with the addition of more items (Chart 79 below). The number of cells
containing information expanded from 99 to 204 cells. The average right count
student score increased from 17 to 34.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Table 36c shows the resulting item information functions
(IIF). The original set of 11 IIFs now contains 17 IIFs (orange). The original set
of 9 different student scores now contains 12 different scores; however, the
range of student scores is comparable between the two sets. This makes sense as
the average test scores are similar and the student scores are also about the
same.</div>
<table cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: left; margin-right: 1em; text-align: left;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgx-VzRoQsrCMVYpSRCv1tNgCQjUHJPv1MDjg9_4uJdEt2uehsKOJeBNeJerL-AS5qI3X1SWFhGXdim0gzsMxHIRzpZTo1RJlNhUsllTglP03sH6Q3hg419RpT2vI0kl3whHpBL0Tyke_o/s1600/Table+37.jpg" imageanchor="1" style="clear: left; margin-bottom: 1em; margin-left: auto; margin-right: auto;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgx-VzRoQsrCMVYpSRCv1tNgCQjUHJPv1MDjg9_4uJdEt2uehsKOJeBNeJerL-AS5qI3X1SWFhGXdim0gzsMxHIRzpZTo1RJlNhUsllTglP03sH6Q3hg419RpT2vI0kl3whHpBL0Tyke_o/s1600/Table+37.jpg" height="79" width="200" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Table 37</td></tr>
</tbody></table>
<table cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: right; margin-left: 1em; text-align: right;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgsdxmf2UNMitu59gFnI_sq8IAE0Kh-1bfMsevpfpE1K-Dbp5wMd2JkrJ1dRw_p9x0jywUOAUiRE__3PJAGiBBomUM-IiQvMLVPXwec3SAwTg5UynXOxWD4O5FmVB7c4V6_TMEwiyUwqWo/s1600/Chart+77.jpg" imageanchor="1" style="clear: left; margin-bottom: 1em; margin-left: auto; margin-right: auto;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgsdxmf2UNMitu59gFnI_sq8IAE0Kh-1bfMsevpfpE1K-Dbp5wMd2JkrJ1dRw_p9x0jywUOAUiRE__3PJAGiBBomUM-IiQvMLVPXwec3SAwTg5UynXOxWD4O5FmVB7c4V6_TMEwiyUwqWo/s1600/Chart+77.jpg" height="145" width="200" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Chart 77</td></tr>
</tbody></table>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Chart 77 (Table 37) shows the 17 IIFs as they spread across the
student ability range of 12 rankings (student score right count/% right). The
trace for the IIF with a difficulty of 11/50% (blue square) peaks (0.25) near
the average test score of 79%. This was expected as the maximum information
value <b style="mso-bidi-font-weight: normal;">within</b> an IIF occurs when the
item difficulty and student ability score match. [The three bottom traces on
Chart 77 (blue, red, and green) have been colored in Table 37 as an aid in
relating the table and chart (rotate Table 37 counter-clockwise 90 degrees).] </div>
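<div class="MsoNormal">
[The 0.25 peak can be checked with a short Python sketch. It assumes the dichotomous Rasch model, where the information an item gives at one ability location is p(1 - p); the function names are illustrative.]</div>

```python
import math


def rasch_probability(ability: float, difficulty: float) -> float:
    """Chance of a right mark under the dichotomous Rasch model (logits)."""
    return 1.0 / (1.0 + math.exp(difficulty - ability))


def item_information(ability: float, difficulty: float) -> float:
    """Item information at one ability location: p * (1 - p)."""
    p = rasch_probability(ability, difficulty)
    return p * (1.0 - p)


if __name__ == "__main__":
    # Trace one IIF across a range of ability locations (in logits).
    for tenths in range(-30, 31, 10):
        ability = tenths / 10.0
        print(ability, round(item_information(ability, 0.0), 3))
```

<div class="MsoNormal">
[The information is largest, 0.25, where ability and difficulty match (p = 0.5) and falls away on both sides, which is the shape each trace follows on the logit scale.]</div>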
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Even more important is the way the traces are increasingly
skewed the further the IIFs are away from this maximum, 11/50%, trace (blue
square, Chart 77). Also the IIF with a difficulty of 18/82%, near the average
test score, produced the identical total information (1.41) from both the
Nurse124 and the supplemented data sets. But these values also drifted apart
for the two data sets for IIFs of higher and lower difficulty. </div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Two IIFs near the 50% difficulty point delivered the maximum
information (2.17). Here again is evidence that prompts psychometricians to
work close to the 50%, or zero logit, point to optimize their tools when
working on low quality data (limiting scoring only to right counts rather than also
offering students the option to assess their judgment to report what is
actually meaningful and useful; to assess their <a href="http://www.nine-patch.com/">development</a> toward being a successful,
independent, high quality achiever). [Students that only need some guidance
rather than endless “re-teaching”; that, for the most part, consider right
count standardized tests a joke and a waste of time.] </div>
<div class="MsoNormal">
<table cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: right; text-align: right;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhKhWAWGrO9Kvlp1YFl-2SDTT5wNPTRTXOiCYQcEJMk7Vu4MO6LS5uMlWiQRqxH7gmu-P0zA_XbCK4wfSjX-OZ_yvYe-o4fyWPkgofz4CIinnZYt4ZU8JDv47NVnQCdUBr6xzqM8TvWxKs/s1600/Chart+78.jpg" imageanchor="1" style="clear: right; margin-bottom: 1em; margin-left: auto; margin-right: auto;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhKhWAWGrO9Kvlp1YFl-2SDTT5wNPTRTXOiCYQcEJMk7Vu4MO6LS5uMlWiQRqxH7gmu-P0zA_XbCK4wfSjX-OZ_yvYe-o4fyWPkgofz4CIinnZYt4ZU8JDv47NVnQCdUBr6xzqM8TvWxKs/s1600/Chart+78.jpg" height="160" width="200" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Chart 78</td></tr>
</tbody></table>
<br /></div>
<div class="MsoNormal">
<table cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: left; text-align: left;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi4vA3p5FGiMFbVFCYD8PQT9e-MmnGSskDkoTC26fiR7hYtHd459fun0oWGRNGzS9cPvdhLtJ0S0CwCADMq8F89j6-luATg8Yi7V9wVFNCRJhorOiU7w26h9QUzmA2sG9ieCTjxBr7ApHs/s1600/Table+38.jpg" imageanchor="1" style="clear: right; margin-bottom: 1em; margin-left: auto; margin-right: auto;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi4vA3p5FGiMFbVFCYD8PQT9e-MmnGSskDkoTC26fiR7hYtHd459fun0oWGRNGzS9cPvdhLtJ0S0CwCADMq8F89j6-luATg8Yi7V9wVFNCRJhorOiU7w26h9QUzmA2sG9ieCTjxBr7ApHs/s1600/Table+38.jpg" height="80" width="200" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Table 38</td></tr>
</tbody></table>
The test information function for the supplemented data set
is the sum of the information in all 17 item information functions (Table 38
and Chart 78). It took 16 easy items to balance 6 difficult items. The result
was a marked increase in precision at the student score levels between 30/70%
and 32/74%. [More at <a href="http://raschmodelaudit.blogspot.com/">Rasch Model
Audit</a> blog.]</div>
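<div class="MsoNormal">
[A sketch of the summing step in Python, assuming the dichotomous Rasch model. The item difficulties below are hypothetical logit values, not the 17 IIFs of Table 38. The standard error in logits is taken as one over the square root of the test information.]</div>

```python
import math


def item_information(ability: float, difficulty: float) -> float:
    """Rasch item information p * (1 - p) at one ability location."""
    p = 1.0 / (1.0 + math.exp(difficulty - ability))
    return p * (1.0 - p)


def total_information(ability: float, difficulties) -> float:
    """Test information: the sum of the item information functions."""
    return sum(item_information(ability, b) for b in difficulties)


def logit_csem(ability: float, difficulties) -> float:
    """Standard error of the ability estimate, in logits."""
    return 1.0 / math.sqrt(total_information(ability, difficulties))


# Hypothetical item difficulties in logits.
short_test = [-2.0, -1.0, 0.0, 1.0]
long_test = short_test + [-1.5, -0.5, 0.5, 1.5]

if __name__ == "__main__":
    print(total_information(0.0, short_test), logit_csem(0.0, short_test))
    print(total_information(0.0, long_test), logit_csem(0.0, long_test))
```

<div class="MsoNormal">
[Every added item adds information at every ability location, so the logit error shrinks as the test grows; items located near a student add the most.]</div>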
<br />
<table cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: left; margin-right: 1em; text-align: left;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg6Is-bVdDxdo-OPYDp-Fs0xlee-CyveHmXAbQI87VCJj5PwpcT9LFALiw7B_EGs8CNH8xyFxGO2M9_dKJHsUXuoHsKpi737stkZDR_Ir4FTn7WxlhX_-cuq-1F0AGfgpDO4a6ESAR1jdQ/s1600/Chart+79.jpg" imageanchor="1" style="clear: left; margin-bottom: 1em; margin-left: auto; margin-right: auto;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg6Is-bVdDxdo-OPYDp-Fs0xlee-CyveHmXAbQI87VCJj5PwpcT9LFALiw7B_EGs8CNH8xyFxGO2M9_dKJHsUXuoHsKpi737stkZDR_Ir4FTn7WxlhX_-cuq-1F0AGfgpDO4a6ESAR1jdQ/s1600/Chart+79.jpg" height="115" width="200" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Chart 79</td></tr>
</tbody></table>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Chart 79 summarizes the relationships between the Nurse124
data, the supplemented data (adding a balanced set of items that keeps student
ability and item difficulty unchanged), and the CTT and IRT data reduction
methods. The IRT logit values (green) were plotted directly and inverted (1/CSEM)
for comparison. In general, both CTT (blue) and IRT inverted (red) produced comparable CSEM values.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Adding 22 items increased the CTT Test SEM from 1.75 to
2.54. The standard deviation (SD) between student test scores increased from
2.07 to 4.46. The relative effect (SEM/SD) is 1.75/2.07 and 2.54/4.46, or 84% and
57%, a difference of 27 percentage points, or an improvement in precision of 27/84 or 32%.</div>
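<div class="MsoNormal">
[The ratio arithmetic can be repeated in a few lines of Python. The four inputs are the rounded SEM and SD values quoted in this post; the variable and function names are illustrative.]</div>

```python
def sem_to_sd_ratio(sem: float, sd: float) -> float:
    """Test SEM expressed as a fraction of the score standard deviation."""
    return sem / sd


# Rounded values quoted in the post.
nurse124 = sem_to_sd_ratio(1.75, 2.07)
supplemented = sem_to_sd_ratio(2.54, 4.46)

# Drop in the ratio, and that drop relative to the starting ratio.
difference = nurse124 - supplemented
relative_improvement = difference / nurse124

if __name__ == "__main__":
    print(f"{nurse124:.1%} {supplemented:.1%} {relative_improvement:.1%}")
```

<div class="MsoNormal">
[Working from the rounded inputs, the result lands close to the 32% given in the text; small differences come from where the rounding is done.]</div>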
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Chart 79 also makes it very obvious that the higher the
student test score the lower the CTT CSEM, the more precise the student score measurement,
the less error. That makes sense.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The above statement about a CTT CSEM must be related to a
second statement that the more item information, the greater the precision of
measurement by the item at this student score rank. The first statement
harvests variance from the central cell field from <b style="mso-bidi-font-weight: normal;">within rows of student (right) marks</b> (Table 36a) and from <b style="mso-bidi-font-weight: normal;">rows of probabilities (of right marks)</b>
in Table 36c. </div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The <a href="http://www.rasch.org/rmt/rmt204f.htm">binomial
variance</a> CTT CSEM view is then comparable to the reciprocal or inverted
(1/CSEM) view of the test information function CSEM view (Chart 79). CTT (blue,
CTT Nurse124, Chart 79) and IRT inverted (red, IRT N124 Inverted) produced
similar results even with an average test score of 79% that is 29 percentage
points away from the 50%, zero logit, IRT optimum performance point. </div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The second statement harvests variance, item information
functions, in Table 36c from <b style="mso-bidi-font-weight: normal;">columns of
probabilities (of right marks).</b> Layering one IIF on top of another across
the student score distribution yields the test information function (Chart 78).</div>
<!--EndFragment--><br />
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The Rasch IRT model harvests the variance from rows and from columns of probabilities of getting
a right answer that were generated
from the marginal student scores and item difficulties. CTT harvests from the variance of the marks students actually made. Yet,
at the count only right mark level, they deliver very similar results, with the
exception of the IIF from IRT analysis, which the CTT analysis does not do.<br />
<br />
<div class="MsoNormal">
- - - - - - - - - - - - - - - - - - - - -</div>
<div class="MsoNormal">
<b><span style="font-family: "Arial Bold";"><br /></span></b></div>
<div class="MsoNormal">
<b><span style="font-family: "Arial Bold";">The Best of the Blog - FREE<o:p></o:p></span></b></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The Visual Education Statistics Engine (VESEngine) presents the common education statistics on one Excel traditional two-dimensional spreadsheet. The post includes definitions. <a href="http://richard-hart.blogspot.com/2013/10/multiple-choice-test-analysis-summary.html">Download</a> as .xlsm or .xls.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
This blog started five years ago. It has meandered through several views. The current project is <a href="http://richard-hart.blogspot.com/2014/02/test-scoring-mathematical-model.html">visualizing</a> the VESEngine in three dimensions. The observed student mark patterns (on their answer sheets) are on one level. The variation in the mark patterns (variance) is on the second level.</div>
<span style="font-size: 12pt; line-height: 18px;"></span><br />
<div class="MsoNormal">
Power Up Plus (PUP) is classroom friendly software used to score and analyze what students guess (traditional multiple-choice) <b>and</b> what they report as the basis for further learning and instruction (knowledge and judgment scoring multiple-choice). This is a quick way to update your multiple-choice to meet Common Core State Standards (promote understanding as well as rote memory). Knowledge and judgment scoring originated as a classroom project, starting in 1980, that converted passive pupils into self-correcting highly successful achievers in two to nine months. Download as <a href="http://www.nine-patch.com/download/PUP522xlsm.zip">.xlsm</a> or <a href="http://www.nine-patch.com/download/PUP522xls.zip">.xls</a>.</div>
</div>
<div class="MsoNormal">
<br /></div>
Richard Harthttp://www.blogger.com/profile/04962997526156185761noreply@blogger.com0tag:blogger.com,1999:blog-6676724996771468267.post-10732361032982751342014-10-08T03:00:00.000-07:002014-10-08T03:00:02.569-07:00Customizing Test Precision - Information Functions<div class="MsoNormal">
<span style="font-size: 12.0pt; line-height: 115%; mso-bidi-font-size: 11.0pt;"><span style="mso-spacerun: yes;"> </span>11<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-size: 12.0pt; line-height: 115%; mso-bidi-font-size: 11.0pt;"><br /></span></div>
<div class="MsoNormal">
<span style="font-size: 12.0pt; line-height: 115%; mso-bidi-font-size: 11.0pt;">(Continued from the prior two posts.) <o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-size: 12.0pt; line-height: 115%; mso-bidi-font-size: 11.0pt;"><br /></span></div>
<div class="MsoNormal">
<span style="font-size: 12.0pt; line-height: 115%; mso-bidi-font-size: 11.0pt;">The past two posts have established that there is little
difference between classical test theory (CTT) and item response theory (IRT)
with respect to test reliability and conditional standard error of measurement (CSEM)
estimates (other than the change in scales). IRT is now the analysis of choice
for standardized tests. The Rasch model IRT is the easiest to use and also works
well with small data sets, including classroom tests. How two normal scales for
student scores and item difficulties are combined onto one IRT logit scale is
no longer a concern to me, other than that the same method must be used throughout the
duration of an assessment program.<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-size: 12.0pt; line-height: 115%; mso-bidi-font-size: 11.0pt;"><br /></span></div>
<div class="MsoNormal">
<div style="text-align: right;">
</div>
<table cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: right; margin-left: 1em; text-align: right;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg-xpA4pZh6l5BkPrlRaK8sXNIrBeciHEG1A2jn9xHayIqv16URMHMiNHfEo8K2wTZfs8F19blioK3VYFhsL8nDHrPAmAzTy3TlyKuX8Vliqc35nQU2Dnx1khpP0u-6RvBg9-F-TLzFB7g/s1600/Table+33.jpg" imageanchor="1" style="clear: right; margin-bottom: 1em; margin-left: auto; margin-right: auto;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg-xpA4pZh6l5BkPrlRaK8sXNIrBeciHEG1A2jn9xHayIqv16URMHMiNHfEo8K2wTZfs8F19blioK3VYFhsL8nDHrPAmAzTy3TlyKuX8Vliqc35nQU2Dnx1khpP0u-6RvBg9-F-TLzFB7g/s1600/Table+33.jpg" height="200" width="156" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Table 33</td></tr>
</tbody></table>
<span style="font-size: 12.0pt; line-height: 115%; mso-bidi-font-size: 11.0pt;">What is new and different from CTT is an additional insight
from the IRT data in Table 32c (information p*q values). I copied Table 32 into
Table 33 with some editing. I colored the cells holding the maximum amount of
information (0.25) yellow in Table 33c. This color was then carried back to
Table 33a, Right and Wrong Marks. [Item Information is related to the marginal
cells in Table 33a (as <b style="mso-bidi-font-weight: normal;">probabilities</b>),
and not to the central cell field (as mark <b style="mso-bidi-font-weight: normal;">counts</b>).]
The eleven item information functions (in <b style="mso-bidi-font-weight: normal;">columns</b>)
were re-tabled into Table 34 and graphed in Chart 75. [Adding the information in <b style="mso-bidi-font-weight: normal;">rows</b> yields the student score CSEM in
Table 33c.]<o:p></o:p></span><br />
<span style="font-size: 12.0pt; line-height: 115%; mso-bidi-font-size: 11.0pt;"><br /></span></div>
<div class="MsoNormal">
<table cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: left; margin-right: 1em; text-align: left;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgvEz_BfK1QJ8YSzvAgi_eRCMIjwP-m8cpXWtp_ANBuVTCKkRSKOp-HANtKKi5EBXX3PrN2dIPCkud8iafVOwbpUPn9cveKpZgVIbIhhP9gtf0C5Dx05stIV2gnQqjZz10qzp8_ikOEKq8/s1600/Table+34.jpg" imageanchor="1" style="clear: left; margin-bottom: 1em; margin-left: auto; margin-right: auto;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgvEz_BfK1QJ8YSzvAgi_eRCMIjwP-m8cpXWtp_ANBuVTCKkRSKOp-HANtKKi5EBXX3PrN2dIPCkud8iafVOwbpUPn9cveKpZgVIbIhhP9gtf0C5Dx05stIV2gnQqjZz10qzp8_ikOEKq8/s1600/Table+34.jpg" height="91" width="200" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Table 34</td></tr>
</tbody></table>
</div>
<div style="text-align: left;">
</div>
<div class="MsoNormal">
<table cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: right; text-align: right;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhIEsIrbAR2xXos-dfxXAgHQyBJTLucZaefAWaEMKDLpDCjLO4UsPtLzSI44YR5woecS1CCKRdIEAV_ZtU8p-UE4jw1MM5438Q2sbvOlefNtfgASegDrdVk3c0Mptym6Q2EMftUSv5rZLA/s1600/Chart+75.jpg" imageanchor="1" style="clear: right; margin-bottom: 1em; margin-left: auto; margin-right: auto;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhIEsIrbAR2xXos-dfxXAgHQyBJTLucZaefAWaEMKDLpDCjLO4UsPtLzSI44YR5woecS1CCKRdIEAV_ZtU8p-UE4jw1MM5438Q2sbvOlefNtfgASegDrdVk3c0Mptym6Q2EMftUSv5rZLA/s1600/Chart+75.jpg" height="140" width="200" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Chart 75</td></tr>
</tbody></table>
<span style="font-size: 12pt; line-height: 115%;">The Nurse124 data yielded an average test score of 16.8
marks or 80%. This skewed the </span><b style="font-size: 12pt; line-height: 115%;">item
information functions</b><span style="font-size: 12pt; line-height: 115%;"> away from the 50% or zero logit difficulty point
(Chart 75). The more difficult the item, the more information developed, from
0.49 for the 95% right count to a maximum of 1.87 at the 54% and 45% right counts. [No item on the
test had a difficulty of 50%.]</span><br />
<br /></div>
<div class="MsoNormal">
<div style="text-align: left;">
</div>
<table cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: left; margin-right: 1em; text-align: left;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiDcHyObnJ5_y6FhGW0uhIYB6hGyhbzdCghNBB3LBqD8FP6m6SXPev9fmO_HjtNjUGZ2ILxqTzPQzzbyxtZiOFGjcmfYVu2J0xSyTOn4m8WIijPUXPM1hxYWd2690ai0oihCNsMqbBwo5Y/s1600/Table+35.jpg" imageanchor="1" style="clear: left; margin-bottom: 1em; margin-left: auto; margin-right: auto;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiDcHyObnJ5_y6FhGW0uhIYB6hGyhbzdCghNBB3LBqD8FP6m6SXPev9fmO_HjtNjUGZ2ILxqTzPQzzbyxtZiOFGjcmfYVu2J0xSyTOn4m8WIijPUXPM1hxYWd2690ai0oihCNsMqbBwo5Y/s1600/Table+35.jpg" height="90" width="200" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Table 35</td></tr>
</tbody></table>
</div>
<div class="MsoNormal">
<div style="text-align: left;">
</div>
<table cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: right; text-align: right;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiiW-yw5OZz2_MVC9ZnkSlYEq92FSWkywguvUIe6n3gKjip48lUrNarh4eTqnQiG9Svt-kGUfHGDveL1AcLmwxGT6Se_Az0Qcm_M1sX1R3tOtCs5KW66kOffsnj95ZJ4YIx1qqvJES42DM/s1600/Chart+76.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiiW-yw5OZz2_MVC9ZnkSlYEq92FSWkywguvUIe6n3gKjip48lUrNarh4eTqnQiG9Svt-kGUfHGDveL1AcLmwxGT6Se_Az0Qcm_M1sX1R3tOtCs5KW66kOffsnj95ZJ4YIx1qqvJES42DM/s1600/Chart+76.jpg" height="160" width="200" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Chart 76</td></tr>
</tbody></table>
<span style="font-size: 12.0pt; line-height: 115%; mso-bidi-font-size: 11.0pt;">The sum of information (59.96) by item difficulty level and
student score level is tabled in Table 35 and plotted as the <b style="mso-bidi-font-weight: normal;">test information function</b> in Chart 76. This
test does not do a precise job of assessing student ability. The test was most
precise (19.32) at the 16 right count/76% right location. [Location can be
designated by measure (logit), input raw score (red) or output expected score (Table 33b).] <o:p></o:p></span></div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<div style="text-align: right;">
</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="font-size: 12.0pt; line-height: 115%; mso-bidi-font-size: 11.0pt;">The item with an 18 right count/92% right difficulty (Table 35) did not
contribute the most information individually but did as a group of three items
(9.17). <span style="mso-spacerun: yes;"> </span>The three highest scoring,
easiest, items (counts of 19, 20, and 21) are just too easy for a standardized
test but may be important survey items needed to verify knowledge and skills for
this class of high performing students. None of these three items reached an information level
maximum of 1/4. [It now becomes apparent how items can be selected to produce a
desired test information function.]<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-size: 12.0pt; line-height: 115%; mso-bidi-font-size: 11.0pt;"><br /></span></div>
<div class="MsoNormal">
<span style="font-size: 12.0pt; line-height: 115%; mso-bidi-font-size: 11.0pt;">The more information available is interpreted as greater
precision or less error (smaller CSEM in Table 33c). [CSEM = 1/SQRT(SUM(p*q))
on Table 33c. p*q is at a maximum when p = q; when right = wrong: (RT x WG)/(RT
+ WG)^2 or (3 x 3)/36 = 1/4].<o:p></o:p></span></div>
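<div class="MsoNormal">
The two bracketed formulas can be tried out in a short sketch. The function names are illustrative, and the row of probabilities is a hypothetical example rather than a row copied from Table 33c.</div>

```python
import math


def pq_from_counts(right, wrong):
    """Information p*q from mark counts: (RT x WG) / (RT + WG)^2."""
    total = right + wrong
    return (right * wrong) / total ** 2


def csem_logits(probabilities):
    """CSEM = 1 / SQRT(SUM(p*q)) for one row of right-mark probabilities."""
    information = sum(p * (1.0 - p) for p in probabilities)
    return 1.0 / math.sqrt(information)


# Equal right and wrong counts give the maximum information of 1/4.
print(pq_from_counts(3, 3))

# Hypothetical row: 16 items, each with a 50% chance of a right mark.
on_target_row = [0.5] * 16
# Same number of items, but each much easier (90% chance of a right mark).
easy_row = [0.9] * 16

print(csem_logits(on_target_row))
print(csem_logits(easy_row))
```

<div class="MsoNormal">
The easier row carries less information per item, so its CSEM in logits is larger than that of the on-target row.</div>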
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="font-size: 12.0pt; line-height: 115%; mso-bidi-font-size: 11.0pt;">Each item information function spans the range of student
scores on the test (Chart 76). Each item information function measures student
ability most precisely near the point that item difficulty and student ability
match (50% right) along the IRT S-curve. [The more difficult an item, the more
ability students must have to mark correctly 50% of the time. Student ability
is the number correct on the S-curve. Item difficulty is the number wrong on the
S-curve (see more at </span><a href="http://raschmodelaudit.blogspot.com/"><span style="font-size: 12.0pt; line-height: 115%; mso-bidi-font-size: 11.0pt;">Rasch Model
Audit</span></a><span style="font-size: 12.0pt; line-height: 115%; mso-bidi-font-size: 11.0pt;">).] <span style="mso-spacerun: yes;"> </span><o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-size: 12.0pt; line-height: 115%; mso-bidi-font-size: 11.0pt;"><br /></span></div>
<div class="MsoNormal">
<span style="font-size: 12.0pt; line-height: 115%; mso-bidi-font-size: 11.0pt;">Extracting item information functions from a data table
provides a powerful tool (a test information function) for psychometricians to
customize a test (page 127, </span><a href="http://www.msde.maryland.gov/NR/rdonlyres/E865B914-1C2D-4B39-A276-FBC02765E950/28803/2010_MOD_Math_TechReport_041411.pdf"><span style="font-size: 12.0pt; line-height: 115%; mso-bidi-font-size: 11.0pt;">Maryland
2010</span></a><span style="font-size: 12.0pt; line-height: 115%; mso-bidi-font-size: 11.0pt;">). A test can be adjusted for maximum precision (minimum
CSEM) at a desired cut point.<o:p></o:p></span></div>
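<div class="MsoNormal">
One way to picture this customizing step is a minimal sketch that picks, from an item bank, the items carrying the most Rasch information at a chosen cut point. The bank difficulties, the cut point, and the function names are all hypothetical; operational test assembly also has to honor a test blueprint, which this sketch ignores.</div>

```python
import math


def item_information(theta, b):
    """Rasch item information p*q at ability theta for item difficulty b (logits)."""
    p = 1.0 / (1.0 + math.exp(-(theta - b)))
    return p * (1.0 - p)


def pick_items(bank, cut_theta, k):
    """Return the k item difficulties with the most information at the cut point."""
    ranked = sorted(bank, key=lambda b: item_information(cut_theta, b), reverse=True)
    return ranked[:k]


def csem_at(theta, difficulties):
    """CSEM in logits at theta: 1 / sqrt(test information)."""
    return 1.0 / math.sqrt(sum(item_information(theta, b) for b in difficulties))


# Hypothetical item bank (difficulties in logits) and cut point.
bank = [-3.0, -2.0, -1.0, 0.0, 0.5, 1.0, 1.8, 3.0]
cut_theta = 1.0

chosen = pick_items(bank, cut_theta, k=3)
easiest = sorted(bank)[:3]

print("chosen items:", chosen)
print("CSEM at cut, chosen items: ", round(csem_at(cut_theta, chosen), 3))
print("CSEM at cut, easiest items:", round(csem_at(cut_theta, easiest), 3))
```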
<div class="MsoNormal">
<span style="font-size: 12.0pt; line-height: 115%; mso-bidi-font-size: 11.0pt;"><br /></span></div>
<div class="MsoNormal">
<span style="font-size: 12.0pt; line-height: 115%; mso-bidi-font-size: 11.0pt;">The bright side of this is that the concept of “information”
(not applicable to CTT), and the ability to put student ability and item
difficulty on one scale, gives psychometricians powerful tools. The dark side
is that the form in which the test data is obtained remains at the lowest levels of thinking in the classroom. Over the past decade of the
NCLB era, as psychometrics has made marked improvements, the student mark data
it is being supplied has remained in the casino arena: Mark an answer to each
question (even if you cannot read or understand the question), do not guess,
and hope for good luck on test day. <o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-size: 12.0pt; line-height: 115%; mso-bidi-font-size: 11.0pt;"><br /></span></div>
<div class="MsoNormal">
<span style="font-size: 12.0pt; line-height: 115%; mso-bidi-font-size: 11.0pt;">The concepts of information, item discrimination and CAT all demand values hovering about the 50% point for peak
psychometric performance. Standardized testing has migrated away from letting
students report what they know and can do to a lottery that compares their
performance (luck on test day) on a minimum set of items randomly drawn from a set calibrated on the performance of a reference population on another test day. </span><br />
<span style="font-size: 12.0pt; line-height: 115%; mso-bidi-font-size: 11.0pt;"><br /></span>
<span style="font-size: 12.0pt; line-height: 115%; mso-bidi-font-size: 11.0pt;">The testing is optimized for psychometric performance, not for student performance. </span><span style="font-size: 12pt; line-height: 18px;">The range over which a student score may fall is critical to each student. The more precise the cut score, the narrower this range, and the fewer the students who fall below that point on the score distribution but who might have passed on another test day. In general, no teacher or student will ever know. [Please keep in mind that the psychometrician does not have to see the test questions. This blog has used the Nurse124 data without even showing the actual test questions or the test blueprint.] </span><br />
<span style="font-size: 12pt; line-height: 18px;"><br /></span></div>
<div class="MsoNormal">
<span style="font-size: 12.0pt; line-height: 115%; mso-bidi-font-size: 11.0pt;">It does not have to be that way. </span><a href="http://www.nine-patch.com/"><span style="font-size: 12.0pt; line-height: 115%; mso-bidi-font-size: 11.0pt;">Knowledge and Judgment Scoring</span></a><span style="font-size: 12.0pt; line-height: 115%; mso-bidi-font-size: 11.0pt;"> (classroom
friendly) and the </span><a href="http://www.winsteps.com/winsteps.htm"><span style="font-size: 12.0pt; line-height: 115%; mso-bidi-font-size: 11.0pt;">partial
credit Rasch model</span></a><span style="font-size: 12.0pt; line-height: 115%; mso-bidi-font-size: 11.0pt;"> (that is included in the software states use) can both
update traditional multiple-choice to the levels of thinking required by the
common core state standards (CCSS) movement. We need an accurate, honest and fair assessment of what is of value to students, as well as precise ranking on an efficient CAT. <o:p></o:p></span><br />
<span style="font-size: 12.0pt; line-height: 115%; mso-bidi-font-size: 11.0pt;"><br /></span>
<br />
</div>
Richard Harthttp://www.blogger.com/profile/04962997526156185761noreply@blogger.com0tag:blogger.com,1999:blog-6676724996771468267.post-25804013878928104022014-09-10T03:00:00.000-07:002014-09-10T03:00:08.049-07:00Conditional Standard Error of Measurement - Precision<div class="MsoNormal">
<span style="font-size: 12.0pt; line-height: 115%; mso-bidi-font-size: 11.0pt;"> 10<span style="mso-spacerun: yes;"> </span></span><br />
<span style="font-size: 12.0pt; line-height: 115%; mso-bidi-font-size: 11.0pt;">(Continued from prior
post.)<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-size: 12.0pt; line-height: 115%; mso-bidi-font-size: 11.0pt;"><br /></span></div>
<div class="MsoNormal">
<span style="font-size: 12.0pt; line-height: 115%; mso-bidi-font-size: 11.0pt;">Table 32a contains two estimates (red) of the test standard
error of measurement (SEM) that are in close agreement. One estimate, 1.75, is from the average
of the conditional standard errors of measurement (CSEM, green) for each
student raw score. The traditional estimate, 1.74, uses the traditional test
reliability, KR20. No problem here.<o:p></o:p></span></div>
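<div class="MsoNormal">
The traditional estimate follows the standard relationship SEM = SD x SQRT(1 - reliability). A minimal sketch, using the test SD of 2.07 reported later in this post: the KR20 value below is back-calculated from the quoted SEM and SD, an assumption made for illustration and not a value read from Table 32.</div>

```python
import math


def sem_from_reliability(sd, reliability):
    """Traditional test SEM: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)


def implied_reliability(sd, sem):
    """Reliability implied by a given SD and SEM (the same formula rearranged)."""
    return 1.0 - (sem / sd) ** 2


sd = 2.07   # test standard deviation quoted in this post
sem = 1.74  # traditional SEM estimate quoted in this post

# Back-calculated, not taken from Table 32.
kr20_implied = implied_reliability(sd, sem)

print(f"implied KR20: {kr20_implied:.3f}")
print(f"SEM recovered from it: {sem_from_reliability(sd, kr20_implied):.2f}")
```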
<div class="MsoNormal">
<span style="font-size: 12.0pt; line-height: 115%; mso-bidi-font-size: 11.0pt;"><br /></span></div>
<div class="MsoNormal">
<span style="font-size: 12.0pt; line-height: 115%; mso-bidi-font-size: 11.0pt;">The third estimate of the test SEM in Table 32c is different.
It is based on CSEM values expressed in logits (natural log units, base e = 2.718) rather
than on the normal scale. The values are also inverted in relation to the
traditional values in Table 32 (Chart 74). There is a small but important
difference. The IRT CSEM values are much more linear than the CTT CSEM values.
Also, the center of this plot is the mean of the number of items (Chart 30,
prior post), not the mean of the item difficulties or student scores. [Also,
most of this chart was calculated, as most of these relationships do not require
actual data to be charted. Only nine score levels came from the Nurse124 data.]<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-size: 12.0pt; line-height: 115%; mso-bidi-font-size: 11.0pt;"><br /></span></div>
<div class="MsoNormal">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgpbKtU0hRwkfk8oFarq3QEzMVslBILYJbJJi8almCkuJW1XApCQLoYi0l58hS0k5wVjVoIlY1PH5SUxA6UwHIfDdkNmkU6lA6cUV4htGz79U1Mrk8m3Jx-N3S1PLH1BRwtUsKW_AHOtPM/s1600/Chart+74.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgpbKtU0hRwkfk8oFarq3QEzMVslBILYJbJJi8almCkuJW1XApCQLoYi0l58hS0k5wVjVoIlY1PH5SUxA6UwHIfDdkNmkU6lA6cUV4htGz79U1Mrk8m3Jx-N3S1PLH1BRwtUsKW_AHOtPM/s1600/Chart+74.jpg" height="255" width="320" /></a><span style="font-size: 12.0pt; line-height: 115%; mso-bidi-font-size: 11.0pt;">Chart 74 shows the binomial CSEM values for CTT (normal) and
IRT (logit) values obtained by inverting the CTT values: “SEM(Rasch Measure in
logits) = 1/(SEM(Raw Score))”, <a href="http://www.rasch.org/rmt/rmt204f.htm">2007</a>.
I then adjusted each of these so the corresponding curves, on the same scale,
crossed near the average CSEM or test SEM: 1.75 for CTT and 0.64 for IRT. The
extreme values for no right and all right were not included. CSEM values for
extreme values go to zero or to infinity with the following result: <o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-size: 12.0pt; line-height: 115%; mso-bidi-font-size: 11.0pt;"><br /></span></div>
<div class="MsoNormal">
<span style="font-size: 12.0pt; line-height: 115%; mso-bidi-font-size: 11.0pt;">“An apparent paradox is that extreme scores have perfect
precision, but extreme measures have perfect imprecision.” <a href="http://www.winsteps.com/winman/reliability.htm">http://www.winsteps.com/winman/reliability.htm</a><o:p></o:p></span></div>
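<div class="MsoNormal">
The inversion and the paradox at the extremes can be sketched in a few lines. This assumes the simple binomial form SQRT(n x p x q) with p = score/n for the raw score CSEM, which may differ from the exact formula and the curve adjustment behind Chart 74, so the printed values are not expected to reproduce the chart.</div>

```python
import math


def raw_score_csem(score, n_items):
    """Binomial CSEM on the raw score scale: sqrt(n * p * q), with p = score / n."""
    p = score / n_items
    return math.sqrt(n_items * p * (1.0 - p))


def logit_csem(score, n_items):
    """CSEM in logits, taken as the inverse of the raw score CSEM."""
    raw = raw_score_csem(score, n_items)
    # An extreme score has zero raw error, so its measure has unbounded error.
    return math.inf if raw == 0.0 else 1.0 / raw


n_items = 21  # number of items on the Nurse124 test

for score in range(0, n_items + 1):
    print(f"score={score:2d}  raw CSEM={raw_score_csem(score, n_items):.2f}  "
          f"logit CSEM={logit_csem(score, n_items):.2f}")
```

<div class="MsoNormal">
Scores of 0 and 21 show a raw score CSEM of zero (perfect precision) and a logit CSEM of infinity (perfect imprecision).</div>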
<div class="MsoNormal">
<span style="font-size: 12.0pt; line-height: 115%; mso-bidi-font-size: 11.0pt;"><br /></span></div>
<div class="MsoNormal">
<span style="font-size: 12.0pt; line-height: 115%; mso-bidi-font-size: 11.0pt;">Precision is then not constant across the range of student
scores for either method of analysis. The test SEM of 0.64 logits is comparable
to 1.74 counts on the normal scale.<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-size: 12.0pt; line-height: 115%; mso-bidi-font-size: 11.0pt;"><br /></span></div>
<div class="MsoNormal">
<span style="font-size: 12.0pt; line-height: 115%; mso-bidi-font-size: 11.0pt;">The estimate of precision, CSEM, serves <b style="mso-bidi-font-weight: normal;">three</b> different purposes. For CTT and IRT it narrows down the range
in which a student’s test score is expected to fall <b style="mso-bidi-font-weight: normal;">(1)</b>. The average of the (green) individual score CSEM values
estimates the test SEM as 1.75 counts out of a range of 21 items. This is less
than the 2.07 counts for the test standard deviation (SD) <b style="mso-bidi-font-weight: normal;">(2)</b>. Cut scores with greater precision are more believable and
useful. <o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-size: 12.0pt; line-height: 115%; mso-bidi-font-size: 11.0pt;"><br /></span></div>
<div class="MsoNormal">
<span style="font-size: 12.0pt; line-height: 115%; mso-bidi-font-size: 11.0pt;">For IRT analysis, the CSEM indicates the degree that the data
fit the perfect Rasch model <b style="mso-bidi-font-weight: normal;">(3)</b>. A
better fit also results in more believable and useful results.<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-size: 12.0pt; line-height: 115%; mso-bidi-font-size: 11.0pt;"><br /></span></div>
<div class="MsoNormal">
<span style="font-size: 12.0pt; line-height: 115%; mso-bidi-font-size: 11.0pt;">“A standard error quantifies the <b style="mso-bidi-font-weight: normal;">precision </b>of a measure or an estimate. It is the standard deviation
of an imagined error distribution representing the possible distribution of
observed values around their “true” theoretical value. This precision is based
on information within the data. The quality-control fit statistics report on <b style="mso-bidi-font-weight: normal;">accuracy</b>, i.e., how closely the
measures or estimates correspond to a reference standard outside the data, in
this case, the Rasch model.” <a href="http://www.winsteps.com/winman/standarderrors.htm">http://www.winsteps.com/winman/standarderrors.htm</a> <o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-size: 12.0pt; line-height: 115%; mso-bidi-font-size: 11.0pt;"><br /></span></div>
<div class="MsoNormal">
<span style="font-size: 12.0pt; line-height: 115%; mso-bidi-font-size: 11.0pt;">Precision also has some very practical limitations when
delivering tests by computer adaptive testing (CAT). Linacre, <a href="http://www.rasch.org/rmt/rmt202f.htm">2006</a>, has prepared two very
neat tables showing the number of items that must be on a test to obtain a
desired degree of precision expressed in logits and in confidence limits. The
closer the test “targets” an average score of 50%, the fewer items needed for a
desired precision.<o:p></o:p></span></div>
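<div class="MsoNormal">
Why targeting matters can be sketched from the relationship CSEM = 1/SQRT(SUM(p*q)). This is a back-of-envelope illustration, assuming every item has the same probability p of a right mark; it is not a reproduction of Linacre's tables, and the function name is hypothetical.</div>

```python
import math


def items_needed(target_se, p):
    """Items needed to reach target_se (logits) if every item has right-mark probability p."""
    information_per_item = p * (1.0 - p)
    exact = 1.0 / (target_se ** 2 * information_per_item)
    # Round away floating-point noise before rounding up to a whole item.
    return math.ceil(round(exact, 9))


target_se = 0.5  # desired precision in logits (illustrative value)

for p in (0.5, 0.6, 0.7, 0.8, 0.9):
    print(f"average score {p:.0%}: about {items_needed(target_se, p)} items")
```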
<div class="MsoNoSpacing">
<span style="font-size: 12.0pt; mso-bidi-font-size: 11.0pt;"><br /></span></div>
<div class="MsoNoSpacing">
<span style="font-size: 12.0pt; mso-bidi-font-size: 11.0pt;">The
two top students, with the same score of 20, missed items with different
difficulties. They both yield the same CSEM. The CSEM ignores the pattern of
marks and the difficulty of items. A CSEM value obtained in this manner is
related only to the raw score. Absolute values for the CSEM are sensitive to
item difficulty (Table 23a and 23b).<o:p></o:p></span></div>
<div class="MsoNoSpacing">
<br /></div>
<div class="MsoNormal">
<span style="font-size: 12.0pt; line-height: 115%; mso-bidi-font-size: 11.0pt;">The precision of a cut score has received increasing
attention during the NCLB era. In part, court actions have made the work of
psychometricians more transparent. The technical report for a standardized test
can now exceed 100 pages. There has been a shift of emphasis from test SEM, to
individual score CSEM, to IRT <b style="mso-bidi-font-weight: normal;">information</b>
as an explanation of test precision.<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-size: 12.0pt; line-height: 115%; mso-bidi-font-size: 11.0pt;"><span style="mso-spacerun: yes;"><br /></span></span></div>
<div class="MsoNormal">
<span style="font-size: 12.0pt; line-height: 115%; mso-bidi-font-size: 11.0pt;">“(Note that the
test <b style="mso-bidi-font-weight: normal;">information</b> function and the
raw score error variance at a given level of proficiency [student score], are analogous for the Rasch
model.)” Texas Technical Digest 2005-2006, page 145. And finally, “The
conditional standard error of measurement is the inverse of the information
function.” Maryland Technical Report—2010 Maryland Mod-MSA: Reading, page 99.<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-size: 12.0pt; line-height: 115%; mso-bidi-font-size: 11.0pt;"><br /></span></div>
<div class="MsoNormal">
<span style="font-size: 12.0pt; line-height: 115%; mso-bidi-font-size: 11.0pt;">I cannot end this without repeating that this discussion of
precision is based on traditional multiple-choice (TMC) that only ranks
students, a casino operation. Students are not given the opportunity to include
their judgment of what they know or can do that is of value to themselves, and
their teachers, in future learning and instruction, as is done with essays,
problem solving, and projects. This is easily done with knowledge and judgment
scoring (KJS) of multiple-choice tests.<o:p></o:p></span></div>
<br />
<div class="MsoNormal">
<span style="font-size: 12.0pt; line-height: 115%; mso-bidi-font-size: 11.0pt;"><br /></span></div>
<div class="MsoNormal">
<span style="font-size: 12.0pt; line-height: 115%; mso-bidi-font-size: 11.0pt;">(Continued)<o:p></o:p></span><br />
<span style="font-size: 12.0pt; line-height: 115%; mso-bidi-font-size: 11.0pt;"><br /></span>
<br />
<div class="MsoNormal">
- - - - - - - - - - - - - - - - - - - - -</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Table26.xlsm is now available free by request.</div>
<div class="MsoNormal">
<b><span style="font-family: "Arial Bold";"><br /></span></b></div>
<div class="MsoNormal">
<b><span style="font-family: "Arial Bold";">The Best of the Blog - FREE<o:p></o:p></span></b></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The Visual Education Statistics Engine (VESEngine) presents the common education statistics on one Excel traditional two-dimensional spreadsheet. The post includes definitions. <a href="http://richard-hart.blogspot.com/2013/10/multiple-choice-test-analysis-summary.html">Download</a> as .xlsm or .xls.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
This blog started five years ago. It has meandered through several views. The current project is <a href="http://richard-hart.blogspot.com/2014/02/test-scoring-mathematical-model.html">visualizing</a> the VESEngine in three dimensions. The observed student mark patterns (on their answer sheets) are on one level. The variation in the mark patterns (variance) is on the second level.</div>
<br />
<div class="MsoNormal">
<br /></div>
<span style="font-size: 12.0pt; line-height: 115%; mso-bidi-font-size: 11.0pt;"></span><br />
<div class="MsoNormal">
Power Up Plus (PUP) is classroom friendly software used to score and analyze what students guess (traditional multiple-choice) and what they report as the basis for further learning and instruction (knowledge and judgment scoring multiple-choice). This is a quick way to update your multiple-choice to meet Common Core State Standards (promote understanding as well as rote memory). Knowledge and judgment scoring originated as a classroom project, starting in 1980, that converted passive pupils into self-correcting highly successful achievers in two to nine months. Download as <a href="http://www.nine-patch.com/download/PUP522xlsm.zip">.xlsm</a> or <a href="http://www.nine-patch.com/download/PUP522xls.zip">.xls</a>.</div>
</div>
Richard Harthttp://www.blogger.com/profile/04962997526156185761noreply@blogger.com0tag:blogger.com,1999:blog-6676724996771468267.post-17979533888976958402014-08-13T03:00:00.000-07:002014-08-24T09:36:11.758-07:00Test Score Reliability - TMC and IRT<div class="MsoNormal">
<span style="font-size: 12.0pt; line-height: 115%; mso-bidi-font-size: 11.0pt;"><span style="mso-spacerun: yes;"> </span>9<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-size: 12.0pt; line-height: 115%; mso-bidi-font-size: 11.0pt;">The main purpose of this post is to investigate the
similarities between traditional multiple-choice (TMC), or classical test
theory (CTT), and item response theory (IRT). The discussion is based on TMC
and IRT as the math is simpler than when using </span><a href="http://www.nine-patch.com/"><span style="font-size: 12.0pt; line-height: 115%; mso-bidi-font-size: 11.0pt;">knowledge and judgment scoring</span></a><span style="font-size: 12.0pt; line-height: 115%; mso-bidi-font-size: 11.0pt;"> (KJS) and
the </span><a href="http://www.winsteps.com/winsteps.htm"><span style="font-size: 12.0pt; line-height: 115%; mso-bidi-font-size: 11.0pt;">IRT partial
credit model</span></a><span style="font-size: 12.0pt; line-height: 115%; mso-bidi-font-size: 11.0pt;"> (PCM). The difference is that TMC and IRT input marks at the
lowest levels of thinking, resulting in a traditional ranking. KJS and PCM
input the same marks at all levels of thinking, resulting in a ranking plus a
quality indication of what a student actually knows and understands that is of
value to that student (and teacher) in further instruction and learning.<o:p></o:p></span><br />
<span style="font-size: 12.0pt; line-height: 115%; mso-bidi-font-size: 11.0pt;"><br /></span></div>
<div class="MsoNormal">
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjWLMRLbTSd2k5Tmz5yRWkfM5vitBj2tdqPM3zajbwUgKVbaQv6CpoYI8GC5zzZ16gjEeTyVSZ660mz7QOZ2phorDcBPPtzTQlJyM7Up8doxyhW22cyrEQCnvdhlBxNYNsFggINo16EeiI/s1600/Table+32.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjWLMRLbTSd2k5Tmz5yRWkfM5vitBj2tdqPM3zajbwUgKVbaQv6CpoYI8GC5zzZ16gjEeTyVSZ660mz7QOZ2phorDcBPPtzTQlJyM7Up8doxyhW22cyrEQCnvdhlBxNYNsFggINo16EeiI/s1600/Table+32.jpg" height="320" width="257" /></a></div>
</div>
<div class="MsoNormal">
<span style="font-size: 12.0pt; line-height: 115%; mso-bidi-font-size: 11.0pt;">I applied the instructions in the </span><a href="http://www.winsteps.com/a/winsteps-manual.pdf"><span style="font-size: 12.0pt; line-height: 115%; mso-bidi-font-size: 11.0pt;">Winsteps Manual</span></a><span style="font-size: 12.0pt; line-height: 115%; mso-bidi-font-size: 11.0pt;">, page 576,
for checking out the Winsteps reliability estimate computation, to the
Nursing124 data used in the past several posts (22 students and 21 items). Table
32 is a busy table that is discussed in the next several posts. The two
estimates for test reliability (0.29 and 0.28, orange), one based on TMC and
one on IRT, agree to within rounding error.<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-size: 12.0pt; line-height: 115%; mso-bidi-font-size: 11.0pt;"><br /></span></div>
<div class="MsoNormal">
<span style="font-size: 12.0pt; line-height: 115%; mso-bidi-font-size: 11.0pt;">Table 32a shows the TMC test reliability estimated from the <b style="mso-bidi-font-weight: normal;">ratio</b> of true variance to total
variance. The total variance <b style="mso-bidi-font-weight: normal;">between
scores, </b>4.08<b style="mso-bidi-font-weight: normal;">,</b> minus the error
variance <b style="mso-bidi-font-weight: normal;">within items, </b>2.95<b style="mso-bidi-font-weight: normal;">,</b> yields the true variance, 1.13. The
KR20 then completes the reliability calculation by applying its item-count
factor, k/(k – 1) = 21/20, to the ratio 1.13/4.08, which yields 0.29 using normal
values.<o:p></o:p></span></div>
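<div class="MsoNormal">
The KR20 arithmetic above can be sketched in a few lines of Python. This is a minimal sketch, not the VESEngine spreadsheet itself: the function names are hypothetical, the standard KR20 form with the k/(k – 1) factor is assumed, and the summary call uses only the rounded values quoted in this post (21 items, total variance 4.08, error variance 2.95).</div>

```python
from statistics import pvariance


def kr20(marks):
    """KR20 from a students-by-items matrix of 0/1 marks (hypothetical helper)."""
    n_students = len(marks)
    k = len(marks[0])                      # number of items
    scores = [sum(row) for row in marks]   # one raw score per student
    total_var = pvariance(scores)          # variance between scores
    # Error variance within items: sum of p * q over the item columns.
    error_var = 0.0
    for j in range(k):
        p = sum(row[j] for row in marks) / n_students
        error_var += p * (1 - p)
    return (k / (k - 1)) * (total_var - error_var) / total_var


def kr20_from_summary(k, total_var, error_var):
    """Same calculation from summary values instead of the mark matrix."""
    return (k / (k - 1)) * (total_var - error_var) / total_var


# Rounded values quoted in the text for the Nursing124 data.
print(round(kr20_from_summary(21, 4.08, 2.95), 2))
```

<div class="MsoNormal">
With the rounded inputs the summary call rounds to 0.29. Feeding the full mark matrix to kr20 should give the same result to within rounding.</div>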
<div class="MsoNormal">
<span style="font-size: 12.0pt; line-height: 115%; mso-bidi-font-size: 11.0pt;"><br /></span></div>
<div class="MsoNormal">
<span style="font-size: 12.0pt; line-height: 115%; mso-bidi-font-size: 11.0pt;">For an IRT estimate of test reliability, the values on a
normal scale are converted to the logit scale (ln ratio w/r). In this case, the
mean of the item difficulty logits, ln ratio w/r, was -1.62 (Table 32b). This value
is subtracted from each item difficulty logit value to shift the mean of the
item distribution to the zero logit point (Rasch Adjust, Table 32b). Winsteps
then optimizes the fit of the data (blue) to the perfect Rasch Model. Now comparable
<b style="mso-bidi-font-weight: normal;">student ability</b> and <b style="mso-bidi-font-weight: normal;">item difficulty</b> values are in register
at the same locations on a single logit scale. The 50% point on the normal
scale is now at the zero location for both student ability and item difficulty.
<o:p></o:p></span></div>
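<div class="MsoNormal">
The conversion and centering step can be illustrated as follows. This is a sketch under stated assumptions: the helper name and the right-mark counts are hypothetical (they are not the Nursing124 values), and it covers only the logit conversion and the shift to a zero mean, not the iterative fitting that Winsteps performs afterwards.</div>

```python
import math


def centered_item_logits(right_counts, n_students):
    """Item difficulty logits, ln(wrong/right), shifted so their mean is zero."""
    logits = []
    for right in right_counts:
        wrong = n_students - right
        # Each item needs at least one right and one wrong mark.
        logits.append(math.log(wrong / right))
    mean_logit = sum(logits) / len(logits)
    centered = [x - mean_logit for x in logits]
    return centered, mean_logit


# Hypothetical right-mark counts for five items answered by 22 students.
centered, shift = centered_item_logits([20, 18, 16, 11, 6], 22)
print(round(shift, 2), [round(x, 2) for x in centered])
```

<div class="MsoNormal">
After the shift the centered logits sum to zero, which puts the mean item difficulty at the zero logit point.</div>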
<div class="MsoNormal">
<span style="font-size: 12.0pt; line-height: 115%; mso-bidi-font-size: 11.0pt;"><br /></span></div>
<div class="MsoNormal">
<span style="font-size: 12.0pt; line-height: 115%; mso-bidi-font-size: 11.0pt;">The probability for each right mark (expected score) in the
central cells is calculated from the respective marginal cells (blue) for item
difficulty (Winsteps Table 13.1) and student ability (Winsteps Table 17.1). Under the Rasch model this is
exp(ability – difficulty)/(1 + exp(ability – difficulty)), with both values in logits. The sum of these
probabilities (Table 32b, pink) is identical to the normal Score Mean (Table 32a,
pink).<o:p></o:p></span><br />
<span style="font-size: 12.0pt; line-height: 115%; mso-bidi-font-size: 11.0pt;"><br /></span></div>
<div class="MsoNoSpacing">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEikbto5ji3YSDJqlr3Rs8_Frr0r9uD_ZHU8aXSx_rVB6BGp5-TyjftmMGlAy5FOtucphsse30FiiQ1ZELfWNZ3G0-B1xI7xMAeUvO0zfdWDBvlZvK3v6cDoKFekZU8FBUn3uTLXHEHfg9U/s1600/New+Table+13.1.jpg" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEikbto5ji3YSDJqlr3Rs8_Frr0r9uD_ZHU8aXSx_rVB6BGp5-TyjftmMGlAy5FOtucphsse30FiiQ1ZELfWNZ3G0-B1xI7xMAeUvO0zfdWDBvlZvK3v6cDoKFekZU8FBUn3uTLXHEHfg9U/s1600/New+Table+13.1.jpg" height="150" width="200" /></a></div>
<div class="MsoNoSpacing">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi_OjB_SCC06gCeoc2KMnqT8Ku1fv8zs9zRhbK5fzWXX4HdE79yRagrm1ALtiX25GuQf4fXO17KN1zgIEmvugfneCca4ZV28xysiz0QrmQtC3lrr36Pq1-io9i_Lo_0EBHEN_Ixf11-VH0/s1600/New+Table+17.1.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi_OjB_SCC06gCeoc2KMnqT8Ku1fv8zs9zRhbK5fzWXX4HdE79yRagrm1ALtiX25GuQf4fXO17KN1zgIEmvugfneCca4ZV28xysiz0QrmQtC3lrr36Pq1-io9i_Lo_0EBHEN_Ixf11-VH0/s1600/New+Table+17.1.jpg" height="146" width="200" /></a><span style="font-size: 12.0pt; mso-bidi-font-size: 11.0pt;">The
“information” in each central cell, in Table 32c, was obtained by p*q or p * (1
- p) from Table 32b. Adding up the internal cells for each score yields the sum
of information for that score. <span style="mso-spacerun: yes;"> </span><o:p></o:p></span></div>
<div class="MsoNoSpacing">
<br /></div>
<div class="MsoNoSpacing">
<span style="font-size: 12.0pt; mso-bidi-font-size: 11.0pt;">The
next column shows the square root of the sum of information. This value
inverted yields the conditional standard error of measurement (CSEM). The conditional
variance (CVar) <b style="mso-bidi-font-weight: normal;">within</b> each student
ability measure is then obtained by reversing the equation for normal values in
Table 32a: The CVar is obtained as the square of the CSEM instead of the CSEM
being obtained as the square root of the CVar. The average of these values is
the test model error variance (EV) in measures: 0.43. <o:p></o:p></span></div>
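<div class="MsoNormal">
The chain from information to CSEM to error variance can be sketched like this. It assumes the standard Rasch expression for the probability of a right mark; the function names and the three-student, three-item measures are hypothetical and are not taken from Table 32.</div>

```python
import math


def rasch_p(ability, difficulty):
    """Rasch probability of a right mark; both arguments are in logits."""
    return 1.0 / (1.0 + math.exp(difficulty - ability))


def csem_and_error_variance(abilities, difficulties):
    """Return the CSEM for each student and the average conditional variance (EV)."""
    csems = []
    for b in abilities:
        # Information in each central cell is p * (1 - p); sum it across the row.
        info = sum(rasch_p(b, d) * (1.0 - rasch_p(b, d)) for d in difficulties)
        csems.append(1.0 / math.sqrt(info))   # CSEM = 1 / SQRT(sum of information)
    cvars = [c * c for c in csems]            # CVar = CSEM squared
    return csems, sum(cvars) / len(cvars)


# Hypothetical measures in logits.
abilities = [-1.0, 0.0, 1.0]
difficulties = [-0.5, 0.0, 0.5]
csems, ev = csem_and_error_variance(abilities, difficulties)
print([round(c, 2) for c in csems], round(ev, 2))
```

<div class="MsoNormal">
A student whose ability sits in the middle of the item difficulties collects the most information and so has the smallest CSEM.</div>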
<div class="MsoNoSpacing">
<br /></div>
<div class="MsoNoSpacing">
<span style="font-size: 12.0pt; mso-bidi-font-size: 11.0pt;">The
observed variance (OV) <b style="mso-bidi-font-weight: normal;">between</b>
measures is estimated in the exact same way as is done for normal scores: the
variance between measures from Excel =VAR.P (0.61) or the square of the SD: 0.78
squared = 0.61. <o:p></o:p></span></div>
<div class="MsoNoSpacing">
<br /></div>
<div class="MsoNoSpacing">
<span style="font-size: 12.0pt; mso-bidi-font-size: 11.0pt;">The
test reliability in measures, (OV – EV)/OV = (0.61 – 0.45)/0.61 = 0.28, is then obtained
from the same equation used for normal values in Table 32a: (total variance – error
variance)/total variance = (4.08 – 2.95)/4.08, which KR20 reports as 0.29. Normal and
measure dimensions for the same value differ, but ratios do not, as a ratio has
no dimension. <b style="mso-bidi-font-weight: normal;">TMC and IRT produced the
same values for test reliability, as will KJS and the PCM.<o:p></o:p></b></span><br />
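<div class="MsoNormal">
The reliability ratio in measures can be written as a short function. This is a sketch only: the function name and the sample values are hypothetical, and pvariance from the Python standard library stands in for Excel VAR.P.</div>

```python
from statistics import pvariance


def measure_reliability(measures, cond_variances):
    """Reliability = (OV - EV) / OV for student measures in logits."""
    ov = pvariance(measures)                         # observed variance between measures
    ev = sum(cond_variances) / len(cond_variances)   # average conditional variance
    return (ov - ev) / ov


# Hypothetical student measures and their conditional variances.
print(round(measure_reliability([-1.0, 0.0, 1.0], [0.2, 0.2, 0.2]), 2))
```

<div class="MsoNormal">
Because the result is a ratio of two variances on the same scale, it carries no dimension, which is why it can be compared directly with the value from normal scores.</div>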
<span style="font-size: 12.0pt; mso-bidi-font-size: 11.0pt;"><b style="mso-bidi-font-weight: normal;"><br /></b></span>
<span style="font-size: 12.0pt; mso-bidi-font-size: 11.0pt;"><b style="mso-bidi-font-weight: normal;">(Continued)</b></span><br />
<span style="font-size: 12.0pt; mso-bidi-font-size: 11.0pt;"><b style="mso-bidi-font-weight: normal;"><br /></b></span>
<br />
<br />
</div>
Richard Harthttp://www.blogger.com/profile/04962997526156185761noreply@blogger.com0tag:blogger.com,1999:blog-6676724996771468267.post-58155557659077629832014-07-09T03:00:00.000-07:002014-07-17T11:44:36.445-07:00Small Sample Math Model - SEMs<div class="MsoNormal">
<span style="mso-spacerun: yes;"> </span>8</div>
<div class="MsoNormal">
The test standard error of measurement (SEM) can be
calculated in two ways: The traditional way is by relating the variance between
student scores and within item difficulties; between an external column and the
internal cell columns. </div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The second way harvests the variance conditioned on each student
score and then averages the CSEM (SQRT(conditional student score error variance))
for the test. The first method links two properties: student ability and item
difficulty. The second only uses one property: student ability.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg72MfFSwAUKETE5_A_dc6nRPxci9o5rEwjXMdDyqSt464_6X8-A9URpbHuMWW45ot4ypwnxgOwjBVHORj45M0TUfeGshLj4EqjRVjF3iWAY_AY3hflvrwc5y3qezDPKua8gXKthclA0RA/s1600/Table+29.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg72MfFSwAUKETE5_A_dc6nRPxci9o5rEwjXMdDyqSt464_6X8-A9URpbHuMWW45ot4ypwnxgOwjBVHORj45M0TUfeGshLj4EqjRVjF3iWAY_AY3hflvrwc5y3qezDPKua8gXKthclA0RA/s1600/Table+29.jpg" height="320" width="284" /></a>I set up a model with 12 students and 11 items (see previous
post and Table26.xlsm below). Extreme values of zero and 100% were excluded.
Four samples with average test scores of 5, 6, 7 (Table 29), and 8 were created
with the standard deviation (1.83) and the variance within item difficulties (1.83)
held constant. This allowed the SEM to vary between methods.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The calculation of the test SEM (1.36) by way of reliability
(KR20) is reviewed on the top level of Chart 73. The test SEM remained the same
for all four tests. </div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
My first calculation of the test SEM by way of conditional
standard error of measurement (CSEM) began with the deviation of each mark from
the student score (Table 29 center). I squared the deviations and summed to get
the conditional variance for each score. The individual student CSEM is given
as the square root of the conditional variance (the conditional SD).
The test SEM (1.48) is then the average of the student CSEM values.</div>
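<div class="MsoNormal">
The deviation route just described can be sketched in Python. The function names and the three mark patterns are hypothetical (they are not the Table 29 data), and each mark is compared with the student score expressed as a proportion of the items.</div>

```python
import math


def csem_from_marks(marks):
    """CSEM for one student from the deviation of each 0/1 mark from the score."""
    n = len(marks)
    p = sum(marks) / n                             # student score as a proportion
    cond_var = sum((m - p) ** 2 for m in marks)    # works out to X * (n - X) / n
    return math.sqrt(cond_var)


def average_csem(mark_patterns):
    """Test SEM taken as the average of the student CSEM values."""
    csems = [csem_from_marks(row) for row in mark_patterns]
    return sum(csems) / len(csems)


# Hypothetical mark patterns for three students on an 11-item test.
patterns = [
    [1] * 7 + [0] * 4,
    [1] * 5 + [0] * 6,
    [0] * 3 + [1] * 8,
]
print(round(average_csem(patterns), 2))
```

<div class="MsoNormal">
Only the count of right marks enters the result, so reordering the marks within a pattern leaves the CSEM unchanged.</div>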
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
[My second calculation was based on the binomial standard
error of measurement given in Crocker, Linda, and James Algina, 1986,
Introduction to Classical & Modern Test Theory, Wadsworth Group, pages
124-127. </div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
By including the “correction for obtaining unbiased
estimates of population variance”, n/(n – 1), the SEM value increased from
1.48 to 1.55 (Table 29). This is a perfect match to the binomial SEM.]</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The two SEMs are then based on different sample sizes and
different assumptions. The traditional SEM (1.36) is based on the raggedly
distributed small sample size in hand. The binomial SEM (1.55) assumes a
perfectly normally distributed large theoretical population. </div>
<div class="MsoNormal">
<br />
[<b>Variance calculations (variance is additive)</b>:<br />
<br />
<ul>
<li><b>Test variance</b>: Score deviations from the test mean (as counts), squared, and summed = a sum of squares (SS). SS/N = MSS or variance: 3.33. {Test SD = SQRT(Var) = 1.83. Test SEM = 1.36.}</li>
<li><b>Conditional error variance</b>: Deviations from the student score (as a percent), squared, and summed = the conditional error variance (CVar) for that student score. {Test SEM = Average SQRT(CVar) = 1.48 (n) and 1.55 (n - 1).}</li>
<li><b>Conditional error variance</b>: Variance within the Score row (Excel, VAR) x (n or n - 1) = the CVar for that student score. {Test SEM: VAR.P = 1.48 and VAR.S = 1.55.}]</li>
</ul>
</div>
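<div class="MsoNormal">
The two Excel variance routes in the list above have close equivalents in the Python standard library. In this sketch pvariance plays the role of VAR.P and variance the role of VAR.S. The mark pattern is hypothetical, and multiplying both row variances by n is my reading of the scaling step, so treat it as an assumption.</div>

```python
import math
from statistics import pvariance, variance

marks = [1] * 7 + [0] * 4       # hypothetical student: 7 right of 11
n = len(marks)
x = sum(marks)

# VAR.P route: row variance times n gives X * (n - X) / n.
cvar_n = pvariance(marks) * n
# VAR.S route: row variance times n gives X * (n - X) / (n - 1).
cvar_n1 = variance(marks) * n

print(round(math.sqrt(cvar_n), 2), round(math.sqrt(cvar_n1), 2))
```

<div class="MsoNormal">
The second value is the larger of the two, in line with the n/(n – 1) correction raising the SEM.</div>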
<div class="MsoNormal">
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjJLKUJwhZ_bi2jGcNC41JZ_CUOtOdK4KGBixFtkeSQgQzB6lKoaPntL6OVlC8cDxwWfdWW_9xYYqcEWNbXVQ48XBxXH85hrmfsBNyYqKmhIshi2AUNkY_opKsIS4oq_0jktAGEM44CJuo/s1600/Chart+73.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjJLKUJwhZ_bi2jGcNC41JZ_CUOtOdK4KGBixFtkeSQgQzB6lKoaPntL6OVlC8cDxwWfdWW_9xYYqcEWNbXVQ48XBxXH85hrmfsBNyYqKmhIshi2AUNkY_opKsIS4oq_0jktAGEM44CJuo/s1600/Chart+73.jpg" height="240" width="320" /></a></div>
Squaring values produces curved distributions (Chart 73). <b style="mso-bidi-font-weight: normal;"><span style="font-family: "Arial Bold";">The
curves represent the possible values.</span></b> They do not represent the
number of items or student scores having those values.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The True MSS = Total MSS – Error MSS = 3.33 – 1.83 = 1.50,
involves subtracting a <b style="mso-bidi-font-weight: normal;"><span style="font-family: "Arial Bold";">convex</span></b> distribution centered on the
average test score from a <b style="mso-bidi-font-weight: normal;"><span style="font-family: "Arial Bold";">concave</span></b> distribution centered on
the maximum value of 0.25 (not on the average item difficulty). </div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The student score MSS is at a maximum when the item error SS
is at a minimum. The error MSS is at a maximum (0.25) when the student score
MSS is at a minimum (0.00). This makes sense. Such an item is perfectly aligned
with the student score distribution at the point where no score differs
from the average test score.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The KR20 is then a ratio of the True MSS/Total MSS,
1.50/3.33 = 0.45. [KR20 ranges from 0 to 1, not reproducible to fully
reproducible.] The test SEM is then a portion, SQRT(1 – KR20), of the SD [also
1.83 in this example, SQRT(3.33)] = SQRT(1 – 0.45) * 1.83 = 1.36.</div>
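<div class="MsoNormal">
The step from the variance ratio to the test SEM can be checked with a short sketch. The function name is hypothetical, and the ratio True MSS/Total MSS is used directly, with no item-count factor applied. Only the rounded values from this post (3.33 and 1.83) go in, so the result can differ in the last digit from a value computed on unrounded spreadsheet cells.</div>

```python
import math


def sem_from_mss(total_mss, error_mss):
    """Return (True MSS / Total MSS, SQRT(1 - ratio) * SD)."""
    true_mss = total_mss - error_mss
    ratio = true_mss / total_mss
    sd = math.sqrt(total_mss)
    return ratio, math.sqrt(1 - ratio) * sd


ratio, sem = sem_from_mss(3.33, 1.83)
print(round(ratio, 2), round(sem, 2))
```

<div class="MsoNormal">
Algebraically SQRT(1 – ratio) * SD reduces to SQRT(Error MSS), so with this form of the ratio the test SEM depends only on the error variance within items.</div>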
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh0o0SctQSUFVre1xwscIDn25infv4WdiSSZvqjgxcalX4McwpL34eh2ZHHcCcKKL2Kr0ct6Rt9bCWix7A-0Pesk_E2qM2XjG6xXCHwEIQj0YzOcqekvGlB6R4cOh5tEmaPeycrxg4u8xQ/s1600/Table+30.jpg" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh0o0SctQSUFVre1xwscIDn25infv4WdiSSZvqjgxcalX4McwpL34eh2ZHHcCcKKL2Kr0ct6Rt9bCWix7A-0Pesk_E2qM2XjG6xXCHwEIQj0YzOcqekvGlB6R4cOh5tEmaPeycrxg4u8xQ/s1600/Table+30.jpg" height="200" width="190" /></a>I was able to set the test SEM estimates using KR20 all to 1.36
for all four tests by setting the SD of student scores and the item error MSS
to <b style="mso-bidi-font-weight: normal;"><span style="font-family: "Arial Bold";">constant
</span></b>values by switching a 0 and 1 pair in student mark patterns. [The SD
and the item error MSS do not have to be the same values.]</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
All possible individual student <b style="mso-bidi-font-weight: normal;"><span style="font-family: "Arial Bold";">score binomial CSEM</span></b>
values for a test with 11 items are listed in Table 30. The CSEM is given as
the SQRT(conditional variance). The conditional variance is: (X * (n – X))/(n –
1) or n*(pq) * (n/(n - 1)), where p = X/n and q = 1 – p. There is then no need to administer a test to calculate a student score binomial CSEM
value. There is a need to administer a test to find the test SEM. The test SEM
(Table 29) is the average of these values over the observed student scores, 1.55.</div>
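<div class="MsoNormal">
Since the binomial CSEM needs only the raw score and the number of items, a list like Table 30 can be generated without any test data. This is a minimal sketch with a hypothetical function name; it uses the (X * (n – X))/(n – 1) form given above.</div>

```python
import math


def binomial_csem(x, n):
    """Binomial CSEM for raw score x on an n-item test."""
    return math.sqrt(x * (n - x) / (n - 1))


n_items = 11
table = {x: round(binomial_csem(x, n_items), 2) for x in range(n_items + 1)}
for x, csem in table.items():
    print(x, csem)
```

<div class="MsoNormal">
The values are symmetric about the middle of the score scale, so 5 right and 6 right share the same CSEM, and the CSEM falls to zero at the extreme scores of 0 and 11.</div>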
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhLqhUDz1armn8l4VgCRmnJeBKWaxY6kKsCm4Ny3uosPh9JgTO6p7G-ea__FniNsUDHq1jXP-fyG1bHushab4Xc3IG7kHeX27PGyFsgDw_cFjFuNVD-0zOUdzh34_LG1eDNJOwG6sEafE0/s1600/Table+31.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhLqhUDz1armn8l4VgCRmnJeBKWaxY6kKsCm4Ny3uosPh9JgTO6p7G-ea__FniNsUDHq1jXP-fyG1bHushab4Xc3IG7kHeX27PGyFsgDw_cFjFuNVD-0zOUdzh34_LG1eDNJOwG6sEafE0/s1600/Table+31.jpg" height="173" width="320" /></a>The student CSEM and thus the test SEM values are derived
only from student mark patterns. They differ from the test SEM values derived
from the KR20 (Table 31). With KR20 derived values held constant, the binomial CSEM
derived values for SEM decreased with higher test scores. This makes sense.
There is less room for chance events. Precision increases with higher test
scores.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Given a choice, a testing company would select the KR20
method using CTT analysis to report test SEM results.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
[The same SEM values for tests with 5 right and 6 right
resulted from the fact that the midpoint of the 11-item score scale is 5.5. The values for 5 right
and 6 right fall an equal distance from that midpoint on either side. Therefore X * (n – X) is
5 * 6 or 6 * 5, the same product in both cases.]</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
I positioned the green curve on Chart 73 using the above
information. </div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
A CSEM value is independent from the average test score and
item difficulties. (Swapping paired 0s and 1s in student mark patterns to
adjust the item error variance made no difference in the CSEM value.) The
average of the CSEM values, the test SEM, is dependent on the number of student scores on
the test with each value. If all scores are the same, the CSEMs and the SEM
will be the same (Tables 30 and 31).</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
I hope at this stage to have a visual mathematical model
that is robust enough to make meaningful comparisons with the Rasch IRT model. I
would like to return to this model and do two things (or have someone volunteer
to do it):<br />
<br />
<ol>
<li><span style="text-indent: -0.25in;">Combine all the features that have been teased
out, in Chart 72 and Chart 73, into one model.</span></li>
<li><span style="text-indent: -0.25in;">Animate the model in a meaningful way with
change gages and history graphs.</span></li>
</ol>
</div>
<div class="MsoNormal">
Now to return to the Nursing data that represent the real
classroom, filled with successful instruction, learning, and assessment.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
- - - - - - - - - - - - - - - - - - - - -</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Table26.xlsm is now available free by request. (Files hosted at nine-patch.com are also being relocated now that Nine-Patch Multiple-Choice, Inc. has been dissolved.)</div>
<br />
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Power Up Plus (PUP) is classroom friendly software used to
score and analyze what students guess (traditional multiple-choice) and what
they report as the basis for further learning and instruction (knowledge and
judgment scoring multiple-choice). This is a quick way to update your
multiple-choice to meet Common Core State Standards (promote understanding as
well as rote memory). Knowledge and judgment scoring originated as a classroom
project, starting in 1980, that converted passive pupils into self-correcting
highly successful achievers in two to nine months. Download as <a href="http://www.nine-patch.com/download/PUP522xlsm.zip">.xlsm</a> or <a href="http://www.nine-patch.com/download/PUP522xls.zip">.xls</a>.</div>
Richard Harthttp://www.blogger.com/profile/04962997526156185761noreply@blogger.com0tag:blogger.com,1999:blog-6676724996771468267.post-37669158161680150892014-06-18T03:00:00.000-07:002014-06-11T14:23:50.820-07:00Small Sample Math Model - Item Discrimination<div class="MsoNormal">
#7</div>
<div class="MsoNormal">
The ability of an item to place students into two distinct
groups is not a part of the mathematical model developed in the past few posts.
Discrimination ability, however, provides insight into how the model works. A
practical standardized test must have student scores spread out enough to
assign desired rankings. Discriminating items produce this spread of student
scores.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Current CCSS multiple-choice standardized test scoring only ranks students; it does
not tell us what a student actually knows that is useful and meaningful to the
student as the basis for further learning and effective instruction. This can
be done with <a href="http://www.nine-patch.com/">Knowledge and Judgment Scoring</a>
and the partial credit Rasch IRT model using the very same tests. This post
uses traditional scoring because it simplifies the analysis (and the model) to just
right and wrong marks; no judgment or higher levels of thinking are required of
students.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhJo_u6KD_rc2Euh25jvOin-fkgmy70kRR6QO1NKR7gWFQoPmKZi4VhOxNxRBLGzWaD6S-kw3FXnOxVVFqwsuPMulMMjSSIM1Mv9dbtyhh5qQrtXJEeL6ld31ofZ0dSOZwDBridQ9btF08/s1600/Table+26.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhJo_u6KD_rc2Euh25jvOin-fkgmy70kRR6QO1NKR7gWFQoPmKZi4VhOxNxRBLGzWaD6S-kw3FXnOxVVFqwsuPMulMMjSSIM1Mv9dbtyhh5qQrtXJEeL6ld31ofZ0dSOZwDBridQ9btF08/s1600/Table+26.jpg" height="295" width="320" /></a>I created a simple data set of 12 students and 11 items
(Table 26) with an average score of 5. I then modified this set to produce
average scores of 6, 7, and 8 (Table 27). [This can also be considered as the
same test given to students in grades 5, 6, 7, and 8.]</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The item error mean sum of squares (MSS), or variance, for the
test with an average score of 8 was 1.83. I then adjusted the MSS for the other
three grades to match this value. To make an adjustment, I exchanged a right and a
wrong mark within a student mark pattern (row) (Table 27). I stopped with 1.85,
1.85, 1.83, and 1.83 for grades 5, 6, 7, and 8. (This forced the KR20 = 0.495
and SEM = 1.36 to remain the same for all four sets.)</div>
<div class="MsoNormal">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEikaAMVVdGMsGDKEkM_-xEOl9KbPtJyD1s6TfkFlhHxp2OgG2dpb6ORbEn_rbiYihA7IcHos7oEsvlfY9cq-AMPpF70V9PcfTYt0b096__uSjUrsDBome2IAlxXZFpdHkAf8XjpwUbZsDo/s1600/Table+27.jpg" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEikaAMVVdGMsGDKEkM_-xEOl9KbPtJyD1s6TfkFlhHxp2OgG2dpb6ORbEn_rbiYihA7IcHos7oEsvlfY9cq-AMPpF70V9PcfTYt0b096__uSjUrsDBome2IAlxXZFpdHkAf8XjpwUbZsDo/s1600/Table+27.jpg" height="83" width="320" /></a></div>
<div class="MsoNormal">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiXG3pPguZ7ekcq5UG60jWHp9UmgH3BC70YrE12UdyUb2GKWai56UYl9Ehu_CJiigNvLl0taiatY23Noczz9wISbyW8LO6oogUf6EvEP4iwedMxX23M_uEYdpNgxpIA-rIW9z1A-TLCSx0/s1600/Table+28.jpg" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiXG3pPguZ7ekcq5UG60jWHp9UmgH3BC70YrE12UdyUb2GKWai56UYl9Ehu_CJiigNvLl0taiatY23Noczz9wISbyW8LO6oogUf6EvEP4iwedMxX23M_uEYdpNgxpIA-rIW9z1A-TLCSx0/s1600/Table+28.jpg" height="83" width="320" /></a>The average item difficulty (Table 27) varied, as expected,
with the average test score. The average item discrimination (Pearson r and PBR)
(Table 28) was stable. In general, with a few outliers in this small data set,
the most discriminating items had the same difficulty as the average test
score. [This tendency for item discrimination to peak at the
average test score is a basic component of the Rasch IRT model, which, because of
its design limits, must use the 50% point.]</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgJ5Q6TS0M7kXGpj3RPZgPxgvJ46dVjzadBxue51nALAsz5xHpOZIssXrqnpY0yyV-OTTCO49tC7dyet6fKg7kQdcS7MuBPEqabj1yPg9wnoa9dJmoZUs5z6HmAj3dqjN7aFBXsMsSSaAQ/s1600/Chart+71a.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgJ5Q6TS0M7kXGpj3RPZgPxgvJ46dVjzadBxue51nALAsz5xHpOZIssXrqnpY0yyV-OTTCO49tC7dyet6fKg7kQdcS7MuBPEqabj1yPg9wnoa9dJmoZUs5z6HmAj3dqjN7aFBXsMsSSaAQ/s1600/Chart+71a.jpg" height="192" width="320" /></a>Scatter chart, Chart 71, has sufficient detail to show that
items tend to be most discriminating when they have a difficulty near the
average test score (not just near 50%).</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The question is often asked, “Do tests have to be designed
for an average score of 50%?” If
the SD remains the same, I found no difference in the KR20 or SEM. [The
observed SD is ignored by the Rasch IRT model used by many states for test
analysis.]</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The maximum item discrimination value of 0.64 was always
associated with an item mark pattern in which all right marks and all wrong
marks were in two groups with no mixing of right and wrong marks. I loaded a
perfect Guttman mark pattern and found that 0.64 was the maximum corrected
value for this size of data set. (The corrected values are better estimates than
the uncorrected values in a small data set.)</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Items of equal difficulty can have very different
discrimination values. In Table 26, three items have a difficulty of 7 right
marks. Their corrected discrimination values were 0.34 and 0.58.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Psychometricians have solved the problem this creates in
estimating test reliability by deleting an item and recalculating the test
reliability to find the effect of any item in a test. The VESEngine (free download
below) includes this feature: Test Reliability (TR) toggle button. Test
reliability (KR20) and item discrimination (PBR) are interdependent on student
and item performance. A change in one usually results in a change in one or
more of the other factors. [Student ability and item difficulty are considered
independent using the Rasch model IRT analysis.] {I have yet to determine if
comparing CTT to IRT is a case of comparing apples to apples, apples to oranges
or apples to cider.}</div>
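<div class="MsoNormal">
The delete-one-item recalculation described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the VESEngine code: the function names <code>kr20</code> and <code>kr20_if_item_deleted</code> are hypothetical, the marks are made up (they are not Table 26), and the textbook KR20 form with population (n) variances is assumed.</div>

```python
from statistics import pvariance

def kr20(marks):
    """KR20 for a students x items matrix of 0/1 marks (population variances)."""
    n_students = len(marks)
    k = len(marks[0])                       # number of items
    scores = [sum(row) for row in marks]    # one score per student
    total_var = pvariance(scores)           # variance between student scores
    # Item error variance: sum of p*q over the item columns.
    error_var = 0.0
    for j in range(k):
        p = sum(row[j] for row in marks) / n_students
        error_var += p * (1 - p)
    return (k / (k - 1)) * (1 - error_var / total_var)

def kr20_if_item_deleted(marks):
    """Recompute KR20 with each item column removed in turn."""
    k = len(marks[0])
    return [kr20([row[:j] + row[j + 1:] for row in marks]) for j in range(k)]

# Hypothetical marks (NOT Table 26): 6 students x 4 items.
MARKS = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
]

print("KR20, all items:", round(kr20(MARKS), 3))
for j, value in enumerate(kr20_if_item_deleted(MARKS), start=1):
    print(f"KR20 without item {j}:", round(value, 3))
```

<div class="MsoNormal">
An item whose removal raises the KR20 is pulling the test reliability down; an item whose removal lowers it is contributing to the spread of scores.</div>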
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh0gqfG8dJOE6CC94Ll3wtUh_4KVpSthoIwbn1fCVNiFlVaEE7_p-YTX8rrkYLyO4Os9ET-2SrwI7UK28xaQql9aJiFtlv7r-6_Bq233b1BvFlMiQrw2619sHwX9ZJ_bZ9PKxCGcijkgBw/s1600/Chart+72a.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh0gqfG8dJOE6CC94Ll3wtUh_4KVpSthoIwbn1fCVNiFlVaEE7_p-YTX8rrkYLyO4Os9ET-2SrwI7UK28xaQql9aJiFtlv7r-6_Bq233b1BvFlMiQrw2619sHwX9ZJ_bZ9PKxCGcijkgBw/s1600/Chart+72a.jpg" height="240" width="320" /></a>Two additions to the model (Chart 72) are the two distributions of
the error MSS (black curve) and the portion of right and wrong marks (red
curve). Both have a maximum of 1/4 at the 50% point and a minimum of zero at
each end. Both are insensitive to the position of right marks in an item mark
pattern. The average score for right and for wrong marks is sensitive to the
mark pattern as the difference between these two values determines part of the
item discrimination value; PBR = (Proportion * Difference in Average Scores)/SD.</div>
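<div class="MsoNormal">
As a hedged illustration of how the difference in average scores drives the PBR, the sketch below assumes the standard textbook point-biserial form, in which the proportion term is the square root of p * q and the SD is the population (n) value. Whether that is exactly the "Proportion" term meant above is an assumption. The item column and scores are hypothetical, and the value computed is the uncorrected one (the item is still counted in the score).</div>

```python
from math import sqrt
from statistics import mean, pstdev

def point_biserial(item, scores):
    """PBR = (mean score of right - mean score of wrong) / SD * sqrt(p * q)."""
    right = [s for m, s in zip(item, scores) if m == 1]
    wrong = [s for m, s in zip(item, scores) if m == 0]
    p = len(right) / len(item)      # proportion of right marks
    q = 1 - p                       # proportion of wrong marks
    return (mean(right) - mean(wrong)) / pstdev(scores) * sqrt(p * q)

def pearson_r(x, y):
    """Plain Pearson correlation, used here only as a cross-check."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

# Hypothetical data: one item mark column and the matching student scores.
ITEM = [1, 1, 1, 0, 1, 0, 0, 0]
SCORES = [9, 8, 7, 7, 6, 5, 4, 3]

print(round(point_biserial(ITEM, SCORES), 3))
print(round(pearson_r(ITEM, SCORES), 3))   # same value, computed the long way
```

<div class="MsoNormal">
Moving a right mark from a high scoring student to a low scoring student leaves p * q unchanged but shrinks the difference in average scores, and with it the PBR.</div>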
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Traditional, classical test theory (CTT), test analysis can
use a range of average test scores. In this example there was no difference in
the analysis with average test scores of 5 right (45%) to 8 right (73%). </div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Rasch model item response theory (IRT) test analysis
transforms normal counts into logits that have only one reference point of 50%
(zero logit) when student ability and item difficulty are positioned on one
common scale. This point is then extended in either direction by values that
represent equal student ability and item difficulty (50% right) from zero
to 100% (-50% to +50%) using the Rasch model IRT. This scale ignores the
observed item discrimination.</div>
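<div class="MsoNormal">
The single 50% reference point can be seen in a minimal sketch, assuming the usual log-odds definition of a logit and the standard one-parameter Rasch form. The function names are illustrative only and are not taken from any particular IRT package.</div>

```python
from math import exp, log

def to_logit(proportion_right):
    """Log-odds of a proportion; 50% right maps to zero logits."""
    return log(proportion_right / (1 - proportion_right))

def rasch_probability(ability, difficulty):
    """Probability of a right mark when both values are on the logit scale."""
    return 1 / (1 + exp(-(ability - difficulty)))

# Average scores from this post: 5 right and 8 right out of 11 items.
print(round(to_logit(5 / 11), 2))   # below the zero-logit reference point
print(round(to_logit(8 / 11), 2))   # above the zero-logit reference point

# Wherever ability equals difficulty, the expected result is 50% right.
print(rasch_probability(1.2, 1.2))
```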
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
- - - - - - - - - - - - - - - - - - - - -</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The Best of the Blog - FREE</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The Visual Education Statistics Engine (VESEngine) presents
the common education statistics on one Excel traditional two-dimensional
spreadsheet. The post includes definitions. <a href="http://richard-hart.blogspot.com/2013/10/multiple-choice-test-analysis-summary.html">Download</a>
as .xlsm or .xls. <br />
<br /></div>
<div class="MsoNormal">
This blog started five years ago. It has meandered through
several views. The current project is <a href="http://richard-hart.blogspot.com/2014/02/test-scoring-mathematical-model.html">visualizing</a>
the VESEngine in three dimensions. The observed student mark patterns (on their
answer sheets) are on one level. The variation in the mark patterns (variance)
is on the second level.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Power Up Plus (PUP) is classroom friendly software used to
score and analyze what students guess (traditional multiple-choice) and what
they report as the basis for further learning and instruction (knowledge and
judgment scoring multiple-choice). This is a quick way to update your
multiple-choice to meet Common Core State Standards (promote understanding as
well as rote memory). Knowledge and judgment scoring originated as a classroom
project, starting in 1980, that converted passive pupils into self-correcting
highly successful achievers in two to nine months. Download as <a href="http://www.nine-patch.com/download/PUP522xlsm.zip">.xlsm</a> or <a href="http://www.nine-patch.com/download/PUP522xls.zip">.xls</a>.</div>
Richard Harthttp://www.blogger.com/profile/04962997526156185761noreply@blogger.com0tag:blogger.com,1999:blog-6676724996771468267.post-65676412435696214242014-05-07T03:00:00.000-07:002014-05-30T13:52:05.779-07:00Test Scoring Math Model - Precision<div class="MsoNormal">
#6</div>
<div class="MsoNormal">
The precision of the average test score can be obtained from
the math model in <b style="mso-bidi-font-weight: normal;">two ways</b>: directly
from the mean sum of squares (MSS) or variance, and traditionally, by way of
the test reliability (KR20).</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
I obtained the <b style="mso-bidi-font-weight: normal;">precision
of each individual student test score</b> from the math model by taking the
square root of the sum of squared deviations (SS) within each score mark
pattern (green, Table 25). The value is called the conditional standard error
of measurement (<b style="mso-bidi-font-weight: normal;">CSEM</b>) as it sums
deviations for one student score (one condition), not for the total test.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhe_-eLjw9S65eAdBIZtzKzEdFdybqoVbq5XguXVXS3sAsDM7Is5i8e3xM5HKNYn50q1xu7c78jWfGL26TMTy1XSVaYUdt4iVlakrIdIzdtMEV6vvlXyBxMSpNeDyicUncZVWojREZCu48/s1600/Chart70+copy.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhe_-eLjw9S65eAdBIZtzKzEdFdybqoVbq5XguXVXS3sAsDM7Is5i8e3xM5HKNYn50q1xu7c78jWfGL26TMTy1XSVaYUdt4iVlakrIdIzdtMEV6vvlXyBxMSpNeDyicUncZVWojREZCu48/s1600/Chart70+copy.jpg" height="165" width="320" /></a>I multiplied the mean sum of squares (MSS) by the number of
items averaged (21) to yield the SS (0.154 x 21 = 3.24 for a 17 right mark score)
(or I could have just added up the squared deviations). The SQRT(3.24) = 1.80 right
marks for the CSEM. Some 2/3 of the time a <b style="mso-bidi-font-weight: normal;">re-tested
score</b> of 17 right marks can be expected to fall between 15.20 and 18.80 (15
and 19) right marks (Chart 70).</div>
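<div class="MsoNormal">
The CSEM arithmetic can be checked with a short sketch. It assumes the MSS within a mark pattern is p * q computed with n, as in the variance post; the function name <code>csem</code> is hypothetical. Carrying the unrounded MSS (about 0.154) through gives an SS near 3.24 and the CSEM of 1.80 right marks.</div>

```python
from math import sqrt

def csem(right_marks, n_items):
    """Conditional SEM for one score: SQRT of the SS within its mark pattern."""
    p = right_marks / n_items       # proportion right
    mss = p * (1 - p)               # variance within the mark pattern (n)
    ss = mss * n_items              # sum of squared deviations
    return sqrt(ss)

score, items = 17, 21
error = csem(score, items)
print(round(error, 2))                                   # CSEM in right marks
print(round(score - error, 2), round(score + error, 2))  # about 2/3 of re-tests

# The CSEM is largest near a 50% score and shrinks toward either end.
for right in (3, 10, 11, 17, 20):
    print(right, round(csem(right, items), 2))
```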
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The test Standard Error of Measurement (<b style="mso-bidi-font-weight: normal;">SEM</b>) is then the average of the 22 individual <b style="mso-bidi-font-weight: normal;">CSEM</b> values (1.75 right marks or 8.31%).</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The <b>traditional
derivation</b> of the test SEM (the error in the average test score) combines
the test reliability (KR20) and the <b>SD</b>
(spread) of the average test score.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The SD is the SQRT of the MSS (4.08) between student
scores: 2.02 for (n), or the 2.07 used here for (n-1). The test reliability (0.29) is the ratio of the true variance (MSS,
1.12) to the total variance (MSS, 4.08) between student scores, after the KR20 adjustment (see previous
post).</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The <b style="mso-bidi-font-weight: normal;">expectation</b>
is that the greater the reliability of a test, the smaller the error in estimating
the average test score. An equation is now needed to transform variance values
on the top level of the math model to apply to the lower linear level.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
SEM = SQRT(1 – KR20) * SD = SQRT(1 – 0.29) * 2.07 = SQRT(0.71)
* 2.07 = 0.84 * 2.07 = 1.75 right marks.</div>
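<div class="MsoNormal">
The same equation as a small function, offered as a sketch (the name <code>sem</code> is hypothetical). With the rounded inputs quoted in this post the result is about 1.74; the 1.75 reported above presumably reflects unrounded spreadsheet values.</div>

```python
from math import sqrt

def sem(kr20, sd):
    """Test standard error of measurement from reliability and score SD."""
    return sqrt(1 - kr20) * sd

print(round(sem(0.29, 2.07), 2))

# Higher reliability with the same SD gives a smaller SEM.
for reliability in (0.29, 0.50, 0.80, 0.95):
    print(reliability, round(sem(reliability, 2.07), 2))
```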
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The <b style="mso-bidi-font-weight: normal;">operation</b>
“1 – KR20” yields 0.71, the error share of the total variance; its square root (0.84) extracts the portion of the SD that
represents the SEM. If the test reliability goes up, the error in estimating
the average test score (SEM) goes down.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Chart 70 shows the variance (MSS), the SS, and the CSEM
based on 21 items, for each student score. It also shows the distribution of
the <b>CSEM values that I averaged for the test SEM</b>.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The <b style="mso-bidi-font-weight: normal;">individual CSEM</b>
is highest (largest error, poorer precision) when the student score is 50%
(Charts 65 and 70). Higher student scores yield lower CSEM values (better
precision). This makes sense.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The <b style="mso-bidi-font-weight: normal;">test SEM</b> (the
average of the CSEM values) is related to the distribution of student test
scores (purple dash, Chart 70). Adding easy items (easy in the sense that the
students were well prepared) decreases error, improves precision, reduces the SEM.</div>
<div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<span style="color: #1f242e; font-family: Georgia; font-size: 13.0pt; mso-bidi-font-family: Georgia;"><br /></span></div>
<div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<span style="color: #1f242e; font-family: Georgia; font-size: 13.0pt; mso-bidi-font-family: Georgia;">- - - -
- - - - - - - - - - - - - - - - - <o:p></o:p></span><br />
<span style="color: #1f242e; font-family: Georgia; font-size: 13.0pt; mso-bidi-font-family: Georgia;"><br /></span>
<br />
<div class="MsoNormal">
The Best of the Blog - FREE</div>
<div class="MsoNormal">
</div>
<ul>
<li>The Visual Education Statistics Engine (VESEngine) presents
the common education statistics on one Excel traditional two-dimensional
spreadsheet. The post includes definitions. <a href="http://richard-hart.blogspot.com/2013/10/multiple-choice-test-analysis-summary.html">Download</a>
as .xlsm or .xls.</li>
</ul>
<ul>
<li>This blog started seven years ago. It has meandered through
several views. The current project is <a href="http://richard-hart.blogspot.com/2014/02/test-scoring-mathematical-model.html">visualizing</a>
the VESEngine in three dimensions. The observed student mark patterns (on their
answer sheets) are on one level. The variation in the mark patterns is on a
second level.</li>
</ul>
<div class="MsoNormal">
</div>
<ul>
<li>Power Up Plus (PUP) is classroom friendly software used to
score and analyze what students guess (traditional multiple-choice) and what
they report as the basis for further learning and instruction (knowledge and
judgment scoring multiple-choice). This is a quick way to update your
multiple-choice to meet Common Core State Standards (promote understanding as
well as rote memory). Knowledge and judgment scoring originated as a classroom
project, starting in 1980, that converted passive pupils into self-correcting
highly successful achievers in two to nine months. Download as <a href="http://www.nine-patch.com/download/PUP522xlsm.zip">.xlsm</a> or <a href="http://www.nine-patch.com/download/PUP522xls.zip">.xls</a>. <a href="http://www.nine-patch.com/qstart.htm"><span style="color: #0000fe; font-family: Georgia; font-size: 13.0pt; mso-bidi-font-family: Georgia; text-decoration: none; text-underline: none;">Quick Start</span></a></li>
</ul>
</div>
<br />
<!--EndFragment-->Richard Harthttp://www.blogger.com/profile/04962997526156185761noreply@blogger.com0tag:blogger.com,1999:blog-6676724996771468267.post-50483982365428779882014-04-23T03:00:00.001-07:002014-04-23T03:00:15.589-07:00Test Scoring Math Model - Reliability<div class="MsoNormal">
#5</div>
<div class="MsoNormal">
An estimate of the reliability or reproducibility of a test
can be extracted from the variation within the tabled right marks (Table 25).
The variance from within the item <b style="mso-bidi-font-weight: normal;">columns</b>
is related to the variance from within the student score <b style="mso-bidi-font-weight: normal;">column</b>. </div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj-j971s3U1o2oF8ePId0DlyBp9404Rq7RzkFxcSogM_vkQ6PhDFBWUgvQY0rseJZys3PChDfvlMb5VerKwW0oCUvzxqGZSrDi8U-zr9cOQXozuAUGwejzNV-_37VgHzaOh0EbpxnQ6k2A/s1600/Chart+68.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj-j971s3U1o2oF8ePId0DlyBp9404Rq7RzkFxcSogM_vkQ6PhDFBWUgvQY0rseJZys3PChDfvlMb5VerKwW0oCUvzxqGZSrDi8U-zr9cOQXozuAUGwejzNV-_37VgHzaOh0EbpxnQ6k2A/s1600/Chart+68.jpg" height="263" width="320" /></a>The error within items variance (2.96) and total variance
(MSS) between student scores (4.08) are both obtained from columns in Table 25b (blue, Chart 68). The <b style="mso-bidi-font-weight: normal;">true variance</b>
is then 4.08 – 2.96 = 1.12.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The <b style="mso-bidi-font-weight: normal;">ratio</b> of <b style="mso-bidi-font-weight: normal;">true variance</b> to the <b style="mso-bidi-font-weight: normal;">total variance</b> between scores
(1.12/4.08) becomes an indicator of test reliability (0.28). This makes sense.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
A test with <b style="mso-bidi-font-weight: normal;">perfect
reliability</b> (4.08/4.08 = 1.0) would have no variation, error variance = 0,
within the item columns in Table 25. A test with <b style="mso-bidi-font-weight: normal;">no reliability</b> (0.0/4.08) would show equal values (4.08) for within
item columns, and between test scores.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The <b style="mso-bidi-font-weight: normal;">KR20</b> formula
then adjusts the above value (0.28 x 21/20) to 0.29 [from a large population (n) to a
small sample value (n-1)]. The KR20 ratio has no unit labels (“var/var” = “”). All
of the above takes place on the upper (variance) level of the math model. </div>
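<div class="MsoNormal">
The steps from the two variances to the KR20 can be retraced with a short sketch using the rounded values quoted in this post. The function name is hypothetical; the 21/20 factor is the items/(items - 1) adjustment described above.</div>

```python
def kr20_from_variances(error_var, total_var, n_items):
    """KR20 from the item error variance and the variance between scores."""
    true_var = total_var - error_var        # 4.08 - 2.96 in this post
    ratio = true_var / total_var            # true variance / total variance
    return ratio * n_items / (n_items - 1)  # small sample adjustment

print(round(kr20_from_variances(2.96, 4.08, 21), 2))

# Limiting cases: no error variance, and error variance equal to the total.
print(kr20_from_variances(0.0, 4.08, 21) * 20 / 21)   # ratio of 1.0
print(kr20_from_variances(4.08, 4.08, 21))            # ratio of 0.0
```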
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEja5Tgy4tNwrqyGC6cJcVrshwoXgeYul_6XsujUqBdZPP-8U8LyNke_s447CZSiL3KyTAeXeBKYDWOUk9wCXus_bUvUb-CRZRrV0afXNpkdm6uDQzf0yNZL1M6hDjJyf2dYgCt2ojACunE/s1600/Chart+6869.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEja5Tgy4tNwrqyGC6cJcVrshwoXgeYul_6XsujUqBdZPP-8U8LyNke_s447CZSiL3KyTAeXeBKYDWOUk9wCXus_bUvUb-CRZRrV0afXNpkdm6uDQzf0yNZL1M6hDjJyf2dYgCt2ojACunE/s1600/Chart+6869.jpg" height="224" width="320" /></a>Doubling the number of students taking the test (Chart 69) has no effect on reliability. Doubling the number of items doubles the error variance but increases the total variance by the square. The test <b style="mso-bidi-font-weight: normal;">reliability
increases</b> from 0.29 to 0.64.</div>
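<div class="MsoNormal">
Both doubling effects can be reproduced on a toy data set. This sketch makes two loud assumptions: the marks are hypothetical (not Table 25), and "doubling the items" is modeled by duplicating every item column, so each student score exactly doubles. Under that assumption the error variance doubles while the variance between scores is multiplied by four. The reliability shown is the plain ratio, before the KR20 adjustment.</div>

```python
from statistics import pvariance

def error_and_total_variance(marks):
    """Return (sum of item p*q, variance between student scores), both with n."""
    n_students = len(marks)
    n_items = len(marks[0])
    error_var = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in marks) / n_students
        error_var += p * (1 - p)
    total_var = pvariance([sum(row) for row in marks])
    return error_var, total_var

def reliability(marks):
    """True variance / total variance, before the KR20 adjustment."""
    error_var, total_var = error_and_total_variance(marks)
    return (total_var - error_var) / total_var

# Hypothetical marks (not from Table 25): 6 students x 4 items.
MARKS = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
]
MORE_STUDENTS = MARKS + MARKS                # the same class entered twice
MORE_ITEMS = [row + row for row in MARKS]    # every item column duplicated

for label, data in (("original", MARKS),
                    ("students doubled", MORE_STUDENTS),
                    ("items doubled", MORE_ITEMS)):
    error_var, total_var = error_and_total_variance(data)
    print(label, round(error_var, 3), round(total_var, 3),
          round(reliability(data), 3))
```

<div class="MsoNormal">
Applying the same 2x and 4x multipliers to the rounded values in this post gives (4 x 4.08 - 2 x 2.96)/(4 x 4.08), or about 0.64.</div>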
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The square root
of the total variance between scores (4.08) yields the <b style="mso-bidi-font-weight: normal;">standard deviation (SD)</b> for the score distribution [2.02 for (n)
and 2.07 for (n-1)] on the lower floor of the math model. </div>
<div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<span style="color: #1f242e; font-family: Georgia; font-size: 13.0pt; mso-bidi-font-family: Georgia;"><br /></span></div>
<div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<span style="color: #1f242e; font-family: Georgia; font-size: 13.0pt; mso-bidi-font-family: Georgia;">- - - -
- - - - - - - - - - - - - - - - - <o:p></o:p></span><br />
<br />
<div class="MsoNormal" style="margin-bottom: 0in;">
<div class="MsoNormal">
The Best of the Blog - FREE</div>
<div class="MsoNormal">
</div>
<ul>
<li>The Visual Education Statistics Engine (VESEngine) presents the common education statistics on one Excel traditional two-dimensional spreadsheet. The post includes definitions. <a href="http://richard-hart.blogspot.com/2013/10/multiple-choice-test-analysis-summary.html">Download</a> as .xlsm or .xls.</li>
</ul>
<div class="MsoNormal">
</div>
<ul>
<li>This blog started seven years ago. It has meandered through several views. The current project is <a href="http://richard-hart.blogspot.com/2014/02/test-scoring-mathematical-model.html">visualizing</a> the VESEngine in three dimensions. The observed student mark patterns (on their answer sheets) are on one level. The variation in the mark patterns is on a second level.</li>
</ul>
<div class="MsoNormal">
</div>
<ul>
<li>Power Up Plus (PUP) is classroom friendly software used to score and analyze what students guess (traditional multiple-choice) and what they report as the basis for further learning and instruction (knowledge and judgment scoring multiple-choice). This is a quick way to update your multiple-choice to meet Common Core State Standards (promote understanding as well as rote memory). Knowledge and judgment scoring originated as a classroom project, starting in 1980, that converted passive pupils into self-correcting highly successful achievers in two to nine months. Download as <a href="http://www.nine-patch.com/download/PUP522xlsm.zip">.xlsm</a> or <a href="http://www.nine-patch.com/download/PUP522xls.zip">.xls</a>. <a href="http://www.nine-patch.com/qstart.htm"><span style="color: #0000fe; font-family: Georgia; font-size: 13pt; text-decoration: none;">Quick Start</span></a></li>
</ul>
</div>
</div>
<div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<br /></div>
Richard Harthttp://www.blogger.com/profile/04962997526156185761noreply@blogger.com0tag:blogger.com,1999:blog-6676724996771468267.post-25067251396946376722014-03-05T03:00:00.000-08:002014-03-05T03:00:09.228-08:00Test Scoring Math Model - Variance<div class="MsoNormal">
#4</div>
<div class="MsoNormal">
The first thing I noticed when inspecting the top of the test
scoring math model (Table 25) was that the variation within the <b style="mso-bidi-font-weight: normal;">central cell field</b> has a different
reference point (external to the data) than the variation between scores in the
<b style="mso-bidi-font-weight: normal;">marginal cell column</b> (internal to
the data). Also the variation within the central cell field (the variance) is
harvested in two ways: within rows (scores) and within columns (items).</div>
<div class="MsoNormal">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiy8MmbNzjX9dl0774_qwmMDMdWT6XMDMyr6qvwEFlzTYZWPOiH639kfSpsc_NXZ4P9XPWQQREhnAnUyWAEI-WlnlNEaltNG5KgTMANtH11nxJU0JuMEdsA4ysJdKq2K9fZ5LSbenspU4Y/s1600/Chart+64.jpg" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiy8MmbNzjX9dl0774_qwmMDMdWT6XMDMyr6qvwEFlzTYZWPOiH639kfSpsc_NXZ4P9XPWQQREhnAnUyWAEI-WlnlNEaltNG5KgTMANtH11nxJU0JuMEdsA4ysJdKq2K9fZ5LSbenspU4Y/s1600/Chart+64.jpg" height="96" width="200" /></a></div>
<div class="MsoNormal">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEioTSuNXnyU0R032g0Z3DCsWbmBHSbCCrK9aH_kb7U23tD18xl3dmeOd9mF2FK7Z6q0TWbcBrLzkFGH_-ry3QHpN-9N0BZ3KbVqLpeTkBN1ZjG7TKh1LBCRcL5uOpjSGd0RogyC314btK8/s1600/Chart+65.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEioTSuNXnyU0R032g0Z3DCsWbmBHSbCCrK9aH_kb7U23tD18xl3dmeOd9mF2FK7Z6q0TWbcBrLzkFGH_-ry3QHpN-9N0BZ3KbVqLpeTkBN1ZjG7TKh1LBCRcL5uOpjSGd0RogyC314btK8/s1600/Chart+65.jpg" height="92" width="200" /></a>The mean sum of squared deviations (MSS) or variance <b style="mso-bidi-font-weight: normal;">within</b> a column or a row has a fixed
range (Chart 64 and Chart 65). The maximum occurs when the marks are 1/2 right
and 1/2 wrong (1/2 x 1/2 = 1/4 or 25%). [Variance also equals p * q, or (Right *
Wrong)/(Right + Wrong)^2.] The contribution each mark makes to the variance is distributed
along this gentle curve. The variable <b style="mso-bidi-font-weight: normal;">data
are fit to a rigid model</b>.</div>
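<div class="MsoNormal">
The fixed range of the variance within a row or column can be checked with a minimal Python sketch. The function name and the 20-mark column are assumptions made for illustration; they are not taken from the Nursing124 data.</div>

```python
def mark_variance(right: int, wrong: int) -> float:
    """Variance (MSS) of a 0-wrong / 1-right mark pattern, computed as p * q."""
    total = right + wrong
    p = right / total   # proportion of right marks
    q = wrong / total   # proportion of wrong marks
    return p * q


# Sweep a hypothetical 20-mark column from all wrong to all right.
for right in range(0, 21, 5):
    print(right, round(mark_variance(right, 20 - right), 4))
```

<div class="MsoNormal">
The value rises from zero, peaks at 0.25 when half the marks are right, and falls back to zero, which is the gentle curve in Chart 64 and Chart 65.</div>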
<div class="MsoNormal">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgOnHHGBHiocA7IC3ObnaWpXMbgkCM9DKP07nrPLSw_exOWjs1Sg4LK0DA3TCQOfjTiK-mrxYRuttPajnFi0KH5aeAHiLYlTWIKeXo1T-_XhyfqGAO4rNHiXR6C3Y2lbAmw-SLIgqnOAXQ/s1600/Photo+64-65b.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgOnHHGBHiocA7IC3ObnaWpXMbgkCM9DKP07nrPLSw_exOWjs1Sg4LK0DA3TCQOfjTiK-mrxYRuttPajnFi0KH5aeAHiLYlTWIKeXo1T-_XhyfqGAO4rNHiXR6C3Y2lbAmw-SLIgqnOAXQ/s1600/Photo+64-65b.jpg" height="226" width="320" /></a></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
I obtained the overall shape of these two variances by
folding Chart 64 and Chart 65 into Photo 64-65. <span style="mso-spacerun: yes;"> </span>The result is a dome or a depression above or below the upper
floor of the model.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The peak of the dome (maximum variance) is reached when a
student functioning at 50% marks an item with 50% difficulty. <b style="mso-bidi-font-weight: normal;">Standardized test makers</b> try to
maximize this feature of the model. The larger the mismatch between item
difficulty and student ability, the lower down the position of the variance on
the dome. <b style="mso-bidi-font-weight: normal;">CAT</b> attempts to adjust
item difficulty to match student preparedness. </div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjVKJeQdN2UTkmoyU3JxzmVGyCBFMFXudPOI15jkTVrGLUNpOz8LxoIIw-K4AUC3ACw4daX79I5uXPuBCKO48WKmZaPwPUjTpHgZ0gIHELuTsWx81EOyrFPfiUnhQ6QYybxi-y98Mq-QVM/s1600/Chart+66.jpg" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjVKJeQdN2UTkmoyU3JxzmVGyCBFMFXudPOI15jkTVrGLUNpOz8LxoIIw-K4AUC3ACw4daX79I5uXPuBCKO48WKmZaPwPUjTpHgZ0gIHELuTsWx81EOyrFPfiUnhQ6QYybxi-y98Mq-QVM/s1600/Chart+66.jpg" height="200" width="198" /></a>Chart 66 is a direct overhead view of the dome. Elevation
lines have been added at 5% intervals from zero to 25%. I then <b style="mso-bidi-font-weight: normal;">fitted the data</b> from Nursing124 <b style="mso-bidi-font-weight: normal;">to the roof of the model</b>. The data only
spread over one quadrant of the model. The data could completely cover the dome
in an ideal situation in which every combination of score and difficulty
occurred. </div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The total test variance within items is then the sum of the
variance within all items (0.04 to 0.25 = 2.96). The total test variance within
scores is the sum of the variance of all scores (0.05 to 0.24 = 3.33). See
Table 8.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhrdPzMucBBMg4YAc_zW7mZibE51D45OYTrrYvHvKsxCJ6Fk_GjNX4Q81GqhJfNrjsvQRII8Ff2mXyfCNv7wAeWadUrDm8tX4Bif3OqbyeNW6wbEDr6OKYsJBVlpR-XTPd9TgOBcKsA610/s1600/Chart67a.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhrdPzMucBBMg4YAc_zW7mZibE51D45OYTrrYvHvKsxCJ6Fk_GjNX4Q81GqhJfNrjsvQRII8Ff2mXyfCNv7wAeWadUrDm8tX4Bif3OqbyeNW6wbEDr6OKYsJBVlpR-XTPd9TgOBcKsA610/s1600/Chart67a.jpg" height="200" width="157" /></a>The math <b style="mso-bidi-font-weight: normal;">model
adjusts to fit the data</b> in the marginal cell student score column (variance
between scores). The reference point is not a static feature of the model but
the average test score (16.77 or 80%). The plot of the variance between scores
can be attached to the right side of the math model (Chart 67).</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The variance <b style="mso-bidi-font-weight: normal;">within</b>
columns and rows spreads across the static frame of the model. The model then
adjusts to fit the variance <b style="mso-bidi-font-weight: normal;">between</b>
scores (rows), matching the spread of the active variance <b style="mso-bidi-font-weight: normal;">within</b> rows.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
I can see another interpretation of the model variance if
the dome is inverted as a depression. Read as a flight instrument on a blimp, with pitch,
roll, and yaw standing for the three variances (within item, 2.96; within score, 3.31; and
between scores, 4.10), the blimp would have the nose up, be rolled to the side, and have
the rudder hard over.</div>
<div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<span style="color: #1f242e; font-family: Georgia; font-size: 13.0pt; mso-bidi-font-family: Georgia;"><br /></span></div>
<div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<span style="color: #1f242e; font-family: Georgia; font-size: 13.0pt; mso-bidi-font-family: Georgia;">- - - -
- - - - - - - - - - - - - - - - - <o:p></o:p></span></div>
<div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<br /></div>
<div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<span style="color: #1f242e; font-family: Georgia; font-size: 13.0pt; mso-bidi-font-family: Georgia;">Free
software to help you and your students experience and understand how to break
out of traditional-multiple choice (TMC) and into </span><a href="http://www.nine-patch.com/"><span style="color: #362919; font-family: Georgia; font-size: 13.0pt; mso-bidi-font-family: Georgia;">Knowledge and Judgment
Scoring</span></a><span style="color: #1f242e; font-family: Georgia; font-size: 13.0pt; mso-bidi-font-family: Georgia;"> (KJS) (tricycle to bicycle):<o:p></o:p></span></div>
<div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<br /></div>
<div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<span style="color: #1f242e; font-family: Georgia; font-size: 13.0pt; mso-bidi-font-family: Georgia;"><o:p><br /></o:p></span></div>
<div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; mso-pagination: none; tab-stops: 11.0pt .5in; text-autospace: none;">
</div>
<ul>
<li><a href="http://www.nine-patch.com/complete/BreakOut.htm"><span style="color: #0000fe; font-family: Georgia; font-size: 13.0pt; mso-bidi-font-family: Georgia; text-decoration: none; text-underline: none;">Break Out Overview</span></a></li>
<li><a href="http://www.nine-patch.com/download/BrkOutSC.zip"><span style="color: #0000fe; font-family: Georgia; font-size: 13.0pt; mso-bidi-font-family: Georgia; text-decoration: none; text-underline: none;">Download Break Out</span></a></li>
<li><span style="color: #0000fe; font-family: Georgia; font-size: 13.0pt; mso-bidi-font-family: Georgia; text-decoration: none; text-underline: none;"><a href="http://www.nine-patch.com/qstart.htm">Quick Start</a></span></li>
</ul>
Richard Harthttp://www.blogger.com/profile/04962997526156185761noreply@blogger.com0tag:blogger.com,1999:blog-6676724996771468267.post-62939920098644153462014-02-19T03:00:00.000-08:002014-02-19T03:00:09.157-08:00Test Scoring Math Model - Input<div class="MsoNormal">
The <b>mathematical
model</b> (Table 25) in the previous post relates all the parts of a
traditional item analysis including the observed score distribution, test
reproducibility, and the precision of a score. Factors that influence test
scores can be detected and measured by the variation between and within
selected columns and rows.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The model <b style="mso-bidi-font-weight: normal;">is only
aware of variation</b> within and between mark patterns (deviations from the
mean). The variance (the sum of squared deviations from the mean divided by the
number summed, also called the mean sum of squares or MSS) is the property of the data
that relates the mark patterns to the normal distribution. This permits
generating useful descriptive and predictive insights.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The deviation of each mark from the mean is obtained by
subtracting the mean from the value of the mark (Table 25a). The <b style="mso-bidi-font-weight: normal;">squared deviation</b> value is then elevated
to the upper floor of the model (Step 1, Table 25b). [Un-squared deviations
from the mean would add up to zero.]</div>
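<div class="MsoNormal">
A short Python sketch shows why the deviations are squared before they are summed. The eight-mark pattern is hypothetical and only illustrates the arithmetic.</div>

```python
# Hypothetical 0-wrong / 1-right mark pattern for one student (not from Table 25).
marks = [1, 1, 0, 1, 0, 1, 1, 1]

mean = sum(marks) / len(marks)
deviations = [m - mean for m in marks]   # lower floor: mark minus mean
squared = [d ** 2 for d in deviations]   # upper floor: squared deviations

print(sum(deviations))               # un-squared deviations cancel out
print(sum(squared) / len(squared))   # the variance (MSS) of the pattern
```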
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
[IF YOU ARE ONLY USING MULTIPLE-CHOICE TO RANK STUDENTS, YOU
MAY WANT TO SKIP THE FOLLOWING DISCUSSION ON THE MEANING OF TEST SCORES WHEN
USED TO GUIDE INSTRUCTION AND STUDENT DEVELOPMENT.]</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The model’s operation gains meaning by relating the score
and item mark distributions to a normal distribution. It compares observed data
to what is expected from chance alone or as I like to call it, the <b style="mso-bidi-font-weight: normal;">know-nothing mean</b>.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi7_4uizy9l5ZE4v7pe9zYBZqval21onBpR_B_ANyK2UJj34AhrMZnIN-23eo1Lzsr-rcUYYaDzdfNvlCWmEBpqE_4GesXcTZ5oy5VL20SRzM66B2Gzi39lx9Y2boXlkngS10bHPRh1VPA/s1600/Chart62.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi7_4uizy9l5ZE4v7pe9zYBZqval21onBpR_B_ANyK2UJj34AhrMZnIN-23eo1Lzsr-rcUYYaDzdfNvlCWmEBpqE_4GesXcTZ5oy5VL20SRzM66B2Gzi39lx9Y2boXlkngS10bHPRh1VPA/s1600/Chart62.jpg" height="142" width="320" /></a>The expected know-nothing mean based on 0-wrong and 1-right
with 4-option items (popular on standardized tests) is centered on 25%, 6 right
out of 24 questions (Chart 62). This is from luck on test day alone (students
only need to mark each item; they do not need to read the test) on a
traditional multiple-choice test (TMC). The mean moves to 50% if student
ability and item difficulty have equal value. It moves to 80% if students are functioning
near the mastery level as seen in the Nursing124 data. <b style="mso-bidi-font-weight: normal;">The math model will adjust to fit these data.</b></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The know-nothing mean, with Knowledge and Judgment Scoring
(KJS) and the partial credit Rasch model (PCRM), is at 50% for a high quality
student or 25% for a low quality student (same as TMC). Scoring is 0-wrong,
1-have yet to learn, and 2-right.<span style="mso-spacerun: yes;">
</span>A high quality student accurately, honestly, and fairly reports what is
trusted to be useful in further instruction and learning. There are few, if any,
wrong marks. A low quality student performs the same on both methods of scoring
by marking an answer on all items. <b style="mso-bidi-font-weight: normal;">Students
adjust the test to fit their preparation</b>.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The know-nothing mean for <b style="mso-bidi-font-weight: normal;">Knowledge Factor</b> (KF) is above 75% (near the mastery level in the
Nursing124 data, violet). KF weights knowledge and judgment as 1:3, rather than
1:1 (KJS) or 1:0 (TMC). High-risk examinees do not guess. Test takers are given
the same opportunity as teachers and test makers to produce accurate, honest,
and fair test scores.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg7Ara78n7J1_I5_qHtwZgleUd3M8QHHe1Yn2hXDmHkjFHUAtB_UWtO5LLQe_prBSMsXrftDuLNq-G9_pudyCzs4MSFWQeZH7-e18tEyCgTuotVLfQJfWOizoZ1d41oXJyiHRBudQr4OiQ/s1600/Chart63.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg7Ara78n7J1_I5_qHtwZgleUd3M8QHHe1Yn2hXDmHkjFHUAtB_UWtO5LLQe_prBSMsXrftDuLNq-G9_pudyCzs4MSFWQeZH7-e18tEyCgTuotVLfQJfWOizoZ1d41oXJyiHRBudQr4OiQ/s1600/Chart63.jpg" height="141" width="320" /></a>The <b style="mso-bidi-font-weight: normal;">distribution of
scores</b> about the know-nothing mean is the same for TMC (green, Chart 63) and
KJS (red, Chart 63). An unprepared student can expect, on average, a score of
25% on a TMC test with 4-option items. Some 2/3 of the time the score will fall
within +/- 1 standard deviation of 25%. As a rule of thumb, the standard
deviation (SD) on a classroom test tends to be about 10%. The best an
unprepared student can hope for is a score over 35% (25 + 10) about 1/6 of the
time ((1 - 2/3)/2).</div>
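<div class="MsoNormal">
The know-nothing mean and its spread can be sketched in Python. This assumes each mark is an independent blind guess (a binomial model), which is my simplifying assumption rather than something stated in the charts; the function name is also hypothetical.</div>

```python
import math


def know_nothing(n_items: int, n_options: int) -> tuple[float, float]:
    """Mean and SD, in marks, of a score made entirely of blind guesses."""
    p = 1 / n_options                        # chance of a lucky right mark
    mean = n_items * p
    sd = math.sqrt(n_items * p * (1 - p))    # binomial standard deviation
    return mean, sd


mean, sd = know_nothing(n_items=24, n_options=4)
print(mean, round(sd, 2))                             # in marks
print(100 * mean / 24, round(100 * sd / 24, 1))       # as percent of the test
```

<div class="MsoNormal">
Under this assumption the 24-question test has a know-nothing mean of 6 marks (25%) and an SD of a little over 2 marks, just under 9%, which is in line with the 10% rule of thumb.</div>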
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The know-nothing mean (50%) for KJS and the PCRM is very
different from TMC (25%) for low quality students. The observed <b style="mso-bidi-font-weight: normal;">operational mean </b>at the mastery level
(above 80%, violet) is nearly the same for high quality students electing
either method of scoring. High quality students have the option of selecting
items they can trust they can answer correctly. There are few to no wrong
marks. [Totally unprepared high quality students could elect to not mark any item
for a score of 50%.]</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The <b style="mso-bidi-font-weight: normal;">mark patterns</b>
on the lower floor of the mathematical model have different meanings based on
the scoring method. TMC delivers a score that only ranks the student’s
performance on the test. KJS and the PCRM deliver an assessment of what a
student knows or can do that can be trusted as the basis for further learning
and instruction. Quantity (number right) and quality (portion marked that are
right) are not linked. Any score below 50% indicates the student has not
developed a sense of judgment needed to learn and report at higher levels of
thinking.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The score and item mark patterns are fed into the upper
floor of the mathematical model as the <b style="mso-bidi-font-weight: normal;">squared
deviation from the mean</b> (d^2). [A positive deviation of 3 and a negative
deviation of 3 both yield a squared deviation of 9.] The next step is to make
sense of (to visualize, to relate) the distributions of the variance (MSS) from
columns and rows.</div>
<div class="MsoNormal">
<br /></div>
Richard Harthttp://www.blogger.com/profile/04962997526156185761noreply@blogger.com0tag:blogger.com,1999:blog-6676724996771468267.post-37482848354737377232014-02-05T03:00:00.000-08:002014-02-05T03:00:01.494-08:00Test Scoring Mathematical Model<div class="MsoNormal">
The seven statistics reviewed in previous posts need to be
related to the underlying mathematics. Traditional multiple-choice (TMC) data
analysis has been expressed entirely with charts and the Excel spreadsheet VESEngine.
I will need a TMC math model to compare TMC with the Rasch model IRT that is
the dominant method of data analysis for standardized tests.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
A mathematical model contains the relationships and
variables listed in the charts and tables. This post applies the advice in
learning discussed in the previous post. It starts with the observed variables.
The mathematical model then summarizes the relationships in the seven
statistics.</div>
<div class="MsoNormal">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhoDj-ckMbVaQyCqbGMN1f93fm6-Uc5UgXzo-t35Y46suYu19KldAx1TGadSz_45jJwm-i3kgtaZjjMCAc09XDyAj01O7h-XJ2sFASP6zbBbLHGDRlVo9-GeuqJ7-XiufxbTaL-4tBp4F8/s1600/TopAndBottom3a.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhoDj-ckMbVaQyCqbGMN1f93fm6-Uc5UgXzo-t35Y46suYu19KldAx1TGadSz_45jJwm-i3kgtaZjjMCAc09XDyAj01O7h-XJ2sFASP6zbBbLHGDRlVo9-GeuqJ7-XiufxbTaL-4tBp4F8/s1600/TopAndBottom3a.jpg" height="308" width="400" /></a></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The model contains two levels (Table 25). The first floor
level contains the observed mark patterns. The second floor level contains the
squared deviations from the score and item means, that is, the variation in the mark
patterns. The squared values are then averaged to produce the variance.
[Variance = Mean sum of squares = MSS]</div>
<div class="MsoNormal">
<br /></div>
<div align="center" class="MsoNormal" style="text-align: center;">
1. Count</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The right marks are counted for each student and each item
(question). TMC: 0-wrong, 1-right captures quantity only. Knowledge and
Judgment Scoring (KJS) and the partial credit Rasch model (PCRM) capture
quantity and quality: 0-wrong, 1-have yet to learn this, 2-right.</div>
<div class="MsoNormal">
Hall JR Count = SUM(right marks) = 20 <span style="mso-spacerun: yes;"> </span></div>
<div class="MsoNormal">
Item 12 Count = SUM(right marks) = 21 <span style="mso-spacerun: yes;"> </span></div>
<div class="MsoNormal">
<span style="mso-spacerun: yes;"><br /></span></div>
<div align="center" class="MsoNormal" style="text-align: center;">
2. Mean (Average)</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The sum is divided by the number of counts (N = 22 students and
n = 21 items).</div>
<div class="MsoNormal">
The SUM of scores / N = 16.77; 16.77/n = 0.80 = 80%</div>
<div class="MsoNormal">
The SUM of items / n = 17.57; 17.57/N = 0.80 = 80%</div>
<div class="MsoNormal">
<br /></div>
<div align="center" class="MsoNormal" style="text-align: center;">
3. Variance</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The variation within any column or row is harvested as the deviation
between the marks in a student (row) or item (column) mark pattern, or between student
scores, with respect to the mean value. The squared deviations are summed and
averaged as the variance on the top level of the mathematical model (Table 25).</div>
<div class="MsoNormal">
Variance = SUM(Deviations^2)/(N or n) = SUM of Squares/(N or
n) = Mean SS = MSS</div>
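<div class="MsoNormal">
The first three statistics can be sketched together in Python. The small mark matrix below is hypothetical (3 students by 4 items), not the Nursing124 data, and the helper name <b>mss</b> is my own.</div>

```python
# Hypothetical mark matrix: rows are students, columns are items (0-wrong, 1-right).
marks = [
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [1, 1, 0, 0],
]


def mss(values):
    """Mean sum of squared deviations from the mean (divide by the count)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


scores = [sum(row) for row in marks]                # 1. Count per student
item_counts = [sum(col) for col in zip(*marks)]     # 1. Count per item
mean_score = sum(scores) / len(scores)              # 2. Mean score in marks

within_scores = [mss(row) for row in marks]         # 3. Variance within each row
within_items = [mss(col) for col in zip(*marks)]    # 3. Variance within each column
between_scores = mss(scores)                        # 3. Variance between scores

print(scores, item_counts, round(mean_score, 2))
print(sum(within_items), sum(within_scores), between_scores)
```

<div class="MsoNormal">
The two sums on the last line correspond to the total variance within items and within scores; the last value corresponds to the variance between scores in the marginal column.</div>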
<div class="MsoNormal">
<br /></div>
<div align="center" class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; mso-pagination: none; text-align: center; text-autospace: none;">
4. Standard Deviation</div>
<div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<br /></div>
<div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
The standard deviation expresses the variation within a
score, item, or probability distribution as a normal value: the mean +/- 1
standard deviation (1SD) includes about 2/3 of a normal, bell-shaped
distribution.</div>
<div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<br /></div>
<div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
SD = Square Root of
Variance or MSS = SQRT(MSS) = SQRT(4.08) = 2.02 </div>
<div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<br /></div>
<div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
For small classroom tests
the (N-1) SD = SQRT(4.28) = 2.07 marks</div>
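<div class="MsoNormal">
A minimal Python sketch of both standard deviations, starting from the rounded score variance of 4.08 and N = 22 students given above. Because it starts from the rounded 4.08, the N-1 variance it prints can differ in the last digit from the 4.28 shown above.</div>

```python
import math

score_mss = 4.08    # variance between scores (divided by N), from the post
n_students = 22     # N, from the post

sd_population = math.sqrt(score_mss)

# Rescale the MSS from dividing by N to dividing by N - 1.
variance_sample = score_mss * n_students / (n_students - 1)
sd_sample = math.sqrt(variance_sample)

print(round(sd_population, 2), round(variance_sample, 2), round(sd_sample, 2))
```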
<div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<br /></div>
<div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
The variation in student
scores and the distribution of student scores are now expressed on the same
normal scale.</div>
<div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<br /></div>
<div align="center" class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; mso-pagination: none; text-align: center; text-autospace: none;">
5. Test Reliability</div>
<div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<br /></div>
<div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
The ratio of the true
variance to the score variance estimates the test reliability: the Kuder-Richardson
20 (KR20). The score (marginal column) variance – the error (summed from within
Item columns) variance = the true variance.</div>
<div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<br /></div>
<div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
KR 20 = ((score variance –
error variance)/score variance) x (n/(n – 1))</div>
<div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
KR 20 = ((4.08 –
2.96)/4.08) x 21/20 = 0.29</div>
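<div class="MsoNormal">
The same calculation as a minimal Python sketch. The function and parameter names are my own; the item-count correction is written as n/(n - 1), which is what the 21/20 in the worked line implies.</div>

```python
def kr20(score_variance: float, error_variance: float, n_items: int) -> float:
    """Kuder-Richardson 20 from the score variance and the summed item variance."""
    true_variance = score_variance - error_variance
    return (true_variance / score_variance) * (n_items / (n_items - 1))


reliability = kr20(score_variance=4.08, error_variance=2.96, n_items=21)
print(round(reliability, 2))  # 0.29
```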
<div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<br /></div>
<div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
This ratio is returned to
the first floor of the model. An acceptable classroom test has a KR20 > 0.7.
An acceptable standardized test has a KR20 >0.9. </div>
<div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<br /></div>
<div align="center" class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; mso-pagination: none; text-align: center; text-autospace: none;">
6. Traditional Standard Error of Measurement</div>
<div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<br /></div>
<div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
The range of error in
which 2/3 of the time your retest score may fall is the standard error of
measurement (SEM). The traditional SEM is based on the average performance of
your class: 16.77 +/- 1SD (+/- 2.07 marks). </div>
<div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<br /></div>
<div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
SEM = SQRT(1-KR20) * SD =
SQRT(1- 0.29) * 2.07 = +/-1.75 marks</div>
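<div class="MsoNormal">
A minimal Python sketch of the SEM, with a hypothetical function name. Fed the rounded inputs printed above (0.29 and 2.07), it returns roughly 1.74, a hair under the 1.75 quoted; my assumption is that the spreadsheet carries more digits than are shown here.</div>

```python
import math


def sem(kr20: float, sd: float) -> float:
    """Traditional standard error of measurement, in marks."""
    return math.sqrt(1 - kr20) * sd


print(round(sem(kr20=0.29, sd=2.07), 2))
print(sem(kr20=1.0, sd=2.07))   # a totally reliable test has no measurement error
```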
<div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<br /></div>
<div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
On a test that is totally
reliable (KR20 = 1), the SEM is zero. You can expect to get the same score on a
retest.</div>
<div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<br /></div>
<div align="center" class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; mso-pagination: none; text-align: center; text-autospace: none;">
7. Conditional Standard Error of Measurement</div>
<div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<br /></div>
<div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
The range of error in which
2/3 of the time your retest score may fall based on the rank of your test score
alone (conditional on one score rank) is the conditional standard error of
measurement (CSEM). The estimate is based (conditional) on your test score
rather than on the average class test score.</div>
<div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<br /></div>
<div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
CSEM = SQRT((Variance
within your Score) * n number of questions) = SQRT(MSS * n) = SQRT(SS)</div>
<div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
CSEM = SQRT(0.15 * 21) =
SQRT(3.15) = 1.80 marks</div>
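<div class="MsoNormal">
A minimal Python sketch of the CSEM for a single score. The score of 17 right out of 21 is my assumption, chosen because its within-score variance rounds to 0.15; the post does not say which score the worked line uses. With the unrounded variance the result is 1.80 marks, while the rounded 0.15 would give about 1.77.</div>

```python
import math


def csem(right: int, n_items: int) -> float:
    """Conditional SEM in marks: SQRT(variance within the score * n) = SQRT(SS)."""
    wrong = n_items - right
    variance_within = (right / n_items) * (wrong / n_items)   # p * q
    return math.sqrt(variance_within * n_items)


print(round(csem(right=17, n_items=21), 2))   # assumed score of 17 out of 21
print(csem(right=21, n_items=21))             # a perfect score has no variation within it
```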
<div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<br /></div>
<div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
The average of the CSEM values (1.75)
for all of your class (light green) also yields the test SEM. This confirms the
above calculation for 6. Traditional Standard Error of Measurement for the test.</div>
<div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<br /></div>
<div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
This mathematical model (Table
25) separates the flat display in the <a href="http://www.nine-patch.com/download/VESEngine.xlsm">VESEngine</a> into two
distinct levels. The lower floor is on a normal scale. The upper floor isolates
the variation within the marking patterns on the lower floor. The resulting
variance provides insight into the extent that the marking patterns could have
occurred by luck on test day and into the performance of teachers, students,
questions, and the test makers. Limited predictions can also be made.</div>
<div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<br /></div>
<div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
Predictions are limited
using traditional multiple-choice (TMC) as students have only two options:
0-wrong and 1-right. Quantity and quality are linked into a single ranking. <a href="http://www.nine-patch.com/">Knowledge and Judgment Scoring</a> (KJS) and
the <a href="http://www.winsteps.com/winsteps.htm">partial credit Rasch model</a>
(PCRM) separate quantity and quality: 0-wrong, 1-have yet to learn, and
2-right. Students are free to report what they know and can do accurately,
honestly, and fairly.</div>
<div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<br /></div>
<div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<span style="color: #1f242e; font-family: Georgia; font-size: 13.0pt; mso-bidi-font-family: Georgia;">- - - -
- - - - - - - - - - - - - - - - - <o:p></o:p></span></div>
<div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<br /></div>
<div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<span style="color: #1f242e; font-family: Georgia; font-size: 13.0pt; mso-bidi-font-family: Georgia;">Free
software to help you and your students experience and understand how to break
out of traditional multiple-choice (TMC) and into </span><a href="http://www.nine-patch.com/"><span style="color: #362919; font-family: Georgia; font-size: 13.0pt; mso-bidi-font-family: Georgia;">Knowledge and Judgment
Scoring</span></a><span style="color: #1f242e; font-family: Georgia; font-size: 13.0pt; mso-bidi-font-family: Georgia;"> (KJS) (tricycle to bicycle):<o:p></o:p></span></div>
<div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<br /></div>
<div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<br /></div>
<div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; mso-pagination: none; tab-stops: 11.0pt .5in; text-autospace: none;">
</div>
<ul>
<li><a href="http://www.nine-patch.com/complete/BreakOut.htm"><span style="color: #0000fe; font-family: Georgia; font-size: 13.0pt; mso-bidi-font-family: Georgia; text-decoration: none; text-underline: none;">Break Out Overview</span></a></li>
<li><a href="http://www.nine-patch.com/download/BrkOutSC.zip"><span style="color: #0000fe; font-family: Georgia; font-size: 13.0pt; mso-bidi-font-family: Georgia; text-decoration: none; text-underline: none;">Download Break Out</span></a></li>
<li><a href="http://www.nine-patch.com/qstart.htm"><span style="color: #0000fe; font-family: Georgia; font-size: 13.0pt; mso-bidi-font-family: Georgia; text-decoration: none; text-underline: none;">Quick Start</span></a></li>
</ul>
<br />
<!--EndFragment-->Richard Harthttp://www.blogger.com/profile/04962997526156185761noreply@blogger.com0tag:blogger.com,1999:blog-6676724996771468267.post-35218659500591592342014-01-29T03:00:00.000-08:002014-01-29T03:00:04.693-08:00Test Scoring Myths for Students<div class="MsoNormal">
The <b><span style="font-family: "Arial Bold";">best test </span></b>is a test that permits
you to accurately, honestly, and fairly report what you know and can do. You
know how to question, to get answers, and to verify. You know what you know and
what you have yet to learn. This operates at two levels of thinking. It is a
myth that a forced choice multiple-choice test measures what you trust you know
and can do.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
At the <b style="mso-bidi-font-weight: normal;">beginning of
any learning operation</b>, you learn to repeat and to recall. Next you learn
to relate the bits you can repeat and recall. By the end of a learning
operation you have assembled a web of skills and relationships. You start at
lower levels of thinking and progress to higher levels of thinking. Practice
takes you from slow conscious operations to fast automatic responses
(multiplication or roller skating). It is a myth that learning occurs
only by responding to a teacher in a classroom.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Your <b style="mso-bidi-font-weight: normal;">attitude</b>
during learning and testing is important. Your maturity is indicated by your
ability to get interested in new topics or activities your teacher recommends
(during the course). As a rule of thumb, a positive attitude is worth about one
letter grade on a test. It is a myth that you can easily learn when you have a
negative attitude.</div>
<div class="MsoNormal">
<br />
<b>Your expectations are important. </b>You tend to get what you expect. A nine-year study with over 3,000 students indicated that students tend to get the grade they expected at the time they enrolled in the class, based on their lack of information, misinformation, and attitude. It is a myth that you cannot do better than your preconceived grade.<br />
<br />
<b>Learning and testing</b>
are one coordinated event when you can see the result of your practicing
directly (target practice or skateboarding). This situation also occurs when
you are directly tutored by a person or by a person’s software. It is a myth
that you must always take a test separately from learning.</div>
<div class="MsoNormal">
<b style="mso-bidi-font-weight: normal;"><br /></b></div>
<div class="MsoNormal">
<b style="mso-bidi-font-weight: normal;">Complex learning</b>
operations go through the same sequence of learning steps. The rule of three
applies here. Read or practice from one source to get the basic terms or
actions. Read or practice from a second set to add any additional terms or
actions. Read or practice from a third set to test your understanding, your web
of knowledge and skill relationships. It is a myth that you must always have
another person test your learning (but another person can be very helpful).</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
That other person is usually <b style="mso-bidi-font-weight: normal;">a teacher</b> who cannot teach and test each pupil or student
individually. The teacher also selects what is to be learned rather than
letting you make the choice. The teacher also selects the test you will take.
It is a myth that your teachers have the qualities needed to introduce you to
the range of skills and knowledge required for an honest, self-supporting
citizen. </div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Teaching usually takes place during <b style="mso-bidi-font-weight: normal;">scheduled time periods</b>. In extreme situations, only what is learned
in those scheduled time periods will be scored. This is one basis for assessing
teacher effectiveness. It is a myth that the primary goal of traditional schools
is student learning and development.</div>
<div class="MsoNormal">
<b style="mso-bidi-font-weight: normal;"><br /></b></div>
<div class="MsoNormal">
<b style="mso-bidi-font-weight: normal;">Traditional
multiple-choice is defective</b>. It was crippled when it was adapted from its use in animal
experiments: the option of no response, “do not know”, was eliminated to make
classroom scoring easier. It is a myth that you should not
have this option to permit accurate, honest, and fair assessment.</div>
<div class="MsoNormal">
Traditional multiple-choice <b style="mso-bidi-font-weight: normal;">promotes selecting the best right answer</b>: using the lowest levels
of thinking. The minimum requirement is making a mark for each question. It is
a myth that such a score measures what you know or can do. The score ranks you
on the test.</div>
<div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<br /></div>
<div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
The <b style="mso-bidi-font-weight: normal;">average test score</b> describes the test, not you. (<a href="http://richard-hart.blogspot.com/2013/05/visual-education-statistics-visual.html">Table
15</a> or <a href="http://www.nine-patch.com/download/VESEngine.xlsm"><span style="font-family: Georgia; font-size: 13.0pt; mso-bidi-font-family: Georgia;">Download</span></a><u style="text-underline: #362919;"><span style="color: #362919; font-family: Georgia; font-size: 13.0pt; mso-bidi-font-family: Georgia;">)<o:p></o:p></span></u></div>
<div class="MsoNormal">
Your score may rank you above or below average. It is a myth
that you will always be safe with an above average score (passing).</div>
<div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<br /></div>
<div class="MsoNormal">
The normal distribution of multiple-choice test scores is
based on <b style="mso-bidi-font-weight: normal;">your luck on test day</b>. The
normal distribution is desired for classes in schools designed for failure. It
is a myth that a class should not have an average score of 90%.</div>
<div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<br /></div>
<div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
Luck on test day will
distribute 2/3 of your classmates’ multiple-choice scores within the bubble in
the center of a normal distribution; that is <b style="mso-bidi-font-weight: normal;">one standard deviation</b> (SD) from the average. (<a href="http://richard-hart.blogspot.com/2013/05/visual-education-statistics-visual.html">Table
15</a> or <a href="http://www.ninepatch.com/download/VESEngine.xlsm"><span style="font-family: Georgia; font-size: 13.0pt; mso-bidi-font-family: Georgia;">Download</span></a><u style="text-underline: #362919;"><span style="color: #362919; font-family: Georgia; font-size: 13.0pt; mso-bidi-font-family: Georgia;">)</span></u> [SD = SQRT(Variance)
and the Variance = SUM((Deviation from the Average)^2)/N = Mean Sum of Squares =
MSS]</div>
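<div class="MsoNormal">
The bracketed formulas can be checked with a short Python sketch. The class scores below are made up for the example, and the helper names are assumptions; the arithmetic follows the formulas as stated (dividing by N, not N - 1).</div>

```python
import math


def variance(scores):
    """Mean sum of squares: SUM((score - average)^2) / N."""
    average = sum(scores) / len(scores)
    return sum((s - average) ** 2 for s in scores) / len(scores)


def standard_deviation(scores):
    """SD = SQRT(Variance)."""
    return math.sqrt(variance(scores))


def share_within_one_sd(scores):
    """Fraction of scores that fall within one SD of the average."""
    average = sum(scores) / len(scores)
    sd = standard_deviation(scores)
    inside = [s for s in scores if abs(s - average) <= sd]
    return len(inside) / len(scores)


# Hypothetical scores for a class of eight on a 10-point quiz.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

print(variance(scores))             # 4.0
print(standard_deviation(scores))   # 2.0
print(share_within_one_sd(scores))  # 0.75
```

<div class="MsoNormal">
In this small made-up class, six of the eight scores fall inside the one-SD bubble. With a larger class and a roughly normal distribution the share approaches the 2/3 mentioned above.</div>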
<div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<br /></div>
<div class="MsoNormal">
<b style="mso-bidi-font-weight: normal;">Your grade</b> (cut
score) is set by marking off the distribution of classmate scores in standard
deviations: F (&lt;-2); D (-2 to -1); C (-1 to +1); B (&gt;+1); A (&gt;+2). Your raw
score grade is the sum of what you know and can do, your luck on test day, and
your set of classmates.</div>
<div class="MsoNormal">
<b style="mso-bidi-font-weight: normal;"><br /></b></div>
<div class="MsoNormal">
<b style="mso-bidi-font-weight: normal;">Raw scores can be adjusted</b>
by shifting their distribution, higher or lower, and by stretching (or
shrinking) the distribution to get a distribution that “looks right”. It is a
myth that selecting the right mix of questions is your teacher's only way to get a
raw score distribution that “looks right”.</div>
<div class="MsoNormal">
<b style="mso-bidi-font-weight: normal;"><br /></b></div>
<div class="MsoNormal">
<b style="mso-bidi-font-weight: normal;">Some questions
perform poorly</b>. They can be deleted and a new, more accurate, scored
distribution created. It is a myth that every question must be retained.</div>
<div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<b style="mso-bidi-font-weight: normal;"><br /></b></div>
<div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<b style="mso-bidi-font-weight: normal;">Discriminating questions</b> are marked right only by high scoring
classmates and marked wrong by low scoring classmates. (<a href="http://richard-hart.blogspot.com/2013/05/visual-education-statistics-visual.html">Table
15</a> or <a href="http://www.nine-patch.com/download/VESEngine.xlsm"><span style="font-family: Georgia; font-size: 13.0pt; mso-bidi-font-family: Georgia;">Download</span></a><u style="text-underline: #362919;"><span style="color: #362919; font-family: Georgia; font-size: 13.0pt; mso-bidi-font-family: Georgia;">)</span></u> It is a myth that all
questions should be discriminating.</div>
<div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<br /></div>
<div class="MsoNormal">
Discriminating questions<b style="mso-bidi-font-weight: normal;">
produce your class raw score distribution</b>. About 5 to 10 are needed to
create the amount of error that yields a range of five letter grades. It is a
myth that discriminating questions assess mastery.</div>
<div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<br /></div>
<div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
The reliability
(reproducibility, precision) of your raw score can be predicted, but not your
final (adjusted) score. <b style="mso-bidi-font-weight: normal;">Test reliability</b>
(KR20) is based on the ratio of variation (the variance) from between student
scores (<b style="mso-bidi-font-weight: normal;">external column</b>) and within
question difficulty mark patterns (<b style="mso-bidi-font-weight: normal;">internal
columns</b>). (<a href="http://richard-hart.blogspot.com/2013/05/visual-education-statistics-visual.html">Table
15</a> or <a href="http://www.nine-patch.com/download/VESEngine.xlsm"><span style="font-family: Georgia; font-size: 13.0pt; mso-bidi-font-family: Georgia;">Download</span></a><u style="text-underline: #362919;"><span style="color: #362919; font-family: Georgia; font-size: 13.0pt; mso-bidi-font-family: Georgia;">)<o:p></o:p></span></u></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
This makes sense: The smaller the amount of error variance
within the question difficulty internal columns, with respect to the variance between
student scores in the external column, <b style="mso-bidi-font-weight: normal;">the
greater the test reliability</b>. Discriminating, difficult, questions spread
out student scores more (yield higher variance) than they increase the error
variance within the questions. If there were no error variance, a test would be
totally reliable (KR20 = 1).<span style="mso-spacerun: yes;"> </span>It is
a myth that a good informative test must maximize reliability.</div>
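<div class="MsoNormal">
The ratio described above can be sketched in Python using the standard KR20 formula: KR20 = (k/(k-1)) X (1 - SUM(p X q)/Variance of student scores), where k is the number of questions, p is the fraction of students marking a question right, and q = 1 - p. The function name and the small mark matrix are assumptions for illustration; the variance divides by N.</div>

```python
def kr20(mark_matrix):
    """KR20 for a matrix of 0/1 marks: one row per student, one column per question."""
    n_students = len(mark_matrix)
    n_questions = len(mark_matrix[0])

    # Variance between student scores (the external column).
    totals = [sum(row) for row in mark_matrix]
    mean_total = sum(totals) / n_students
    score_variance = sum((t - mean_total) ** 2 for t in totals) / n_students
    if score_variance == 0:
        raise ValueError("All students have the same score; KR20 is undefined.")

    # Error variance within the question columns (the internal columns).
    item_variance = 0.0
    for q in range(n_questions):
        p = sum(row[q] for row in mark_matrix) / n_students
        item_variance += p * (1 - p)

    return (n_questions / (n_questions - 1)) * (1 - item_variance / score_variance)


# Hypothetical marks: five students, four questions, ordered easy to hard.
marks = [
    [0, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 1, 1, 1],
]

print(round(kr20(marks), 3))  # 0.8
```

<div class="MsoNormal">
Here the student scores spread out (variance 2.0) much more than the marks vary within the questions (summed variance 0.8), so the reliability estimate is fairly high even for a four-question test.</div>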
<div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<br /></div>
<div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
The test reliability can
help predict the average test score your class would get if it were to take
another test over the same set of skills and knowledge. The <b style="mso-bidi-font-weight: normal;">Standard Error of Measurement (SEM)</b> of
your test is the range of error (from all of the above effects) for the average
test score. (<a href="http://richard-hart.blogspot.com/2013/05/visual-education-statistics-visual.html">Table
15</a> or <a href="http://www.nine-patch.com/download/VESEngine.xlsm"><span style="font-family: Georgia; font-size: 13.0pt; mso-bidi-font-family: Georgia;">Download</span></a><u style="text-underline: #362919;"><span style="color: #362919; font-family: Georgia; font-size: 13.0pt; mso-bidi-font-family: Georgia;">) </span></u>The SD of the test and
the test reliability are combined to obtain the SEM. The test reliability
extracts a portion of the SD. If the test reliability were 1 (totally
reliable), the SEM would be 0 (no error), the class would be expected to get
the same class test score on a retest.</div>
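<div class="MsoNormal">
The usual way the SD and the test reliability are combined is SEM = SD X SQRT(1 - reliability). A minimal Python sketch, with made-up values for the SD and the reliability and an assumed function name:</div>

```python
import math


def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability), with reliability between 0 and 1."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


# Hypothetical test: SD of 10 points.
print(standard_error_of_measurement(10.0, 0.75))  # 5.0
print(standard_error_of_measurement(10.0, 1.0))   # 0.0 (totally reliable, no error)
```

<div class="MsoNormal">
The second call shows the limiting case described above: a reliability of 1 leaves no portion of the SD as error.</div>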
<div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<br /></div>
<div class="MsoNormal">
And finally, what can you expect about the precision of your
score and your retest score (provided you have not learned any more)? A <b style="mso-bidi-font-weight: normal;">retest is of critical importance</b> to
students needing to reach a high stakes cut score. If the SEM or CSEM ranges
widely enough, you do not need to study. Just retake the test a couple of times
and your luck on test day may get you a passing score. It is a myth that a
2/3 probability of getting a passing grade will ensure you
get the passing grade if you need a second trial.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The <b style="mso-bidi-font-weight: normal;">Conditional</b> [on
your raw score] <b style="mso-bidi-font-weight: normal;">Standard Error of
Measurement (CSEM) </b>extracts the variance from only your mark pattern <a href="http://richard-hart.blogspot.com/2013/10/visual-education-statistics-conditional.html">(Table 22)</a>. [CSEM = SQRT(Variance within your marks X the number of questions)]
Your CSEM will be very small if you have a very high or low score. This limits the prospects of a passing score by retaking a test without studying.</div>
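<div class="MsoNormal">
A small Python sketch of the bracketed CSEM formula. It assumes right/wrong marks coded 1 and 0, so the variance within your marks is p X (1 - p), where p is the fraction you marked right, and it divides by N; Table 22 may use a slightly different divisor. The function name is made up for the example.</div>

```python
import math


def conditional_sem(right_marks, n_questions):
    """CSEM = sqrt(variance within the mark pattern * number of questions).

    Assumes 0/1 marks, so the within-pattern variance is p * (1 - p).
    """
    p = right_marks / n_questions
    variance_within = p * (1.0 - p)
    return math.sqrt(variance_within * n_questions)


# Hypothetical raw scores on a 100-question test.
for right in (50, 75, 90, 100):
    print(right, round(conditional_sem(right, 100), 2))
```

<div class="MsoNormal">
Under these assumptions the CSEM is largest for a score of 50 (5 points) and shrinks toward 0 as the score approaches 0 or 100.</div>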
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<b style="mso-bidi-font-weight: normal;">Now to study</b>, to
change testing habits, or to trust to luck on test day, before a retest. Get a
copy of the blueprint used in designing the test. A blueprint lists in detail what
will be covered and the type of questions. Question each topic or skill. It is
easier to answer questions other people have written if you have already
created and answered your own questions. Use the advice in the first five
paragraphs above and work up into higher levels of thinking and meaning making (build a
web of relationships that makes sense to you; visualize, sketch, or draw every
term). </div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
A <b style="mso-bidi-font-weight: normal;">change in testing
habits</b> may also be in order. Many students who do not “test well” are
bright, fast memorizers, but lacking in meaningful relationships that make
sense to themselves. They are still learning for someone else (the test) and
scanning each question for the “one right answer”. With meaningful
relationships in mind you have the information in hand to answer a number of
related questions. You are not limited to just matching what you recall to the
question answers. [Mark out wrong answers and guess from the remaining answers.]</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
And now for <b style="mso-bidi-font-weight: normal;">the “Hail
Mary” approach</b>. First, as a rule of thumb, your score on a test written by
someone other than your teacher (a standardized test for example) will be one
to two letter grades below your classroom test scores. If your failing test score
is within 1 SEM of the cut score, you can expect a retest score within this
range 2/3 of the time. The same prediction is made with your CSEM value that
can range above and below the SEM value. If your failing test score is below 1
SEM or 1 CSEM from the cut score, you have no option other than to study. It is
a myth that students passing a few points above the cut score will also pass on
a retest. [Near passes are safe. Near failures are not.]</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Also please keep in mind that all of the math dealing with the
variation between and within columns and rows (the variance) can be done on the
student and question mark patterns with <b style="mso-bidi-font-weight: normal;">no
knowledge of the test questions or the students</b>. It is a myth that good
statistical procedures can improve poor question or student performance.
Teacher and psychometrician judgment on the other hand can do wonders!</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The <b style="mso-bidi-font-weight: normal;">standardized test
paradox</b>: A good blueprint to guide calibrated question selection for the
test is the basis for low scores and a statistically reliable test. Good student
preparation is the basis for high scores (mastery) and a statistically
unreliable test (it cannot spread student scores out enough for the
distribution to “look right”).</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The sciences, engineering, and manufacturing use statistics
to reduce error to a minimum (low maintenance cars, aircraft, computers, and
telephones). Only in traditional institutionalized education (schools designed
for failure) is error intentionally introduced to create a score range that
“looks right” for setting grades and ranking schools. This is all nonsense for
schools designed for mastery (who advance students after they are prepared for
the next steps). It is a myth (and an entrenched excuse for failure by the
school) that student score distributions <b style="mso-bidi-font-weight: normal;">must
fit a normal, bell-shaped, curve of error</b>.</div>
<div class="MsoNormal">
<b style="mso-bidi-font-weight: normal;"><br /></b></div>
<div class="MsoNormal">
<b style="mso-bidi-font-weight: normal;">Mastery schools</b>
are now being promoted as the burden of record keeping is easily computerized.
The Internet makes mastery schools available everywhere and at any time. This
will bring marked change to traditional schooling in the next few years. This
change can be seen in the “flipped” classroom (a modern version of assigned
[deep] reading before class discussion). It is a myth that the “flipped”
classroom is something new.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Current educational software removes <b style="mso-bidi-font-weight: normal;">the time lag</b>, in the question-answer-and-verify learning cycle,
introduced by grouping students in classes, and then extended with standardized
tests. Learning and assessment are again joined to promote mastery of assigned
skills and knowledge.<span style="mso-spacerun: yes;"> </span>Students
advance when they are ready to succeed at the next levels. It is a myth that
“formative assessments” are actually functional when test results are not
available in an operational time frame (seconds to a few days).</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Standardized tests will continue to rank students and
schools, as the tests mature to <b style="mso-bidi-font-weight: normal;">certifying
mastery</b> for students who learn and excel anywhere and at any time. It is a
myth that current substantive standardized tests (that do not let students
report what they trust they know or can do) can “pin point exactly what a student
knows and needs to learn”.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<span style="color: #1f242e; font-family: Georgia; font-size: 13.0pt; mso-bidi-font-family: Georgia;">- - - -
- - - - - - - - - - - - - - - - - <o:p></o:p></span></div>
<div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<br /></div>
<div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<span style="color: #1f242e; font-family: Georgia; font-size: 13.0pt; mso-bidi-font-family: Georgia;">Free
software to help you and your students experience and understand how to break
out of traditional multiple-choice (TMC) and into <a href="http://www.nine-patch.com/"><span style="color: #362919;">Knowledge and
Judgment Scoring</span></a> (KJS) (tricycle to bicycle):<o:p></o:p></span></div>
<div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<br /></div>
<div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<br /></div>
<div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; mso-pagination: none; tab-stops: 11.0pt .5in; text-autospace: none;">
</div>
<ul>
<li><a href="http://www.nine-patch.com/complete/BreakOut.htm" style="font-family: Georgia; font-size: 13pt;"><span style="color: #0000fe; text-decoration: none; text-underline: none;">Break Out
Overview</span></a></li>
<li><a href="http://www.nine-patch.com/download/BrkOutSC.zip" style="font-family: Georgia; font-size: 13pt;"><span style="color: #0000fe; text-decoration: none; text-underline: none;">Download Break
Out</span></a></li>
<li><a href="http://www.nine-patch.com/qstart.htm" style="font-family: Georgia; font-size: 13pt;"><span style="color: #0000fe; text-decoration: none; text-underline: none;">Quick Start</span></a></li>
</ul>
<br />
<!--EndFragment-->Richard Harthttp://www.blogger.com/profile/04962997526156185761noreply@blogger.com1tag:blogger.com,1999:blog-6676724996771468267.post-44518452791917584832013-11-06T04:00:00.000-08:002013-11-06T04:00:10.611-08:00The Value and Meaning of a Mark
<br />
<div class="MsoNormal">
The <b><span style="font-family: "Arial Bold";">bet</span></b> in the title of Catherine
Gewertz’s article caught my attention: “<a href="http://www.edweek.org/ew/articles/2013/09/11/03common_ep.h33.html">One District’s Common-Core Bet</a>: Results
Are In”.
As I read, I realized that the betting that takes place in traditional
multiple-choice (TMC) was being given arbitrary valuations to justify the
difference between a test score and a classroom observation. If the two agreed,
that was good. If they did not agree, the standardized test score was
dismissed.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
TMC gives us the choice of a right mark and several wrong
marks. Each is traditionally given a value of 1 or 0. This simplification,
carried forward from paper and pencil days, hides the true value and the
meanings that can be assigned to each mark.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The <b><span style="font-family: "Arial Bold";">value and meaning of each mark changes</span></b>
with the degree of completion of the test and the ability of the student.
Consider a test with one right answer and three wrong answers. This is now a
popular number for standardized tests.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Consider a TMC test of 100 questions. The starting score is
25, on average. Every student knows this. Just mark an answer to each question.
Look at the test and change a few marks, that you can trust you know, to right.
With good luck on test day, get a score high enough to pass the test.</div>
<div class="MsoNormal">
If a student marked 60 correctly, the final score is 60. But
the quality of this passing score is also 60%. </div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Part of that 60% represents what
a student knows and can do, and part is luck on test day. A passing score can
be obtained by a student who knows or can do less than half of what the test is
assessing; a quality below 50%. This is traditionally acceptable in the
classroom. [TMC ignores quality. A right mark on a test with a score of 100 has
the same value, but not the same meaning as a right mark on a test with a score
of 50.]</div>
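<div class="MsoNormal">
The arithmetic behind the starting score of 25 and a passing score of 60 can be sketched in Python. The sketch assumes a student knows some answers for certain and guesses blindly, with equal chances, among four options on the rest; the function names are made up for the example.</div>

```python
def expected_score(known, n_questions=100, n_options=4):
    """Average score when `known` answers are certain and the rest are blind guesses."""
    guessed = n_questions - known
    return known + guessed / n_options


def knowledge_needed(target, n_questions=100, n_options=4):
    """Number of known answers that yields `target` as the average score."""
    chance_score = n_questions / n_options
    return (target - chance_score) / (1 - 1 / n_options)


print(expected_score(0))                # 25.0, the starting score
print(round(knowledge_needed(60), 1))   # 46.7 known answers average out to 60
```

<div class="MsoNormal">
Under these assumptions a student needs to know fewer than 47 of the 100 answers to average a score of 60, which is the sense in which a passing score can come from knowing less than half of what the test is assessing.</div>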
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
A wrong mark can also be assigned different meanings. As a
rule of thumb (based on the analysis of variance, ANOVA; a time-honored method
of data reduction), if fewer than five students mark a wrong answer to a
question, the marks on the question can be ignored. If fewer than five students
make the same wrong mark, the marks on that option can be ignored. This is why
Power Up Plus (PUP) does not report statistics on wrong marks, but only on
right marks. There is no need to clutter up the reports with potentially
interesting, but useless and meaningless information.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
PUP does include a fitness statistic not found in any other
item analysis report that I have examined. This statistic shows how well the
test fits student preparation. Students prepare for tests; but test makers also
prepare for the abilities of test takers.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The fitness statistic estimates the score a student is
expected to get if, on average, as many wrong options are eliminated as are
non-functional on the test, before guessing; with NO KNOWLEDGE of the right
answer. This is the best guess score. It is always higher than the design score
of 25. The estimate ranged from 36% to 53%, with a mean of 44%, on the
Nursing124 data. Half of these
students were self-correcting scholars. The test was then a checklist of how
they were expected to perform.</div>
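<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
One possible reading of the best guess score is sketched below. This is not PUP's actual calculation, which is not given here; it assumes that a student with no knowledge of the right answer discards the non-functional wrong options on each question and then guesses among what is left. The function name and the sample counts are hypothetical.</div>

```python
def best_guess_score(functional_wrong_options: list[int]) -> float:
    """Expected percent score from blind guessing among the options left
    after the non-functional wrong options are discarded on each question.

    functional_wrong_options[i] is the number of wrong options on question i
    that still attract marks (hypothetical input; 0 to 3 on a 4-option test).
    """
    chances = [1 / (1 + wrong) for wrong in functional_wrong_options]
    return 100 * sum(chances) / len(chances)


# Design case: all three wrong options work on every question.
print(best_guess_score([3, 3, 3, 3]))   # 25.0

# Hypothetical test where some wrong options attract no marks;
# the result is higher than the design score of 25.
print(best_guess_score([3, 2, 1, 1]))
```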
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
With the above in mind, we can understand how a single wrong
mark can be devastating to a test score. But a single wrong mark, not shared by
the rest of the class, can be taken seriously or ignored (just as can a right mark,
on a difficult question, by a low-scoring student).</div>
<div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; mso-pagination: none; tab-stops: 11.0pt .5in; text-autospace: none;">
<br /></div>
<div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; mso-pagination: none; tab-stops: 11.0pt .5in; text-autospace: none;">
To
make sense of TMC test results requires both a <b><span style="font-family: "Arial Bold";">matrix of student marks</span></b>
and a <b><span style="font-family: "Arial Bold";">distribution
of marks for each question</span></b> (<a href="http://www.nine-patch.com/complete/BreakOut.htm"><span style="color: #0000f5; mso-bidi-font-family: Times;">Break Out Overview</span></a>). Evaluating only an
individual student report gives you no idea whether a student missed a survey
question that every student was expected to answer correctly or a question that
the class failed to understand. </div>
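<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
A minimal sketch of these two views, using made-up data: a matrix of student marks (one row per student, one column per question) and the distribution of marks on each question. It also applies the rule of thumb mentioned earlier, keeping only wrong options marked by at least five students. The answer key, the marks, and the function names are all hypothetical and are not taken from PUP or Break Out.</div>

```python
from collections import Counter

# Hypothetical data: one row per student, one column per question.
# "" would mean the question was left unmarked.
KEY = ["B", "D", "A"]
MARKS = [
    ["B", "D", "C"],
    ["B", "A", "C"],
    ["B", "D", "C"],
    ["B", "D", "C"],
    ["A", "D", "C"],
    ["B", "D", "C"],
    ["B", "D", "A"],
]
MIN_WRONG_MARKS = 5  # rule of thumb from the text


def mark_distribution(marks, question):
    """Count how many students chose each option on one question."""
    return Counter(row[question] for row in marks if row[question])


def wrong_options_worth_a_look(marks, key, minimum=MIN_WRONG_MARKS):
    """Return {question index: [wrong options marked by at least `minimum` students]}."""
    flagged = {}
    for q, right in enumerate(key):
        counts = mark_distribution(marks, q)
        shared = sorted(opt for opt, n in counts.items() if opt != right and n >= minimum)
        if shared:
            flagged[q] = shared
    return flagged


print(wrong_options_worth_a_look(MARKS, KEY))   # {2: ['C']}
```

<div class="MsoNormal">
In this made-up class, the lone wrong marks on the first two questions drop out, while the third question (index 2) shows a wrong option shared by six of seven students: a question the class failed to understand, not one student's slip.</div>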
<div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; mso-pagination: none; tab-stops: 11.0pt .5in; text-autospace: none;">
<br /></div>
<div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; mso-pagination: none; tab-stops: 11.0pt .5in; text-autospace: none;">
Are
we dealing with a misconception? Or a lack of performance related to different
levels of thinking in class and on the test; or related to the limits of rote
memory to match an answer option to a question? [“It’s the test-taking.”] When does
a right mark also mean a right answer or just luck on test day? [“This guy
scored advanced only because he had a lucky day.”]</div>
<div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; mso-pagination: none; tab-stops: 11.0pt .5in; text-autospace: none;">
<br /></div>
<div class="MsoNormal">
Mikel Robinson, as an <b><span style="font-family: "Arial Bold";">individual</span></b>, failed the test by 1
point. Mikel Robinson, as <b><span style="font-family: "Arial Bold";">one student in a group</span></b> of students,
may not have failed. [We don’t really know.] His score just fell on the low
side of a statistical range (the conditional standard error of measurement; see
a previous post on CSEM). Within this range, it is not possible to
differentiate one student’s performance from another’s using current
statistical methods and a TMC test design (students are not asked if they can
use the question to report what they can trust they actually know or can do). </div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
We can say that, if he retook the test, his probability of
passing might be as high as 50%, or more, depending upon the reliability and
other characteristics of the test. [And those who passed by 1 point would have
the same probability of then failing by 1 point on a repeat of the test.]</div>
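<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
A rough illustration of the retake claim, not the CSEM calculation itself: the sketch below assumes a simple binomial model in which each of 100 questions is answered correctly, independently, with a fixed probability. The cut score of 60 and the long-run average of 59 are hypothetical numbers chosen only to represent "failed by 1 point".</div>

```python
from math import comb


def chance_of_passing_retake(true_percent: float, cut: int, questions: int = 100) -> float:
    """P(right marks >= cut) when each question is answered correctly
    with probability true_percent / 100, independently (binomial model)."""
    p = true_percent / 100
    return sum(
        comb(questions, k) * p**k * (1 - p) ** (questions - k)
        for k in range(cut, questions + 1)
    )


# Hypothetical numbers: a cut score of 60 and a student whose
# long-run average is exactly the 59 he scored.
print(round(chance_of_passing_retake(59, 60), 2))
```

<div class="MsoNormal">
Under this simplified model the chance of passing on a retake comes out a little under one half. A real test, with its own reliability and item characteristics, could give a different figure.</div>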
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
These problems are minimized with accurate, honest, and fair
Knowledge and Judgment Scoring (KJS). You can know when a right mark is a right
answer using <a href="http://www.nine-patch.com/">KJS</a> or the <a href="http://www.winsteps.com/winsteps.htm">partial credit Rasch model </a>IRT scoring. You can know
the extent of a student’s development: the quality score. And, perhaps more
important, your students can trust what they know and can do too,
during the test as well as after the test. This is the foundation on which to
build further long-lasting learning. This is student empowerment.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<b><span style="font-family: "Arial Bold";">Welcome to the KJS Group</span></b>: Please <b><span style="font-family: "Arial Bold";">register</span></b>
at <a href="mailto:KJSgroup@nine-patch.com">KJSgroup@nine-patch.com</a>.
Include something about yourself and your interest in student empowerment (your
name, school, classroom environment, LinkedIn, Facebook, email, phone,
etc.). </div>
<div class="MsoNormal">
<b><span style="font-family: "Arial Bold";"><br /></span></b></div>
<div class="MsoNormal">
<b><span style="font-family: "Arial Bold";">Free anonymous download</span></b>, Power Up
Plus (PUP), version 5.22 containing both TMC and KJS: <a href="http://www.nine-patch.com/download/PUP522xlsm.zip" title="For Newer Windows Machines">PUP522xlsm.zip</a>, 606 KB or <a href="http://www.nine-patch.com/download/PUP522xls.zip" title="For All (older) Windows Machines">PUP522xls.zip</a>, 1,099 KB.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
- - - - - - - - - - - - - - - - - - - - - <o:p></o:p></div>
<div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<br /></div>
<div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
Other free software to help you and your students experience and
understand how to break out of traditional multiple-choice (TMC) and into <a href="http://www.nine-patch.com/">Knowledge
and Judgment Scoring</a> (KJS) (tricycle
to bicycle):<o:p></o:p></div>
<div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<br /></div>
<div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; mso-pagination: none; tab-stops: 11.0pt .5in; text-autospace: none;">
</div>
<ul>
<li><a href="http://www.nine-patch.com/complete/BreakOut.htm"><span style="color: #0000f5; mso-bidi-font-family: Times;">Break Out Overview</span></a></li>
<li><a href="http://www.nine-patch.com/download/BrkOutSC.zip"><span style="color: #0000f5; mso-bidi-font-family: Times;">Download Break Out</span></a></li>
<li><span style="color: #0000f5; mso-bidi-font-family: Times;"><a href="http://www.nine-patch.com/qstart.htm">Quick Start</a></span></li>
</ul>
<o:p></o:p>FOR SALE: <a href="http://raschmodelaudit.blogspot.com/2013/10/knowledge-and-judgment-scoring-kjs-for.html">raschmodelaudit.blogspot.com/2013/10/knowledge-and-judgment-scoring-kjs-for.html</a><br />
<div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; mso-pagination: none; tab-stops: 11.0pt .5in; text-autospace: none;">
<o:p></o:p></div>
<!--EndFragment-->Richard Harthttp://www.blogger.com/profile/04962997526156185761noreply@blogger.com0tag:blogger.com,1999:blog-6676724996771468267.post-54734365742641655132013-10-30T04:00:00.000-07:002013-11-01T08:38:24.347-07:00Growth Mindset
<!--StartFragment-->
<br />
<div class="MsoNormal">
<span style="font-size: 11pt;">The
article by Sarah D. Sparks, </span><a href="http://www.edweek.org/ew/articles/2013/09/11/03mindset_ep.h33.html?r=545317799"><span style="font-size: 11.0pt; mso-bidi-font-size: 12.0pt;">http://www.edweek.org/ew/articles/2013/09/11/03mindset_ep.h33.html?r=545317799</span></a><span style="font-size: 11pt;">, starts with a powerful
concept: “It’s one thing to say all students can learn, but making them believe
it – and do it – can require a 180-degree shift in student’s and teacher’s
sense of themselves and of one another.”</span></div>
<div class="MsoNormal">
<span style="font-size: 11.0pt; mso-bidi-font-size: 12.0pt;"><br /></span></div>
<div class="MsoNormal">
<span style="font-size: 11.0pt; mso-bidi-font-size: 12.0pt;">The
General Studies Remedial Biology course I taught faced this challenge. The
course was scheduled at night for three consecutive hours in a 120-seat lecture
room. I refused to teach the course until the following arrangements were made:</span></div>
<div class="MsoNormal">
</div>
<ul>
<li><span style="font-size: 11pt; text-indent: -0.25in;">The entire text was
presented as online reading assignments, delivered by cable to each dormitory room and by
phone service off campus.</span></li>
<li><span style="font-size: 11pt; text-indent: -0.25in;">One hour was scheduled for
my lecture, after any student presentations related to the scheduled topic.</span><span style="font-size: 11pt; text-indent: -0.25in;"><span style="font-family: 'Times New Roman'; font-size: 7pt;"> </span></span></li>
<li><span style="font-size: 11pt; text-indent: -0.25in;">One hour was scheduled for
written assessment every other week.</span></li>
<li><span style="font-size: 11pt; text-indent: -0.25in;">One hour was scheduled for
10-minute student oral reports based on library research, actual research, or
projects.</span></li>
</ul>
<br />
<div class="MsoNormal">
<span style="font-size: 11.0pt; mso-bidi-font-size: 12.0pt;">After
the first few semesters, students requested that the assessment period be moved
from the second hour to the first hour. This turned the course into a
seminar for which students needed to prepare on their own before class.<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-size: 11.0pt; mso-bidi-font-size: 12.0pt;"><br /></span></div>
<div class="MsoNormal">
<span style="font-size: 11.0pt; mso-bidi-font-size: 12.0pt;">Only
Knowledge and Judgment Scoring (KJS) was used the first few semesters, with
ready acceptance by the class. The policy of busing in students from outside
the Northwest Missouri region brought in protesters: “Why do we have to know
what we know, when everywhere else on campus, we just mark, and the teacher
tells us how many right marks we made?”<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-size: 11.0pt; mso-bidi-font-size: 12.0pt;"><br /></span></div>
<div class="MsoNormal">
<span style="font-size: 11.0pt; mso-bidi-font-size: 12.0pt;">Offering
both methods of scoring, traditional multiple-choice (TMC) and KJS, on the same
test solved that problem. Students could select the method they felt most
comfortable with; that matched their preparation the best.<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-size: 11.0pt; mso-bidi-font-size: 12.0pt;"><br /></span></div>
<div class="MsoNormal">
<span style="font-size: 11.0pt; mso-bidi-font-size: 12.0pt;">The
student presentations and reports were excellent models for the rest of the
class. They showed the entire class the interest in the subject and the quality
of work these students were doing. <o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-size: 11.0pt; mso-bidi-font-size: 12.0pt;">KJS
provided the information needed to guide passive pupils along the path to
becoming self-correcting scholars. As a generality, that path took the shape of
a backward J. First they made fewer wrong marks, next they studied more, and
finally they switched from memorizing nonsense to making sense of each
assignment. <o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-size: 11.0pt; mso-bidi-font-size: 12.0pt;"><br /></span></div>
<div class="MsoNormal">
<span style="font-size: 11.0pt; mso-bidi-font-size: 12.0pt;">Over
time they learned they were now spending less time studying (reviewing
everything) and getting better grades by making sense as they learned; they
could actually build new learning on what they could trust they had learned.
They could monitor their progress by checking their quality score and their
quantity score. Get quality up, interest and motivation increase, and quantity
follows.<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-size: 11.0pt; mso-bidi-font-size: 12.0pt;"><br /></span></div>
<div class="MsoNormal">
<span style="font-size: 11.0pt; mso-bidi-font-size: 12.0pt;">The
tradition of students comparing their score with that of the rest of the class
to see if they were safe, or needed to study more, or had a higher grade than expected when enrolling in the course (and could take a
vacation), was strong in the fall semester with the distraction of social
groups, football and homecoming. The results of fall and spring semesters were
always different.<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-size: 11.0pt; mso-bidi-font-size: 12.0pt;"><br /></span></div>
<div class="MsoNormal">
<span style="font-size: 11.0pt; mso-bidi-font-size: 12.0pt;">There
was one dismal failure. With the excellent monitoring of their progress in the
course, the idea was advanced to recognize class scholars. These students had,
in one combination or another of test scores and presentations, earned a class
score that could not be changed by any further assessment. They had
demonstrated their ability to make sense of biological literature (the main
goal of the course, which, hopefully, would serve them well the rest of their
lives, as would the habit of making sense of assignments in their other
courses). The next semester all went as planned. Most continued in the class
and some conducted study sessions for other students. <o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-size: 11.0pt; mso-bidi-font-size: 12.0pt;"><br /></span></div>
<div class="MsoNormal">
<span style="font-size: 11.0pt; mso-bidi-font-size: 12.0pt;">The
following semester witnessed an outbreak of cheating. Today, Power Up Plus
(PUP) gets its name from the original cheat checker added to Power Up. Cheating
became manageable by the simple rule that any answer sheet that failed to pass
the cheat checker would receive a score of zero. I offered to help any student
who wished to protest the rule to the student disciplinary committee. No student
ever protested. <o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-size: 11.0pt; mso-bidi-font-size: 12.0pt;"><br /></span></div>
<div class="MsoNormal">
<span style="font-size: 11.0pt; mso-bidi-font-size: 12.0pt;">[Cheating
was handled in class because the administration did not honor any use of the
university rules. You must catch individual students in the act. Computer cheat
checkers had the same status as red light cameras do now. If more than one
student is caught, the problem is with the instructor, not with the student. We
cancelled the class scholar idea.]<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-size: 11.0pt; mso-bidi-font-size: 12.0pt;"><br /></span></div>
<div class="MsoNormal">
<span style="font-size: 11.0pt; mso-bidi-font-size: 12.0pt;">We
need effective tools to manage student “growth mindset”. The tools must be easy
to use by students and faculty. Students need to see how other students
succeed, to be comfortable in taking part, and be able to easily follow their
progress when starting at the low end of academic preparation of knowledge,
skills, and judgment (quality, the use of all levels of thinking).<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-size: 11.0pt; mso-bidi-font-size: 12.0pt;"><br /></span></div>
<div class="MsoNormal">
<span style="font-size: 11.0pt; mso-bidi-font-size: 12.0pt;">A
common thread runs through </span><b><span style="font-family: "Arial Bold"; font-size: 11.0pt; mso-bidi-font-size: 12.0pt;">successful
student empowerment programs</span></b><span style="font-size: 11.0pt; mso-bidi-font-size: 12.0pt;">: Effective instruction is based on what students
actually know, can do, and want to do or to take part in. This requires frequent
appropriate assessment at each academic level, as in these recent
examples:<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-size: 11.0pt; mso-bidi-font-size: 12.0pt;"><br /></span></div>
<div class="MsoNormal">
</div>
<ul>
<li><span style="font-size: 11pt;">Elementary
School</span><span style="font-size: 11pt;"> </span><a href="http://smartblogs.com/education/2013/09/25/closing-the-achievement-gap-in-a-high-poverty-school/" style="font-size: 11pt;">http://smartblogs.com/education/2013/09/25/closing-the-achievement-gap-in-a-high-poverty-school/</a></li>
<li><span style="font-size: 11pt;">Middle
School </span><span style="font-size: 11pt;"><a href="http://www.edweek.org/ew/articles/2013/09/11/03common_ep.h33.html">http://www.edweek.org/ew/articles/2013/09/11/03common_ep.h33.html</a></span></li>
<li><span style="font-size: 11pt;">High
School </span><span style="font-size: 11pt;"><a href="http://www.edweek.org/ew/articles/2013/09/11/03mindset_ep.h33.html?r=545317799">http://www.edweek.org/ew/articles/2013/09/11/03mindset_ep.h33.html?r=545317799</a></span></li>
<li><span style="font-size: 11pt;">College
and wherever multiple-choice is used for accurate, honest, and fair assessments
</span><span style="font-size: 11pt;"><a href="http://www.nine-patch.com/">http://www.nine-patch.com</a></span></li>
</ul>
<br />
<div class="MsoNormal">
<br /></div>
<!--EndFragment-->Richard Harthttp://www.blogger.com/profile/04962997526156185761noreply@blogger.com0tag:blogger.com,1999:blog-6676724996771468267.post-55523302611249592932013-10-23T04:00:00.000-07:002013-10-23T04:00:02.165-07:00Alternative Multiple-Choice Origins
<!--StartFragment-->
<br />
<div class="MsoNormal">
Two alternative forms of multiple-choice (AMC) to the
traditional multiple-choice (TMC) developed from independent sources. Geoff Masters from Melbourne,
Australia, is credited as the developer, in 1982, of the partial credit Rasch model (PCM),
a form of Item Response Theory (IRT) analysis (<a href="http://www.rasch.org/bond.htm">Bond and Fox</a>). It allows students to report what they
know (2 points), what they do not know (1 point), and a wrong answer (0 points).
It never became popular on classroom or standardized tests.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The second form of AMC was developed at NWMSU. It started as
net yield scoring (NYS) on both essay and multiple-choice tests. I needed a way to
reduce the amount of reading required in scoring “blue book” essays. A 20-point
essay started with 10 points. A point was added for each acceptable, related
information bit. A point was subtracted for each unacceptable, incorrect, or unrelated
information bit. An information bit was basically a short sentence with
correct grammar and spelling. It could also be a relationship expressed as a
diagram, sketch, or drawing. </div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
This reduced the amount of reading by more than one third and
improved student performance. Snow, filler, and fluff had no value and only
distracted a student from doing good work. Students needed to exercise good
judgment in selecting what they wrote. It was no longer a case of students
writing, and the teacher searching for, something that could earn them
sufficient credit to pass the course; a lower level of thinking operation that
is very common in high schools and colleges. NYS required students to use good
judgment as well as to be knowledgeable and skilled. </div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
This same idea was applied to computer scored
multiple-choice tests with interesting results. When both TMC and NYS were
offered on the same test, most students selected TMC on their first test. This
is what they were familiar with. Over 90% of students elected NYS on their
third test. Students also agreed that knowledge and judgment should have equal
value.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
By 1981 NYS was renamed knowledge and judgment scoring (KJS)
to reflect what was being assessed: good judgment and a right answer (2
points), good judgment to report what has yet to be learned with no mark (1
point), and poor judgment, a wrong mark (0 points). </div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
KJS requires and rewards students for using higher levels of
thinking. The quality score is independent from the right count score. A
struggling student with a test score of 60% may have also earned a quality
score of 90%.</div>
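<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
A minimal sketch of how a 60% test score and a 90% quality score can sit on the same answer sheet. The point values (2, 1, 0) are those given above. The quality formula (percent of marks that are right) is an assumption made for this illustration, as are the function name and the 40-question example.</div>

```python
def kjs_scores(right: int, wrong: int, omitted: int) -> tuple[float, float]:
    """Return (test score %, quality score %) for one answer sheet.

    Points follow the text: right mark 2, no mark 1, wrong mark 0.
    Quality is taken here as the percent of marks that are right
    (an assumption; the post does not spell out the formula).
    """
    questions = right + wrong + omitted
    marked = right + wrong
    test_score = 100 * (2 * right + omitted) / (2 * questions)
    quality = 100 * right / marked if marked else 0.0
    return test_score, quality


# Hypothetical 40-question test: 9 right, 1 wrong, 30 left unmarked.
print(kjs_scores(9, 1, 30))   # (60.0, 90.0)
```

<div class="MsoNormal">
With these point values a blank answer sheet scores 50%, so knowledge and judgment carry equal weight: each right mark adds to the score and each wrong mark takes the same amount away.</div>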
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
With TMC there is no way of knowing what a student with a
score of 60% actually knows (when a right mark is a right answer or just luck
on test day). With KJS we can know what this student knows with the same degree
of accuracy as a student earning a 90% score on a TMC test.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
More importantly, this reinforces the student’s sense of
self-judgment and encourages effort to do better. It is the equivalent of the
note a teacher marks on a special paragraph in an essay, “Good work!”</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
KJS provides the information needed to tell student and
teacher what has been learned and what has yet to be learned in an easy-to-use
report. Often a trail of bi-weekly test scores would follow a backward J.
Reducing guessing by itself did not increase the test score but moved the score
to a higher quality. Low-quality students needed to change study habits.
Low-scoring, high-quality students needed to study more.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Learning by questioning and establishing relationships
gave students the basis for correctly answering questions that they had
never seen before. They then stumbled onto what I meant by, “Make things
meaningful (full of relationships) if your learning is to be really useful,
empowering and easy to remember”. They did not have to review everything for each cumulative test.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The most interesting finding was that when students mastered
meaning-making, they found themselves doing better in all of their courses.
This is what inspired me to continue to promote Knowledge and Judgment Scoring.
Students learn best when they are in charge. The quality score was the “<a href="http://www.news-leader.com/article/20130913/COLUMNISTS31/309130008/David-Hough">feel good</a>” score for struggling students until their improving development produced the
high scores earned by successful self-correcting students.</div>
<div class="MsoNormal">
<b><span style="font-family: "Arial Bold";"><br /></span></b></div>
<div class="MsoNormal">
<br /></div>
<!--EndFragment-->Richard Harthttp://www.blogger.com/profile/04962997526156185761noreply@blogger.com0tag:blogger.com,1999:blog-6676724996771468267.post-82597763240718418842013-10-16T04:00:00.000-07:002013-10-16T04:00:07.791-07:00Knowledge and Judgment Scoring - Operational to Instructional
<!--StartFragment-->
<br />
<div class="MsoNormal">
<span style="font-size: 12.0pt; line-height: 115%;"> 23<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-size: 12.0pt; line-height: 115%;">This post (and the next three) explains why we need a KJS Group. The software, Power Up Plus (PUP), which contains both Knowledge and Judgment Scoring (KJS) and traditional multiple-choice (TMC), is now free to registered KJS Group members. Version 5.22 is free to teachers and administrators. Please see the instructions below. </span><br />
<span style="font-size: 12pt; line-height: 115%;"><br /></span>
<span style="font-size: 12pt; line-height: 115%;">This reflects a change in the use of the software, from an operational program for scoring individual classroom tests to an instructional program that promotes student and teacher development in preparation for the CCSS movement assessments. <b>Students and teachers can readily see the difference between lower and higher levels of thinking when students are offered the opportunity to report, in a non-threatening environment, what they actually trust they know and can do. That report serves as the basis for further learning and instruction.</b> Practice riding a tricycle is poor preparation for a </span><span style="line-height: 18px;">riding test on a bicycle.</span><br />
<span style="font-size: 12.0pt; line-height: 115%;"><br /></span>
<span style="font-size: 12pt; line-height: 115%;">Last week I
finished a series of 22 posts on this Multiple-Choice Reborn blog. The series
makes clear that no amount of “statistical work” can extract from TMC marked answer sheets some of the claims now being marketed
about them. These tests can, at best, only do a good job of ranking students.</span></div>
<div class="MsoNormal">
<span style="font-size: 12.0pt; line-height: 115%;"><br /></span></div>
<div class="MsoNormal">
<span style="font-size: 12.0pt; line-height: 115%;">They so imperfectly
and incompletely tell us what students know and can do that <a href="http://www.starnewsonline.com/article/20130919/ARTICLES/130919553">North Carolina</a> is
now spending six months figuring out how and where to place the cut scores on
their new CCSS traditionally scored end-of-grade, multiple-choice math test
results. </span><br />
<br /></div>
<div class="MsoNormal">
<span style="font-size: 12.0pt; line-height: 115%;">[They must
guess where to put the cut score on the results from uncommitted, low scoring, improperly prepared students
who were guessing at the right answers to questions that the test maker guessed
would produce a satisfactory score distribution with high statistical reliability
and precision. The more nonsensical the student mark data are, the more
subjective the process.] <o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-size: 12.0pt; line-height: 115%;"><br /></span></div>
<div class="MsoNormal">
<span style="font-size: 12.0pt; line-height: 115%;">Accurate,
honest, and fair testing can be done with </span><a href="http://www.nine-patch.com/"><span style="font-size: 12.0pt; line-height: 115%;">Knowledge and Judgment Scoring</span></a><span style="font-size: 12.0pt; line-height: 115%;"> and the </span><a href="http://www.winsteps.com/winsteps.htm"><span style="font-size: 12.0pt; line-height: 115%;">partial credit Rasch model</span></a><span style="font-size: 12.0pt; line-height: 115%;"> analysis. These methods allow
students to report what they actually know and can do that is meaningful,
useful, and empowering. Student development (the judgment to appropriately use
all levels of thinking) is as important as knowledge and skills for successful
students and employees (</span><a href="http://www.knowledgefactor.com/"><span style="font-size: 12.0pt; line-height: 115%;">Knowledge Factor</span></a><span style="font-size: 12.0pt; line-height: 115%;">). <o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-size: 12.0pt; line-height: 115%;"><br /></span></div>
<div class="MsoNormal">
<span style="font-size: 12.0pt; line-height: 115%;">The NCLB
decade has laid the foundation for real change by making schools designed for
failure (that promote students beyond their abilities, rather than developing
the necessary abilities for their success) so bad and so visible that
something had to be done. The CCSS movement has rekindled the old alternative (to TMC) testing and authentic testing methods, with the addition of CAT and elaborate
assessment methods. <o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-size: 12.0pt; line-height: 115%;"><br /></span></div>
<div class="MsoNormal">
<span style="font-size: 12.0pt; line-height: 115%;">My concern
now is that, after expending a large amount of time and money on promoting the
CCSS movement ideals, a major part of the assessments will once again be
reduced to traditional guess testing at the lowest levels of
thinking. <o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-size: 12.0pt; line-height: 115%;"><br /></span></div>
<div class="MsoNormal">
<span style="font-size: 12.0pt; line-height: 115%;">Both KJS and TMC scoring can use the same test questions.
In fact both methods are used on the same test to accommodate students
working at all levels of thinking and with all degrees of preparation (PUP). <o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-size: 12.0pt; line-height: 115%;"><br /></span></div>
<div class="MsoNormal">
<span style="font-size: 12.0pt; line-height: 115%;">IMHO, KJS is a practical method of achieving the CCSS movement
goals. It prepares students for standardized tests presented at all levels of thinking. [I still cannot predict when KJS or the partial credit Rasch model will be used on standardized tests, as current standardized tests are not designed to assess what students know or can do. They are designed to produce, using the fewest questions, an <b>acceptable spread</b> of student scores.]<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-size: 12.0pt; line-height: 115%;"><br /></span></div>
<div class="MsoNormal">
<span style="font-size: 12.0pt; line-height: 115%;">Rather than
a rank of 60 on a test, a student may get a quality score of 90% on questions
used to report what the student actually knows and can do, as well as a rank
of right marks on the test using KJS. We now know what a “just passing” student knows
with the same accuracy as we know what a student earning a 90% score on a traditional test knows. This can be valuable formative assessment information. <o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-size: 12.0pt; line-height: 115%;"><br /></span></div>
<div class="MsoNormal">
<span style="font-size: 12.0pt; line-height: 115%;">Letting
students tell us what they know or can do makes more sense than the guessing
game now in use during preparation and assessment. And over 90% of my students preferred Knowledge and Judgment
Scoring after just two experiences with it. Even students like an honest and fair
test over gambling for a grade.<o:p></o:p></span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="font-size: 12.0pt; line-height: 115%;">Past
performance in my classroom is no guarantee of performance in your classroom
unless you are a like-minded teacher, administrator, or test maker.<o:p></o:p></span><br />
<span style="font-size: 12.0pt; line-height: 115%;"><br /></span>
<br />
<div class="MsoNormal">
<span style="font-size: 12pt; line-height: 18px;">[The </span><a href="http://www.edu-soft.org/"><span style="font-size: 12pt; line-height: 18px;">Educational Software Cooperative</span></a><span style="font-size: 12pt; line-height: 18px;">, Inc. (non-profit) closed this year (2013) after 20 years of operation, during which I was the volunteer treasurer. It was founded to maximize the benefits of an <b>individual computer</b>: infinite patience, no judgment, and, best of all, instant formative feedback. That level of instruction and record keeping has now been superseded by the necessity for <b>district-wide record keeping</b> systems operating online assessments keyed to CCSS learning objectives.]</span><br />
<br />
<br />
<div class="MsoNormal">
<b><span style="font-family: "Arial Bold";">Welcome to the KJS Group</span></b>: Please <b><span style="font-family: "Arial Bold";">register</span></b> at <a href="mailto:KJSgroup@nine-patch.com">KJSgroup@nine-patch.com</a>. Include something about yourself and your interest in student empowerment (your name, school, classroom environment, LinkedIn, Facebook, email, phone, etc.).</div>
<div class="MsoNormal">
<b><span style="font-family: "Arial Bold";"><br /></span></b></div>
<div class="MsoNormal">
<b><span style="font-family: "Arial Bold";">Free anonymous download</span></b>, Power Up Plus (PUP), version 5.22 containing both TMC and KJS: <a href="http://www.nine-patch.com/download/PUP522xlsm.zip" title="For Newer Windows Machines">PUP522xlsm.zip</a>, 606 KB or <a href="http://www.nine-patch.com/download/PUP522xls.zip" title="For All (older) Windows Machines">PUP522xls.zip</a>, 1,099 KB.<br />
<br /></div>
<div class="MsoNormal">
</div>
<br />
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
- - - - - - - - - - - - - - - - - - - - - <o:p></o:p></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<br /></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
Free software to help you and your students experience and understand how to break out of traditional multiple-choice (TMC) and into <a href="http://www.nine-patch.com/">Knowledge and Judgment Scoring</a> (KJS) (tricycle to bicycle):<o:p></o:p></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<br /></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
</div>
<ul>
<li><a href="http://www.nine-patch.com/complete/BreakOut.htm"><span style="color: #0000f5;">Break Out Overview</span></a></li>
<li><a href="http://www.nine-patch.com/download/BrkOutSC.zip"><span style="color: #0000f5;">Download Break Out</span></a></li>
<li><a href="http://www.nine-patch.com/qstart.htm"><span style="color: #0000f5;">Quick Start</span></a></li>
</ul>
</div>
<div class="MsoNormal">
</div>
</div>
<div class="MsoNormal">
<br /></div>
<!--EndFragment-->Richard Harthttp://www.blogger.com/profile/04962997526156185761noreply@blogger.com0tag:blogger.com,1999:blog-6676724996771468267.post-33852012963285943482013-10-09T04:00:00.000-07:002013-10-09T04:00:09.624-07:00Multiple-Choice Test Analysis - Summary
<!--StartFragment-->
<br />
<div class="MsoNormal">
22</div>
<div class="MsoNormal">
The past 21 posts have explored how classroom and
standardized tests are <b>traditionally</b>
analyzed. The six most commonly used statistics are made fully transparent in Post
10, Table 15, the Visual Education Statistics Engine (VESE) [Free <a href="http://www.nine-patch.com/download/VESEngine.xlsm">VESEngine.xlsm</a> or <a href="http://www.nine-patch.com/download/VESEngine.xls">VESEngine.xls</a>]. One
more statistic was added for current standardized tests. Numbers must be
meaningful and understood to have valid, practical value. </div>
<div class="MsoNormal">
<br /></div>
<div class="MsoListParagraphCxSpFirst" style="mso-list: l0 level1 lfo1; text-indent: -.25in;">
</div>
<ul>
<li><b style="text-indent: -0.25in;"> Count:</b><span style="text-indent: -0.25in;">
The count is so obvious that it should not be a problem. But it is a problem in
education. Counting right marks is
not the same as counting what a student knows or can do. Also, a cut score is
often set by selecting a point in a range from 0% to 100%. A cut score of 50
means 50%. But the test, when administered as </span><b style="text-indent: -0.25in;">traditional</b><span style="text-indent: -0.25in;"> multiple-choice, starts each student at 25% with
4-option questions. [There is no way to know what low scoring students know,
only their rank.]</span></li>
<li><br /></li>
<li><b style="text-indent: -0.25in;"> Average:</b><span style="text-indent: -0.25in;">
Add up all of the individual student scores and divide by the number of
students for the class or test average score. [There is no average student.]
Classes or tests can be compared by their averages just as students can be compared
by their counts or scores.</span></li>
<li><br /></li>
<li><span style="font-family: 'Times New Roman'; font-size: 7pt; text-indent: -0.25in;"> </span><b style="text-indent: -0.25in;">Standard
Deviation (SD): </b><span style="text-indent: -0.25in;">Theoretically, 2/3 of the counts on a distribution of
scores are expected to fall within one SD of the average. A very well prepared
(or very under prepared) class will yield a small SD. A mixed class will yield
a large SD with students with both very high and very low scores (many A-B and
D-F, with few C grades).</span></li>
<li><br /></li>
<li><b style="text-indent: -0.25in;"> Item
Discrimination:</b><span style="text-indent: -0.25in;"> A discriminating question groups those who know (high
scoring students) into one group and those who do not know (low scoring
students) into another group. Every classroom test needs about ten of these to
produce a grade distribution where one SD is ten percentage points (a ten point
range for each grade).</span></li>
<li><br /></li>
<li><b style="text-indent: -0.25in;"> Test
Reliability:</b><span style="text-indent: -0.25in;"> A test has high reliability when the results are highly
reproducible. Standardized tests, therefore, use only discriminating questions.
They rarely ask a question that almost all students can answer correctly. </span><b style="text-indent: -0.25in;">Traditional</b><span style="text-indent: -0.25in;"> multiple-choice, therefore,
does not assess what students actually know and value. </span><b style="text-indent: -0.25in;">Traditional</b><span style="text-indent: -0.25in;"> standardized tests can only rank students.</span></li>
<li><br /></li>
<li><b style="text-indent: -0.25in;"> Standard
Error of Measurement (SEM):</b><span style="text-indent: -0.25in;"> Theoretically, when a student retakes
the same test, the scores are expected to fall within one SEM of the average 2/3 of the time.
The SEM value fits inside the range of the SD. “Jimmy, you failed the test, but
based on your test score and your luck on test day, each time you retake the
test, you have a 20% expectation of passing without doing any more studying.”
The SEM precision is based on the reliability of the entire test.</span></li>
<li><br /></li>
<li><b style="text-indent: -0.25in;"> Conditional
Standard Error of Measurement (CSEM):</b><span style="text-indent: -0.25in;"> The CSEM is based (conditioned) on
each test score. This refinement in precision is a recent addition to </span><b style="text-indent: -0.25in;">traditional</b><span style="text-indent: -0.25in;"> multiple-choice analysis.
It has been a part of the </span><a href="http://www.winsteps.com/winsteps.htm" style="text-indent: -0.25in;">Rasch
model IRT</a><span style="text-indent: -0.25in;"> analysis for decades.</span></li>
</ul>
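<div class="MsoNormal">
The statistics in the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the scores are hypothetical (not data from this blog), the SD is the population form, the reliability value is made up, and the SEM uses the common classical relation SEM = SD * sqrt(1 - reliability). The function names are mine, not part of PUP or the VESEngine.</div>

```python
import math
import statistics


def chance_baseline(options: int) -> float:
    """Expected fraction right from blind guessing on questions with `options` choices."""
    return 1.0 / options


def summarize(scores, reliability):
    """Return the count, average, SD and SEM for a list of test scores.

    The SEM uses the classical relation SD * sqrt(1 - reliability).
    """
    sd = statistics.pstdev(scores)  # population SD; this choice is an assumption
    return {
        "count": len(scores),
        "average": statistics.fmean(scores),
        "sd": sd,
        "sem": sd * math.sqrt(1.0 - reliability),
    }


# Hypothetical percent-right scores for eight students.
scores = [45, 55, 60, 70, 75, 80, 85, 90]
summary = summarize(scores, reliability=0.75)  # reliability value is made up
print(summary)
print(chance_baseline(4))  # starting point on 4-option questions
```

<div class="MsoNormal">
With any reliability between zero and one, the SEM comes out smaller than the SD, which is the sense in which the SEM fits inside the range of the SD.</div>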
<br />
<div class="MsoListParagraphCxSpLast" style="mso-list: l0 level1 lfo1; text-indent: -.25in;">
<br /></div>
<div class="MsoNormal">
Even the CSEM cannot clean up the damage done by forcing
students to mark every question even when they cannot read or do not understand
the question. <a href="http://www.nine-patch.com/">Knowledge and Judgment
Scoring</a> and the <a href="http://www.winsteps.com/winsteps.htm">partial
credit Rasch model</a> do not have this flaw. Both accommodate students
functioning at all levels of thinking and all levels of preparation. These two scoring methods are in tune
with the objectives of the CCSS movement.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
- - - - - - - - - - - - - - - - - - - -
- <o:p></o:p></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<br /></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
Free software to help you and your students
experience and understand how to break out of traditional multiple-choice (TMC)
and into <a href="http://www.nine-patch.com/">Knowledge and Judgment Scoring</a> (KJS) (tricycle to bicycle):<o:p></o:p></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<br /></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
</div>
<ul>
<li><a href="http://www.nine-patch.com/complete/BreakOut.htm"><span style="color: #0000f5; mso-bidi-font-family: Times;">Break Out Overview</span></a></li>
<li><a href="http://www.nine-patch.com/download/BrkOutSC.zip"><span style="color: #0000f5; mso-bidi-font-family: Times;">Download Break Out</span></a></li>
<li><a href="http://www.nine-patch.com/qstart.htm"><span style="color: #0000f5; mso-bidi-font-family: Times;">Quick Start</span></a></li>
</ul>
<o:p></o:p><br />
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<o:p></o:p></div>
<div class="MsoNormal">
<a href="http://www.blogger.com/blogger.g?blogID=6676724996771468267" name="_GoBack"></a></div>
<!--EndFragment-->Richard Harthttp://www.blogger.com/profile/04962997526156185761noreply@blogger.com0tag:blogger.com,1999:blog-6676724996771468267.post-38668472922561337632013-10-02T04:00:00.000-07:002014-07-08T09:33:02.303-07:00Visual Education Statistics - Conditional Standard Error of Measurement
<!--StartFragment-->
<br />
<div class="MsoNormal">
21<br />
<br />
[[Second Pass, 8 July
2014. Equation 6.3 (cited below) in
Statistical Test Theory for the Behavioral Sciences by Dato N.M. de Gruijter and Leo J.
Th. van der Kamp, 2008, is the same as the calculation used in Table 29, in my 9
July 2014 post<span style="font-family: 'Times New Roman';">. </span>On the following page they
mention that the error variance is higher in the center and lower at the
extremes. That distribution is the green curve on Chart 73. I did not see this
relationship in the equation when this post was first posted, but do now in the
visualized mathematical model (Chart 73).<br />
<div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<o:p></o:p></div>
<div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<br /></div>
<div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
Also the discussion of Table 24
has been updated to match the terms and values in Table 24.]]</div>
<br />
<br /></div>
<div class="MsoNormal">
Working on the conditional standard error of measurement
(CSEM) is new territory for me. I always associated the CSEM with the Rasch
model IRT analysis commonly used by state departments of education when scoring
NCLB tests. I first had to Google for basic information.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
If you are interested in the details, please check out the
sources below for sample (n-1) equations. (Equation 6.14, which corrects the relative
variance, was not included in the 2005 version that preceded the current 2008 version. This
represents significant progress in applying test precision.)<br />
<br /></div>
<ul>
<li><span style="font-family: Symbol; text-indent: -0.25in;"><span style="font-family: 'Times New Roman'; font-size: 7pt;"> </span></span><span style="text-indent: -0.25in;">Absolute Error Variance</span><span style="text-indent: -0.25in;"> </span><span style="text-indent: -0.25in;"> </span><a href="http://14dejavu.files.wordpress.com/2013/05/statistical-test-theory-for-bahavioral-science.pdf" style="text-indent: -0.25in;">Equation 5.39</a><span style="text-indent: -0.25in;"> p. 73</span></li>
<li><span style="font-family: Symbol; text-indent: -0.25in;"><span style="font-family: 'Times New Roman'; font-size: 7pt;"> </span></span><span style="text-indent: -0.25in;">Relative Error Variance</span><span style="text-indent: -0.25in;"> </span><span style="text-indent: -0.25in;"> </span><a href="http://14dejavu.files.wordpress.com/2013/05/statistical-test-theory-for-bahavioral-science.pdf" style="text-indent: -0.25in;">Equation 6.3</a><span style="text-indent: -0.25in;"> p. 83</span></li>
<li><span style="font-family: Symbol; text-indent: -0.25in;"><span style="font-family: 'Times New Roman'; font-size: 7pt;"> </span></span><span style="text-indent: -0.25in;">Corrected Relative Variance</span><span style="text-indent: -0.25in;"> </span><a href="http://14dejavu.files.wordpress.com/2013/05/statistical-test-theory-for-bahavioral-science.pdf" style="text-indent: -0.25in;">Equation 6.14</a><span style="text-indent: -0.25in;"> p. 91 or GED </span><a href="http://files.eric.ed.gov/fulltext/ED510063.pdf" style="text-indent: -0.25in;">Equation
3</a><span style="text-indent: -0.25in;"> p. 9</span></li>
</ul>
<br />
<div class="MsoNormal">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh0pmyJGZxoXSkZqkkTwCAC2eWs8lusse9CLZ_DmvzbF7B2tSzLTX7b7Q-u0cdPIqfI7yETzEk_wEzvWwXzF38G3hjZYEjMMoPMO46lwKDeigZ8ud4-VtYG0PEY8fbsekySUROxD5XTiis/s1600/Table+22.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh0pmyJGZxoXSkZqkkTwCAC2eWs8lusse9CLZ_DmvzbF7B2tSzLTX7b7Q-u0cdPIqfI7yETzEk_wEzvWwXzF38G3hjZYEjMMoPMO46lwKDeigZ8ud4-VtYG0PEY8fbsekySUROxD5XTiis/s200/Table+22.jpg" height="98" width="200" /></a>My first surprise was to find I had already calculated the
CSEM for the Nursing124 data when I put up <a href="http://richard-hart.blogspot.com/2013/04/visual-education-statistics-test.html">Post 5</a><!--[if !supportNestedAnchors]--><a href="http://www.blogger.com/blogger.g?blogID=6676724996771468267" name="_GoBack"></a><!--[endif]--> of this series (in Table 8. Interactions with
Columns [Items] Variance, MEAN SS = 3.33) as I discovered five ways to harvest the variance [mean sum of squares (MSS)]. Equation 6.3 n, Table 22, produces the same
result (test SEM = 1.75) when it divides by n [unknown
population] rather than n-1 [observed sample].</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
[n = the item count. Test SEM = AVERAGE(CSEM).]</div>
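<div class="MsoNormal">
A minimal sketch of the n versus n-1 calculation follows. It assumes that Equation 6.3 has the familiar binomial form, error variance = x(n - x)/(n - 1) for a raw score x on n items; that form is my assumption and should be checked against the book. The scores are hypothetical, not the Nursing124 data, and the function names are illustrative only.</div>

```python
import math


def csem(score: int, n_items: int, sample: bool = True) -> float:
    """Conditional SEM for one raw score under the assumed binomial error model.

    sample=True divides by n - 1 (observed sample);
    sample=False divides by n (unknown population).
    """
    divisor = n_items - 1 if sample else n_items
    return math.sqrt(score * (n_items - score) / divisor)


def average_csem(scores, n_items, sample=True):
    """Test SEM taken as the average of the CSEM values over all student scores."""
    return sum(csem(s, n_items, sample) for s in scores) / len(scores)


# Hypothetical raw scores on a 21-item test.
scores = [12, 15, 16, 17, 17, 18, 19, 20]
for s in sorted(set(scores)):
    print(s, round(csem(s, 21), 2), round(csem(s, 21, sample=False), 2))
print(round(average_csem(scores, 21, sample=False), 2))
```

<div class="MsoNormal">
Under this form the CSEM is zero at a score of 0 or 21 and largest near the middle of the score range, and dividing by n always gives a slightly smaller value than dividing by n-1.</div>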
<div class="MsoNormal">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="MsoNormal">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjyZtyiMJyA-Jxo-b4DXhwitXpa5OjXAjtNbJj23X_oPNopLxIiKJWQJu5kA9T4bexqlyctuRAW2smoBd1qU2QiCErRmYNBQFOXNEIUnjqSZREmSwhkh2XLs9nNfLYjh7ciabVsaudibYk/s1600/Table+23.jpg" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjyZtyiMJyA-Jxo-b4DXhwitXpa5OjXAjtNbJj23X_oPNopLxIiKJWQJu5kA9T4bexqlyctuRAW2smoBd1qU2QiCErRmYNBQFOXNEIUnjqSZREmSwhkh2XLs9nNfLYjh7ciabVsaudibYk/s200/Table+23.jpg" height="200" width="196" /></a>I then used what I learned in the last post to table data to
obtain the conditional error variance for student scores (Table 23a). The 21 items in Table 22 became the number of right marks on each of 11 item difficulties on Table 23a. The
values in this tabulation were then converted into frequencies conditional on the student scores, which sum to one for each score (Table 23b).<br />
<br />
The
absolute error variance for each score was computed by Excel (=Var.P). Multiplying the absolute error variance (0.14382) by the square of the item count (21^2) yields the relative error variance (63.42). [Equation 5.39 (0.14382) * n^2 = Equation 6.3 (63.42)] The square root
of the relative error variance of each score yields the CSEM for that score. [An alternate calculation of the absolute error variance is shaded in Table 23b. Here the variance was calculated first and that value divided by the <b>squared score</b> to obtain the absolute error variance. This helps explain multiplying the absolute error variance by the <b>squared item count</b> to obtain the relative error variance for each score.]</div>
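<div class="MsoNormal">
The arithmetic above, and the reason for multiplying by the squared item count, can be checked with a short sketch. The first part uses the two figures quoted in the text. The second part uses hypothetical proportions (not values from Table 23) to show that rescaling from a 0-to-1 scale to a 0-to-n scale multiplies a population variance by n squared.</div>

```python
import statistics

n_items = 21

# Check of the figures quoted in the text.
relative_error_variance = 0.14382 * n_items ** 2
print(round(relative_error_variance, 2))  # 63.42

# Why n squared: multiplying every value by n multiplies the variance by n**2.
proportions = [0.45, 0.60, 0.75, 0.90, 0.95]  # hypothetical values
counts = [p * n_items for p in proportions]
var_prop = statistics.pvariance(proportions)  # population variance, as in Excel VAR.P
var_count = statistics.pvariance(counts)
print(var_count, var_prop * n_items ** 2)
```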
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgTDFWXl0Moh-U-Ryn0qoewP-RuyeaDDztyFk7P3PwFo_XccOe0GYN-6L72pFg1nTY98mSLijBg5N9EwEWDncLT2nOAxCv4lcGLzMIK7kujc2HpUxSmSolr-YYhwoJGu99MSzduGsrDU78/s1600/Chart+61.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgTDFWXl0Moh-U-Ryn0qoewP-RuyeaDDztyFk7P3PwFo_XccOe0GYN-6L72pFg1nTY98mSLijBg5N9EwEWDncLT2nOAxCv4lcGLzMIK7kujc2HpUxSmSolr-YYhwoJGu99MSzduGsrDU78/s320/Chart+61.jpg" height="248" width="320" /></a></div>
The test SEM estimated from conditional frequencies was 1.68 (Table
23b). The conditional frequency CSEM values differed among students with
the same score, so they had to be averaged to get
results comparable with the other analyses. The averaged values generated an irregular
curve, unlike the smooth curves for the other analyses (Chart 61). The
conditional frequency CSEM analysis is sensitive to the number of items with the same difficulty (yellow bars
alternate for each change in value, Table 23b). The other analyses are not
sensitive to item difficulty (yellow bars, in Table 22, include all students with the same
score).</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEicDWWdCLK5zYjdXNlUolahoVqTDPnE0tKoxRMXMrIcLP_DhjJu9cBdCj161Sson8iogamcridYXPuXzmUVf8OngyK8G7MCyHi-2r_2nb4Q9-elXb0vbxvrbFjraQPW3PsQGsBFSO0_3DE/s1600/Table+24.jpg" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEicDWWdCLK5zYjdXNlUolahoVqTDPnE0tKoxRMXMrIcLP_DhjJu9cBdCj161Sson8iogamcridYXPuXzmUVf8OngyK8G7MCyHi-2r_2nb4Q9-elXb0vbxvrbFjraQPW3PsQGsBFSO0_3DE/s320/Table+24.jpg" height="302" width="320" /></a>Complete curves were generated from Equation 6.3 for n-1 and
for GED n-1 (Table 24). The GED n-1 analysis includes a correction factor (cf) for
the <b>range of item difficulties</b> on
the test [cf = (1- KR20)/(1-KR21)]. This factor is equal to one if all items
are of equal difficulty. For the Nursing123 data it was 1.59; the difficulties
ranged from 45% to 95%, from the middle of the total possible
distribution to one extreme.</div>
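<div class="MsoNormal">
A minimal sketch of the correction factor as written above, cf = (1 - KR20)/(1 - KR21), computed from a matrix of 0/1 marks (rows are students, columns are items). Both tiny matrices are hypothetical and the function names are made up for this sketch; the population variance of the raw scores is assumed for both coefficients. The first matrix shows the case stated in the text: when all items have the same difficulty, the factor is one.</div>

```python
from statistics import pvariance


def _score_stats(matrix):
    """Return item count, raw-score mean, and raw-score population variance."""
    k = len(matrix[0])
    scores = [sum(row) for row in matrix]
    return k, sum(scores) / len(scores), pvariance(scores)


def kr20(matrix):
    """Kuder-Richardson 20: uses each item's own difficulty p."""
    k, _, var = _score_stats(matrix)
    n_students = len(matrix)
    p = [sum(row[j] for row in matrix) / n_students for j in range(k)]
    sum_pq = sum(pj * (1 - pj) for pj in p)
    return (k / (k - 1)) * (1 - sum_pq / var)


def kr21(matrix):
    """Kuder-Richardson 21: treats all items as equally difficult."""
    k, mean, var = _score_stats(matrix)
    return (k / (k - 1)) * (1 - mean * (k - mean) / (k * var))


def correction_factor(matrix):
    """cf = (1 - KR20) / (1 - KR21), as written in the text."""
    return (1 - kr20(matrix)) / (1 - kr21(matrix))


# HYPOTHETICAL marks: every item answered right by half of the students.
equal_difficulty = [[1, 1, 1], [0, 0, 0], [1, 1, 0], [0, 0, 1]]
# HYPOTHETICAL marks: item difficulties of 100%, 50%, and 25%.
mixed_difficulty = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [1, 0, 0]]

cf_equal = correction_factor(equal_difficulty)
cf_mixed = correction_factor(mixed_difficulty)
print(round(cf_equal, 6))  # 1.0
```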
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The CSEM values from the six analyses are listed in Table 24.
Five are fairly close to one another. The GED n-1, with a correction for the
range of item difficulties, is far different from the other five (Chart 61).
Values could not be created for the full curve for conditional frequencies as
you must actually have student marks to calculate conditional frequency CSEM
values. The gray area shows the values calculated from an equation for which
there were no actual data. Equations produce nice-looking reports that “look right”
even where no data stand behind them.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The CSEM improves the reportable precision on this test over
using the test SEM. Good judgment (best practice) is to correct the CSEM values
as done on the GED n-1 analysis.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
[I did not transform the raw test score mean of 16.8 or
79.8% to a scale score of 50% as was done by Setzer, 2009, <a href="http://files.eric.ed.gov/fulltext/ED510063.pdf">GED</a>, p. 6 and Tables
2 and 3. The GED n-1 raw score cut point was 60% which is comparable to most
classroom tests. If 25% of the score is from luck on test day that leaves 35%
for what a student marked right as something known or could be done, as a worst
case. If half of the lucky marks were also something the student knew or could
do, the split would be about 10% for luck on test day and 50% for student
ability.]</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
In Table 24, the GED n-1 analysis test SEM of 2.98 for the Nursing124
data is, as a range, 2.98/21 or 14.19%. For the uncorrected Equation 6.3 n-1
analysis, 1.79, the range is 1.79/21 or 8.52%. The n SEM was 1.75 or 7.95%.
The n SEM range, 1.75, fits within the uncorrected n - 1 test SEM value, 1.79. The corrected GED n-1 test
SEM value, 2.98, exceeds it.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Student score CSEM values are even more sensitive than the
test SEM values. The maximum range for the GED n-1 analysis is 3.73 or 3.73/21
or 17.76% and for the Equation 6.3 n-1 analysis 2.35 or 11.19%. Both are beyond
the maximum n CSEM value of 2.29 or 10.41%. This low-quality set of data fails to qualify as a means of
setting classroom grades or a standardized test cut score. </div>
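<div class="MsoNormal">
The maximum CSEM values quoted above can be reproduced under two assumptions that the text does not state: that Equation 6.3 has the binomial form sqrt(x(n - x)/(n - 1)) for a raw score x on n = 21 items (with n in place of n - 1 for the n analysis), and that the GED correction factor of 1.59 multiplies the CSEM directly. Both are inferences from the numbers, so treat this as a sketch rather than the calculation behind Table 24.</div>

```python
import math

N_ITEMS = 21  # item count from the text
CF = 1.59     # correction factor quoted in the text


def csem_binomial(score, n_items=N_ITEMS, denom=None):
    """ASSUMED form of Equation 6.3: sqrt(x * (n - x) / denom)."""
    if denom is None:
        denom = n_items - 1
    return math.sqrt(score * (n_items - score) / denom)


scores = range(N_ITEMS + 1)
curve_n1 = [csem_binomial(x) for x in scores]                 # n-1 analysis
curve_n = [csem_binomial(x, denom=N_ITEMS) for x in scores]   # n analysis
curve_ged = [CF * c for c in curve_n1]                        # ASSUMED use of cf

max_n1 = round(max(curve_n1), 2)
max_n = round(max(curve_n), 2)
max_ged = round(max(curve_ged), 2)

print(max_n1, max_n, max_ged)               # 2.35 2.29 3.73
print(round(100 * max_n1 / N_ITEMS, 2))     # 11.19
print(round(100 * max_ged / N_ITEMS, 2))    # 17.76
```

<div class="MsoNormal">
The curve is zero at scores of 0 and 21 and peaks at the middle scores (10 and 11), which is why the maximum values are the ones worth comparing.</div>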
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
[However, the classroom rule of 75% for passing the course
and the rule of setting grades at 10-percentage-point intervals overrule these statistics.
This is a good example of how test statistics have meaning only in relation to
how they are used. If the process of data reduction and reporting is not
transparent, the resulting statistics are suspect and can produce extended
debates over a passing score in the classroom.]</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The CSEM for each student score does improve test precision.
It can be calculated in several ways with close agreement. But it cannot
improve the quality of the student marks on the answer sheets made under
traditional, forced-choice, multiple-choice rules. These tests only rank
students by the number of right marks. They neither ask nor allow
students to report what they really know or can do, or their judgment in using
what they know or can do.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The CCSS movement is now promoting learning at higher levels
of thinking (problem solving) with, from what I have learned, some
de-emphasis on the lower levels of
thinking that are the foundation for higher levels of thinking. A successful
student cycles through all levels of thinking, as needed. Yet half of the CCSS
testing will be at the lowest levels of thinking, traditional multiple-choice
scoring. The other half will be as much overkill as traditional
multiple-choice is underkill in assessing student knowledge, skills, and
the development of students' ability to learn and apply what they know. <a href="http://stateimpact.org/florida2013/09/05/will-new-common-standards-mean-less-teaching-to-the-test/">Others</a>
have this same concern that centralized politics (and dollars) will continue to
overshadow the reality of the classroom. </div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
There is a middle ground that makes every question function
at higher levels of thinking, allows students to report what is meaningful, of
value, and empowering, and has the speed, low cost, and precision of
traditional multiple-choice. <a href="http://www.nine-patch.com/">Knowledge and
Judgment Scoring</a> and <a href="http://www.winsteps.com/winsteps.htm">partial
credit Rasch model IRT</a> are two examples. They both accommodate students
functioning at <b>all levels</b> of thinking.
Lower ability students do not have to guess their way through a test. With
routine use, both can turn passive pupils into self-correcting highly
successful achievers in the classroom. If you are really into mastery learning,
you can also try something like <a href="http://www.knowledgefactor.com/">Knowledge
Factor</a>.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
- - - - - - - - - - - - - - - - - - - -
- <o:p></o:p></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<br /></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
Free software to help you and your students
experience and understand how to break out of traditional multiple-choice (TMC)
and into <a href="http://www.nine-patch.com/">Knowledge and Judgment Scoring</a> (KJS) (tricycle to bicycle):</div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<br /></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
</div>
<ul>
<li><a href="http://www.nine-patch.com/complete/BreakOut.htm"><span style="color: #0000f5; mso-bidi-font-family: Times;">Break Out Overview</span></a></li>
<li><a href="http://www.nine-patch.com/download/BrkOutSC.zip"><span style="color: #0000f5; mso-bidi-font-family: Times;">Download Break Out</span></a></li>
<li><a href="http://www.nine-patch.com/qstart.htm"><span style="color: #0000f5; mso-bidi-font-family: Times;">Quick Start</span></a></li>
</ul>
<div class="MsoNormal">
<br /></div>
<!--EndFragment-->Richard Harthttp://www.blogger.com/profile/04962997526156185761noreply@blogger.com0