#6
The precision of the average test score can be obtained from
the math model in two ways: directly
from the mean sum of squares (MSS) or variance, and traditionally, by way of
the test reliability (KR20).
I obtained the precision
of each individual student test score from the math model by taking the
square root of the sum of squared deviations (SS) within each score mark
pattern (green, Table 25). The value is called the conditional standard error
of measurement (CSEM) as it sums
deviations for one student score (one condition), not for the total test.
I multiplied the mean sum of squares (MSS) by the number of
items averaged (21) to yield the SS (0.15 x 21 = 3.15 for a 17 right mark score);
I could instead have just added up the squared deviations. The square root of the SS
is the CSEM, about 1.8 right marks. Some 2/3 of the time, a re-tested
score of 17 right marks can be expected to fall between 15.20 and 18.80 (15
and 19) right marks (Chart 70).
The test Standard Error of Measurement (SEM) is then the average of the 22 individual CSEM values (1.75 right marks or 8.31%).
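To make the arithmetic concrete, here is a minimal Python sketch of the math-model route, assuming a hypothetical mark pattern of 17 right and 4 wrong marks on the 21-item test (standing in for one row of Table 25).

```python
import math

# A hypothetical mark pattern for one student: 17 right (1) and
# 4 wrong (0) marks on a 21-item test (one row of Table 25).
pattern = [1] * 17 + [0] * 4

n_items = len(pattern)                              # 21 items
mean = sum(pattern) / n_items                       # proportion right, 17/21
ss = sum((mark - mean) ** 2 for mark in pattern)    # sum of squared deviations (SS)
mss = ss / n_items                                  # mean sum of squares (MSS), about 0.15
csem = math.sqrt(ss)                                # conditional SEM, about 1.8 right marks

print(f"MSS = {mss:.2f}, SS = {ss:.2f}, CSEM = {csem:.2f}")
print(f"2/3 of re-tested scores of 17: {17 - csem:.2f} to {17 + csem:.2f} right marks")

# The test SEM is the average of the individual CSEM values,
# one per student score: sem = sum(csems) / len(csems)
```

Run as-is, this reproduces the CSEM of about 1.8 right marks and the 15.20 to 18.80 interval; the SS prints a bit above 3.15 only because the 0.15 MSS quoted above is rounded.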
The traditional
derivation of the test SEM (the error in the average test score) combines
the test reliability (KR20) and the SD (the spread of student scores about the
test average).
The SD (2.07) is the square root of the MSS (4.08) between student
scores. The test reliability (0.29) is the ratio of the true variance (MSS,
1.12) to the total variance (MSS, 4.08) between student scores (see previous
post).
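A small sketch of these between-score quantities, using the rounded MSS values quoted above (the post reports SD = 2.07 and KR20 = 0.29, presumably from unrounded values, so the recomputed numbers land slightly lower):

```python
import math

# Between-student-score quantities from the rounded MSS values
# quoted in the post; the post reports SD = 2.07 and KR20 = 0.29,
# presumably from unrounded values.
total_mss = 4.08    # total variance (MSS) between student scores
true_mss = 1.12     # true variance (MSS) between student scores

sd = math.sqrt(total_mss)       # spread of student scores, about 2.0
kr20 = true_mss / total_mss     # test reliability, about 0.27

print(f"SD = {sd:.2f}, KR20 = {kr20:.2f}")
```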
The expectation
is that the greater the reliability of a test, the smaller the error in estimating
the average test score. An equation is now needed to transform variance values
on the top level of the math model to apply to the lower linear level.
SEM = SQRT(1 – KR20) * SD = SQRT(1 – 0.29) * 2.07 = SQRT(0.71)
* 2.07 ≈ 1.75 right marks.
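In Python, the same traditional route with the quoted values looks like this:

```python
import math

# Traditional route: test SEM from the reliability (KR20) and the
# SD of student scores, using the values quoted in the post.
kr20 = 0.29
sd = 2.07

sem = math.sqrt(1 - kr20) * sd          # about 1.74 with these rounded inputs
print(f"SEM = {sem:.2f} right marks")   # close to the 1.75 average of the CSEM values
```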
The operation
“1 – KR20” yields 0.71, the portion of the total variance that is error variance;
its square root scales the SD down to the portion that represents the SEM.
If the test reliability goes up, the error in estimating
the average test score (SEM) goes down.
Chart 70 shows the variance (MSS), the SS, and the CSEM
based on 21 items, for each student score. It also shows the distribution of
the CSEM values that I averaged for the test SEM.
The individual CSEM
is highest (largest error, poorer precision) when the student score is 50%
(Charts 65 and 70). Higher student scores yield lower CSEM values (better
precision). This makes sense.
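A short sketch, assuming each CSEM is computed from the within-pattern deviations as in the example above, shows this relationship across the whole score range of a 21-item test:

```python
import math

# CSEM for hypothetical mark patterns across the score range of a
# 21-item test, computed from within-pattern deviations as above.
n_items = 21
for right in range(0, n_items + 1, 3):
    pattern = [1] * right + [0] * (n_items - right)
    mean = right / n_items
    ss = sum((mark - mean) ** 2 for mark in pattern)
    csem = math.sqrt(ss)
    print(f"{right:2d} right ({100 * right / n_items:5.1f}%): CSEM = {csem:.2f}")

# The CSEM peaks near a 50% score and falls toward 0% and 100%,
# matching the pattern in Charts 65 and 70.
```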
The test SEM (the
average of the CSEM values) is related to the distribution of student test
scores (purple dash, Chart 70). Adding easy items (easy in the sense that the
students were well prepared) decreases error, improves precision, and reduces the SEM.
- - - - - - - - - - - - - - - - - - - - -
The Best of the Blog - FREE
- The Visual Education Statistics Engine (VESEngine) presents the common education statistics on one traditional two-dimensional Excel spreadsheet. The post includes definitions. Download as .xlsm or .xls.
- This blog started seven years ago. It has meandered through several views. The current project is visualizing the VESEngine in three dimensions. The observed student mark patterns (on their answer sheets) are on one level. The variation in the mark patterns is on a second level.
- Power Up Plus (PUP) is classroom-friendly software used to score and analyze what students guess (traditional multiple-choice) and what they report as the basis for further learning and instruction (knowledge and judgment scoring multiple-choice). This is a quick way to update your multiple-choice to meet Common Core State Standards (promote understanding as well as rote memory). Knowledge and judgment scoring originated as a classroom project, starting in 1980, that converted passive pupils into self-correcting, highly successful achievers in two to nine months. Download as .xlsm or .xls. Quick Start