Wednesday, July 9, 2014

Small Sample Math Model - SEMs

The test standard error of measurement (SEM) can be calculated in two ways. The traditional way relates the variance between student scores to the variance within item difficulties; that is, it relates the external score column to the internal cell columns of the mark matrix.

The second way harvests the variance conditioned on each student score and then averages the CSEM values (CSEM = SQRT(conditional student score error variance)) across the test. The first method links two properties: student ability and item difficulty. The second uses only one property: student ability.

I set up a model with 12 students and 11 items (see previous post and Table26.xlsm below). Extreme values of zero and 100% were excluded. Four samples with average test scores of 5, 6, 7 (Table 29), and 8 were created with the standard deviation (1.83) and the variance within item difficulties (1.83) held constant. This allowed the SEM to vary between methods.

The calculation of the test SEM (1.36) by way of reliability (KR20) is reviewed on the top level of Chart 73. The test SEM remained the same for all four tests.

My first calculation of the test SEM by way of the conditional standard error of measurement (CSEM) began with the deviation of each mark from the student score (Table 29 center). I squared the deviations and summed them to get the conditional variance for each score. The individual student CSEM is the square root of the conditional variance (the conditional SD). The test SEM (1.48) is then the average of the student CSEM values.
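Here is a minimal sketch of this first calculation in Python; the 11-item mark pattern is a made-up example, not the actual Table 29 data:

    # One hypothetical student's 11-item mark pattern (1 = right, 0 = wrong)
    marks = [1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0]                 # 6 right out of 11
    n = len(marks)
    p = sum(marks) / n                                        # student score as a proportion

    # Deviation of each mark from the student score, squared and summed
    conditional_variance = sum((m - p) ** 2 for m in marks)   # equals n * p * (1 - p)
    csem = conditional_variance ** 0.5                        # CSEM = SQRT(conditional variance)
    print(round(csem, 2))                                     # about 1.65 for this pattern

    # The test SEM is then the average of the CSEM values over all students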

[My second calculation was based on the binomial standard error of measurement given in Crocker, Linda, and James Algina, 1986, Introduction to Classical & Modern Test Theory, Wadsworth Group, pages 124-127.

By including the “correction for obtaining unbiased estimates of population variance”, n/(n – 1), the SEM value increased from 1.48 to 1.55 (Table 29). This is a perfect match to the binomial SEM.]
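A sketch of the correction step, assuming the same made-up pattern (6 right out of 11) as in the earlier sketch:

    n, X = 11, 6                                           # items and number right (example values)
    p, q = X / n, 1 - X / n

    uncorrected = n * p * q                                # sum of squared deviations = X*(n - X)/n
    corrected = uncorrected * n / (n - 1)                  # apply n/(n - 1): equals X*(n - X)/(n - 1)

    print(round(uncorrected ** 0.5, 2))                    # CSEM without the correction, about 1.65
    print(round(corrected ** 0.5, 2))                      # binomial CSEM, about 1.73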

The two SEMs are then based on different sample sizes and different assumptions. The traditional SEM (1.36) is based on the raggedly distributed small sample size in hand. The binomial SEM (1.55) assumes a perfectly normally distributed large theoretical population.

[Variance calculations (variance is additive):

  • Test variance: Score deviations from the test mean (as counts), squared, and summed = a sum of squares (SS). SS/N = MSS or variance: 3.33. {Test SD = SQRT(Var) = 1.83. Test SEM = 1.36.}

  • Conditional error variance: Deviations from the student score (as a percent), squared, and summed = the conditional error variance (CVar) for that student score. {Test SEM = Average SQRT(CVar) = 1.48 (n) and 1.55 (n-1)}

  • Conditional error variance: variance within the score row (Excel VAR.P or VAR.S) x n = the CVar for that student score. {Test SEM: 1.48 using VAR.P and 1.55 using VAR.S.}] A short sketch of these calculations follows.
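Here is a minimal sketch of these three calculations, using Python's statistics module in place of the Excel functions; the score list and mark pattern are made-up examples, not the Table 29 data:

    import statistics as st

    scores = [5, 6, 7, 8, 6, 7, 5, 8, 6, 7, 6, 7]          # hypothetical right counts for 12 students
    marks = [1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0]              # one student's 11-item mark pattern

    # Test variance and SD: score deviations from the test mean, squared and summed, over N
    test_var = st.pvariance(scores)                        # SS / N, like Excel VAR.P
    test_sd = test_var ** 0.5

    # Conditional error variance for one student, by two equivalent routes
    n = len(marks)
    p = sum(marks) / n
    cvar_deviations = sum((m - p) ** 2 for m in marks)     # sum of squared deviations from the score
    cvar_var_p = st.pvariance(marks) * n                   # VAR.P of the mark row times n (same value)
    cvar_var_s = st.variance(marks) * n                    # VAR.S times n gives the corrected (n - 1) form

Averaging SQRT(CVar) across students gives test SEM figures on the order of the 1.48 and 1.55 values described above.
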
Squaring values produces curved distributions (Chart 73). The curves represent the possible values. They do not represent the number of items or student scores having those values.

The True MSS = Total MSS – Error MSS = 3.33 – 1.83 = 1.50. This involves subtracting a concave distribution, the item error variance, centered on its maximum value of 0.25 at a difficulty of 0.5 (not on the average item difficulty), from a convex distribution centered on the average test score.

The student score MSS is at a maximum when the item error SS is at a minimum. The error MSS is at a maximum (0.25) when the student score MSS is at a minimum (0.00). This makes sense: such an item is perfectly aligned with the student score distribution at the point where a score does not differ from the average test score.
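A small sketch of the two curves as they apply to an 11-item test; the test mean of 6 is just an example value:

    n = 11
    mean_score = 6.0                                       # example test mean

    for X in range(n + 1):                                 # every possible right count
        squared_deviation = (X - mean_score) ** 2          # convex curve, minimum 0.00 at the mean
        p = X / n
        error_variance = p * (1 - p)                       # concave curve, maximum 0.25 at p = 0.5
        print(X, squared_deviation, round(error_variance, 3))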

The KR20 is then a ratio, True MSS/Total MSS = 1.50/3.33 = 0.45. [KR20 ranges from 0 to 1, from not reproducible to fully reproducible.] The test SEM is then a portion, SQRT(1 – KR20), of the SD [also 1.83 in this example, SQRT(3.33)]: SQRT(1 – 0.45) * 1.83 = 1.36.
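The arithmetic in this paragraph as a short sketch, using the rounded values reported above:

    total_mss = 3.33                                       # variance of student scores
    error_mss = 1.83                                       # variance within item difficulties
    true_mss = total_mss - error_mss                       # 1.50

    kr20 = true_mss / total_mss                            # about 0.45
    sd = 1.83                                              # SQRT(3.33)
    sem = (1 - kr20) ** 0.5 * sd                           # about 1.36
    print(round(kr20, 2), round(sem, 2))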

I was able to hold the KR20-based test SEM estimate at 1.36 for all four tests by holding the SD of student scores and the item error MSS at constant values, which was done by switching a 0 and 1 pair in the student mark patterns. [The SD and the item error MSS do not have to be the same value.]

All possible individual student score binomial CSEM values for a test with 11 items are listed in Table 30. The CSEM is given as SQRT(conditional variance). The conditional variance is (X * (n – X))/(n – 1), or n*(p*q) * (n/(n – 1)). There is then no need to administer a test to calculate a student score binomial CSEM value. There is a need to administer a test to find the test SEM. The test SEM (Table 29) is the average of these values, 1.55.
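A sketch that lists these binomial CSEM values for an 11-item test (Table 30 covers the same ground):

    n = 11                                                 # items on the test
    for X in range(n + 1):                                 # every possible number right
        cvar = X * (n - X) / (n - 1)                       # binomial conditional variance
        print(X, round(cvar ** 0.5, 2))                    # binomial CSEM for that score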

The student CSEM values, and thus the test SEM, are derived only from student scores. They differ from the test SEM values derived from the KR20 (Table 31). With the KR20-derived values held constant, the binomial CSEM-derived SEM values decreased with higher test scores. This makes sense: there is less room for chance events, so precision increases with higher test scores.

Given a choice, a testing company would select the KR20 method using CTT analysis to report test SEM results.

[The same SEM values for the tests with 5 right and 6 right result from the fact that the median score was 5.5. Scores of 5 and 6 fall an equal distance from that point on either side, so X and n – X are simply interchanged (5 and 6, or 6 and 5, both summing to 11) and yield the same CSEM.]

I positioned the green curve on Chart 73 using the above information.

A CSEM value is independent of the average test score and the item difficulties. (Swapping paired 0s and 1s in student mark patterns to adjust the item error variance made no difference in the CSEM values.) The average of the CSEM values, the test SEM, does depend on the number of students obtaining each score. If all scores are the same, the CSEMs and the SEM will be the same (Tables 30 and 31).
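A quick check of this in Python, assuming a made-up mark pattern; moving one right answer to a different item changes item difficulties for the class but not this student's score or CSEM:

    def binomial_csem(marks):
        """Binomial CSEM computed from one student's 0/1 mark pattern."""
        n, X = len(marks), sum(marks)
        return (X * (n - X) / (n - 1)) ** 0.5

    original = [1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0]            # 6 right
    swapped = [1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0]             # a 0 and a 1 exchanged; still 6 right
    print(binomial_csem(original), binomial_csem(swapped))  # identical values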

I hope at this stage to have a visual mathematical model that is robust enough to make meaningful comparisons with the Rasch IRT model. I would like to return to this model and do two things (or have someone volunteer to do it):

  1. Combine all the features that have been teased out, in Chart 72 and Chart 73, into one model.
  2. Animate the model in a meaningful way with change gauges and history graphs.

Now to return to the Nursing data that represent the real classroom, filled with successful instruction, learning, and assessment.

- - - - - - - - - - - - - - - - - - - - -

Table26.xlsm is now available free by request. (Files hosted at nine-patch.com are also being relocated now that Nine-Patch Multiple-Choice, Inc has been dissolved.)

The Best of the Blog - FREE

The Visual Education Statistics Engine (VESEngine) presents the common education statistics on one traditional two-dimensional Excel spreadsheet. The post includes definitions. Download as .xlsm or .xls.

This blog started five years ago. It has meandered through several views. The current project is visualizing the VESEngine in three dimensions. The observed student mark patterns (on their answer sheets) are on one level. The variation in the mark patterns (variance) is on the second level.


Power Up Plus (PUP) is classroom friendly software used to score and analyze what students guess (traditional multiple-choice) and what they report as the basis for further learning and instruction (knowledge and judgment scoring multiple-choice). This is a quick way to update your multiple-choice to meet Common Core State Standards (promote understanding as well as rote memory). Knowledge and judgment scoring originated as a classroom project, starting in 1980, that converted passive pupils into self-correcting highly successful achievers in two to nine months. Download as .xlsm or .xls.