Wednesday, April 23, 2014

Test Scoring Math Model - Reliability

An estimate of the reliability or reproducibility of a test can be extracted from the variation within the tabled right marks (Table 25). The variance from within the item columns is related to the variance from within the student score column.

The error variance within items (2.96) and the total variance (MSS) between student scores (4.08) are both obtained from columns in Table 25b (blue, Chart 68). The true variance is then the total variance minus the error variance: 4.08 – 2.96 = 1.12.

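The ratio of true variance to the total variance between scores (1.12/4.08) becomes an indicator of test reliability (0.28). This makes sense: the larger the share of the score variance that is not item error, the more reproducible the scores.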

A test with perfect reliability (4.08/4.08 = 1.0) would have no variation (error variance = 0) within the item columns in Table 25. A test with no reliability (0.0/4.08 = 0.0) would show equal values (4.08) for the variance within item columns and the variance between test scores.
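The arithmetic above can be sketched in a few lines of Python. This is only an illustration using the rounded values quoted in this post; the function name `reliability_ratio` is made up for the example. With these rounded inputs the ratio comes out near 0.27, so the 0.28 quoted above presumably reflects unrounded values in Table 25b.

```python
def reliability_ratio(error_variance, total_variance):
    """Share of the total score variance that is true variance."""
    true_variance = total_variance - error_variance
    return true_variance / total_variance


total = 4.08  # total variance between student scores (rounded, from the post)
error = 2.96  # error variance within item columns (rounded, from the post)

observed = reliability_ratio(error, total)        # ratio for this test
perfect = reliability_ratio(0.0, total)           # no error variance at all
no_reliability = reliability_ratio(total, total)  # error equals total variance

print("true variance:", round(total - error, 2))
print("observed ratio:", observed)
print("perfect:", perfect, "none:", no_reliability)
```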

The KR20 formula then adjusts the above value (0.28 x 21/20) to 0.29 [from a large population (n) to a small sample value (n-1)]. The KR20 ratio has no unit labels (“var/var” = “”). All of the above takes place on the upper (variance) level of the math model.
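As a rough sketch, the whole KR20 calculation can be written out for a table of right marks. The code assumes the 21/20 factor is the item count over the item count minus one, as in the standard KR20 formula, and that variances are taken at the population (n) level. The function name `kr20` and the small 4 x 3 mark table are hypothetical; they are not the data in Table 25.

```python
def kr20(marks):
    """KR20 from a list of student rows, each a list of 0/1 right marks."""
    n_students = len(marks)
    n_items = len(marks[0])

    # Error variance: sum over items of p * (1 - p), p = proportion right.
    error_variance = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in marks) / n_students
        error_variance += p * (1 - p)

    # Total variance: population (n) variance of the student scores.
    scores = [sum(row) for row in marks]
    mean = sum(scores) / n_students
    total_variance = sum((s - mean) ** 2 for s in scores) / n_students

    ratio = (total_variance - error_variance) / total_variance
    return ratio * n_items / (n_items - 1)


# Hypothetical marks: 4 students by 3 items (not the post's Table 25).
marks = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(kr20(marks))

# The adjustment quoted in the post, using its rounded ratio:
post_kr20 = 0.28 * 21 / 20
print(round(post_kr20, 2))
```

For the hypothetical table, the total variance is 1.25 and the error variance is 0.625, so the ratio is 0.5 and the adjusted value is 0.5 x 3/2 = 0.75.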

Doubling the number of students taking the test (Chart 69) has no effect on reliability. Doubling the number of items doubles the error variance but increases the total variance by the square (a factor of four), so the test reliability increases from 0.29 to 0.64.
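The doubling effects can be checked with a short sketch. It reads "by the square" as a factor of four (2 squared), which is an assumption, and it uses the rounded variances from this post. Under that reading the unadjusted ratio comes out at about 0.64. The four student scores in the second part are hypothetical and only show that duplicating every student leaves the population variance of the scores unchanged.

```python
import statistics


def reliability_ratio(error_variance, total_variance):
    """True variance as a share of total variance."""
    return (total_variance - error_variance) / total_variance


error, total = 2.96, 4.08  # rounded values from the post

# Doubling the items: error variance x 2, total variance x 4 (assumed reading).
after_items = reliability_ratio(2 * error, 4 * total)
print(round(after_items, 2))

# Doubling the students: every score appears twice (hypothetical scores).
scores = [3, 2, 1, 0]
variance_once = statistics.pvariance(scores)
variance_twice = statistics.pvariance(scores * 2)
print(variance_once, variance_twice)
```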

The square root of the total variance between scores (4.08) yields the standard deviation (SD) for the score distribution [2.02 for (n) and 2.07 for (n-1)] on the lower floor of the math model.
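A minimal sketch of the step from variance to SD follows. The first part uses the total variance quoted in this post. The (n-1) value depends on the number of students, which is not restated here, so the second part uses a hypothetical list of scores to show how the (n) and (n-1) versions differ.

```python
import math
import statistics

total_variance = 4.08  # total variance between scores, (n) level, from the post
sd_n = math.sqrt(total_variance)
print(round(sd_n, 2))  # 2.02

# Hypothetical scores, only to contrast the two divisors.
scores = [3, 2, 1, 0]
sd_population = statistics.pstdev(scores)  # divides by n
sd_sample = statistics.stdev(scores)       # divides by n - 1
print(sd_population, sd_sample)
```

The (n-1) value is always the larger of the two, and the gap shrinks as the number of students grows.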

- - - - - - - - - - - - - - - - - - - - -

The Best of the Blog - FREE
• The Visual Education Statistics Engine (VESEngine) presents the common education statistics on one traditional two-dimensional Excel spreadsheet. The post includes definitions. Download as .xlsm or .xls.
• This blog started seven years ago. It has meandered through several views. The current project is visualizing the VESEngine in three dimensions. The observed student mark patterns (on their answer sheets) are on one level. The variation in the mark patterns is on a second level.
• Power Up Plus (PUP) is classroom-friendly software used to score and analyze what students guess (traditional multiple-choice) and what they report as the basis for further learning and instruction (knowledge and judgment scoring multiple-choice). This is a quick way to update your multiple-choice to meet Common Core State Standards (promote understanding as well as rote memory). Knowledge and judgment scoring originated as a classroom project, starting in 1980, that converted passive pupils into self-correcting highly successful achievers in two to nine months. Download as .xlsm or .xls. Quick Start.