The seven statistics reviewed in previous posts need to be
related to the underlying mathematics. Traditional multiple-choice (TMC) data
analysis has so far been expressed entirely with charts and the Excel spreadsheet VESEngine.
I need a TMC math model to compare TMC with the Rasch model, the item
response theory (IRT) method that dominates data analysis for standardized tests.

A mathematical model contains the relationships and
variables listed in the charts and tables. This post applies the advice on
learning discussed in the previous post. It starts with the observed variables.
The mathematical model then summarizes the relationships in the seven
statistics.

The model contains two levels (Table 25). The first floor
level contains the observed mark patterns. The second floor level contains the
squared deviations from the score and item means: the variation in the mark
patterns. The squared values are then averaged to produce the variance.
[Variance = Mean sum of squares = MSS]

1. Count

The right marks are counted for each student and each item
(question). TMC (0-wrong, 1-right) captures quantity only. Knowledge and
Judgment Scoring (KJS) and the partial credit Rasch model (PCRM) capture
quantity and quality: 0-wrong, 1-have yet to learn this, 2-right.

Hall JR Count = SUM(right marks) = 20

Item 12 Count = SUM(right marks) = 21
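As a sketch, the counting step can be reproduced in Python on a small hypothetical 0/1 mark matrix (a stand-in; the post's actual 22 x 21 matrix is not reproduced here):

```python
# Hypothetical 0/1 mark matrix: rows are students, columns are items.
# (Stand-in for the post's 22 x 21 matrix, which is not reproduced here.)
marks = [
    [1, 0, 1, 1],
    [1, 1, 0, 1],
    [0, 1, 1, 1],
]

# Student counts: sum the right marks across each row.
student_counts = [sum(row) for row in marks]

# Item counts: sum the right marks down each column.
item_counts = [sum(col) for col in zip(*marks)]

print(student_counts)  # [3, 3, 3]
print(item_counts)     # [2, 2, 2, 3]
```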

2. Mean (Average)

The sum is divided by the number of counts (N = 22 students; n = 21 items).

The SUM of scores / N = 16.77; 16.77/n = 0.80 = 80%

The SUM of items / n = 17.57; 17.57/N = 0.80 = 80%
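A minimal Python check of this arithmetic, assuming a total of 369 right marks (the total implied by a mean score of 16.77 over 22 students):

```python
N, n = 22, 21        # N students, n items (from the post)
total_right = 369    # assumed total right marks, implied by a 16.77 mean score

mean_score = total_right / N   # mean student score, in marks
mean_item = total_right / n    # mean right marks per item

print(round(mean_score, 2), round(mean_score / n, 2))  # 16.77 0.8
print(round(mean_item, 2), round(mean_item / N, 2))    # 17.57 0.8
```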

3. Variance

The variation within any column or row is harvested as the deviation
of each mark in a student (row) or item (column) mark pattern, or of each student
score, from the mean value. The squared deviations are summed and
averaged as the variance on the top level of the mathematical model (Table 25).

Variance = SUM(Deviations^2)/(N or n) = SUM of Squares/(N or n) = Mean SS = MSS
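A small sketch of the variance (MSS) calculation on a hypothetical list of student scores (stand-in data; the post's score list is not reproduced):

```python
# Hypothetical student scores (stand-in data, not the post's actual scores).
scores = [14, 16, 17, 18, 19]

mean = sum(scores) / len(scores)
squared_deviations = [(s - mean) ** 2 for s in scores]

# Variance = mean sum of squares (MSS): divide by N for a population variance.
variance = sum(squared_deviations) / len(scores)
print(round(variance, 2))  # 2.96
```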

4. Standard Deviation

The variation within a
score, item, or probability distribution, expressed on a normal scale: the
mean +/- 1 Standard Deviation (1SD) includes about 2/3 of a normal,
bell-shaped distribution.

SD = Square Root of
Variance or MSS = SQRT(MSS) = SQRT(4.08) = 2.02

For small classroom tests
the (N-1) SD = SQRT(4.28) = 2.07 marks
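Using the post's two variance figures, the square roots can be verified in Python:

```python
import math

variance_n = 4.08    # MSS for student scores, dividing by N (from the post)
variance_n1 = 4.28   # small-sample version, dividing by N - 1 (from the post)

sd_n = math.sqrt(variance_n)
sd_n1 = math.sqrt(variance_n1)

print(round(sd_n, 2), round(sd_n1, 2))  # 2.02 2.07
```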

The variation in student
scores and the distribution of student scores are now expressed on the same
normal scale.

5. Test Reliability

The ratio of the true
variance to the score variance estimates the test reliability: the Kuder-Richardson
20 (KR20). The score (marginal column) variance – the error (summed from within
Item columns) variance = the true variance.

KR20 = ((score variance -
error variance)/score variance) x (n/(n - 1))

KR20 = ((4.08 -
2.96)/4.08) x 21/20 = 0.29

This ratio is returned to
the first floor of the model. An acceptable classroom test has a KR20 > 0.7.
An acceptable standardized test has a KR20 > 0.9.
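The KR20 computation, as a Python sketch using the post's variances:

```python
n = 21                 # number of items
score_variance = 4.08  # variance of student scores (marginal column)
error_variance = 2.96  # error variance summed from within the item columns

kr20 = ((score_variance - error_variance) / score_variance) * (n / (n - 1))
print(round(kr20, 2))  # 0.29
```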

6. Traditional Standard Error of Measurement

The range of error in
which 2/3 of the time your retest score may fall is the standard error of
measurement (SEM). The traditional SEM is based on the average performance of
your class: 16.77 +/- 1SD (+/- 2.07 marks).

SEM = SQRT(1-KR20) * SD =
SQRT(1- 0.29) * 2.07 = +/-1.75 marks

On a test that is totally
reliable (KR20 = 1), the SEM is zero. You can expect to get the same score on a
retest.
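The SEM arithmetic can be checked in Python (using the unrounded KR20, which is why the result matches the post's 1.75):

```python
import math

# Unrounded KR20 from the post's variances; the rounded 0.29 gives 1.74 instead.
kr20 = ((4.08 - 2.96) / 4.08) * (21 / 20)
sd = 2.07  # the (N - 1) standard deviation, in marks

sem = math.sqrt(1 - kr20) * sd
print(round(sem, 2))  # 1.75
```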

7. Conditional Standard Error of Measurement

The range of error in which
2/3 of the time your retest score may fall based on the rank of your test score
alone (conditional on one score rank) is the conditional standard error of
measurement (CSEM). The estimate is based (conditional) on your test score
rather than on the average class test score.

CSEM = SQRT((Variance
within your Score) * n) = SQRT(MSS * n) = SQRT(SS), where n is the number of questions

CSEM = SQRT(0.15 * 21) =
SQRT(3.15) = about 1.8 marks
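A quick Python check of the CSEM arithmetic (the 0.15 within-score variance is the post's rounded figure, so the result is approximate):

```python
import math

n = 21                   # number of items
within_score_mss = 0.15  # variance within one student's mark pattern (rounded)

csem = math.sqrt(within_score_mss * n)
print(round(csem, 1))  # 1.8
```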

The average of the CSEM values (1.75)
across the entire class (light green) also yields the test SEM. This confirms the
calculation in 6. Traditional Standard Error of Measurement above.

This mathematical model (Table
25) separates the flat display in the VESEngine into two
distinct levels. The lower floor is on a normal scale. The upper floor isolates
the variation within the marking patterns on the lower floor. The resulting
variance provides insight into the extent to which the marking patterns could have
occurred by luck on test day, and into the performance of teachers, students,
questions, and the test makers. Limited predictions can also be made.

Predictions are limited
using traditional multiple-choice (TMC) as students have only two options:
0-wrong and 1-right. Quantity and quality are linked into a single ranking. Knowledge and Judgment Scoring (KJS) and
the partial credit Rasch model
(PCRM) separate quantity and quality: 0-wrong, 1-have yet to learn, and
2-right. Students are free to report what they know and can do accurately,
honestly, and fairly.

- - - - - - - - - - - - - - - - - - - - -

Free
software to help you and your students experience and understand how to break
out of traditional multiple-choice (TMC) and into Knowledge and Judgment
Scoring (KJS) (tricycle to bicycle):
