Test Scores

Consider the following table comparing the grade averages and mathematics SAT scores of high school students in 1988 and 1998.

| Grade Average | % Students 1988 | % Students 1998 | SAT Score 1988 | SAT Score 1998 | Change |
|---|---|---|---|---|---|
| A+ | 4 | 7 | 632 | 629 | -3 |
| A | 11 | 15 | 586 | 582 | -4 |
| A- | 13 | 16 | 556 | 554 | -2 |
| B | 53 | 48 | 490 | 487 | -3 |
| C | 19 | 14 | 431 | 428 | -3 |
| Overall average | | | 504 | 514 | +10 |

Source: Cited in Chance, Vol. 12, No. 2, 1999, from data in New York Times, September 2, 1999.

The Correct Answer and Explanation is:
Correct Answer
The data reveals a statistical phenomenon known as Simpson’s Paradox. While the average mathematics SAT score for students within every specific grade average category (A+, A, A-, B, C) declined from 1988 to 1998, the overall average SAT score for all students increased by 10 points.
This occurred because the distribution of students across the grade categories changed significantly. A much larger percentage of students achieved higher grades in 1998 than in 1988. This shift in weighting toward the higher-performing groups was substantial enough to raise the overall average, even though performance within each individual group slightly decreased.
Explanation
The table presents a fascinating statistical puzzle: how can the overall average score increase when the average score for every single subgroup decreases? This counterintuitive result is a classic example of Simpson’s Paradox, which occurs when a trend that appears in different groups of data disappears or reverses when the groups are combined.
The key to understanding this paradox lies in the changing distribution of students, as shown in the “% Students” columns. The overall SAT average is a weighted average. The score of each grade group is weighted by the percentage of students who earned that grade.
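The weighted-average calculation can be checked directly. The following is a minimal sketch, assuming the shares and scores are as given in the table, with the A+ shares (4% and 7%) taken from the sums quoted in this explanation. The names `data_1988`, `data_1998`, and `overall_average` are illustrative only.

```python
# Grade-group data: (share of students in %, mean math SAT score).
# The A+ shares (4 and 7) are assumed from the sums 4+11+13 and 7+15+16.
data_1988 = {"A+": (4, 632), "A": (11, 586), "A-": (13, 556), "B": (53, 490), "C": (19, 431)}
data_1998 = {"A+": (7, 629), "A": (15, 582), "A-": (16, 554), "B": (48, 487), "C": (14, 428)}


def overall_average(groups):
    """Weighted mean of group scores, each weighted by its share of students."""
    total_share = sum(share for share, _ in groups.values())
    return sum(share * score for share, score in groups.values()) / total_share


avg_1988 = overall_average(data_1988)
avg_1998 = overall_average(data_1998)

print(f"1988 overall: {avg_1988:.2f}")  # 503.61, reported as 504
print(f"1998 overall: {avg_1998:.2f}")  # 513.65, reported as 514
```

Rounded to whole points, the results match the overall averages of 504 and 514 reported in the table.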
- Shift in Student Distribution: Between 1988 and 1998, there was significant “grade inflation.” The percentage of students earning grades of A-, A, or A+ increased from 28% (4+11+13) to 38% (7+15+16). Correspondingly, the percentage of students earning B’s and C’s decreased from 72% (53+19) to 62% (48+14).
- The Power of Weighting: Students with higher grades (like A+) have much higher SAT scores than students with lower grades (like C). In 1998, a larger proportion of the student population was in these high-scoring groups. Even though an “A” student in 1998 scored slightly lower than an “A” student in 1988 (582 vs. 586), the fact that there were more “A” students in the 1998 cohort gave their high scores more weight in the overall calculation.
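One way to see how much the weighting matters is a counterfactual: apply the 1998 group scores to the 1988 grade distribution. The sketch below assumes the table values as given (A+ shares of 4% and 7% inferred from the quoted sums); `weighted_mean` and the variable names are hypothetical helpers, not part of the original source.

```python
# Shares (%) and mean SAT scores in the order A+, A, A-, B, C.
shares_1988 = [4, 11, 13, 53, 19]
shares_1998 = [7, 15, 16, 48, 14]
scores_1988 = [632, 586, 556, 490, 431]
scores_1998 = [629, 582, 554, 487, 428]


def weighted_mean(shares, scores):
    """Mean of scores weighted by the share of students in each group."""
    return sum(w * s for w, s in zip(shares, scores)) / sum(shares)


actual_1988 = weighted_mean(shares_1988, scores_1988)
actual_1998 = weighted_mean(shares_1998, scores_1998)
# Counterfactual: 1998 scores, but the grade distribution frozen at 1988.
fixed_mix_1998 = weighted_mean(shares_1988, scores_1998)

print(f"1988 actual:              {actual_1988:.2f}")     # 503.61
print(f"1998 with 1988 grade mix: {fixed_mix_1998:.2f}")  # 500.63
print(f"1998 actual:              {actual_1998:.2f}")     # 513.65
```

With the grade mix held at its 1988 values, the overall average would have fallen by about 3 points, in line with the within-group declines. The observed 10-point rise appears only once the 1998 grade distribution is used.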
In essence, the increase in the overall average SAT score is not due to students becoming better prepared within their respective achievement levels. Instead, it is a compositional effect. The overall student body in 1998 was composed of a greater proportion of high-achieving (and thus higher-scoring) students, and this structural shift was powerful enough to pull the combined average up, masking the slight decline in scores within each fixed grade category.
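The compositional effect can be separated from the within-group change with a simple additive split. The sketch below is one possible decomposition, not the only one: it values the shift in grade mix at 1988 scores and the score changes at 1998 shares, and a different base year would give slightly different parts. The table values are assumed as given, and all names are illustrative.

```python
# Shares (%) and mean SAT scores in the order A+, A, A-, B, C.
shares_1988 = [4, 11, 13, 53, 19]
shares_1998 = [7, 15, 16, 48, 14]
scores_1988 = [632, 586, 556, 490, 431]
scores_1998 = [629, 582, 554, 487, 428]

# Composition effect: change in shares, valued at 1988 scores.
composition_effect = sum(
    (w98 - w88) * s88
    for w88, w98, s88 in zip(shares_1988, shares_1998, scores_1988)
) / 100

# Within-group effect: change in scores, weighted by 1998 shares.
within_effect = sum(
    w98 * (s98 - s88)
    for w98, s88, s98 in zip(shares_1998, scores_1988, scores_1998)
) / 100

total_change = composition_effect + within_effect

print(f"Composition effect:  {composition_effect:+.2f}")  # +13.03
print(f"Within-group effect: {within_effect:+.2f}")       # -2.99
print(f"Total change:        {total_change:+.2f}")        # +10.04
```

Under this split, the shift toward higher grade categories adds about 13 points, the within-group declines subtract about 3, and the net is the roughly 10-point increase in the overall average.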
