What measures of central tendency and dispersion are the most appropriate to use with the following sets of data?
a. Salaries of 125 physicians in a clinic
b. The test scores of all medical students taking USMLE Step I of the National Board Examination in a given year
c. Serum sodium levels of healthy individuals
d. Number of tender joints in 30 joints evaluated on a standard examination for disease activity in rheumatoid arthritis patients
e. Presence of diarrhea in a group of infants
f. The disease stages for a group of patients with Reye’s syndrome (six stages, ranging from 0 = alert wakefulness to 5 = unarousable, flaccid paralysis, areflexia, pupils unresponsive)
g. The age at onset of breast cancer in females
h. The number of pills left in subjects’ medicine bottles when investigators in a study counted the pills to evaluate compliance in taking medication
The Correct Answer and Explanation:
Here is a breakdown of the appropriate measures of central tendency and dispersion for each data set, along with an explanation:
a. Salaries of 125 physicians in a clinic
- Central Tendency: Median (less affected by outliers like extremely high salaries).
- Dispersion: Interquartile Range (IQR) or Range; Standard Deviation only if the distribution is approximately normal.
Salaries can have outliers (e.g., a few highly paid specialists), so the median is a better measure than the mean in such cases. For dispersion, the IQR provides a robust measure unaffected by extreme values.
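To illustrate the point about outliers, here is a minimal sketch using Python's standard `statistics` module. The salary figures are made up for illustration (in thousands of dollars) and are not taken from any real clinic.

```python
import statistics

# Hypothetical salaries in thousands of dollars; the last two are outliers.
salaries = [180, 190, 200, 210, 220, 230, 240, 250, 900, 1200]

mean_salary = statistics.mean(salaries)      # pulled upward by the two outliers
median_salary = statistics.median(salaries)  # middle of the sorted values

# quantiles(..., n=4) returns the three quartile cut points (Q1, Q2, Q3).
q1, _, q3 = statistics.quantiles(salaries, n=4)
iqr = q3 - q1
sd = statistics.stdev(salaries)              # sample standard deviation

print(f"mean={mean_salary}, median={median_salary}")
print(f"IQR={iqr}, SD={sd:.1f}")
```

With these made-up numbers the mean (382) sits well above the median (225), and the standard deviation is inflated by the two extreme values while the IQR is not.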
b. Test scores of medical students (USMLE Step I)
- Central Tendency: Mean (if scores are normally distributed) or Median (if skewed).
- Dispersion: Standard Deviation (common for test scores) or Range.
Test scores often approximate a normal distribution, making the mean and standard deviation appropriate.
c. Serum sodium levels of healthy individuals
- Central Tendency: Mean (assuming normal distribution).
- Dispersion: Standard Deviation.
Serum sodium levels in healthy individuals are typically tightly regulated and normally distributed, so the mean and standard deviation are appropriate.
d. Number of tender joints in rheumatoid arthritis patients
- Central Tendency: Median (the distribution is likely skewed, since the count is bounded between 0 and 30 and may cluster at zero or low numbers).
- Dispersion: IQR (better for skewed data) or Range.
Count data often have a skewed distribution, and the median and IQR are robust measures.
e. Presence of diarrhea in infants (yes/no)
- Central Tendency: Proportion (percentage of infants with diarrhea).
- Dispersion: None (nominal data doesn’t have a meaningful measure of spread).
Binary outcomes are best summarized with proportions or percentages.
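As a small sketch of summarizing a yes/no variable, the proportion can be computed directly from a list of Boolean observations. The data below are hypothetical.

```python
# Hypothetical yes/no observations for 10 infants (True = diarrhea present).
has_diarrhea = [True, False, False, True, False,
                False, False, True, False, False]

# True counts as 1 when summed, so the sum is the number of "yes" cases.
n_cases = sum(has_diarrhea)
proportion = n_cases / len(has_diarrhea)

print(f"{n_cases} of {len(has_diarrhea)} infants ({proportion:.0%}) had diarrhea")
```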
f. Disease stages for Reye’s syndrome
- Central Tendency: Median (ordinal data).
- Dispersion: Range (e.g., from Stage 0 to Stage 5).
Ordinal data like stages are not suited for means but can use medians and ranges.
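A brief sketch for ordinal data, assuming a made-up list of stage codes: `statistics.median_low` returns an actually observed value instead of averaging the two middle values, which avoids reporting a stage such as 2.5 that does not exist on the scale.

```python
import statistics

# Hypothetical Reye's syndrome stages (0-5) for eight patients.
stages = [0, 1, 1, 2, 2, 3, 4, 5]

# median_low picks the lower of the two middle values when n is even,
# so the result is always one of the observed stages.
median_stage = statistics.median_low(stages)

# Report the range as (lowest stage, highest stage) rather than a difference,
# because distances between ordinal categories are not meaningful.
stage_range = (min(stages), max(stages))

print(f"median stage = {median_stage}, range = {stage_range[0]} to {stage_range[1]}")
```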
g. Age at onset of breast cancer in females
- Central Tendency: Median (accounts for skewness, as age data may include outliers).
- Dispersion: IQR or Range.
Age data is often skewed, and robust measures like the median and IQR are better.
h. Number of pills left in medicine bottles
- Central Tendency: Median (if skewed, as counts are often skewed).
- Dispersion: Range or IQR.
Counts are discrete and often skewed, making the median and IQR appropriate.
Explanation:
The choice of measures of central tendency and dispersion depends on the type and distribution of the data. For continuous data (e.g., salaries, test scores, serum sodium levels, ages), the mean and standard deviation are suitable when the data is normally distributed. For skewed data, the median and IQR are better because they are less influenced by outliers. For ordinal data (e.g., disease stages), the median provides a meaningful center, while the range captures variability. For nominal data (e.g., presence of diarrhea), proportions or percentages are used, as these data lack an inherent numerical order. For count data, the median and IQR are often robust choices due to potential skewness. Always assess the data distribution before selecting appropriate measures.
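The advice to assess the distribution first can be sketched as a small helper. This is only an illustration: the function names `sample_skewness` and `summarize`, the cutoff of 1.0 for "skewed", and both data sets are assumptions made here, not a standard rule. In practice a histogram or box plot is usually inspected as well.

```python
import statistics

def sample_skewness(values):
    """Moment-based skewness: mean cubed deviation over cubed population SD."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return 0.0
    return sum((v - mean) ** 3 for v in values) / (len(values) * sd ** 3)

def summarize(values, skew_threshold=1.0):
    """Return (center_name, center, spread_name, spread) for numeric data.

    The threshold is an illustrative assumption, not an established cutoff.
    """
    if abs(sample_skewness(values)) < skew_threshold:
        return ("mean", statistics.fmean(values), "sd", statistics.stdev(values))
    q1, _, q3 = statistics.quantiles(values, n=4)
    return ("median", statistics.median(values), "iqr", q3 - q1)

# Hypothetical data: roughly symmetric sodium levels (mmol/L) and
# right-skewed salaries (thousands of dollars).
sodium = [138, 139, 140, 140, 141, 141, 142, 143]
salaries = [180, 190, 200, 210, 220, 230, 240, 250, 900, 1200]

print(summarize(sodium))
print(summarize(salaries))
```

With these made-up values the symmetric sodium data are summarized by the mean and standard deviation, and the skewed salary data by the median and IQR.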