
Solutions Manual for Foundations of Statistics for Data

Testbanks Dec 30, 2025 ★★★★☆ (4.0/5)

Solutions Manual for Foundations of Statistics for Data Scientists With R and Python, 1e by Alan Agresti, Maria Kateri (All Chapters)

Chapter 1

1.1 (a) (i) an individual voter, (ii) the 1882 voters in the exit poll, (iii) the 11.1 million people who voted
(b) Statistic: Sample percentage of 52.5% who voted for Feinstein
Parameter: Population percentage of 54.2% who voted for Feinstein

1.2 (a) Use a command such as in R,
> Students <- read.table(" + header=TRUE)
(b) (i) What proportion of the students in this sample responded yes for whether abortion should be legal in the first three months; (ii) Same question but for some population, such as all social science graduate students at the University of Florida

1.3 (a) Quantitative; (b) categorical; (c) categorical; (d) quantitative

1.4 (a) Religious affiliation (possible categories Christianity, Islam, Jewish, Hinduism, Buddhism, other, none)
(b) Body/mass index (BMI = (weight in kg)/(height in meters)^2)
(c) Number of children in family
(d) Height of a person

1.5 Ordinal, because categories have natural ordering

1.6 (a) College board score (e.g., SAT between 200 and 800)
(b) Time spent in college (measured by integer number of years)

1.7 In R, for students numbered 00001 to 52000,
> sample(1:52000, 10)
[1] 1687 18236 26783 35366 14244 11429 20973 31436 48476
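The draw in 1.7 changes on every run. A minimal sketch of a reproducible version follows, assuming an arbitrary seed value (the seed and the zero-padded labels are illustrative additions, not part of the original solution):

```r
# Reproducible simple random sample of 10 student IDs from 1..52000.
# The seed value 1 is an arbitrary choice made for illustration only.
set.seed(1)
ids <- sample(1:52000, 10)       # sample() draws without replacement by default

# Zero-padded labels matching the 00001 to 52000 numbering
labels <- sprintf("%05d", ids)
print(ids)
print(labels)
```

Fixing the seed lets someone else regenerate the same ten IDs; omitting `set.seed()` gives a fresh sample each time.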

1.8 (a) observational, (b) experiment, (c) observational, (d) experiment

1.9 Median = 4, mode = 2, expect mean larger than median because distribution is skewed right

1.10 (a)

Solutions Manual: Foundations of Statistical Science for Data Scientists

> Carbon <- read.table("http://stat4ds.rwth-aachen.de/data/Carbon_West.dat",
+                      header=TRUE)
> breaks <- seq(2.0, 18.0, by=2.0)
> freq <- table(cut(Carbon$CO2, breaks, right=FALSE))
> cbind(freq, freq/nrow(Carbon))
        freq
[2,4)      4 0.11428571
[4,6)     15 0.42857143
[6,8)      7 0.20000000
[8,10)     6 0.17142857
[10,12)    0 0.00000000
[12,14)    0 0.00000000
[14,16)    2 0.05714286
[16,18)    1 0.02857143
> hist(Carbon$CO2)
(b) Mean = 6.72, median = 5.90, standard deviation = 3.36
> mean(Carbon$CO2); median(Carbon$CO2); sd(Carbon$CO2)

1.11 Skewed to the right, because the mean is much larger than the median.
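The frequency table in 1.10(a) depends on `cut()` with `right=FALSE`, which makes each interval closed on the left and open on the right. The sketch below shows that behavior on a handful of hypothetical values placed on and between bin edges (these are not the Carbon data):

```r
# Hypothetical values chosen to sit on and between the bin edges
x <- c(2.0, 3.9, 4.0, 5.5, 7.9, 8.0)
breaks <- seq(2, 10, by = 2)

# right = FALSE gives left-closed bins [a, b), so 4.0 is counted in [4,6)
freq <- table(cut(x, breaks, right = FALSE))
print(cbind(freq, prop = as.vector(freq) / length(x)))
```

With the default `right=TRUE`, a value equal to a break would instead fall in the interval that ends at that break.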

1.12 Number of times you went to a gym in the last week; median = 0 if more than half of persons in the sample never went.

1.13 (a) 63,000 to 75,000; (b) 57,000 to 81,000; (c) 51,000 to 87,000. 100,000 would be unusual because it is more than 5 standard deviations above the mean.
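The intervals in 1.13 are the mean plus or minus 1, 2, and 3 standard deviations. The sketch below assumes a mean of 69,000 and a standard deviation of 6,000, values inferred from the midpoint and half-width of the interval in (a) rather than stated in the answer:

```r
# Mean and standard deviation implied by the interval in 1.13(a)
# (midpoint 69,000; half-width 6,000). These are inferred, not quoted.
m <- 69000
s <- 6000

k <- 1:3
intervals <- cbind(k = k, lower = m - k * s, upper = m + k * s)
print(intervals)

# Number of standard deviations that 100,000 lies above the mean
z <- (100000 - m) / s
print(z)
```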

1.14 A quarter of the states had less than 6% without insurance, and a quarter had more than 9.5% without insurance. Half the states had between 6% and 9.5% without insurance, encompassing an interquartile range of 3.5%.

1.15 Skewed to the right, because distances of median from LQ and minimum are less than from UQ and maximum.

1.16 (a) The percentages in 2018 (with the default composite weight) for (0, 1, 2, 3, 4, 5, 6, ≥7) are (9.4, 24.8, 24.9, 14.8, 10.7, 5.3, 3.5, 6.7), somewhat skewed to the right.
(b) Mode = 2, median = 2
(c) Mean = 2.8, standard deviation = 2.6. The lowest possible observation is only slightly more than a standard deviation below the mean, whereas in bell-shaped distributions, observations can occur two or three standard deviations from the mean in each direction.
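The mode and median in 1.16(b) can be read directly off the percentage distribution in (a). A minimal sketch, coding the open-ended category "≥7" as 7 (an assumption that affects neither the mode nor the median here, though it would bias a mean computed this way):

```r
# Percentage distribution from 1.16(a); 7 stands in for the open-ended ">= 7"
values <- 0:7
pct <- c(9.4, 24.8, 24.9, 14.8, 10.7, 5.3, 3.5, 6.7)

# Mode: the value with the largest percentage
mode_value <- values[which.max(pct)]

# Median: the first value whose cumulative proportion reaches 0.5
cum_prop <- cumsum(pct) / sum(pct)
median_value <- values[which(cum_prop >= 0.5)[1]]

print(c(mode = mode_value, median = median_value))
```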

1.17
> Murder <- read.table("http://stat4ds.rwth-aachen.de/data/Murder.dat", header=TRUE)
> Murder1 <- Murder[Murder$state!="DC",] # data frame without D.C.
(a) Mean = 4.87, standard deviation = 2.59
> mean(Murder1$murder); sd(Murder1$murder)
(b) Minimum = 1.0, LQ = 2.6, median = 4.85, UQ = 6.2, maximum = 12.4, somewhat skewed right
> summary(Murder1$murder); boxplot(Murder1$murder)
(c) Repeat the analysis above for Murder$murder, which includes D.C. The D.C. observation is a large outlier, causing the mean to increase (from 4.87 to 5.25) and the range to increase dramatically (from 11.4 to 23.2).
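The effect described in 1.17(c) can be seen on a small example. The values below are hypothetical (not the Murder data), and `summary_stats` is a helper name introduced only for this sketch:

```r
# Hypothetical values; the last one plays the role of an extreme observation
x_all <- c(2, 3, 4, 5, 6, 30)
x_trim <- x_all[x_all != 30]     # drop the outlier, as Murder1 drops D.C.

# Helper (illustrative name) returning mean, median, and range width
summary_stats <- function(x) {
  c(mean = mean(x), median = median(x), range = diff(range(x)))
}

res <- rbind(without_outlier = summary_stats(x_trim),
             with_outlier = summary_stats(x_all))
print(res)
```

The mean and range move a great deal when the extreme value is included, while the median barely changes.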

1.18 (a) Histogram is skewed right.


> Income <- read.table("http://stat4ds.rwth-aachen.de/data/Income.dat",
+                      header=TRUE); attach(Income)
> hist(income)
(b) Five number summary is min. = 16, lower quartile = 22, median = 30, upper quartile = 46.5, max. = 120; also mean = 37.52 and standard deviation = 20.67.
> summary(income); sd(income)
(c) Density approximation with default bandwidth = 6.85 is skewed right. Increasing the bandwidth (such as to 12) makes the curve smoother and more bell-shaped, but still skewed. Decreasing it (such as to 3) makes it much bumpier and probably a poorer portrayal of a corresponding population distribution.
> plot(density(income)) # default bandwidth = 6.85
> plot(density(income, bw=12))
(d)
> boxplot(income ~ race, xlab="Income", horizontal=TRUE)
> tapply(income, race, summary)
$B
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  16.00   19.50   24.00   27.75   31.00   66.00
$H
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   16.0    20.5    30.0    31.0    32.0    58.0
$W
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  18.00   24.00   37.00   42.48   50.00  120.00
> install.packages("tidyverse")
> library(tidyverse)
> Income %>% group_by(race) %>% summarize(n=n(),mean=mean(income),sd=sd(income))
  race     n  mean    sd
1 B       16  27.8  13.3
2 H       14  31    12.8
3 W       50  42.5  22.9
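For the bandwidth comparison in 1.18(c), the bandwidth actually used by `density()` is stored in the `bw` component of its result. The sketch below uses hypothetical right-skewed values rather than the Income data, so its default bandwidth will not equal 6.85:

```r
# Hypothetical right-skewed values (not the Income data)
x <- c(16, 18, 20, 22, 24, 26, 30, 35, 40, 50, 66, 120)

d_default <- density(x)           # default bandwidth rule is bw.nrd0
d_wide    <- density(x, bw = 12)  # larger bandwidth: smoother curve
d_narrow  <- density(x, bw = 3)   # smaller bandwidth: bumpier curve

print(c(default = d_default$bw, wide = d_wide$bw, narrow = d_narrow$bw))
```

Passing any of these objects to `plot()` draws the corresponding curve, which makes the smoothing difference visible.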

1.19 (a) Highly skewed right
> Houses <- read.table("http://stat4ds.rwth-aachen.de/data/Houses.dat",
+                      header=TRUE); attach(Houses)
> PriceH <- hist(price); hist(price) # save histogram to use its breaks
> breaks <- PriceH$breaks # breaks used in histogram
> freq <- table(cut(Houses$price, breaks, right=FALSE))
> cbind(freq, freq/nrow(Houses)) # frequency table (not shown)
(b) ȳ = 233.0, s = 151.9; 85%, not close to 68% because not bell-shaped but highly skewed
> length(case[mean(price)-sd(price)
nrow(Houses)
(c) The boxplot shows many large observations that are outliers.
> boxplot(price)
(d)
> tapply(Houses$price, Houses$new, summary)
$`0`
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   31.5   135.0   190.8   207.9   240.0   880.5
$`1`
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  158.8   256.9   427.5   436.4   519.7   866.2
New homes tend to have higher selling prices.
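The command for 1.19(b) is cut off in this copy. One way to get the share of observations within one standard deviation of the mean is sketched below on hypothetical skewed values; it is an illustration of the idea, not a reconstruction of the authors' command:

```r
# Hypothetical right-skewed values (not the Houses data)
x <- c(1, 2, 2, 3, 3, 3, 4, 4, 5, 30)

# Logical vector: TRUE where an observation is within one sd of the mean
within_one_sd <- abs(x - mean(x)) < sd(x)

# The mean of a logical vector is the proportion of TRUE values
prop_within <- mean(within_one_sd)
print(prop_within)
```

For bell-shaped data this proportion is near 0.68; heavy skew, as here, tends to push it higher because one extreme value inflates the standard deviation.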

1.20 (a) Clear trend that price tends to increase as size increases.


> plot(size, price)
(b) 0.834, strong positive association
> cor(size, price)
(c) Predicted price = −76.39 + 0.19(size), which is 113.5 thousand dollars at 1000 square feet and 683.2 thousand dollars at 4000 square feet.
> summary(lm(price ~ size)) # linear model: read the coefficient estimates
> pred <- function(x){-76.3894+0.1899*x}; pred(1000); pred(4000)

1.21 Correlation = 0.278 (positive but weak), predicted college GPA is 2.75 + 0.22(high school GPA), which is 3.6 for high school GPA of 4.0.
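In 1.20(c) the coefficients are copied by hand into `pred`. An alternative is to let `predict()` use the fitted model directly. The sketch below assumes a hypothetical data frame `toy` built to lie exactly on the reported line, so that it runs without the Houses file:

```r
# Hypothetical data constructed to lie exactly on the line reported in 1.20(c)
toy <- data.frame(size = c(1000, 2000, 3000, 4000))
toy$price <- -76.3894 + 0.1899 * toy$size

fit <- lm(price ~ size, data = toy)
print(coef(fit))                              # intercept and slope

# Predictions at 1000 and 4000 square feet, in thousands of dollars
new_sizes <- data.frame(size = c(1000, 4000))
preds <- predict(fit, newdata = new_sizes)
print(round(preds, 1))
```

Using `predict()` avoids rounding the coefficients by hand and extends to models with more than one explanatory variable.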

1.22
> Happy <- read.table("http://stat4ds.rwth-aachen.de/data/Happy.dat", header=TRUE)
> Happiness <- factor(Happy$happiness); Marital <- factor(Happy$marital)
> levels(Happiness) <- c("Very happy", "Pretty happy", "Not too happy")
> levels(Marital) <- c("Married", "Divorced/Separated", "Never married")
> table(Marital, Happiness) # forms contingency table
                    Happiness
Marital              Very happy Pretty happy Not too happy
  Married                   432          504            61
  Divorced/Separated         92          282           103
  Never married             124          409           135
> prop.table(table(Marital,Happiness), 1)
                    Happiness
Marital              Very happy Pretty happy Not too happy
  Married            0.43329990   0.50551655    0.06118355
  Divorced/Separated 0.19287212   0.59119497    0.21593291
  Never married      0.18562874   0.61227545    0.20209581
Married subjects are more likely to be very happy and less likely to be not too happy than the other subjects.
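The row proportions in 1.22 can be checked without the raw data file by entering the printed counts as a matrix. In this sketch, `counts` and `row_props` are names introduced for illustration:

```r
# Counts copied from the contingency table printed in 1.22
counts <- matrix(c(432, 504,  61,
                    92, 282, 103,
                   124, 409, 135),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(
                   Marital = c("Married", "Divorced/Separated", "Never married"),
                   Happiness = c("Very happy", "Pretty happy", "Not too happy")))

# margin = 1 divides each cell by its row total (conditional on marital status)
row_props <- prop.table(counts, margin = 1)
print(round(row_props, 3))
```

Each row sums to 1, so the entries are the estimated conditional distributions of happiness given marital status.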

1.23
> attach(Students)
> table(relig, abor)
     abor
relig  0  1
    0  1 14
    1  4 25
    2  1  6
    3  7  2
The very religious (attending every week) are less likely to support legal abortion (only 2 of the 9 observations in support).

1.24 (a) Values are skewed right, with mean 153.9 and median 119.8 and a very high outlier of 716 for the U.S.
(b) 0.90 between GDP and HDI.
(c) Correlation = 0.674, predicted CO2 = 1.926 + 0.178(GDP), which increases dramatically between 2.71 at the minimum GDP = 4.4 and 13.11 at the maximum GDP = 62.9.

1.25
> Races <- read.table("http://stat4ds.rwth-aachen.de/data/ScotsRaces.dat", header=TRUE)
> attach(Races)
> par(mfrow=c(2,2)) # a matrix of 2x2 plots in one graph
> boxplot(timeM); boxplot(timeW)
> hist(timeM); hist(timeW)
> summary(timeM)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  15.10   47.63   67.17   84.88  113.91  439.15
