{"id":237508,"date":"2025-06-17T09:44:17","date_gmt":"2025-06-17T09:44:17","guid":{"rendered":"https:\/\/learnexams.com\/blog\/?p=237508"},"modified":"2025-06-17T09:44:18","modified_gmt":"2025-06-17T09:44:18","slug":"load-the-baseball-dataset-from-the-islr-package-as-follows-libraryislr-datahitters","status":"publish","type":"post","link":"https:\/\/www.learnexams.com\/blog\/2025\/06\/17\/load-the-baseball-dataset-from-the-islr-package-as-follows-libraryislr-datahitters\/","title":{"rendered":"Load the baseball dataset from the ISLR package as follows: library(ISLR) data(Hitters)"},"content":{"rendered":"\n<p>Load the baseball dataset from the ISLR package as follows: library(ISLR) data(Hitters) \u2022 What is contained within the hitters dataset? Provide a brief description of the types of variables in the data (you don&#8217;t need to describe each variable individually; give a broad view of what the data tells you). \u2022 Produce a linear model, with salary being the response variable, based on what you know (or think you know!) about baseball. Use your intuition, your &#8220;gut&#8221; feeling, or just make up a random model! \u2022 Write a backward stepwise selection function that will produce models from the variables in the hitters dataset. The function should identify the &#8220;best&#8221; linear model to use for each of the following criteria: Akaike information criterion, Bayesian information criterion, and adjusted R\u00b2. \u2022 How do the models compare with each other in terms of coefficient values? Are they relatively similar, or very different?
How do these models compare to the linear model you came up with on your own?<\/p>\n\n\n\n<p><strong><mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-ast-global-color-1-color\">The Correct Answer and Explanation is:<\/mark><\/strong><\/p>\n\n\n\n<p>The <code>Hitters<\/code> dataset from the <strong>ISLR<\/strong> package contains 322 observations on 20 variables describing Major League Baseball players from the 1986 and 1987 seasons. It includes player statistics and characteristics such as hits, home runs, runs batted in, years of experience, and salary (in thousands of dollars). Broadly, the dataset mixes categorical variables (e.g., <code>League<\/code>, <code>Division<\/code>, <code>NewLeague<\/code>) with numeric variables (e.g., <code>AtBat<\/code>, <code>Hits<\/code>, <code>HmRun<\/code>, <code>RBI<\/code>, <code>Salary<\/code>). <code>Salary<\/code> is missing for 59 players, so the code below works with the 263 complete cases. The usual goal is to predict <code>Salary<\/code> from the performance statistics.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">1. Linear Model (Intuition-Based)<\/h3>\n\n\n\n<p>Let\u2019s build a basic linear model using some variables that intuitively relate to salary:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><code>library(ISLR)\ndata(Hitters)\n\n# Drop the players with a missing Salary\nHitters &lt;- na.omit(Hitters)\n\n# Hand-picked predictors: season offence, experience, career at-bats\nmodel_intuitive &lt;- lm(Salary ~ Hits + HmRun + RBI + Walks + Years + CAtBat, data = Hitters)\nsummary(model_intuitive)\n<\/code><\/pre>\n\n\n\n<p>This model includes offensive performance stats (<code>Hits<\/code>, <code>HmRun<\/code>, <code>RBI<\/code>, <code>Walks<\/code>), experience (<code>Years<\/code>), and total career at-bats (<code>CAtBat<\/code>).<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">2.
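Backward Stepwise Selection Function<\/h3>\n\n\n\n<p>We define a function that runs backward stepwise selection once for each of three criteria: AIC, BIC, and adjusted R\u00b2. <code>step()<\/code> handles AIC and BIC through its penalty argument <code>k<\/code>. It cannot optimize adjusted R\u00b2, so that search is a small loop that drops one predictor at a time for as long as adjusted R\u00b2 keeps rising.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><code>backward_selection &lt;- function(data, response) {\n  full_model &lt;- lm(as.formula(paste(response, \"~ .\")), data = data)\n\n  # AIC: step() uses the penalty k = 2 by default\n  aic_model &lt;- step(full_model, direction = \"backward\", trace = 0)\n  # BIC: same search with the penalty k = log(n)\n  bic_model &lt;- step(full_model, direction = \"backward\", k = log(nrow(data)), trace = 0)\n\n  # Adjusted R^2: step() has no option for it, so drop one term at a time\n  # for as long as the best single drop raises adjusted R^2\n  adj_r2 &lt;- function(m) summary(m)$adj.r.squared\n  adjr2_model &lt;- full_model\n  repeat {\n    terms_left &lt;- attr(terms(adjr2_model), \"term.labels\")\n    if (length(terms_left) == 0) break\n    candidates &lt;- lapply(terms_left, function(v) {\n      update(adjr2_model, as.formula(paste(\". ~ . -\", v)))\n    })\n    scores &lt;- sapply(candidates, adj_r2)\n    if (max(scores) &lt;= adj_r2(adjr2_model)) break\n    adjr2_model &lt;- candidates[[which.max(scores)]]\n  }\n\n  return(list(\n    AIC_Model = aic_model,\n    BIC_Model = bic_model,\n    AdjR2_Model = adjr2_model\n  ))\n}\n\nmodels &lt;- backward_selection(Hitters, \"Salary\")\n<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">3. Comparing the Models<\/h3>\n\n\n\n<p>The three criteria penalize model size differently, so they can select different subsets of predictors. AIC charges 2 per parameter, while BIC charges log(n), about 5.6 for the 263 complete cases, so the BIC-based model tends to be more conservative and usually includes fewer variables. Adjusted R\u00b2 carries the weakest penalty of the three and typically retains the most.<\/p>\n\n\n\n<p>By comparing their coefficient values:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><code>coef(models$AIC_Model)\ncoef(models$BIC_Model)\ncoef(models$AdjR2_Model)\n<\/code><\/pre>\n\n\n\n<p>We see that while some coefficients are similar, the models differ in coefficient magnitude and in which predictors are included.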
Compared with the initial intuitive model, the stepwise models are chosen by an explicit criterion rather than by judgment, though they are not always drastically different if the intuition was solid. Keep in mind that p-values reported after stepwise selection tend to be optimistic, because the same data were used to choose the predictors.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" src=\"https:\/\/learnexams.com\/blog\/wp-content\/uploads\/2025\/06\/learnexams-banner8-856.jpeg\" alt=\"\" class=\"wp-image-237509\"\/><\/figure>\n","protected":false},"excerpt":{"rendered":"<p>Load the baseball dataset from the ISLR package as follows: library(ISLR) data(Hitters) \u2022 What is contained within the hitters dataset? Provide a brief description of the types of variables in the data (you don&#8217;t need to describe each variable individually, give a broad view of what the data tells you). \u2022 Produce a linear model, [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"default","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","ast-disable-related-posts":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"default","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"footnotes":""},"categories":[25],"tags":[],"class_list":["post-237508","post","type-post","status-publish","format-standard","hentry","category-exams-certification"],"_links":{"self":[{"href":"https:\/\/www.learnexams.com\/blog\/wp-json\/wp\/v2\/posts\/237508","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.learnexams.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.learnexams.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.learnexams.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.learnexams.com\/blog\/wp-json\/wp\/v2\/comments?post=237508"}],"version-history":[{"count":0,"href":"https:\/\/www.learnexams.com\/blog\/wp-json\/wp\/v2\/posts\/237508\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.learnexams.com\/blog\/wp-json\/wp\/v2\/media?parent=237508"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.learnexams.com\/blog\/wp-json\/wp\/v2\/categories?post=237508"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.learnexams.com\/blog\/wp-json\/wp\/v2\/tags?post=237508"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}