{"id":220407,"date":"2025-05-28T07:59:40","date_gmt":"2025-05-28T07:59:40","guid":{"rendered":"https:\/\/learnexams.com\/blog\/?p=220407"},"modified":"2025-05-28T07:59:50","modified_gmt":"2025-05-28T07:59:50","slug":"the-goal-of-this-homework-is-to-help-you-better-understand-the-statistical-properties-and-computational-challenges-of-local-smoothing-such-as-loess-nadaraya-watson-nw-kernel-smoothing-and-spline-s","status":"publish","type":"post","link":"https:\/\/www.learnexams.com\/blog\/2025\/05\/28\/the-goal-of-this-homework-is-to-help-you-better-understand-the-statistical-properties-and-computational-challenges-of-local-smoothing-such-as-loess-nadaraya-watson-nw-kernel-smoothing-and-spline-s\/","title":{"rendered":"The goal of this homework is to help you better understand the statistical properties and computational challenges of local smoothing such as loess, Nadaraya-Watson (NW) kernel smoothing, and spline smoothing."},"content":{"rendered":"\n<p>ISyE 7406: Data Mining &amp; Statistical Learning HW#4 Local Smoothing in R. The goal of this homework is to help you better understand the statistical properties and computational challenges of local smoothing such as loess, Nadaraya-Watson (NW) kernel smoothing, and spline smoothing. For this purpose, we will compute empirical bias and empirical variances based on m = 1000 Monte Carlo runs, where in each run we simulate a data set of n = 101 observations from the additive model Y<sub>i<\/sub> = f(x<sub>i<\/sub>) + \u03b5<sub>i<\/sub> (1) with the famous Mexican hat function f(x) = (1 &#8211; x\u00b2) exp(-0.5x\u00b2), -2\u03c0 \u2264 x \u2264 2\u03c0, (2) and \u03b5<sub>1<\/sub>, \u00b7\u00b7\u00b7, \u03b5<sub>n<\/sub> are independent and identically distributed (iid) N(0, 0.2\u00b2). This function is known to pose a variety of estimation challenges, and below we explore the difficulties inherent in this function. (1) Let us first consider the (deterministic fixed) design with equi-distant points in [-2\u03c0, 2\u03c0]. 
(a) For each of m = 1000 Monte Carlo runs, simulate or generate a data set of the form (x<sub>i<\/sub>, Y<sub>i<\/sub>) with x<sub>i<\/sub> = 2\u03c0(-1 + 2(i-1)\/(n-1)) and Y<sub>i<\/sub> generated from the model in (1). Denote the data set from the j-th Monte Carlo run as D<sub>j<\/sub>, for j = 1, \u00b7\u00b7\u00b7, m = 1000. (b) For each data set D<sub>j<\/sub>, that is, for each Monte Carlo run, compute the three different kinds of local smoothing estimates at every point in D<sub>j<\/sub>: loess (with span = 0.75), Nadaraya-Watson (NW) kernel smoothing with Gaussian kernel and bandwidth = 0.2, and spline smoothing with the default tuning parameter. (c) At each point x<sub>i<\/sub>, for each local smoothing method, based on m = 1000 Monte Carlo runs, compute the empirical bias Bias{f(x<sub>i<\/sub>)} and the empirical variance Var{f(x<sub>i<\/sub>)}, where Bias{f(x<sub>i<\/sub>)} = f\u0304(x<sub>i<\/sub>) &#8211; f(x<sub>i<\/sub>), Var{f(x<sub>i<\/sub>)} = (1\/(m &#8211; 1)) \u03a3<sub>j=1<\/sub><sup>m<\/sup> (f\u0302<sup>(j)<\/sup>(x<sub>i<\/sub>) &#8211; f\u0304(x<sub>i<\/sub>))\u00b2, and f\u0304(x<sub>i<\/sub>) = (1\/m) \u03a3<sub>j=1<\/sub><sup>m<\/sup> f\u0302<sup>(j)<\/sup>(x<sub>i<\/sub>) is the Monte Carlo average of the estimates. (d) Plot these quantities against x<sub>i<\/sub> for all three kinds of local smoothing estimators: loess, NW kernel, and spline smoothing. (e) Provide a thorough analysis of what the plots suggest, e.g., which method is better\/worse on bias, variance, and mean square error (MSE)? Do you think it is a fair comparison between these three methods? Why or why not? (2) Repeat part (1) with another (deterministic) design that has non-equidistant points. 
The following R code can be used to generate the design points x<sub>i<\/sub> (you can keep the x<sub>i<\/sub> fixed in the m = 1000 Monte Carlo runs): set.seed(79) x<\/p>\n\n\n\n<p><strong><mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-ast-global-color-1-color\">The Correct Answer and Explanation is:<\/mark><\/strong><\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h3 class=\"wp-block-heading\">\u2705 R Code: Local Smoothing Bias and Variance Analysis (Equidistant Design)<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code># loess(), ksmooth() and smooth.spline() are all in the stats package,\n# which R attaches by default, so no extra packages are needed.\nset.seed(123)\n\n# Parameters\nm &lt;- 1000      # number of Monte Carlo runs\nn &lt;- 101       # number of design points\nsigma &lt;- 0.2   # noise standard deviation\nx_seq &lt;- seq(-2 * pi, 2 * pi, length.out = n)\nf_true &lt;- function(x) (1 - x^2) * exp(-0.5 * x^2)\nfx &lt;- f_true(x_seq)\n\n# Initialize storage: one row per run, one column per design point\nloess_est &lt;- matrix(0, nrow = m, ncol = n)\nnw_est &lt;- matrix(0, nrow = m, ncol = n)\nspline_est &lt;- matrix(0, nrow = m, ncol = n)\n\n# Monte Carlo simulations\nfor (j in 1:m) {\n  epsilon &lt;- rnorm(n, 0, sigma)\n  y &lt;- fx + epsilon\n  \n  # Loess\n  loess_fit &lt;- loess(y ~ x_seq, span = 0.75)\n  loess_est&#91;j, ] &lt;- predict(loess_fit, x_seq)\n  \n  # NW kernel (ksmooth scales the kernel so its quartiles sit at +\/- 0.25 * bandwidth)\n  nw_fit &lt;- ksmooth(x_seq, y, kernel = \"normal\", bandwidth = 0.2, x.points = x_seq)\n  nw_est&#91;j, ] &lt;- nw_fit$y\n  \n  # Spline smoothing (default: smoothing parameter chosen by GCV)\n  spline_fit &lt;- smooth.spline(x_seq, y)\n  spline_est&#91;j, ] &lt;- predict(spline_fit, x_seq)$y\n}\n\n# Compute bias and variance at each design point\nbias &lt;- function(est_matrix) colMeans(est_matrix) - fx\nvariance &lt;- function(est_matrix) apply(est_matrix, 2, var)\n\nbias_loess &lt;- bias(loess_est)\nbias_nw &lt;- bias(nw_est)\nbias_spline &lt;- bias(spline_est)\n\nvar_loess &lt;- variance(loess_est)\nvar_nw &lt;- variance(nw_est)\nvar_spline &lt;- variance(spline_est)\n\n# Plotting\npar(mfrow = c(3, 1), mar = c(4, 4, 2, 1))\nplot(x_seq, bias_loess, type = \"l\", col = \"blue\", ylim = 
range(bias_loess, bias_nw, bias_spline), ylab = \"Bias\", main = \"Empirical Bias\")\nlines(x_seq, bias_nw, col = \"red\")\nlines(x_seq, bias_spline, col = \"green\")\nlegend(\"topright\", legend = c(\"Loess\", \"NW\", \"Spline\"), col = c(\"blue\", \"red\", \"green\"), lty = 1)\n\nplot(x_seq, var_loess, type = \"l\", col = \"blue\", ylim = range(var_loess, var_nw, var_spline), ylab = \"Variance\", main = \"Empirical Variance\")\nlines(x_seq, var_nw, col = \"red\")\nlines(x_seq, var_spline, col = \"green\")\n\n# MSE = squared bias + variance\nmse_loess &lt;- bias_loess^2 + var_loess\nmse_nw &lt;- bias_nw^2 + var_nw\nmse_spline &lt;- bias_spline^2 + var_spline\nplot(x_seq, mse_loess, type = \"l\", col = \"blue\", ylim = range(mse_loess, mse_nw, mse_spline), ylab = \"MSE\", main = \"Mean Squared Error\")\nlines(x_seq, mse_nw, col = \"red\")\nlines(x_seq, mse_spline, col = \"green\")\n<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h3 class=\"wp-block-heading\">Explanation<\/h3>\n\n\n\n<p>This experiment estimates the pointwise bias, variance, and MSE of three smoothers (loess, Nadaraya-Watson (NW) kernel smoothing, and spline smoothing) from 1000 Monte Carlo simulations on a fixed equidistant design. The comments below describe what the chosen settings imply; they should be checked against the plots produced by the code above.<\/p>\n\n\n\n<p><strong>Loess Smoothing:<\/strong> With span = 0.75, each local fit uses 75% of the 101 points, a window far wider than the central peak and side troughs of the &#8220;Mexican hat&#8221; function. Loess should therefore show the <strong>lowest variance<\/strong> of the three but <strong>substantial bias<\/strong> around x = 0, where it flattens the peak, and little bias in the flat tails.<\/p>\n\n\n\n<p><strong>NW Kernel Smoothing:<\/strong> <code>ksmooth<\/code> scales the kernel so that its quartiles lie at \u00b10.25 \u00d7 bandwidth, so bandwidth = 0.2 corresponds to a Gaussian standard deviation of about 0.074, smaller than the grid spacing of about 0.126. Each estimate is essentially a weighted average of a point and its immediate neighbors, which gives <strong>low bias<\/strong> even in high-curvature regions but <strong>higher variance<\/strong> because so little averaging takes place. 
Kernel smoothers are also prone to boundary bias; here the function is nearly flat at the edges, so that effect should be small, although the edge estimates average fewer points and so are somewhat more variable.<\/p>\n\n\n\n<p><strong>Spline Smoothing:<\/strong> By default, <code>smooth.spline<\/code> chooses its smoothing parameter by generalized cross-validation for each simulated data set, so it is the only method here that is tuned to the data. It should show <strong>low bias<\/strong> and <strong>moderate variance<\/strong>, between loess and NW, with the largest bias near the peak and troughs, where curvature is highest.<\/p>\n\n\n\n<p><strong>MSE (Mean Squared Error):<\/strong> MSE is squared bias plus variance, so it shows the bias-variance tradeoff directly. Loess at span 0.75 is likely to have the largest MSE near the center, where its squared bias dominates. NW pays a roughly constant variance penalty across the domain. The spline is expected to give the lowest MSE over most of the range because its smoothing level is chosen from the data.<\/p>\n\n\n\n<p><strong>Fairness of Comparison:<\/strong> The comparison is fair in that all three methods see the same design, noise level, and simulated data sets. It is not fair in the choice of tuning parameters: loess uses a fixed, very wide span; NW uses a fixed, very narrow bandwidth; and the spline is tuned automatically. The plots therefore mostly compare three levels of smoothing rather than three methods. A fairer comparison would tune each smoother by the same criterion, such as cross-validation.<\/p>\n\n\n\n<p>In conclusion, with these settings the results reflect the tuning parameters more than the methods themselves: expect loess to be bias-dominated, NW to be variance-dominated, and the spline to sit in between.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" src=\"https:\/\/learnexams.com\/blog\/wp-content\/uploads\/2025\/05\/learnexams-banner8-56.jpeg\" alt=\"\" class=\"wp-image-220408\"\/><\/figure>\n","protected":false},"excerpt":{"rendered":"<p>ISyE 7406: Data Mining &amp; Statistical Learning HW#4 Local Smoothing in R. The goal of this homework is to help you better understand the statistical properties and computational challenges of local smoothing such as loess, Nadaraya-Watson (NW) kernel smoothing, and spline smoothing. 
For this purpose, we will compute empirical bias and empirical variances based on [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"default","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","ast-disable-related-posts":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"default","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"footnotes":""},"categories":[25],"tags":[],"class_list":["post-220407","post","type-post","status-publish","format-standard","hentry","category-exams-certification"],"_links":{"self":[{"href":"https:\/\/www.learnexams.com\/blog\/wp-json\/wp\/v2\/posts\/220407","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.learnexams.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.learnexams.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.learnexams.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.learnexams.com\/blog\/wp-json\/wp\/v2\/comments?post=220407"}],"version-history":[{"count":0,"href":"https:\/\/www.learnexams.com\/blog\/wp-json\/wp\/v2\/posts\/220407\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.learnexams.com\/blog\/wp-json\/wp\/v2\/media?parent=220407"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.learnexams.com\/blog\/wp-json\/wp\/v2\/categories?post=220407"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.learnexams.com\/blog\/wp-json\/wp\/v2\/tags?post=220407"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}