Non Parametric Regression

Sandeep Sharma
4 min read · Jun 18, 2022


Non-parametric regression is a form of regression analysis in which the relationship between the predictors and the response does not take a predetermined form but is constructed from information derived from the data.

Non-parametric regression is used for prediction and remains reliable even when the assumptions of linear regression are not satisfied.

It is a good fit when we care mainly about predictive quality rather than the structure of the regression model.

Difference between parametric and non-parametric regressions

A non-parametric algorithm is typically slower to compute but makes fewer assumptions about the data. Parametric methods assume a fixed form for the model.

Parametric models: linear regression, polynomial regression, logistic regression, linear discriminant analysis, etc.

Non-parametric approaches: decision trees, neural networks, etc.
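
To make the contrast concrete, here is a minimal sketch in plain Python. The toy data (a noiseless curve y = x^2) and the helper names `fit_line` and `knn_predict` are assumptions made up for this illustration, not part of any library. The parametric model commits to a straight line described by two numbers, while the k-nearest-neighbor average keeps the data and assumes no functional form.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (a parametric model: two numbers)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def knn_predict(xs, ys, x0, k=3):
    """Average the responses of the k points nearest to x0 (no assumed form)."""
    nearest = sorted(zip(xs, ys), key=lambda p: abs(p[0] - x0))[:k]
    return sum(y for _, y in nearest) / k


# Toy data (an assumption for illustration): a noiseless curve y = x^2.
xs = [i / 10 for i in range(-20, 21)]
ys = [x ** 2 for x in xs]

a, b = fit_line(xs, ys)
line_at_0 = a + b * 0.0              # the line cannot bend toward the minimum
knn_at_0 = knn_predict(xs, ys, 0.0)  # the local average follows the curve
```

On this symmetric data the fitted line is essentially flat at the average of y, so it misses the true value 0 at x = 0 by a wide margin, while the 3-nearest-neighbor average stays close to it.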

Example: assume that we have a response variable Y and two explanatory variables, x1 and x2. A general regression model can be written as

Y = f1(x1) + f2(x2) + e

where e is the error term. If we do not know the form of the functions f1 and f2, we need to use a non-parametric regression model.

The aim of a regression analysis is to produce a reasonable approximation to the unknown response function m, where for n data points (X_i, Y_i), the relationship can be modeled as

$$Y_i = m(X_i) + e_i, \quad i = 1, \dots, n$$

where e_i is the error term of the i-th observation.

Unlike the parametric approach, where the function m is fully described by a finite set of parameters, non-parametric modeling accommodates a very flexible form of the regression curve.

Motivation

  1. It provides a versatile method of exploring a general relationship between variables.
  2. It gives predictions of observations yet to be made without reference to a fixed parametric model.
  3. It provides a tool for finding spurious observations by studying the influence of isolated points.
  4. It constitutes a flexible method of substituting for missing values or interpolating between adjacent X-values.

Estimation Method

  1. Kernel Regression
  2. Loess Regression

Kernel Regression: Kernel regression is a non-parametric technique in statistics for estimating the conditional expectation of a random variable. The objective is to find a non-linear relation between a pair of random variables X and Y.

In any non-parametric regression, the conditional expectation of a variable Y relative to a variable X may be written as:

$$E(Y \mid X) = m(X)$$

where m is an unknown function.

  1. Kernel regression is a modeling technique that belongs to the family of smoothing methods.
  2. Unlike linear regression, which is used both to explain phenomena and to predict, kernel regression is mostly used for prediction.
  3. The structure of the model is variable and complex.

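Nadaraya-Watson Kernel regression

Nadaraya and Watson (1964) proposed a method to estimate f(x0) at a given value x0 as a locally weighted average of all the y values associated with the x values around x0:

$$\hat{f}(x_0) = \frac{\sum_{i=1}^{n} K\left(\frac{x_0 - x_i}{h}\right) y_i}{\sum_{i=1}^{n} K\left(\frac{x_0 - x_i}{h}\right)}$$

K is a kernel function (weight function) with a bandwidth h.

Some popular choices of kernel functions are the Gaussian, Epanechnikov, uniform and triangular kernels.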
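
The locally weighted average described above can be sketched in a few lines of plain Python. This is only an illustration: the function names (`gaussian_kernel`, `epanechnikov_kernel`, `nadaraya_watson`) and the sine-curve toy data are assumptions of this sketch, not taken from a library or from the original papers.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used here as the weight function K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def epanechnikov_kernel(u):
    """Epanechnikov kernel: 0.75 * (1 - u^2) on [-1, 1], zero elsewhere."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def nadaraya_watson(xs, ys, x0, h, kernel=gaussian_kernel):
    """Estimate f(x0) as a kernel-weighted average of the observed y values."""
    weights = [kernel((x0 - x) / h) for x in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no observations fall within the kernel window at x0")
    return sum(w * y for w, y in zip(weights, ys)) / total


# Toy data (an assumption for illustration): roughly one period of a sine curve.
xs = [i / 10 for i in range(0, 63)]
ys = [math.sin(x) for x in xs]

estimate = nadaraya_watson(xs, ys, x0=math.pi / 2, h=0.3)
```

Because the estimate is an average of nearby y values, it lands slightly below the true peak of 1 at x0 = pi/2. A smaller h reduces this smoothing bias but makes the estimate noisier on real data.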

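How to choose the bandwidth

Rule of thumb: if we use a Gaussian kernel, it can be shown that the optimal choice for h is

$$h = \left(\frac{4\hat{\sigma}^5}{3n}\right)^{1/5} \approx 1.06\,\hat{\sigma}\,n^{-1/5}$$

where \hat{\sigma} is the standard deviation of the samples and n is the sample size. This result is derived for Gaussian-distributed data, so treat it as a starting point rather than a guarantee.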
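
As a quick sketch, the rule of thumb can be computed with the standard library alone. This assumes the rule meant here is the commonly quoted Silverman form, h = (4 * sigma_hat^5 / (3n))^(1/5), which is roughly 1.06 * sigma_hat * n^(-1/5). The function name `rule_of_thumb_bandwidth` and the sample values are made up for the example.

```python
import statistics


def rule_of_thumb_bandwidth(xs):
    """h = (4 * sigma_hat^5 / (3 * n))^(1/5), roughly 1.06 * sigma_hat * n^(-1/5)."""
    n = len(xs)
    sigma_hat = statistics.stdev(xs)  # sample standard deviation
    return (4.0 * sigma_hat ** 5 / (3.0 * n)) ** 0.2


# Made-up sample, for illustration only.
xs = [2.1, 3.4, 1.9, 5.6, 4.2, 3.8, 2.7, 4.9, 3.1, 3.3]
h = rule_of_thumb_bandwidth(xs)
```

The resulting h shrinks slowly as the sample grows (at the rate n^(-1/5)), so more data allows a narrower and more local average.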

Loess Regression: locally estimated scatterplot smoothing

Loess regression is a non-parametric technique that uses locally weighted regression to fit a smooth curve through points in a scatter plot.

It combines multiple regression models in a k-nearest-neighbor-based meta-model.

Loess curves can reveal trends and cycles in data that might be difficult to model with a parametric curve.

LOESS combines much of the simplicity of linear least squares regression with the flexibility of nonlinear regression.

It does this by fitting simple models to localized subsets of the data to build up a function that describes the variation in the data, point by point.

A linear function is fitted only on a local set of points delimited by a region, using weighted least squares:

  1. More weight is given to points near the target point x0 whose response is being estimated.
  2. Less weight is given to points further away.
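
The local fitting idea above can be sketched in plain Python as follows. Several choices here are assumptions of this sketch rather than something stated in the post: the tricube weight function (a common choice for loess), the `frac` parameter that sets the neighborhood size, the helper names `tricube` and `loess_point`, and the sine-curve toy data. It also skips the robustness iterations that full LOESS implementations may add.

```python
import math


def tricube(u):
    """Tricube weight: (1 - |u|^3)^3 for |u| < 1, zero otherwise."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def loess_point(xs, ys, x0, frac=0.5):
    """Fit a weighted least-squares line to the points nearest x0, evaluate at x0."""
    n = len(xs)
    k = min(n, max(3, int(round(frac * n))))  # size of the local neighborhood
    nearest = sorted(zip(xs, ys), key=lambda p: abs(p[0] - x0))[:k]
    max_dist = max(abs(x - x0) for x, _ in nearest)
    if max_dist == 0.0:
        return sum(y for _, y in nearest) / k  # all neighbors sit exactly at x0
    # Nearby points get weights close to 1, the farthest neighbor gets 0.
    weights = [tricube((x - x0) / max_dist) for x, _ in nearest]
    sw = sum(weights)
    if sw == 0.0:
        return sum(y for _, y in nearest) / k  # degenerate case: plain average
    mx = sum(w * x for w, (x, _) in zip(weights, nearest)) / sw
    my = sum(w * y for w, (_, y) in zip(weights, nearest)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, (x, _) in zip(weights, nearest))
    sxy = sum(w * (x - mx) * (y - my) for w, (x, y) in zip(weights, nearest))
    if sxx == 0.0:
        return my  # no spread in x, fall back to the weighted mean
    return my + (sxy / sxx) * (x0 - mx)


# Toy data (an assumption for illustration): a sine curve on a regular grid.
xs = [i / 5 for i in range(0, 32)]
ys = [math.sin(x) for x in xs]
smooth = [loess_point(xs, ys, x, frac=0.3) for x in xs]
```

Calling `loess_point` once per target value builds the smooth curve point by point. A larger `frac` uses more neighbors per fit and gives a smoother but less local curve.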

Thank you for reading. Links to other blogs:

Central Limit Theorem — Statistics
General Linear Model — 2
General and Generalized Linear Models
The Poisson Distribution
Uniform Distribution
Normal Distribution
Binomial Distribution
10 alternatives for Cloud based Jupyter notebook!!

