General Linear Model — 3 — Residual Analysis

Sandeep Sharma
3 min readMay 30, 2022

--

A residual is a measure of how far away a point is vertically from the regression line. In simple words, difference between actual and predicted value.

Residual is a diagnostic measure used when assessing the quality of a model. They are also known as errors.

We can check the assumptions of regression by examining the residuals
Examine for linearity assumption (Linearity)
Evaluate independence assumption (
Independence of errors)
Evaluate normal distribution assumption (
Normality of errors)
Examine for constant variance for all levels of X (
homoscedasticity/ Equal variance)

Linearity: The relationship between x and y must be linear. Check this assumption by examining a scatterplot of x and y.

Independence of errors: There should not a relationship between the residuals and the predicted values. We can check this assumption by examining a scatterplot of “residuals versus fits”. The correlation should be approximately 0.

Normality of errors: The residuals must be approximately normally distributed. We can check this assumption by examining a normal probability plot.

Equal variances: The variance of the residuals should be consistent across all predicted values. We can check this assumption by examining the scatterplot of “residuals versus fits”.

Residual Analysis for Linearity

A residual plot is a graph that shows the residuals on the vertical axis and the independent variable on the horizontal axis. If the points in a residual plot are randomly dispersed around the horizontal axis, a linear regression model is appropriate for the data; otherwise, a nonlinear model is more appropriate.

Linear and Non-Linear

Residual Analysis for Independence

we can plot the residuals vs. the sequential number of the data point. If we notice a pattern, we say that there is an autocorrelation effect among the residuals and the independence assumption is not valid.

Residual Analysis for Independence — Another Method — Durbin Watson (D-W) Statistics

It’s the sum of the squares of the differences between consecutive errors divided by the sum of the squares of all errors.

Another way to look at the Durbin-Watson Statistic is: D = 2(1-ρ)
where ρ = the correlation between consecutive errors.

There are 3 important values for D:
D=0: This means that ρ=1, indicating a positive correlation.
D=2: In this case, ρ=0, indicating no correlation.
D=4: ρ=-1, indicating a negative correlation

Residual Analysis for Equal Variance

  1. White test statistics
  2. Residual vs fitted value plot

Checking for Normality

  1. Examine the Stem-and-Leaf Display of the Residuals.
  2. Examine the Boxplot of the Residuals
  3. Examine the Histogram of the Residuals
  4. Construct a Normal Probability Plot of the Residuals

Residual Analysis for Normality

When using a normal probability plot, normal errors will approximately display in a straight line.

Thank you for reading. Links to other blogs: —

General Linear Model — 2
General and Generalized Linear Models
The Poisson Distribution
Uniform Distribution
Normal Distribution
Binomial Distribution
10 alternatives for Cloud based Jupyter notebook!!

--

--

Sandeep Sharma
Sandeep Sharma

Written by Sandeep Sharma

Manager Data Science — Coffee Lover — Machine Learning — Statistics — Management Consultant — Product Management — Business Analyst

No responses yet