General Linear Model — 3 — Residual Analysis

3 min readMay 30, 2022

A residual is a measure of how far away a point is vertically from the regression line. In simple words, difference between actual and predicted value.

Residual is a diagnostic measure used when assessing the quality of a model. They are also known as errors.

We can check the assumptions of regression by examining the residuals
Examine for linearity assumption (Linearity)
Evaluate independence assumption (Independence of errors)
Evaluate normal distribution assumption (Normality of errors)
Examine for constant variance for all levels of X (homoscedasticity/ Equal variance)

Linearity: The relationship between x and y must be linear. Check this assumption by examining a scatterplot of x and y.
Independence of errors: There should not a relationship between the residuals and the predicted values. We can check this assumption by examining a scatterplot of “residuals versus fits”. The correlation should be approximately 0.
Normality of errors: The residuals must be approximately normally distributed. We can check this assumption by examining a normal probability plot.
Equal variances: The variance of the residuals should be consistent across all predicted values. We can check this assumption by examining the scatterplot of “residuals versus fits”.

Residual Analysis for Linearity

A residual plot is a graph that shows the residuals on the vertical axis and the independent variable on the horizontal axis. If the points in a residual plot are randomly dispersed around the horizontal axis, a linear regression model is appropriate for the data; otherwise, a nonlinear model is more appropriate.

Residual Analysis for Independence

we can plot the residuals vs. the sequential number of the data point. If we notice a pattern, we say that there is an autocorrelation effect among the residuals and the independence assumption is not valid.

Residual Analysis for Independence — Another Method — Durbin Watson (D-W) Statistics

It’s the sum of the squares of the differences between consecutive errors divided by the sum of the squares of all errors.

Another way to look at the Durbin-Watson Statistic is: D = 2(1-ρ)
where ρ = the correlation between consecutive errors.

There are 3 important values for D:
D=0: This means that ρ=1, indicating a positive correlation.
D=2: In this case, ρ=0, indicating no correlation.
D=4: ρ=-1, indicating a negative correlation