Lift and Cumulative Lift in Statistics, Machine Learning and Python

Sandeep Sharma
6 min read · Oct 22, 2023

In previous articles, we delved into the significance of statistics in machine learning, focusing on the K-S statistic. In this blog, we’ll explore the concepts of LIFT and cumulative lift and their importance. First, let’s clarify what LIFT and Cumulative lift actually are.

Lift is a measure of the effectiveness of a predictive model calculated as the ratio between the results obtained with and without the predictive model. The idea is to evaluate how much better one can expect to do with the predictive model compared to without it.
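For a quick sense of that ratio, here is a tiny sketch with made-up numbers: if the customers a model selects respond at 20% while customers overall respond at only 5%, the model's lift on that group is 0.20 / 0.05 = 4.

# Hypothetical response rates, purely for illustration
response_rate_with_model = 0.20   # response rate in the group the model selected
baseline_response_rate = 0.05     # response rate without the model (overall / random targeting)

lift = response_rate_with_model / baseline_response_rate
print(lift)  # 4.0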

LIFT

For clarity: “LIFT” does not have a “full form” in this context. Instead, the term itself describes the metric’s purpose, which is to measure the “lift” or improvement that a model provides over a no-model (random) approach.

- Lift highlights the degree to which an association rule surpasses mere random coincidence of items A and B.

- It informs us about the altered likelihood of encountering item B when item A is present.

- When the lift value exceeds 1.0, items A and B appear together more often than they would if they were independent; in other words, a transaction containing item A is more likely than average to also contain item B (a small sketch of this calculation follows).
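In formula terms, lift(A -> B) = support(A and B) / (support(A) x support(B)): how much more often A and B occur together than they would under independence. Below is a minimal sketch of that calculation; the transactions and item names are made up for illustration.

def association_lift(transactions, item_a, item_b):
    """Lift of the rule A -> B: P(A and B) / (P(A) * P(B))."""
    n = len(transactions)
    support_a = sum(item_a in t for t in transactions) / n
    support_b = sum(item_b in t for t in transactions) / n
    support_ab = sum((item_a in t) and (item_b in t) for t in transactions) / n
    return support_ab / (support_a * support_b)

# Made-up shopping baskets, purely for illustration
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread"},
    {"milk"},
    {"bread", "butter"},
]

print(association_lift(transactions, "bread", "butter"))
# 0.6 / (0.8 * 0.6) = 1.25, so bread and butter co-occur 25% more often than chance alone would suggest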

Imagine you have a magic box that helps you find your favorite toys faster than searching randomly. Without the box, maybe you find 1 toy in 10 minutes. With the magic box, you find 5 toys in the same time. So, with the box, you’re finding toys 5 times faster! That’s what “lift” is like — it helps things be faster or better!

The range of lift can be:

- Less than 1: This means your method is worse than just guessing. It's like if the magic box made you find fewer toys.
- Equal to 1: This means your method is just as good as random guessing. The magic box doesn't help, but it doesn't hurt either.
- Greater than 1: This is the good part! The higher the number, the better your method is. If the lift is 2, it's like finding toys twice as fast. If it's 3, three times as fast, and so on.

So, a higher lift score is better! You’d want a magic box with the highest lift to find toys super quickly!

Cumulative Lift

Cumulative lift is an extension of the lift metric that is assessed over a series of increasing data fractions. Essentially, as you consider more and more of the data (typically sorted by the model’s predicted probabilities), how does the lift value accumulate?

A common way to visualize cumulative lift is through a lift chart:

1. Rank data points based on the predicted probability from the highest to the lowest.
2. Split the data into deciles (or another fraction, like quintiles).
3. For each decile, compute the lift.
4. Plot these lift values cumulatively across the deciles (a short code sketch of steps 1-3 follows this list).
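Here is a rough sketch of steps 1-3 in pandas. The DataFrame, column names, and synthetic labels below are made up purely for illustration; in practice you would use your own model's predicted probabilities.

import numpy as np
import pandas as pd

# Synthetic scored data, purely for illustration
rng = np.random.default_rng(0)
df = pd.DataFrame({"prob": rng.random(1000)})                # model's predicted probabilities
df["actual"] = (rng.random(1000) < df["prob"]).astype(int)   # labels correlated with the score

# Steps 1-2: rank by predicted probability and cut into deciles (decile 1 = highest scores)
ranks = df["prob"].rank(method="first", ascending=False)
df["decile"] = pd.qcut(ranks, 10, labels=list(range(1, 11)))

# Step 3: per-decile lift = positive rate within the decile / overall positive rate
overall_rate = df["actual"].mean()
lift_by_decile = df.groupby("decile", observed=True)["actual"].mean() / overall_rate
print(lift_by_decile)

With a useful model, the top deciles should show lift well above 1 and the bottom deciles well below 1; accumulating these values from the top decile downward gives the cumulative lift described next.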

Cumulative lift is calculated by evaluating the model’s performance over progressively larger portions of the dataset, which has been sorted based on the model’s predicted probabilities or scores. Here’s a step-by-step method to calculate cumulative lift:

  1. Predict Probabilities or Scores: Use your model to predict the probabilities or scores for each instance in your validation/test set.
  2. Sort Data: Sort your dataset in descending order based on these predicted probabilities or scores.
  3. Divide Data into Bins: Divide the sorted dataset into equal-sized bins (or deciles, quintiles, etc.). The top bin should contain the instances with the highest predicted probabilities or scores.
  4. Calculate the Cumulative Lift for Each Bin: for the top k bins taken together, divide the positive rate (or mean response) in that cumulative group by the overall rate across the whole dataset.

Now let's jump to the Python code:

Lift for Regression:

import numpy as np

def regression_lift(y_true, y_pred):
    """
    Compute the lift for regression.

    :param y_true: Array-like, true target values.
    :param y_pred: Array-like, predicted values by the model.
    :return: Lift value.
    """
    baseline_prediction = np.mean(y_true)
    mean_model_prediction = np.mean(y_pred)
    return mean_model_prediction / baseline_prediction

y_true = [100, 200, 300, 400, 500]
y_pred = [110, 210, 310, 410, 510]  # Let's say our model just adds 10 to each true value

print(regression_lift(y_true, y_pred))
# Output --> 1.0333333333333334

Lift Python code for classification

import numpy as np

def classification_lift(y_true, y_pred):
    """
    Compute the lift for classification.

    :param y_true: Array-like, true binary labels (0 or 1).
    :param y_pred: Array-like, predicted binary labels (0 or 1).
    :return: Lift value.
    """
    # Percentage of actual positives in the entire dataset
    baseline = np.mean(y_true)

    # Indices where the prediction is positive
    positive_pred_indices = np.where(y_pred == 1)

    # Percentage of actual positives in the predicted positive group
    model_success_rate = np.mean(y_true[positive_pred_indices])

    return model_success_rate / baseline

y_true = np.array([0, 1, 1, 0, 1, 0, 1, 0, 0, 0])
y_pred = np.array([0, 1, 1, 0, 1, 0, 0, 0, 1, 0])  # Model catches three of the four actual positives, misses one, and has one false positive

print(classification_lift(y_true, y_pred))
# output --> 1.875
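Note that the numerator above is just the model's precision (the share of predicted positives that are actually positive), so for binary predictions the same number can be reproduced with scikit-learn. A minimal equivalent sketch, assuming scikit-learn is installed:

import numpy as np
from sklearn.metrics import precision_score

y_true = np.array([0, 1, 1, 0, 1, 0, 1, 0, 0, 0])
y_pred = np.array([0, 1, 1, 0, 1, 0, 0, 0, 1, 0])

# Lift = precision among predicted positives / prevalence of positives overall
lift = precision_score(y_true, y_pred) / np.mean(y_true)
print(lift)  # 1.875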

Cumulative lift for regression

import numpy as np

def cumulative_regression_lift(y_true, y_pred):
    """
    Compute the cumulative lift for regression.

    :param y_true: Array-like, true target values.
    :param y_pred: Array-like, predicted values by the model.
    :return: List of cumulative lift values.
    """
    # Sort y_true and y_pred by y_pred values (ascending, so the lowest predictions come first)
    sorted_indices = np.argsort(y_pred)
    y_true_sorted = y_true[sorted_indices]
    y_pred_sorted = y_pred[sorted_indices]

    cumulative_lifts = []
    for i in range(1, len(y_true) + 1):
        mean_true_up_to_i = np.mean(y_true_sorted[:i])
        mean_pred_up_to_i = np.mean(y_pred_sorted[:i])

        lift = mean_pred_up_to_i / mean_true_up_to_i if mean_true_up_to_i != 0 else 0
        cumulative_lifts.append(lift)

    return cumulative_lifts

y_true = np.array([100, 200, 300, 400, 500])
y_pred = np.array([110, 190, 290, 420, 510])  # Some values are overestimations, some are underestimations

cumulative_lift_values = cumulative_regression_lift(y_true, y_pred)
print(cumulative_lift_values)
# output --> [1.1, 1.0, 0.9833333333333333, 1.01, 1.0133333333333334]

Cumulative lift for Classification

import numpy as np

def cumulative_classification_lift(y_true, y_prob):
    """
    Compute the cumulative lift for classification.

    :param y_true: Array-like, true binary labels (0 or 1).
    :param y_prob: Array-like, predicted probabilities for the positive class.
    :return: List of cumulative lift values.
    """
    # Sort y_true based on predicted probabilities in descending order
    sorted_indices = np.argsort(y_prob)[::-1]
    y_true_sorted = y_true[sorted_indices]

    n = len(y_true)
    cumulative_positive = np.cumsum(y_true_sorted)

    # Calculate cumulative percentage of actual positives
    cumulative_positive_percentage = cumulative_positive / np.arange(1, n + 1)

    # Overall percentage of actual positives in the dataset
    overall_positive_percentage = np.sum(y_true) / n

    cumulative_lifts = cumulative_positive_percentage / overall_positive_percentage

    return cumulative_lifts

y_true = np.array([0, 1, 1, 0, 1, 0, 1, 0, 0, 0])
y_prob = np.array([0.2, 0.9, 0.8, 0.1, 0.85, 0.3, 0.7, 0.4, 0.5, 0.2])  # Predicted probabilities for the positive class

cumulative_lift_values = cumulative_classification_lift(y_true, y_prob)
print(cumulative_lift_values)
# output --> [2.5 2.5 2.5 2.5 2. 1.66666667 1.42857143 1.25 1.11111111 1. ]

Visualization code

import matplotlib.pyplot as plt

plt.figure(figsize=(8, 6))
plt.plot(np.arange(1, len(y_true) + 1) / len(y_true), cumulative_lift_values, marker='o', linestyle='-', color='b')
plt.axhline(y=1, color='red', linestyle='--')
plt.xlabel('Proportion of Samples')
plt.ylabel('Cumulative Lift')
plt.title('Cumulative Lift Curve')
plt.legend(['Model', 'Random'])
plt.grid(True)
plt.show()
Result of the above code: a cumulative lift curve, with the model line compared against the random baseline at 1.

Thank you for reading. Links to other blogs:

Kolmogorov-Smirnov (K-S) statistic
Statistics importance in Regression Modeling
First order and Second order — Calculus
Statistical Inference 2 — Hypothesis Testing
Statistical Inference
Hessian Matrix
Central Limit Theorem — Statistics


Sandeep Sharma

Manager Data Science — Coffee Lover — Machine Learning — Statistics — Management Consultant — Product Management — Business Analyst