Probability Distributions 4: The Geometric and Exponential Distributions

Sandeep Sharma
Apr 5, 2022

The geometric and exponential distributions model the time it takes for an event to occur. The geometric distribution is discrete and models the number of trials it takes to achieve a success in repeated experiments with a given probability of success. The exponential distribution is a continuous analog of the geometric distribution and models the amount of time you have to wait before an event occurs given a certain occurrence rate.

Geometric
Exponential
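import numpy as np
import pandas as pd
import scipy.stats as stats
import matplotlib.pyplot as plt

# scipy draws from NumPy's global random state, so seed NumPy
# (Python's built-in random.seed has no effect on scipy).
# Exact counts depend on the seed and library versions.
np.random.seed(12)

flips_till_heads = stats.geom.rvs(size=10000,  # Generate geometric data
                                  p=0.5)       # With success prob 0.5

# Print table of counts
print(pd.crosstab(index="counts", columns=flips_till_heads))

# Plot histogram with one unit-width bin centered on each integer
pd.DataFrame(flips_till_heads).hist(range=(-0.5, max(flips_till_heads) + 0.5),
                                    bins=max(flips_till_heads) + 1);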

Result:

col_0      1     2     3    4    5    6   7   8   9  10  11  12  13
row_0
counts  5046  2458  1240  640  304  172  70  34  14  12   5   4   1

The distribution looks similar to what we’d expect: a head is very likely to turn up within 1 or 2 flips (about 75% of the runs above), while needing more than 5 flips is very unlikely (about 3% of the runs). In the 10,000 trials we generated, the longest wait for a head was 13 flips.
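To make "similar to what we’d expect" concrete, the observed proportions can be set against the theoretical probabilities. The snippet below is a minimal sketch rather than part of the original analysis: the seed value, the variable names, and the use of np.random.default_rng are assumptions made for illustration.

```python
import numpy as np
from scipy import stats

# Assumed setup: a dedicated generator with an arbitrary seed
rng = np.random.default_rng(0)
p = 0.5
flips = stats.geom.rvs(p=p, size=10000, random_state=rng)

# Observed share of runs that needed exactly k flips, for k = 1..5
ks = np.arange(1, 6)
observed = np.array([(flips == k).mean() for k in ks])

# Theoretical probabilities from the geometric PMF
expected = stats.geom.pmf(ks, p)

for k, obs, exp in zip(ks, observed, expected):
    print(f"k={k}: observed {obs:.4f}, expected {exp:.4f}")
```

With 10,000 draws, the observed shares should land close to 0.5, 0.25, 0.125 and so on, though not exactly on them.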

Let’s use cdf() to check the probability of needing 6 flips or more to get a success:

first_five = stats.geom.cdf(k=5,   # Prob of success in first 5 flips
                            p=0.5)

1 - first_five

Out[]: 0.03125
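The same tail probability can be cross-checked in two other ways: with the survival function sf(), which scipy defines as 1 - cdf, and by hand, since needing 6 or more flips means the first 5 flips all failed. This is a small illustrative sketch; the variable names are assumptions.

```python
from scipy import stats

p = 0.5

via_cdf = 1 - stats.geom.cdf(5, p)  # complement of "success within 5 flips"
via_sf = stats.geom.sf(5, p)        # survival function: P(X > 5)
by_hand = (1 - p) ** 5              # five failures in a row

print(via_cdf, via_sf, by_hand)
```

All three values should agree up to floating-point rounding.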

Use pmf() to check the probability of needing a specific number of flips to get the first success (scipy counts the successful flip itself, so k starts at 1):

stats.geom.pmf(k=2,   # Prob of needing exactly 2 flips to get first success
               p=0.5)

Out[]: 0.25
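The value 0.25 follows from the geometric PMF, P(X = k) = (1 - p)^(k - 1) * p: the first k - 1 flips fail and flip k succeeds. The sketch below compares that formula with scipy; geom_pmf_by_hand is a hypothetical helper written for this illustration, not a scipy function.

```python
from scipy import stats

def geom_pmf_by_hand(k, p):
    """Hypothetical helper: k - 1 failures followed by one success."""
    return (1 - p) ** (k - 1) * p

p = 0.5
for k in range(1, 5):
    print(k, geom_pmf_by_hand(k, p), stats.geom.pmf(k, p))
```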

The scipy name for the exponential distribution is “expon”. Let’s investigate the exponential distribution:

# Get the probability of waiting more than 1 time unit before an arrival

prob_1 = stats.expon.cdf(x=1,
                         scale=1)  # scale = 1 / arrival rate (mean time between arrivals)

1 - prob_1

Out[]: 0.36787944117144233

Note: The average arrival time for the exponential distribution is equal to 1/arrival_rate, and this average is what scipy’s scale parameter specifies.
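Because scipy parameterizes expon by its mean rather than by a rate, a rate other than 1 has to be converted first. The sketch below assumes an example rate of 2 arrivals per time unit; that value and the variable names are illustrative only.

```python
import numpy as np
from scipy import stats

arrival_rate = 2.0        # assumed example value: 2 arrivals per time unit
scale = 1 / arrival_rate  # scipy's scale is the mean waiting time

dist = stats.expon(scale=scale)

mean_wait = dist.mean()              # average waiting time, 1 / arrival_rate
prob_wait_over_1 = dist.sf(1)        # P(wait > 1 time unit)
by_hand = np.exp(-arrival_rate * 1)  # closed form exp(-rate * t)

print(mean_wait, prob_wait_over_1, by_hand)
```

Passing the rate itself as scale would silently model a process with the inverse rate, so the conversion matters whenever the rate is not 1.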

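# Shade the area under the density for waits of less than 1 time unit
plt.fill_between(x=np.arange(0, 1, 0.01),
                 y1=stats.expon.pdf(np.arange(0, 1, 0.01)),
                 facecolor='blue',
                 alpha=0.35)

# Shade the area for waits of more than 1 time unit (the tail)
plt.fill_between(x=np.arange(1, 7, 0.01),
                 y1=stats.expon.pdf(np.arange(1, 7, 0.01)),
                 facecolor='red',
                 alpha=0.35)

# Label each region with its probability
plt.text(x=0.3, y=0.2, s=round(prob_1, 3))
plt.text(x=1.5, y=0.08, s=round(1 - prob_1, 3));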

Similar to the geometric distribution, the exponential starts high and has a long tail trailing off to the right, which contains the rare cases where you have to wait much longer than average for an arrival.
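The resemblance is not a coincidence. If time is sliced into small steps and each step is treated as a trial with success probability rate * step_length, the geometric tail probability approaches the exponential one as the steps shrink. The following is a rough sketch of that limit; the rate, the time horizon, and the step counts are assumed values chosen for illustration.

```python
from scipy import stats

rate = 1.0  # assumed arrival rate per time unit
t = 1.0     # waiting time of interest

# Exponential probability of waiting longer than t
exact = stats.expon.sf(t, scale=1 / rate)

approximations = {}
for steps_per_unit in (2, 10, 100, 1000):
    dt = 1 / steps_per_unit      # length of one discrete step
    p = rate * dt                # chance of an arrival within one step
    n_steps = int(round(t / dt)) # number of steps that fit in time t
    # Geometric probability of no success in the first n_steps trials
    approximations[steps_per_unit] = stats.geom.sf(n_steps, p)

for steps, approx in approximations.items():
    print(f"{steps:>5} steps per unit: {approx:.5f} (exponential: {exact:.5f})")
```

The gap between the two shrinks as the number of steps per time unit grows, which is the sense in which the exponential is the continuous analog of the geometric.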


