Probability Distributions 4: The Geometric and Exponential Distributions

Sandeep Sharma
Apr 5, 2022

The geometric and exponential distributions model the time it takes for an event to occur. The geometric distribution is discrete and models the number of trials it takes to achieve a success in repeated experiments with a given probability of success. The exponential distribution is a continuous analog of the geometric distribution and models the amount of time you have to wait before an event occurs given a certain occurrence rate.

Geometric
Exponential
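import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import scipy.stats as stats

# scipy.stats draws from NumPy's global random state, so seed NumPy
# (Python's random.seed() has no effect on scipy's samples)
np.random.seed(12)

flips_till_heads = stats.geom.rvs(size=10000,  # Generate geometric data
                                  p=0.5)       # With success prob 0.5

# Print table of counts
print(pd.crosstab(index="counts", columns=flips_till_heads))

# Plot histogram with one unit-wide bin per integer value
pd.DataFrame(flips_till_heads).hist(range=(-0.5, max(flips_till_heads) + 0.5),
                                    bins=max(flips_till_heads) + 1);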

Result:

col_0     1     2     3    4    5    6   7   8   9   10  11  12  13
row_0
counts 5046 2458 1240 640 304 172 70 34 14 12 5 4 1

The distribution looks similar to what we’d expect: it is very likely to get a heads in 1 or 2 flips, while it is very unlikely to take more than 5 flips to get a heads. In the 10,000 trials we generated, the longest it took to get a heads was 13 flips.
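As a rough check on those counts, the theoretical probabilities can be scaled by the sample size. This is a minimal sketch rather than part of the original walkthrough; the names `n_samples`, `flips`, and `expected_counts` are illustrative.

```python
import numpy as np
from scipy import stats

n_samples = 10000          # same sample size as the simulation above
p = 0.5                    # success probability of a fair coin

flips = np.arange(1, 14)   # 1 through 13 flips, matching the table
expected_counts = n_samples * stats.geom.pmf(flips, p)

for k, count in zip(flips, expected_counts):
    print(f"{k:2d} flips: expected about {count:.1f}")
```

The first few expected counts are 5000, 2500, 1250, and 625, which sit close to the simulated 5046, 2458, 1240, and 640.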

Let’s use cdf() to check the probability of needing 6 flips or more to get a success:

first_five = stats.geom.cdf(k=5,   # Prob of success in first 5 flips
                            p=0.5)

1 - first_five

Out[]: 0.03125
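The same tail probability can be cross-checked in two other ways. The sketch below uses scipy's survival function `sf()`, which returns 1 - cdf directly, and the closed form (1 - p)^k, which is the chance that the first k flips all fail. The variable names are illustrative.

```python
import math
from scipy import stats

p = 0.5

via_cdf = 1 - stats.geom.cdf(k=5, p=p)   # approach used in the text
via_sf = stats.geom.sf(k=5, p=p)         # survival function: P(X > 5)
closed_form = (1 - p) ** 5               # first 5 flips are all tails

print(via_cdf, via_sf, closed_form)

# All three routes should agree up to floating-point error
assert math.isclose(via_cdf, via_sf)
assert math.isclose(via_sf, closed_form)
```

For probabilities far out in the tail, `sf()` is usually the safer choice because it avoids subtracting a number very close to 1 from 1.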

Use pmf() to check the probability that the first success occurs on a specific flip:

stats.geom.pmf(k=2,   # Prob of needing exactly 2 flips to get first success
               p=0.5)

Out[]: 0.25
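That value follows from the geometric formula P(X = k) = (1 - p)^(k - 1) * p: the first k - 1 flips fail and flip k succeeds. A minimal sketch comparing the formula with scipy, with illustrative variable names:

```python
import math
from scipy import stats

p = 0.5
k = 2

by_hand = (1 - p) ** (k - 1) * p        # one tail, then one head
from_scipy = stats.geom.pmf(k=k, p=p)

mean_flips = stats.geom.mean(p=p)       # expected flips until first head, 1/p

print(by_hand, from_scipy, mean_flips)
assert math.isclose(by_hand, from_scipy)
```

With a fair coin, the expected number of flips until the first head is 1/p = 2.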

The scipy name for the exponential distribution is “expon”. Let’s investigate the exponential distribution:

# Get the probability of waiting more than 1 time unit before an arrival

prob_1 = stats.expon.cdf(x=1,
                         scale=1)  # scale = 1 / arrival rate (mean wait), not the rate itself

1 - prob_1

Out[]: 0.36787944117144233

Note: The average arrival time for the exponential distribution is equal to 1/arrival_rate.
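Because scipy parameterizes the exponential by `scale` (the mean waiting time) rather than by the rate, a rate has to be inverted before it is passed in. The sketch below assumes a hypothetical arrival rate of 2 events per time unit; the variable names are illustrative.

```python
import math
from scipy import stats

arrival_rate = 2.0            # hypothetical: 2 arrivals per time unit
scale = 1 / arrival_rate      # scipy's scale is the mean wait, 1/rate

mean_wait = stats.expon.mean(scale=scale)
prob_wait_over_1 = stats.expon.sf(1, scale=scale)   # P(wait > 1)

print(mean_wait, prob_wait_over_1)

# Closed form for the exponential tail: exp(-rate * t)
assert math.isclose(prob_wait_over_1, math.exp(-arrival_rate * 1))
```

Doubling the rate halves the average wait to 0.5 time units and shrinks the chance of waiting more than 1 unit from about 0.368 to about 0.135.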

# Shade P(wait <= 1) in blue
plt.fill_between(x=np.arange(0, 1, 0.01),
                 y1=stats.expon.pdf(np.arange(0, 1, 0.01)),
                 facecolor='blue',
                 alpha=0.35)

# Shade P(wait > 1) in red
plt.fill_between(x=np.arange(1, 7, 0.01),
                 y1=stats.expon.pdf(np.arange(1, 7, 0.01)),
                 facecolor='red',
                 alpha=0.35)

# Label each region with its probability
plt.text(x=0.3, y=0.2, s=round(prob_1, 3))
plt.text(x=1.5, y=0.08, s=round(1 - prob_1, 3));

Similar to the geometric distribution, the exponential starts high and has a long right tail containing the rare cases where you have to wait much longer than average for an arrival.
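The link between the two distributions can be made concrete. One way to see it, sketched below under illustrative assumptions, is to split each time unit into `n` short slots, each with success probability `rate / n`. The geometric chance of no success in the first `n * t` slots then approaches the exponential chance of waiting longer than `t` as `n` grows. The names `rate`, `t`, and `n` are not from the original post.

```python
from scipy import stats

rate = 1.0   # arrivals per time unit (matches scale=1 used earlier)
t = 1.0      # how long we wait, in time units

expon_tail = stats.expon.sf(t, scale=1 / rate)   # P(wait > t)

geom_tails = {}
for n in (10, 100, 1000, 10000):
    # n slots per time unit, each succeeding with probability rate / n
    geom_tails[n] = stats.geom.sf(int(n * t), p=rate / n)
    print(n, geom_tails[n])

print("exponential:", expon_tail)
```

As the slots get finer, the geometric tail probability moves toward the exponential value of roughly 0.368.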

Links to some other blogs:
Uniform Distribution
Normal Distribution
Binomial Distribution
Central Limit Theorem
10 alternatives for Cloud based Jupyter notebook!!
