Probability Distributions 4: The Geometric and Exponential Distributions
The geometric and exponential distributions model the time it takes for an event to occur. The geometric distribution is discrete and models the number of trials it takes to achieve a success in repeated experiments with a given probability of success. The exponential distribution is a continuous analog of the geometric distribution and models the amount of time you have to wait before an event occurs given a certain occurrence rate.
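import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats

np.random.seed(12)  # SciPy draws from NumPy's global random state, so seed NumPy

flips_till_heads = stats.geom.rvs(size=10000,  # Generate geometric data
                                  p=0.5)       # With success prob 0.5

# Print table of counts
print(pd.crosstab(index="counts", columns=flips_till_heads))

# Plot histogram with one bin per possible flip count
pd.DataFrame(flips_till_heads).hist(range=(-0.5, max(flips_till_heads) + 0.5),
                                    bins=max(flips_till_heads) + 1);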
Result (exact counts depend on the random draw):
col_0      1     2     3    4    5    6   7   8   9  10  11  12  13
row_0
counts  5046  2458  1240  640  304  172  70  34  14  12   5   4   1
The distribution looks similar to what we’d expect: it is very likely to get heads in 1 or 2 flips, while it is very unlikely to take more than 5 flips. In the 10,000 trials we generated, the longest it took to get heads was 13 flips.
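To make "similar to what we'd expect" concrete, the observed share of each flip count can be set next to the theoretical probability from pmf(). In the table above, 5046 of the 10,000 runs needed a single flip, close to the theoretical 0.5. The snippet below is a minimal sketch rather than part of the original analysis: it draws its own sample with a local NumPy generator (the seed value 12 is arbitrary), so its numbers will not match the table exactly.

```python
import numpy as np
from scipy import stats

# Local random generator; the seed is arbitrary and only makes the run repeatable
rng = np.random.default_rng(12)
flips = stats.geom.rvs(p=0.5, size=10000, random_state=rng)

for k in range(1, 6):
    observed = np.mean(flips == k)      # share of runs needing exactly k flips
    expected = stats.geom.pmf(k, 0.5)   # theoretical probability of exactly k flips
    print(k, round(observed, 4), round(expected, 4))
```

With 10,000 draws, the observed and expected columns should agree to roughly two decimal places.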
Let’s use cdf() to check the probability of needing 6 flips or more to get a success:
first_five = stats.geom.cdf(k=5, # Prob of success in first 5 flips
p=0.5)
1 - first_five
Out[]: 0.03125
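This tail probability can be cross-checked in two ways. The sketch below assumes the same fair-coin setup (p = 0.5). It uses `stats.geom.sf`, SciPy's survival function, which returns 1 - cdf directly, and compares it with the closed form (1 - p)^5, the chance that the first five flips all fail.

```python
from scipy import stats

p = 0.5  # success probability for a fair coin, as in the example above

# Survival function: P(X > 5), i.e. needing 6 or more flips
tail_sf = stats.geom.sf(5, p)

# Closed form: the first 5 flips must all be tails
tail_closed_form = (1 - p) ** 5

print(tail_sf, tail_closed_form)
```

Both values should agree with the 0.03125 computed from cdf(), up to floating-point rounding.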
Use pmf() to check the probability of needing a specific number of flips to get the first success:
stats.geom.pmf(k=2, # Prob of needing exactly 2 flips to get first success
p=0.5)
Out[]: 0.25
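The value 0.25 follows from the geometric formula P(X = k) = (1 - p)^(k - 1) * p: one failure with probability 0.5, then a success with probability 0.5. As a small sketch, the helper below (`geom_pmf_by_hand` is a hypothetical name used only for illustration) computes that formula directly and prints it next to SciPy's result.

```python
from scipy import stats

def geom_pmf_by_hand(k, p):
    """Probability that the first success lands on trial k: (1 - p)**(k - 1) * p."""
    return (1 - p) ** (k - 1) * p

p = 0.5
for k in range(1, 5):
    # The two values on each line should match up to floating-point rounding
    print(k, geom_pmf_by_hand(k, p), stats.geom.pmf(k, p))
```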
The scipy name for the exponential distribution is “expon”. Let’s investigate the exponential distribution:
# Get the probability of waiting more than 1 time unit before a success
prob_1 = stats.expon.cdf(x=1,
                         scale=1)  # scale = 1/arrival rate (mean wait time)
1 - prob_1
Out[]: 0.36787944117144233
Note: The average arrival time for the exponential distribution is equal to 1/arrival_rate.
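Because SciPy's `scale` parameter is the mean wait (1/arrival_rate) rather than the rate itself, the two are easy to mix up whenever the rate is not 1. The sketch below uses a hypothetical rate of 2 arrivals per time unit, chosen only for illustration, and checks the mean and the probability of waiting more than 1 time unit against the closed form exp(-rate * x).

```python
import math
from scipy import stats

arrival_rate = 2.0        # hypothetical rate: 2 arrivals per time unit
scale = 1 / arrival_rate  # SciPy's scale is the mean wait, not the rate

mean_wait = stats.expon.mean(scale=scale)          # average time until an arrival
prob_wait_over_1 = stats.expon.sf(1, scale=scale)  # P(wait > 1 time unit)
closed_form = math.exp(-arrival_rate * 1)          # exp(-rate * x) with x = 1

print(mean_wait, prob_wait_over_1, closed_form)
```

Passing `scale=2` here instead would model a rate of 0.5 arrivals per time unit, which is a common source of errors.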
# Shade the area for waits up to 1 time unit (blue) and beyond 1 (red)
plt.fill_between(x=np.arange(0, 1, 0.01),
                 y1=stats.expon.pdf(np.arange(0, 1, 0.01)),
                 facecolor='blue',
                 alpha=0.35)
plt.fill_between(x=np.arange(1, 7, 0.01),
                 y1=stats.expon.pdf(np.arange(1, 7, 0.01)),
                 facecolor='red',
                 alpha=0.35)
# Label each region with its probability
plt.text(x=0.3, y=0.2, s=round(prob_1, 3))
plt.text(x=1.5, y=0.08, s=round(1 - prob_1, 3));
Like the geometric distribution, the exponential starts high and has a long right tail, which contains the rare cases where you have to wait much longer than average for an arrival.