Probability Distributions 3 — The Binomial Distribution
The binomial distribution is a discrete probability distribution that models the outcomes of a fixed number of independent random trials of some experiment or event. The binomial is defined by two parameters: the number of trials n and the probability of success p in any given trial. The binomial distribution tells you how likely it is to achieve a given number of successes k in n trials of the experiment.
For example, we could model flipping a fair coin 10 times with a binomial distribution where the number of trials is set to 10 and the probability of success is set to 0.5. In this case, the distribution would tell us how likely it is to get 0 heads, 1 head, 2 heads, and so on.
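As a quick illustration of the coin example, the sketch below computes the exact probability of each possible head count for 10 fair flips. It assumes NumPy and SciPy are installed and uses SciPy's `stats.binom.pmf`, which accepts an array of k values at once:

```python
import numpy as np
from scipy import stats

n, p = 10, 0.5        # 10 flips of a fair coin
k = np.arange(n + 1)  # possible head counts: 0, 1, ..., 10

# Probability of exactly k heads, for every k at once
probs = stats.binom.pmf(k, n, p)

for heads, prob in zip(k, probs):
    print(f"{heads:2d} heads: {prob:.4f}")
```

The eleven probabilities add up to 1, and the largest one sits at 5 heads.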
Properties:
- Each trial has only two possible outcomes: success and failure.
- The total number of trials is fixed.
- The probability of success (and of failure) remains the same throughout all the trials.
- The trials are independent of each other.
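These properties can be simulated directly. The following is a minimal sketch, assuming NumPy's `default_rng` Generator API; instead of calling SciPy, it builds each binomial draw by running n independent trials with the same success probability p and counting the successes:

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # fixed seed so the run is reproducible

n, p = 10, 0.5        # trials per experiment, success probability
experiments = 10_000  # how many times we repeat the 10-trial experiment

# Each row is one experiment: n independent trials, each a success with probability p
trials = rng.random((experiments, n)) < p

# Counting the successes in each row gives one binomial draw per experiment
successes = trials.sum(axis=1)

print(successes[:10])           # first few success counts
print(successes.mean(), n * p)  # sample mean should land close to n * p = 5
```

Every count falls between 0 and n, and with this many experiments the sample mean lands close to n * p.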
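The probability of getting exactly k successes in n trials is given by the binomial probability mass function:

$$P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$$

where:
- n = number of trials
- p = probability of success
- 1-p = probability of failure
- k = number of successes
- n-k = number of failures
- \binom{n}{k} = number of ways to choose which k of the n trials are successes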
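To connect the formula to code, here is a minimal sketch that evaluates it directly with Python's `math.comb` and compares the result with SciPy. The function name `binomial_pmf` is a hypothetical helper introduced only for this illustration:

```python
from math import comb
from scipy import stats

def binomial_pmf(k, n, p):
    """Hypothetical helper: P(X = k) = C(n, k) * p^k * (1 - p)^(n - k)."""
    # comb(n, k) counts the ways to pick which k of the n trials succeed
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Compare the hand-written formula with SciPy for two of the cases in this post
for k, n, p in [(5, 10, 0.5), (8, 10, 0.8)]:
    print(k, n, p, binomial_pmf(k, n, p), stats.binom.pmf(k, n, p))
```

For k = 5, n = 10, p = 0.5 the formula gives 252 / 1024 = 0.24609375, which matches the pmf value shown later in this post.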
```python
import pandas as pd
from scipy import stats

fair_coin_flips = stats.binom.rvs(n=10,        # Flips per experiment
                                  p=0.5,       # Success probability
                                  size=10000)  # Number of experiments (samples)
print(pd.crosstab(index="counts", columns=fair_coin_flips))        # Table of counts
pd.DataFrame(fair_coin_flips).hist(range=(-0.5, 10.5), bins=11);  # Histogram
```
Note: since the binomial distribution is discrete, it only takes on integer values, so we can summarize binomial data with a frequency table and its distribution with a histogram. The histogram shows that a binomial distribution with a 50% probability of success is roughly symmetric, with the most likely outcomes lying at the center. This is reminiscent of the normal distribution, but if we move the success probability away from 0.5, the distribution is no longer symmetric.
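The symmetry claim can also be checked exactly rather than by eye from a histogram. In this minimal sketch, a distribution on 0..n counts as symmetric if P(X = k) equals P(X = n - k) for every k; reversing the pmf array pairs each k with n - k. The biased coin here assumes p = 0.8:

```python
import numpy as np
from scipy import stats

n = 10
k = np.arange(n + 1)

fair = stats.binom.pmf(k, n, 0.5)
biased = stats.binom.pmf(k, n, 0.8)

# Reversing the array lines up P(X = k) with P(X = n - k)
print("fair symmetric:  ", np.allclose(fair, fair[::-1]))
print("biased symmetric:", np.allclose(biased, biased[::-1]))

# Most likely number of successes (the mode) for each coin
print("fair mode:  ", k[np.argmax(fair)])
print("biased mode:", k[np.argmax(biased)])
```

The fair coin passes the check with its peak at 5, while the p = 0.8 coin fails it and peaks at 8.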
```python
import pandas as pd
from scipy import stats

biased_coin_flips = stats.binom.rvs(n=10,        # Flips per experiment
                                    p=0.8,       # Success probability
                                    size=10000)  # Number of experiments (samples)
# Print table of counts
print(pd.crosstab(index="counts", columns=biased_coin_flips))
# Plot histogram
pd.DataFrame(biased_coin_flips).hist(range=(-0.5, 10.5), bins=11);
```
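The cdf() (cumulative distribution function) method gives the probability of k or fewer successes, P(X ≤ k). Subtracting it from 1 gives the probability of more than k successes, so together they let us check the probability of achieving a number of successes within a certain range.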
```python
stats.binom.cdf(k=5,   # Probability of k = 5 successes or less
                n=10,  # With 10 flips
                p=0.8) # And success probability 0.8
```
Out[]: 0.03279349759999996
```python
1 - stats.binom.cdf(k=8,   # Probability of k = 9 successes or more
                    n=10,  # With 10 flips
                    p=0.8) # And success probability 0.8
```
Out[]: 0.37580963840000003
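Two consistency checks on the numbers above, as a minimal sketch: the cdf at k is the running sum of the point probabilities for 0 through k, and SciPy's `sf()` (survival function) returns P(X > k), which equals 1 - cdf(k) without writing the subtraction yourself:

```python
import numpy as np
from scipy import stats

n, p = 10, 0.8

# P(X <= 5) equals the sum of P(X = k) for k = 0..5
cdf_direct = stats.binom.cdf(5, n, p)
cdf_summed = stats.binom.pmf(np.arange(0, 6), n, p).sum()

# P(X >= 9) = 1 - P(X <= 8); sf(8) returns P(X > 8) directly
upper_tail = 1 - stats.binom.cdf(8, n, p)
upper_sf = stats.binom.sf(8, n, p)

print(cdf_direct, cdf_summed)
print(upper_tail, upper_sf)
```

Both pairs agree to floating-point precision. For very small tail probabilities, `sf()` is usually the safer choice because it avoids subtracting a number close to 1 from 1.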
For continuous distributions, you use pdf() to check the probability density at a given x value. For discrete distributions like the binomial, use the pmf() (probability mass function) method instead, for example stats.binom.pmf(), to check the mass (the expected proportion of observations) at a given number of successes k:
```python
stats.binom.pmf(k=5,   # Probability of k = 5 successes
                n=10,  # With 10 flips
                p=0.5) # And success probability 0.5
```
Out[]: 0.24609375000000025
```python
stats.binom.pmf(k=8,   # Probability of k = 8 successes
                n=10,  # With 10 flips
                p=0.8) # And success probability 0.8
```
Out[]: 0.30198988799999998
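To see why the pmf can be read as a "proportion of observations", this minimal sketch compares it with the share of a large random sample that lands on k = 5. It passes `random_state` to `stats.binom.rvs` only to make the run reproducible; the sample size of 100,000 is an arbitrary choice:

```python
import numpy as np
from scipy import stats

n, p = 10, 0.5
k = np.arange(n + 1)

# The masses over every possible number of successes add up to 1
total_mass = stats.binom.pmf(k, n, p).sum()
print(total_mass)

# Long-run share of experiments with exactly 5 successes vs. the pmf at k = 5
samples = stats.binom.rvs(n=n, p=p, size=100_000, random_state=0)
observed_share = (samples == 5).mean()
print(observed_share, stats.binom.pmf(5, n, p))
```

The observed share should land within about a percentage point of 0.246, and the gap shrinks as the sample grows.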
Links to some other blogs:
Uniform Distribution
Normal Distribution
Central Limit Theorem
10 alternatives for Cloud based Jupyter notebook!!
Number System in Python