Recipe 4: Probability Distributions
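What is the probability distribution of data? Put simply, it is a mathematical function that describes how likely each possible outcome is: a probability density function (pdf) for continuous variables, or a probability mass function (pmf) for discrete ones. The examples below use Python to simulate several probability distributions and plot them.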
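To make "a function describing the probability of outcomes" concrete, here is a minimal sketch, assuming SciPy and NumPy are installed. It checks that a density integrates to 1, that a discrete pmf sums to 1, and that the probability of an interval is a difference of cdf values. The distributions and parameters are illustrative choices only.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import binom, norm

# A continuous density integrates to 1 over the whole real line.
area, _ = quad(norm.pdf, -np.inf, np.inf)

# A discrete pmf sums to 1 over its support (here k = 0..5 for n = 5).
total = binom.pmf(np.arange(0, 6), 5, 0.4).sum()

# The probability of an interval is the difference of two cdf values.
prob_within_one_sd = norm.cdf(1.0) - norm.cdf(-1.0)

print(area, total, prob_within_one_sd)
```

For a continuous variable, probabilities come from areas under the pdf, not from the pdf value at a single point.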
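Normal distribution formula (probability density function):
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
where μ is the arithmetic mean of the population, σ is the population standard deviation, and π and e are constants. The formula shows that a normal distribution is determined by two parameters, (μ, σ).
With (μ = 0, σ = 1) it becomes the standard normal distribution, with mean 0 and standard deviation 1.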
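As a quick sanity check, the normal density 1/(σ√(2π))·exp(−(x−μ)²/(2σ²)) can be coded by hand and compared with `scipy.stats.norm.pdf`, where `loc` plays the role of μ and `scale` the role of σ. This is only a sketch: the helper name `normal_pdf` and the values of `mu` and `sigma` are made up for illustration.

```python
import numpy as np
from scipy.stats import norm

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal density written directly from the formula (hypothetical helper)."""
    z = (x - mu) / sigma
    return np.exp(-0.5 * z**2) / (sigma * np.sqrt(2.0 * np.pi))

mu, sigma = 1.5, 2.0  # illustrative values, not from the original post
x = np.linspace(-5.0, 8.0, 201)

by_hand = normal_pdf(x, mu, sigma)
by_scipy = norm.pdf(x, loc=mu, scale=sigma)  # loc = mean, scale = standard deviation

max_abs_diff = np.max(np.abs(by_hand - by_scipy))
print(max_abs_diff)
```

Calling `norm.pdf(x)` without `loc` and `scale` uses the defaults `loc=0, scale=1`, which is the standard normal distribution.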
scipy.stats.norm example: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.norm.html#scipy.stats.norm
from scipy.stats import norm
import numpy as np
import matplotlib.pyplot as plt

fig, ax = plt.subplots(1, 1)

# First four moments: mean, variance, skewness, kurtosis
mean, var, skew, kurt = norm.stats(moments='mvsk')

# Plot the pdf between the 1st and 99th percentiles
x = np.linspace(norm.ppf(0.01), norm.ppf(0.99), 100)  # ppf: percent point function (percentiles)
ax.plot(x, norm.pdf(x), 'r-', lw=5, alpha=0.6, label='norm pdf')  # pdf: probability density function

# "Freeze" the distribution so its parameters are fixed, then plot it again
rv = norm()
ax.plot(x, rv.pdf(x), 'k-', lw=2, label='frozen pdf')

# Draw random samples and compare their histogram with the pdf
r = norm.rvs(size=1000)
ax.hist(r, density=True, histtype='stepfilled', alpha=0.2)
ax.legend(loc='best', frameon=False)
plt.show()
Result: the standard normal distribution, with mean 0 and standard deviation 1.
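t distribution formula (probability density function):
$$f(t) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu+1}{2}}$$
where ν is the degrees of freedom and Γ is the gamma function. The formula shows that the t distribution is determined by the single parameter ν.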
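The t density Γ((ν+1)/2) / (√(νπ)·Γ(ν/2)) · (1 + t²/ν)^(−(ν+1)/2) can be sketched by hand with `scipy.special.gamma` and compared with `scipy.stats.t.pdf`. The second half shows the well-known effect of the single parameter: as ν grows, the t density gets closer to the standard normal density. The helper name `t_pdf` and the list of degrees of freedom are illustrative assumptions.

```python
import numpy as np
from scipy.special import gamma
from scipy.stats import norm, t

def t_pdf(x, nu):
    """Student t density written directly from the formula (hypothetical helper)."""
    coef = gamma((nu + 1) / 2) / (np.sqrt(nu * np.pi) * gamma(nu / 2))
    return coef * (1 + x**2 / nu) ** (-(nu + 1) / 2)

x = np.linspace(-4, 4, 161)
nu = 2.74  # same degrees of freedom as the scipy.stats.t example in this post

formula_matches = np.allclose(t_pdf(x, nu), t.pdf(x, nu))

# Largest gap between the t density and the standard normal density,
# for increasing degrees of freedom (illustrative values).
gaps = [np.max(np.abs(t.pdf(x, d) - norm.pdf(x))) for d in (1, 5, 30, 200)]

print(formula_matches, gaps)
```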
scipy.stats.t example: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.t.html#scipy.stats.t
from scipy.stats import t
import numpy as np
import matplotlib.pyplot as plt

fig, ax = plt.subplots(1, 1)

# Degrees of freedom, then the first four moments
df = 2.74
mean, var, skew, kurt = t.stats(df, moments='mvsk')

# Plot the pdf between the 1st and 99th percentiles
x = np.linspace(t.ppf(0.01, df), t.ppf(0.99, df), 100)
ax.plot(x, t.pdf(x, df), 'r-', lw=5, alpha=0.6, label='t pdf')

# Frozen distribution with df fixed
rv = t(df)
ax.plot(x, rv.pdf(x), 'k-', lw=2, label='frozen pdf')

# Check that cdf and ppf are inverses of each other
vals = t.ppf([0.001, 0.5, 0.999], df)
np.allclose([0.001, 0.5, 0.999], t.cdf(vals, df))

# Draw random samples and compare their histogram with the pdf
r = t.rvs(df, size=1000)
ax.hist(r, density=True, histtype='stepfilled', alpha=0.2)
ax.legend(loc='best', frameon=False)
plt.show()
Result: the t distribution with df = 2.74.
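F distribution formula (probability density function):
$$f(x; d_1, d_2) = \frac{1}{x\, B\left(\frac{d_1}{2}, \frac{d_2}{2}\right)} \sqrt{\frac{(d_1 x)^{d_1}\, d_2^{d_2}}{(d_1 x + d_2)^{d_1 + d_2}}}, \quad x > 0$$
where d1 and d2 are the degrees of freedom and B is the beta function. The formula shows that the F distribution is determined by the parameters d1 and d2.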
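The F density √((d1·x)^d1 · d2^d2 / (d1·x + d2)^(d1+d2)) / (x·B(d1/2, d2/2)) can be checked the same way, using `scipy.special.beta` for B. In `scipy.stats.f` the two degrees of freedom are called `dfn` (numerator, d1) and `dfd` (denominator, d2). The helper name `f_pdf` and the evaluation grid are illustrative assumptions.

```python
import numpy as np
from scipy.special import beta
from scipy.stats import f

def f_pdf(x, d1, d2):
    """F density written directly from the formula (hypothetical helper)."""
    num = np.sqrt((d1 * x) ** d1 * d2**d2 / (d1 * x + d2) ** (d1 + d2))
    return num / (x * beta(d1 / 2, d2 / 2))

d1, d2 = 29.0, 18.0  # same degrees of freedom as the scipy.stats.f example in this post
x = np.linspace(0.1, 4.0, 100)  # the density is defined for x > 0

formula_matches = np.allclose(f_pdf(x, d1, d2), f.pdf(x, d1, d2))

# For d2 > 2 the mean depends only on d2: d2 / (d2 - 2).
mean_theory = d2 / (d2 - 2)
mean_scipy = f.mean(d1, d2)

print(formula_matches, mean_theory, mean_scipy)
```

For large degrees of freedom the powers in this direct formula can overflow, so in practice `f.pdf` is the safer choice; the hand-written version is only meant to mirror the formula.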
scipy.stats.f example: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.f.html#scipy.stats.f
from scipy.stats import f
import numpy as np
import matplotlib.pyplot as plt

fig, ax = plt.subplots(1, 1)

# Numerator and denominator degrees of freedom, then the first four moments
dfn, dfd = 29, 18
mean, var, skew, kurt = f.stats(dfn, dfd, moments='mvsk')

# Plot the pdf between the 1st and 99th percentiles
x = np.linspace(f.ppf(0.01, dfn, dfd), f.ppf(0.99, dfn, dfd), 100)
ax.plot(x, f.pdf(x, dfn, dfd), 'r-', lw=5, alpha=0.6, label='f pdf')

# Frozen distribution with dfn and dfd fixed
rv = f(dfn, dfd)
ax.plot(x, rv.pdf(x), 'k-', lw=2, label='frozen pdf')

# Check that cdf and ppf are inverses of each other
vals = f.ppf([0.001, 0.5, 0.999], dfn, dfd)
np.allclose([0.001, 0.5, 0.999], f.cdf(vals, dfn, dfd))

# Draw random samples and compare their histogram with the pdf
r = f.rvs(dfn, dfd, size=1000)
ax.hist(r, density=True, histtype='stepfilled', alpha=0.2)
ax.legend(loc='best', frameon=False)
plt.show()
Result: the F distribution with dfn = 29, dfd = 18.
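χ² (Chi-square) distribution formula (probability density function):
$$f(x; k) = \frac{1}{2^{k/2}\,\Gamma\left(\frac{k}{2}\right)}\, x^{\frac{k}{2}-1} e^{-\frac{x}{2}}, \quad x > 0$$
where k is the degrees of freedom and Γ is the gamma function. The formula shows that the χ² distribution is determined by the single parameter k.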
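A minimal sketch of where the parameter k comes from: the sum of the squares of k independent standard normal variables follows a χ² distribution with k degrees of freedom, whose mean is k and variance is 2k. The code also codes the density x^(k/2−1)·e^(−x/2) / (2^(k/2)·Γ(k/2)) by hand and compares it with `scipy.stats.chi2.pdf`. The helper name `chi2_pdf`, the value `k = 5`, the seed, and the sample size are illustrative assumptions.

```python
import numpy as np
from scipy.special import gamma
from scipy.stats import chi2

def chi2_pdf(x, k):
    """Chi-square density written directly from the formula (hypothetical helper)."""
    return x ** (k / 2 - 1) * np.exp(-x / 2) / (2 ** (k / 2) * gamma(k / 2))

k = 5  # illustrative degrees of freedom
x = np.linspace(0.1, 20.0, 200)
formula_matches = np.allclose(chi2_pdf(x, k), chi2.pdf(x, k))

# Simulate: sum of squares of k independent standard normal draws.
rng = np.random.default_rng(0)
samples = (rng.standard_normal((100_000, k)) ** 2).sum(axis=1)
sample_mean = samples.mean()  # theory: k
sample_var = samples.var()    # theory: 2k

print(formula_matches, sample_mean, sample_var)
```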
scipy.stats.chi2 example: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.chi2.html#scipy.stats.chi2
from scipy.stats import chi2
import numpy as np
import matplotlib.pyplot as plt

fig, ax = plt.subplots(1, 1)

# Degrees of freedom, then the first four moments
df = 55
mean, var, skew, kurt = chi2.stats(df, moments='mvsk')

# Plot the pdf between the 1st and 99th percentiles
x = np.linspace(chi2.ppf(0.01, df), chi2.ppf(0.99, df), 100)
ax.plot(x, chi2.pdf(x, df), 'r-', lw=5, alpha=0.6, label='chi2 pdf')

# Frozen distribution with df fixed
rv = chi2(df)
ax.plot(x, rv.pdf(x), 'k-', lw=2, label='frozen pdf')

# Check that cdf and ppf are inverses of each other
vals = chi2.ppf([0.001, 0.5, 0.999], df)
np.allclose([0.001, 0.5, 0.999], chi2.cdf(vals, df))

# Draw random samples and compare their histogram with the pdf
r = chi2.rvs(df, size=1000)
ax.hist(r, density=True, histtype='stepfilled', alpha=0.2)
ax.legend(loc='best', frameon=False)
plt.show()
Result: the χ² distribution with df = 55.
Uniform distribution:
scipy.stats.uniform example: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.uniform.html#scipy.stats.uniform
from scipy.stats import uniform
import numpy as np
import matplotlib.pyplot as plt

fig, ax = plt.subplots(1, 1)

# First four moments of the default uniform distribution on [0, 1]
mean, var, skew, kurt = uniform.stats(moments='mvsk')

# Plot the pdf between the 1st and 99th percentiles
x = np.linspace(uniform.ppf(0.01), uniform.ppf(0.99), 100)
ax.plot(x, uniform.pdf(x), 'r-', lw=5, alpha=0.6, label='uniform pdf')

# Frozen distribution with the default parameters
rv = uniform()
ax.plot(x, rv.pdf(x), 'k-', lw=2, label='frozen pdf')

# Check that cdf and ppf are inverses of each other
vals = uniform.ppf([0.001, 0.5, 0.999])
np.allclose([0.001, 0.5, 0.999], uniform.cdf(vals))

# Draw random samples and compare their histogram with the pdf
r = uniform.rvs(size=1000)
ax.hist(r, density=True, histtype='stepfilled', alpha=0.2)
ax.legend(loc='best', frameon=False)
plt.show()
Result: the uniform distribution.
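The example above relies on the defaults of `scipy.stats.uniform`, which describe the interval [0, 1]. A uniform distribution on a general interval [a, b] can be sketched with `loc=a` and `scale=b - a`, since SciPy defines the distribution on [loc, loc + scale]. The interval endpoints below are hypothetical values chosen for illustration.

```python
from scipy.stats import uniform

a, b = 2.0, 5.0  # hypothetical interval endpoints
rv = uniform(loc=a, scale=b - a)  # uniform on [loc, loc + scale] = [a, b]

density_inside = rv.pdf(3.0)   # constant 1 / (b - a) inside the interval
density_outside = rv.pdf(6.0)  # zero outside the interval
mean = rv.mean()               # theory: (a + b) / 2
var = rv.var()                 # theory: (b - a)**2 / 12

print(density_inside, density_outside, mean, var)
```

A common mistake is to pass the upper bound b as `scale`; `scale` is the width of the interval, not its right endpoint.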
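Binomial distribution formula (probability mass function):
$$P(X = k) = \binom{n}{k}\, p^k (1-p)^{n-k}$$
where p is the probability that the event occurs on a single trial (and 1 - p the probability that it does not), n is the number of trials, and k ∈ {0, 1, ..., n}. The formula shows that the binomial distribution is determined by the parameters p and n.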
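The binomial pmf C(n, k)·p^k·(1−p)^(n−k) is simple enough to code directly with `math.comb` and compare with `scipy.stats.binom.pmf`. This is a sketch only; the helper name `binom_pmf` is made up, and n and p reuse the values from the scipy.stats.binom example in this post.

```python
from math import comb

import numpy as np
from scipy.stats import binom

def binom_pmf(k, n, p):
    """Binomial pmf written directly from the formula (hypothetical helper)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 5, 0.4
ks = np.arange(0, n + 1)  # full support k = 0, 1, ..., n

by_hand = np.array([binom_pmf(int(k), n, p) for k in ks])
by_scipy = binom.pmf(ks, n, p)

formula_matches = np.allclose(by_hand, by_scipy)
total = by_hand.sum()             # a pmf sums to 1 over its support
mean_from_pmf = (ks * by_hand).sum()  # theory: n * p

print(formula_matches, total, mean_from_pmf)
```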
scipy.stats.binom example: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.binom.html#scipy.stats.binom
from scipy.stats import binom
import numpy as np
import matplotlib.pyplot as plt

fig, ax = plt.subplots(1, 1)

# Number of trials and success probability, then the first four moments
n, p = 5, 0.4
mean, var, skew, kurt = binom.stats(n, p, moments='mvsk')

# Integer outcomes from the 1st percentile up to (not including) the 99th percentile
x = np.arange(binom.ppf(0.01, n, p), binom.ppf(0.99, n, p))
ax.plot(x, binom.pmf(x, n, p), 'bo', ms=8, label='binom pmf')  # pmf: probability mass function
ax.vlines(x, 0, binom.pmf(x, n, p), colors='b', lw=5, alpha=0.5)

# Frozen distribution with n and p fixed
rv = binom(n, p)
ax.vlines(x, 0, rv.pmf(x), colors='k', linestyles='-', lw=1, label='frozen pmf')
ax.legend(loc='best', frameon=False)
plt.show()
Result: the binomial distribution with n, p = 5, 0.4.
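Poisson distribution formula (probability mass function):
$$P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}$$
where λ is the expected number of occurrences of the event and k ∈ {0, 1, 2, ...}. The formula shows that the Poisson distribution is determined by the single parameter λ.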
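The Poisson pmf λ^k·e^(−λ)/k! can be coded by hand and compared with `scipy.stats.poisson.pmf`, which calls the parameter `mu` instead of λ. The sketch also checks a characteristic property of this distribution: its mean and variance are both equal to λ. The helper name `poisson_pmf` and the range of k values are illustrative assumptions.

```python
from math import exp, factorial

import numpy as np
from scipy.stats import poisson

def poisson_pmf(k, lam):
    """Poisson pmf written directly from the formula (hypothetical helper)."""
    return lam**k * exp(-lam) / factorial(k)

lam = 0.6  # same rate as the scipy.stats.poisson example in this post
ks = np.arange(0, 10)  # first few outcomes; the support has no upper bound

by_hand = np.array([poisson_pmf(int(k), lam) for k in ks])
by_scipy = poisson.pmf(ks, lam)
formula_matches = np.allclose(by_hand, by_scipy)

# Mean and variance reported by SciPy; both should equal lam.
mean, var = poisson.stats(lam, moments='mv')

print(formula_matches, float(mean), float(var))
```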
scipy.stats.poisson example: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.poisson.html#scipy.stats.poisson
from scipy.stats import poisson
import numpy as np
import matplotlib.pyplot as plt

fig, ax = plt.subplots(1, 1)

# Expected number of events (lambda), then the first four moments
mu = 0.6
mean, var, skew, kurt = poisson.stats(mu, moments='mvsk')

# Integer outcomes from the 1st percentile up to (not including) the 99th percentile
x = np.arange(poisson.ppf(0.01, mu), poisson.ppf(0.99, mu))
ax.plot(x, poisson.pmf(x, mu), 'bo', ms=8, label='poisson pmf')  # pmf: probability mass function
ax.vlines(x, 0, poisson.pmf(x, mu), colors='b', lw=5, alpha=0.5)

# Frozen distribution with mu fixed
rv = poisson(mu)
ax.vlines(x, 0, rv.pmf(x), colors='k', linestyles='-', lw=1, label='frozen pmf')
ax.legend(loc='best', frameon=False)
plt.show()
Result: the Poisson distribution with λ = 0.6.