機率分布 (Probability Distributions)

套路 4: 機率分布 (Probability Distributions)

什麼是資料的機率分布? 說白了就是描述不同結果可能發生的機率的數學函數 (probability density functionpdf)。以下是舉例使用Python模擬幾種機率分布並畫圖。

Normal distribution公式 (probability density function):
其中μ母體(population)的算數平均數s母體標準差 (population standard deviation)pe常數數。由此公式可知常態分佈由兩個參數(μ, s )決定。
(μ = 0, s = 1 ) 平均值為0標準差為1標準常態分布(standard normal distribution)

from scipy.stats import norm
import numpy as np
import matplotlib.pyplot as plt
fig, ax = plt.subplots(1, 1)
mean, var, skew, kurt = norm.stats(moments='mvsk')
x = np.linspace(norm.ppf(0.01), norm.ppf(0.99), 100)   # ppf: percent point function (percentiles)
ax.plot(x, norm.pdf(x), 'r-', lw=5, alpha=0.6, label='norm pdf')   # pdf: probability density function
rv = norm()
ax.plot(x, rv.pdf(x), 'k-', lw=2, label='frozen pdf')
r = norm.rvs(size=1000)
ax.hist(r, density=True, histtype='stepfilled', alpha=0.2)
ax.legend(loc='best', frameon=False)
plt.show()
結果: 平均值為0標準差為1標準常態分布(standard normal distribution)

t distribution公式 (probability density function):
其中n自由度 (degree of freedom) Ggamma function。由此公式可知t分佈由參數n決定。

scipy.stats.t程式範例: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.t.html#scipy.stats.t
from scipy.stats import t
import numpy as np
import matplotlib.pyplot as plt
fig, ax = plt.subplots(1, 1)
df = 2.74
mean, var, skew, kurt = t.stats(df, moments='mvsk')
x = np.linspace(t.ppf(0.01, df), t.ppf(0.99, df), 100)
ax.plot(x, t.pdf(x, df), 'r-', lw=5, alpha=0.6, label='t pdf')
rv = t(df)
ax.plot(x, rv.pdf(x), 'k-', lw=2, label='frozen pdf')
vals = t.ppf([0.001, 0.5, 0.999], df)
np.allclose([0.001, 0.5, 0.999], t.cdf(vals, df))
r = t.rvs(df, size=1000)
ax.hist(r, density=True, histtype='stepfilled', alpha=0.2)
ax.legend(loc='best', frameon=False)
plt.show()

結果: df = 2.74t分布

F distribution公式 (probability density function):
其中d1 d2自由度 (degree of freedom) Bbeta function。由此公式可知F分佈由參數d1 d2決定。

scipy.stats.f程式範例: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.f.html#scipy.stats.f
from scipy.stats import f
import numpy as np
import matplotlib.pyplot as plt
fig, ax = plt.subplots(1, 1)
dfn, dfd = 29, 18
mean, var, skew, kurt = f.stats(dfn, dfd, moments='mvsk')
x = np.linspace(f.ppf(0.01, dfn, dfd), f.ppf(0.99, dfn, dfd), 100)
ax.plot(x, f.pdf(x, dfn, dfd), 'r-', lw=5, alpha=0.6, label='f pdf')
rv = f(dfn, dfd)
ax.plot(x, rv.pdf(x), 'k-', lw=2, label='frozen pdf')
vals = f.ppf([0.001, 0.5, 0.999], dfn, dfd)
np.allclose([0.001, 0.5, 0.999], f.cdf(vals, dfn, dfd))
r = f.rvs(dfn, dfd, size=1000)
ax.hist(r, density=True, histtype='stepfilled', alpha=0.2)
ax.legend(loc='best', frameon=False)
plt.show()

結果: df = 29, 18F分布

c2 (Chi-square) distribution公式 (probability density function):
其中k自由度 (degree of freedom) Ggamma function。由此公式可知c2分佈由參數k決定。

scipy.stats.chi2程式範例: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.chi2.html#scipy.stats.chi2
from scipy.stats import chi2
import numpy as np
import matplotlib.pyplot as plt
fig, ax = plt.subplots(1, 1)
df = 55
mean, var, skew, kurt = chi2.stats(df, moments='mvsk')
x = np.linspace(chi2.ppf(0.01, df), chi2.ppf(0.99, df), 100)
ax.plot(x, chi2.pdf(x, df), 'r-', lw=5, alpha=0.6, label='chi2 pdf')
rv = chi2(df)
ax.plot(x, rv.pdf(x), 'k-', lw=2, label='frozen pdf')
vals = chi2.ppf([0.001, 0.5, 0.999], df)
np.allclose([0.001, 0.5, 0.999], chi2.cdf(vals, df))
r = chi2.rvs(df, size=1000)
ax.hist(r, density=True, histtype='stepfilled', alpha=0.2)
ax.legend(loc='best', frameon=False)
plt.show()

結果: df = 55c2分布

Uniform distribution:

scipy.stats.uniform程式範例: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.uniform.html#scipy.stats.uniform
from scipy.stats import uniform
import numpy as np
import matplotlib.pyplot as plt
fig, ax = plt.subplots(1, 1)
mean, var, skew, kurt = uniform.stats(moments='mvsk')
x = np.linspace(uniform.ppf(0.01), uniform.ppf(0.99), 100)
ax.plot(x, uniform.pdf(x), 'r-', lw=5, alpha=0.6, label='uniform pdf')
rv = uniform()
ax.plot(x, rv.pdf(x), 'k-', lw=2, label='frozen pdf')
vals = uniform.ppf([0.001, 0.5, 0.999])
np.allclose([0.001, 0.5, 0.999], uniform.cdf(vals))
r = uniform.rvs(size=1000)
ax.hist(r, density=True, histtype='stepfilled', alpha=0.2)
ax.legend(loc='best', frameon=False)
plt.show()

結果: uniform分布
 

Binomial distribution公式 (probability density function):
 
其中p (1 - p)事件出現的機率 k Î {0, 1,..., n}。由此公式可知二項式分佈由參數pn決定。

scipy.stats.binom程式範例: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.binom.html#scipy.stats.binom
from scipy.stats import binom
import numpy as np
import matplotlib.pyplot as plt
fig, ax = plt.subplots(1, 1)
n, p = 5, 0.4
mean, var, skew, kurt = binom.stats(n, p, moments='mvsk')
x = np.arange(binom.ppf(0.01, n, p), binom.ppf(0.99, n, p))
ax.plot(x, binom.pmf(x, n, p), 'bo', ms=8, label='binom pmf')
ax.vlines(x, 0, binom.pmf(x, n, p), colors='b', lw=5, alpha=0.5)
rv = binom(n, p)
ax.vlines(x, 0, rv.pmf(x), colors='k', linestyles='-', lw=1, label='frozen pmf')
ax.legend(loc='best', frameon=False)
plt.show()

結果: n, p = 5, 0.4的二項式分布

Poisson distribution公式 (probability density function):
其中l事件出現機率的期望值 k Î {0, 1,..., n}。由此公式可知布ㄚ松分佈由參數l決定。

scipy.stats.poisson程式範例: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.poisson.html#scipy.stats.poisson
from scipy.stats import poisson
import numpy as np
import matplotlib.pyplot as plt
fig, ax = plt.subplots(1, 1)
mu = 0.6
mean, var, skew, kurt = poisson.stats(mu, moments='mvsk')
x = np.arange(poisson.ppf(0.01, mu), poisson.ppf(0.99, mu))
ax.plot(x, poisson.pmf(x, mu), 'bo', ms=8, label='poisson pmf')
ax.vlines(x, 0, poisson.pmf(x, mu), colors='b', lw=5, alpha=0.5)
rv = poisson(mu)
ax.vlines(x, 0, rv.pmf(x), colors='k', linestyles='-', lw=1, label='frozen pmf')
ax.legend(loc='best', frameon=False)
plt.show()

結果: l = 0.6的布ㄚ松分佈

留言

這個網誌中的熱門文章

三因子變異數分析 (Three-Way ANOVA)

比較多組不同變異數獨立樣本平均值檢定 (Welch's Test for Analysis of Variance,parametric)

雙因子變異數分析 (Two-Way ANOVA)