Recipe 14: Comparing the Means of Several Independent Samples with Equal Variances (One-Way Analysis of Variance, One-Way ANOVA, parametric)
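What is a test that compares the means of several independent samples with equal variances? Put plainly, it is a hypothesis test that compares several groups of data, each sampled independently. **Note**: "comparing variances" and "analysis of variance" are not the same thing. Analysis of variance compares the means of several groups. What does a statistical hypothesis test actually test? Look at H0. For several independent samples, H0: μ1 = μ2 = … = μk and HA: at least one group mean differs, so the test asks whether the group means are all equal. Because H0 is an equality, HA has no direction; the F statistic itself is judged against the upper tail of the F distribution.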
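1. When to use it: to compare the means of several independent samples that share the same variance. With a single independent variable (factor), this is one-way analysis of variance.
2. Type of analysis: parametric. Methods that compute the statistic directly from the data values are called parametric; methods that first rank the data and compute the statistic from the ranks are called non-parametric.
3. Assumptions: a parametric analysis requires the data in each group to be normally distributed. ANOVA additionally requires the groups to be independent and to have equal variances.
4. Example data: 咪路 recorded the body weight (g) of broiler chickens raised on different feeds. The data are: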
Feed 1 | Feed 2 | Feed 3 | Feed 4
61.8   | 78.8   | 70.5   | 60.3
65.1   | 79.5   | 72.6   | 63.8
61.7   | 76.0   | 71.7   | 64.1
63.3   | 73.4   | 72.0   | 61.4
       | 77.3   | 71.1   | 60.9
Do the feeds differ in their effect? H0: μ1 = μ2 = μ3 = μ4. HA: the mean body weights of broilers on the different feeds are not all equal.
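Before running any test it can help to look at the per-group sample size, mean, and sample variance, since these are the quantities the later steps compare. The following is a minimal sketch using only the Python standard library; the dictionary name `groups` and the keys F1 to F4 are labels chosen here for illustration.

```python
from statistics import mean, variance

# Body weights (g) from the table above, keyed by feed.
groups = {
    "F1": [61.8, 65.1, 61.7, 63.3],
    "F2": [78.8, 79.5, 76.0, 73.4, 77.3],
    "F3": [70.5, 72.6, 71.7, 72.0, 71.1],
    "F4": [60.3, 63.8, 64.1, 61.4, 60.9],
}

# Per-group sample size, mean, and sample variance (n - 1 denominator).
summary = {
    name: {"n": len(x), "mean": mean(x), "var": variance(x)}
    for name, x in groups.items()
}

for name, s in summary.items():
    print(f"{name}: n={s['n']}, mean={s['mean']:.3f}, var={s['var']:.3f}")
```

Feed 1 has only four observations while the other feeds have five, so the design is unbalanced; the tests used in the following steps handle unequal group sizes.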
5. Plot the data to inspect its distribution:
import pandas as pd
import seaborn as sns
# Long format: one row per chicken, with its weight and its feed label
wt = [61.8,65.1,61.7,63.3,78.8,79.5,76.0,73.4,77.3,70.5,72.6,71.7,72.0,71.1,60.3,63.8,64.1,61.4,60.9]
cl = ["F1","F1","F1","F1","F2","F2","F2","F2","F2","F3","F3","F3","F3","F3","F4","F4","F4","F4","F4"]
dat = {'Weight':wt,'Feed':cl}
df = pd.DataFrame(dat)
sns.set(style="whitegrid")
# Box plot per feed, with the individual observations overlaid in red
ax = sns.boxplot(x = "Feed", y = "Weight", data = df, width=0.2, palette="Set3")
ax = sns.swarmplot(x = "Feed", y = "Weight", data = df, color = "red")
Result: (figure: box plot of Weight by Feed with the individual observations overlaid in red)
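6. Check whether the data are normally distributed (H0: the data are normally distributed):
dat1 = [61.8,65.1,61.7,63.3]
dat2 = [78.8,79.5,76.0,73.4,77.3]
dat3 = [70.5,72.6,71.7,72.0,71.1]
dat4 = [60.3,63.8,64.1,61.4,60.9]
import scipy.stats
scipy.stats.shapiro(dat1)
Result (W statistic, p-value): (0.8768853545188904, 0.32550376653671265)
p = 0.3255 > 0.05, so H0 is not rejected: the data are consistent with a normal distribution.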
scipy.stats.shapiro(dat2)
Result (W statistic, p-value): (0.9504634737968445, 0.7404994964599609)
p = 0.7405 > 0.05, so H0 is not rejected: the data are consistent with a normal distribution.
scipy.stats.shapiro(dat3)
Result (W statistic, p-value): (0.9899775981903076, 0.9796159863471985)
p = 0.9796 > 0.05, so H0 is not rejected: the data are consistent with a normal distribution.
scipy.stats.shapiro(dat4)
Result (W statistic, p-value): (0.8627133369445801, 0.23815500736236572)
p = 0.2381 > 0.05, so H0 is not rejected: the data are consistent with a normal distribution.
7. Check whether the groups have equal variances (H0: σ1² = σ2² = σ3² = σ4²):
Method: Levene test for equal variances (parametric test)
dat1 = [61.8,65.1,61.7,63.3]
dat2 = [78.8,79.5,76.0,73.4,77.3]
dat3 = [70.5,72.6,71.7,72.0,71.1]
dat4 = [60.3,63.8,64.1,61.4,60.9]
import scipy.stats
scipy.stats.levene(dat1, dat2, dat3, dat4, center = 'mean')
Result: LeveneResult(statistic=1.9584062218957583, pvalue=0.16363937651728697)
p = 0.1636 > 0.05, so H0: σ1² = σ2² = σ3² = σ4² is not rejected.
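# Equal variances mean the groups are consistent with populations that share the same spread. The population means can still differ, and that is what the ANOVA in the next step tests.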
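To see what the Levene statistic measures, the test with center = 'mean' can be reproduced as a one-way ANOVA F statistic computed on the absolute deviations of each value from its own group mean. The sketch below uses plain Python only; the helper names `one_way_f` and `levene_mean` are made up for illustration and are not part of SciPy.

```python
def one_way_f(groups):
    """One-way ANOVA F statistic for a list of samples."""
    n_total = sum(len(g) for g in groups)
    k = len(groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group and within-group sums of squares
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def levene_mean(groups):
    """Levene statistic, mean-centered: ANOVA F on |x - group mean|."""
    abs_dev = []
    for g in groups:
        m = sum(g) / len(g)
        abs_dev.append([abs(x - m) for x in g])
    return one_way_f(abs_dev)


dat1 = [61.8, 65.1, 61.7, 63.3]
dat2 = [78.8, 79.5, 76.0, 73.4, 77.3]
dat3 = [70.5, 72.6, 71.7, 72.0, 71.1]
dat4 = [60.3, 63.8, 64.1, 61.4, 60.9]

w = levene_mean([dat1, dat2, dat3, dat4])
print(round(w, 4))
```

The value of `w` should agree with the statistic reported by scipy.stats.levene above. Groups whose values scatter more widely around their own mean produce larger absolute deviations, so a large statistic points to unequal spreads.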
8. Run the one-way ANOVA for several independent samples with equal variances in Python:
Method 1: SciPy (scipy.stats.f_oneway)
dat1 = [61.8,65.1,61.7,63.3]
dat2 = [78.8,79.5,76.0,73.4,77.3]
dat3 = [70.5,72.6,71.7,72.0,71.1]
dat4 = [60.3,63.8,64.1,61.4,60.9]
import scipy.stats
scipy.stats.f_oneway(dat1, dat2, dat3, dat4)
Result: F_onewayResult(statistic=80.12392188505216, pvalue=1.8454859967040819e-09)
p = 1.845e-9 < 0.05, so H0: μ1 = μ2 = μ3 = μ4 is rejected: the feeds do not all have the same effect.
Method 2: Statsmodels
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols
wt = [61.8,65.1,61.7,63.3,78.8,79.5,76.0,73.4,77.3,70.5,72.6,71.7,72.0,71.1,60.3,63.8,64.1,61.4,60.9]
cl = ["F1","F1","F1","F1","F2","F2","F2","F2","F2","F3","F3","F3","F3","F3","F4","F4","F4","F4","F4"]
dat = {'Weight':wt,'Feed':cl}
df = pd.DataFrame(dat)
# Fit Weight as a function of the categorical factor Feed, then build the ANOVA table
mod = ols('Weight ~ Feed', data = df).fit()
sm.stats.anova_lm(mod, typ = 2)
Result: the ANOVA table is as follows
            sum_sq    df          F        PR(>F)
Feed      734.8245   3.0  80.123922  1.845486e-09
Residual   45.8555  15.0        NaN           NaN
p = 1.845e-9 < 0.05, so H0: μ1 = μ2 = μ3 = μ4 is rejected: the feeds do not all have the same effect.
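The sum_sq, df, and F entries of the ANOVA table can be checked by hand: the Feed row is the between-group sum of squares, the Residual row is the within-group sum of squares, and F is the ratio of their mean squares. A minimal sketch in plain Python follows; the variable names are chosen here for illustration.

```python
groups = {
    "F1": [61.8, 65.1, 61.7, 63.3],
    "F2": [78.8, 79.5, 76.0, 73.4, 77.3],
    "F3": [70.5, 72.6, 71.7, 72.0, 71.1],
    "F4": [60.3, 63.8, 64.1, 61.4, 60.9],
}

all_values = [x for g in groups.values() for x in g]
n_total = len(all_values)          # 19 observations
k = len(groups)                    # 4 feeds
grand_mean = sum(all_values) / n_total
means = {name: sum(g) / len(g) for name, g in groups.items()}

# Between groups: spread of the group means around the grand mean
ss_between = sum(len(g) * (means[name] - grand_mean) ** 2 for name, g in groups.items())
# Within groups: spread of each value around its own group mean
ss_within = sum((x - means[name]) ** 2 for name, g in groups.items() for x in g)

df_between = k - 1
df_within = n_total - k
f_stat = (ss_between / df_between) / (ss_within / df_within)

print(ss_between, df_between)
print(ss_within, df_within)
print(f_stat)
```

A large F means the group means vary far more than the scatter inside the groups would explain. Turning F into a p-value still needs the F distribution with (3, 15) degrees of freedom, which is the part SciPy and Statsmodels supply.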
The feeds differ, but which ones differ from which? That takes a follow-up multiple comparison:
from statsmodels.stats.multicomp import MultiComparison
# df is the DataFrame built in Method 2 above
mc = MultiComparison(df['Weight'], df['Feed'])
tkresult = mc.tukeyhsd()
print(tkresult)
The result is the following table:
Multiple Comparison of Means - Tukey HSD, FWER=0.05
===============================================
group1 group2 meandiff    lower     upper  reject
-----------------------------------------------
F1     F2       14.025  10.6443   17.4057  True   (differ)
F1     F3        8.605   5.2243   11.9857  True   (differ)
F1     F4       -0.875  -4.2557    2.5057  False  (no difference)
F2     F3        -5.42  -8.6073   -2.2327  True   (differ)
F2     F4        -14.9 -18.0873  -11.7127  True   (differ)
F3     F4        -9.48 -12.6673   -6.2927  True   (differ)
-----------------------------------------------
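The columns of the Tukey table can be approximated by hand to see how they are built: meandiff is mean(group2) minus mean(group1), and the interval is that difference plus or minus a half-width based on the residual mean square. The sketch below is an approximation under stated assumptions, not the Statsmodels implementation: `Q_CRIT = 4.076` is a value taken from a printed studentized range table (alpha = 0.05, 4 groups, 15 residual degrees of freedom), and the half-width uses the Tukey-Kramer form for unequal group sizes.

```python
from itertools import combinations
from math import sqrt

groups = {
    "F1": [61.8, 65.1, 61.7, 63.3],
    "F2": [78.8, 79.5, 76.0, 73.4, 77.3],
    "F3": [70.5, 72.6, 71.7, 72.0, 71.1],
    "F4": [60.3, 63.8, 64.1, 61.4, 60.9],
}

# Assumption: table lookup of the studentized range critical value
# for alpha = 0.05, k = 4 groups, 15 residual degrees of freedom.
Q_CRIT = 4.076

n_total = sum(len(g) for g in groups.values())
k = len(groups)
means = {name: sum(g) / len(g) for name, g in groups.items()}
ss_within = sum((x - means[name]) ** 2 for name, g in groups.items() for x in g)
mse = ss_within / (n_total - k)    # residual mean square from the ANOVA

rows = []
for a, b in combinations(groups, 2):
    diff = means[b] - means[a]     # same sign convention as the table
    # Tukey-Kramer half-width, which allows unequal group sizes
    half = Q_CRIT * sqrt(mse / 2 * (1 / len(groups[a]) + 1 / len(groups[b])))
    lower, upper = diff - half, diff + half
    reject = lower > 0 or upper < 0   # interval excludes zero
    rows.append((a, b, diff, lower, upper, reject))

for a, b, diff, lower, upper, reject in rows:
    print(f"{a} {b} {diff:8.3f} {lower:9.4f} {upper:9.4f} {reject}")
```

A pair is flagged as different exactly when its interval does not contain zero. That is why F1 and F4, whose interval runs from about -4.26 to 2.51, is the only pair reported as not different.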