Two-Way ANOVA
Recipe 17: Two-Way ANOVA
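Two-way ANOVA is used for data with two factors, that is, two independent variables. The test has three pairs of null (H0) and alternative (HA) hypotheses: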
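H0: $\mu_{1,1} = \mu_{1,2} = \cdots = \mu_{1,m}$ (the means of the m levels of factor 1 are all equal); HA: at least one mean differs.
H0: $\mu_{2,1} = \mu_{2,2} = \cdots = \mu_{2,n}$ (the means of the n levels of factor 2 are all equal); HA: at least one mean differs.
H0: factor 1 and factor 2 do not interact; HA: factor 1 and factor 2 interact.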
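For reference, the three null hypotheses correspond to the terms of the standard two-way ANOVA model. The following is a sketch for a balanced design. It assumes factor 1 has m levels, factor 2 has n levels, and every cell has r replicates. The symbols α, β, r and SS are introduced here for illustration only.

```latex
\begin{aligned}
y_{ijk} &= \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk},
  \quad i = 1,\dots,m;\ j = 1,\dots,n;\ k = 1,\dots,r \\
SS_T &= \sum_{i}\sum_{j}\sum_{k} (y_{ijk} - \bar{y}_{\cdot\cdot\cdot})^2
      = SS_1 + SS_2 + SS_{12} + SS_E \\
SS_1 &= n r \sum_{i} (\bar{y}_{i\cdot\cdot} - \bar{y}_{\cdot\cdot\cdot})^2,
  \qquad df_1 = m - 1 \\
SS_2 &= m r \sum_{j} (\bar{y}_{\cdot j\cdot} - \bar{y}_{\cdot\cdot\cdot})^2,
  \qquad df_2 = n - 1 \\
SS_{12} &= r \sum_{i}\sum_{j} (\bar{y}_{ij\cdot} - \bar{y}_{i\cdot\cdot} - \bar{y}_{\cdot j\cdot} + \bar{y}_{\cdot\cdot\cdot})^2,
  \qquad df_{12} = (m-1)(n-1) \\
SS_E &= \sum_{i}\sum_{j}\sum_{k} (y_{ijk} - \bar{y}_{ij\cdot})^2,
  \qquad df_E = mn(r-1) \\
F_1 &= \frac{SS_1/df_1}{SS_E/df_E},\quad
F_2 = \frac{SS_2/df_2}{SS_E/df_E},\quad
F_{12} = \frac{SS_{12}/df_{12}}{SS_E/df_E}
\end{aligned}
```

Each F ratio is compared against an F distribution with (df of the effect, df_E) degrees of freedom. The factor 1 null hypothesis corresponds to all α_i = 0. The factor 2 null hypothesis corresponds to all β_j = 0. The interaction null hypothesis corresponds to all (αβ)_ij = 0.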
1. When to use: to test whether each of two factors affects the sample mean, and whether the two factors interact.
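2. Type of analysis: parametric. In this series, methods that compute statistics directly from the data values are called parametric. Methods that first rank the data and then compute statistics from the ranks are called non-parametric. Parametric methods such as ANOVA also assume a particular distribution for the data (here, the normal distribution), which is why the assumptions in item 3 need to be checked.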
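The sketch below makes the values-versus-ranks distinction concrete. It runs a parametric test and a rank-based test on the four cells of the example data. Both scipy.stats.f_oneway and scipy.stats.kruskal are one-way tests that treat the four cells as four unrelated groups. They are used here only to illustrate the difference between working with values and working with ranks; they are not a substitute for the two-way analysis.

```python
import scipy.stats

# The four Hor x Sex cells of the example data in item 4
groups = [
    [16.3, 20.4, 12.4, 15.8, 9.5],   # no hormone, female
    [15.3, 17.4, 10.9, 10.3, 6.7],   # no hormone, male
    [38.1, 26.2, 32.3, 35.8, 30.2],  # hormone, female
    [34.0, 22.8, 27.8, 25.0, 29.3],  # hormone, male
]

# Parametric: the F statistic is computed from the raw values
f_res = scipy.stats.f_oneway(*groups)

# Non-parametric: pool all values and replace each one by its rank (smallest = 1)
pooled = [x for g in groups for x in g]
ranks = scipy.stats.rankdata(pooled)

# Kruskal-Wallis computes its H statistic from those ranks, not from the values
h_res = scipy.stats.kruskal(*groups)

print("first group, values:", groups[0])
print("first group, ranks: ", ranks[:5])
print("one-way ANOVA:  F =", f_res.statistic, "p =", f_res.pvalue)
print("Kruskal-Wallis: H =", h_res.statistic, "p =", h_res.pvalue)
```

Notice that the ranks keep only the ordering of the values. The distance between, say, 9.5 and 12.4 is lost, which is why rank-based tests do not need the normality assumption.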
3. Assumptions: a parametric analysis requires the data in each group to be normally distributed, and ANOVA across several groups requires the groups to have equal variances.
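Strictly speaking, the normality assumption in ANOVA concerns the model errors. A common alternative to testing each group separately (as item 6 does) is therefore to fit the model and test its residuals. The following is a minimal sketch. It assumes statsmodels is installed (it is also used in item 8). With only 5 observations per cell, pooling all 20 residuals gives the normality test more data to work with.

```python
import pandas as pd
import scipy.stats
from statsmodels.formula.api import ols

# Example data from item 4 (N = no hormone, H = hormone; F = female, M = male)
co = [16.3, 20.4, 12.4, 15.8, 9.5, 15.3, 17.4, 10.9, 10.3, 6.7,
      38.1, 26.2, 32.3, 35.8, 30.2, 34.0, 22.8, 27.8, 25.0, 29.3]
df = pd.DataFrame({'Conc': co,
                   'Hor': ["N"] * 10 + ["H"] * 10,
                   'Sex': (["F"] * 5 + ["M"] * 5) * 2})

# Full two-way model; with both factors and their interaction,
# each fitted value equals the mean of its cell
mod = ols('Conc ~ Hor*Sex', data=df).fit()
resid = mod.resid  # each observation minus its cell mean

# Shapiro-Wilk on all 20 residuals at once (H0: residuals are normally distributed)
w, p = scipy.stats.shapiro(resid)
print("W =", w, "p =", p)
```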
4. Example data: 咪路 (Milu) measured the potassium ion concentration (mg/100 ml) in human blood plasma. The data are:
No hormone, female | No hormone, male | Hormone, female | Hormone, male
16.3 | 15.3 | 38.1 | 34.0
20.4 | 17.4 | 26.2 | 22.8
12.4 | 10.9 | 32.3 | 27.8
15.8 | 10.3 | 35.8 | 25.0
9.5 | 6.7 | 30.2 | 29.3
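H0: the hormone injection has no effect. HA: the hormone injection has an effect.
H0: sex has no effect. HA: sex has an effect.
H0: sex and hormone injection do not interact. HA: sex and hormone injection interact.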
5. Plot the data distribution:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Concentration, hormone treatment (N = none, H = hormone) and sex (F/M) for each subject
co = [16.3,20.4,12.4,15.8,9.5,15.3,17.4,10.9,10.3,6.7,38.1,26.2,32.3,35.8,30.2,34,22.8,27.8,25,29.3]
ho = ["N","N","N","N","N","N","N","N","N","N","H","H","H","H","H","H","H","H","H","H"]
se = ["F","F","F","F","F","M","M","M","M","M","F","F","F","F","F","M","M","M","M","M"]
df = pd.DataFrame({'Conc': co, 'Hor': ho, 'Sex': se})

sns.set(style="whitegrid")
# Box plots for each hormone level, split by sex
ax = sns.boxplot(x="Hor", y="Conc", hue="Sex", data=df, palette="Set3")
# Overlay the individual points; dodge=True lines them up with the split boxes
ax = sns.swarmplot(x="Hor", y="Conc", hue="Sex", data=df, dodge=True, palette="Set1")
plt.show()
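Result: (figure: box plots of Conc for each Hor level, split by Sex, with the individual points overlaid)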
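The plot can also be summarized numerically with a table of cell means. The following is a minimal sketch using pandas pivot_table. If the female-minus-male gap is similar at both hormone levels, the lines of an interaction plot would be roughly parallel, which suggests little interaction between the two factors.

```python
import pandas as pd

# Same data as item 5, built with list repetition instead of typing each label
co = [16.3, 20.4, 12.4, 15.8, 9.5, 15.3, 17.4, 10.9, 10.3, 6.7,
      38.1, 26.2, 32.3, 35.8, 30.2, 34.0, 22.8, 27.8, 25.0, 29.3]
df = pd.DataFrame({'Conc': co,
                   'Hor': ["N"] * 10 + ["H"] * 10,
                   'Sex': (["F"] * 5 + ["M"] * 5) * 2})

# Cell means (rows: Hor, columns: Sex); margins=True adds marginal means labelled 'All'
cell_means = df.pivot_table(values='Conc', index='Hor', columns='Sex',
                            aggfunc='mean', margins=True)
print(cell_means.round(3))

# Female minus male within each hormone level
sex_gap = cell_means.loc[['N', 'H'], 'F'] - cell_means.loc[['N', 'H'], 'M']
print(sex_gap.round(3))
```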
6. Check whether the data are normally distributed (Shapiro-Wilk test, H0: the data are normally distributed):
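dat1 = [16.3,20.4,12.4,15.8,9.5]   # no hormone, female
dat2 = [15.3,17.4,10.9,10.3,6.7]   # no hormone, male
dat3 = [38.1,26.2,32.3,35.8,30.2]  # hormone, female
dat4 = [34,22.8,27.8,25,29.3]      # hormone, male
import scipy.stats
scipy.stats.shapiro(dat1)  # Shapiro-Wilk test: returns (W statistic, p-value)
Result: (0.9792556166648865, 0.9305974245071411)
p = 0.931 > 0.05, so H0 is not rejected: there is no evidence against normality.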
scipy.stats.shapiro(dat2)
Result: (0.9588245153427124, 0.7997748255729675)
p = 0.800 > 0.05, so H0 is not rejected: there is no evidence against normality.
scipy.stats.shapiro(dat3)
Result: (0.9827524423599243, 0.9487878680229187)
p = 0.949 > 0.05, so H0 is not rejected: there is no evidence against normality.
scipy.stats.shapiro(dat4)
Result: (0.9784173369407654, 0.9259798526763916)
p = 0.926 > 0.05, so H0 is not rejected: there is no evidence against normality.
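The four calls above can also be written as a loop over the cells using pandas groupby, which avoids copying each group by hand. The following is a minimal sketch that rebuilds the DataFrame from item 5.

```python
import pandas as pd
import scipy.stats

co = [16.3, 20.4, 12.4, 15.8, 9.5, 15.3, 17.4, 10.9, 10.3, 6.7,
      38.1, 26.2, 32.3, 35.8, 30.2, 34.0, 22.8, 27.8, 25.0, 29.3]
df = pd.DataFrame({'Conc': co,
                   'Hor': ["N"] * 10 + ["H"] * 10,
                   'Sex': (["F"] * 5 + ["M"] * 5) * 2})

# One Shapiro-Wilk test per Hor x Sex cell (H0: that cell is normally distributed)
pvalues = {}
for (hor, sex), sub in df.groupby(['Hor', 'Sex']):
    w, p = scipy.stats.shapiro(sub['Conc'])
    pvalues[(hor, sex)] = p
    print(hor, sex, "W =", round(w, 4), "p =", round(p, 4))
```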
7. Check whether the groups have equal variances (H0: $\sigma_1^2 = \sigma_2^2 = \sigma_3^2 = \sigma_4^2$):
Method: Levene's test for equal variances (parametric test)
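dat1 = [16.3,20.4,12.4,15.8,9.5]   # no hormone, female
dat2 = [15.3,17.4,10.9,10.3,6.7]   # no hormone, male
dat3 = [38.1,26.2,32.3,35.8,30.2]  # hormone, female
dat4 = [34,22.8,27.8,25,29.3]      # hormone, male
import scipy.stats
# center='mean' gives the original Levene test (scipy's default is center='median')
scipy.stats.levene(dat1, dat2, dat3, dat4, center='mean')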
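Result: LeveneResult(statistic=0.041144557517671064, pvalue=0.9884508158578803)
p = 0.988 > 0.05, so H0 ($\sigma_1^2 = \sigma_2^2 = \sigma_3^2 = \sigma_4^2$) is not rejected: there is no evidence that the variances differ.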
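# Equal variances (homogeneity of variance) is an ANOVA assumption: the groups may differ in their means, but the spread within each group should be about the same. Markedly unequal variances can make the F test unreliable.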
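By default, scipy.stats.levene uses center='median' (the Brown-Forsythe variant, which is more robust to non-normal data). The code above explicitly asks for center='mean'. As a cross-check, the sketch below runs the median-centred version and Bartlett's test on the same four groups. Bartlett's test assumes normality, which item 6 found no evidence against.

```python
import scipy.stats

dat1 = [16.3, 20.4, 12.4, 15.8, 9.5]   # no hormone, female
dat2 = [15.3, 17.4, 10.9, 10.3, 6.7]   # no hormone, male
dat3 = [38.1, 26.2, 32.3, 35.8, 30.2]  # hormone, female
dat4 = [34.0, 22.8, 27.8, 25.0, 29.3]  # hormone, male

# Brown-Forsythe variant: absolute deviations from each group's median
lev_median = scipy.stats.levene(dat1, dat2, dat3, dat4, center='median')
# Bartlett's test compares the sample variances directly and assumes normal data
bart = scipy.stats.bartlett(dat1, dat2, dat3, dat4)

print(lev_median)
print(bart)
```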
8. Two-way ANOVA in Python:
Method 1: statsmodels
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

co = [16.3,20.4,12.4,15.8,9.5,15.3,17.4,10.9,10.3,6.7,38.1,26.2,32.3,35.8,30.2,34,22.8,27.8,25,29.3]
ho = ["N","N","N","N","N","N","N","N","N","N","H","H","H","H","H","H","H","H","H","H"]
se = ["F","F","F","F","F","M","M","M","M","M","F","F","F","F","F","M","M","M","M","M"]
df = pd.DataFrame({'Conc': co, 'Hor': ho, 'Sex': se})

# Hor*Sex expands to Hor + Sex + Hor:Sex (both main effects plus their interaction)
mod = ols('Conc ~ Hor*Sex', data=df).fit()
# ANOVA table using Type II sums of squares
sm.stats.anova_lm(mod, typ=2)
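Result:
            sum_sq    df          F        PR(>F)
Hor      1386.1125   1.0  73.584568  2.217191e-07
Sex        70.3125   1.0   3.732680  7.126377e-02
Hor:Sex     4.9005   1.0   0.260153  6.169788e-01
Residual  301.3920  16.0        NaN           NaN
Hor: p = 2.217e-7 < 0.05, so the hormone treatment has a significant effect.
Sex: p = 0.071 > 0.05, so the effect of sex is not significant.
Hor:Sex: p = 0.617 > 0.05, so there is no significant interaction between sex and hormone treatment.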
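To see where these numbers come from, the sketch below computes the sums of squares by hand with NumPy. It uses the formulas of the balanced two-way model, with 5 replicates in each of the 2 × 2 cells. This shortcut assumes a balanced design. With unequal cell sizes, Type I, II and III sums of squares differ, and a library routine such as anova_lm is the safer choice.

```python
import numpy as np
import scipy.stats

# Axis 0: Hor (N, H); axis 1: Sex (F, M); axis 2: the 5 replicates
y = np.array([
    [[16.3, 20.4, 12.4, 15.8, 9.5], [15.3, 17.4, 10.9, 10.3, 6.7]],
    [[38.1, 26.2, 32.3, 35.8, 30.2], [34.0, 22.8, 27.8, 25.0, 29.3]],
])
a, b, n = y.shape

grand = y.mean()
mean_hor = y.mean(axis=(1, 2))   # marginal means of Hor
mean_sex = y.mean(axis=(0, 2))   # marginal means of Sex
mean_cell = y.mean(axis=2)       # 2 x 2 table of cell means

ss_hor = b * n * np.sum((mean_hor - grand) ** 2)
ss_sex = a * n * np.sum((mean_sex - grand) ** 2)
ss_int = n * np.sum((mean_cell - mean_hor[:, None] - mean_sex[None, :] + grand) ** 2)
ss_res = np.sum((y - mean_cell[:, :, None]) ** 2)

df_hor, df_sex, df_int, df_res = a - 1, b - 1, (a - 1) * (b - 1), a * b * (n - 1)
ms_res = ss_res / df_res

results = {}
for name, ss, d in [("Hor", ss_hor, df_hor), ("Sex", ss_sex, df_sex), ("Hor:Sex", ss_int, df_int)]:
    f = (ss / d) / ms_res
    p = scipy.stats.f.sf(f, d, df_res)  # upper-tail probability of F(d, df_res)
    results[name] = (ss, f, p)
    print(f"{name:8s} SS = {ss:9.4f}  F = {f:9.6f}  p = {p:.6e}")
print(f"Residual SS = {ss_res:9.4f}  df = {df_res}")
```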
Method 2: pingouin
conda install -c conda-forge pingouin  # install pingouin (pip install pingouin also works)
# After installation succeeds, run the following code
import pandas as pd
import pingouin

co = [16.3,20.4,12.4,15.8,9.5,15.3,17.4,10.9,10.3,6.7,38.1,26.2,32.3,35.8,30.2,34,22.8,27.8,25,29.3]
ho = ["N","N","N","N","N","N","N","N","N","N","H","H","H","H","H","H","H","H","H","H"]
se = ["F","F","F","F","F","M","M","M","M","M","F","F","F","F","F","M","M","M","M","M"]
df = pd.DataFrame({'Conc': co, 'Hor': ho, 'Sex': se})

# Importing pingouin adds an .anova() method to pandas DataFrames;
# pingouin.anova(data=df, dv="Conc", between=["Hor", "Sex"], detailed=True) is equivalent
df.anova(dv="Conc", between=["Hor", "Sex"], detailed=True)
Result:
      Source        SS  DF        MS          F         p-unc       np2
0        Hor  1386.112   1  1386.112  73.584541  2.217196e-07  0.821398
1        Sex    70.312   1    70.312   3.732654  7.126468e-02  0.189161
2  Hor * Sex     4.901   1     4.901   0.260206  6.169432e-01  0.016003
3   residual   301.392  16    18.837        NaN           NaN       NaN
Hor: p = 2.217e-7 < 0.05, so the hormone treatment has a significant effect.
Sex: p = 0.071 > 0.05, so the effect of sex is not significant.
Hor * Sex: p = 0.617 > 0.05, so there is no significant interaction between the two factors.