Two-Way Analysis of Variance (Two-Way ANOVA)


Recipe 17: Two-Way Analysis of Variance (Two-Way ANOVA)

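Two-way ANOVA applies to data with two influencing factors, that is, two independent variables. The test has three pairs of hypotheses (H0 and HA), where m and n are the numbers of levels of factor 1 and factor 2:
H0: μ(factor 1, level 1) = μ(factor 1, level 2) = … = μ(factor 1, level m). HA: at least one of these group means differs.
H0: μ(factor 2, level 1) = μ(factor 2, level 2) = … = μ(factor 2, level n). HA: at least one of these group means differs.
H0: factor 1 and factor 2 do not interact. HA: factor 1 and factor 2 interact.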
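For reference, the three hypothesis pairs above are commonly written in terms of the standard two-way ANOVA effects model. The notation below (α, β, (αβ), ε and the indices i, j, k) is the usual textbook form and is an addition, not taken from the original post. Here y_ijk is the k-th observation at level i of factor 1 and level j of factor 2, μ is the grand mean, α_i and β_j are the main effects, (αβ)_ij is the interaction effect, and ε_ijk is the random error.

```latex
\begin{align*}
y_{ijk} &= \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk},
  \qquad i = 1,\dots,m,\; j = 1,\dots,n \\
H_0^{(1)} &: \alpha_1 = \alpha_2 = \dots = \alpha_m = 0 \\
H_0^{(2)} &: \beta_1 = \beta_2 = \dots = \beta_n = 0 \\
H_0^{(3)} &: (\alpha\beta)_{ij} = 0 \ \text{for all } i, j
\end{align*}
```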

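1. When to use: to test whether each of two factors affects the sample mean, and whether the two factors interact.
2. Type of analysis: parametric. Methods that compute the test statistic directly from the data values are called parametric; methods that first rank the data and compute the statistic from the ranks are called non-parametric.
3. Assumptions: a parametric analysis requires the data to be normally distributed. ANOVA additionally requires the groups to have equal variances.
4. Example data: Milu measured the potassium ion concentration (mg/100 ml) in human plasma, with the following results: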
No hormone          Hormone injected
Female    Male      Female    Male
16.3      15.3      38.1      34.0
20.4      17.4      26.2      22.8
12.4      10.9      32.3      27.8
15.8      10.3      35.8      25.0
 9.5       6.7      30.2      29.3
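H0: hormone injection has no effect. HA: hormone injection has an effect.
H0: sex has no effect. HA: sex has an effect.
H0: sex and hormone injection do not interact. HA: sex and hormone injection interact.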
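Before running any test, it can help to look at the four cell means and the marginal means that these three hypotheses compare. The following is a minimal sketch using pandas; the names `cell_means`, `hor_means`, `sex_means` and `interaction_contrast` are illustrative and not part of the original post.

```python
import pandas as pd

# Same data as in the table: N = no hormone, H = hormone injected, F/M = sex
co = [16.3, 20.4, 12.4, 15.8, 9.5, 15.3, 17.4, 10.9, 10.3, 6.7,
      38.1, 26.2, 32.3, 35.8, 30.2, 34, 22.8, 27.8, 25, 29.3]
ho = ["N"] * 10 + ["H"] * 10
se = (["F"] * 5 + ["M"] * 5) * 2
df = pd.DataFrame({"Conc": co, "Hor": ho, "Sex": se})

# Mean concentration in each Hor x Sex cell (rows: Hor, columns: Sex)
cell_means = df.pivot_table(index="Hor", columns="Sex", values="Conc", aggfunc="mean")

# Marginal means compared by the two main-effect hypotheses
hor_means = df.groupby("Hor")["Conc"].mean()
sex_means = df.groupby("Sex")["Conc"].mean()

# Difference of differences: near zero when the hormone effect is
# similar in both sexes, i.e. when there is little interaction
interaction_contrast = (
    (cell_means.loc["H", "F"] - cell_means.loc["N", "F"])
    - (cell_means.loc["H", "M"] - cell_means.loc["N", "M"])
)

print(cell_means)
print(hor_means)
print(sex_means)
print(round(interaction_contrast, 2))
```

In this data set the hormone raises the mean by about 17.6 in females and about 15.7 in males, so the contrast of about 2 is small next to the hormone effect itself.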

5. Plot the data to inspect its distribution:
import pandas as pd
import seaborn as sns
# Concentrations with matching hormone (N = none, H = injected) and sex (F/M) labels
co = [16.3,20.4,12.4,15.8,9.5,15.3,17.4,10.9,10.3,6.7,38.1,26.2,32.3,35.8,30.2,34,22.8,27.8,25,29.3]
ho = ["N","N","N","N","N","N","N","N","N","N","H","H","H","H","H","H","H","H","H","H"]
se = ["F","F","F","F","F","M","M","M","M","M","F","F","F","F","F","M","M","M","M","M"]
dat = {'Conc':co,'Hor':ho, 'Sex':se}
df = pd.DataFrame(dat)
sns.set(style="whitegrid")
# Box plot for each hormone group, split by sex
ax = sns.boxplot(x = "Hor", y = "Conc", hue="Sex", data = df, palette = "Set3")
# Overlay the individual observations on the same axes
ax = sns.swarmplot(x = "Hor", y = "Conc", hue="Sex", data = df, dodge = True, palette = "Set1")
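Result: (figure: box plots of Conc for each Hor group, split by Sex, with the individual observations overlaid as a swarm plot)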


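6. Check whether the data are normally distributed (H0: the data are normally distributed):
import scipy.stats
# One list per Hor x Sex cell, in the order N/F, N/M, H/F, H/M
dat1 = [16.3,20.4,12.4,15.8,9.5]
dat2 = [15.3,17.4,10.9,10.3,6.7]
dat3 = [38.1,26.2,32.3,35.8,30.2]
dat4 = [34,22.8,27.8,25,29.3]
scipy.stats.shapiro(dat1)
Result: (0.9792556166648865, 0.9305974245071411)
p = 0.931 > 0.05, so we fail to reject H0: the data are consistent with a normal distribution.
scipy.stats.shapiro(dat2)
Result: (0.9588245153427124, 0.7997748255729675)
p = 0.800 > 0.05, so we fail to reject H0: the data are consistent with a normal distribution.
scipy.stats.shapiro(dat3)
Result: (0.9827524423599243, 0.9487878680229187)
p = 0.949 > 0.05, so we fail to reject H0: the data are consistent with a normal distribution.
scipy.stats.shapiro(dat4)
Result: (0.9784173369407654, 0.9259798526763916)
p = 0.926 > 0.05, so we fail to reject H0: the data are consistent with a normal distribution.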
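The four calls above can also be written as a loop over the cells. The sketch below additionally applies the Shapiro-Wilk test to the pooled within-cell residuals (each value minus its cell mean), a common alternative when every cell has only five observations. The residual check is an addition, not part of the original post, and names such as `groups`, `p_values` and `residuals` are illustrative.

```python
import scipy.stats

# One entry per (Hor, Sex) cell
groups = {
    ("N", "F"): [16.3, 20.4, 12.4, 15.8, 9.5],
    ("N", "M"): [15.3, 17.4, 10.9, 10.3, 6.7],
    ("H", "F"): [38.1, 26.2, 32.3, 35.8, 30.2],
    ("H", "M"): [34, 22.8, 27.8, 25, 29.3],
}

# Shapiro-Wilk test per cell; element 1 of the result is the p-value
p_values = {}
for cell, values in groups.items():
    p_values[cell] = scipy.stats.shapiro(values)[1]
    print(cell, round(p_values[cell], 3))

# Pool the within-cell residuals and test them together
residuals = []
for values in groups.values():
    cell_mean = sum(values) / len(values)
    residuals.extend(v - cell_mean for v in values)

res_stat, res_p = scipy.stats.shapiro(residuals)
print("pooled residuals:", res_stat, res_p)
```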

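7. Check whether the groups have equal variances (H0: σ1² = σ2² = σ3² = σ4²):
Method: Levene's test for equal variances (parametric test)
import scipy.stats
dat1 = [16.3,20.4,12.4,15.8,9.5]
dat2 = [15.3,17.4,10.9,10.3,6.7]
dat3 = [38.1,26.2,32.3,35.8,30.2]
dat4 = [34,22.8,27.8,25,29.3]
# center = 'mean' measures each group's deviations from its own mean
scipy.stats.levene(dat1, dat2, dat3, dat4, center = 'mean')
Result: LeveneResult(statistic=0.041144557517671064, pvalue=0.9884508158578803)
p = 0.988 > 0.05, so we fail to reject H0: σ1² = σ2² = σ3² = σ4².
# Equal variances mean the groups are consistent with populations that share a common variance, as ANOVA assumes; unequal variances indicate that the samples were drawn from populations with different spread.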
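As a cross-check, the same question can be put to two related tests that SciPy also provides: the median-centered variant of Levene's test (`center="median"`, which is SciPy's default and is often called the Brown-Forsythe test) and Bartlett's test. This is a sketch added for comparison, not part of the original analysis; the result names are illustrative.

```python
import scipy.stats

dat1 = [16.3, 20.4, 12.4, 15.8, 9.5]
dat2 = [15.3, 17.4, 10.9, 10.3, 6.7]
dat3 = [38.1, 26.2, 32.3, 35.8, 30.2]
dat4 = [34, 22.8, 27.8, 25, 29.3]

# Median-centered Levene test: less affected by skewed or heavy-tailed data
bf_stat, bf_p = scipy.stats.levene(dat1, dat2, dat3, dat4, center="median")

# Bartlett's test: more powerful when the data really are normal,
# but sensitive to departures from normality
bt_stat, bt_p = scipy.stats.bartlett(dat1, dat2, dat3, dat4)

print("Levene (median):", bf_stat, bf_p)
print("Bartlett:", bt_stat, bt_p)
```

If the tests disagree, the median-centered Levene result is usually the safer one to rely on when normality is in doubt.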

8. Compute the two-way ANOVA in Python:

Method 1: statsmodels
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols
co = [16.3,20.4,12.4,15.8,9.5,15.3,17.4,10.9,10.3,6.7,38.1,26.2,32.3,35.8,30.2,34,22.8,27.8,25,29.3]
ho = ["N","N","N","N","N","N","N","N","N","N","H","H","H","H","H","H","H","H","H","H"]
se = ["F","F","F","F","F","M","M","M","M","M","F","F","F","F","F","M","M","M","M","M"]
dat = {'Conc':co,'Hor':ho, 'Sex':se}
df = pd.DataFrame(dat)
# Hor*Sex expands to both main effects plus the Hor:Sex interaction
mod = ols('Conc ~ Hor*Sex', data = df).fit()
# typ = 2 requests Type II sums of squares
sm.stats.anova_lm(mod, typ = 2)
Result:
             sum_sq    df          F        PR(>F)
Hor       1386.1125   1.0  73.584568  2.217191e-07    # p = 2.217e-7 < 0.05: Hor has an effect
Sex         70.3125   1.0   3.732680  7.126377e-02    # p = 0.071 > 0.05: no significant effect of Sex
Hor:Sex      4.9005   1.0   0.260153  6.169788e-01    # p = 0.617 > 0.05: no significant interaction between Sex and Hor
Residual   301.3920  16.0        NaN           NaN
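To make the table less of a black box, the sums of squares and F statistics can be reproduced by hand. The sketch below assumes a balanced design (five observations in every cell, as here), in which case these simple formulas agree with the table above. The helpers `mean` and `ss_between` and all variable names are illustrative, and only the standard library is used.

```python
co = [16.3, 20.4, 12.4, 15.8, 9.5, 15.3, 17.4, 10.9, 10.3, 6.7,
      38.1, 26.2, 32.3, 35.8, 30.2, 34, 22.8, 27.8, 25, 29.3]
ho = ["N"] * 10 + ["H"] * 10
se = (["F"] * 5 + ["M"] * 5) * 2


def mean(values):
    return sum(values) / len(values)


grand = mean(co)


def ss_between(labels):
    """Between-level sum of squares for one factor."""
    total = 0.0
    for level in set(labels):
        values = [c for c, lab in zip(co, labels) if lab == level]
        total += len(values) * (mean(values) - grand) ** 2
    return total


# Group the observations by (Hor, Sex) cell
cells = {}
for c, h, s in zip(co, ho, se):
    cells.setdefault((h, s), []).append(c)

ss_hor = ss_between(ho)
ss_sex = ss_between(se)
# Variation among the four cell means, minus both main effects
ss_cells = sum(len(v) * (mean(v) - grand) ** 2 for v in cells.values())
ss_inter = ss_cells - ss_hor - ss_sex
# Variation of the observations around their own cell mean
ss_resid = sum((x - mean(v)) ** 2 for v in cells.values() for x in v)

df_resid = len(co) - len(cells)   # 20 observations - 4 cells = 16
ms_resid = ss_resid / df_resid

# Every effect has 1 degree of freedom here (two levels per factor),
# so each mean square equals its sum of squares
f_hor = ss_hor / ms_resid
f_sex = ss_sex / ms_resid
f_inter = ss_inter / ms_resid

print(ss_hor, ss_sex, ss_inter, ss_resid)
print(f_hor, f_sex, f_inter)
```

With unequal cell sizes the main effects are no longer orthogonal, and these shortcut formulas would not match a Type II table.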

Method 2: pingouin
conda install -c conda-forge pingouin   # install pingouin
# Once the installation succeeds, run the following code
import pandas as pd
import pingouin   # importing pingouin adds the .anova() method to pandas DataFrames
co = [16.3,20.4,12.4,15.8,9.5,15.3,17.4,10.9,10.3,6.7,38.1,26.2,32.3,35.8,30.2,34,22.8,27.8,25,29.3]
ho = ["N","N","N","N","N","N","N","N","N","N","H","H","H","H","H","H","H","H","H","H"]
se = ["F","F","F","F","F","M","M","M","M","M","F","F","F","F","F","M","M","M","M","M"]
dat = {'Conc':co,'Hor':ho, 'Sex':se}
df = pd.DataFrame(dat)
# dv is the dependent variable; between lists the two between-subject factors
df.anova(dv = "Conc", between = ["Hor", "Sex"], detailed = True)
Result:
      Source        SS  DF        MS          F         p-unc       np2
0        Hor  1386.112   1  1386.112  73.584541  2.217196e-07  0.821398    # p = 2.217e-7 < 0.05: Hor has an effect
1        Sex    70.312   1    70.312   3.732654  7.126468e-02  0.189161    # p = 0.071 > 0.05: no significant effect of Sex
2  Hor * Sex     4.901   1     4.901   0.260206  6.169432e-01  0.016003    # p = 0.617 > 0.05: no significant interaction between the two factors
3   residual   301.392  16    18.837        NaN           NaN       NaN
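The np2 column is the partial eta-squared effect size, which can be recomputed as SS_effect / (SS_effect + SS_residual). The following is a minimal sketch using the sums of squares reported in the statsmodels table; the function name `partial_eta_squared` is illustrative, not a pingouin function.

```python
def partial_eta_squared(ss_effect, ss_residual):
    """Share of (effect + residual) variation attributable to the effect."""
    return ss_effect / (ss_effect + ss_residual)


# Sums of squares copied from the statsmodels table
ss_effects = {"Hor": 1386.1125, "Sex": 70.3125, "Hor * Sex": 4.9005}
ss_residual = 301.392

np2 = {name: partial_eta_squared(ss, ss_residual) for name, ss in ss_effects.items()}
for name, value in np2.items():
    print(name, round(value, 4))
```

The values match the np2 column up to small rounding differences: the hormone accounts for about 82% of the variation left once sex and the interaction are set aside, while the interaction accounts for under 2%.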

