Recipe 28: Multiple Linear Regression Analysis
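1. When to use: predicting one dependent variable from several independent variables.
2. Analysis type: parametric analysis.
3. Sample data: Milu (咪路) measured the growth conditions and colony counts of a bacterial culture; the data are shown in the table below. Milu wants a regression equation that can be used to predict bacterial growth.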
| item | Temp | pH | hour | ml | CFU (×10^6) |
|---|---|---|---|---|---|
| 1 | 6 | 5.7 | 1.6 | 2.12 | 9.9 |
| 2 | 1 | 6.4 | 3 | 3.39 | 9.3 |
| 3 | 2 | 5.7 | 3.4 | 3.61 | 9.4 |
| 4 | 11 | 6.1 | 3.4 | 1.72 | 9.1 |
| 5 | 1 | 6 | 3 | 1.8 | 6.9 |
| 6 | 2 | 5.7 | 4.4 | 3.21 | 9.3 |
| 7 | 5 | 5.9 | 2.2 | 2.59 | 7.9 |
| 8 | 1 | 6.2 | 2.2 | 3.25 | 7.4 |
| 9 | 1 | 5.5 | 1.9 | 2.86 | 7.3 |
| 10 | 3 | 5.2 | 0.2 | 2.32 | 8.8 |
| 11 | 11 | 5.7 | 4.2 | 1.57 | 9.8 |
| 12 | 9 | 6.1 | 2.4 | 1.5 | 10.5 |
| 13 | 5 | 6.4 | 3.4 | 2.69 | 9.1 |
| 14 | 3 | 5.5 | 3 | 4.06 | 10.1 |
| 15 | 1 | 5.5 | 0.2 | 1.98 | 7.2 |
| 16 | 8 | 6 | 3.9 | 2.29 | 11.7 |
| 17 | 2 | 5.5 | 2.2 | 3.55 | 8.7 |
| 18 | 3 | 6.2 | 4.4 | 3.31 | 7.6 |
| 19 | 6 | 5.9 | 0.2 | 1.83 | 8.6 |
| 20 | 10 | 5.6 | 2.4 | 1.69 | 10.9 |
| 21 | 4 | 5.8 | 2.4 | 2.42 | 7.6 |
| 22 | 5 | 5.8 | 4.4 | 2.98 | 7.3 |
| 23 | 5 | 5.2 | 1.6 | 1.84 | 9.2 |
| 24 | 3 | 6 | 1.9 | 2.48 | 7 |
| 25 | 8 | 5.5 | 1.6 | 2.83 | 7.2 |
| 26 | 8 | 6.4 | 4.1 | 2.41 | 7 |
| 27 | 6 | 6.2 | 1.9 | 1.78 | 8.8 |
| 28 | 6 | 5.4 | 2.2 | 2.22 | 10.1 |
| 29 | 3 | 5.4 | 4.1 | 2.72 | 12.1 |
| 30 | 5 | 6.2 | 1.6 | 2.36 | 7.7 |
| 31 | 1 | 6.8 | 2.4 | 2.81 | 7.8 |
| 32 | 8 | 6.2 | 1.9 | 1.64 | 11.5 |
| 33 | 10 | 6.4 | 2.2 | 1.82 | 10.4 |
Task: find the multiple linear regression equation.
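For reference, with the four predictors in the table, the model being fitted has the usual multiple-regression form. The symbols β and ε are standard textbook notation and do not appear in the original post; i indexes the 33 observations.

```latex
\mathrm{CFU}_i = \beta_0 + \beta_1\,\mathrm{Temp}_i + \beta_2\,\mathrm{pH}_i
               + \beta_3\,\mathrm{hour}_i + \beta_4\,\mathrm{ml}_i + \varepsilon_i,
\qquad i = 1, \dots, 33
```

Ordinary least squares chooses the β values that minimize the sum of squared residuals.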
4. Build the data
from pandas import DataFrame

# One list per column, in the same row order as the table above (items 1-33).
dat = {
    'Temp': [6, 1, 2, 11, 1, 2, 5, 1, 1, 3, 11, 9, 5, 3, 1, 8, 2,
             3, 6, 10, 4, 5, 5, 3, 8, 8, 6, 6, 3, 5, 1, 8, 10],
    'pH': [5.7, 6.4, 5.7, 6.1, 6, 5.7, 5.9, 6.2, 5.5, 5.2, 5.7, 6.1,
           6.4, 5.5, 5.5, 6, 5.5, 6.2, 5.9, 5.6, 5.8, 5.8, 5.2, 6,
           5.5, 6.4, 6.2, 5.4, 5.4, 6.2, 6.8, 6.2, 6.4],
    'hour': [1.6, 3, 3.4, 3.4, 3, 4.4, 2.2, 2.2, 1.9, 0.2, 4.2, 2.4,
             3.4, 3, 0.2, 3.9, 2.2, 4.4, 0.2, 2.4, 2.4, 4.4, 1.6, 1.9,
             1.6, 4.1, 1.9, 2.2, 4.1, 1.6, 2.4, 1.9, 2.2],
    'ml': [2.12, 3.39, 3.61, 1.72, 1.8, 3.21, 2.59, 3.25, 2.86, 2.32,
           1.57, 1.5, 2.69, 4.06, 1.98, 2.29, 3.55, 3.31, 1.83, 1.69,
           2.42, 2.98, 1.84, 2.48, 2.83, 2.41, 1.78, 2.22, 2.72, 2.36,
           2.81, 1.64, 1.82],
    'CFU': [9.9, 9.3, 9.4, 9.1, 6.9, 9.3, 7.9, 7.4, 7.3, 8.8, 9.8,
            10.5, 9.1, 10.1, 7.2, 11.7, 8.7, 7.6, 8.6, 10.9, 7.6, 7.3,
            9.2, 7, 7.2, 7, 8.8, 10.1, 12.1, 7.7, 7.8, 11.5, 10.4],
}
df = DataFrame(dat, columns=['Temp', 'pH', 'hour', 'ml', 'CFU'])
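5. Run the regression
import statsmodels.api as sm

# Predictors (independent variables) and response (dependent variable)
X = df[['Temp', 'pH', 'hour', 'ml']]
Y = df['CFU']

X = sm.add_constant(X)          # add the intercept column 'const'
model = sm.OLS(Y, X).fit()      # ordinary least squares fit
predictions = model.predict(X)  # fitted CFU values for the observed rows
res = model.summary()
print(res)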
Result:
                            OLS Regression Results
==============================================================================
Dep. Variable:                    CFU   R-squared:                       0.252
Model:                            OLS   Adj. R-squared:                  0.146
Method:                 Least Squares   F-statistic:                     2.363
Date:                Mon, 29 Jul 2019   Prob (F-statistic):             0.0772
Time:                        15:00:43   Log-Likelihood:                -54.579
No. Observations:                  33   AIC:                             119.2
Df Residuals:                      28   BIC:                             126.6
Df Model:                           4
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         12.6254      4.135      3.053      0.005       4.155      21.096
Temp           0.1946      0.110      1.771      0.087      -0.030       0.420
pH            -0.8865      0.646     -1.373      0.181      -2.209       0.436
hour           0.2477      0.247      1.003      0.324      -0.258       0.753
ml            -0.0474      0.541     -0.088      0.931      -1.156       1.061
==============================================================================
Omnibus:                        0.100   Durbin-Watson:                   1.434
Prob(Omnibus):                  0.951   Jarque-Bera (JB):                0.282
Skew:                           0.097   Prob(JB):                        0.869
Kurtosis:                       2.591   Cond. No.                         153.
==============================================================================
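Only const has a p-value below 0.05. Temp (p = 0.087) is the closest of the four predictors but is significant only at the 0.10 level, and the overall F-test (p = 0.0772) is likewise not significant at 0.05.
Full fitted equation: CFU = 12.6254 + 0.1946*Temp - 0.8865*pH + 0.2477*hour - 0.0474*ml
Equation keeping only Temp: CFU = 12.6254 + 0.1946*Temp (the coefficients are taken from the full model, so treat this as a rough approximation)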