'data.frame': 10 obs. of 1 variable:
$ ppm: int 3 4 5 3 4 5 4 4 2 5
廣義線性模型 (HDAS7009-02)
假設檢定和平均值的比較 (Hypothesis Testing and Comparisons of Means)
1 Student’s t-test
t.test()
-
alternative: alternative hypothesis-
"two.sided"\((H_0: \mu_1 = \mu_2;~~~ H_1: \mu_1 \ne \mu_2)\) -
"greater"\(~~~(H_0: \mu_1 \le \mu_2;~~~ H_1: \mu_1 > \mu_2)\) -
"less"\(~~~~~~~~(H_0: \mu_1 \ge \mu_2;~~~ H_1: \mu_1 < \mu_2)\)
-
-
mu = 0:- for one-sample test, true value of the mean
- for two-sample test, difference in the two means
-
paired = FALSE: whether two samples are related (paired) -
var.equal = FALSE: whether two population variances are equal -
conf.level = 0.95: confidence level
See ?t.test for more details.
1.1 One-sample t-test
Exercise (1)
欲調查某廠牌泡麵防腐劑含量是否超標(\(\le 3\) ppm), 隨機抽取該廠牌 10 包泡麵,並測量每包泡麵的防腐劑含量。
-
請以「單尾檢定」檢驗此泡麵防腐劑含量是否符合政府所制定的國家標準(3 ppm)。
令該廠牌泡麵的防腐劑平均含量為 \(\mu\),欲檢定之假說為:
\[ \begin{cases} H_0: \mu \le 3 \text{ ppm} \\ H_1: \mu > 3 \text{ ppm} \end{cases} \]
t.test(pres$ppm, alternative = "greater", mu = 3)One Sample t-test data: pres$ppm t = 2.862, df = 9, p-value = 0.00936 alternative hypothesis: true mean is greater than 3 95 percent confidence interval: 3.323548 Inf sample estimates: mean of x 3.9# How to get p-value? 1 - pt(2.862, df = 9)[1] 0.009359624結論: 此廠牌泡麵防腐劑含量之樣本平均值為 3.9 ppm,在顯著水準 0.05 下
- 95% “單尾”信賴區間 (3.32, \(\infty\)) 未包含 3
- p-value = 0.0094 小於顯著水準
可以拒絕 \(H_0\)。此廠牌泡麵防腐劑平均含量顯著高於政府所制定的國家標準。
-
請以「雙尾檢定」檢驗此泡麵防腐劑含量是否符合政府所制定的國家標準(3 ppm)。
令該廠牌泡麵的防腐劑平均含量為 \(\mu\),欲檢定之假說為:
\[ \begin{cases} H_0: \mu = 3 \text{ ppm} \\ H_1: \mu \ne 3 \text{ ppm} \end{cases} \]
t.test(pres$ppm, alternative = "two.sided", mu = 3)One Sample t-test data: pres$ppm t = 2.862, df = 9, p-value = 0.01872 alternative hypothesis: true mean is not equal to 3 95 percent confidence interval: 3.188628 4.611372 sample estimates: mean of x 3.9# How to get p-value? (1 - pt(2.862, df = 9)) * 2[1] 0.01871925結論: 此廠牌泡麵防腐劑含量之樣本平均值為 3.9 ppm,在顯著水準 0.05 下
- 95% “雙尾”信賴區間 (3.19, 4.61) 未包含 3
- p-value = 0.0187 小於顯著水準
可以拒絕 \(H_0\)。此廠牌泡麵防腐劑平均含量與政府所制定的國家標準有顯著差異。
1.2 Paired two-sample t-test
Exercise (2)
欲探討牙周再生手術是否能改善牙周病人口腔狀況, 研究者收集 10 位牙周病人接受牙周再生手術前後之牙周囊袋深度 (probing pocket depth)。
'data.frame': 10 obs. of 3 variables:
$ ppd0: int 10 7 8 7 8 7 8 7 9 8
$ ppd1: int 3 1 2 2 1 3 3 2 2 3
$ diff: int 7 6 6 5 7 4 5 5 7 5
-
ppd0: 治療前牙周囊袋深度 (單位: mm)。 -
ppd1: 治療後牙周囊袋深度 (單位: mm)。 -
diff: 治療前減治療後的牙周囊袋深度 (單位: mm)。
牙周囊袋深度變化是指同一位病人在治療前後的變化, 因此同一病人 2 次牙周囊袋深度具有相關性。
令治療前與治療後牙周囊袋平均深度分別為 \(\mu_1\) 和 \(\mu_2\),欲檢定之假說為:
\[ \begin{cases} H_0: \mu_1 - \mu_2 \le 0 \\ H_1: \mu_1 - \mu_2 > 0 \end{cases} \]
# Traditional interface
t.test(ppd$ppd0, ppd$ppd1, alternative = "greater", paired = TRUE)
Paired t-test
data: ppd$ppd0 and ppd$ppd1
t = 17.015, df = 9, p-value = 1.879e-08
alternative hypothesis: true mean difference is greater than 0
95 percent confidence interval:
5.085915 Inf
sample estimates:
mean difference
5.7
Paired t-test
data: Pair(ppd0, ppd1)
t = 17.015, df = 9, p-value = 1.879e-08
alternative hypothesis: true mean difference is greater than 0
95 percent confidence interval:
5.085915 Inf
sample estimates:
mean difference
5.7
結論: 接受牙周再生手術的牙周病人,牙周囊袋深度平均減少 5.7 mm,在顯著水準 0.05 下
- 95% “單尾”信賴區間 (5.09, \(\infty\)) 未包含 0
- p-value = \(1.88 \times 10^{-8}\) 小於顯著水準
可以拒絕 \(H_0\)。接受牙周再生手術能顯著減少牙周囊袋平均深度。
將同一病人治療前後之牙周囊袋深度相減,再進行 one-sample t-test,結果一樣。
t.test(ppd$diff, alternative = "greater")
One Sample t-test
data: ppd$diff
t = 17.015, df = 9, p-value = 1.879e-08
alternative hypothesis: true mean is greater than 0
95 percent confidence interval:
5.085915 Inf
sample estimates:
mean of x
5.7
1.3 Independent two-sample t-test
- 兩母體變異數相同 (\(\sigma_1^2 = \sigma_2^2 = \sigma^2\))
- Sample variances: \[ s_1^2 = \frac{\sum_{i = 1}^{n_1} (X_i - \bar{X})^2}{n_1 - 1} ~~~\text{and}~~~ s_2^2 = \frac{\sum_{j = 1}^{n_2} (Y_j - \bar{Y})^2}{n_2 - 1} \]
- Pooled sample variance: \[ s_p^2 = \frac{\sum_{i=1}^{n_1}(X_i-\bar{X})^2 + \sum_{j=1}^{n_2}(Y_j-\bar{Y})^2}{n_1+n_2-2} = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2} \]
- Test statistic: \[ T = \frac{\bar{X} -\bar{Y}}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim t_{df}, ~~~ df = n_1 + n_2 - 2 \]
- 兩母體變異數不同 (\(\sigma_1^2 \ne \sigma_2^2\)) — Welch’s t-test
- Test statistic: \[ T = \frac{\bar{X} -\bar{Y}}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \sim t_{df}, ~~~ df = \frac{(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2})^2} {\frac{(\frac{s_1^2}{n_1})^2}{n_1-1} + \frac{(\frac{s_2^2}{n_2})^2}{n_2-1}} \]
Exercise (3)
欲研究懷孕婦女吸菸是否會影響嬰兒體重, 隨機抽取 14 名吸菸的孕婦和 15 名無吸菸的孕婦,測量嬰兒出生體重。
tibble [29 × 2] (S3: tbl_df/tbl/data.frame)
$ smoking : chr [1:29] "smokers" "smokers" "smokers" "smokers" ...
$ birthweight: num [1:29] 3.18 2.74 2.9 3.27 3.65 3.42 3.23 2.86 3.6 3.65 ...
檢定前先進行基本的探索性分析:
table(smk_wt$smoking)
nonsmokers smokers
15 14
smoking birthweight.Mean birthweight.SD
1 nonsmokers 3.6266667 0.3584025
2 smokers 3.1742857 0.4631450
boxplot(birthweight ~ smoking, data = smk_wt, boxwex = 0.5)
points(aggregate(birthweight ~ smoking, data = smk_wt, FUN = mean), col = "red")令吸菸與無吸菸孕婦生下的嬰兒平均體重分別為 \(\mu_1\) 和 \(\mu_2\),欲檢定之假說為:
\[ \begin{cases} H_0: \mu_1 = \mu_2 \\ H_1: \mu_1 \ne \mu_2 \end{cases} \]
F-test of equality of variances (two samples)
\[ \begin{cases} H_0: \sigma_1 = \sigma_2 \\ H_1: \sigma_1 \ne \sigma_2 \end{cases} \]
var.test(birthweight ~ smoking, data = smk_wt)
F test to compare two variances
data: birthweight by smoking
F = 0.59884, num df = 14, denom df = 13, p-value = 0.353
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
0.1943104 1.8036316
sample estimates:
ratio of variances
0.5988364
p-value = 0.353,在顯著水準 0.05 下無法拒絕 \(H_0\)。 吸菸與無吸菸孕婦生下的嬰兒體重變異數無顯著差異。
t.test(birthweight ~ smoking, data = smk_wt, var.equal = TRUE)
Two Sample t-test
data: birthweight by smoking
t = 2.9535, df = 27, p-value = 0.006437
alternative hypothesis: true difference in means between group nonsmokers and group smokers is not equal to 0
95 percent confidence interval:
0.1381077 0.7666542
sample estimates:
mean in group nonsmokers mean in group smokers
3.626667 3.174286
結論: p-value = 0.006437,在顯著水準 0.05 下可以拒絕 \(H_0\)。 吸菸與無吸菸孕婦生下的嬰兒平均體重有顯著差異。
2 Analysis of Variance (ANOVA)
2.1 Prerequisites
- Independence of observations
- Normality
- Check using Shapiro-Wilk test:
shapiro.test() - If not met, use non-parametric Kruskal-Wallis test:
kruskal.test()
- Check using Shapiro-Wilk test:
- Equality of variances
- Check using Bartlett’s test or Levene’s test:
bartlett.test()orcar::leveneTest() - If not met, use Welch’s F-test:
oneway.test(..., var.equal = FALSE)
- Check using Bartlett’s test or Levene’s test:
2.2 事後檢定 (post-hoc test)
若有 3 組獨立樣本,需進⾏ \(C_2^3 = 3\) 次成對比較。 令一次成對比較犯錯的可容許機率為\(\alpha^*\),整體犯錯的可容許機率為\(\alpha\),則
\[ \begin{aligned} \alpha &= P(\text{3次成對比較至少犯一錯}) \\ &= 1 - P(\text{3次成對比較皆不犯錯}) \\ &= 1 - (1 - \alpha^*)^3 \\ &= 3\alpha^* - 3\alpha^{*2} + \alpha^{*3} \approx 3\alpha^* \end{aligned} \]
為了避免型 I 誤差膨脹,可將一次成對比較犯錯的機率 \(\alpha^*\) 設為 \(\frac{\alpha}{3}\) (等同於將每一次成對比較的 p-value 乘以3), 此做法即為 Bonferroni correction。
Exercise (4)
欲探討三種類型的鐮刀型紅血球疾病(sickle cell disease) 患者的血紅素濃度是否有差異,研究者收集 41 位患者 並記錄他們的疾病類型 (Hb SC, Hb S/b, Hb SS) 與血紅素濃度。
tibble [41 × 2] (S3: tbl_df/tbl/data.frame)
$ group: chr [1:41] "Hb_SC" "Hb_SC" "Hb_SC" "Hb_SC" ...
$ hb : num [1:41] 10.7 11.3 11.5 11.6 11.7 11.8 12 12.1 12.3 12.6 ...
table(hb$group)
Hb_Sb Hb_SC Hb_SS
10 15 16
boxplot(hb ~ group, data = hb, boxwex = 0.5)
points(aggregate(hb ~ group, data = hb, FUN = mean), col = "red")令三種類型的鐮刀型紅血球疾病患者的血紅素平均濃度分別為 \(\mu_{sb}\)、\(\mu_{sc}\)、和 \(\mu_{ss}\),欲檢定之假說為:
\[ \begin{cases} H_0: \mu_{sb} = \mu_{sc} = \mu_{ss} \\ H_1: \text{at least one } \mu_i \ne \mu_j ~~~\text{for}~~~ i,j = \{sb, sc, ss\} \end{cases} \]
hb_aov <- aov(hb ~ group, data = hb)-
Test for normality
\[ \begin{cases} H_0: \text{data come from a normal distribution} \\ H_1: \text{data do not come from a normal distribution} \end{cases} \]
shapiro.test(hb_aov$residuals)Shapiro-Wilk normality test data: hb_aov$residuals W = 0.96544, p-value = 0.2427 -
Test for equality of variances
\[ \begin{cases} H_0: \sigma^2_{sb} = \sigma^2_{sc} = \sigma^2_{ss} \\ H_1: \text{at least one } \sigma^2_i \ne \sigma^2_j ~~~\text{for}~~~ i,j = \{sb, sc, ss\} \end{cases} \]
bartlett.test(hb ~ group, data = hb)Bartlett test of homogeneity of variances data: hb by group Bartlett's K-squared = 2.1251, df = 2, p-value = 0.3456 -
ANOVA
結論: p-value = \(2.28 \times 10^{-11}\), 在顯著水準 0.05 下可以拒絕 \(H_0\)。 三種類型的鐮刀型紅血球疾病患者的血紅素平均濃度並非完全相同。
Bonferroni correction
為了找出哪種類型的鐮刀型紅血球疾病的血紅素平均濃度與其他類型有差異, 需進行事後檢定。
pairwise.t.test(hb$hb, hb$group, p.adjust.method = "bonf")
Pairwise comparisons using t tests with pooled SD
data: hb$hb and hb$group
Hb_Sb Hb_SC
Hb_SC 0.00064 -
Hb_SS 8.4e-05 1.1e-11
P value adjustment method: bonferroni
這裡的 p-values 已經過調整 (乘以成對比較的次數), 還是用原本設定的顯著水準 (e.g. 0.05) 當作標準。
結論: 在顯著水準 0.05 下,任兩種類型的鐮刀型紅血球疾病的血紅素平均濃度皆有顯著差異。
Bonferroni correction 偏保守 (高估 p-value),較不容易看出成對比較的差別。可改用其他校正方法:
p.adjust.method = "..."-
"holm": Holm (1979) -
"hochberg": Hochberg (1988) -
"hommel": Hommel (1988) -
"BH"(alias"fdr"): Benjamini & Hochberg (1995) -
"BY": Benjamini & Yekutieli (2001)
Tukey’s HSD (honestly significant difference) test
TukeyHSD(hb_aov) Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = hb ~ group, data = hb)
$group
diff lwr upr p adj
Hb_SC-Hb_Sb 1.6700 0.6748973 2.6651027 0.0006147
Hb_SS-Hb_Sb -1.9175 -2.9000852 -0.9349148 0.0000819
Hb_SS-Hb_SC -3.5875 -4.4635296 -2.7114704 0.0000000