Response | Predictors | Test | Hypothesis |
---|---|---|---|
Categorical | Categorical | Chi-squared | independence |
Continuous | Categorical (2 levels) | t-test | \(\mu_1 = \mu_2\) |
Continuous | Categorical (>2 levels) | ANOVA | \(\mu_1 = \mu_2 = \mu_3\) |
Continuous | 1 continuous | Regression | \(\beta_1 = 0\) |
Continuous | >1 continuous | Multiple regression | \(\beta_1 = 0; \dots; \beta_n = 0\) |
Continuous | Continuous + categorical | ANCOVA | \(\beta_1 = \beta_2; \alpha_1 = \alpha_2\) |
Proportion | Continuous | Logistic regression | \(\beta_1 = 0\) (logit scale) |
\[ y = \hat{\alpha} + \hat{\beta} x + \epsilon\] \[ \epsilon \sim N(0, \sigma) \]
Simulating data
## [1] 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0
## [1] 4.757714 8.962123 13.969613 8.674061 12.799371 14.831051 17.769887
## [8] 16.900755 23.961185
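A minimal sketch of how data like these could be simulated, assuming the predictor is a regular sequence from 1 to 5 and the response is drawn from \(N(3 + 4x, 2.5)\), the model stated further on in this section. The seed is an assumption, so the simulated values will not match the output above.

```r
# Assumption: the seed is arbitrary, chosen only for reproducibility
set.seed(42)

# Predictor: regular sequence from 1 to 5 in steps of 0.5 (9 values)
x1 <- seq(1, 5, by = 0.5)

# Response: normal with mean 3 + 4x and standard deviation 2.5
y1 <- rnorm(length(x1), mean = 3 + 4 * x1, sd = 2.5)

# Same column names as in the lm() call used in this section
xy <- data.frame(x1 = x1, y1 = y1)
str(xy)
```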
Estimating the parameters:
\[ y = \hat{\alpha} + \hat{\beta} x + \epsilon\]
\[ \hat{y} = \bar{y} ; \quad \beta = 0\]
\[ d_i = y_i - \hat{y}_i \]
\[ RSS = \sum{(y_i - \hat{y}_i)^2} \]
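The idea can be sketched numerically with the simulated values printed above: the null model (a horizontal line at \(\bar{y}\)) and the least-squares line are compared through their residual sums of squares. The helper `rss()` and the names `a_hat` and `b_hat` are hypothetical, introduced only for this illustration.

```r
# Simulated values as printed above
x1 <- seq(1, 5, by = 0.5)
y1 <- c(4.757714, 8.962123, 13.969613, 8.674061, 12.799371,
        14.831051, 17.769887, 16.900755, 23.961185)

# Residual sum of squares of a straight line with intercept a and slope b
rss <- function(a, b) sum((y1 - (a + b * x1))^2)

# Null model: horizontal line at the mean of y (slope = 0)
rss_null <- rss(mean(y1), 0)

# Least-squares estimates in closed form
b_hat <- cov(x1, y1) / var(x1)
a_hat <- mean(y1) - b_hat * mean(x1)
rss_fit <- rss(a_hat, b_hat)

c(alpha = a_hat, beta = b_hat, rss_null = rss_null, rss_fit = rss_fit)
```

The least-squares line is, by construction, the straight line with the smallest RSS, so `rss_fit` cannot exceed `rss_null`.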
##
## Call:
## lm(formula = y1 ~ x1, data = xy)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.0446 -1.2415 -0.7005 1.0564 4.1574
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.1864 2.1097 1.036 0.334505
## x1 3.8129 0.6459 5.903 0.000598 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.502 on 7 degrees of freedom
## Multiple R-squared: 0.8327, Adjusted R-squared: 0.8088
## F-statistic: 34.84 on 1 and 7 DF, p-value: 0.0005978
\[ N(3 + 4x, 2.5) \equiv 3 + 4 x + N(0, 2.5) \]
\[ y = 3 + 4x + N(0, 2.5)\]
## (Intercept) x1
## 2.186353 3.812911
## 2.5 % 97.5 %
## (Intercept) -2.802189 7.174894
## x1 2.285488 5.340333
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.186353 2.1096552 1.036355 0.3345049717
## x1 3.812911 0.6459473 5.902820 0.0005977766
## [1] 2.501743
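The calls behind the four outputs above are not shown. One set of standard extractor functions that yields the same quantities is sketched below; the object name `fit` is hypothetical, and the data are re-entered from the printed values.

```r
# Simulated values as printed earlier in this section
x1 <- seq(1, 5, by = 0.5)
y1 <- c(4.757714, 8.962123, 13.969613, 8.674061, 12.799371,
        14.831051, 17.769887, 16.900755, 23.961185)
xy <- data.frame(x1 = x1, y1 = y1)

fit <- lm(y1 ~ x1, data = xy)

coef(fit)            # point estimates of intercept and slope
confint(fit)         # 95% confidence intervals
coef(summary(fit))   # estimates, standard errors, t and p values
summary(fit)$sigma   # residual standard error
```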
## 'data.frame': 9 obs. of 2 variables:
## $ growth: int 12 10 8 11 6 7 2 3 3
## $ tannin: int 0 1 2 3 4 5 6 7 8
growth | tannin |
---|---|
12 | 0 |
10 | 1 |
8 | 2 |
11 | 3 |
6 | 4 |
7 | 5 |
##
## Call:
## lm(formula = growth ~ tannin, data = lag)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.4556 -0.8889 -0.2389 0.9778 2.8944
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 11.7556 1.0408 11.295 9.54e-06 ***
## tannin -1.2167 0.2186 -5.565 0.000846 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.693 on 7 degrees of freedom
## Multiple R-squared: 0.8157, Adjusted R-squared: 0.7893
## F-statistic: 30.97 on 1 and 7 DF, p-value: 0.0008461
## (Intercept) tannin
## 11.755556 -1.216667
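For reference, a sketch that rebuilds the data frame from the values printed by `str()` and refits the model. The name `lag` comes from the `lm()` call above; that `lmlag` holds this fit is an assumption.

```r
# Data as printed by str() above
lag <- data.frame(
  growth = c(12, 10, 8, 11, 6, 7, 2, 3, 3),
  tannin = 0:8
)

# Simple linear regression of growth on tannin
lmlag <- lm(growth ~ tannin, data = lag)

coef(lmlag)                  # intercept and slope
round(residuals(lmlag), 2)   # observed minus fitted values
```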
\[ y = \hat{\alpha} + \hat{\beta} x + \epsilon\]
## 1 2 3 4 5 6 7 8 9
## 0.24 -0.54 -1.32 2.89 -0.89 1.33 -2.46 -0.24 0.98
## [1] 1.693358
## Analysis of Variance Table
##
## Response: growth
## Df Sum Sq Mean Sq F value Pr(>F)
## tannin 1 88.817 88.817 30.974 0.0008461 ***
## Residuals 7 20.072 2.867
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
\[ SS_{total} = SS_{between} + SS_{within} \]
\[ SS_{total} = SS_{regr} + SS_{residuals} \]
\(SS_{total} = \sum_{i=1}^n (y_{i} - \bar{y})^2\)
## [1] 5.1111111 3.1111111 1.1111111 4.1111111 -0.8888889 0.1111111
## [7] -4.8888889 -3.8888889 -3.8888889
## [1] 26.12345679 9.67901235 1.23456790 16.90123457 0.79012346 0.01234568
## [7] 23.90123457 15.12345679 15.12345679
## [1] 108.8889
\(SS_{error} = \sum_{i=1}^n (y_{i} - \hat{y}_{i})^2\)
## (Intercept) tannin
## 11.755556 -1.216667
## [1] 11.755556 10.538889 9.322222 8.105556 6.888889 5.672222 4.455556
## [8] 3.238889 2.022222
## [1] 12 10 8 11 6 7 2 3 3
\[SS_{error} = \sum_{i=1}^n (y_{i} - \hat{y}_{i})^2\]
## [1] 20.07222
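The sums of squares can be reproduced step by step. This is a sketch using the tannin data printed earlier; the names `ss_total`, `ss_error` and `ss_regr` are hypothetical.

```r
growth <- c(12, 10, 8, 11, 6, 7, 2, 3, 3)
tannin <- 0:8
fit <- lm(growth ~ tannin)

# Total: squared deviations of the observations from the overall mean
ss_total <- sum((growth - mean(growth))^2)

# Error: squared deviations of the observations from the fitted line
ss_error <- sum((growth - fitted(fit))^2)

# Regression: what is left once the error is subtracted from the total
ss_regr <- ss_total - ss_error

c(total = ss_total, regression = ss_regr, error = ss_error)
```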
\[ SS_{total} = SS_{regr} + SS_{error} \]
Source | Sum Sq | Df | Mean Sq |
---|---|---|---|
Regression | 88.82 | 1 | 88.82 |
Error | 20.07 | 7 | 2.87 |
Total | 108.89 | 8 | |
## [1] 0.8156633
## [1] 30.97398
## [1] 0.0008460738
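The three values above follow directly from the ANOVA table. A sketch of the arithmetic, with hypothetical variable names:

```r
# Sums of squares and degrees of freedom from the table above
ss_regr  <- 88.81667
ss_error <- 20.07222
df_regr  <- 1
df_error <- 7

# Proportion of the total variation explained by the regression
r2 <- ss_regr / (ss_regr + ss_error)

# F ratio: regression mean square over error mean square
f_value <- (ss_regr / df_regr) / (ss_error / df_error)

# Upper-tail probability of the F distribution
p_value <- pf(f_value, df_regr, df_error, lower.tail = FALSE)

c(r2 = r2, F = f_value, p = p_value)
```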
## Analysis of Variance Table
##
## Model 1: growth ~ 1
## Model 2: growth ~ tannin
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 8 108.889
## 2 7 20.072 1 88.817 30.974 0.0008461 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
 | Df | Sum Sq | Mean Sq | F value | Pr(>F) |
---|---|---|---|---|---|
tannin | 1 | 88.81667 | 88.81667 | 30.97398 | 0.0008461 |
Residuals | 7 | 20.07222 | 2.86746 | | |

Res.Df | RSS | Df | Sum of Sq | F | Pr(>F) |
---|---|---|---|---|---|
8 | 108.88889 | | | | |
7 | 20.07222 | 1 | 88.81667 | 30.97398 | 0.0008461 |
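The second table compares the intercept-only model with the regression model. A sketch of that comparison, assuming the model names `null_model` and `full_model` (they do not appear in the source):

```r
lag <- data.frame(
  growth = c(12, 10, 8, 11, 6, 7, 2, 3, 3),
  tannin = 0:8
)

null_model <- lm(growth ~ 1, data = lag)       # intercept only: y = mean(y)
full_model <- lm(growth ~ tannin, data = lag)  # intercept + slope

# F test for the reduction in RSS gained by adding the slope
comparison <- anova(null_model, full_model)
comparison
```

The RSS of the null model equals \(SS_{total}\), and the drop in RSS between the two models equals \(SS_{regr}\), which is why both tables report the same F and p values.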
# 2 x 2 panel for the four standard diagnostic plots of a fitted lm
par(mfrow = c(2, 2), mar = c(4, 4.5, 2, 2), cex.lab = 1.2, cex.axis = 1.2, las = 1, bg = "gray80", bty = "l", pch = 16)
plot(lmlag)
DON'T DESPAIR, HANG ON! KEEP CALM!!
solos <- read.table("/home/aao/Ale2016/AleCursos/Planejamento&Analise/dados/crop.csv", header = TRUE, as.is=TRUE, sep="\t")
str(solos)
## 'data.frame': 30 obs. of 2 variables:
## $ solo : chr "are" "are" "are" "are" ...
## $ colhe: int 6 10 8 6 14 17 9 11 7 11 ...
## Analysis of Variance Table
##
## Response: colhe
## Df Sum Sq Mean Sq F value Pr(>F)
## solo 2 99.2 49.600 4.2447 0.02495 *
## Residuals 27 315.5 11.685
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## [1] "are" "are" "are" "are" "are" "are" "are" "are" "are" "are" "arg"
## [12] "arg" "arg" "arg" "arg" "arg" "arg" "arg" "arg" "arg" "hum" "hum"
## [23] "hum" "hum" "hum" "hum" "hum" "hum" "hum" "hum"
 | colhe | solo | arg | hum |
---|---|---|---|---|
1 | 6 | are | 0 | 0 |
2 | 10 | are | 0 | 0 |
3 | 8 | are | 0 | 0 |
11 | 17 | arg | 1 | 0 |
12 | 15 | arg | 1 | 0 |
13 | 3 | arg | 1 | 0 |
21 | 13 | hum | 0 | 1 |
22 | 16 | hum | 0 | 1 |
23 | 9 | hum | 0 | 1 |
Number of dummy variables: number of factor levels minus 1 (the remaining level is the intercept)
\(y = \alpha_{d_1} + \beta_{2} x_{d_2}+ \beta_3 x_{d_3}\)
\(\alpha_{d_1} = \bar{x}_1\)
\(\beta_{2}= \bar{x}_2 - \bar{x}_1\)
\(\beta_{3}= \bar{x}_3 - \bar{x}_1\)
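A sketch that checks these identities numerically. It uses only the nine rows displayed in the table above, not the full 30-row data set, so the coefficients differ from the full-data results; the name `sub` is hypothetical.

```r
# Only the nine rows shown in the table above (subset of the full data)
sub <- data.frame(
  colhe = c(6, 10, 8, 17, 15, 3, 13, 16, 9),
  solo  = rep(c("are", "arg", "hum"), each = 3)
)

# Dummy (indicator) columns; "are" is the reference level
sub$arg <- as.integer(sub$solo == "arg")
sub$hum <- as.integer(sub$solo == "hum")

fit <- lm(colhe ~ arg + hum, data = sub)
means <- tapply(sub$colhe, sub$solo, mean)

# Intercept = mean of the reference level;
# each slope = difference between a level's mean and the reference mean
coef(fit)
c(means[["are"]], means[["arg"]] - means[["are"]], means[["hum"]] - means[["are"]])
```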
##
## Call:
## lm(formula = colhe ~ arg + hum, data = soloslin)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.5 -1.8 0.3 1.7 7.1
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.900 1.081 9.158 9.04e-10 ***
## arg 1.600 1.529 1.047 0.30456
## hum 4.400 1.529 2.878 0.00773 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.418 on 27 degrees of freedom
## Multiple R-squared: 0.2392, Adjusted R-squared: 0.1829
## F-statistic: 4.245 on 2 and 27 DF, p-value: 0.02495
##
## Call:
## lm(formula = colhe ~ solo, data = solos)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.5 -1.8 0.3 1.7 7.1
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.900 1.081 9.158 9.04e-10 ***
## soloarg 1.600 1.529 1.047 0.30456
## solohum 4.400 1.529 2.878 0.00773 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.418 on 27 degrees of freedom
## Multiple R-squared: 0.2392, Adjusted R-squared: 0.1829
## F-statistic: 4.245 on 2 and 27 DF, p-value: 0.02495
## (Intercept) arg hum
## 9.9 1.6 4.4
## are arg hum
## 9.9 11.5 14.3
\[y = \hat{\alpha}_{d_1} + \hat{\beta}_{2} x_{d_2}+ \hat{\beta}_3 x_{d_3}\]
## 'data.frame': 200 obs. of 5 variables:
## $ sex : Factor w/ 2 levels "F","M": 2 1 1 2 1 2 2 2 2 2 ...
## $ weight: int 77 58 53 68 59 76 76 69 71 65 ...
## $ height: int 182 161 161 177 157 170 167 186 178 171 ...
## $ repwt : int 77 51 54 70 59 76 77 73 71 64 ...
## $ repht : int 180 159 158 175 155 165 165 180 175 170 ...
##
## Call:
## lm(formula = weight ~ height, data = Davis)
##
## Residuals:
## Min 1Q Median 3Q Max
## -19.928 -5.406 -0.651 4.891 42.641
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -130.84185 12.30184 -10.64 <2e-16 ***
## height 1.15112 0.07193 16.00 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.635 on 178 degrees of freedom
## Multiple R-squared: 0.5899, Adjusted R-squared: 0.5876
## F-statistic: 256.1 on 1 and 178 DF, p-value: < 2.2e-16
## Analysis of Variance Table
##
## Response: weight
## Df Sum Sq Mean Sq F value Pr(>F)
## height 1 19095 19095.0 256.08 < 2.2e-16 ***
## Residuals 178 13273 74.6
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Analysis of Variance Table
##
## Model 1: weight ~ 1
## Model 2: weight ~ height
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 179 32368
## 2 178 13273 1 19095 256.08 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
\(p_{value} < 2.2e-16\)
\(p_{value} < 2.2 \times 10^{-16}\)
\(R^2 = 0.590\) (adjusted: 0.588)
##
## Call:
## lm(formula = weight ~ height + sex, data = Davis)
##
## Coefficients:
## (Intercept) height sexM
## -80.2107 0.8341 7.7070
sex: dummy variable with two levels (female = 0, male = 1)
##
## Call:
## lm(formula = weight ~ height + sex, data = Davis)
##
## Residuals:
## Min 1Q Median 3Q Max
## -20.302 -4.808 -0.335 5.239 41.366
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -80.2107 16.8415 -4.763 3.96e-06 ***
## height 0.8341 0.1021 8.169 5.71e-14 ***
## sexM 7.7070 1.8345 4.201 4.20e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.258 on 177 degrees of freedom
## Multiple R-squared: 0.6271, Adjusted R-squared: 0.6229
## F-statistic: 148.8 on 2 and 177 DF, p-value: < 2.2e-16
lm(weight ~ height + sex, data = Davis)
## (Intercept) height sexM
## -80.2107328 0.8340964 7.7070166
\[w_F = \hat{\alpha} + \hat{\beta}_s \, sex + \hat{\beta}_h \, height\] \[w_F = \hat{\alpha} + \hat{\beta}_h \, height \quad (sex = 0)\]
\[w_M = \hat{\alpha} + \hat{\beta}_s \, sex + \hat{\beta}_h \, height\] \[w_M = \hat{\alpha} + \hat{\beta}_s + \hat{\beta}_h \, height \quad (sex = 1)\]
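A sketch of what the additive model implies: two parallel lines, one for women (sex = 0) and one for men (sex = 1), separated by the `sexM` coefficient at every height. The coefficients are copied from the output above; the function `predict_weight()` and the height of 170 cm are hypothetical choices for illustration.

```r
# Coefficients as printed above for weight ~ height + sex
co <- c(intercept = -80.2107328, height = 0.8340964, sexM = 7.7070166)

# Predicted weight; `male` is the dummy variable (0 = female, 1 = male)
predict_weight <- function(height, male) {
  unname(co["intercept"] + co["sexM"] * male + co["height"] * height)
}

w_female <- predict_weight(170, male = 0)
w_male   <- predict_weight(170, male = 1)

# The gap between the two lines is the sexM coefficient at any height
c(female = w_female, male = w_male, gap = w_male - w_female)
```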
##
## Call:
## lm(formula = weight ~ height + sex * height, data = Davis)
##
## Residuals:
## Min 1Q Median 3Q Max
## -20.990 -4.548 -0.926 4.821 41.023
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -45.7988 24.8453 -1.843 0.0670 .
## height 0.6252 0.1507 4.148 5.22e-05 ***
## sexM -57.4326 34.8293 -1.649 0.1009
## height:sexM 0.3815 0.2037 1.873 0.0628 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.2 on 176 degrees of freedom
## Multiple R-squared: 0.6344, Adjusted R-squared: 0.6282
## F-statistic: 101.8 on 3 and 176 DF, p-value: < 2.2e-16
lm(weight ~ height + sex*height, data=Davis)
## (Intercept) height sexM height:sexM
## -45.7988220 0.6252035 -57.4326307 0.3815088
\[w = \hat{\alpha} + \hat{\beta}_s \, sex + \hat{\beta}_h \, height + \hat{\beta}_{h:s} \, sex \times height\] \[w_F = \hat{\alpha} + \hat{\beta}_h \, height \quad (sex = 0)\]
\[w = \hat{\alpha} + \hat{\beta}_s \, sex + \hat{\beta}_h \, height + \hat{\beta}_{h:s} \, sex \times height\] \[w_M = (\hat{\alpha} + \hat{\beta}_s) + (\hat{\beta}_h + \hat{\beta}_{h:s}) \, height \quad (sex = 1)\]
\[w = \hat{\alpha} + \hat{\beta}_s \, sex + \hat{\beta}_h \, height + \hat{\beta}_{h:s} \, sex \times height\] \[sex = 0\]
## (Intercept) height sexM height:sexM
## -45.7988220 0.6252035 -57.4326307 0.3815088
## [1] 54.85893
\[w = \hat{\alpha} + \hat{\beta}_s \, sex + \hat{\beta}_h \, height + \hat{\beta}_{h:s} \, sex \times height\] \[sex = 1\]
## (Intercept) height sexM height:sexM
## -45.7988220 0.6252035 -57.4326307 0.3815088
# Predicted weight for a man (sex = 1) who is 182 cm tall
predHomem <- (coefull[1] + coefull[3]) + (coefull[2] + coefull[4]) * 182
(predHomem <- as.numeric(predHomem))
## [1] 79.99018
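Both predictions can be written as one function in which the dummy variable shifts the intercept and the slope. This is a sketch: `coefull` is re-entered from the printed coefficients, `predict_inter()` is a hypothetical helper, and the woman's height of 161 cm is inferred from the printed prediction rather than stated in the source.

```r
# Coefficients as printed above for weight ~ height + sex * height
coefull <- c("(Intercept)" = -45.7988220, "height" = 0.6252035,
             "sexM" = -57.4326307, "height:sexM" = 0.3815088)

# `male` is the dummy variable (0 = female, 1 = male)
predict_inter <- function(height, male) {
  intercept <- coefull[["(Intercept)"]] + coefull[["sexM"]] * male
  slope     <- coefull[["height"]] + coefull[["height:sexM"]] * male
  intercept + slope * height
}

predict_inter(161, male = 0)  # woman, 161 cm (height is an assumption)
predict_inter(182, male = 1)  # man, 182 cm
```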
Compare the previous model with the simplified one:
* no significant difference: retain the simpler model
* then continue simplifying
* significant difference: retain the more complex model
* this is the MINIMAL ADEQUATE model
## Analysis of Variance Table
##
## Model 1: weight ~ height + sex * height
## Model 2: weight ~ height + sex
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 176 11833
## 2 177 12069 -1 -235.82 3.5075 0.06275 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Analysis of Variance Table
##
## Model 1: weight ~ height + sex
## Model 2: weight ~ height
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 177 12069
## 2 178 13273 -1 -1203.5 17.65 4.204e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
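The F test in the first comparison (dropping the interaction) can be checked by hand from the printed table. A sketch with hypothetical variable names, using the rounded values shown above:

```r
# Values from the first comparison table above
rss_complex <- 11833    # RSS of weight ~ height + sex * height
df_complex  <- 176      # residual degrees of freedom of that model
sum_sq_diff <- 235.82   # increase in RSS after dropping the interaction
df_diff     <- 1        # one parameter removed

# F = (increase in RSS per parameter removed) / (error mean square)
f_value <- (sum_sq_diff / df_diff) / (rss_complex / df_complex)
p_value <- pf(f_value, df_diff, df_complex, lower.tail = FALSE)

c(F = f_value, p = p_value)
```

With p above 0.05 the interaction is dropped; the second comparison has p far below 0.05, so `sex` is retained and `weight ~ height + sex` is the model kept.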
##
## Call:
## lm(formula = weight ~ height + sex, data = Davis)
##
## Residuals:
## Min 1Q Median 3Q Max
## -20.302 -4.808 -0.335 5.239 41.366
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -80.2107 16.8415 -4.763 3.96e-06 ***
## height 0.8341 0.1021 8.169 5.71e-14 ***
## sexM 7.7070 1.8345 4.201 4.20e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.258 on 177 degrees of freedom
## Multiple R-squared: 0.6271, Adjusted R-squared: 0.6229
## F-statistic: 148.8 on 2 and 177 DF, p-value: < 2.2e-16
## (Intercept) height sexM
## -80.2107328 0.8340964 7.7070166
## 2.5 % 97.5 %
## (Intercept) -113.44661 -46.974852
## height 0.63259 1.035603
## sexM 4.08671 11.327323