| Response | Predictors | Test | Hypothesis | 
|---|---|---|---|
| Categorical | Categorical | Chi-squared | independence | 
| Continuous | Categorical (2 levels) | t-test | \(\mu_1 = \mu_2\) | 
| Continuous | Categorical (>2 levels) | ANOVA | \(\mu_1 = \mu_2 = \mu_3\) | 
| Continuous | 1 continuous | Regression | \(\beta_1 = 0\) | 
| Continuous | >1 continuous | Multiple regression | \(\beta_1 = 0; \dots; \beta_n = 0\) | 
| Continuous | Continuous + categorical | ANCOVA | \(\beta_1 = \beta_2; \alpha_1 = \alpha_2\) | 
| Proportion | Continuous | Logistic regression | \(\beta_1 = 0\) (logit scale) | 
\[ y = \hat{\alpha} + \hat{\beta} x + \epsilon\] \[ \epsilon \sim N(0, \sigma) \]
Simulating data
## [1] 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0
## [1]  4.757714  8.962123 13.969613  8.674061 12.799371 14.831051 17.769887
## [8] 16.900755 23.961185
Estimating the parameters:
\[ y = \hat{\alpha} + \hat{\beta} x + \epsilon\]
\[ y = \bar{y} ; \beta = 0\]
\[ d = y_i - \hat{y}_i \]
\[ RSS = \sum{(y_i - \hat{y}_i)^2} \]
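The residual sum of squares can be compared between the null model (slope fixed at zero) and the least-squares line. The sketch below is a minimal illustration that retypes the x and y values printed in the simulated-data output; the object names `rss_null`, `rss_fit` and `fit` are assumptions introduced here, not names from the original slides.

```r
# x and y values retyped from the printed simulated-data output
x1 <- seq(1, 5, by = 0.5)
y1 <- c(4.757714, 8.962123, 13.969613, 8.674061, 12.799371,
        14.831051, 17.769887, 16.900755, 23.961185)

# Null model: every prediction is the mean of y (slope fixed at 0)
rss_null <- sum((y1 - mean(y1))^2)

# Least-squares line: lm() picks the intercept and slope that minimise RSS
fit <- lm(y1 ~ x1)
rss_fit <- sum((y1 - fitted(fit))^2)

c(null = rss_null, linear = rss_fit)
coef(fit)
```

Because the line is fitted by least squares, its RSS can never exceed that of the null model; the question the F test answers is whether the reduction is larger than chance would produce.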
\[ y = \hat{\alpha} + \hat{\beta} x + \epsilon\]
## 
## Call:
## lm(formula = y1 ~ x1, data = xy)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.0446 -1.2415 -0.7005  1.0564  4.1574 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   2.1864     2.1097   1.036 0.334505    
## x1            3.8129     0.6459   5.903 0.000598 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.502 on 7 degrees of freedom
## Multiple R-squared:  0.8327, Adjusted R-squared:  0.8088 
## F-statistic: 34.84 on 1 and 7 DF,  p-value: 0.0005978
\[ N(3 + 4x, 2.5) \equiv 3 + 4 x + N(0, 2.5) \]
\[ y = 3 + 4x + N(0, 2.5)\]
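The model above can be turned into a simulation and then fitted to check how well the known parameters (3, 4 and 2.5) are recovered. This is a sketch under assumptions: the seed is arbitrary (the source does not state one, so the numbers will differ from the printed output), 2.5 is treated as a standard deviation, and `lmxy` is a hypothetical object name; `xy`, `x1` and `y1` follow the names in the printed `lm()` call.

```r
set.seed(42)  # arbitrary seed (assumption); the source does not state one

x1 <- seq(1, 5, by = 0.5)
# N(3 + 4x, 2.5): deterministic part plus normal noise with sd 2.5
y1 <- rnorm(length(x1), mean = 3 + 4 * x1, sd = 2.5)
xy <- data.frame(x1 = x1, y1 = y1)

lmxy <- lm(y1 ~ x1, data = xy)
coef(lmxy)            # estimates of alpha and beta
confint(lmxy)         # 95% confidence intervals for both parameters
summary(lmxy)$sigma   # residual standard error, the estimate of sigma
```

With only nine points the estimates can sit some distance from the true values; the confidence intervals show how much.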
## (Intercept)          x1 
##    2.186353    3.812911
##                 2.5 %   97.5 %
## (Intercept) -2.802189 7.174894
## x1           2.285488 5.340333
##             Estimate Std. Error  t value     Pr(>|t|)
## (Intercept) 2.186353  2.1096552 1.036355 0.3345049717
## x1          3.812911  0.6459473 5.902820 0.0005977766
## [1] 2.501743
## 'data.frame':    9 obs. of  2 variables:
##  $ growth: int  12 10 8 11 6 7 2 3 3
##  $ tannin: int  0 1 2 3 4 5 6 7 8
| growth | tannin | 
|---|---|
| 12 | 0 | 
| 10 | 1 | 
| 8 | 2 | 
| 11 | 3 | 
| 6 | 4 | 
| 7 | 5 | 
## 
## Call:
## lm(formula = growth ~ tannin, data = lag)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.4556 -0.8889 -0.2389  0.9778  2.8944 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  11.7556     1.0408  11.295 9.54e-06 ***
## tannin       -1.2167     0.2186  -5.565 0.000846 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.693 on 7 degrees of freedom
## Multiple R-squared:  0.8157, Adjusted R-squared:  0.7893 
## F-statistic: 30.97 on 1 and 7 DF,  p-value: 0.0008461
## (Intercept)      tannin 
##   11.755556   -1.216667
\[ y = \hat{\alpha} + \hat{\beta} x + \epsilon\]
##     1     2     3     4     5     6     7     8     9 
##  0.24 -0.54 -1.32  2.89 -0.89  1.33 -2.46 -0.24  0.98
## [1] 1.693358
## Analysis of Variance Table
## 
## Response: growth
##           Df Sum Sq Mean Sq F value    Pr(>F)    
## tannin     1 88.817  88.817  30.974 0.0008461 ***
## Residuals  7 20.072   2.867                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
\[ SS_{total} = SS_{between} + SS_{within} \]
\[ SS_{total} = SS_{regr} + SS_{residuals} \]
\(SS_{total} = \sum_{i=1}^n (y_{i} - \bar{y})^2\)
## [1]  5.1111111  3.1111111  1.1111111  4.1111111 -0.8888889  0.1111111
## [7] -4.8888889 -3.8888889 -3.8888889
## [1] 26.12345679  9.67901235  1.23456790 16.90123457  0.79012346  0.01234568
## [7] 23.90123457 15.12345679 15.12345679
## [1] 108.8889
\(SS_{error} = \sum_{i=1}^n (y_{i} - \hat{y}_{i})^2\)
## (Intercept)      tannin 
##   11.755556   -1.216667
## [1] 11.755556 10.538889  9.322222  8.105556  6.888889  5.672222  4.455556
## [8]  3.238889  2.022222
## [1] 12 10  8 11  6  7  2  3  3
\[SS_{error} = \sum_{i=1}^n (y_{i} - \hat{y}_{i})^2\]
## [1] 20.07222
\[ SS_{total} = SS_{regr} + SS_{error} \]
| Source | Sum of squares | df | Mean square | 
|---|---|---|---|
| Regression | 88.82 | 1 | 88.82 | 
| Error | 20.07 | 7 | 2.87 | 
| Total | 108.89 | 8 | | 
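The entries of the table above can be rebuilt by hand from the growth and tannin values printed by `str()` earlier in the section. The sketch below is one way to do it; every object name (`ss_total`, `ss_error`, `ss_regr`, `r2`, `f_value`, `p_value`, `fit`) is an assumption introduced for illustration.

```r
# growth and tannin values retyped from the printed str() output
growth <- c(12, 10, 8, 11, 6, 7, 2, 3, 3)
tannin <- 0:8
fit <- lm(growth ~ tannin)

ss_total <- sum((growth - mean(growth))^2)   # deviations from the grand mean
ss_error <- sum((growth - fitted(fit))^2)    # deviations from the fitted line
ss_regr  <- ss_total - ss_error              # variation explained by the regression

df_regr  <- 1                    # one slope estimated
df_error <- length(growth) - 2   # n minus the two estimated parameters

r2      <- ss_regr / ss_total
f_value <- (ss_regr / df_regr) / (ss_error / df_error)
p_value <- pf(f_value, df_regr, df_error, lower.tail = FALSE)

c(SS_total = ss_total, SS_regr = ss_regr, SS_error = ss_error)
c(R2 = r2, F = f_value, p = p_value)
```

The F value is the ratio of the two mean squares, and the p-value is the upper tail of the F distribution with 1 and 7 degrees of freedom.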
## [1] 0.8156633
## [1] 30.97398
## [1] 0.0008460738
## Analysis of Variance Table
## 
## Response: growth
##           Df Sum Sq Mean Sq F value    Pr(>F)    
## tannin     1 88.817  88.817  30.974 0.0008461 ***
## Residuals  7 20.072   2.867                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Analysis of Variance Table
## 
## Model 1: growth ~ 1
## Model 2: growth ~ tannin
##   Res.Df     RSS Df Sum of Sq      F    Pr(>F)    
## 1      8 108.889                                  
## 2      7  20.072  1    88.817 30.974 0.0008461 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
| | Df | Sum Sq | Mean Sq | F value | Pr(>F) | 
|---|---|---|---|---|---|
| tannin | 1 | 88.81667 | 88.81667 | 30.97398 | 0.0008461 | 
| Residuals | 7 | 20.07222 | 2.86746 | | | 

| Res.Df | RSS | Df | Sum of Sq | F | Pr(>F) | 
|---|---|---|---|---|---|
| 8 | 108.88889 | | | | | 
| 7 | 20.07222 | 1 | 88.81667 | 30.97398 | 0.0008461 | 
# 2 x 2 grid so the four diagnostic plots of the model fit on one page
par(mfrow = c(2, 2), mar = c(4, 4.5, 2, 2), cex.lab = 1.2, cex.axis = 1.2, las = 1, bg = "gray80", bty = "l", pch = 16)
plot(lmlag)
DON'T DESPAIR, WAIT! KEEP CALM!!
solos <- read.table("/home/aao/Ale2016/AleCursos/Planejamento&Analise/dados/crop.csv", header = TRUE, as.is=TRUE, sep="\t")
str(solos)
## 'data.frame':    30 obs. of  2 variables:
##  $ solo : chr  "are" "are" "are" "are" ...
##  $ colhe: int  6 10 8 6 14 17 9 11 7 11 ...
## Analysis of Variance Table
## 
## Response: colhe
##           Df Sum Sq Mean Sq F value  Pr(>F)  
## solo       2   99.2  49.600  4.2447 0.02495 *
## Residuals 27  315.5  11.685                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##  [1] "are" "are" "are" "are" "are" "are" "are" "are" "are" "are" "arg"
## [12] "arg" "arg" "arg" "arg" "arg" "arg" "arg" "arg" "arg" "hum" "hum"
## [23] "hum" "hum" "hum" "hum" "hum" "hum" "hum" "hum"
| | colhe | solo | arg | hum | 
|---|---|---|---|---|
| 1 | 6 | are | 0 | 0 | 
| 2 | 10 | are | 0 | 0 | 
| 3 | 8 | are | 0 | 0 | 
| 11 | 17 | arg | 1 | 0 | 
| 12 | 15 | arg | 1 | 0 | 
| 13 | 3 | arg | 1 | 0 | 
| 21 | 13 | hum | 0 | 1 | 
| 22 | 16 | hum | 0 | 1 | 
| 23 | 9 | hum | 0 | 1 | 
Number of dummy variables: the number of factor levels minus 1 (the remaining level is represented by the intercept)
\(y = \alpha_{d_1} + \beta_{2} x_{d_2}+ \beta_3 x_{d_3}\)
\(\alpha_{d_1} = \bar{x}_1\)
\(\beta_{2}= \bar{x}_2 - \bar{x}_1\)
\(\beta_{3}= \bar{x}_3 - \bar{x}_1\)
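The relations above (intercept equals the mean of the reference level, each slope equals a difference between group means) can be checked numerically. The crop data file is not available here, so the sketch below uses `PlantGrowth`, a three-level dataset that ships with base R, as a stand-in; this substitution and the names `pg`, `fit_dummy`, `fit_factor` and `group_means` are assumptions made for illustration.

```r
# PlantGrowth ships with base R; it stands in here for the crop data
data(PlantGrowth)

# Hand-made dummy variables: one per level except the reference level "ctrl"
pg <- transform(PlantGrowth,
                trt1 = as.numeric(group == "trt1"),
                trt2 = as.numeric(group == "trt2"))

fit_dummy  <- lm(weight ~ trt1 + trt2, data = pg)
fit_factor <- lm(weight ~ group, data = PlantGrowth)  # R builds the dummies itself

group_means <- tapply(PlantGrowth$weight, PlantGrowth$group, mean)

coef(fit_dummy)
group_means[["ctrl"]]                                   # equals the intercept
group_means[c("trt1", "trt2")] - group_means[["ctrl"]]  # equal the two slopes
```

Fitting with the factor directly gives the same estimates as the hand-made dummies; only the coefficient names differ.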
## 
## Call:
## lm(formula = colhe ~ arg + hum, data = soloslin)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
##   -8.5   -1.8    0.3    1.7    7.1 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    9.900      1.081   9.158 9.04e-10 ***
## arg            1.600      1.529   1.047  0.30456    
## hum            4.400      1.529   2.878  0.00773 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.418 on 27 degrees of freedom
## Multiple R-squared:  0.2392, Adjusted R-squared:  0.1829 
## F-statistic: 4.245 on 2 and 27 DF,  p-value: 0.02495
## 
## Call:
## lm(formula = colhe ~ solo, data = solos)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
##   -8.5   -1.8    0.3    1.7    7.1 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    9.900      1.081   9.158 9.04e-10 ***
## soloarg        1.600      1.529   1.047  0.30456    
## solohum        4.400      1.529   2.878  0.00773 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.418 on 27 degrees of freedom
## Multiple R-squared:  0.2392, Adjusted R-squared:  0.1829 
## F-statistic: 4.245 on 2 and 27 DF,  p-value: 0.02495
## (Intercept)         arg         hum 
##         9.9         1.6         4.4
##  are  arg  hum 
##  9.9 11.5 14.3
\[y = \hat{\alpha}_{d_1} + \hat{\beta}_{2} x_{d_2}+ \hat{\beta}_3 x_{d_3}\]
## 'data.frame':    200 obs. of  5 variables:
##  $ sex   : Factor w/ 2 levels "F","M": 2 1 1 2 1 2 2 2 2 2 ...
##  $ weight: int  77 58 53 68 59 76 76 69 71 65 ...
##  $ height: int  182 161 161 177 157 170 167 186 178 171 ...
##  $ repwt : int  77 51 54 70 59 76 77 73 71 64 ...
##  $ repht : int  180 159 158 175 155 165 165 180 175 170 ...
## 
## Call:
## lm(formula = weight ~ height, data = Davis)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -19.928  -5.406  -0.651   4.891  42.641 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -130.84185   12.30184  -10.64   <2e-16 ***
## height         1.15112    0.07193   16.00   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.635 on 178 degrees of freedom
## Multiple R-squared:  0.5899, Adjusted R-squared:  0.5876 
## F-statistic: 256.1 on 1 and 178 DF,  p-value: < 2.2e-16
## Analysis of Variance Table
## 
## Response: weight
##            Df Sum Sq Mean Sq F value    Pr(>F)    
## height      1  19095 19095.0  256.08 < 2.2e-16 ***
## Residuals 178  13273    74.6                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Analysis of Variance Table
## 
## Model 1: weight ~ 1
## Model 2: weight ~ height
##   Res.Df   RSS Df Sum of Sq      F    Pr(>F)    
## 1    179 32368                                  
## 2    178 13273  1     19095 256.08 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
\(p_{value} < 2.2e-16\)
\(p_{value} < 2.2 \times 10^{-16}\)
\(r^2 = 0.590\)
## 
## Call:
## lm(formula = weight ~ height + sex, data = Davis)
## 
## Coefficients:
## (Intercept)       height         sexM  
##    -80.2107       0.8341       7.7070
sex: dummy variable with two levels (female = 0, male = 1)
## 
## Call:
## lm(formula = weight ~ height + sex, data = Davis)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -20.302  -4.808  -0.335   5.239  41.366 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -80.2107    16.8415  -4.763 3.96e-06 ***
## height        0.8341     0.1021   8.169 5.71e-14 ***
## sexM          7.7070     1.8345   4.201 4.20e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.258 on 177 degrees of freedom
## Multiple R-squared:  0.6271, Adjusted R-squared:  0.6229 
## F-statistic: 148.8 on 2 and 177 DF,  p-value: < 2.2e-16
lm(weight ~ height + sex, data = Davis)
## (Intercept)      height        sexM 
## -80.2107328   0.8340964   7.7070166
\[w = \hat{\alpha} + \hat{\beta}_s \, sex + \hat{\beta}_h \, height\]
\[w_{female} = \hat{\alpha} + \hat{\beta}_h \, height \quad (sex = 0)\]
\[w_{male} = \hat{\alpha} + \hat{\beta}_s + \hat{\beta}_h \, height \quad (sex = 1)\]
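In the additive model the two sexes share one slope and differ only in intercept, so the predicted lines are parallel. The sketch below evaluates both equations from the coefficients retyped from the printed output; the heights and the names `coefadd`, `heights`, `w_female` and `w_male` are assumptions chosen for illustration.

```r
# Coefficients retyped from the printed output of lm(weight ~ height + sex)
coefadd <- c("(Intercept)" = -80.2107328, height = 0.8340964, sexM = 7.7070166)

heights <- c(160, 170, 180)  # arbitrary heights in cm (assumption)

# sex = 0: the sexM term drops out
w_female <- coefadd[["(Intercept)"]] + coefadd[["height"]] * heights
# sex = 1: the sexM coefficient is added to the intercept
w_male   <- coefadd[["(Intercept)"]] + coefadd[["sexM"]] + coefadd[["height"]] * heights

# Parallel lines: the male-female gap is the sexM coefficient at every height
data.frame(height = heights, female = w_female, male = w_male,
           gap = w_male - w_female)
```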
## 
## Call:
## lm(formula = weight ~ height + sex * height, data = Davis)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -20.990  -4.548  -0.926   4.821  41.023 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -45.7988    24.8453  -1.843   0.0670 .  
## height        0.6252     0.1507   4.148 5.22e-05 ***
## sexM        -57.4326    34.8293  -1.649   0.1009    
## height:sexM   0.3815     0.2037   1.873   0.0628 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.2 on 176 degrees of freedom
## Multiple R-squared:  0.6344, Adjusted R-squared:  0.6282 
## F-statistic: 101.8 on 3 and 176 DF,  p-value: < 2.2e-16
lm(weight ~ height + sex * height, data = Davis)
## (Intercept)      height        sexM height:sexM 
## -45.7988220   0.6252035 -57.4326307   0.3815088
\[w = \hat{\alpha} + \hat{\beta}_s \, sex + \hat{\beta}_h \, height + \hat{\beta}_{h:s} \, sex \times height\]
\[w_{female} = \hat{\alpha} + \hat{\beta}_h \, height\]
\[w_{male} = (\hat{\alpha} + \hat{\beta}_s) + (\hat{\beta}_h + \hat{\beta}_{h:s}) \, height\]
\[w = \hat{\alpha} + \hat{\beta}_s \, sex + \hat{\beta}_h \, height + \hat{\beta}_{h:s} \, sex \times height\] \[sex = 0\]
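With the interaction term, setting sex to 0 or 1 changes both the intercept and the slope. The sketch below wraps that substitution in a small function. The coefficients are retyped from the printed output rather than taken from a fitted object, `pred_weight` is a hypothetical helper, and the 161 cm height used for the woman is an assumption (182 cm is the height used for the man in the slides).

```r
# Coefficients retyped from the printed output of the interaction model
coefull <- c("(Intercept)" = -45.7988220, height = 0.6252035,
             sexM = -57.4326307, "height:sexM" = 0.3815088)

# Hypothetical helper: sex is 0 for women and 1 for men
pred_weight <- function(height, sex) {
  intercept <- coefull[["(Intercept)"]] + coefull[["sexM"]] * sex
  slope     <- coefull[["height"]] + coefull[["height:sexM"]] * sex
  intercept + slope * height
}

pred_weight(161, sex = 0)  # woman, 161 cm (height is an assumption)
pred_weight(182, sex = 1)  # man, 182 cm
```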
## (Intercept)      height        sexM height:sexM 
## -45.7988220   0.6252035 -57.4326307   0.3815088
## [1] 54.85893
\[w = \hat{\alpha} + \hat{\beta}_s \, sex + \hat{\beta}_h \, height + \hat{\beta}_{h:s} \, sex \times height\] \[sex = 1\]
# predicted weight of a 182 cm man: the sex terms shift both intercept and slope
predHomem <- (coefull[1] + coefull[3]) + (coefull[2] + coefull[4]) * 182
(predHomem <- as.numeric(predHomem))
## [1] 79.99018
Compare the previous model with the simplified one
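If the simplified model does not fit significantly worse:
* retain the simpler model
* continue simplifying

If the simplified model fits significantly worse:
* retain the more complex model
* this is the MINIMUM ADEQUATE model
## Analysis of Variance Table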
## 
## Model 1: weight ~ height + sex * height
## Model 2: weight ~ height + sex
##   Res.Df   RSS Df Sum of Sq      F  Pr(>F)  
## 1    176 11833                              
## 2    177 12069 -1   -235.82 3.5075 0.06275 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Analysis of Variance Table
## 
## Model 1: weight ~ height + sex
## Model 2: weight ~ height
##   Res.Df   RSS Df Sum of Sq     F    Pr(>F)    
## 1    177 12069                                 
## 2    178 13273 -1   -1203.5 17.65 4.204e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Call:
## lm(formula = weight ~ height + sex, data = Davis)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -20.302  -4.808  -0.335   5.239  41.366 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -80.2107    16.8415  -4.763 3.96e-06 ***
## height        0.8341     0.1021   8.169 5.71e-14 ***
## sexM          7.7070     1.8345   4.201 4.20e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.258 on 177 degrees of freedom
## Multiple R-squared:  0.6271, Adjusted R-squared:  0.6229 
## F-statistic: 148.8 on 2 and 177 DF,  p-value: < 2.2e-16
## (Intercept)      height        sexM 
## -80.2107328   0.8340964   7.7070166
##                  2.5 %     97.5 %
## (Intercept) -113.44661 -46.974852
## height         0.63259   1.035603
## sexM           4.08671  11.327323
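The simplification procedure used above (fit the full model, drop one term, compare the nested models with `anova()`, and keep the simpler model unless the fit gets significantly worse) can be sketched in a few lines. The Davis data require an extra package, so this sketch uses `ToothGrowth` from base R as a stand-in; the dataset, the 0.05 threshold and the names `full`, `additive`, `cmp`, `p_interaction` and `chosen` are all assumptions made for illustration.

```r
# ToothGrowth ships with base R; it stands in here for the Davis data
data(ToothGrowth)

full     <- lm(len ~ dose * supp, data = ToothGrowth)  # slopes may differ by group
additive <- lm(len ~ dose + supp, data = ToothGrowth)  # parallel lines

# F test for the term that was dropped (the interaction)
cmp <- anova(additive, full)
p_interaction <- cmp[["Pr(>F)"]][2]

# Keep the simpler model unless dropping the term makes the fit significantly worse
# (0.05 is an assumed threshold)
chosen <- if (p_interaction < 0.05) full else additive
formula(chosen)
```

If the simpler model is retained, the same comparison is repeated with the next candidate term until dropping anything further worsens the fit significantly.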