Use the same data as in the previous session.
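library(corrgram)  # provides the baseball data set
# numeric variables to keep
vars2 <- c("Assists","Atbat","Errors","Hits", "Homer", "logSal", "Putouts","RBI","Runs","Walks","Years")
cor.data <- baseball[,vars2]
train.data <- cor.data[-100,]  # drop row 100; the remaining rows form the training set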
Fit two models (lm.mult2 is lm.mult1 with Runs added as a fourth predictor)
lm.mult1 <- lm(logSal ~ RBI + Assists + Hits, data = train.data)
lm.mult2 <- lm(logSal ~ RBI + Assists + Hits + Runs, data = train.data)
Summary of model lm.mult1
summary(lm.mult1)
Call:
lm(formula = logSal ~ RBI + Assists + Hits, data = train.data)

Residuals:
     Min       1Q   Median       3Q      Max
-0.85001 -0.28874  0.06381  0.23237  1.20395

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  2.1523769  0.0549949  39.138   <2e-16 ***
RBI          0.0031744  0.0013942   2.277   0.0236 *
Assists     -0.0001414  0.0001608  -0.880   0.3799
Hits         0.0025485  0.0008379   3.042   0.0026 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.3423 on 258 degrees of freedom
  (59 observations deleted due to missingness)
Multiple R-squared:  0.2258,  Adjusted R-squared:  0.2168
F-statistic: 25.09 on 3 and 258 DF,  p-value: 2.827e-14
Summary of model lm.mult2
summary(lm.mult2)
Call:
lm(formula = logSal ~ RBI + Assists + Hits + Runs, data = train.data)

Residuals:
     Min       1Q   Median       3Q      Max
-0.85203 -0.28880  0.06303  0.23435  1.20486

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  2.1521276  0.0551418  39.029   <2e-16 ***
RBI          0.0032043  0.0014203   2.256   0.0249 *
Assists     -0.0001449  0.0001639  -0.884   0.3774
Hits         0.0026654  0.0013072   2.039   0.0425 *
Runs        -0.0002463  0.0021124  -0.117   0.9073
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.3429 on 257 degrees of freedom
  (59 observations deleted due to missingness)
Multiple R-squared:  0.2259,  Adjusted R-squared:  0.2138
F-statistic: 18.75 on 4 and 257 DF,  p-value: 1.548e-13
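Rather than reading the two printouts line by line, the fit statistics can be pulled out of the `summary()` objects and set side by side. This is a sketch assuming the corrgram `baseball` data used above; `fit.stats` is a hypothetical name. `nobs()` reports how many rows remain after `lm()` drops incomplete cases (the default `na.action` is normally `na.omit`), which accounts for the 59 deleted observations.

```r
library(corrgram)  # provides the baseball data set

vars2 <- c("Assists", "Atbat", "Errors", "Hits", "Homer", "logSal",
           "Putouts", "RBI", "Runs", "Walks", "Years")
train.data <- baseball[-100, vars2]  # same rows and columns as above

lm.mult1 <- lm(logSal ~ RBI + Assists + Hits, data = train.data)
lm.mult2 <- lm(logSal ~ RBI + Assists + Hits + Runs, data = train.data)

# One row per model: observations used, predictors, R-squared, adjusted R-squared
fit.stats <- t(sapply(list(lm.mult1 = lm.mult1, lm.mult2 = lm.mult2), function(m) {
  s <- summary(m)
  c(n      = nobs(m),              # rows left after incomplete cases are dropped
    p      = length(coef(m)) - 1,  # number of predictors (intercept excluded)
    r2     = s$r.squared,
    adj.r2 = s$adj.r.squared)
}))
round(fit.stats, 4)
```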
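When comparing models with different numbers of predictors, use the adjusted R-squared rather than the multiple R-squared. The multiple R-squared never decreases when a variable is added (0.2258 → 0.2259 after adding Runs), whereas the adjusted R-squared penalizes each extra predictor (0.2168 → 0.2138). By this criterion lm.mult1 is the preferred model.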
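The penalty can be checked by hand with the formula adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1), where n is the number of observations used and p the number of predictors. Here n = 258 + 3 + 1 = 262 follows from the residual degrees of freedom of lm.mult1. The base-R sketch below uses a hypothetical helper `adj_r2`; because it is fed the rounded R² values from the printouts, it matches the printed adjusted values only to about three decimals.

```r
# Hypothetical helper: adjusted R-squared from R-squared,
# number of observations n and number of predictors p
adj_r2 <- function(r2, n, p) {
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

n <- 258 + 3 + 1  # residual df of lm.mult1 + predictors + intercept = 262

adj_r2(0.2258, n, p = 3)  # lm.mult1; summary() printed 0.2168
adj_r2(0.2259, n, p = 4)  # lm.mult2; summary() printed 0.2138
```

Adding Runs raises R² by only about 0.0001, while the factor (n − 1)/(n − p − 1) grows from 261/258 to 261/257, so the adjusted value falls.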