Sentiment analysis using the regression coefficients of a sentiment dictionary
Loading the data
data.test <- read.csv('tablet2014_test.csv', stringsAsFactors = FALSE)
Building the DocumentTermMatrix
corpus <- Corpus(VectorSource(data.test$Texts))
# dictionary = Terms(dtm) restricts the test vocabulary to the terms of the training matrix dtm
dtm.test <- DocumentTermMatrix(corpus,
                               control = list(tolower = TRUE,
                                              removePunctuation = TRUE,
                                              removeNumbers = TRUE,
                                              stopwords = stopwords("SMART"),
                                              weighting = weightTfIdf,
                                              dictionary = Terms(dtm)))
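The effect of the `dictionary` option can be illustrated with a small base R sketch. It is not the tm implementation: the vocabulary, the tokenized documents, and the names `train.terms`, `test.docs`, and `X.toy` are all hypothetical, and it uses raw counts rather than tf-idf weights. It only shows that terms outside the training vocabulary are dropped and that the columns follow the training order.

```r
# Hypothetical training vocabulary (stands in for Terms(dtm))
train.terms <- c("good", "bad", "screen")

# Hypothetical, already tokenized test documents
test.docs <- list(c("good", "screen", "good"),
                  c("bad", "battery"))

# Count only the training terms; "battery" is unseen and therefore dropped
X.toy <- t(vapply(test.docs,
                  function(tokens) as.integer(table(factor(tokens, levels = train.terms))),
                  integer(length(train.terms))))
colnames(X.toy) <- train.terms
print(X.toy)
```

Because every test document is counted against the same fixed levels, the resulting matrix always has one column per training term, which is what the prediction step needs.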
Computing sentiment scores using the regression coefficients of the sentiment dictionary
# Keep the same columns, in the same order, as the training matrix X
X.test <- as.matrix(dtm.test)[, colnames(X)]
# Note: newx is the glmnet interface; a model fit with plain lm() would need newdata instead
senti.lm.test.coef      <- predict(res.lm,      newx = X.test)
senti.lasso.test.coef   <- predict(res.lasso,   newx = X.test, s = "lambda.min")
senti.ridge.test.coef   <- predict(res.ridge,   newx = X.test, s = "lambda.min")
senti.elastic.test.coef <- predict(res.elastic, newx = X.test, s = "lambda.min")
Converting sentiment scores to 0 or 1
senti.lm.b.test.coef <- ifelse(senti.lm.test.coef > 0, 1, 0)
senti.lasso.b.test.coef <- ifelse(senti.lasso.test.coef > 0, 1, 0)
senti.ridge.b.test.coef <- ifelse(senti.ridge.test.coef > 0, 1, 0)
senti.elastic.b.test.coef <- ifelse(senti.elastic.test.coef > 0, 1, 0)
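A cutoff of 0 makes sense if the predicted values are log-odds, which is what `predict()` returns by default for a glmnet model fit with `family = "binomial"`. That the models here were fit this way is an assumption, since the fitting code is not part of this section. Under that assumption, thresholding the score at 0 is the same as thresholding the probability at 0.5, as this small sketch with made-up scores shows (`eta`, `prob`, and the label vectors are hypothetical names):

```r
# Hypothetical linear predictors (log-odds) for five documents
eta <- c(-2, -0.1, 0, 0.1, 3)

# Inverse logit: converts log-odds to probabilities, 1 / (1 + exp(-eta))
prob <- plogis(eta)

# Thresholding log-odds at 0 and probabilities at 0.5 yields the same labels
label.from.link <- ifelse(eta > 0, 1, 0)
label.from.prob <- ifelse(prob > 0.5, 1, 0)
print(identical(label.from.link, label.from.prob))
```

If the models were instead fit as ordinary regressions on a 0/1 response, a cutoff of 0.5 on the predicted value would be the more natural choice.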
Checking accuracy
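# Recent versions of caret require both arguments to be factors with identical levels.
# Assumes data.test$Sentiment is coded 0/1; actual and to.factor are helper names added here.
actual <- factor(data.test$Sentiment, levels = c(0, 1))
to.factor <- function(x) factor(as.vector(x), levels = c(0, 1))
confusionMatrix(to.factor(senti.lm.b.test.coef), actual)$overall["Accuracy"]
confusionMatrix(to.factor(senti.lasso.b.test.coef), actual)$overall["Accuracy"]
confusionMatrix(to.factor(senti.ridge.b.test.coef), actual)$overall["Accuracy"]
confusionMatrix(to.factor(senti.elastic.b.test.coef), actual)$overall["Accuracy"]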
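The accuracy reported by `confusionMatrix` is the share of documents whose predicted label matches the actual label. A minimal base R sketch with made-up labels (the vectors `pred` and `actual.toy` are hypothetical, not taken from the tablet data) reproduces the calculation without caret:

```r
# Hypothetical predicted and actual labels for five documents
pred       <- c(1, 0, 1, 1, 0)
actual.toy <- c(1, 0, 0, 1, 1)

# 2 x 2 confusion table with fixed 0/1 levels on both axes
tab <- table(Predicted = factor(pred, levels = c(0, 1)),
             Actual    = factor(actual.toy, levels = c(0, 1)))
print(tab)

# Accuracy = correctly classified documents (the diagonal) / all documents
acc <- sum(diag(tab)) / sum(tab)
print(acc)
```

The same number can be obtained with `mean(pred == actual.toy)`, which is a quick sanity check when caret is not available.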