A word cloud is a picture of the words that occur most often in a piece of text, with more frequent words drawn larger. This post shows how to create word clouds in R. A comparison word cloud can also show the positive, negative and neutral words separately, each labelled with its share of the text. The data used to generate the word clouds here are thousands of book reviews from a US e-commerce giant.
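At its core, a word cloud is just a word-frequency table drawn as a picture. The sketch below, assuming a few made-up sentences in place of real reviews, counts word frequencies with base R only and then draws a plain (non-comparison) cloud if the `wordcloud` package happens to be installed. The tokenization here (strip punctuation, lower-case, split on whitespace) is a simplified stand-in for the `tm` steps used later in the post.

```r
# Toy text: invented sentences, not taken from the review data
text <- c("The wizard cast a spell.",
          "The wizard and the dragon!",
          "A dragon, a spell, a wizard.")

# Strip punctuation, lower-case, and split on whitespace
words <- unlist(strsplit(tolower(gsub("[[:punct:]]", "", text)), "\\s+"))
words <- words[words != ""]

# Frequency table, most frequent word first
freq <- sort(table(words), decreasing = TRUE)
print(freq)

# Draw only if the package is available; word size follows frequency,
# and min.freq = 1 keeps words that appear just once
if (requireNamespace("wordcloud", quietly = TRUE)) {
  wordcloud::wordcloud(names(freq), as.numeric(freq), min.freq = 1)
}
```

In this toy text "a" is the most frequent word, so it would be drawn largest; in practice stopwords like this are removed first, as the script below does with `removeWords()`.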
The reviews cover three book series: Harry Potter, The Chronicles of Narnia and The Hobbit (The Lord of the Rings). The tweet data from the previous post can also be used as input. Either way, the input file must contain the sentiment type along with the text (see the output of the sentiment analysis in the previous post).
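The script below reads a column called `text` and a column called `sent` (the sentiment label). As a minimal sketch of that layout, assuming invented rows rather than real reviews, the snippet builds a tiny data frame, round-trips it through a temporary CSV the way `read.csv()` would see the real file, and checks that both required columns are present.

```r
# Invented example rows; the real file has one row per review
reviews <- data.frame(
  text = c("Loved every page of this book!",
           "The plot was slow and boring.",
           "Arrived on Tuesday."),
  sent = c("positive", "negative", "neutral"),
  stringsAsFactors = FALSE
)

# Write to a temporary CSV and read it back, as the script does
path <- tempfile(fileext = ".csv")
write.csv(reviews, path, row.names = FALSE)
input <- read.csv(path, stringsAsFactors = FALSE)

# Fail early if either column the script relies on is missing
stopifnot(all(c("text", "sent") %in% names(input)))
print(table(input$sent))
```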
R code:
library(wordcloud)
library(tm)
library(RColorBrewer) # provides brewer.pal()
narnia = read.csv("HarryPotter_wordcloud.csv") # read the input file containing the type of sentiment
narnia$text = gsub("[[:punct:]]", "", narnia$text) # clean the data: remove punctuation
narnia$text = gsub("[[:digit:]]", "", narnia$text) # remove digits
narnia$text = tolower(narnia$text) # make it lower case
sents = levels(factor(narnia$sent)) # 3 levels: negative, neutral, positive
# % proportion of each level, used as the category labels of the cloud
labels <- sapply(sents, function(x)
  paste(x, format(round(length(narnia[narnia$sent == x, ]$text) /
    length(narnia$sent) * 100, 2), nsmall = 2), "%"))
nemo = length(sents)
emo.docs = rep("", nemo)
# text data categorized into 3 levels, one doc created for each category
for (i in 1:nemo) {
  tmp = narnia[narnia$sent == sents[i], ]$text
  emo.docs[i] = paste(tmp, collapse = " ")
}
emo.docs = removeWords(emo.docs, stopwords("english")) # remove stopwords
corpus = Corpus(VectorSource(emo.docs))
tdm = TermDocumentMatrix(corpus)
tdm = as.matrix(tdm)
colnames(tdm) = labels
# comparison word cloud
comparison.cloud(tdm, max.words = 100,
  colors = brewer.pal(nemo, "Set1"),
  scale = c(3, .5), random.order = FALSE, title.size = 1.5)
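To see what `comparison.cloud()` actually receives, here is a base-R sketch of the same steps on a four-row toy data frame (invented rows). It swaps the `tm` calls for plain `strsplit()`/`table()` and uses a one-word hypothetical stopword list instead of `stopwords("english")`, so it illustrates the shape of the data rather than reproducing `tm` exactly: the result is a term-by-category count matrix whose column names carry each category's percentage share.

```r
# Toy input with the same two columns as the real file (rows are invented)
narnia <- data.frame(
  text = c("Great book, 10/10!", "Great story", "Boring plot", "It arrived"),
  sent = c("positive", "positive", "negative", "neutral"),
  stringsAsFactors = FALSE
)

# Clean: drop punctuation, then digits, then lower-case
narnia$text <- tolower(gsub("[[:digit:]]", "", gsub("[[:punct:]]", "", narnia$text)))

# Category labels with percentage share (levels sort alphabetically)
sents <- levels(factor(narnia$sent))
labels <- sapply(sents, function(x)
  paste(x, format(round(mean(narnia$sent == x) * 100, 2), nsmall = 2), "%"),
  USE.NAMES = FALSE)

# One document per category, split into words; "it" is a stand-in stopword
emo.docs <- sapply(sents, function(s) paste(narnia$text[narnia$sent == s], collapse = " "))
tokens <- lapply(strsplit(emo.docs, "\\s+"),
                 function(w) w[w != "" & !w %in% c("it")])

# Term-by-category counts: rows are terms, columns are categories,
# the same layout as as.matrix(TermDocumentMatrix(corpus))
terms <- sort(unique(unlist(tokens)))
tdm <- sapply(tokens, function(w) table(factor(w, levels = terms)))
colnames(tdm) <- labels
print(tdm)
```

A matrix of this shape can then be passed to `comparison.cloud()` in the same way as in the script; with only a handful of words the resulting cloud is not very informative, which is why the real review data is used above.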
The word cloud for Harry Potter reviews:
The word cloud for Chronicles of Narnia reviews:
The word cloud for Hobbit reviews: