Sunday, 9 November 2014

Creating a Word Cloud in R


A word cloud displays the words that occur most frequently in a piece of text. This post shows how to create one in R. Within a single word cloud, positive, negative and neutral words can be shown separately, each group labelled with its proportion of the reviews. The data used here are thousands of book reviews from a US e-commerce giant, covering three book series: Harry Potter, Chronicles of Narnia, and The Hobbit (Lord of the Rings). The tweet data from the previous post can also be used as input. The input file must contain the sentiment type alongside the text of each review (see the output of the sentiment analysis in the previous post).
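As a rough sketch of the input layout, the script in this post reads a CSV with one review per row, a text column and a sent column. The column names are taken from the script; the three rows here are invented purely for illustration and are not from the real review data.

```r
# Hypothetical sample rows: only the column names `text` and `sent`
# come from the script; the review texts are made up.
reviews <- data.frame(
  text = c("Loved every page of it!",
           "It was okay, nothing special.",
           "Boring and far too long."),
  sent = c("positive", "neutral", "negative"),
  stringsAsFactors = FALSE
)

# Write the sample to a temporary CSV and read it back the way the script does
path <- tempfile(fileext = ".csv")
write.csv(reviews, path, row.names = FALSE)
check <- read.csv(path, stringsAsFactors = FALSE)
str(check)
```

Any file with these two columns should work as input, whatever the source of the sentiment labels.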
R code:


library(wordcloud)
library(tm)
library(RColorBrewer)                                                   # provides brewer.pal()
narnia = read.csv("HarryPotter_wordcloud.csv")                          # read the input file containing the type of sentiment    
narnia$text=gsub('[[:punct:]]', '', narnia$text)                               # Clean the data
narnia$text=gsub("[[:digit:]]", "", narnia$text)
narnia$text = tolower(narnia$text)                                                  # make it lower case
sents = levels(factor(narnia$sent))                                                   # 3 levels: positive, neutral, negative
labels <- sapply(sents, function(x) paste(x, format(round(sum(narnia$sent == x) / nrow(narnia) * 100, 2), nsmall = 2), "%"))   # % proportion for each level, as a character vector
nemo = length(sents)
emo.docs = rep("", nemo)
for (i in 1:nemo)                                                   # text data categorized into 3 levels, docs created for each category
{
tmp = narnia[narnia$sent == sents[i],]$text
emo.docs[i] = paste(tmp,collapse=" ")
}
emo.docs = removeWords(emo.docs, stopwords("english"))                     # remove stopwords
corpus = Corpus(VectorSource(emo.docs))
tdm = TermDocumentMatrix(corpus)
tdm = as.matrix(tdm)
colnames(tdm) = labels
# comparison word cloud
comparison.cloud(tdm, max.words = 100, colors = brewer.pal(nemo, "Set1"),
                 scale = c(3, .5), random.order = FALSE, title.size = 1.5)
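The label and document-building steps in the script can be tried on a small scale without any packages. The sketch below uses an invented four-row data frame (the name toy and its contents are assumptions for illustration) to show how each sentiment level gets a percentage label and one combined document.

```r
# Toy data, invented for illustration only
toy <- data.frame(
  text = c("great story", "loved it", "dull plot", "it was fine"),
  sent = c("positive", "positive", "negative", "neutral"),
  stringsAsFactors = FALSE
)

# Sentiment levels, in alphabetical order
sents <- levels(factor(toy$sent))

# Share of rows in each level, as a percentage
share <- sapply(sents, function(x) mean(toy$sent == x) * 100)

# One label per level, e.g. "positive 50.00 %"
labels <- sprintf("%s %.2f %%", sents, share)

# One combined document per level, mirroring the for loop in the script
docs <- sapply(sents, function(x) paste(toy$text[toy$sent == x], collapse = " "))

print(labels)
print(docs)
```

These labels become the column names of the term-document matrix, which is why each coloured region of the comparison cloud is titled with its sentiment and share.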


The word cloud for Harry Potter reviews:


The word cloud for Chronicles of Narnia reviews:


The word cloud for Hobbit reviews:

