disceRn Data: Creating a Word Cloud in R

A word cloud is a visual display of the words that occur most frequently in a piece of text, with more frequent words drawn larger. This post shows how to create one in R. Within a word cloud, the positive, negative and neutral words can be shown separately, each group labelled with its proportion. The data used for generating the word clouds here is thousands of book reviews from an e-commerce giant in the USA. The reviews cover three book series: Harry Potter, Chronicles of Narnia, and The Hobbit (Lord of the Rings). The tweets data from the previous post can also be used as input. The input file must contain the type of sentiment along with the text (see the output of the sentiment analysis in the previous post).
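As a rough illustration of the input layout described above, here is a minimal sketch of a file with one column for the review text and one for its sentiment. The column names `text` and `sent` are the ones the R code in this post reads; the rows and the file name `sample_wordcloud.csv` are made up for the example.

```r
# Hypothetical sample rows; only the column names `text` and `sent`
# come from the post, the review texts are invented for illustration.
reviews <- data.frame(
  text = c("A magical read, loved every page!",
           "The book arrived on time.",
           "Too long and rather dull."),
  sent = c("positive", "neutral", "negative"),
  stringsAsFactors = FALSE
)

# Write the rows to a temporary CSV, then read them back with read.csv()
path <- file.path(tempdir(), "sample_wordcloud.csv")
write.csv(reviews, path, row.names = FALSE)
sample_input <- read.csv(path, stringsAsFactors = FALSE)

# Inspect the structure: one character column per field
str(sample_input)
```

Any file with these two columns, such as the sentiment output from the previous post, should work as input in the same way.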

R code:

library(wordcloud)

library(tm)

library(RColorBrewer) # provides brewer.pal() for the colour palette

narnia = read.csv("HarryPotter_wordcloud.csv") # read the input file containing the type of sentiment

narnia$text=gsub('[[:punct:]]', '', narnia$text) # Clean the data

narnia$text=gsub("[[:digit:]]", "", narnia$text)

narnia$text = tolower(narnia$text) # make it lower case

sents = levels(factor(narnia$sent)) # 3 levels: positive, neutral, negative

labels <- sapply(sents, function(x) paste(x, format(round(length(narnia$text[narnia$sent == x]) / length(narnia$sent) * 100, 2), nsmall = 2), "%")) # percentage share of each sentiment level, as a character vector

nemo = length(sents)

emo.docs = rep("", nemo)

for (i in 1:nemo) # text data categorized into 3 levels, docs created for each category

{

tmp = narnia[narnia$sent == sents[i],]$text

emo.docs[i] = paste(tmp,collapse=" ")

}

emo.docs = removeWords(emo.docs, stopwords("english")) # remove stopwords

corpus = Corpus(VectorSource(emo.docs))

tdm = TermDocumentMatrix(corpus)

tdm = as.matrix(tdm)

colnames(tdm) = labels

# comparison word cloud

comparison.cloud(tdm,max.words=100, colors = brewer.pal(nemo, "Set1"),

scale = c(3,.5), random.order = FALSE, title.size = 1.5)
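The percentage labels used as column names can also be computed with base R's table() and prop.table(). This is only a sketch of an alternative, not the method used above; the vector `sent` is a hypothetical stand-in for narnia$sent.

```r
# Hypothetical sentiment labels standing in for narnia$sent
sent <- c("positive", "positive", "neutral", "negative", "positive")

# Share of each sentiment as a percentage of all reviews;
# table() orders the levels alphabetically, like levels(factor(...))
pct <- prop.table(table(sent)) * 100

# One label per sentiment level, with two decimal places
labels <- sprintf("%s %.2f %%", names(pct), as.numeric(pct))
print(labels)
```

Because table() and levels(factor()) sort the sentiment names the same way, labels built like this should line up with the columns of the term-document matrix.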

The word cloud for Harry Potter reviews:

The word cloud for Chronicles of Narnia reviews:

The word cloud for Hobbit reviews:
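Since the same cleaning and grouping steps are repeated for each book series, they could be wrapped in a small helper. The sketch below is one possible way to do that in base R; the function name `sentiment_docs` and the demo rows are hypothetical, and it assumes a data frame with `text` and `sent` columns as in the code above.

```r
# Hypothetical helper: clean the review text and build one document
# per sentiment level from a data frame with `text` and `sent` columns.
sentiment_docs <- function(reviews) {
  text <- gsub("[[:punct:]]", "", reviews$text)  # drop punctuation
  text <- gsub("[[:digit:]]", "", text)          # drop digits
  text <- tolower(text)                          # lower-case everything
  # Group the cleaned text by sentiment and paste each group together
  docs <- tapply(text, reviews$sent, paste, collapse = " ")
  setNames(as.character(docs), names(docs))
}

# Made-up rows for illustration only
demo <- data.frame(
  text = c("Loved it, 10/10!", "Great story.", "Boring..."),
  sent = c("positive", "positive", "negative"),
  stringsAsFactors = FALSE
)
docs <- sentiment_docs(demo)
print(docs)
```

The returned character vector can then go through removeWords(), Corpus() and TermDocumentMatrix() exactly as emo.docs does above, once per input file.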

Sunday, 9 November 2014
