## Sentiment Value Tag Sorting
Archive of Our Own is a site organized by tags. After cleaning, both data sets needed to be narrowed down to their most popular tags while still keeping enough tags to graph. Starting with the DracoxHarry data, we filtered out any tag that isn’t used more than 15 times, leaving 119 tags altogether. Similarly, with DracoxHermione, we filtered out any tag that isn’t used more than 15 times, which returned 83 tags.
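The popularity filter described above can be sketched roughly as follows. This is only an illustration on made-up data: the `work_tags` table, its column names, and the lowered `min_uses` cutoff are assumptions for the example, not the data or code that was actually used.

```r
library(dplyr)

# Hypothetical toy data: one row per (work, tag) pair.
work_tags <- tibble(
  work_id = c(1, 1, 2, 2, 3, 3, 4),
  tag     = c("Fluff", "Angst", "Fluff", "Romance", "Fluff", "Angst", "Slow Burn")
)

# The write-up uses a cutoff of 15; it is lowered here so the toy data shows the effect.
min_uses <- 1

popular_tags <- work_tags %>%
  count(tag, name = "uses", sort = TRUE) %>%  # how often each tag is used
  filter(uses > min_uses)                     # keep tags used MORE than the cutoff

popular_tags
```

With the real data, setting `min_uses <- 15` would apply the same "more than 15 times" rule.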
After narrowing our data sets, we sorted the remaining tags into intermediate categories so that we could then find the sentiment value of each. The intermediate categories are Sexual (A), Romantic (B), Neutral (C), and Unknown (D).
A has tags that are inherently sexual or describe sexually related topics. B has tags that describe or relate to romantic relationships. C has tags that describe the rest of the story content that is not sexual or romantic. D has tags that handle the technical parts of the posting itself or the author’s commentary.
Here is the sentiment code that was run on the tags in each set after they had been sorted into our four categories.
This first section loads the libraries, imports the data, and then uses mutate to assign a number to each row. For this set we are using DracoxHermione, but the code is the same for both sets.
library(readr)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tidytext)
library(ggplot2)
DXHer <- read_csv("TopestTagsDXHer - Topest_Tags_DXHER (2).csv")
## New names:
## • `` -> `...1`
## • `...1` -> `...2`
## Rows: 82 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Tag_Id, Group
## dbl (3): ...1, ...2, Weight
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
DXHer <- DXHer %>%
  mutate(reference = row_number())  # number each tag row 1..n
For the next chunk, we split each tag into individual words and count how often each word appears within each tag (identified by its reference number).
DXHer_per_line<-DXHer %>%
unnest_tokens(word, Tag_Id)%>%
count(reference, word, sort=TRUE) %>%
rename(per_line = n)
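To make the tokenizing step concrete, here is a small sketch of the same pipeline on a few invented tags. The `toy` table is hypothetical. `unnest_tokens` lowercases the text and produces one row per word, and `count` then collapses repeated words within a tag into a single row with a count.

```r
library(dplyr)
library(tidytext)

# Hypothetical tags, numbered the same way as the reference column above.
toy <- tibble(
  Tag_Id    = c("Happy Ending", "Love Love Love", "Angst"),
  reference = 1:3
)

toy_per_line <- toy %>%
  unnest_tokens(word, Tag_Id) %>%            # one lowercase word per row
  count(reference, word, sort = TRUE) %>%    # occurrences of each word within each tag
  rename(per_line = n)

toy_per_line
```

Because repeated words are collapsed here, a word that appears three times in one tag ends up as one row with `per_line` equal to 3.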
Next, we load the AFINN lexicon and inner join it to the Harry Potter data on the word column. Words that do not appear in AFINN are dropped by this join.
afinn<-get_sentiments("afinn")
DXHerwith_scores<-DXHer_per_line%>%
inner_join(afinn, by="word")
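One consequence of an inner join is worth seeing on a small example: any tag whose words are all missing from the lexicon disappears from the result. The sketch below uses a stand-in lexicon with made-up entries in the same `word`/`value` shape, rather than the real AFINN download, so `toy_lexicon` and `toy_per_line` are assumptions for illustration only.

```r
library(dplyr)

# Stand-in lexicon (made-up entries, not the real AFINN table).
toy_lexicon <- tibble(
  word  = c("happy", "hurt", "love"),
  value = c(3, -2, 3)
)

# Hypothetical tokenized tags: reference 2 has no word in the lexicon.
toy_per_line <- tibble(
  reference = c(1, 1, 2, 3, 3),
  word      = c("happy", "ending", "angst", "hurt", "comfort"),
  per_line  = 1L
)

# inner_join keeps only the words that have a score.
toy_with_scores <- toy_per_line %>%
  inner_join(toy_lexicon, by = "word")

# Tags that lost every word in the join, and so will have no sentiment value.
dropped_refs <- setdiff(toy_per_line$reference, toy_with_scores$reference)
dropped_refs
```

Checking `dropped_refs` on the real data would show how many of the tags end up without a sentiment value.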
Next, we use group_by and summarize to calculate the sum and standard deviation of the “value” column for each tag.
scores_DXHer<-DXHerwith_scores%>%
group_by(reference)%>%
#notice our per line strategy is SUM
summarize(line_value=sum(value), line_var=sd(value))
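A small sketch of what this summary produces, again on invented values (the `toy_scored` table is hypothetical). It also shows one detail of `sd`: when a tag has only one scored word, the standard deviation is `NA`, so many short tags will have an empty `line_var`. The extra `n_words` column is an addition for the example and is not in the code above.

```r
library(dplyr)

# Hypothetical scored words: tag 1 has one scored word, tag 2 has two.
toy_scored <- tibble(
  reference = c(1, 2, 2),
  word      = c("happy", "love", "hurt"),
  value     = c(3, 3, -2)
)

toy_scores <- toy_scored %>%
  group_by(reference) %>%
  summarize(
    line_value = sum(value),  # per-tag strategy is SUM, as above
    line_var   = sd(value),   # NA when the tag has a single scored word
    n_words    = n()          # illustrative extra: number of scored words per tag
  )

toy_scores
```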
Next, we join the original data set to the per-tag scores by the reference column.
DXHer_sentiments<-inner_join(DXHer, scores_DXHer, by="reference")
This last section visualizes the sentiment using ggplot, with x = reference, y = line_value, and color = Group.
DXHer_sentiments %>%
  ggplot(aes(reference, line_value, colour = Group)) +
  geom_jitter() +
  # axis labels match the mapped variables: x is the row number, y is the summed value
  labs(x = "Reference",
       y = "Sentiment value",
       color = "Category") +
  scale_color_manual(values = c("red", "pink", "green", "blue")) +
  ggtitle("Draco and Hermione Tag Sentiments")
write.csv(DXHer_sentiments, "DXHer_sentiments.csv")
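The `New names` messages printed when the CSV was read suggest the file had picked up unnamed index columns, which is what `write.csv` produces by default because it writes row names as an extra first column. The sketch below shows the difference on a hypothetical two-row table written to temporary files. Passing `row.names = FALSE` is offered as a possible tweak, not as what the original code did.

```r
# Hypothetical small table standing in for the sentiment results.
toy <- data.frame(Tag_Id = c("Fluff", "Angst"), Weight = c(20, 17))

with_rn    <- tempfile(fileext = ".csv")
without_rn <- tempfile(fileext = ".csv")

write.csv(toy, with_rn)                        # default: row names become an extra column
write.csv(toy, without_rn, row.names = FALSE)  # only the data columns are written

cols_with    <- ncol(read.csv(with_rn))
cols_without <- ncol(read.csv(without_rn))

c(with_row_names = cols_with, without_row_names = cols_without)
```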