This project analyzes the sentiment of the acceptance speeches given at the Oscars ceremonies by leading actresses and actors over the years. All the data we needed was available, which let us analyze the content of those speeches. This document explains the process.
We gathered the data in a spreadsheet from the Oscars speeches database, available at http://aaspeechesdb.oscars.org/. The spreadsheet includes the speech itself, the name of the actress or actor, whether they were present, the name of the replacement speaker (if there was one), the year, the film title, the genre, and the age of the speaker.
## Warning: Removed 3 rows containing missing values (geom_label).
ggplot(speech_sentiments, aes(line_value, SPEAKER.AGE, color = GENRE)) +
geom_label_repel(aes(label=SPEAKER)) +
ggtitle("Lead Actor Speakers - Speech Value vs. Speaker Age, Grouped by Genre") +
xlab("Speech Value") +
ylab("Speaker Age") +
scale_color_hue(h.start = 0, direction = 1, na.value = "grey50", aesthetics = "color") +
theme(text = element_text(size = 12, family = "Palatino"))
## Warning: ggrepel: 50 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
Jane Fonda has the highest speech value in the lead actor speakers data set, at 48. She was speaking for her father, Henry Fonda, who couldn’t attend the ceremony because of a heart illness. In her speech, she expresses her gratitude to the Oscars and to the cast of “On Golden Pond”. Although she doesn’t use many distinct high-scoring AFINN words, she repeats words such as “good” and “lucky” (each with a score of 3), and words similar to them, very often, which keeps adding to her score. Her speech is also relatively long, which gives her more chances to accumulate value. Her speech variation is much lower because, according to AFINN, her words rarely swing between negative and positive values.
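To make the scoring concrete, here is a minimal sketch of a summed lexicon score in base R. It is not our analysis code: the `toy_lexicon` vector and the `speech_value()` helper are hypothetical names, the lexicon contains only the four scores quoted in this report rather than the full AFINN list, and the two example sentences are invented.

```r
# Toy lexicon: only the four scores quoted in this report (assumption: this
# stands in for the full AFINN lexicon, which is much larger).
toy_lexicon <- c(good = 3, lucky = 3, dick = -4, cut = -1)

# Hypothetical helper: sum the lexicon scores of every word in a speech.
# Words that are missing from the lexicon contribute nothing.
speech_value <- function(text, lexicon) {
  words <- unlist(strsplit(tolower(text), "[^a-z']+"))
  words <- words[nzchar(words)]      # drop empty strings left by the split
  scores <- lexicon[words]           # NA for words not in the lexicon
  sum(scores, na.rm = TRUE)
}

short_speech <- "I feel so lucky tonight."
long_speech  <- "I feel so lucky, so very lucky. This is good, really good, and I am lucky."

speech_value(short_speech, toy_lexicon)  # 3
speech_value(long_speech, toy_lexicon)   # 15
```

Repeating moderately positive words raises the total even though no single word scores higher than 3, which is the pattern described for Jane Fonda's speech.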
Jeremy Irons has the highest speech variation in our data set, at just under 2.5. Irons won the Oscar for lead actor in 1990, at 42 years old, for his role in the drama “Reversal of Fortune”. During his speech, Irons thanked someone named Dick Smith, and in AFINN the word “dick” has a score of -4. He also uses the word “cut”, which has a score of -1. The words “wish” and “thank” earn him some value points, but not many: his speech value comes to about 8.
Sacheen Littlefeather accepted the award for leading actor in 1972 on behalf of Marlon Brando. Littlefeather is Apache and was president of the National Native American Affirmative Image Committee. Brando did not want to accept the award because of the mistreatment of American Indians in the film industry at that time.
ggplot(speech_sentimentss, aes(YEAR, line_value, colour=GENRE)) +
geom_label_repel(aes(label=SPEAKER)) +
ggtitle("Lead Actress Speakers - Year vs. Speech Value, Grouped by Genre") +
xlab("Year") +
ylab("Speech Value") +
scale_color_hue(h.start = 0, direction = 1, na.value = "grey50", aesthetics = "color") +
theme(text = element_text(size = 12, family = "Palatino"))
## Warning: ggrepel: 46 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
ggplot(speech_sentimentss, aes(line_value, SPEAKER.AGE, color = GENRE)) +
geom_label_repel(aes(label=SPEAKER)) +
ggtitle("Lead Actress Speakers - Speech Value vs. Speaker Age, Grouped by Genre") +
xlab("Speech Value") +
ylab("Speaker Age") +
scale_color_hue(h.start = 0, direction = 1, na.value = "grey50", aesthetics = "color") +
theme(text = element_text(size = 12, family = "Palatino"))
## Warning: ggrepel: 51 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
Greer Garson has the highest speech value in the lead actress speakers data set, at 58. Garson was 38 when she received this Oscar for her role in the drama “Mrs. Miniver” in 1942. Her speech was quite long, and she used several words such as “grateful”, “thank”, “exciting”, “great”, “praise”, “happy”, and “humbly”, which earned her the highest score across both the leading actor and leading actress categories. Many of the words she used have values of 3 to 4 in AFINN, which also contributed to her high score.
Anthony Harvey has the highest speech variation score in the lead actress data set, at just above 3. Harvey was accepting the award on behalf of Katharine Hepburn, who was 61 at the time and received the Oscar for the drama “The Lion in Winter”. His acceptance speech on Hepburn’s behalf was very short, which gave him little time to add much value. He also used words such as “breaking” and “broken”, which have AFINN scores of -1 and -3.
The topic model produced by the following code automatically organizes the top terms in the speeches into 10 topics.
speech_top_terms %>%
mutate(term = reorder_within(term, beta, topic)) %>%
ggplot(aes(beta, term, fill = factor(topic))) +
geom_col(show.legend = FALSE) +
facet_wrap(~ topic, scales = "free") +
ggtitle("Lead Actor Speakers - Top Terms") +
xlab("Beta") +
ylab("Term") +
theme(text = element_text(size = 14, family = "Palatino")) +
scale_y_reordered()
speech_top_termss %>%
mutate(term = reorder_within(term, beta, topic)) %>%
ggplot(aes(beta, term, fill = factor(topic))) +
geom_col(show.legend = FALSE) +
facet_wrap(~ topic, scales = "free") +
scale_y_reordered() +
ggtitle("Lead Actress Speakers - Top Terms") +
xlab("Beta") +
ylab("Term") +
theme(text = element_text(size = 14, family = "Palatino"))
As the years progressed, the speeches generally got longer. However, the topic model shows the same words repeated over and over, and ten distinct groups of words couldn’t be determined. This is unexpected: one would think that longer speeches would leave room for a more diverse vocabulary, especially as the venues grew grander and the prestige of the Oscars increased over time. There is barely any variation, save for a few switches in the most frequently used words.
This suggests that making a speech at the Oscars is mostly performative, and that the speeches don’t hold much more content despite their greater length.
As an extra feature of this project, we also included image plots of the posters for the films whose lead actor or actress won an Oscar.
This graph compares the luminance of each poster with its saturation. The closer a poster is to fully black or fully white, the less saturation it has, so it is placed lower on the y axis. The most saturated posters sit higher on the y axis and closer to the center of the luminance range.
This graph analyzes the hue of the posters by genre. Since most of the films that received awards were dramas, most of the data points congregate in the center of the graph, however much they skew red or blue. The posters mostly use orange regardless of genre. Presumably this is meant to catch people’s eyes without being as obnoxious as red or yellow can seem.
This graph analyzes how centered the elements inside each poster are. Most posters have a horizontally centered composition, which places them near the 0.0 line on the x axis. Vertically, they skew a little above 0, which means the posters are slightly top-weighted. This makes sense because most movie posters leave space at the bottom for credits and other text. The little horizontal skew that does exist leans toward the left, which also makes sense because English reads from top to bottom, left to right.
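One way to picture a centering score like this is as a weighted centroid. The sketch below is an assumption for illustration, not the method our image analysis used: `composition_offset()` is a hypothetical helper that takes a matrix of visual weight (one cell per pixel, for example edge strength) and returns the centroid rescaled so that 0 is the middle of the image, -1 and 1 are the left and right edges on x, and 1 is the top on y.

```r
# Hypothetical helper: weighted centroid of an image, rescaled to -1..1.
# `weight` is a numeric matrix; row 1 is the top of the image.
composition_offset <- function(weight) {
  n_row <- nrow(weight)
  n_col <- ncol(weight)
  total <- sum(weight)
  if (total == 0) return(c(x = 0, y = 0))   # blank image: treat as centered

  # Positions of columns (left to right) and rows (top to bottom)
  x_pos <- if (n_col > 1) seq(-1, 1, length.out = n_col) else 0
  y_pos <- if (n_row > 1) seq(1, -1, length.out = n_row) else 0

  c(x = sum(colSums(weight) * x_pos) / total,
    y = sum(rowSums(weight) * y_pos) / total)
}

# Toy 5 x 5 "poster" with one element in the middle column, one row above center
poster <- matrix(0, nrow = 5, ncol = 5)
poster[2, 3] <- 1

composition_offset(poster)   # x = 0 (horizontally centered), y = 0.5 (top-weighted)
```

Under this convention, a poster that leaves empty space at the bottom for credits ends up with a positive y value, in line with the slight top-weighting described above.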
This poster appears to break the trend of highly saturated images having fairly neutral luminance. The reason for this, and for some other posters reading as more saturated than they are, is that true grayscale images couldn’t be processed by our color analysis. To get around this, we replaced grayscale posters with images that were mostly the same but had a slight hue, such as white values being slightly tan or black values being slightly blue. An unintended consequence was that posters that would normally have been mostly black or white read as being very full of color compared with posters that were allowed to keep their actual colors. This is also a possible explanation for the outsized number of orange-hued posters, as that set could contain some grayscale posters that were shifted toward tan.
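A small calculation shows how a slight tint can read as strong color. This sketch assumes saturation is measured in the HSL color model, which may not be what our color analysis used internally; `hsl_sat_light()` is a hypothetical helper and the RGB values are invented examples.

```r
# Hypothetical helper: HSL saturation and lightness for one RGB color (0..255).
hsl_sat_light <- function(r, g, b) {
  rgb <- c(r, g, b) / 255
  mx <- max(rgb)
  mn <- min(rgb)
  lightness <- (mx + mn) / 2
  chroma <- mx - mn
  # Pure grays have zero chroma; returning 0 also avoids 0/0 for black and white
  saturation <- if (chroma == 0) 0 else chroma / (1 - abs(2 * lightness - 1))
  c(saturation = saturation, lightness = lightness)
}

hsl_sat_light(250, 250, 250)  # neutral near-white: saturation is 0
hsl_sat_light(255, 250, 240)  # near-white nudged toward tan
hsl_sat_light(0, 0, 10)       # near-black nudged toward blue
hsl_sat_light(200, 120, 60)   # an ordinary mid-tone orange, for comparison
```

In HSL, saturation is chroma divided by the largest chroma possible at that lightness. Near black or white that ceiling is tiny, so both tinted examples come out close to fully saturated, higher than the mid-tone orange, even though they look almost gray.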
# Possible Explanations for Results

In both the Actor and Actress sentiment plots, there is a steady increase in positive value as the year increases. One possible explanation is that the speeches themselves have gotten longer over time, whereas in the 1940s and 1950s people usually said just a few sentences. Because the sentiment data reflects a sum over all the words spoken, longer speeches may seem very positive while short ones seem more neutral.
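The length effect can be illustrated with a toy comparison of a summed score against a per-word average. This is a sketch under stated assumptions, not part of our analysis: `score_speech()` and `toy_lexicon` are hypothetical, the lexicon holds only two scores quoted earlier in this report, and the speeches are invented.

```r
# Toy lexicon with two scores quoted in this report (not the full AFINN list)
toy_lexicon <- c(good = 3, lucky = 3)

# Hypothetical helper: return both the summed score and the score per word.
score_speech <- function(text, lexicon) {
  words <- unlist(strsplit(tolower(text), "[^a-z']+"))
  words <- words[nzchar(words)]
  scores <- lexicon[words]
  scores[is.na(scores)] <- 0                 # unknown words count as neutral
  total <- sum(scores)
  c(total = total, per_word = total / max(length(words), 1))
}

short_speech <- "I am lucky tonight"
# The same sentence repeated five times: same tone, five times the length
long_speech <- paste(rep(short_speech, 5), collapse = " ")

score_speech(short_speech, toy_lexicon)   # total 3,  per_word 0.75
score_speech(long_speech, toy_lexicon)    # total 15, per_word 0.75
```

The summed value grows with length even though the tone per word is identical, so dividing by word count is one possible way to check whether the upward trend over the years reflects more than longer speeches.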
Another thing we noticed was that some of the most extreme positive values came from people accepting an award for someone else, such as Jane Fonda accepting an award for her father in 1981. This trend makes some sense, as most people want to be seen as humble or modest, celebrities especially. It’s also possible that some of the more negative speeches come from the same instinct for humility; Frances McDormand, for example, used some self-deprecation when accepting her award.
As far as the movie posters go, it was surprisingly hard to find concrete trends relating to any external traits of the films. Posters for comedies were not distinctly different from those for dramas or musicals. Unlike with the speech data, we didn’t find any changes over time in the colors or composition of the posters. There were some slight trends that affected all the posters, such as leaning toward being horizontally symmetrical but vertically slightly top-heavy, and trending toward orange and pink hues over blue or green. These minor trends may give some insight into overall trends in poster design, but the dataset is likely too small to support any definitive claims.
Most of the trends that emerged in posters illustrated how the image analysis works more than anything specific about the posters themselves. For example, the plot of luminance vs saturation doesn’t say much about movie posters or the Oscars, but does demonstrate nicely the relationship between saturation and luminance in images. The curve of the plot shows how extremely bright or dark images lose color information as a result of being that bright or dark. This trend should be true for basically any set of images, so in that sense it’s not a very useful finding, even though it is really interesting to visualize.
The sentiment lexicon covers only a limited share of the words in the English language. This is especially limiting for phrases, which can mean several things and carry a different intensity taken together than as separate words; it would be impossible to have a value for every situation.
Not all of the actors and actresses are native English speakers. Some are from France, Italy, and elsewhere, and although they may be very proficient in English, they may use words that convey more or less expressiveness without realizing it.
Language is very subjective. When composed the right way, a simple word can hold more weight than a complex and generally expressive one. Tone wasn’t measured in this study either, and it is an important factor in conveying positivity or negativity. There are also words that can be perceived as extremely positive, like “tremendous” (found in Jane Fonda’s speech), that aren’t assigned any value in AFINN.
We went into this project hoping to find some major differences or revealing patterns in how people accept awards and create posters for movies. In the case of speeches, we were surprised by how similar most of them were overall. There were some interesting trends, such as increasing length or extra positivity when talking about someone else, but for the most part strong trends didn’t emerge in this dataset. You could interpret this optimistically as a result of the variety and diversity in speeches and posters, or more cynically see it as a sign that the content of these speeches and posters doesn’t actually contain much meaning at all. A third explanation is that our methods of analysis (sentiment, color values, and symmetry/edge analysis) are too limited to uncover the patterns that may exist in the data. Ultimately, if there is any profound knowledge to take away from this, maybe it’s this: Jane Fonda really likes her dad, and Joaquin Phoenix really dislikes capitalism.