Sentiment Analysis of Greta Thunberg’s Speech to World Leaders with Amazon Machine Learning Tools

Fanni Kiss
Dec 6, 2020

In this article I introduce two Amazon Web Services tools, Amazon Transcribe and Amazon Comprehend, using Greta Thunberg’s famous speech to the leaders of the United Nations. The speech is available on YouTube under several links; I worked with the version uploaded by Guardian News.

The aim of my short analysis is to apply natural language processing (NLP) and run a sentiment analysis on Greta Thunberg’s speech. I used AWS and R for this, and I describe the steps I took in this article.

Get the transcript

First, I converted the video to MP3 format on ytmp3.com: I submitted the video and downloaded the resulting MP3 file.

Second, I uploaded the MP3 file to Amazon S3, the storage service of Amazon.

Third, I used Amazon Transcribe to convert the MP3 into text. Amazon Transcribe is an automatic speech recognition service, available among the Machine Learning tools on Amazon Web Services. I created a transcription job, selected the speech from my S3 storage and had the audio converted into text. At the end of the process, I could download the transcript in JSON format.

Screenshot of the created transcription job

Make the sentiment analysis

First, I created an access key in the Identity and Access Management (IAM) menu on AWS and downloaded it in CSV format.

Second, I ran the sentiment analysis with Amazon Comprehend from R. I opened an R notebook and installed the AWS Comprehend package.

Installing AWS Comprehend package in R

Third, I configured the R session with the downloaded access key.

Load the access key downloaded from AWS IAM
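A minimal sketch of this step, assuming the CSV from IAM has columns named "Access key ID" and "Secret access key" and that the package reads its credentials from the standard AWS environment variables. The helper name load_aws_keys and the default region are my own placeholders, not something AWS or the package prescribes.

```r
# Hypothetical helper: read the IAM key file and expose the credentials
# to the R session through the standard AWS environment variables.
load_aws_keys <- function(path, region = "us-east-1") {
  # check.names = FALSE keeps the column headers exactly as written in the CSV
  keys <- read.csv(path, check.names = FALSE, stringsAsFactors = FALSE)
  Sys.setenv(
    AWS_ACCESS_KEY_ID     = keys[["Access key ID"]][1],      # assumed column name
    AWS_SECRET_ACCESS_KEY = keys[["Secret access key"]][1],  # assumed column name
    AWS_DEFAULT_REGION    = region                           # assumed region
  )
  invisible(TRUE)
}

# Usage, with a placeholder file name:
# load_aws_keys("accessKeys.csv", region = "eu-west-1")
```

Keeping the key in a file outside the notebook means the secret never appears in the code itself.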

Then I loaded the transcript from the JSON file and saved it as a text string in R.

“txt” is the transcript of the speech
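For readers who want to reproduce this step, here is a small sketch of how the transcript can be pulled out of the Amazon Transcribe output. It assumes the jsonlite package; the JSON string is a shortened stand-in for the downloaded file (real files also carry a word-by-word "items" array), and the job name is a placeholder.

```r
library(jsonlite)

# Shortened stand-in for the JSON file downloaded from Amazon Transcribe.
sample_json <- '{
  "jobName": "example-job",
  "results": {
    "transcripts": [{"transcript": "This is all wrong."}],
    "items": []
  },
  "status": "COMPLETED"
}'

# simplifyVector = FALSE keeps the nested list structure, so the path
# below mirrors the JSON layout one-to-one.
parsed <- fromJSON(sample_json, simplifyVector = FALSE)
txt <- parsed$results$transcripts[[1]]$transcript
print(txt)
```

With a real file, passing the file path to fromJSON instead of the string should work the same way.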

The transcript, printed above, is the basis of the further analysis. To run the sentiment analysis, I applied the detect_sentiment function of the AWS Comprehend package to the saved transcript.
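The call itself is short. The sketch below assumes the aws.comprehend package is installed and that AWS credentials are already available in the session; the wrapper name analyse_sentiment is only for illustration, and the guard keeps the example from contacting AWS when no credentials are set.

```r
# install.packages("aws.comprehend")  # one-time installation

# Illustrative wrapper around the package's detect_sentiment function.
analyse_sentiment <- function(text) {
  aws.comprehend::detect_sentiment(text, language = "en")
}

# Only call the service when credentials and the package are present.
if (nzchar(Sys.getenv("AWS_ACCESS_KEY_ID")) &&
    requireNamespace("aws.comprehend", quietly = TRUE)) {
  print(analyse_sentiment("This is all wrong."))
}
```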

The result of the sentiment analysis shows that Greta Thunberg’s speech is mainly negative (73.78%), with the remainder split between neutral (12.63%), positive (7.45%) and mixed (6.13%). Amazon Comprehend reports these values as confidence scores for each sentiment label.

Result of the sentiment analysis (automatically generated text)

However, the analysis was made on the transcript generated by Amazon Transcribe. I compared the automatically generated text with the original speech in the video and noticed some mistakes. For example, the automatically generated transcript says “The world had 420 gigatons off CEO to left to emit”, which is supposed to be “the world had 420 gigatons of CO2 left to emit”. I corrected mistakes like this and reran the sentiment analysis on the corrected text. The result is different from the previous one.
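One way to make such corrections repeatable is a small find-and-replace table in base R. This is a sketch, not necessarily how the transcript was corrected for this article; the three pairs are the transcription errors reported here, and a full transcript would need a longer list.

```r
auto_txt <- "The world had 420 gigatons off CEO to left to emit"

# Names are the misrecognized strings, values are the corrections.
fixes <- c(
  "off CEO to left" = "of CO2 left",
  "i p c C"         = "IPCC",
  "2000 and 18"     = "2018"
)

corrected <- auto_txt
for (wrong in names(fixes)) {
  # fixed = TRUE treats the pattern as literal text, not a regular expression
  corrected <- gsub(wrong, fixes[[wrong]], corrected, fixed = TRUE)
}
print(corrected)
```

Literal matching is used so that characters in the transcript are never interpreted as regular expression syntax.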

Result of the sentiment analysis (corrected text)

The sentiment analysis of the corrected speech is less negative (67.8%) and less neutral (9.91%), but more positive (8.36%) and more mixed (13.93%).
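To see the shift at a glance, the two sets of scores reported in this article can be placed side by side. The sketch below only re-enters those published percentages by hand; it does not call AWS.

```r
# Percentages as reported for the two runs of the sentiment analysis.
scores <- data.frame(
  sentiment = c("negative", "neutral", "positive", "mixed"),
  automatic = c(73.78, 12.63, 7.45, 6.13),
  corrected = c(67.80, 9.91, 8.36, 13.93)
)

# Change in percentage points after correcting the transcript.
scores$change <- round(scores$corrected - scores$automatic, 2)
print(scores)
```

The largest movement is in the mixed score, which rises by 7.8 percentage points, while the negative score drops by 5.98.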

The AWS Comprehend package also has a function, detect_entities, which lists the entities that appear in a given text. In the automatically generated transcript of Greta Thunberg’s speech, the entities below are detected.

Detected entities of the automatically generated transcript
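The entity detection is called the same way as the sentiment analysis. As before, this is a sketch that assumes the aws.comprehend package and working credentials; the wrapper name list_entities is only for illustration.

```r
# Illustrative wrapper around the package's detect_entities function.
list_entities <- function(text) {
  aws.comprehend::detect_entities(text, language = "en")
}

# Only call the service when credentials and the package are present.
if (nzchar(Sys.getenv("AWS_ACCESS_KEY_ID")) &&
    requireNamespace("aws.comprehend", quietly = TRUE)) {
  print(list_entities("The world had 420 gigatons of CO2 left to emit"))
}
```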

The entity detection differs slightly when I apply it to the corrected text.

Detected entities of the corrected text

The entities in the automatically generated transcript and in the corrected one do not differ much. However, IPCC is not identified as an organization in the automatically generated transcript, where Amazon Transcribe rendered it as “i p c C”, and this could be a problem in further analysis.

Conclusion

In this article, I demonstrated the use of two AWS Machine Learning tools, Amazon Transcribe and Amazon Comprehend, together with R on Greta Thunberg’s speech at the United Nations conference.

Amazon Transcribe returned the speech in text format almost perfectly; however, the abbreviations were not accurate. For example, “IPCC” became “i p c C” and “CO2” became “CEO” in the transcript. Furthermore, the year 2018 was written as “2000 and 18”, which could also cause confusion in further analysis without a double check.

Amazon Comprehend is able to analyse the positive and negative sentiments in a given text. The corrected text of Greta Thunberg’s speech was scored 67.8% negative, while the automatically generated transcript got a higher negativity value, 73.78%. The tool also detected the entities mentioned in the speech and showed the type of each entity. To sum up, Greta Thunberg’s anger and disappointment were captured well by Amazon Comprehend.

Written by Fanni Kiss

Student of MSc in Business Analytics at Central European University, Budapest, Hungary
