Beyond sentiment analysis: emojization of world leaders' tweets

Sentiment analysis can be used to measure how positively or negatively people express themselves, for example in relation to a given product or brand. It is useful for analyzing and responding to feelings expressed in customer emails, calls, reviews, and more.

For more fine-grained information, services like IBM's Watson Tone Analyzer (accessible through Dcipher Analytics) measure key emotions such as anger, sadness, joy, and fear.

Both sentiment analysis and emotion recognition are available in Dcipher Analytics. But our latest feature, emojization, takes things to the next level by using a model trained on 1.2 billion tweets to interpret the nuanced emotional tones of any English-language text and tag them with the corresponding emojis.

It can be used to analyze the emotional nuances expressed in news reporting, social media posts, political speeches, and customer emails, to mention a few.

In this post we analyze 1490 tweets from the past month by six world leaders (selected for their many followers and the fact that most of their tweets are in English): Donald Trump, Narendra Modi, Justin Trudeau, Boris Johnson, Imran Khan, and Pope Francis.

The emoji clouds below show the emojis that best describe the tweets of three of these leaders. Note that the emojis are not extracted from the tweets – in fact, very few of the tweets contain emojis – but present an interpretation of them.

Can you guess which emoji cloud belongs to which leader? To find out, keep reading.

This post describes, step by step, how to emojize texts in Dcipher Analytics.

1. Import and get familiar with the data

We use Dcipher's social media import function (described in our blog post on social media mining) to import the last month's tweets from the six leaders' official Twitter accounts. This yields 1490 tweets.

Once the posts have been imported, we use two Dcipher workbenches, the Table View and the Bubble View, to display and get an overview of the data. We can see, for example, that Donald Trump is by far the most active tweeter among the six.

A short video showing what this looks like in Dcipher:

2. Clean the texts

It's usually a good idea to clean up text data before running sentiment analysis or emojization. URLs and @ tags, for example, can increase the noise in the data and don't contribute anything to the analysis.

We first need to identify the field containing the tweets, which we do by examining the Content field in the Table View. This tells us that the Message field is the one we're interested in and the one that needs to be cleaned up.

Dcipher's preprocessing wizard guides us through the process of cleaning up the data. We ask Dcipher to remove URLs and @ tags, as well as the small number of non-English posts in the dataset. (To discover the full functionality of the preprocessing wizard, see this blog post.)
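For readers who want to see what this kind of cleaning amounts to outside the tool, here is a minimal Python sketch. The regular expressions and the clean_message function are our own illustrative assumptions, not Dcipher's actual cleaning rules, and language filtering is left out.

```python
import re

# Assumed patterns for illustration only; not Dcipher's actual rules.
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
MENTION_PATTERN = re.compile(r"@\w+")
WHITESPACE_PATTERN = re.compile(r"\s+")


def clean_message(message: str) -> str:
    """Strip URLs and @ tags, then collapse the leftover whitespace."""
    text = URL_PATTERN.sub(" ", message)
    text = MENTION_PATTERN.sub(" ", text)
    return WHITESPACE_PATTERN.sub(" ", text).strip()


# Made-up example tweet, not taken from the dataset.
example = "Thank you @JustinTrudeau for the call! https://t.co/abc123"
cleaned_message = clean_message(example)
print(cleaned_message)
```

Replacing matches with a space rather than an empty string keeps the words on either side of a removed URL or tag from running together.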

Here is what it looks like in Dcipher:

3. Run the emojization operation

The data has now been cleaned, so it's time to move to the key part of the analysis: applying emojization on the tweets.

This is done by selecting the cleaned_message field, clicking the Operations button at the top of the workspace, and selecting the Emojize operation under NLP & Text Analytics. The operation takes about half a minute to run on the 1400 tweets.

This video shows how it's done and what the aggregated emoji cloud looks like:

4. Find overrepresented emojis

We now have an idea of which emojis best represent the tweets overall: "thumbs up" (in 888 of the tweets), "rage" (704 tweets), "angry" (655 tweets), "pray" (542 tweets), and "clap" (432 tweets).
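The per-emoji figures above are counts of tweets tagged with each emoji. As a rough sketch of that aggregation, assuming the emojization output is available as one list of emoji names per tweet (the tweet_emojis data below is made up, and count_tweets_per_emoji is a hypothetical helper, not a Dcipher function):

```python
from collections import Counter

# Hypothetical emojization output: the emojis assigned to each tweet.
tweet_emojis = [
    ["thumbs up", "clap"],
    ["rage", "angry"],
    ["thumbs up", "pray"],
    ["rage", "angry", "thumbs up"],
]


def count_tweets_per_emoji(emoji_lists):
    """Count, for each emoji, the number of tweets it was assigned to."""
    counts = Counter()
    for emojis in emoji_lists:
        # set() makes sure a tweet is counted at most once per emoji.
        counts.update(set(emojis))
    return counts


counts = count_tweets_per_emoji(tweet_emojis)
for emoji, n_tweets in counts.most_common():
    print(emoji, n_tweets)
```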

But what we really would like to know is how the different leaders differ in their communication and the emotions they express. This is where Dcipher's Find overrepresented values operation comes in handy. Here is how it works:

First, make sure the emojis are displayed in the Bubble View.

Then, make sure the relevant tweets are displayed in the Table View. In this case we're going to analyze tweets by Donald Trump, so we apply a filter to the author's name column in the Table View – similar to how it would be done in Excel or Google Sheets.

Now that the right values (emojis) are displayed in the Bubble View and the right documents (tweets by Donald Trump) in the Table View, click the column header in the Table View to select all tweets and drag them to the "Find overrepresented values" drop zone in the Bubble View.

The outcome is an emoji cloud, where the size of each emoji shows how overrepresented it is in tweets by Donald Trump compared to the entire set of tweets. In other words, the biggest emojis are those that best describe Donald Trump's tweets, in comparison to those by the other leaders.
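This post does not spell out the exact measure behind the operation. One simple way to think about overrepresentation is as a ratio: the share of the selected tweets that carry an emoji, divided by the share of all tweets that carry it. The sketch below uses that ratio as an assumption, with made-up data and hypothetical function names; a value above 1 means the emoji is more common in the selection than overall.

```python
from collections import Counter


def emoji_shares(emoji_lists):
    """Share of tweets (between 0 and 1) in which each emoji appears."""
    counts = Counter()
    for emojis in emoji_lists:
        counts.update(set(emojis))
    total = len(emoji_lists)
    return {emoji: n / total for emoji, n in counts.items()}


def overrepresentation(subset, everything):
    """Ratio of an emoji's share in the subset to its share overall.

    Assumes `subset` is a non-empty selection drawn from `everything`,
    so every emoji in the subset also has an overall share.
    """
    subset_shares = emoji_shares(subset)
    overall_shares = emoji_shares(everything)
    return {e: subset_shares[e] / overall_shares[e] for e in subset_shares}


# Made-up (author, emojis) pairs for illustration.
tweets = [
    ("Leader A", ["rage", "thumbs up"]),
    ("Leader A", ["rage", "angry"]),
    ("Leader B", ["pray", "thumbs up"]),
    ("Leader B", ["pray", "heart"]),
]

all_lists = [emojis for _, emojis in tweets]
leader_a_lists = [emojis for author, emojis in tweets if author == "Leader A"]

scores = overrepresentation(leader_a_lists, all_lists)
for emoji, ratio in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(emoji, ratio)
```

In this toy data "rage" appears in all of Leader A's tweets but only half of all tweets, so its ratio is 2, while "thumbs up" is equally common in both and gets a ratio of 1.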

See what the process looks like in Dcipher:

Let's do the same for Pope Francis – but this time we're interested both in the emojis and the words and phrases that characterize his tweets. We get words and phrases by applying the Tokenization operation. (We won't go into the details of how it works here, but for more information we recommend this blog post.)

Finding overrepresented emojis and words for Pope Francis:

There appears to be little overlap between the two leaders' styles of communication.

5. Map emojis and explore their corresponding texts

Since the relationships between texts and emojis have emerged from the actual use of emojis in a large training set of 1.2 billion tweets (rather than through someone defining what they mean), it is not always easy to understand the precise meaning of an emoji. Dcipher offers several ways of making sense of the emojis and exploring the texts they describe.

The emoji cloud in the Bubble View can be turned into a network, where links represent co-occurrence. In other words, emojis that tend to describe the same tweets are linked together in the network. This brings out the underlying emotional structure of the tweets. We can see that there are three main clusters of emojis: one upset and angry; one cheering and projecting strength; and one expressing love and gratitude.
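To make the idea of co-occurrence links concrete, here is a small sketch that counts how often two emojis describe the same tweet. Each pair with a non-zero count would be a link in the network, weighted by that count. The data is made up, cooccurrence_edges is a hypothetical helper, and Dcipher's own weighting and layout may well differ.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(emoji_lists):
    """Count the tweets in which each pair of emojis appears together."""
    edges = Counter()
    for emojis in emoji_lists:
        # Sorting gives every pair one canonical order, so that
        # ("angry", "rage") and ("rage", "angry") are the same link.
        for pair in combinations(sorted(set(emojis)), 2):
            edges[pair] += 1
    return edges


# Made-up emojization output for illustration.
tweet_emojis = [
    ["rage", "angry"],
    ["rage", "angry", "thumbs up"],
    ["pray", "heart"],
    ["clap", "thumbs up"],
]

edges = cooccurrence_edges(tweet_emojis)
for (emoji_a, emoji_b), weight in edges.most_common():
    print(emoji_a, emoji_b, weight)
```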

By selecting emojis from the latter cluster and dragging them to the Table View where the tweets are displayed, we can score the tweets based on the emojis, so that the tweets described by the selected emojis are shown.
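How Dcipher computes these scores is not described here. As a rough illustration of the idea, the sketch below assumes each tweet comes with a score per emoji and simply sums the scores of the selected emojis, then ranks the tweets by that sum. The data, the score values, and the score_tweets function are all hypothetical.

```python
def score_tweets(tweet_emoji_scores, selected_emojis):
    """Rank tweets by the summed scores of the selected emojis."""
    scored = []
    for tweet, emoji_scores in tweet_emoji_scores.items():
        score = sum(
            value for emoji, value in emoji_scores.items()
            if emoji in selected_emojis
        )
        scored.append((tweet, score))
    # Highest-scoring tweets first.
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Made-up tweet-emoji scores for illustration.
tweet_emoji_scores = {
    "tweet 1": {"pray": 0.5, "heart": 0.25},
    "tweet 2": {"rage": 0.5, "angry": 0.25},
    "tweet 3": {"heart": 0.5, "clap": 0.25},
}

ranking = score_tweets(tweet_emoji_scores, {"pray", "heart"})
for tweet, score in ranking:
    print(tweet, score)
```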

Rather than having to read through all the tweets to understand what they are about, we use the second Bubble View to display the words that are overrepresented in the tweets. They include "support", "pray", "together", and other words that hint at the meaning being expressed.

Exploring emojis and their meaning:

We can also list all the tweet-emoji combinations and their scores:

How do the leaders differ in their communication?

As you might have guessed, the emoji clouds shown at the beginning of this post describe the tweets of Boris Johnson, Donald Trump, and Pope Francis. The accompanying word clouds convey the issues that each leader is addressing through their tweets. Johnson is focused on a message of defeating the virus; Trump is rallying his base; and the Pope is offering a spiritual message in a time of crisis.

Get started!

To try out emojization and the rest of Dcipher Analytics' comprehensive text analytics toolbox, sign up for a free trial. For more ideas on how to gain value through text analytics in Dcipher Analytics, check out our other blog posts. If you have questions, want to discuss your use case, or get a guided tour, contact us.