How to analyze open-ended survey responses

Traditionally, analysis of free-form text data from surveys has required coding, where researchers read through the answers and manually code them with fixed or emergent categories. To ensure accuracy and consistency, each response needs to be coded by at least two researchers. If the survey is conducted in multiple languages, native speakers of each language need to do the coding, and they need to coordinate with each other to make sure they interpret answers consistently.

In short, coding is a time-consuming and tedious task with a strong dependence on the skills of individual researchers, making a high degree of quality and consistency difficult to maintain. This is why open-ended questions are often avoided in survey design.

The algorithmic approach to analyzing free-form text data is an attractive alternative because it is fast, automated, and consistent. The quality of the insights depends, however, on the approach used. Below we outline five text analytics approaches that can help make sense of open-ended survey responses.

1. Content analysis and word clouds

The simplest and most common way of analyzing text data in surveys is through content analysis. The idea is simple: split sentences into words and count the frequency of each word. The words and their frequencies are commonly visualized in the form of a word cloud, where the size of each word represents the frequency of its use. While useful for giving an overview of the aggregate use of words, this method is too simplistic to replace human coding. Individual words are simply unable to capture the meaning and nuances that a human researcher can build into a code.
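In code, this frequency counting takes only a few lines; the sample responses below are purely illustrative:

```python
from collections import Counter

responses = [
    "The service was great and the staff was friendly",
    "Great staff and great food",
    "The food was cold",
]

# Split each response into lowercase words and count overall frequency.
words = [word for response in responses for word in response.lower().split()]
frequencies = Counter(words)

# The most frequent words would be drawn largest in a word cloud.
print(frequencies.most_common(3))
```

A word cloud generator then maps each count to a font size, which is why the method surfaces only aggregate vocabulary, not meaning.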

2. Augmentation through natural language processing

A few simple preprocessing steps can be used to augment the content analysis approach. Natural language processing (NLP) is a branch of artificial intelligence that helps computers interpret and process human language. Simple NLP operations for getting more insight from free-form text data include:

  • Tokenization is about chopping a text into meaningful pieces rather than simply splitting on white space. Tokens are not necessarily the same things as words. For example, “New York” and “mocha latte” are more meaningful semantic units than “New”, “York”, “mocha”, and “latte” individually. In languages like Chinese, which do not use word spacing, tokenization is essential for identifying the beginnings and endings of words. Tokenization is not done by applying simple rules; it requires an underlying model of the language it is applied to, and different models vary in quality.
  • Lemmatization is the act of reducing the inflectional forms of a word to a common base. For example, “running” and “ran” both give the lemma “run”. Without lemmatization, word clouds are likely to contain various forms of the same word, making the importance of each word difficult to assess. The technique uses linguistic information, taking into account the morphological analysis of each word and using dictionaries to connect a word to its lemma.
  • Part-of-speech tagging assigns each word a category according to its syntactic function. “Drink”, for example, can be tagged as a noun or a verb depending on the context in which it appears. This is useful in free-form text analysis because not every part of speech is equally relevant. Nouns and adjectives, for example, tend to contain more useful information than prepositions and conjunctions. Therefore, filtering out all words tagged with less important parts of speech is a useful preprocessing step to content analysis.
  • Text cleaning can also be a useful measure and may take many different forms. For example, some respondents may have written their answers unintelligibly or in a different language. Such issues can be resolved through a combination of preprocessing, for example using automated language detection and translation to make sure all text is in the same language, and post-processing, such as removing stray words if they are frequent enough to appear in the word cloud.

After these preprocessing steps, the analyst follows the same process as in simple content analysis. Rather than counting the frequency of the words in the responses, however, we suggest counting the number of respondents using each word. This makes the result less sensitive to individual respondents writing long responses where they use certain words a lot. Counting respondents makes each respondent count equally.
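A minimal sketch of such a pipeline, using a toy lemma table and stopword list as stand-ins for what a real NLP library would supply:

```python
from collections import Counter

# Toy stand-ins for real NLP components (a library such as spaCy or NLTK
# would supply these); the lemma table and stopword list are illustrative.
LEMMAS = {"running": "run", "ran": "run", "runs": "run", "buses": "bus"}
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "for", "i", "was"}

def preprocess(text):
    tokens = text.lower().split()                      # naive tokenization
    tokens = [LEMMAS.get(t, t) for t in tokens]        # lemmatization
    return [t for t in tokens if t not in STOPWORDS]   # filtering

responses = [
    "I ran to the bus",
    "running for buses",
    "the bus was late",
]

# Count respondents per lemma, not raw occurrences: each respondent
# contributes at most once to a word's count.
respondent_counts = Counter()
for response in responses:
    respondent_counts.update(set(preprocess(response)))

print(respondent_counts)
```

Using `set()` per response is what makes the count respondent-based rather than occurrence-based.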

While making the overall picture clearer, this approach still does not code individual responses and does not convey the contexts in which words are used.

3. Relational word clouds

A traditional word cloud is simply a visual representation of a list of words and their frequencies. Most of the meaning that words convey, however, is through their use in combination with other words. Traditional word clouds, therefore, leave us to speculate about how and why words were used. To address this, relational word clouds take into account not only words’ frequency of use, but also their contexts. By measuring how often different words are used together in the survey responses, we can construct a network where words are nodes and co-occurrence is represented by links. The result is a word cloud where not only the size of words matters, but their positions as well. Two words that are located close to each other in the network are often used together in the responses.

There are several ways of measuring the closeness of words:

  • First-order similarity measures look at whether words have been used together. Examples of first-order similarity include Cosine similarity and Co-occurrence similarity.
  • Higher-order similarity measures assess the degree to which words have been used in similar contexts. CKC similarity and Doc2Vec similarity are examples of higher-order similarity measures. We prefer them over first-order measures because whether two words are contextually similar tends to be more important than whether they have been used in the same context.

Word clusters in the relational word cloud represent topics and themes. The relational word cloud therefore provides a more multi-dimensional and nuanced picture of the free-text responses than the simple word cloud. It helps identify the concepts that could be used for tagging individual responses. To understand a concept in depth, the analyst can drill down by reading the individual responses that use the tokens representing the concept.
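The co-occurrence counting that underlies such a network can be sketched as follows; the responses are illustrative, and a real implementation would first apply the preprocessing steps described earlier:

```python
from collections import Counter
from itertools import combinations

responses = [
    "long waiting time at checkout",
    "waiting time was too long",
    "friendly staff at checkout",
]

# Count how often each pair of words co-occurs within a response.
# Each pair becomes a weighted link in the word network.
cooccurrence = Counter()
for response in responses:
    words = sorted(set(response.lower().split()))
    cooccurrence.update(combinations(words, 2))

# Words joined by heavy links would sit close together in the layout.
print(cooccurrence[("long", "waiting")])
```

A force-directed graph layout over these weighted links is what places frequently co-used words near each other in the relational word cloud.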

The relational word cloud is a visual way of identifying topics and, by extension, concepts in the responses. There are other approaches too, such as topic modelling, a technique for discovering abstract topics used in a collection of documents. The benefit of the visual approach, and the reason we favor it if we have to choose, is that it makes it fast and easy for the analyst to iterate. When first generated, a relational word cloud commonly contains themes that are not relevant, and sometimes these themes drown out more relevant ones. To address this, the analyst can remove responses containing the irrelevant themes until a clear structure of topics and themes emerges.

4. Tagging of responses

Once the concepts have been identified using a relational word cloud or topic modelling approach, the analyst can tag the responses related to each concept. There are different ways of carrying out the scoring:

  • The bag-of-words approach, where a response gets a high score if it contains the keywords that represent a topic.
  • The AI-based word embedding approach, where neural networks are used to determine if a response is relevant even if it does not contain the exact words that represent the topic. This is our suggested approach, although it is quite a bit slower than the statistical bag-of-words approach.

Rather than a “yes” or “no”, the scoring provides a scale for the analyst to use for tagging decisions. Usually responses above a certain score get the tag in question, but what the cutoff should be needs to be determined on a case-by-case basis.
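A minimal sketch of bag-of-words scoring with such a cutoff; the topic keywords and the cutoff of 2 are illustrative, and an embedding model could replace the simple overlap score:

```python
# Toy bag-of-words scorer: a response's score for a topic is the number
# of topic keywords it contains; responses at or above the cutoff get
# the tag. The keyword set below is a hypothetical "delivery" topic.
TOPIC_KEYWORDS = {"delivery": {"late", "shipping", "package", "delivery"}}

def score(response, topic):
    words = set(response.lower().split())
    return len(words & TOPIC_KEYWORDS[topic])

def tag(response, topic, cutoff=2):
    return score(response, topic) >= cutoff

print(tag("the package arrived late", "delivery"))   # scores 2 -> tagged
print(tag("friendly support staff", "delivery"))     # scores 0 -> not tagged
```

An embedding-based scorer would replace `score` with a similarity between the response vector and the topic vector, while the cutoff logic stays the same.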

Once the responses are tagged, the free-form text data can be considered structured. This means traditional methods can be applied to analyze and visualize the data. It may, for example, be relevant to cross tabulate the tags against other structured data, such as background variables collected through the same survey.
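The cross tabulation itself is straightforward once responses are tagged; the age-group variable here is hypothetical survey data:

```python
from collections import Counter

# Each record pairs a background variable (age group) with a tag
# assigned to that respondent's answer; the data is illustrative.
tagged = [
    ("18-29", "price"),
    ("18-29", "quality"),
    ("30-49", "price"),
    ("30-49", "price"),
]

# Cross tabulate: count responses in each (age group, tag) cell.
crosstab = Counter(tagged)
print(crosstab[("30-49", "price")])
```

The same cell counts could equally be produced with a spreadsheet pivot table or pandas once the tags exist as a structured column.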

5. Further text enrichment using NLP

In order to further enrich the data, the analyst can apply additional NLP steps. Popular methods include:

  • Sentiment analysis gives a response a score depending on how positive or negative it is.
  • Emotion analysis tags a response into one or more of a number of emotions, such as joy, anger, and frustration.
  • Categorization labels a response by category, such as various areas of technology, society, and so on.
  • Entity extraction identifies entities mentioned in the response, such as names of organizations or individuals.

The resulting information is structured and can be used in the same way as other variables in the dataset.
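As an illustration of the first of these methods, a toy lexicon-based sentiment scorer; real tools use far larger lexicons or trained models, and the word lists here are illustrative:

```python
# Toy sentiment lexicon: each positive word adds 1 to a response's
# score, each negative word subtracts 1.
POSITIVE = {"great", "friendly", "helpful", "fast"}
NEGATIVE = {"slow", "rude", "broken", "late"}

def sentiment_score(response):
    words = response.lower().split()
    return sum((w in POSITIVE) - (w in NEGATIVE) for w in words)

print(sentiment_score("great and friendly staff"))  # 2
print(sentiment_score("slow and rude service"))     # -2
```

Emotion analysis, categorization, and entity extraction follow the same pattern of adding one more structured column per response.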



Case study: Sweden’s image abroad

The Swedish Institute wanted to understand how Sweden’s handling of the European migrant crisis of 2015-2016 was viewed in six countries. The analysis was carried out by our value-added partner, Kairos Future, using Dcipher Analytics.

The Swedish Institute and Kairos Future opted for a survey with open-ended questions as a way to identify themes that were not already known and to avoid influencing the respondents. The questions asked respondents to describe their view of Sweden and whether that view had changed in the last year, before priming respondents by mentioning the migrant crisis.

The survey was conducted in six different countries, none of them sharing the same language. Rather than translating responses into English, which would have risked removing important nuances in the process, the analysis was carried out in six different languages.

The natural language processing techniques outlined in this post were used to identify a total of six broad themes about Sweden’s handling of the migrant crisis. The themes ranged from very negative (“loss of control due to naivety”) to very positive (“a humanitarian role model”). Within each theme, the analysis revealed a number of more specific topics and nuances. Once the responses had been tagged with these topics and themes, the different languages were no longer an issue in the analysis (although the subsequent qualitative analysis still required researchers who mastered each language).

Researchers drilled down into each of the six themes by reading individual responses representing each theme. This provided a rich, qualitative understanding of each theme.

The analysis proceeded by cross tabulating the themes against various other variables, such as place of residence, age, and income range. The result was insights into the who, to complement the what, of the discovered themes. A country-level analysis showed how the image varied across the countries surveyed.

Overall, the study informed the Swedish Institute’s decisions on how and where to act in response to the image change that the country had undergone. It would not have been feasible without the free-form text analytics toolbox of Dcipher Analytics.

If you want to try out the methods described in this post yourself, sign up for a free trial. In-app video tutorials will guide you through the process step by step.