How AI powers the next generation of information search

Artificial intelligence (AI) is causing two major shifts in information search. First, it enables a transition from manual, criteria-based search to cognitive, adaptive search. Second, it replaces statistical, hypothesis-based approaches with open-ended techniques powered by deep learning. This post outlines how anyone, with the right tool, can start searching for information using the AI-powered approach.

From static search criteria to machine learning

In the information collection step, researchers need to find and gather relevant information. They search through search engines and monitoring services, and in databases and reports. Because sifting through the collected data is time-consuming, they tend to rely on entries that rank high in the search interfaces they use, whether based on page rank, citations, likes, or other measures of importance. While time-efficient, this approach misses information beyond the highest-ranked entries. Whether looking at investment deals, patents, news articles, or online search results, important information is often found in the long tail of entries that are less highly valued, cited, referenced, and so on. This is particularly the case when searching for information that has not yet been picked up by everyone else.

In the traditional process of gathering information, it is crucial to define the search space as accurately as possible in order to capture most of what is relevant, while eliminating most of what is not. This is typically done through keywords or search criteria. In patent databases, for example, the researcher would build search criteria using parameters such as patent classifications, application and publication time intervals, keywords used in abstracts and claims, country of origin, and so on. Boolean operators, such as “and” and “or”, can be used to narrow down the search space by creating more complex criteria.

The first set of criteria is a rough approximation of the relevant search space. By inspecting the results returned by the search, the researcher then refines the criteria step by step, until they are satisfactory. The problem is that with all the ambiguity and complexity involved, even the most elaborate criteria are likely to do a poor job of defining a relevant search space. Language enables a single thing to be expressed in a multitude of ways, and a single expression can sometimes have many different meanings. Even if the researcher manages to create perfect search criteria, they are likely to quickly become irrelevant as the area (technology landscape, investment flows, news reporting, consumer discussions, etc.) changes.
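As a rough illustration of the criteria-based approach described above, the following Python sketch filters a handful of invented patent records with Boolean keyword, classification, and date conditions. The records, field names, and the specific criteria are assumptions made up for this example and do not come from any real patent database.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Patent:
    title: str
    abstract: str
    classification: str
    published: date


def matches(p: Patent) -> bool:
    """Static, hand-written criteria combined with Boolean operators."""
    text = p.abstract.lower()
    # (battery OR "energy storage") AND lithium
    keyword_hit = ("battery" in text or "energy storage" in text) and "lithium" in text
    # Restrict to one (illustrative) classification prefix
    in_class = p.classification.startswith("H01M")
    # Restrict to a publication time interval
    in_window = date(2015, 1, 1) <= p.published <= date(2020, 12, 31)
    return keyword_hit and in_class and in_window


# Invented sample records, for illustration only
patents = [
    Patent("Lithium battery electrode",
           "A lithium battery with improved electrode coating.",
           "H01M 4/13", date(2018, 5, 1)),
    Patent("Rechargeable Li-ion cell",
           "A rechargeable Li-ion cell with a silicon anode.",
           "H01M 10/0525", date(2019, 3, 12)),
    Patent("Lithium extraction process",
           "Extraction of lithium from brine for battery-grade material.",
           "C22B 26/12", date(2017, 9, 30)),
]

hits = [p.title for p in patents if matches(p)]
missed = [p.title for p in patents if not matches(p)]
print("hits:", hits)
print("missed:", missed)
```

In this toy data set, the second record is arguably relevant but is missed because its abstract says "Li-ion cell" rather than "lithium" and "battery". This is the kind of gap that pushes the researcher into repeated manual refinement.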

AI offers a solution to this problem. Instead of relying on static, rule-based criteria, deep neural networks can learn what distinguishes a relevant entry from an irrelevant one. Rather than depending on rules and individual keywords, they are able to identify entries that are contextually similar to entries known to be relevant, even if they do not share a single word. For example, a search for “environmental degradation” could yield entries about “ecosystem destruction” because the deep neural networks used have read huge volumes of text and have learnt that the two phrases express similar meanings.
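The idea of matching by meaning rather than by shared words can be sketched with cosine similarity between vectors. In the sketch below, the three-dimensional vectors are hand-written stand-ins for the embeddings a trained neural network would produce. They are assumptions chosen for illustration, not the output of any real model.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def shared_words(a, b):
    """Words that two phrases have in common."""
    return set(a.split()) & set(b.split())


# Hypothetical embeddings: a real system would obtain these from a
# trained language model rather than writing them by hand.
embeddings = {
    "environmental degradation": [0.9, 0.1, 0.2],
    "ecosystem destruction": [0.8, 0.2, 0.3],
    "quarterly earnings report": [0.1, 0.9, 0.1],
}

query = "environmental degradation"
ranked = sorted(
    (phrase for phrase in embeddings if phrase != query),
    key=lambda phrase: cosine(embeddings[query], embeddings[phrase]),
    reverse=True,
)

best = ranked[0]
print(best, round(cosine(embeddings[query], embeddings[best]), 3))
print("shared words:", shared_words(query, best))
```

With these made-up vectors, the closest phrase to the query shares no words with it, which a keyword-based search could not have found.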

The AI-based alternative to criteria-based research also has the advantage of being adaptive. Based on feedback from users about the relevance of the extracted information, it adjusts its parameters to incorporate the new information. Continuous improvement, therefore, is as simple as giving thumbs up or down. No need for time-consuming manual tuning of the search criteria.
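One simple way to picture such a feedback loop is an online logistic regression update, where each thumbs up or down nudges the weights of a relevance scorer. This is a minimal sketch under that assumption. The feature vectors, learning rate, and function names are hypothetical and are not meant to describe how any particular product implements feedback.

```python
import math


def score(weights, features):
    """Relevance score between 0 and 1 (logistic function of a weighted sum)."""
    z = sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def update(weights, features, thumbs_up, lr=0.5):
    """Return new weights after one piece of thumbs up/down feedback."""
    target = 1.0 if thumbs_up else 0.0
    error = target - score(weights, features)
    return [w + lr * error * x for w, x in zip(weights, features)]


# Two hypothetical entries described by two made-up features
on_topic = [1.0, 0.0]
off_topic = [0.0, 1.0]

weights = [0.0, 0.0]
before_on = score(weights, on_topic)
before_off = score(weights, off_topic)

# Users repeatedly approve the on-topic entry and reject the off-topic one
for _ in range(5):
    weights = update(weights, on_topic, thumbs_up=True)
    weights = update(weights, off_topic, thumbs_up=False)

after_on = score(weights, on_topic)
after_off = score(weights, off_topic)
print(round(before_on, 2), "->", round(after_on, 2))
print(round(before_off, 2), "->", round(after_off, 2))
```

Each round of feedback moves the score of the approved entry up and the rejected one down, with no manual editing of search criteria.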

The trendspotting app Co:tunity adopts this approach by using Dcipher Analytics as an improvement loop for its automated trendspotting. It collects news articles from a wide range of sources and sorts them under related trends. To make sure articles are categorized correctly and the most relevant articles are presented, Dcipher receives feedback from users and uses it to adjust its categorization and scoring model.

From hypothesis-driven to open-ended

The information search process is inherently iterative and, therefore, highly path dependent. Faced with a large amount of information, the researcher relies on preexisting knowledge and assumptions to narrow down the scope. In the case of analyzing an innovation ecosystem, we might for example limit the scope to analyzing the activities of a few known key players, or a few geographical hotspots. The findings from this initial search, as well as later steps, feed back into the process, informing each subsequent step. The result is a search path that may or may not have covered everything of importance; upon concluding the research, it is difficult for the researcher to know which.

The alternative approach is data-driven, adopts a bottom-up perspective, and results in open-ended and often unexpected outcomes. It starts by defining broad search criteria. The purpose is not to capture what is relevant and eliminate what is not as accurately as possible. Instead, the objective is to maximize the amount of relevant information collected, even if this means diluting it with irrelevant information. This is feasible thanks to the second step, which clusters entries that express similar meanings. This results in the emergence of a map, in which islands and continents of meaning are formed. The researcher examines each part of the map, tagging relevant content and removing the rest from the analysis.
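The clustering step can be illustrated with a deliberately simple greedy scheme: each entry joins the first cluster whose seed entry is similar enough, and otherwise starts a new cluster. This is a sketch for intuition only, not the algorithm used by any specific tool. The entries, their two-dimensional vectors, and the threshold are invented for the example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def cluster(items, threshold=0.8):
    """Greedy clustering: compare each entry with the first member of each cluster."""
    clusters = []  # each cluster is a list of (label, vector) pairs
    for label, vec in items:
        for members in clusters:
            seed_vec = members[0][1]
            if cosine(seed_vec, vec) >= threshold:
                members.append((label, vec))
                break
        else:
            # No existing cluster is similar enough, so start a new "island"
            clusters.append([(label, vec)])
    return [[label for label, _ in members] for members in clusters]


# Hypothetical entries from a broad search, with made-up meaning vectors
entries = [
    ("solid-state battery patent", [0.9, 0.1]),
    ("wind turbine financing", [0.1, 0.9]),
    ("lithium cell recycling", [0.8, 0.3]),
    ("offshore wind investment", [0.2, 0.8]),
]

islands = cluster(entries)
for i, labels in enumerate(islands, start=1):
    print(f"island {i}: {labels}")
```

The researcher would then inspect each island as a whole, keeping the ones that are relevant and discarding the rest, instead of judging entries one by one.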

With this approach, the role of the researcher becomes to interpret the meaning of patterns that emerge from the data. Compared to the traditional process, it is faster, less dependent on the individual researcher or analyst, and more likely to reveal surprising findings. It also helps the researcher to switch between overview and depth, so that both the forest and the trees can be seen.

How organizations are using the new information search paradigm

Future Lab Shanghai leverages AI-powered information search through Dcipher Analytics. The Future Lab is a physical research space with touch screens that visualize trending news articles and current online discussions among consumers. Researchers use the touch screens to zoom into relevant clusters of articles and posts to quickly get an understanding of emerging themes and the business opportunities they indicate.

If you want to see what AI-powered information search looks like in action, and try it out yourself, sign up for a free trial account of Dcipher Analytics and check out our video tutorials.