The Three Pillars of Real-Time Search

With the recent announcement that both Bing and Google will be including real-time results from Twitter in their Web search results, it is increasingly evident that the need to track and monitor the current conversation on the Web is real. Now that the big players are joining the myriad of startups competing in this arena, it is worth analyzing what factors will determine the eventual victor: what makes for “good” real-time search?

Google revolutionized keyword search not by doing a better job finding keywords in documents, but by developing techniques for separating meaningful documents from low-quality and spammy ones. Similarly, the quality of real-time search will not be determined simply by how recent the results are, but instead by how well the results are filtered. The winner in the real-time search space will be the player who can deliver in three key areas:

Trends, not recency. If hundreds of people are tweeting about a topic, I’m less interested in what the last 10 people had to say than in the larger picture: what is the trend of conversation over the last day? The last hour? What are the best comments made on a topic? How have opinions changed on this topic recently? Simply listing the most recent comments is an inadequate mechanism for answering these questions. More value can be had by identifying and presenting emerging trends than by being up to the second.

This doesn’t mean that recency isn’t important – being able to identify trends quickly is still critical, especially if it can be done before the trend has peaked.
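To make the distinction concrete, here is a minimal sketch of one way to score a trend: compare how often a topic was mentioned in the last hour with its average hourly rate over the preceding day. The function name, the window sizes, and the add-one smoothing are all assumptions chosen for illustration, not a description of how any existing search engine works.

```python
from datetime import datetime, timedelta

ONE_HOUR = timedelta(hours=1)

def trend_score(timestamps, now, window=ONE_HOUR, baseline=timedelta(hours=24)):
    """Ratio of the recent hourly mention rate to the earlier hourly rate.

    `timestamps` holds the times at which a topic was mentioned.
    Hypothetical helper: the windows and smoothing are illustrative choices.
    """
    recent_start = now - window
    baseline_start = recent_start - baseline

    # Mentions inside the recent window, and inside the baseline before it.
    recent = sum(1 for t in timestamps if recent_start <= t <= now)
    earlier = sum(1 for t in timestamps if baseline_start <= t < recent_start)

    recent_rate = recent / (window / ONE_HOUR)
    # Add-one smoothing so a topic with no history does not divide by zero.
    earlier_rate = (earlier + 1) / (baseline / ONE_HOUR)
    return recent_rate / earlier_rate
```

A score near 1 suggests steady chatter, while a score well above 1 suggests a topic that is picking up. Shrinking the window catches a trend sooner, at the cost of more noise.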

What is relevant? Current attempts at presenting real-time results focus mainly on finding tweets that contain a specific keyword. This can miss relevant information: a user searching for “vancouver olympics”, for example, might be expecting to find information about the recent Cowichan sweater controversy, but the search would likely fail because the relevant tweets wouldn’t necessarily contain those keywords. This is a problem with traditional search as well, but it is exacerbated in the real-time context because each message is so short. To combat this problem, real-time relevance needs to be much more about the people involved than the specific keywords they use.
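One way to picture people-based relevance is to blend a tweet's own keyword match with how often its author has written about the query topic before. The sketch below is only that, a sketch: the function names, the whitespace tokenization, and the 0.7 author weight are assumptions made up for illustration.

```python
def author_affinity(history, query_terms):
    """Fraction of an author's past tweets that mention any query term."""
    if not history:
        return 0.0
    hits = sum(1 for text in history if query_terms & set(text.lower().split()))
    return hits / len(history)

def relevance(tweet_text, history, query_terms, author_weight=0.7):
    """Blend keyword overlap with the author's topical track record.

    `query_terms` is a non-empty set of lowercase words. The weight is an
    arbitrary illustrative value, not a tuned parameter.
    """
    words = set(tweet_text.lower().split())  # crude whitespace tokenization
    keyword_score = len(query_terms & words) / len(query_terms)
    affinity = author_affinity(history, query_terms)
    return (1 - author_weight) * keyword_score + author_weight * affinity
```

With this scoring, a tweet about the sweater controversy that never mentions “vancouver” or “olympics” can still rank for that query if its author has been tweeting about the Olympics all week.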

Spam filtering. There are already thousands of Twitter accounts being created every day for the purpose of spamming URLs, and this problem is likely to get even worse once the major search engines start including Twitter results.

Again, the solution here will depend crucially on social factors. Legitimate users can be separated from spammers by looking at patterns in their social activity: who they follow, who follows them, who they retweet, etc. A successful real-time search engine will need to combine these techniques with traditional spam filtering methods.
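As a rough illustration of what a social signal might look like, the sketch below scores an account on three hand-picked features: how lopsided its follow graph is, how many of its tweets carry URLs, and how often other people retweet it. The `Account` fields, the equal weighting, and the features themselves are assumptions for the sake of example; a real system would presumably learn them from data.

```python
from dataclasses import dataclass

@dataclass
class Account:
    # Hypothetical per-account counts; not fields of any real API.
    followers: int
    following: int
    tweets: int
    tweets_with_urls: int
    retweeted_by_others: int

def spam_score(a):
    """Heuristic score in [0, 1]; higher means more spam-like."""
    connections = a.followers + a.following
    # Spam accounts tend to follow many people while few follow them back.
    follow_ratio = a.following / connections if connections else 0.5
    url_fraction = a.tweets_with_urls / a.tweets if a.tweets else 0.0
    # Being retweeted by others is treated as evidence of legitimacy.
    endorsement = min(a.retweeted_by_others / a.tweets, 1.0) if a.tweets else 0.0
    return (follow_ratio + url_fraction + (1.0 - endorsement)) / 3
```

An account that follows thousands of people, is followed by almost no one, and posts nothing but links scores close to 1. An account with a balanced follow graph that gets retweeted regularly scores much lower.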

So far, no one has built a real-time product that delivers on all three fronts, and the true potential of real-time search won’t be realized until that happens.