Are You Using Your Signals? They’re What Search Engines are Looking For.

Paul Marek of 3RING.com, corresponding live from Search Engine Strategies Toronto ’09.

Track: Geek Track
Signals: What Relevancy Indicators Are Search Engineers Watching For Today?

This session was presented by Marios Alexandrou of Acronym Media and Dan Zarrella of HubSpot. It was a bit of a disappointment not to have a representative from any of the search engines on hand to hint at anything concrete, but Marios and Dan gave a good idea of what you should be paying attention to now and in the near and distant future. Here’s a recap of the session.

Marios brought us through the history and potential future of search signals, starting with:

Phase One – basic information retrieval. Search originally got its relevancy signals from basic on-page elements – spiders crawled the web and compiled the index from the pages they were able to reach. The problem was that the results were easy to manipulate with simple adjustment of the text in on-page elements.

Phase Two – Inbound links were used as a major search signal. Again, the problem was manipulation of the SERPs through the creation of spam links, link farms, and link purchasing. Another problem is that only content creators have the ability to “vote” with links, leaving actual users out of the equation.

Phase Three – Universal search. Google is providing results from several information streams rather than just content websites. Content from sites like YouTube, Flickr, Technorati, and Wikipedia is shown alongside the “regular” results.

And now, we’re headed toward a method that’s less likely to be gamed:

Phase Four – Changes in – and tracking of – user behaviour as a ranking signal. One example is universal search results: tracking user behaviour (clickstream data) has allowed the search engines to see where clicks occur in the SERPs, which gives them the opportunity to adjust the results and push more active results toward the top. Another is the browser toolbar. Search engines are able to track the paths and trails of user clicks and pages, allowing them not only to prioritize and re-rank more active content, but also to find content that the spiders may not have discovered. From Yahoo’s VP of Search Research, Andrew Tomkins: “In terms of signals, the toolbar is the big one…”
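To make the clickstream idea a little more concrete, here’s a minimal sketch in Python of how observed click-through rates could be blended with a traditional relevance score to re-rank results. This is purely my own illustration – the function name, the weight, and the numbers are all made up, and nothing like this was shown in the session.

```python
# Hypothetical sketch: blending click-through data into a relevance score.
# The weights and figures below are invented for illustration only; the point
# is simply that more-clicked results get nudged up the SERP.

def blended_score(base_relevance, clicks, impressions, click_weight=0.3):
    """Combine a traditional relevance score with an observed click-through rate."""
    ctr = clicks / impressions if impressions else 0.0
    return (1 - click_weight) * base_relevance + click_weight * ctr

# Each result: (url, base relevance from on-page/link signals, clicks, impressions)
results = [
    ("example.com/a", 0.92, 40, 1000),   # strong on-page score, modest clicks
    ("example.com/b", 0.85, 300, 1000),  # weaker on-page score, heavily clicked
    ("example.com/c", 0.80, 10, 1000),
]

reranked = sorted(
    results,
    key=lambda r: blended_score(r[1], r[2], r[3]),
    reverse=True,
)

for url, base, clicks, impressions in reranked:
    print(url, round(blended_score(base, clicks, impressions), 3))
```

In this toy version the heavily clicked page can leapfrog a page with a stronger on-page score, which is the behaviour Marios described: user activity re-ordering what the spiders originally ranked.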

Marios suggests that even though clickstream data is the new signal, it too will eventually be manipulated, although it will be much more difficult. As Dan Zarrella says, “Manipulation will need to move from gaming technology, to gaming people.” You can download Acronym’s paper “New Signals to Search Engines” from KeywordDriven.com.

Dan takes a more analytical approach, offering his own home-grown algorithm that search engines should pay attention to. From Dan’s talk:

Web search is too slow. It can take hours, days or weeks for content to get indexed, while current social news sites can likewise take hours or days to provide a quality signal. SocNet sites are often too small or niche-oriented to provide a broad quality score. Dan also suggests that the Facebook API (et al.) is too closed to provide real accuracy and is limited by network privacy issues.

Dan’s solution: Use Twitter data for quality signals.

Dan’s own research shows some very interesting numbers that make sense when considering Twitter (and other real-time news and social sites) as a quality signal. Dan suggests that “retweets” are a very effective way of determining quality, even though only 1.4% of tweets are actually retweeted. Considering the absolutely huge number of tweets made each day, or even each hour, that 1.4% is still a huge number. Retweets can be a valuable measure because typically only quality content gets retweeted. Another tool being used by bloggers is automatic tweeting of blog posts; quality posts also tend to get retweeted. Although this too can be gamed, it is much more difficult to do on a mass scale, as you’re now dealing with people vs. technology. The playing field is moving from algorithms to psychology.
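As a rough illustration of how retweet activity might feed a quality score, here’s a short Python sketch. To be clear, this is not Dan’s actual algorithm – the formula, the log weighting, and the sample numbers are my own assumptions, meant only to show the idea that how widely and how quickly a link spreads on Twitter can be turned into a ranking signal.

```python
import math

# Hypothetical retweet-based quality signal (not Dan Zarrella's formula).
# Spread rewards both the number of retweets and the audience reached;
# freshness decays the score as the activity gets older.

def retweet_quality(retweets, followers_reached, tweet_age_hours):
    """Score a URL by how widely and how recently it spreads on Twitter."""
    if retweets == 0:
        return 0.0
    spread = math.log1p(retweets) * math.log1p(followers_reached)
    freshness = 1.0 / (1.0 + tweet_age_hours)  # newer activity counts more
    return spread * freshness

# Sample links: (url, retweets, followers reached, hours since tweeted)
links = [
    ("example.com/post-1", 120, 50000, 2),   # heavily retweeted, recent
    ("example.com/post-2", 5, 2000, 2),      # little spread
    ("example.com/post-3", 120, 50000, 48),  # same spread, but older
]

for url, rts, reach, age in sorted(
    links, key=lambda l: retweet_quality(l[1], l[2], l[3]), reverse=True
):
    print(url, round(retweet_quality(rts, reach, age), 2))
```

The appeal of a signal like this is speed: a link can accumulate retweets within minutes, long before a crawler revisits the page or inbound links accumulate.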

Because Dan’s algorithms for determining the quality score of tweets vs. retweets went over my head, here’s a link to his research and reports that you can view yourself.