Spamdex - Spam Archive

RankBrain Survival Guide

Posted: 09 Jun 2016 09:30 AM PDT

This is a guide to surviving RankBrain. I created it, in part, because there’s an amazing amount of misinformation about RankBrain. And the truth is there is nothing you can do to optimize for RankBrain.

I’m not saying RankBrain isn’t interesting or important. I love learning about how search works whether it helps me in my work or not. What I am saying is that there are no tactics to employ based on our understanding of RankBrain.

So if you’re looking for optimization strategies, beware of the clickbait RankBrain content being pumped out by fly-by-night operators and impression-hungry publishers.

You Can’t Optimize For RankBrain

I’m going to start out with this simple statement to ensure as many people as possible read, understand and retain this fact.

You can’t optimize for RankBrain.

You’ll read a lot of posts to the contrary. Sometimes they’re just flat-out wrong, sometimes they’re using RankBrain as a vehicle to advocate for SEO best practices, and sometimes they’re just connecting dots that aren’t there.

Read on if you want proof that RankBrain optimization is a fool’s errand and you should instead focus on other vastly more effective strategies and tactics.

What Is RankBrain?

RankBrain is a deep learning algorithm developed by Google to help improve search results. Deep learning is a form of machine learning and can be classified somewhere on the Artificial Intelligence (AI) spectrum.

I think of Deep Learning as a form of machine learning where the algorithm can adapt and learn without further human involvement. One of the more interesting demonstrations of deep learning was the identification of cats (among other things) in YouTube thumbnails (pdf).

How Does RankBrain Work?

Knowing how RankBrain works is important because it determines whether you can optimize for it or not. Despite what you might read, there are only a handful of good sources of information about RankBrain.

Greg Corrado

The first is the October 26, 2015 Bloomberg RankBrain announcement, which included statements and summaries of a chat with Google Senior Research Scientist Greg Corrado.

RankBrain uses artificial intelligence to embed vast amounts of written language into mathematical entities — called vectors — that the computer can understand. If RankBrain sees a word or phrase it isn’t familiar with, the machine can make a guess as to what words or phrases might have a similar meaning and filter the result accordingly, making it more effective at handling never-before-seen search queries.

This makes it pretty clear that RankBrain uses vectors to better understand complex language.
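
To make the idea of "similar meaning" concrete, here is a minimal sketch of how closeness between word vectors is commonly measured, using cosine similarity. The vectors and the `most_similar` helper are invented for illustration; nothing here reflects the actual vectors or code RankBrain uses.

```python
import math

# Toy 3-dimensional vectors, invented for illustration only.
# Real embeddings are learned from text and have far more dimensions.
VECTORS = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "car": [0.1, 0.9, 0.2],
    "truck": [0.0, 0.8, 0.3],
}

def cosine(a, b):
    """Cosine similarity: values near 1.0 mean the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def most_similar(word, vectors=VECTORS):
    """Return the other word whose vector is closest in direction to `word`."""
    target = vectors[word]
    candidates = [w for w in vectors if w != word]
    return max(candidates, key=lambda w: cosine(target, vectors[w]))

print(most_similar("kitten"))  # cat
print(most_similar("truck"))   # car
```

The point is only that once words are numbers, "guessing a similar phrase" becomes a nearest-neighbor lookup.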

Word2Vec is most often referenced when talking about vectors. And it should be noted that Jeff Dean, Greg Corrado and many others were part of this effort. You’ll see these same names pop up time and again surrounding vectors and deep learning.

I wrote a bit about vectors in my post on Hummingbird. In particular I like the quote from a 2013 Jeff Dean interview.

I think we will have a much better handle on text understanding, as well. You see the very slightest glimmer of that in word vectors, and what we’d like to get to where we have higher level understanding than just words. If we could get to the point where we understand sentences, that will really be quite powerful. So if two sentences mean the same thing but are written very differently, and we are able to tell that, that would be really powerful. Because then you do sort of understand the text at some level because you can paraphrase it.

I was really intrigued by the idea of Google knowing that two different sentences meant the same thing. And they’ve made a fair amount of progress in this regard with research around paragraph vectors (pdf).

Paragraph Vector Paper

It’s difficult to say exactly what type of vector analysis RankBrain employs. I think it’s safe to say it’s a variable-length vector analysis and leave it at that.
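
As a rough illustration of comparing whole chunks of text, the sketch below averages word vectors so that text of any length maps to one fixed-size vector. This is a deliberately simpler stand-in, not the paragraph vector method from the paper and not what RankBrain does; the word vectors and function names are made up.

```python
import math

# Invented 2-dimensional word vectors, for illustration only.
WORD_VECTORS = {
    "cheap": [0.9, 0.1],
    "inexpensive": [0.85, 0.15],
    "flights": [0.2, 0.9],
    "airfare": [0.25, 0.85],
    "banana": [-0.7, -0.6],
    "bread": [-0.6, -0.7],
}

def text_vector(text):
    """Average the vectors of the known words in `text`."""
    known = [WORD_VECTORS[w] for w in text.lower().split() if w in WORD_VECTORS]
    if not known:
        raise ValueError("no known words in text")
    dims = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dims)]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def similarity(text_a, text_b):
    """Compare two pieces of text by the direction of their averaged vectors."""
    return cosine(text_vector(text_a), text_vector(text_b))

# Different words, similar meaning: the score is high.
print(similarity("cheap flights", "inexpensive airfare"))
# Unrelated text: the score is low (negative with these toy vectors).
print(similarity("cheap flights", "banana bread"))
```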

So what else did we learn from the Corrado interview? Later in the piece there are statements about how much Google relies on RankBrain.

The system helps Mountain View, California-based Google deal with the 15 percent of queries a day it gets which its systems have never seen before, he said.

That’s pretty clear. RankBrain is primarily used for queries not previously seen by Google, though it seems likely that its reach may have grown based on the initial success.

Unfortunately the next statement has caused a whole bunch of consternation.

RankBrain is one of the “hundreds” of signals that go into an algorithm that determines what results appear on a Google search page and where they are ranked, Corrado said. In the few months it has been deployed, RankBrain has become the third-most important signal contributing to the result of a search query, he said.

This provoked the all-too-typical reactions from the SEO community. #theskyisfalling The fact is we don’t know how Google is measuring ‘importance’ nor do we understand whether it’s for just that 15 percent or for all queries.

Andrey Lipattsev

To underscore the ‘third-most important’ signal boondoggle we have statements by Andrey Lipattsev, Search Quality Senior Strategist at Google, in a Q&A with Ammon Johns and others.

In short, RankBrain might have been ‘called upon’ in many queries but may not have materially impacted results.

Or if you’re getting technical, RankBrain might not have caused a reordering of results. So ‘importance’ might have been measured by frequency and not impact.
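
The difference between those two readings of ‘importance’ is easy to show with arithmetic. The query log below is entirely hypothetical, as are the `frequency` and `impact` helpers; it only illustrates how a signal could be consulted often while rarely changing the order of results.

```python
# Hypothetical per-query log. "invoked" marks whether the signal was consulted;
# "before" and "after" are the result orders without and with it.
QUERY_LOG = [
    {"invoked": True, "before": ["a", "b", "c"], "after": ["a", "b", "c"]},
    {"invoked": True, "before": ["d", "e", "f"], "after": ["e", "d", "f"]},
    {"invoked": True, "before": ["g", "h", "i"], "after": ["g", "h", "i"]},
    {"invoked": False, "before": ["j", "k", "l"], "after": ["j", "k", "l"]},
]

def frequency(log):
    """Share of queries on which the signal was called upon."""
    return sum(1 for q in log if q["invoked"]) / len(log)

def impact(log):
    """Share of queries on which the signal actually reordered the results."""
    changed = sum(1 for q in log if q["invoked"] and q["before"] != q["after"])
    return changed / len(log)

print(frequency(QUERY_LOG))  # 0.75
print(impact(QUERY_LOG))     # 0.25
```

By the first measure this made-up signal looks very important; by the second, far less so.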

Later on you’ll find that RankBrain has access to a subset of signals so RankBrain could function more like a meta signal. It kind of feels like comparing apples and oranges.

But more importantly, why does it matter? What will you do differently knowing it’s the third most important signal?

Gary Illyes

Another source of RankBrain information is from statements by Gary Illyes in conversation with Eric Enge. In particular, Gary has been able to provide some examples of RankBrain in action.

I mean, if you think about, for example, a query like, “Can you get a 100 percent score on Super Mario without a walk-through?” This could be an actual query that we receive. And there is a negative term there that is very hard to catch with the regular systems that we had, and in fact our old query parsers actually ignored the “without” part.

And RankBrain did an amazing job catching that and actually instructing our retrieval systems to get the right results.
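
Here is one way a parser could lose the “without” part, sketched under the assumption that it treats common function words as stopwords. The stopword list and `naive_terms` function are hypothetical and are not a description of Google’s old query parsers.

```python
# Hypothetical stopword list. Note that both "with" and "without" are on it.
STOPWORDS = {"can", "you", "get", "a", "on", "the", "with", "without"}

def naive_terms(query):
    """Lowercase the query, strip edge punctuation, and drop stopwords."""
    tokens = [t.strip("?.,!\"'") for t in query.lower().split()]
    return [t for t in tokens if t and t not in STOPWORDS]

with_guide = naive_terms("Super Mario with a walk-through?")
without_guide = naive_terms("Super Mario without a walk-through?")

print(with_guide)
print(without_guide)
# Opposite intents collapse into identical term lists.
print(with_guide == without_guide)  # True
```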

Gary’s statements lend clear support to the idea that RankBrain helps Google to better understand complex natural language queries.

Paul Haahr

Paul Haahr Speaking at SMX West 2016

Perhaps the most interesting statements about RankBrain were made by Paul Haahr, a Google Ranking Engineer, at SMX West during his How Google Works: A Google Ranking Engineer’s Story presentation and Q&A.

I was lucky enough to see this presentation live and it is perhaps the best and most revealing look at Google search. (Seriously, if you haven’t watched this you should turn in your SEO card now.)

It’s in the Q&A that Haahr discusses RankBrain.

RankBrain gets to see some subset of the signals and it’s a machine learning or deep learning system that has its own ideas about how you combine signals and understand documents.

I think we understand how it works but we don’t understand what it’s doing exactly.

It uses a lot of the stuff that we’ve published on deep learning. There’s some work that goes by Word2Vec or word embeddings that is one layer of what RankBrain is doing. It actually plugs into one of the boxes, one of the late post retrieval boxes that I showed before.

Danny then asks about how RankBrain might work to ascertain document quality or authority.

This is all a function of the training data that it gets. It sees not just web pages but it sees queries and other signals so it can judge based on stuff like that.

These statements are by far the most important because they provide a plethora of information. First and foremost Haahr states that RankBrain plugs in late post-retrieval.

This is an important distinction because it means that RankBrain doesn’t rewrite the query before Google goes looking for results but instead does so afterwards.

So Google retrieves results using the raw query but then RankBrain might rewrite the query or interpret it differently in an effort to select and reorder the results for that query.
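
The retrieve-first, reinterpret-later flow can be sketched as a two-stage pipeline. Everything below is a toy: the documents, the term-overlap retrieval, and especially the hand-written `rerank` rule are assumptions for illustration, whereas RankBrain learns its own rules from training data.

```python
# Toy document collection, invented for illustration.
DOCS = {
    "doc1": "super mario walkthrough for a 100 percent score",
    "doc2": "super mario 100 percent score with no guide",
    "doc3": "zelda speedrun tips",
}

def retrieve(query, docs=DOCS):
    """Stage 1: return ids of documents sharing any raw query term."""
    terms = set(query.lower().split())
    return [doc_id for doc_id, text in docs.items() if terms & set(text.split())]

def rerank(query, candidates, docs=DOCS):
    """Stage 2: reorder the retrieved candidates using an interpretation of the query."""
    wants_no_guide = "without" in query.lower().split()

    def score(doc_id):
        has_guide = "walkthrough" in docs[doc_id].split()
        # Reward documents whose content matches the interpreted intent.
        return 1 if has_guide != wants_no_guide else 0

    return sorted(candidates, key=score, reverse=True)

query = "super mario 100 percent score without walkthrough"
candidates = retrieve(query)
print(candidates)                 # ['doc1', 'doc2']
print(rerank(query, candidates))  # ['doc2', 'doc1']
```

The raw query decides which documents are in play; the later stage only changes their order.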

In addition, Haahr makes it clear that RankBrain has access to a subset of signals and the query. As I mentioned this makes RankBrain feel more like a meta-signal instead of a stand-alone signal.

What we don’t know are the exact signals that make up that subset. Many will take this statement to theorize that it uses link data or click data or any number of other signals. The fact is we have no idea which signals RankBrain has access to, what weight RankBrain gives them, or whether they’re used evenly across all queries.

The inability to know the variables makes any type of regression analysis of RankBrain a non-starter.

Of course there’s also the statement that they don’t know what RankBrain is doing. That’s because RankBrain is a deep learning algorithm performing unsupervised learning. It’s creating its own rules.

More to the point, if a Google Ranking Engineer doesn’t know what RankBrain is doing, do you think that anyone outside of Google suddenly understands it better? The answer is no.

You Can’t Optimize For RankBrain

You can’t optimize for RankBrain based on what we know about what it is and how it works. At its core RankBrain is about better understanding of language, whether that’s within documents or queries.

So what can you do differently based on this knowledge?

Google is looking at the words, sentences and paragraphs and turning them into mathematical vectors. It’s trying to assign meaning to that chunk of text so it can better match it to complex query syntax.

The only thing you can do is to improve your writing so that Google can better understand the meaning of your content. But that’s not really optimizing for RankBrain; that’s just doing proper SEO and delivering better user experience (UX).

By improving your writing and making it clearer you’ll wind up earning more links and, over time, be seen as an authority on that topic. So you’ll be covered no matter what other signals RankBrain is using.

The one thing you shouldn’t do is think that RankBrain will figure out your poor writing or that you now have the license to, like, write super conversationally you know. Strong writing matters more now than it ever has before.

TL;DR

RankBrain is a deep learning algorithm that plugs in post-retrieval and relies on variable-length text vectors and other signals to make better sense of complex natural language queries. While fascinating, there is nothing one can do to specifically optimize for RankBrain.

RankBrain Survival Guide originally published on Blind Five Year Old.

