
Probability you need to know to understand the search engines properly

By: Will Critchlow

A primer in everything you (n)ever wanted to know about (convergence in) probability

Don’t be scared of the maths in this post. I want to share one of the reasons I enjoyed studying probability at university. It’s an infuriating, beautiful, brilliant subject with myriad applications to search marketing. It’s also (mainly) based in common sense if you are prepared to try to get past the maths. Don’t be scared of the terminology – I’m going to explain things from scratch. And then I’m going to try to apply it to some practical advice.

If you have A-level maths (or equivalent), you should be able to follow. Here we go…

Probability

The bit of probability I want to show you is to do with a subject called convergence, but first, some basics.

What is probability?

The kind of probability you learn about at school is most easily thought about as:

The probability of an event occurring = (the number of ways that event can occur) / (the total number of equally likely outcomes)

So the probability of getting a head when you toss a (fair) coin is 1 / 2 (i.e. the number of heads = 1 and the total number of outcomes = 2, H or T).

This serves us pretty well for a lot of things. But pretty soon, you find outcomes that aren’t equally likely – at which point you start thinking of probability as the proportion of times you expect to get a particular result in a set of trials (you can use this definition of probability to cope with click through rates for example – the probability of getting a click being 0.2 means over time, 1 in 5 people click on your link).
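
To make the "proportion of times" idea concrete, here is a minimal sketch that simulates visitors who each click with probability 0.2. The function name and the visitor counts are made up for illustration, and it assumes every visitor behaves independently.

```python
import random

def observed_click_rate(p_click, n_visitors, rng):
    """Simulate n_visitors independent visitors and return the fraction who clicked."""
    clicks = sum(1 for _ in range(n_visitors) if rng.random() < p_click)
    return clicks / n_visitors

rng = random.Random(42)  # fixed seed so the run is repeatable
for n in (10, 1_000, 100_000):
    # The observed rate should drift towards 0.2 as n grows
    print(n, observed_click_rate(0.2, n, rng))
```

With only 10 visitors the observed rate can be well off; with 100,000 it should sit very close to 1 in 5.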

[Side note: when you study more probability, you realise this is actually a circular definition as it relies on expectation which is a consequence of the definition of probability, but tough, we're doing our best here.]

But what about when there are an infinite number of equally likely outcomes? What about if we are trying to pick a random integer? Then you start introducing (at school) probability distribution functions etc. (which are relatively boring, in my opinion). As you learn more about the subject, you start talking about probability spaces, metrics and integration. Ugggh. We’re not going to go there today (this is a blog post, not a textbook – I can talk about the interesting stuff…).

So let’s just carry on thinking about probability as the ‘chance of something happening’. For most practical purposes, you can think of it in terms of experiments. If I repeated this test a hundred (a thousand, a million) times, how many times would I expect to get a particular result?

One pretty cool (geek!) thing about probability when you start talking in terms of probability spaces, is that it becomes clear that you can have events that have zero probability of happening that could actually happen. The easiest way to think of this is to imagine you were to pick a random number between 0 and 1, with every number in that range equally likely. Because there are infinitely many possible equally likely outcomes, it is not too hard to come to the conclusion that the probability of picking any particular number is 0. Yet you did pick one…

However, events with probability zero don’t factor into any calculations really – for this reason we talk about almost sure or almost certain for events with probability 1. Having probability 1 doesn’t actually mean something is certain – it simply means the probability of it not happening is 0 (though this could still happen). Confused yet? Loving it yet?

What is convergence?

Suppose you have a sequence of numbers x1, x2, … , xn, …

To say that the sequence ‘converges’ is to say that there is a number x such that no matter how small a positive number E you choose, there is a point in the sequence past which every number in the sequence is within E of x.

An example would be the sequence 0.9, 0.99, 0.999, 0.9999, … which converges to 1 – i.e. it gets arbitrarily close to 1 as you go along.

Note, however, that it never actually reaches 1. Nevertheless, we say that the ‘limit of the sequence as n tends to infinity’ is 1.
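
The definition can be checked by hand for the sequence of nines above. The sketch below is only an illustration (the helper names are invented): it writes the n-th term exactly as 1 - 1/10^n and finds the point past which every term is within a chosen E of 1.

```python
from fractions import Fraction

def term(n):
    """n-th term of 0.9, 0.99, 0.999, ... written exactly as 1 - 1/10**n."""
    return 1 - Fraction(1, 10 ** n)

def first_index_within(eps):
    """Smallest n with |term(n) - 1| < eps; later terms are even closer to 1."""
    n = 1
    while abs(term(n) - 1) >= eps:
        n += 1
    return n

for eps in (Fraction(1, 100), Fraction(1, 10 ** 6)):
    print(eps, first_index_within(eps))
```

However small you make E, the loop stops, which is exactly what convergence asks for. Exact fractions are used rather than floats so that rounding cannot make a term appear to equal 1.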

Not all infinite sequences converge. The simplest example of a non-converging sequence is one that alternates between two numbers: 0, 1, 0, 1, 0, …

While explaining that, I have subtly slipped in the concept of infinity. This is easily a subject large enough not just to be the subject of another post, but to be a graduate-level course all on its own. I am going to use infinity in a slightly sloppy way (in pure maths terms – so not many people will notice). I thought it was enough to try to explain convergence and probabilistic convergence in one day. Never mind the many levels of infinity!

The reason you (might) care about convergence is that it:

  • underpins some of the elements of the Google PageRank algorithm
  • is very important if you are working with any kind of iterative algorithm (especially those involving randomness)
  • involves infinity, which is always cool
  • gives you something to talk about at dinner parties

OK. Maybe just the first two reasons. But still.

What is random convergence?

Suppose you have a random sequence X1, X2, … , Xn, …

What I mean by this is that each of the Xi is a number picked according to some probability distribution (this is called a ‘Random Variable’). For example, I might say Xi = 0 if a coin lands heads up and Xi = 1 if a coin lands heads down for a sequence of coin tosses.

Then we define the probability space W as a set of points corresponding to “real-world outcomes”. For example, if x is a point in W, it might correspond to the outcome where we toss a coin and it keeps landing heads-up (H, H, H, H, …) – this would give us a sequence X1(x), X2(x), X3(x), … = 0, 0, 0, … (according to our rule above). Another point y might correspond to one tail, then heads (T, H, H, H, …) etc.

Now we can look at the probability of a particular outcome – let’s denote this P(x) – this is the probability of the outcome x occurring – in the example above, if our sequence is 10 coin tosses long, P(x) = P(10 heads in a row) = 0.5^10.
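
For a sequence of 10 tosses the whole probability space is small enough to list. This is a rough sketch, assuming a fair coin and using made-up names, that builds every outcome, applies the heads = 0, tails = 1 rule and checks the 0.5^10 figure.

```python
from itertools import product

N_TOSSES = 10

def to_sequence(outcome):
    """Apply the rule from the text: heads -> 0, tails -> 1."""
    return [0 if toss == "H" else 1 for toss in outcome]

outcomes = list(product("HT", repeat=N_TOSSES))  # every point in the space
p_each = 0.5 ** N_TOSSES                         # fair coin: all equally likely

all_heads = ("H",) * N_TOSSES
print(to_sequence(all_heads))          # the sequence X1, ..., X10 for that outcome
print(p_each, len(outcomes) * p_each)  # one outcome's probability, and the total
```

Each of the 2^10 outcomes gets the same probability, and together they add up to 1.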

Okey dokey. Now consider the behaviour as n grows larger and larger (mathematicians say “as n tends towards infinity”).

Kinda hard to imagine isn’t it?

For any given outcome, we are left with a sequence x1, x2, x3, … (which may or may not converge as n tends to infinity).

Because there are a whole range of these sequences, each with its own probability of occurring, we can define a number of kinds of convergence when we are talking about convergence of a random sequence. There are actually a few kinds, but I just wanted to quickly introduce two:

  1. convergence in probability
  2. convergence with probability 1

Convergence in probability

We say that a random sequence converges in probability if there exists a random variable X such that, for all e (no matter how small), the probability that the difference between Xn and X is greater than e tends to 0 as n tends to infinity.

This is actually quite a weak form of convergence and there are a lot of sequences that don’t look like they should converge that do converge in probability.
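
Here is one such sequence, as a hedged sketch. It is a standard textbook example rather than anything from this post: let Xn be 1 with probability 1/n and 0 otherwise, with each Xn independent of the others. The probability that Xn differs from 0 shrinks to nothing, so the sequence converges in probability to 0, even though 1s keep turning up in any single run.

```python
import random

def ones_in_one_run(n_max, rng):
    """Indices n <= n_max where X_n = 1, with P(X_n = 1) = 1/n independently."""
    return [n for n in range(1, n_max + 1) if rng.random() < 1 / n]

def estimate_p_nonzero(n, n_runs, rng):
    """Estimate P(|X_n - 0| > e) for any 0 < e < 1, i.e. P(X_n = 1)."""
    hits = sum(1 for _ in range(n_runs) if rng.random() < 1 / n)
    return hits / n_runs

rng = random.Random(0)
for n in (10, 100, 1000):
    print(n, estimate_p_nonzero(n, 20_000, rng))  # shrinks roughly like 1/n
print(ones_in_one_run(100_000, rng))              # indices where a 1 showed up
```

The estimates fall towards 0 as n grows, yet the list of indices from a single run typically contains some large values of n as well as small ones.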

Convergence with probability 1

As mentioned above, something can have probability 1 without actually being certain – which is why we call it ‘almost sure’ or ‘almost certain’. Convergence with probability 1 is a common form of convergence that is one of the strongest.

What it means is that almost all of the sequences that can possibly be generated converge in the regular sense (i.e. that the probability of getting a sequence that doesn’t converge is 0).
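
For readers who prefer symbols, the two kinds of convergence can be written side by side. This is a sketch in standard notation rather than anything specific to this post: ε plays the role of e in the text, and ω (a name assumed here) stands for a single real-world outcome, i.e. one point in the probability space.

```latex
% Convergence in probability: the chance of being far from X dies away
\forall \varepsilon > 0:\quad
  \lim_{n \to \infty} P\bigl( |X_n - X| > \varepsilon \bigr) = 0

% Convergence with probability 1 (almost sure): almost every individual
% sequence X_1(\omega), X_2(\omega), \dots converges in the regular sense
P\Bigl( \bigl\{ \omega : \lim_{n \to \infty} X_n(\omega) = X(\omega) \bigr\} \Bigr) = 1
```

In the first line the limit sits outside the probability; in the second it sits inside. That change of order is what makes the second form the stronger of the two.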

Practical uses

As I mentioned, the concept of convergence is important when you are dealing with iterative algorithms (and probabilistic convergence comes into play when you are dealing with iterative algorithms that have an element of randomness). One example of this kind of algorithm is a class known as Monte Carlo algorithms. These are used to find the answer to questions of probability by running a large number of theoretical trials and seeing what the result is.
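
A minimal Monte Carlo sketch, under assumptions of my own choosing: the question (how likely is it to get at least 8 heads in 10 fair tosses?) and the function names are purely illustrative. The simulated answer is compared with the exact one obtained by counting outcomes.

```python
import random
from math import comb

def monte_carlo_at_least(k, n_tosses, n_trials, rng):
    """Estimate P(at least k heads in n_tosses fair tosses) by simulation."""
    hits = 0
    for _ in range(n_trials):
        heads = sum(1 for _ in range(n_tosses) if rng.random() < 0.5)
        if heads >= k:
            hits += 1
    return hits / n_trials

def exact_at_least(k, n_tosses):
    """Exact answer from counting equally likely outcomes."""
    return sum(comb(n_tosses, j) for j in range(k, n_tosses + 1)) / 2 ** n_tosses

rng = random.Random(7)
print(monte_carlo_at_least(8, 10, 100_000, rng), exact_at_least(8, 10))
```

The two numbers should agree to a couple of decimal places. Running more trials tightens the estimate, which is the probabilistic kind of convergence at work.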

The famous Google PageRank algorithm is an iterative algorithm with random elements and so the question of whether or not it will converge is potentially a highly complex one. I haven’t put too much thought into this (or done much background reading), but I imagine they have! Even so, I imagine it converges ‘almost surely’ at best. This means that there is a chance the Internet wouldn’t converge… Wouldn’t that be bad? (Note – I know they use a limited number of iterations, so they’re not going to end up in an infinite loop, but still).
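
To give a feel for the iteration, here is a sketch of the textbook "random surfer" form of the calculation. It is not Google's actual implementation: the four-page toy web, the 0.85 damping factor, the tolerance and the iteration cap are all assumptions chosen for illustration. In this form the randomness lives in the model (a surfer who follows a random link or jumps to a random page), while the calculation itself just repeats until the scores stop changing.

```python
def pagerank(links, damping=0.85, tol=1e-8, max_iter=200):
    """Power iteration on a dict {page: [pages it links to]}.

    Returns (ranks, iterations_used). Every page must appear as a key;
    pages with no outgoing links spread their rank evenly over all pages.
    """
    pages = list(links)
    n = len(pages)
    ranks = {p: 1 / n for p in pages}
    for iteration in range(1, max_iter + 1):
        # Rank held by pages with no outgoing links is shared out evenly
        dangling = sum(ranks[p] for p in pages if not links[p])
        new = {p: (1 - damping) / n + damping * dangling / n for p in pages}
        for p in pages:
            for target in links[p]:
                new[target] += damping * ranks[p] / len(links[p])
        change = sum(abs(new[p] - ranks[p]) for p in pages)
        ranks = new
        if change < tol:  # scores have settled down: stop early
            break
    return ranks, iteration

toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks, used = pagerank(toy_web)
print(used, ranks)
```

On this toy web the loop stops well before the cap, and page "c", which collects the most links, ends up with the highest score. The iteration cap mirrors the point that a limited number of iterations rules out an infinite loop.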

Having said all that, this post isn’t really about practical uses, but is more an adventure into some slightly advanced maths. I hope you enjoyed it.

Disclaimer, inspiration and further reading

This post was inspired by conversations with Hamlet Batista on SEOmoz.

Disclaimer: it’s a long time since I studied this stuff so I might have any bit of it wrong. It’s still fun though.

If you enjoyed that bit of maths, you might like the following (though some of these need waaay more maths):


Reputation 101 – How to protect your brand online

By: Duncan Morris

Reputation 101

These days you can bet your bottom dollar that most business transactions are preceded by some sort of search on one of the major search engines for your company name. If those results don’t show your company in a positive light, then you are likely losing business.

It could be someone ranting on a forum about your company, or maybe it’s those embarrassing photos of you from your ‘less professional’ days. After a job interview, after pitching for business or after finally plucking up the courage to ask the girl of your dreams out, you had better hope they don’t check up on you in the search engines.

The following links should provide all the information you need to find out if this is happening and to clean up your online presence. Not only that, but by following the tips given in the various excellent reputation monitoring guides you should be able to protect yourself or your company from any reputation management issues going forward.

Cautionary Tales

Apple’s Reputation Sours [Marketing Pilgrim]

Another ‘ignore bloggers, this is what you get’ cautionary tale for companies [The Viral Garden]

Web damage control has become a big business [Seattle Times]

Good or Bad, Words Spreads Fast on the Web [Search Engine Guide]

BloggingDosh – How To Destroy Your Reputation [John Chow]

How NOT to Handle an Online Reputation Management Crisis [Copy Brighter]

Absolute Poker Have A Reputation Management Crisis [Distilled]

Reputation Management

Free Online Reputation Management Beginner’s Guide [Marketing Pilgrim]

A Best Practice Primer to Search Engine Reputation Management [Brandcurve]

Reputation Management – The Top Ten Tips to Clean Up Your Online Reputation [Manage your Buzz]

The Ethics of Reputation Management (or, “Getting Stuff Deleted From Google”) [SEOmoz]

Chris Bennett on Reputation Management (E-Tourism Summit) [97th Floor]

4 Lessons in Online Reputation Management from a Small Town Grocer [Search Engine Guide]

Ten Ways to Avoid a Google Reputation Management Nightmare [Marketing Pilgrim]

The Definitive Guide to Online Reputation Management [Scoreboard Media]

Common Reputation Management Issues and How To Address Them [invesp]

6 Easy Steps to Personal Reputation Management [Reputation advisor]

Reputation Management Anyone Can Do [97th Floor]

Using social media sites for reputation management [Pronet Advertising]

New Survey Proves You Must Manage Your Own Online Reputation [Bill Hartzer]

Get Rid of Rip Off Report in 6 Weeks [97th Floor] Warning – this is a sales page(!) but there are some good tips

Protecting Your Online Reputation [Pronet Advertising]

Online Reputation Management Tips – Reputation Advisor [Reputation Advisor]

Crowdsourcing :: Reputation Management for Digital Natives [Young PR]

Dos and don’ts for digi natives [PR Blogger]

Not Just Your Space – The College Student’s Guide to Managing Online Reputation [Naymz]

Online Reputation Management, Are You Doing It? [Search Engine Guide]

Basics of Online Reputation Management [Top Rank Blog]

Tips on Safeguarding Your Online Reputation [The Wall Street Journal]

Five Ways Negative Reviews Help Your Online Reputation

Reputation Monitoring

17 Search Engine Reputation Management Optimization Tips [Mediapost]

Why Blog Monitoring is Useless Without Community Context (or Another Analogy) [Hyku]

31 Places to Monitor Your Reputation Online [Search Marketing Gurus]

7 Free Brand Reputation Management Tips [Thirstypony]

23 things every company should be monitoring… [Jaffejuice]

10 things you should be monitoring [Pronet Advertising]

10 things you should be monitoring (and a few more from me) [Web Strategist]

How to use analytics software for reputation monitoring [Distilled]

Using the Search Engines

Tips for Controlling the Top 10 [Wolf Howl]

Using Social Media To Help Manage Online Reputation [Search Rank]

Own Your Google Reputation with these Ten Suggestions [Gooruze]

Using Search for Public Relations & Reputation Management [Search Engine Watch]

Using blogs / bloggers

How to Write a Social Media Press Release [Copyblogger]

Build Your Online Reputation with Consistent Posting [A list seo]

Blogger Relations 101 [Top Rank]

Blogs: Making Your Pitch – What Not To Do [Search Marketing Gurus]

How NOT to pitch a blog [Top Rank]

10 Ways to Hurt Your Blog’s Brand by Commenting on Other Blogs [Problogger]

Aggressive Reputation Management

Aggressive Reputation Management [Distilled]

Making Link Bait and Viral Marketing Work – Part Ten [Search Engine Guide] How to capitalise on your competitor’s failures

Everything You Need to Know About FRO (Fake Review Optimization) [Tropical seo]

Don’t Just Manage Your Reputation, Respond to Your Competitor’s [Search Engine Guide]


Official: Distilled Is Great In Bed

By: Tom Critchlow

Yep – it’s official, Distilled are now great in bed. Must be true – Facebook told me so!

While the above fact may be true, it’s not really the right topic for discussion on this blog, so what am I talking about? Well, Distilled now have a Facebook page, come join us here: Distilled on Facebook!

Distilled on Facebook

I really was sorely tempted to pimp out our Facebook company profile with applications like ‘how good a lover are you’ but figured it probably wasn’t appropriate. (Not even lolcats, Will?)

So what are you waiting for? Become a fan of Distilled now.

To all my personal fans out there – I’m rushing off on holiday tomorrow so probably won’t be blogging again before Christmas. Hopefully Will and Duncan will be able to keep the blogging up while I’m away. Merry Christmas all!


An Online Reputation Management Case Study

By: Tom Critchlow

I’m not claiming that below is particularly impressive or difficult but if anyone out there isn’t quite sure how you can use the search engines to protect/control your brand online then here’s a quick example:

Tom Critchlow Google Search Results

While this is obviously a very unique search and not a competitive query at all, I think it is representative of how you should approach reputation management in the search results:

First, the pages you control

These are the pages which you will always be able to control and (should) always rank no 1. These are the pages which you want people to find and they should be where everyone clicks. Note how I only have two results here but could easily use a sub-domain or other site we control to gain more spots like this.

Positive pages about you which you don’t control

Then you have pages on sites which you clearly don’t control such as SearchEngineLand & Marketing Pilgrim (no link to Spannerworks since they didn’t link to us or even approve the comment I left on their blog!). Blog posts from other people which mention you can be a great page to rank since they not only take up search results but they also say positive things about you.

Some random pages which aren’t related to you

Then we see some pages which are about another Tom Critchlow (yes, there are a few of us out there! clearly no others into SEO though ;-) ). These pages are great since if someone is looking for information about me and they start seeing pages about someone else they’ll most likely give up and refer back to the pages they’ve already seen. Also, pages like this make it look like you’re not dominating the search results and make the page look more natural. The last thing you want is for people to KNOW that you’re covering something up so having these pages rank is a good idea. Depending on your search term though these can be hard to come by.

Some more pages about you

Sure, why not :-) – this time it’s Flickr which makes an appearance. This page ranks because Danny links to it from his page on SearchEngineLand.

How do you create search results like this?

Easy – just drop them some links! For example, a couple of pages I’d quite like to push onto the first page are (don’t forget to include your anchor text in the link):

Tom Critchlow’s one and only mention in Wired magazine

Tom Critchlow works for Distilled right? SEOmoz seems to think so.

Closing note – I don’t have anything to cover up in the search results, hopefully you knew that anyway but thought I’d mention it just in case. I’m really just playing around here to see what works and what doesn’t. I’d encourage you to do the same since playing with search results that are easy to manipulate is always a good way to learn.


 