The PageRank probability distribution
Conservation of PageRank
In the current model, PageRank is not conserved in all circumstances. For instance, what happens when Google comes across a page without any outgoing links?
Suppose the page has a starting PageRank of 1. This PageRank is supposed to be passed on to all of the pages it links to, but there aren't any. In this situation we must return to the original definition of PageRank. Sections 2.1.1 and 2.1.2 of The Anatomy of a Large-Scale Hypertextual Web Search Engine state that:
Note that the PageRanks form a probability distribution over web pages, so the sum of all web pages' PageRanks will be one. ... PageRank can be thought of as a model of user behavior. We assume there is a "random surfer" who is given a web page at random and keeps clicking on links, never hitting "back" but eventually gets bored and starts on another random page. The probability that the random surfer visits a page is its PageRank. And, the d damping factor is the probability at each page the "random surfer" will get bored and request another random page.
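The random surfer can be simulated directly. The sketch below is a Monte Carlo illustration, not Google's implementation: the `random_surfer` function and the three-page site are hypothetical. In Brin and Page's formula, PR(A) = (1 - d) + d(PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)), the jump to a random page corresponds to the (1 - d) term, so this sketch assumes the surfer follows a link with probability d and otherwise, or whenever it reaches a page with no links, jumps to a page chosen uniformly at random.

```python
import random

def random_surfer(links, steps=100_000, d=0.85, seed=0):
    """Estimate how often a 'random surfer' visits each page (hypothetical helper).

    links maps each page to the list of pages it links to.
    With probability d the surfer clicks a random outgoing link; otherwise,
    or when the page has no outgoing links, it jumps to a uniformly random page.
    """
    rng = random.Random(seed)
    pages = list(links)
    visits = {page: 0 for page in pages}
    current = rng.choice(pages)  # the surfer is given a page at random
    for _ in range(steps):
        visits[current] += 1
        outgoing = links[current]
        if outgoing and rng.random() < d:
            current = rng.choice(outgoing)  # keep clicking links
        else:
            current = rng.choice(pages)  # bored (or stuck): start again at random
    # Visit frequencies sum to 1, i.e. they form a probability distribution.
    return {page: count / steps for page, count in visits.items()}

# Hypothetical three-page site: A links to B and C, B links to C, C links back to A.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
print(random_surfer(links))
```

With enough steps the frequencies settle close to the normalised PageRank values for the same site; B, which receives only half of A's vote, ends up being visited least often.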
In the random surfer scenario, a surfer who reaches a dead end has no links to click, so they must select another page at random. There are two possibilities:
- Every page on the internet has an equal chance of being selected. In this case the 'lost' PageRank will be divided evenly between all the other pages in Google's database.
- Some pages have a higher probability of being selected than others. In this case the 'lost' PageRank could be divided between a select group of higher-profile pages (such as Yahoo and the Open Directory).
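The two possibilities can be compared with a small iteration of Brin and Page's formula. This is a sketch under stated assumptions, not a description of what Google does: the `pagerank_step` and `run` helpers, the example site and the `restart_weights` parameter are all hypothetical. When a page has no outgoing links, the d × PR it would have passed on is re-spread either evenly over every page (the first case) or according to weights that favour chosen pages (the second case).

```python
def pagerank_step(pr, links, d=0.85, restart_weights=None):
    """One iteration of PR(A) = (1 - d) + d * sum(PR(T) / C(T)) (hypothetical helper).

    restart_weights (hypothetical) gives the share of a dead end's PageRank each
    page receives and should sum to 1; None means an even split over all pages.
    """
    pages = list(pr)
    if restart_weights is None:
        restart_weights = {page: 1 / len(pages) for page in pages}
    new = {page: 1 - d for page in pages}
    for page, outgoing in links.items():
        if outgoing:
            # Normal case: divide d * PR(page) equally among the linked pages.
            for target in outgoing:
                new[target] += d * pr[page] / len(outgoing)
        else:
            # Dead end: re-spread d * PR(page) instead of losing it.
            for target, weight in restart_weights.items():
                new[target] += d * pr[page] * weight
    return new

def run(links, iterations=50, **kwargs):
    """Start every page at 1 and apply pagerank_step repeatedly."""
    pr = {page: 1.0 for page in links}
    for _ in range(iterations):
        pr = pagerank_step(pr, links, **kwargs)
    return pr

# Hypothetical site: C has no outgoing links.
links = {"A": ["B", "C"], "B": ["C"], "C": []}
even = run(links)
favoured = run(links, restart_weights={"A": 1.0})  # all lost PageRank goes to A
print(even, sum(even.values()))
print(favoured, sum(favoured.values()))
```

In both runs the total stays at 3, the number of pages, so PageRank is conserved; only its distribution changes, with A ending up with considerably more PageRank in the second run.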
At present it is not possible to know what Google does with the lost PageRank when it reaches a dead end. In this situation, when all pages have an initial PageRank of 1 but some pages in the model have no outgoing links, PageRank is not conserved but is instead 'leaked'.
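The leak is easy to reproduce. In this minimal sketch (the `leaky_step` helper and the three-page site are hypothetical), the formula is applied with no special treatment for dead ends, so the d × PR of the page without links simply disappears on each iteration.

```python
def leaky_step(pr, links, d=0.85):
    """PR(A) = (1 - d) + d * sum(PR(T) / C(T)) with no handling of dead ends (hypothetical helper)."""
    new = {page: 1 - d for page in pr}
    for page, outgoing in links.items():
        for target in outgoing:  # a page with no outgoing links passes nothing on
            new[target] += d * pr[page] / len(outgoing)
    return new

# Hypothetical site: C has no outgoing links.
links = {"A": ["B", "C"], "B": ["C"], "C": []}
pr = {page: 1.0 for page in links}  # every page starts with a PageRank of 1
for iteration in range(1, 4):
    pr = leaky_step(pr, links)
    print(iteration, round(sum(pr.values()), 4))
```

After the first iteration the total has already fallen from 3 to 2.15, because C's share (0.85 × 1) has leaked away. It then drops to about 1.07 and settles at about 0.76, well below the 3 units the pages started with.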
The total PageRank may also fail to be conserved between iterations when a page does not start with an initial PageRank of 1. In fact, we do not know what initial PageRanks Google assigns before it applies the algorithm, and it may well be that certain web sites are indeed given a 'head start', as discussed in the second bullet point above.
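Even with no dead ends, the total changes between iterations when the starting values do not add up to the number of pages. If every page has at least one outgoing link, summing the formula over all N pages gives a new total of N(1 - d) + d × (old total), so any gap between the total and N shrinks by a factor of d per iteration rather than staying fixed. The sketch below illustrates this; the `pagerank_step` helper, the site and the 'head start' values are hypothetical.

```python
def pagerank_step(pr, links, d=0.85):
    """One iteration of PR(A) = (1 - d) + d * sum(PR(T) / C(T)) (hypothetical helper)."""
    new = {page: 1 - d for page in pr}
    for page, outgoing in links.items():
        for target in outgoing:
            new[target] += d * pr[page] / len(outgoing)
    return new

# Hypothetical site with no dead ends.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
# Hypothetical 'head start': A begins with 10, the others with 0 (total 10, not 3).
pr = {"A": 10.0, "B": 0.0, "C": 0.0}
for iteration in range(1, 6):
    pr = pagerank_step(pr, links)
    print(iteration, round(sum(pr.values()), 4))
```

The total falls from 10 to 8.95, then 8.0575, and continues to move towards 3 on each later iteration.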
The Probability Distribution
Regardless of the starting conditions, we can arrive at the probability distribution that Brin and Page tantalisingly suggest in their research paper by dividing the PageRank of each page by the total amount of PageRank available at the end of each iteration. You can view these probabilities by clicking on the 'Show/Hide Probabilities' button on the PageRank Calculator.
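The normalisation is a single division per page. A minimal sketch, in which the `to_probabilities` helper and the PageRank values are made up for illustration:

```python
def to_probabilities(pr):
    """Divide each page's PageRank by the total so the values sum to 1 (hypothetical helper)."""
    total = sum(pr.values())
    return {page: value / total for page, value in pr.items()}

# Made-up PageRank values at the end of some iteration; they total 3.
pr = {"A": 1.5, "B": 0.5, "C": 1.0}
print(to_probabilities(pr))  # A: 0.5, B: about 0.167, C: about 0.333
```

Because the division uses whatever total is present at that point, the result sums to 1 whether PageRank has leaked or the pages were given unequal starting values.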
Bear in mind, however, that much of this is conjecture, as we can never be sure of all the criteria Google applies. The PageRank Calculator is not intended as a definitive representation of how Google behaves, but by experimenting with different link structures and initial PageRank values, you should get a good idea of how Google will grade these pages in terms of importance.
Mark Horrell
Last Updated: 27 November 2001.

