In the current model, PageRank is not conserved in all circumstances. For instance, what happens when Google comes across a page with no outgoing links?
Suppose the page has a starting PageRank of 1. This PageRank is supposed to be passed on to the pages it links to, but there aren't any. In this situation we must return to the original definition of PageRank. Sections 2.1.1 and 2.1.2 of The Anatomy of a Large-Scale Hypertextual Web Search Engine state that:
Note that the PageRanks form a probability distribution over web pages, so the sum of all web pages' PageRanks will be one. ... PageRank can be thought of as a model of user behavior. We assume there is a "random surfer" who is given a web page at random and keeps clicking on links, never hitting "back" but eventually gets bored and starts on another random page. The probability that the random surfer visits a page is its PageRank. And, the d damping factor is the probability at each page the "random surfer" will get bored and request another random page.
In this scenario, a surfer who is randomly clicking on links reaches a dead end and must then select another page at random. They could either:
- select another page completely at random, in which case every page on the internet has an equal chance of being selected. The 'lost' PageRank would then be divided evenly between all the other pages in Google's database.
- select another page with unequal probabilities, so that some pages are more likely to be chosen than others. The 'lost' PageRank could then be divided among a select group of higher-profile pages (such as Yahoo and the Open Directory).
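The two options can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of what Google does: the function name `redistribute`, the page names and the weights are all hypothetical, and for simplicity the lost PageRank is shared among every page in the model, including the dead end itself.

```python
def redistribute(ranks, lost, weights=None):
    """Return new ranks with `lost` PageRank shared out among the pages.

    weights=None models the first option (every page equally likely).
    A dict of non-negative weights models the second option (some pages
    more likely than others); pages missing from it get a weight of 0.
    """
    if weights is None:
        weights = {page: 1.0 for page in ranks}
    total_weight = sum(weights.get(page, 0.0) for page in ranks)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return {
        page: rank + lost * weights.get(page, 0.0) / total_weight
        for page, rank in ranks.items()
    }


# Hypothetical three-page model in which 0.85 of PageRank hit a dead end.
ranks = {"A": 1.0, "B": 1.0, "C": 1.0}
uniform = redistribute(ranks, 0.85)
biased = redistribute(ranks, 0.85, weights={"A": 3.0, "B": 1.0})
```

In the uniform case each page gains the same share; in the biased case page A receives three quarters of the lost PageRank, B one quarter, and C nothing. Either way the total is preserved, which is the point of redistributing rather than discarding it.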
At the moment it is not possible to know what Google does with the lost PageRank when the surfer reaches a dead end. In the current model, where every page has an initial PageRank of 1 but some pages have no outgoing links, PageRank is not conserved but is instead 'leaked'.
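The leak can be seen in a small simulation. The sketch below assumes the formula from the original paper, PR(A) = (1 - d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)), with d = 0.85 and every page starting at 1. The two three-page link structures and the helper names `iterate` and `total_after` are made up for illustration.

```python
def iterate(links, ranks, d=0.85):
    """One pass of PR(A) = (1 - d) + d * sum(PR(T) / C(T)) over all pages."""
    new_ranks = {}
    for page in links:
        # Sum the share passed on by every page that links to `page`.
        incoming = sum(
            ranks[src] / len(outs)
            for src, outs in links.items()
            if page in outs
        )
        new_ranks[page] = (1 - d) + d * incoming
    return new_ranks


def total_after(links, iterations=50, start=1.0):
    """Total PageRank in the model after a number of passes."""
    ranks = {page: start for page in links}
    for _ in range(iterations):
        ranks = iterate(links, ranks)
    return sum(ranks.values())


loop = {"A": ["B"], "B": ["C"], "C": ["A"]}   # every page links onward
dead_end = {"A": ["B"], "B": ["C"], "C": []}  # C has no outgoing links

loop_total = total_after(loop)
dead_end_total = total_after(dead_end)
print(loop_total, dead_end_total)
```

Under these assumptions the closed loop keeps a total of about 3 (one per page), while the model containing the dead end settles at roughly 0.81: whatever page C would have passed on simply disappears.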
There is another scenario in which the total PageRank is not conserved between iterations: when a page does not start with an initial PageRank of 1. In fact, we do not know what initial PageRanks Google assigns before it applies the algorithm, and it may well be that certain web sites are given a 'head start', as discussed in the second bullet point above.
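The effect of the starting values can be checked with a short experiment. As before, this is a sketch that assumes the formula from the original paper with d = 0.85; the three-page loop and the helper name `totals_over_time` are hypothetical.

```python
def totals_over_time(links, start, iterations, d=0.85):
    """Record the total PageRank before and after each pass."""
    ranks = {page: start for page in links}
    totals = [sum(ranks.values())]
    for _ in range(iterations):
        # Every new value is computed from the previous pass's ranks.
        ranks = {
            page: (1 - d) + d * sum(
                ranks[src] / len(outs)
                for src, outs in links.items()
                if page in outs
            )
            for page in links
        }
        totals.append(sum(ranks.values()))
    return totals


loop = {"A": ["B"], "B": ["C"], "C": ["A"]}  # no dead ends

from_one = totals_over_time(loop, start=1.0, iterations=100)
from_zero = totals_over_time(loop, start=0.0, iterations=100)
print(from_one[:3], from_zero[:3], from_zero[-1])
```

Starting every page at 1 keeps the total at 3 on every pass. Starting at 0 gives totals that change from one pass to the next (0, then 0.45, then 0.8325, and so on), although in this loop without dead ends they drift towards 3 as the iterations continue.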