Google PageRank Algorithm – The World’s Largest Eigenvalue Problem
October 24, 2010
We have talked in some detail about the Eigensystem decomposition of a correlation and a covariance matrix (of asset returns) in our CFE classes and have also talked about the Principal Components Analysis (PCA) and how PCA algorithm is based on eigenvectors and eigenvalues of a correlation matrix of an asset returns. In the class we did mention that PCA and eigensystem decomposition has wide applications in diverse disciplines such as Physics, Engineering, Control Theory, Economics, Finance, Sociology, etc. If a problem can be reduced to an eigenvalue problem then the dimensionality of that problem gets reduced and it becomes a lot more tractable.
In the class we also mentioned that Google’s web (internet) search algorithm was, in a large measure, developed as an eigenvalue problem. In fact, Google PageRank algorithm is the world’s largest eigenvalue problem.
Andrew Fung, who is a CFE graduate in Hong Kong and a QUAD student and who works for Barclays Capital in Hong Kong has developed an Excel spreadsheet implementation of a Google PageRank algorithm based on a research paper /article referenced at the end of this article*.
Consider a very simple and a stylized internet (world wide web) where there are only five websites (web pages) as shown below.
Each website is linked to the other and the linked structure is shown by the arrows. The initial probability of choosing a site (i.e. the initial probability that the web crawler will choose this site before a search request is sent or the algorithm is run) is 20%. All websites have equal probability of getting picked up. This is the initial condition.
Now when a search request is sent, the search engine will search through these five websites and then rank them in the order of importance. This ranking of the websites will be done by estimating the Principal Eigenvector of the system.
The mathematical model that forms the basis of PageRank (how websites and web pages are ranked in order of their importance) is predicated on a random walk that moves through all pages of the stylized internet as show above. If is the probability of being on page at time then the PageRank of page is expressed as . Of course, we have to make sure that the random walk does not get stuck in a page by assigning artificial links to those pages with no out links.
From the linked structured (i.e. how each website is linked to the other) shown in the diagram above and using the random walk model the Markov matrix is generated. The Markov matrix for the above stylized web would be given by:
The matrix P above is irreducible and stochastic. It is stochastic because all columns add up to one. The random walk is therefore expressed as a Markov Chain. The PageRank of all pages can be computed as the Principal Eigenvector of .
Mathematically, we can write as:
Where, is the Markov matrix as shown above, is the matrix of eigenvectors from which the Principal Eigenvector will be estimated and is the set of Eigenvalues corresponding to each eigenvector and show the importance of each eigenvector.
Any comments and queries can
be sent through our
More on Quantitative Finance >>
back to top