The network to analyze must be directed, otherwise there are no special constraints.
Originally introduced to rank Web pages, it finds application in more general problems of ranking, like in citation networks.
The algorithm requires three inputs, the file where the edges of the network are listed, the damping factor, which is a probability (i.e. it is a real number between 0 and 1) and the number of points for the binned distribution described below. A first read-in of the inputfile will set the values of the number of nodes and edges of the network. In the second read-in the degree of all nodes is calculated and the edges are stored in an array. PageRank is calculated with the power method, i.e. by multiplying the transition matrix of the PageRank process by an initial arbitrary vector, and repeating the multiplication until the resulting vector does not change any more. The components of the stationary vector are the PageRank values of the nodes of the network. The PageRank values are listed in one of the three output files, together with the corresponding node index.
Then the distribution of PageRank is calculated. The program generates two other output files, corresponding to two different ways of partitioning the interval spanned by the PageRank values. In the first output, one divides the range of variation of PageRank in equal bins and determines the number of nodes whose PageRank value lies inside each bin: the scores are then divided by the number of nodes of the network, so to obtain the probability: the output displays such probability values together with the center points of the bins they refer to.
The second output gives the binned distribution, i.e. the interval spanned by the PageRank values is divided into bins whose size grows while going to higher values of the variable. The size of each bin is obtained by multiplying by a fixed number the size of the previous bin. The program calculates the fraction of nodes whose PageRank falls within each bin. Because of the different sizes of the bins, these fractions must be divided by the respective bin size, to have consistent results.
This technique is particularly suitable to study skewed distributions: the fact that the size of the bins grows large for large values of the variable compensates for the fact that not many nodes have high values, so it suppresses the fluctuations that one would observe by using bins of equal size. On a double logarithmic scale, which is very useful to determine the possible power law behavior of the distribution, the points of the latter will appear equally spaced on the x-axis.
The algorithm runs in a time
E is the number of edges of the network.
The algorithm was implemented and documented by S. Fortunato, integrated by S. Fortunato and W. Huang. For the description we acknowledge Wikipedia.
Brin, S., Page, L. (2001) The Anatomy of a Large-Scale Hypertextual Web Search Engine.
Proceedings of the seventh International Conference on the World Wide Web (WWW1998):107-117.