Message-ID: <169259574.27259.1590889554325.JavaMail.confluence@wiki.cns.iu.edu> Subject: Exported From Confluence MIME-Version: 1.0 Content-Type: multipart/related; boundary="----=_Part_27258_1277699137.1590889554325" ------=_Part_27258_1277699137.1590889554325 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Content-Location: file:///C:/exported.html Fast Pathfinder Network Scaling

# Fast Pathfinder Network Scaling

###### Descrip= tion
=20

Network scaling algorithms such as the Pathfi= nder algorithm are used to prune many different kinds of networks, incl= uding citation networks, random networks, and social networks. However, thi= s algorithm suffers from run time problems for large networks and online pr= ocessing due to its O(n4) time complexity. Hence another alterna= tive, Fast Pathfinder algorithm, is used which prunes the original network = to get its PFNET(r,n-1) in just O(n3) time. Since it essentially= employs different algorithm than MST based PAthfinder scaling its resultsm= ight differ from it.

=20

The underlying idea of this approach is based on a classical algorithm i= n graph theory for shortest path computation - Floyd-Warshall's Shortest Path Algorithm.

=20
###### Pros & Co= ns
=20

There is a substantial speed increase due to the changes made in shortes= t path distance matrix computation. This leads to a reduction in the ti= me complexity from O(n4) to O(n3). Also the = space complexity is drastically reduced from (2 * n) - 1 matrices in t= he original algorithm to 2 matrices.

=20

However, the user cannot control the q paramet= er, which constrains the number of indirect proximities examined in generat= ing the network. The q parameter is an integer value between 2 and n-1, inc= lusive where n is the number of nodes or items. Value of q is fixed to n-1.= Also we have found that networks that have a low standard deviation for ed= ge weights or if many of the edge weights are equal to the minimum edge wei= ght, then the network might not be scaled as much as expected. This causes = unweighted networks to be not scaled at all.

=20

If the networks being processed are undirected then MST based Pathfinder Network scalin= g algorithm can be used. This will give results in 30 times faster than= Fast pathfinder algorithm. Also it does not face the disadvantages faced b= y Fast Pathfinder algorithm.

=20
###### Imple= mentation Details
=20

Once the input file is validated the algorithm is started. It works as f= ollows,

=20
=20
1. Initialize matrices for edge weights & minimum distance costs with = dimensions being number of nodes x number of nodes. Initialize their defaul= t values as POSITIVE_INFINITY.
2. =20
3. When reading through the nodes, map node ids to matrix indexes.
4. =20
5. Next, when reading through the edges set weight & distance matrices= with edge weights corresponding to a pair of nodes. Depending upon users c= hoice of how edge weight should be represented, normalize the edge weight i= .e. If they chose similarity, normalize edge weights to be dissimilarity ba= sed by inverting the edge weight.
6. =20
7. Now run 3 nested "for" loops with indexes i, j, k such that it returns = the shortest possible path from node i to node j using only vertices 1 thro= ugh k as intermediate points along the way. Now, given this function, our g= oal is to find the shortest path from each node i to each node j using only= nodes 1 through k.=20
=20
1. For this, either the true shortest path only uses nodes in the set (1..= .k); or there exists some path that goes from node i to node k, then from k= to j that is better.
2. =20
3. Use Minkowski's distance, r parameter, to calculate the distance betwee= n the intermediate nodes.
4. =20
8. =20
9. Once the distances for all pairs of nodes are calculated, during genera= ting the output file, output only those edges whose distance & weight m= atrix values are equal.
10. =20
=20
###### Usage Hints
=20

The user has to provide 4 inputs; the file storing the information about= the network, value of r parameter, the column na= me that represents the edge weight and how the edge weight should be consid= ered (Dissimilarity or Similarity). The provided network should not contain= edge weights of less than equal to 0 else a warning is generated and defau= lt edge weight is assumed for that edge. The value of r ranges from 1 to infinity. Since r =3D infinity is the most= commonly used in network scaling, there is a check-box, which when checked= indicates that value of r is infinity (as defaul= t it is checked). The algorithm assumes the edge weights to be a measure of= dissimilarity i.e. More the weight, more is the dissimilarity. If the inpu= t edge weights are a measure of Similarity i.e. More the weight, less is th= e similarity then the user can select Similarity option from the d= rop-down box. Only in the case of user selecting Similarity the ed= ge weights will be inverted. For example, in a co-authorship network, more = number of articles authored by 2 subjects equates to a stronger relationshi= p. In this case the user should select Similarity option. The outp= ut will be provided as a .NWB file with a pruned network.

=20
= =20
=20
###### Acknowledgm= ents
=20

The original Pathfinder Network Scaling algorithm authored by Roger Schvaneveldt et al. was= adapted with a Fast based approach by Arnaud Quirin et al. is imp= lemented, integrated and documented by Chintan Tank. Also thanks to Micah L= innemeier and Russell Duhon for providing guidance during implementation. F= or the description I acknowledge Wikipedia.

=20
###### References
=20
=20
1. Schvaneveldt, Roger. Pathfinder associative networks : Studies in Knowl= edge organization. 1990. Link
2. =20
3. Quirin, Arnaud., Cordon, Oscar., Guerrero-Bote, Vicente., Vargas-Quesad= a, Benjamin., Moya-Anegnon, Felix. A quick MST-based algorithm to obtain Pa= thfinder networks ((infinity), n - 1). In Journal of the American Socie= ty for Information Science and Technology, volume 59, pages 1912-1924,= 2008. Link
4. =20
5. Quirin, Arnaud., Cordon, Oscar., Santamaria, J., Vargas-Quesada, Benjam= in., Moya-Anegnon, Felix. A new variant of the Pathfinder algorithm to gene= rate large visual science maps in cubic time. In Information processing= & management, volume 44, pages 1611-1623, 2008. Link
6. =20
=20