Network scaling algorithms such as the Pathfinder algorithm are used to prune many different kinds of networks, including citation networks, random networks, and social networks. However, this algorithm suffers from run time problems for large networks and online processing due to its O(n4) time complexity. Hence another alternative, MST-Pathfinder algorithm, is used which prunes the original network to get its PFNET(infinity,n-1) in just O(n2 * log n) time.
The underlying idea comes from the fact that the union (superposition) of all the Minimum Spanning Trees extracted from a given network is equivalent to the PFNET resulting from the Pathfinder algorithm parameterized by a specific set of values (r = infinity and q = n-1), those usually considered in many different applications.
The algorithm has a space complexity of 3 * n2 + n, which is very less compared to the Original Pathfinder approach's complexity of 2 * (n3 - n2). Also its time complexity is O(n2 * log(n)) compared to O(n4) of the Original Pathfinder approach. According to an experimental setup, this efficiency enables MST based approach to scale the network with 10,000 nodes and 100 million edges in approximately 129 seconds compared to more than 222 hours using Original approach.
However, since it only provides an optimal solution the user cannot control the parameters,
This, however, is mitigated by the fact that the networks resulting from citation, cocitation, or term co-occurrence analysis are usually dense when the categories are used as the unit for each node. Due to this fact, and especially in the case of vast scientific domains with a high number of entities (categories in our case) in the network, Pathfinder is usually parameterized to r = infinity and q = n-1, obtain an schematic representation of the most outstanding existing information by means of a network showing just the most salient links.
One of the drawbacks of this algorithm is that it cannot be applied on directed networks, because the sorting process deals with undirected links. Scaling of Directed networks using Pathfinder algorithm can be implemented in the future.
Once the input file is validated the algorithm is started. It works as follows,
The user has to provide 3 inputs; the file storing the information about the network, the column name that represents the edge weight and how the edge weight should be considered (Dissimilarity or Similarity). The provided network should not contain edge weights of less than equal to 0 else a warning is generated and default edge weight is assumed for that edge. The algorithm assumes the edge weights to be a measure of dissimilarity i.e. More the weight, more is the dissimilarity. If the input edge weights are a measure of Similarity i.e. More the weight, less is the similarity then the user can select Similarity option from the drop-down box. Only in the case of user selecting Similarity the edge weights will be inverted. For example, in a co-authorship network, more number of articles authored by 2 subjects equates to a stronger relationship. In this case the user should select Similarity option. The output will be provided as a .NWB file with a pruned network.
The original Pathfinder Network Scaling algorithm authored by Roger Schvaneveldt et al. was adapted with a MST based approach by Arnaud Quirin et al. is implemented, integrated and documented by Chintan Tank. Also thanks to Micah Linnemeier and Russell Duhon for providing guidance during design and implementation. For the description I acknowledge Wikipedia.