Now select 'BethPlale.nsf' in the Data Manager. Use the following parameters to generate a Horizontal Bar Graph:
Figure 5.6: Funding profile over time of Beth Plale
Finally, select 'MichaelMcRobbie.nsf' in the Data Manager. Use the following parameters to generate a Horizontal Bar Graph:
Figure 5.7: Funding profile over time of Michael McRobbie
The horizontal bar graph visualizations in Figures 5.5, 5.6, and 5.7 make it easy to see the active timespans of the different researchers, as well as the types and volume of grants they generally receive (e.g., many small grants or a handful of large ones). From here, it may be useful to compare their Co-PI networks and look more closely at award totals. Select each dataset in the Data Manager window and run 'Data Preparation > Text Files > Extract Co-Occurrence Network' using these parameters (note that the Aggregation Function File is located in the 'yoursci2directory/sampledata/scientometrics/properties/' folder):
Run 'Visualization > Networks > GUESS' on each generated network to visualize the resulting Co-PI relationships. Select 'GEM' from the layout menu to organize the nodes and edges.
To color and size the nodes and edges using the default Co-PI visualization theme, run 'yoursci2directory/scripts/GUESS/co-PI-nw.py' from 'Script > Run Script ...'.
Figure 5.8: Co-PI network of Geoffrey Fox at Indiana University
Figure 5.9: Co-PI network of Beth Plale at Indiana University
Figure 5.10: Co-PI network of Michael McRobbie at Indiana University
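The 'Extract Co-Occurrence Network' step can be pictured with the plain-Python sketch below. It is not the Sci2 implementation: the record layout (a `pis` field holding investigator names separated by `|`, plus an `amount` field), the function name, and the choice to sum award amounts per investigator (standing in for the aggregation function file) are all illustrative assumptions.

```python
from collections import Counter, defaultdict
from itertools import combinations


def co_occurrence_network(records, field="pis", delimiter="|", amount_field="amount"):
    """Build an undirected co-occurrence network from tabular records.

    Nodes are the distinct names in `field`; two names are linked when they
    appear in the same record, and the edge weight counts shared records.
    Each node also sums `amount_field` over its records (a stand-in for an
    aggregation function such as totaling award amounts).
    """
    totals = defaultdict(float)
    weights = Counter()
    for record in records:
        # Split the multi-valued cell, trim whitespace, drop blanks and repeats.
        names = sorted({n.strip() for n in record[field].split(delimiter) if n.strip()})
        for name in names:
            totals[name] += float(record.get(amount_field, 0))
        # Each unordered pair of names in this record shares one more award.
        for pair in combinations(names, 2):
            weights[pair] += 1
    return dict(totals), dict(weights)


awards = [  # hypothetical award records
    {"pis": "Fox, G | Plale, B", "amount": 300000},
    {"pis": "Fox, G", "amount": 50000},
    {"pis": "Fox, G | Plale, B | Doe, J", "amount": 1000000},
]
nodes, edges = co_occurrence_network(awards)
print(nodes["Fox, G"], edges[("Fox, G", "Plale, B")])  # 1350000.0 2
```

With these three hypothetical awards, 'Fox, G' and 'Plale, B' share two awards, so their edge weight is 2, while the solo award only adds to the 'Fox, G' node total.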
To extract the paper citation network, select the '361 Unique ISI Records' table and run 'Data Preparation > Text Files > Extract Directed Network' using the following parameters:
The result is a directed network of paper citations in the Data Manager. Each paper node has two citation counts. The local citation count (LCC) indicates how often a paper was cited by papers in the set. The global citation count (GCC) equals the times cited (TC) value in the original ISI file. Only references from other ISI records count towards an ISI paper's GCC value. Currently, the Sci2 Tool sets the GCC of references to -1 (except for references that are also ISI records), which makes it possible to prune the network so that it contains only the original ISI records.
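The two counts can be illustrated with a minimal plain-Python sketch, assuming a hypothetical input in which each ISI record has an identifier, a times-cited value (`tc`), and a list of cited identifiers (`refs`). Edges point from the citing to the cited paper; the LCC is counted from citations inside the set, and any referenced node that is not itself an ISI record gets a GCC of -1. The attribute name `localcitationcount` is assumed by analogy with `globalcitationcount`; this is not the Sci2 source code.

```python
from collections import Counter


def build_citation_network(records):
    """Return (edges, attributes) for a directed paper-citation network.

    `records` maps an ISI record id to a dict with "tc" (times cited) and
    "refs" (ids of the works it cites). Edges run citing -> cited.
    """
    edges = [(citing, cited) for citing, rec in records.items() for cited in rec["refs"]]
    # Local citation count: how often each node is cited by papers in this set.
    lcc = Counter(cited for _, cited in edges)
    nodes = set(records) | {cited for _, cited in edges}
    attributes = {}
    for node in nodes:
        # GCC comes from the ISI times-cited field; references that are not
        # ISI records get -1 so they can be pruned later.
        gcc = records[node]["tc"] if node in records else -1
        attributes[node] = {"localcitationcount": lcc[node], "globalcitationcount": gcc}
    return edges, attributes


isi = {  # hypothetical ISI records
    "A": {"tc": 40, "refs": ["B", "X"]},
    "B": {"tc": 12, "refs": ["X"]},
    "C": {"tc": 3, "refs": ["A", "B"]},
}
edges, attrs = build_citation_network(isi)
```

Here 'X' is cited twice inside the set (LCC 2) but, not being an ISI record, has a GCC of -1.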
- Resize Linear > Nodes > globalcitationcount > From: 1 To: 50 > When the nodes have no 'globalcitationcount': 0.1 > Do Resize Linear
- Colorize > Nodes > globalcitationcount > From: To: > When the nodes have no 'globalcitationcount': 0.1 > Do Colorize
- Colorize > Edges > weight > From: (select the "RGB" tab) 127, 193, 65 To: (select the "RGB" tab) 0, 0, 0
Then type the following in the Interpreter:
> for n in g.nodes:
> \[tab\] n.strokecolor = n.color
Alternatively, select the 'Interpreter' tab in the bottom left-hand corner of the GUESS window and enter these command lines:
> resizeLinear(globalcitationcount,1,50)
> colorize(globalcitationcount,gray,black)
> for e in g.edges:
> \[tab\] e.color="127,193,65,255"
Note: The Interpreter tab shows '>>>' as the prompt for these commands, so it is not necessary to type '>' at the beginning of a line. Type each line individually and press 'Enter' to submit it to the Interpreter.
This results in nodes that are linearly sized and color coded by their GCC, connected by green directed edges, as shown in Figure 5.11 (left). Any numeric node attribute within the network can be used to code the nodes; to view the available attributes, mouse over a node. The GUESS interface supports pan and zoom, node selection, and details on demand. For more information, refer to the GUESS tutorial at [http://nwb.slis.indiana.edu/Docs/GettingStartedGUESSNWB.pdf|http://nwb.slis.indiana.edu/Docs/GettingStartedGUESSNWB.pdf].
Figure 5.11: Directed, unweighted paper-paper citation network for 'FourNetSciResearchers' dataset with all papers and references in the GUESS user interface (left) and a pruned paper-paper citation network after removing all references and isolates (right)
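The node sizing done by `resizeLinear(globalcitationcount,1,50)` (or the 'Resize Linear' settings listed earlier) amounts to a linear mapping from the attribute's range onto the size range. The sketch below is an approximation written for illustration, not GUESS source code; it assumes the minimum value maps to the lower bound, the maximum to the upper bound, and nodes lacking the attribute (`None` here) receive the fallback size of 0.1.

```python
def resize_linear(values, low, high, missing=0.1):
    """Map numeric attribute values linearly onto the size range [low, high].

    `values` maps node ids to a number, or to None when the node lacks the
    attribute; such nodes get the `missing` size.
    """
    present = [v for v in values.values() if v is not None]
    if not present:
        return {node: missing for node in values}
    vmin, vmax = min(present), max(present)
    sizes = {}
    for node, v in values.items():
        if v is None:
            sizes[node] = missing
        elif vmax == vmin:
            sizes[node] = low  # all values equal: nothing to interpolate
        else:
            sizes[node] = low + (v - vmin) * (high - low) / (vmax - vmin)
    return sizes


sizes = resize_linear({"A": 40, "B": 12, "C": 3, "X": None}, 1, 50)
```

For values 3, 12, and 40, the most-cited paper gets size 50, the least-cited gets size 1, and the one with 12 citations falls proportionally in between (about 12.9).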
The complete network can be reduced to papers that appeared in the original ISI file by deleting all nodes that have a GCC of -1. Simply run 'Preprocessing > Networks > Extract Nodes Above or Below Value' with parameter values:
The resulting network is disconnected, i.e., it consists of many subnetworks, many of which have only one node. These single unconnected nodes, also called isolates, can be removed using 'Preprocessing > Networks > Delete Isolates'. Deleting isolates is a memory-intensive procedure; if you experience problems at this step, refer to Section 3.3 Memory Allocation.
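The two pruning steps can be sketched in plain Python. The sketch assumes the extraction keeps nodes whose `globalcitationcount` is strictly greater than -1 and keeps only edges whose endpoints both survive; the actual Sci2 parameter values are the ones given for that dialog, and the function names and sample data are hypothetical.

```python
def extract_nodes_above(attributes, edges, attr, threshold):
    """Keep nodes whose `attr` is strictly above `threshold`, plus the
    edges whose endpoints both survive (the induced subnetwork)."""
    kept = {n for n, a in attributes.items() if a[attr] > threshold}
    kept_edges = [(u, v) for u, v in edges if u in kept and v in kept]
    return kept, kept_edges


def delete_isolates(nodes, edges):
    """Drop nodes that are not an endpoint of any edge."""
    connected = {u for u, _ in edges} | {v for _, v in edges}
    return {n for n in nodes if n in connected}


attrs = {  # hypothetical GCC values; "X" is a non-ISI reference
    "A": {"globalcitationcount": 40},
    "B": {"globalcitationcount": 12},
    "C": {"globalcitationcount": 3},
    "D": {"globalcitationcount": 7},
    "X": {"globalcitationcount": -1},
}
edges = [("A", "B"), ("A", "X"), ("B", "X"), ("C", "A"), ("D", "X")]
kept, kept_edges = extract_nodes_above(attrs, edges, "globalcitationcount", -1)
remaining = delete_isolates(kept, kept_edges)
```

In the sample, paper 'D' cites only the non-ISI reference 'X', so pruning leaves it without edges and the isolate step removes it.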
The complete paper-paper-citation network can be split into its subnetworks using 'Analysis > Networks > Unweighted & Directed > Weak Component Clustering' with the default values:
The largest component has 2407 nodes; the second largest, 307; the third, 13; and the fourth has 7 nodes. The largest component is shown in Figure 5.12. The top 20 papers, by times cited in ISI, have been labeled using the following Interpreter commands:
> toptc = g.nodes\[:\]
> def bytc(n1, n2):
> \[tab\] return cmp(n1.globalcitationcount, n2.globalcitationcount)
> toptc.sort(bytc)
> toptc.reverse()
> toptc
> for i in range(0, 20):
> \[tab\] toptc\[i\].labelvisible = true
Alternatively, run 'Script > Run Script' and select 'yoursci2directory/scripts/GUESS/paper-citation-nw.py'.
Figure 5.12: Giant components of the paper citation network
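Weak component clustering treats each citation as an undirected link and groups papers that are connected through any chain of citations. A minimal union-find sketch of that idea follows; it illustrates the concept under these assumptions and is not necessarily the algorithm Sci2 uses internally. The sample network is hypothetical.

```python
def weak_components(nodes, edges):
    """Group nodes into weakly connected components of a directed network.

    Edge direction is ignored. Uses union-find with path halving and
    returns the components as sets, largest first.
    """
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving keeps trees shallow
            n = parent[n]
        return n

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v  # join the two groups

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    return sorted(groups.values(), key=len, reverse=True)


parts = weak_components(
    ["A", "B", "C", "D", "E", "F"],
    [("A", "B"), ("C", "B"), ("D", "E")],
)
sizes = [len(p) for p in parts]
```

Sorting the groups by size gives the same kind of ranking as the component sizes reported above, largest component first.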
To produce a co-authorship network in the Sci2 Tool, select the table of all 361 unique ISI records from the 'FourNetSciResearchers' dataset in the Data Manager window. Run 'Data Preparation > Text Files > Extract Co-Author Network' using the parameter:
The result is two derived files in the Data Manager window: the "Extracted Co-Authorship Network" and an "Author information" table (also known as a "merge table"), which lists the unique authors. To manually examine and edit the list of unique authors, open the merge table in your default spreadsheet program. Select all rows and columns ("label," "timesCited," "numberOfWorks," "uniqueIndex," and "combineValues") and sort by "label." Identify names that refer to the same person. To merge two names, first delete the asterisk ('*') in the "combineValues" column of the duplicate node's row. Then copy the "uniqueIndex" of the name that should be kept and paste it into the "uniqueIndex" cell of the name that should be deleted. Save the revised table as a .csv file and reload it into the Sci2 Tool. Select both the merge table and the network and run 'Data Preparation > Text Files > Update Network by Merging Nodes'. Table 5.2 shows the result of merging "Albet, R" and "Albert, R": "Albet, R" will be deleted and all of its node linkages and citation counts will be added to "Albert, R".
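The extraction and manual-merge workflow can be summarized in a plain-Python sketch. It assumes hypothetical parsed records (an `authors` list and a `tc` times-cited value per paper) and a merge table whose rows carry the `label`, `uniqueIndex`, and `combineValues` columns described above. Summing `numberOfWorks` and `timesCited` on merge and dropping would-be self-loops are assumptions standing in for the aggregation function file; Sci2's actual behavior may differ.

```python
from collections import Counter, defaultdict
from itertools import combinations


def co_author_network(papers):
    """Build an undirected, weighted co-author network.

    Node attributes: number of works and summed times cited per author.
    Edge weight: number of papers the two authors wrote together.
    """
    nodes = defaultdict(lambda: {"numberOfWorks": 0, "timesCited": 0})
    edges = Counter()
    for paper in papers:
        authors = sorted(set(paper["authors"]))  # ignore a name repeated on one paper
        for author in authors:
            nodes[author]["numberOfWorks"] += 1
            nodes[author]["timesCited"] += paper["tc"]
        for pair in combinations(authors, 2):
            edges[pair] += 1
    return dict(nodes), dict(edges)


def merge_nodes(nodes, edges, merge_table):
    """Collapse rows sharing a uniqueIndex into the row marked "*"."""
    keep = {r["uniqueIndex"]: r["label"] for r in merge_table if r["combineValues"] == "*"}
    target = {r["label"]: keep[r["uniqueIndex"]] for r in merge_table}
    merged = {}
    for label, attrs in nodes.items():
        merged.setdefault(target[label], Counter()).update(attrs)  # sum attributes
    merged_edges = Counter()
    for (a, b), weight in edges.items():
        a, b = sorted((target[a], target[b]))
        if a != b:  # skip self-loops a merge would create
            merged_edges[(a, b)] += weight
    return {k: dict(v) for k, v in merged.items()}, dict(merged_edges)


papers = [  # hypothetical parsed ISI records
    {"authors": ["Barabasi, AL", "Albert, R"], "tc": 100},
    {"authors": ["Albet, R", "Barabasi, AL"], "tc": 20},
    {"authors": ["Jeong, H", "Albert, R"], "tc": 5},
]
table = [  # "Albet, R": asterisk deleted, uniqueIndex copied from "Albert, R"
    {"label": "Albert, R", "uniqueIndex": 1, "combineValues": "*"},
    {"label": "Albet, R", "uniqueIndex": 1, "combineValues": ""},
    {"label": "Barabasi, AL", "uniqueIndex": 3, "combineValues": "*"},
    {"label": "Jeong, H", "uniqueIndex": 4, "combineValues": "*"},
]
nodes, edges = co_author_network(papers)
merged_nodes, merged_edges = merge_nodes(nodes, edges, table)
```

As with "Albet, R" in Table 5.2, the misspelled node disappears and its works, citation count, and link to 'Barabasi, AL' are folded into 'Albert, R'.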
A merge table can be automatically generated by applying the Jaro distance metric (Jaro, 1989, 1995) available in the open source Similarity Measure Library (http://sourceforge.net/projects/simmetrics/) to identify potential duplicates. In the Sci2 Tool, simply select the co-author network and run 'Data Preparation > Text Files > Detect Duplicate Nodes' using the parameters:
The result is a merge table with the same format as Table 5.2, together with two text log files:
The log files describe, in a more human-readable form, which nodes will be merged or not merged. Specifically, the first log file provides information regarding which nodes will be merged, while the second log file lists nodes which are similar but will not be merged. The automatically generated merge table can be further modified as needed.
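The split into 'will be merged' and 'similar but not merged' logs can be pictured as two similarity thresholds. The sketch below implements the standard Jaro similarity, (m/|s1| + m/|s2| + (m - t)/m)/3 for m matching characters and t transpositions, in plain Python rather than calling SimMetrics, and compares all label pairs. The thresholds 0.95 and 0.85 and the function names are hypothetical, not the 'Detect Duplicate Nodes' defaults.

```python
from itertools import combinations


def jaro(s1, s2):
    """Standard Jaro similarity: 1.0 for identical strings, 0.0 for no match."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Equal characters count as matching only within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    used1, used2 = [False] * len1, [False] * len2
    m = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not used2[j] and s2[j] == ch:
                used1[i] = used2[j] = True
                m += 1
                break
    if m == 0:
        return 0.0
    # Transpositions: half the matched characters that appear in a different order.
    seq1 = [c for c, u in zip(s1, used1) if u]
    seq2 = [c for c, u in zip(s2, used2) if u]
    t = sum(a != b for a, b in zip(seq1, seq2)) / 2
    return (m / len1 + m / len2 + (m - t) / m) / 3


def detect_duplicates(labels, merge_at=0.95, notice_at=0.85):
    """Sort label pairs into a 'will merge' log and a 'similar, not merged' log."""
    will_merge, not_merged = [], []
    for a, b in combinations(sorted(labels), 2):
        score = jaro(a, b)
        if score >= merge_at:
            will_merge.append((a, b, round(score, 3)))
        elif score >= notice_at:
            not_merged.append((a, b, round(score, 3)))
    return will_merge, not_merged


merge_log, notice_log = detect_duplicates(["Albert, R", "Albet, R", "Zhang, L", "Zhang, Y"])
```

With these thresholds, 'Albert, R' and 'Albet, R' (about 0.963) land in the merge log, while 'Zhang, L' and 'Zhang, Y' (about 0.917) are only reported as similar, which is the kind of pair worth checking by hand.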
To merge identified duplicate nodes, select both the "Extracted Co-Authorship Network" and "Merge Table: based on label" by holding down the 'Ctrl' key. Run 'Data Preparation > Text Files > Update Network by Merging Nodes'. This will produce an updated network as well as a report describing which nodes were merged. To complete this workflow, an aggregation function file must also be selected from the pop-up window:
The updated co-authorship network can be visualized using _'Visualization > Networks > GUESS'_ (see the +GUESS Visualizations+ section for more information regarding GUESS). Figure 5.13 shows the layout of the combined _'FourNetSciResearchers'_ dataset after it was modified using the following commands in the Interpreter:
> resizeLinear(numberofworks,1,50)
> colorize(numberofworks,gray,black)
> for n in g.nodes:
> \[tab\] n.strokecolor = n.color
> resizeLinear(numberofcoauthoredworks, .25, 8)
> colorize(numberofcoauthoredworks, "127,193,65,255", black)
> nodesbynumworks = g.nodes\[:\]
> def bynumworks(n1, n2):
> \[tab\] return cmp(n1.numberofworks, n2.numberofworks)
> nodesbynumworks.sort(bynumworks)
> nodesbynumworks.reverse()
> for i in range(0, 50):
> \[tab\] nodesbynumworks\[i\].labelvisible = true
In the resulting visualization, author nodes are color and size coded by the number of papers per author. Edges are color and thickness coded by the number of times two authors wrote a paper together. The remaining commands identify the top 50 authors with the most papers and make their name labels visible.
Figure 5.13: Undirected, weighted co-author network for 'FourNetSciResearchers' dataset
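The sort-and-label commands above use Python 2's `cmp`-based sort, which the Jython-based GUESS Interpreter accepts. Python 3 has no `cmp`, and the same 'label the top N' logic is usually written with a `key` function instead. In the sketch below, `Node` is a hypothetical stand-in for a GUESS node, with attribute names copied from the Interpreter commands; it is not the GUESS API.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical stand-in for a GUESS node object."""
    name: str
    numberofworks: int
    labelvisible: bool = False


def label_top(nodes, attr, n):
    """Make the labels of the n nodes with the highest `attr` visible."""
    ranked = sorted(nodes, key=lambda node: getattr(node, attr), reverse=True)
    for node in ranked[:n]:
        node.labelvisible = True
    return ranked[:n]


authors = [Node("Albert, R", 3), Node("Jeong, H", 1), Node("Barabasi, AL", 2)]
top = label_top(authors, "numberofworks", 2)
```

Slicing with `ranked[:n]` also avoids the IndexError that indexing with `range(0, 50)` would raise on a list shorter than 50 nodes.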