5.1.1 Mapping Collaboration, Publication, and Funding Profiles of One Researcher (EndNote and NSF Data)

...

Many researchers, tools, and online services use EndNote to organize their bibliographies. To analyze an individual researcher's collaboration and publication profile, load an EndNote file that includes the researcher's entire CV into the Sci2 Tool. To generate a research profile for Katy Börner, load Katy Börner's EndNote CV at 'yoursci2directory/sampledata/scientometrics/endnote/KatyBorner.enw' and run 'Data Preparation > Text Files > Extract Co-Author Network' using the parameter:

After generating Dr. Börner's co-authorship network, run 'Analysis > Networks > Unweighted & Undirected > Node Degree' to append degree information to each node. To visualize the network, run 'Visualization > Networks > GUESS' and select 'GEM' in the Layout menu once the graph is fully loaded. The resulting network in Figure 5.1 was modified using the following workflow:

...
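The co-author extraction and node degree steps can be sketched outside the tool as follows. This is a minimal illustration in Python, not the Sci2 implementation: the function names and the sample author lists are hypothetical, and it assumes each record has already been parsed into a list of author names.

```python
from collections import Counter
from itertools import combinations


def extract_coauthor_network(author_lists):
    """Map each unordered author pair to the number of papers they share."""
    edges = Counter()
    for authors in author_lists:
        # Deduplicate and sort so every pair is stored under one canonical key.
        for pair in combinations(sorted(set(authors)), 2):
            edges[pair] += 1
    return edges


def node_degree(edges):
    """Count the distinct neighbors of each node, ignoring edge weights."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return degree


# Hypothetical records: one list of author names per paper.
papers = [
    ["Author A", "Author B", "Author C"],
    ["Author A", "Author B"],
    ["Author A", "Author D"],
]
edges = extract_coauthor_network(papers)
degree = node_degree(edges)
```

The edge weight records how many papers two authors wrote together, while the degree counts only how many distinct co-authors a node has, which matches the unweighted, undirected reading of the degree step above.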

Load the NSF data 'yoursci2directory/sampledata/scientometrics/nsf/KatyBorner.nsf' using 'File > Load'. Select NSF csv format from the 'Load' pop-up window. Make sure the loaded dataset in the Data Manager window is highlighted in blue, and run 'Data Preparation > Text Files > Extract Co-Occurrence Network' using these parameters:

Select the "Extracted Network on Column All Investigators" network and run 'Analysis > Networks > Network Analysis Toolkit (NAT)' to reveal that there are 13 nodes and 28 edges without isolates in the network. Click on "Extracted Network on Column All Investigators" and select 'Visualization > Networks > GUESS' to visualize the resulting Co-PI network. Select 'GEM' from the layout menu.

...

The Sci2 Tool supports the analysis of evolving networks. For this study, load Alessandro Vespignani's publication history from ISI, which can be downloaded from Thomson's Web of Science or loaded from 'yoursci2directory/sampledata/scientometrics/isi/AlessandroVespignani.isi' using 'File > Load', and select 'ISI scholarly format' in the Load window. Slice the data into five-year intervals from 1990 to 2006 using 'Preprocessing > Temporal > Slice Table by Time' and the following parameters:

...
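The idea behind time slicing can be sketched in Python as follows. This is an illustration only: the function name, the record layout, and the `cumulative` switch are assumptions, and the tool's own algorithm may align or label its slices differently.

```python
def slice_by_time(records, start_year, end_year, interval, cumulative=False):
    """Group records into consecutive year intervals.

    Returns a list of ((first_year, last_year), records) tuples.
    """
    slices = []
    for first in range(start_year, end_year + 1, interval):
        # The last slice is cut short if the range does not divide evenly.
        last = min(first + interval - 1, end_year)
        # A cumulative slice keeps everything from start_year onward.
        lower = start_year if cumulative else first
        chosen = [r for r in records if lower <= r["year"] <= last]
        slices.append(((lower, last), chosen))
    return slices


# Hypothetical publication records.
records = [
    {"title": "Paper 1", "year": 1991},
    {"title": "Paper 2", "year": 1997},
    {"title": "Paper 3", "year": 2003},
    {"title": "Paper 4", "year": 2006},
]
slices = slice_by_time(records, 1990, 2006, 5)
```

Under these assumptions, slicing 1990 to 2006 in five-year steps yields four tables, the last of which covers only 2005 and 2006. Each slice can then be turned into its own network to study how the network evolves.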

To extract the paper citation network, select the '361 Unique ISI Records' table and run 'Data Preparation > Text Files > Extract Directed Network' using the parameters:

The result is a directed network of paper citations in the Data Manager. Each paper node has two citation counts. The local citation count (LCC) indicates how often a paper was cited by papers in the set. The global citation count (GCC) equals the times cited (TC) value in the original ISI file. Only references from other ISI records count towards an ISI paper's GCC value. Currently, the Sci2 Tool sets the GCC of every reference that is not also an ISI record to -1, so that the network can later be pruned to contain only the original ISI records.
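The two citation counts can be illustrated with a small Python sketch. The record layout (`tc` for times cited, `refs` for the cited references) and the identifiers are hypothetical; the sketch only mirrors the description above and is not the tool's code.

```python
from collections import Counter


def citation_counts(isi_records):
    """Compute LCC and GCC for every node of a paper citation network.

    isi_records maps a paper id to {"tc": times cited, "refs": cited ids}.
    """
    lcc = Counter()
    for record in isi_records.values():
        # Count each cited work once per citing paper.
        for cited in set(record["refs"]):
            lcc[cited] += 1

    nodes = {}
    for paper_id in set(isi_records) | set(lcc):
        record = isi_records.get(paper_id)
        # References that are not ISI records themselves get a GCC of -1.
        gcc = record["tc"] if record is not None else -1
        nodes[paper_id] = {"lcc": lcc[paper_id], "gcc": gcc}
    return nodes


# Hypothetical data: P1 and P2 are ISI records, R1 appears only as a reference.
isi_records = {
    "P1": {"tc": 10, "refs": ["P2", "R1"]},
    "P2": {"tc": 5, "refs": ["R1"]},
}
nodes = citation_counts(isi_records)
```

Here "R1" is cited twice within the set, so its LCC is 2, but because it is not one of the loaded ISI records its GCC is -1.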

...

The complete network can be reduced to papers that appeared in the original ISI file by deleting all nodes that have a GCC of -1. Simply run 'Preprocessing > Networks > Extract Nodes Above or Below Value' with parameter values:

The resulting network is unconnected, i.e., it has many subnetworks, many of which have only one node. These single unconnected nodes, also called isolates, can be removed using 'Preprocessing > Networks > Delete Isolates'. Deleting isolates is a memory-intensive procedure. If you experience problems at this step, refer to Section 3.3 Memory Allocation.
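The pruning and isolate removal steps can be sketched together in Python. The function names and sample values are hypothetical, and the threshold of -1 is an assumption based on the description of the GCC attribute; use the parameter values given for the tool when running the actual workflow.

```python
def extract_nodes_above(node_values, edges, threshold):
    """Keep nodes whose value exceeds the threshold, plus edges between them."""
    kept = {node for node, value in node_values.items() if value > threshold}
    kept_edges = [(s, t) for s, t in edges if s in kept and t in kept]
    return kept, kept_edges


def delete_isolates(nodes, edges):
    """Drop every node that is not an endpoint of any edge."""
    connected = {node for edge in edges for node in edge}
    return nodes & connected


# Hypothetical GCC values; R1 is a reference that is not an ISI record.
gcc = {"P1": 10, "P2": 5, "P3": 0, "R1": -1}
edges = [("P1", "P2"), ("P1", "R1"), ("P2", "R1"), ("P3", "R1")]

kept, kept_edges = extract_nodes_above(gcc, edges, -1)
without_isolates = delete_isolates(kept, kept_edges)
```

In this example "P3" survives the GCC filter but loses its only edge when "R1" is removed, so it becomes an isolate and is dropped in the second step.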

...

The result is two derived files in the Data Manager window: the "Extracted Co-Authorship Network" and an "Author information" table (also known as a "merge table"), which lists unique authors. To manually examine and edit the list of unique authors, open the merge table in your default spreadsheet program. In the spreadsheet, select all records, including "label," "timesCited," "numberOfWorks," "uniqueIndex," and "combineValues," and sort by "label." Identify names that refer to the same person. To merge two names, first delete the asterisk ('*') in the "combineValues" column of the duplicate node's row. Then, copy the "uniqueIndex" of the name that should be kept and paste it into the "uniqueIndex" cell of the name that should be deleted. Resave the revised table as a .csv file and reload it. Select both the merge table and the network and run 'Data Preparation > Text Files > Update Network by Merging Nodes'. Table 5.2 shows the result of merging "Albet, R" and "Albert, R": "Albet, R" will be deleted and all of the node linkages and citation counts will be added to "Albert, R".
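The effect of an edited merge table can be sketched in Python. This is a simplified illustration, not the tool's algorithm: the function name, the citation counts, the edge weights, and the co-author "Author, X" are hypothetical, and summing is assumed as the aggregation for both citation counts and edge weights.

```python
from collections import Counter


def merge_nodes(merge_rows, edges):
    """Apply a merge table to node attributes and weighted undirected edges."""
    # The row that still carries '*' names the node to keep for each index.
    keep = {row["uniqueIndex"]: row["label"]
            for row in merge_rows if row["combineValues"] == "*"}
    rename = {row["label"]: keep[row["uniqueIndex"]] for row in merge_rows}

    times_cited = Counter()
    for row in merge_rows:
        times_cited[rename[row["label"]]] += row["timesCited"]

    merged_edges = Counter()
    for (a, b), weight in edges.items():
        a, b = rename[a], rename[b]
        if a != b:  # Skip self-loops created by merging two linked nodes.
            merged_edges[tuple(sorted((a, b)))] += weight
    return times_cited, merged_edges


# Hypothetical merge table after editing: "Albet, R" shares index 1 and has no '*'.
merge_rows = [
    {"label": "Albert, R", "uniqueIndex": 1, "combineValues": "*", "timesCited": 100},
    {"label": "Albet, R", "uniqueIndex": 1, "combineValues": "", "timesCited": 3},
    {"label": "Author, X", "uniqueIndex": 2, "combineValues": "*", "timesCited": 50},
]
edges = {("Albert, R", "Author, X"): 4, ("Albet, R", "Author, X"): 1}
times_cited, merged_edges = merge_nodes(merge_rows, edges)
```

After the merge, "Albet, R" no longer appears: its citation count and its link to "Author, X" have been added to "Albert, R".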

...

A merge table can be automatically generated by applying the Jaro distance metric (Jaro, 1989, 1995) available in the open source Similarity Measure Library (http://sourceforge.net/projects/simmetrics/) to identify potential duplicates. In the Sci2 Tool, simply select the co-author network and run 'Data Preparation > Text Files > Detect Duplicate Nodes' using the parameters:

The result is a merge table that has the very same format as Table 5.2, together with two textual log files:

...
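For readers who want to see how Jaro-based matching scores a pair of names, here is a minimal Python sketch of the standard Jaro similarity. It is written from the textbook definition and is not the SimMetrics code used by the tool; the function name is an assumption, and any cutoff applied to the score would be a parameter choice.

```python
def jaro_similarity(s1, s2):
    """Return the Jaro similarity of two strings, between 0.0 and 1.0."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Characters match only if they are equal and close enough in position.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0

    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2

    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


score = jaro_similarity("Albet, R", "Albert, R")
```

A score close to 1.0, as for "Albet, R" and "Albert, R", suggests that two labels may refer to the same author, whereas unrelated names score much lower. Candidate pairs should still be reviewed before merging.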

To merge identified duplicate nodes, select both the "Extracted Co-Authorship Network" and "Merge Table: based on label" by holding down the 'Ctrl' key. Run 'Data Preparation > Text Files > Update Network by Merging Nodes'. This will produce an updated network as well as a report describing which nodes were merged. To complete this workflow, an aggregation function file must also be selected from the pop-up window:

...

Isolate nodes can be removed by running 'Preprocessing > Networks > Delete Isolates'. The resulting network has 242 nodes and 1,534 edges in 12 weakly connected components.
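The notion of a weakly connected component can be made concrete with a short Python sketch. The function name and sample data are hypothetical; the sketch ignores edge direction and uses a breadth-first search to collect each component.

```python
from collections import defaultdict, deque


def weakly_connected_components(nodes, edges):
    """Return the node sets of all weakly connected components."""
    neighbors = defaultdict(set)
    for source, target in edges:
        # Weak connectivity treats every directed edge as undirected.
        neighbors[source].add(target)
        neighbors[target].add(source)

    seen = set()
    components = []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for nxt in neighbors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        components.append(component)
    return components


# Hypothetical directed network with two components.
nodes = ["a", "b", "c", "d", "e"]
edges = [("a", "b"), ("c", "b"), ("d", "e")]
components = weakly_connected_components(nodes, edges)
```

Counting the returned sets gives the number of components, and a component of size one is an isolate.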

...

To keep only the strongest edges in the "Laid out with DrL" network, run 'Preprocessing > Networks > Extract Top Edges' on the new network using the following parameters:

...
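Keeping only the strongest edges amounts to ranking edges by weight and retaining the top few. The following Python sketch shows that idea under stated assumptions: the function name, the `(source, target, weight)` tuple layout, and the sample values are hypothetical, and the tool's own parameters may select edges differently.

```python
import heapq


def extract_top_edges(edges, top_n):
    """Return the top_n edges with the highest weight, strongest first."""
    return heapq.nlargest(top_n, edges, key=lambda edge: edge[2])


# Hypothetical weighted edges as (source, target, weight) tuples.
edges = [
    ("a", "b", 0.9),
    ("a", "c", 0.2),
    ("b", "c", 0.7),
    ("c", "d", 0.4),
]
strongest = extract_top_edges(edges, 2)
```

Nodes are left untouched by this step, so a node whose edges all fall below the cut remains in the network as an isolate.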