5.1.4 Studying Four Major NetSci Researchers (ISI Data)

...

5.1.4.1 Paper-Paper (Citation) Network

Load the file using 'File > Load' and following this path: 'yoursci2directory/sampledata/scientometrics/isi/FourNetSciResearchers.isi'. A table of all records and a table of 361 records with unique ISI ids will appear in the Data Manager. In this "clean" file, each original record now has a "Cite Me As" attribute that is constructed from the first author, publication year (PY), journal abbreviation (J9), volume (VL), and beginning page (BP) fields of its ISI record. This "Cite Me As" attribute will be used when matching paper and reference records.
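
To make the construction of the "Cite Me As" attribute concrete, here is a minimal Python sketch. It is not the Sci2 implementation: the function name, the dictionary layout of the record, the "V"/"P" prefixes, and the comma-separated output format are all assumptions chosen to resemble a typical ISI cited-reference string. Only the five contributing fields come from the text above.

```python
def cite_me_as(record):
    """Build a citation key from a parsed ISI record (illustrative only).

    `record` is a hypothetical dict keyed by ISI field tags:
    AU (list of authors), PY, J9, VL, BP.
    """
    # First author only; drop the comma between surname and initials.
    first_author = record["AU"][0].replace(",", "")
    parts = [first_author, record.get("PY"), record.get("J9")]
    # Volume and beginning page are optional; the prefixes are an assumed format.
    if record.get("VL"):
        parts.append("V" + record["VL"])
    if record.get("BP"):
        parts.append("P" + record["BP"])
    return ", ".join(p for p in parts if p)


# A made-up record, not taken from the FourNetSciResearchers file.
example = {"AU": ["Doe, J", "Roe, R"], "PY": "2001",
           "J9": "J EXAMPLE", "VL": "12", "BP": "345"}
print(cite_me_as(example))  # Doe J, 2001, J EXAMPLE, V12, P345
```

Because a key of this kind is built from the same fields that appear in a cited-reference entry, a paper record and a reference to that paper can be matched by comparing strings.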

...

To view the complete network, select the "Network with directed edges from Cited References to Cite Me As" in the Data Manager and run 'Visualization > Networks > GUESS' and wait until the network is visible and centered. Because the FourNetSciResearchers dataset is so large, the visualization will take some time to load, even on powerful systems.

...

To produce a co-authorship network in the Sci2 Tool, select the table of all 361 unique ISI records from the 'FourNetSciResearchers' dataset in the Data Manager window. Run 'Data Preparation > Text Files > Extract Co-Author Network' using the parameter:

...

The result is two derived files in the Data Manager window: the "Extracted Co-Authorship Network" and an "Author information" table (also known as a "merge table"), which lists unique authors. In order to manually examine and edit the list of unique authors, open the merge table in your default spreadsheet program. In the spreadsheet, select all records, including "label," "timesCited," "numberOfWorks," "uniqueIndex," and "combineValues," and sort by "label." Identify names that refer to the same person. In order to merge two names, first delete the asterisk ('*') in the "combineValues" column of the duplicate node's row. Then, copy the "uniqueIndex" of the name that should be kept and paste it into the cell of the name that should be deleted. Resave the revised table as a .csv file and reload it. Select both the merge table and the network and run 'Data Preparation > Text Files > Update Network by Merging Nodes'. Table 5.2 shows the result of merging "Albet, R" and "Albert, R": "Albet, R" will be deleted and all of the node linkages and citation counts will be added to "Albert, R".
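
The effect of the merge table can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the code behind 'Update Network by Merging Nodes': the row and edge layouts are hypothetical, the citation counts and edge weights below are invented toy values, and summing is assumed as the aggregation for both.

```python
from collections import defaultdict


def merge_nodes(merge_rows, edges):
    """Collapse rows that share a uniqueIndex onto the row marked '*'."""
    # The row that still carries the asterisk is the name to keep.
    keep = {row["uniqueIndex"]: row["label"]
            for row in merge_rows if row["combineValues"] == "*"}
    relabel = {row["label"]: keep[row["uniqueIndex"]] for row in merge_rows}

    # Add the citation counts of duplicates to the surviving node.
    times_cited = defaultdict(int)
    for row in merge_rows:
        times_cited[relabel[row["label"]]] += row["timesCited"]

    # Redirect edges to the surviving node and sum parallel edge weights.
    merged_edges = defaultdict(int)
    for a, b, weight in edges:
        a, b = relabel[a], relabel[b]
        if a != b:  # drop self-loops created by the merge
            merged_edges[tuple(sorted((a, b)))] += weight
    return dict(times_cited), dict(merged_edges)


# Toy values for illustration only.
rows = [
    {"label": "Albert, R", "uniqueIndex": 1, "combineValues": "*", "timesCited": 10},
    {"label": "Albet, R", "uniqueIndex": 1, "combineValues": "", "timesCited": 2},
    {"label": "Barabasi, AL", "uniqueIndex": 2, "combineValues": "*", "timesCited": 20},
]
edges = [("Albet, R", "Barabasi, AL", 1), ("Albert, R", "Barabasi, AL", 3)]
cited, merged = merge_nodes(rows, edges)
print(cited)   # {'Albert, R': 12, 'Barabasi, AL': 20}
print(merged)  # {('Albert, R', 'Barabasi, AL'): 4}
```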

...

A merge table can be automatically generated by applying the Jaro distance metric (Jaro, 1989, 1995) available in the open source Similarity Measure Library (http://sourceforge.net/projects/simmetrics/) to identify potential duplicates. In the Sci2 Tool, simply select the co-author network and run 'Data Preparation > Text Files > Detect Duplicate Nodes' using the parameters:
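
For readers unfamiliar with the metric, the following is a textbook Jaro similarity written from scratch in Python. It is not the SimMetrics code and makes no claim about how Sci2 preprocesses labels (for example, case handling) or which thresholds it applies; it only shows why near-identical author names score close to 1.

```python
def jaro(s1, s2):
    """Return the Jaro similarity of two strings, between 0.0 and 1.0."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Characters match only if they are equal and at most `window` apart.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0

    # Matched characters that appear in a different order count as
    # half a transposition each.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2

    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


print(round(jaro("MARTHA", "MARHTA"), 4))       # 0.9444
print(round(jaro("Albet, R", "Albert, R"), 4))  # 0.963
```

A pair such as "Albet, R" and "Albert, R" scores about 0.96, so it would be flagged as a likely duplicate under any threshold below that value.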

...

In sum, unification of author names can be done manually or automatically, independently or in conjunction with other data manipulation. It is recommended that users create the initial merge table automatically and fine-tune it as needed. Note that the same procedure can be used to identify duplicate references – simply select a paper-citation network and run 'Data Preparation > Text Files > Detect Duplicate Nodes' using the same parameters as above and a merge table for references will be created.

To merge identified duplicate nodes, select both the "Extracted Co-Authorship Network" and "Merge Table: based on label" by holding down the 'Ctrl' key. Run 'Data Preparation > Text Files > Update Network by Merging Nodes'. This will produce an updated network as well as a report describing which nodes were merged. To complete this workflow, an aggregation function file must also be selected from the pop-up window:

...

In Sci2, a bibliographic coupling network is derived from a directed paper citation network (see section 4.9.1.1. Document-Document (Citation) Network).

Load the file using 'File > Load' and following this path: 'yoursci2directory/sampledata/scientometrics/isi/FourNetSciResearchers.isi'. Choose "ISI scholarly format" in the pop-up 'Load' window. A table of all records and a table of 361 records with unique ISI ids will appear in the Data Manager.

Select the "361 Unique ISI Records" in the Data Manager and run 'Data Preparation > Text Files > Extract Paper Citation Network.' Select "Extracted Paper Citation Network" and run 'Data Preparation > Text Files > Extract Reference Co-Occurrence (Bibliographic Coupling) Network.'
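
Conceptually, two papers are bibliographically coupled when they cite at least one common reference, and the edge weight is the number of references they share. The sketch below illustrates that idea in Python; it is not the Sci2 algorithm, and the input layout (a list of hypothetical `(citing, cited)` pairs) is an assumption.

```python
from collections import defaultdict
from itertools import combinations


def bibliographic_coupling(citations):
    """Return {(paper_a, paper_b): number_of_shared_references}."""
    # For every reference, collect the set of papers that cite it.
    cited_by = defaultdict(set)
    for citing, cited in citations:
        cited_by[cited].add(citing)

    # Every pair of papers citing the same reference gains one unit of weight.
    weights = defaultdict(int)
    for citers in cited_by.values():
        for a, b in combinations(sorted(citers), 2):
            weights[(a, b)] += 1
    return dict(weights)


# Toy citation pairs: P1 and P2 share two references, P3 shares one with each.
toy_citations = [("P1", "R1"), ("P2", "R1"),
                 ("P1", "R2"), ("P2", "R2"), ("P3", "R2")]
print(bibliographic_coupling(toy_citations))
# {('P1', 'P2'): 2, ('P1', 'P3'): 1, ('P2', 'P3'): 1}
```

In this toy example the references R1 and R2 receive no coupling edges because they cite nothing themselves, so any nodes kept for them would appear as isolates.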

Running 'Analysis > Networks > Network Analysis Toolkit (NAT)' reveals that the network has 5,342 nodes (5,013 of which are isolate nodes) and 6,277 edges.

...

5.1.4.4 Document Co-Citation Network (DCA)

Load the file using 'File > Load' and following this path: 'yoursci2directory/sampledata/scientometrics/isi/FourNetSciResearchers.isi'. Choose "ISI scholarly format" in the pop-up 'Load' window. A table of all records and a table of 361 records with unique ISI ids will appear in the Data Manager.

Select the "361 Unique ISI Records" and run 'Data Preparation > Text Files > Extract Document Co-Citation Network.' The co-citation network will have 5,335 nodes (213 of which are isolates) and 193,039 edges. Isolates can be removed by running 'Preprocessing > Networks > Delete Isolates.' The resulting network has 5,122 nodes and 193,039 edges – and is too dense for display in GUESS. Edges with low weights can be eliminated by running 'Preprocessing > Networks > Extract Edges Above or Below Value' with parameter values:
     Extract from this number: 4
     Below?: # leave unchecked
     Numeric Attribute: weight
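
The two steps above (building co-citation weights, then thresholding them) can be sketched in Python as follows. This is an illustration under assumptions, not the tool's code: the function names and input layout are hypothetical, and whether Sci2 treats the cutoff as strictly above or as inclusive is not stated here, so the sketch assumes a strict comparison.

```python
from collections import defaultdict
from itertools import combinations


def cocitation(reference_lists):
    """Count how often each pair of documents appears in the same reference list."""
    weights = defaultdict(int)
    for refs in reference_lists:
        for a, b in combinations(sorted(set(refs)), 2):
            weights[(a, b)] += 1
    return dict(weights)


def extract_edges(weights, value, below=False):
    """Keep edges whose weight is above `value` (or below it if `below` is set).

    Assumption: the comparison is strict in both directions.
    """
    if below:
        return {edge: w for edge, w in weights.items() if w < value}
    return {edge: w for edge, w in weights.items() if w > value}


# Toy reference lists of three citing papers.
toy_lists = [["A", "B", "C"], ["A", "B"], ["B", "C"]]
toy_weights = cocitation(toy_lists)
print(toy_weights)                    # {('A', 'B'): 2, ('A', 'C'): 1, ('B', 'C'): 2}
print(extract_edges(toy_weights, 1))  # {('A', 'B'): 2, ('B', 'C'): 2}
```

With "Below?" left unchecked and the value set to 4, the equivalent call in this sketch would be `extract_edges(weights, 4)`, which keeps only pairs co-cited more than four times.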

...

The result is a derived table – "with normalized Abstract" – in which the text in the abstract column is normalized. Select this table and run 'Data Preparation > Text Files > Extract Word Co-Occurrence Network' using parameters:

...

There are 354 isolated nodes that can be removed by running 'Preprocessing > Networks > Delete Isolates' on the Co-Word Occurrence network. Note that when isolates are removed, papers without abstracts are removed along with the keywords.
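
An isolate is simply a node with no edges. As a minimal sketch of what deleting isolates amounts to (the function name and the node and edge layouts are hypothetical, not Sci2's internal representation):

```python
def delete_isolates(nodes, edges):
    """Split nodes into those that touch at least one edge and those that do not."""
    connected = {node for edge in edges for node in edge}
    kept = [n for n in nodes if n in connected]
    removed = [n for n in nodes if n not in connected]
    return kept, removed


# Toy word network: "the" co-occurs with nothing and is dropped.
kept, removed = delete_isolates(["network", "graph", "the"],
                                [("network", "graph")])
print(kept)     # ['network', 'graph']
print(removed)  # ['the']
```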

...