5.1.4 Studying Four Major NetSci Researchers (ISI Data)

...

Info
Aggregate Function File

Make sure to use the aggregate function file indicated in the image below. Aggregate function files can be found in sci2/sampledata/scientometrics/properties.

The result is a directed network of paper citations in the Data Manager. Each paper node has two citation counts. The local citation count (LCC) indicates how often a paper was cited by papers in the set. The global citation count (GCC) equals the times cited (TC) value in the original ISI file. Only references from other ISI records count towards an ISI paper's GCC value. Currently, the Sci2 Tool sets the GCC of references that are not themselves ISI records to -1, so that the network can be pruned to contain only the original ISI records.
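The local citation count can be illustrated with a minimal Python sketch. It is not the Sci2 implementation; the edge list and the function name `local_citation_counts` are hypothetical and stand in for a citation network in which every edge points from a citing paper in the set to the paper it cites.

```python
from collections import Counter

# Hypothetical toy data: (citing paper, cited paper) pairs.
citations = [("A", "B"), ("A", "C"), ("B", "C"), ("D", "C")]


def local_citation_counts(edges):
    """Return the number of incoming citations for every node in the edge list."""
    counts = Counter(cited for _, cited in edges)
    nodes = {node for edge in edges for node in edge}
    # Papers that are never cited within the set get an LCC of 0.
    return {node: counts.get(node, 0) for node in nodes}


lcc = local_citation_counts(citations)
```

Under these assumptions the LCC is simply the in-degree of a node, whereas the GCC comes from the TC field of the ISI record and cannot be derived from the network alone.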

...

The complete network can be reduced to papers that appeared in the original ISI file by deleting all nodes that have a GCC of -1. Simply run 'Preprocessing > Networks > Extract Nodes Above or Below Value' with parameter values:

[Image: parameter values for 'Extract Nodes Above or Below Value']

The resulting network is unconnected, i.e., it consists of many subnetworks, many of which have only one node. These single unconnected nodes, also called isolates, can be removed using 'Preprocessing > Networks > Delete Isolates'. Deleting isolates is a memory-intensive procedure. If you experience problems at this step, refer to Section 3.4 Memory Allocation.
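The two pruning steps described above (dropping nodes with a GCC of -1, then deleting isolates) can be sketched in plain Python. This is only an illustration of the logic, not of how the Sci2 algorithms are implemented; the function `prune_network` and the toy data are hypothetical.

```python
def prune_network(gcc, edges):
    """Drop nodes whose GCC is -1, then drop nodes left without any edge."""
    # Step 1: keep only nodes that appeared in the original ISI file.
    kept = {node for node, value in gcc.items() if value != -1}
    kept_edges = [(u, v) for u, v in edges if u in kept and v in kept]
    # Step 2: delete isolates, i.e. nodes that no remaining edge touches.
    connected = {node for edge in kept_edges for node in edge}
    return kept & connected, kept_edges


# Hypothetical toy data: "C" is a reference that is not an ISI record.
gcc = {"A": 10, "B": 3, "C": -1, "D": 0}
edges = [("A", "B"), ("A", "C"), ("D", "C")]

nodes, remaining_edges = prune_network(gcc, edges)
```

In this example, removing "C" leaves "D" without any edges, so "D" is deleted as an isolate in the second step.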

...

5.1.4.2 Author Co-Occurrence (Co-Author) Network

To produce a co-authorship network in the Sci2 Tool, select the table of all 361 unique ISI records from the 'FourNetSciResearchers' dataset in the Data Manager window. Run 'Data Preparation > Extract Co-Author Network' using the parameter:

...

Table 5.2: Merging of author nodes using the merge table

A merge table can be generated automatically by applying the Jaro distance metric (Jaro, 1989, 1995), available in the open source Similarity Measure Library (http://sourceforge.net/projects/simmetrics/), to identify potential duplicates. In the Sci2 Tool, simply select the co-author network and run 'Data Preparation > Detect Duplicate Nodes' using the parameters:

...

Visualize this second output file with 'Visualization > General > GnuPlot':

Community Detection

Community detection algorithms look for subgraphs whose nodes are highly interconnected among themselves and poorly connected with nodes outside the subgraph. Many community detection algorithms are based on optimizing modularity, a scalar value between -1 and 1 that measures the density of links inside communities as compared to links between communities. The Blondel Community Detection algorithm finds high-modularity partitions of large networks in a short time and unfolds a complete hierarchical community structure for the network, thereby giving access to different resolutions of community detection.
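To make the modularity measure concrete, the following Python sketch computes it for a given partition of an undirected, unweighted network. It evaluates the standard modularity formula only and does not implement the Blondel algorithm itself; the function name and the toy network are hypothetical.

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity of a partition of an undirected, unweighted network.

    `edges` is a list of (u, v) pairs and `community` maps each node
    to a community label.
    """
    m = len(edges)
    internal = defaultdict(int)    # edges with both ends in the community
    degree_sum = defaultdict(int)  # summed degree of the community's nodes
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum
    )


# Hypothetical toy network: two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {node: "a" for node in range(6)}

q_two = modularity(edges, two_groups)
q_one = modularity(edges, one_group)
```

Splitting the toy network into its two triangles yields a higher modularity than placing all nodes in one community, which is the kind of difference a modularity-optimizing algorithm exploits.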

...

5.1.4.3 Cited Reference Co-Occurrence (Bibliographic Coupling) Network

In Sci2, a bibliographic coupling network is derived from a directed paper citation network (see section 4.9.1.1.1 Document-Document (Citation) Network).
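The derivation can be sketched as follows, assuming the citation network is available as a list of (citing paper, cited reference) pairs. Two papers are coupled when they share at least one reference, and the edge weight is the number of shared references. The function name and data are hypothetical, and the sketch does not reflect Sci2's internal implementation.

```python
from collections import defaultdict
from itertools import combinations


def bibliographic_coupling(citations):
    """Return {(paper_a, paper_b): number of shared references}."""
    references = defaultdict(set)
    for citing, cited in citations:
        references[citing].add(cited)
    coupling = {}
    # Compare the reference sets of every pair of citing papers.
    for a, b in combinations(sorted(references), 2):
        shared = len(references[a] & references[b])
        if shared:
            coupling[(a, b)] = shared
    return coupling


# Hypothetical toy data: papers P1..P4 citing references R1..R3.
citations = [
    ("P1", "R1"), ("P1", "R2"),
    ("P2", "R1"), ("P2", "R2"),
    ("P3", "R2"),
    ("P4", "R3"),
]

coupling = bibliographic_coupling(citations)
```

The pairwise comparison is quadratic in the number of citing papers, which is acceptable for a few hundred records but would need a reference-indexed approach for much larger datasets.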

...

Note

In the Sci2 Tool, select "361 unique ISI Records" from the 'FourNetSciResearchers' dataset in the Data Manager. Run 'Preprocessing > Topical > Lowercase, Tokenize, Stem, and Stopword Text' using the following parameters:

...

Warning

The database plugin is not currently available for the most recent version of Sci2 (v1.0 alpha). However, the plugin that allows files to be loaded as databases is available for Sci2 v0.5.2 alpha or older. Please check the Sci2 news page (https://sci2.cns.iu.edu/user/news.php). We will update this page when a database plugin becomes available for the latest version of the tool.

The Sci2 Tool supports the creation of databases from ISI files. Database loading improves the speed and functionality of data preparation and preprocessing. While the initial loading can take quite some time for larger datasets (see sections 3.4 Memory Allocation and 3.5 Memory Limits), it results in vastly faster and more powerful data processing and extraction.

...

Figure 5.21: Longitudinal study of 'FourNetSciResearchers,' visualized in GUESS

Sci2's database functionality allows for several network extractions that cannot be achieved with the text-based algorithms. For example, extracting journal co-citation networks reveals which journals are cited together most frequently. Run 'Data Preparation > Database > ISI > Extract Document Co-Citation Network (Core and References)' on the database to create a network of co-cited journals, and then prune it using 'Preprocessing > Networks > Extract Edges Above or Below Value' with the parameters:

...