Make sure to use the aggregate function file indicated in the image below. Aggregate function files can be found in sci2/sampledata/scientometrics/properties.
The result is a directed network of paper citations in the Data Manager. Each paper node has two citation counts. The local citation count (LCC) indicates how often a paper was cited by papers in the set. The global citation count (GCC) equals the times cited (TC) value in the original ISI file. Only references from other ISI records count towards an ISI paper's GCC value. Currently, the Sci2 Tool sets the GCC of references to -1 (except for references that are also ISI records) so that the network can be pruned to contain only the original ISI records.
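To make the two counts concrete, here is a minimal Python sketch. It assumes a toy citation edge list where each pair (citing, cited) points from a citing paper to a cited paper, plus a `times_cited` dict holding TC values for the original ISI records. All data and names are hypothetical. LCC counts incoming edges from papers in the set, and references that are not themselves ISI records get a GCC of -1, which is what lets a later filter keep only the original records.

```python
from collections import Counter

# Hypothetical toy data: citation edges point from the citing paper to the cited paper.
isi_records = {"A", "B", "C"}              # papers present in the original ISI file
times_cited = {"A": 12, "B": 3, "C": 0}    # made-up TC values from the ISI file
citations = [("B", "A"), ("C", "A"), ("C", "B"), ("C", "X")]  # "X" is a non-ISI reference


def citation_counts(isi_records, times_cited, citations):
    """Return {paper: (LCC, GCC)} for every node in the citation network."""
    nodes = set(isi_records) | {p for edge in citations for p in edge}
    # LCC: how often a paper is cited by papers in the set.
    lcc = Counter(cited for citing, cited in citations if citing in isi_records)
    counts = {}
    for paper in nodes:
        # GCC: the TC value for ISI records; -1 for references that are not ISI records.
        gcc = times_cited.get(paper, 0) if paper in isi_records else -1
        counts[paper] = (lcc[paper], gcc)
    return counts


print(sorted(citation_counts(isi_records, times_cited, citations).items()))
```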
- Resize Linear > Nodes > globalcitationcount > From: 1 To: 50 > When the nodes have no 'globalcitationcount': 0.1 > Do Resize Linear
- Colorize > Nodes > globalcitationcount > From: To: > When the nodes have no 'globalcitationcount': 0.1 > Do Colorize
- Colorize > Edges > weight > From: (select the "RGB" tab) 127, 193, 65 To: (select the "RGB" tab) 0, 0, 0
- Type in Interpreter:
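Both Resize Linear and Colorize map an attribute linearly onto a target range: node sizes from 1 to 50, or a blend between two colors. The Python sketch below is not the GUESS interpreter command. It is only an illustration with hypothetical function names, and it rests on three assumptions: the smallest attribute value maps to the 'From' setting, the largest maps to the 'To' setting, and a missing value is replaced by the fallback value before scaling.

```python
def scale_linear(values, lo, hi, missing=0.1):
    """Map each value linearly from [min, max] of the data onto [lo, hi].

    None entries are replaced by `missing` first (an assumption about how the
    "When the nodes have no ..." fallback is applied).
    """
    filled = [missing if v is None else v for v in values]
    vmin, vmax = min(filled), max(filled)
    span = vmax - vmin
    if span == 0:
        return [lo for _ in filled]  # all values equal: use the low end of the range
    return [lo + (v - vmin) / span * (hi - lo) for v in filled]


def interpolate_rgb(t, start=(127, 193, 65), end=(0, 0, 0)):
    """Blend two RGB colors; t=0 gives `start`, t=1 gives `end`."""
    return tuple(round(s + t * (e - s)) for s, e in zip(start, end))


sizes = scale_linear([0, 5, 10, None], 1, 50)
edge_colors = [interpolate_rgb(t) for t in scale_linear([1, 2, 3], 0, 1)]
print(sizes)
print(edge_colors)
```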
The complete network can be reduced to papers that appeared in the original ISI file by deleting all nodes that have a GCC of -1. Simply run 'Preprocessing > Networks > Extract Nodes Above or Below Value' with parameter values:
The resulting network is disconnected, i.e., it consists of many subnetworks, many of which have only one node. These single unconnected nodes, also called isolates, can be removed using 'Preprocessing > Networks > Delete Isolates'. Deleting isolates is a memory-intensive procedure. If you experience problems at this step, refer to Section 3.4 Memory Allocation.
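The two pruning steps can be sketched in Python as follows. This is a minimal illustration, not Sci2's implementation. It assumes the network is given as a hypothetical `{node: GCC}` dict plus a list of directed edges, and that the node filter keeps nodes whose GCC is above -1.

```python
def prune_network(gcc, edges):
    """Keep nodes whose GCC is above -1, then drop nodes left without edges.

    gcc: {node: global citation count}; edges: list of (citing, cited) pairs.
    """
    # Step 1: keep only original ISI records (GCC above -1) and the edges among them.
    kept = {n for n, value in gcc.items() if value > -1}
    kept_edges = [(u, v) for u, v in edges if u in kept and v in kept]
    # Step 2: delete isolates, i.e. nodes with no remaining edges.
    connected = {n for edge in kept_edges for n in edge}
    return connected, kept_edges


gcc = {"A": 12, "B": 3, "C": 0, "D": 7, "X": -1}
edges = [("B", "A"), ("C", "A"), ("C", "X"), ("D", "X")]
nodes, remaining = prune_network(gcc, edges)
print(sorted(nodes), remaining)
```

Here "D" survives the first step but becomes an isolate once the edge to the non-ISI reference "X" is gone, so the second step removes it.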
To produce a co-authorship network in the Sci2 Tool, select the table of all 361 unique ISI records from the 'FourNetSciResearchers' dataset in the Data Manager window. Run 'Data Preparation > Extract Co-Author Network' using the parameter:
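Conceptually, a co-author network links every pair of authors who appear together on a paper, and each edge weight counts the papers they share. The following is a minimal Python sketch that assumes each record has already been reduced to a list of author names. The records are hypothetical, and the real extractor's handling of weights and node attributes may differ.

```python
from collections import Counter
from itertools import combinations


def coauthor_network(records):
    """Return (papers per author, co-authored papers per author pair)."""
    papers_per_author = Counter()
    pair_weights = Counter()
    for authors in records:
        unique = sorted(set(authors))          # ignore repeated names within one record
        papers_per_author.update(unique)
        for a, b in combinations(unique, 2):   # one undirected edge per author pair
            pair_weights[(a, b)] += 1
    return papers_per_author, pair_weights


records = [["Lee, K", "Park, S"], ["Lee, K", "Park, S", "Cho, M"], ["Cho, M"]]
nodes, edges = coauthor_network(records)
print(nodes)
print(edges)
```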
Table 5.2: Merging of author nodes using the merge table
A merge table can be generated automatically by applying the Jaro distance metric (Jaro, 1989, 1995), available in the open source Similarity Measure Library (http://sourceforge.net/projects/simmetrics/), to identify potential duplicates. In the Sci2 Tool, simply select the co-author network and run 'Data Preparation > Detect Duplicate Nodes' using the parameters:
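For intuition about how the Jaro metric scores name pairs, here is an independent Python implementation of the standard Jaro similarity. It is not the SimMetrics code that Sci2 calls, and the `candidate_duplicates` helper and its 0.9 threshold are hypothetical. The metric counts characters that match within a sliding window, then penalizes matched characters that appear in a different order.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]; 1.0 means identical strings."""
    if s1 == s2:
        return 1.0
    if not s1 or not s2:
        return 0.0
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    matched1 = [False] * len(s1)
    matched2 = [False] * len(s2)
    matches = 0
    # Count characters of s1 that match an unused character of s2 within the window.
    for i, ch in enumerate(s1):
        start, end = max(0, i - window), min(len(s2), i + window + 1)
        for j in range(start, end):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched characters that appear in a different order, halved.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) / 2
    return (matches / len(s1) + matches / len(s2)
            + (matches - transpositions) / matches) / 3


def candidate_duplicates(labels, threshold=0.9):
    """Hypothetical helper: label pairs whose Jaro similarity meets the threshold."""
    pairs = []
    for i, a in enumerate(labels):
        for b in labels[i + 1:]:
            score = jaro(a, b)
            if score >= threshold:
                pairs.append((a, b, round(score, 3)))
    return pairs


print(candidate_duplicates(["martha", "marhta", "xyz"]))
```

Pairs flagged this way are only candidates. A merge table still needs a human check before author nodes are combined.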
This second type of output file is particularly suitable for studying skewed distributions. Because the bins grow wider for larger degree values, each wide bin collects enough of the few high-degree nodes to compensate for their scarcity, which suppresses the fluctuations one would observe with bins of equal size. On a double logarithmic scale, which is very useful for determining whether the distribution follows a power law, the points of the binned distribution appear equally spaced on the x-axis.
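The following Python sketch shows logarithmic binning. It assumes bin edges that grow by a constant factor (2 here, a hypothetical choice), so the tool's own bin boundaries may differ. Each bin's count is divided by the bin width and by the number of nodes, which makes bins of different widths comparable. Each bin is reported at its geometric center, which is why the points land equally spaced on a logarithmic x-axis.

```python
import bisect
import math
from collections import Counter


def log_binned_distribution(degrees, factor=2.0):
    """Return (bin_center, density) pairs for positive degrees using log-spaced bins."""
    positive = [d for d in degrees if d > 0]  # degree 0 has no place on a log axis
    if not positive:
        return []
    n = len(positive)
    edges = [1.0]
    while edges[-1] <= max(positive):
        edges.append(edges[-1] * factor)      # bin edges 1, 2, 4, 8, ... for factor 2
    # Bin k covers [edges[k], edges[k + 1]).
    counts = Counter(bisect.bisect_right(edges, d) - 1 for d in positive)
    result = []
    for k in sorted(counts):
        lo, hi = edges[k], edges[k + 1]
        # Divide by bin width so wide bins are not inflated, and by n to normalize.
        result.append((math.sqrt(lo * hi), counts[k] / (n * (hi - lo))))
    return result


print(log_binned_distribution([1, 1, 1, 1, 2, 2, 3, 4]))
```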
Also visualize this second output file with 'Visualization > General > GnuPlot':
Community detection algorithms look for subgraphs whose nodes are highly interconnected among themselves and poorly connected with nodes outside the subgraph. Many community detection algorithms are based on the optimization of modularity, a scalar value between -1 and 1 that measures the density of links inside communities as compared to links between communities. The Blondel Community Detection algorithm finds high-modularity partitions of large networks in a short time and unfolds a complete hierarchical community structure for the network, thereby giving access to different resolutions of community detection.
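Methods like Blondel's (also known as the Louvain method) repeatedly move nodes between communities to increase modularity. The Python sketch below does not implement that algorithm. It only computes modularity for a given partition of an undirected, unweighted edge list, using the standard Newman-Girvan definition, so you can see what quantity is being optimized.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman-Girvan modularity Q of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs; community: {node: community id}.
    Q = sum over communities c of [L_c / m - (d_c / (2m))**2],
    where L_c is the number of edges inside c and d_c is the total degree of c.
    """
    m = len(edges)
    inside = defaultdict(int)
    degree_sum = defaultdict(int)
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            inside[community[u]] += 1
    return sum(inside[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
print(modularity(edges, split))
```

Splitting the graph at the bridge yields a positive Q, while putting every node in one community yields Q = 0. A modularity-optimizing method would therefore prefer the split.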
Cited Reference Co-Occurrence (Bibliographic Coupling) Network
In Sci2, a bibliographic coupling network is derived from a directed paper citation network (see the section Document-Document (Citation) Network).
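Two papers are bibliographically coupled when they cite the same reference. The Python sketch below derives coupling edges from a directed (citing, cited) edge list and weights each edge by the number of shared references. The data is hypothetical, and whether Sci2 weights edges exactly this way is an assumption.

```python
from collections import Counter, defaultdict
from itertools import combinations


def bibliographic_coupling(citations):
    """Weighted coupling edges from directed (citing, cited) pairs.

    Two citing papers are linked once for every reference they share.
    """
    citers_of = defaultdict(set)
    for citing, cited in citations:
        citers_of[cited].add(citing)
    coupling = Counter()
    for citers in citers_of.values():
        # Every pair of papers citing this reference shares it.
        for a, b in combinations(sorted(citers), 2):
            coupling[(a, b)] += 1
    return coupling


citations = [("P1", "R1"), ("P2", "R1"), ("P1", "R2"), ("P2", "R2"), ("P3", "R2")]
print(bibliographic_coupling(citations))
```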
In the Sci2 Tool, select "361 unique ISI Records" from the 'FourNetSciResearchers' dataset in the Data Manager. Run 'Preprocessing > Topical > Lowercase, Tokenize, Stem, and Stopword Text' using the following parameters:
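To illustrate what the four operations do to a title, here is a minimal Python sketch. The stopword set is a tiny hypothetical sample. `crude_stem` is a deliberately simple suffix stripper that stands in for a real stemmer such as Porter's; it is not the stemmer Sci2 uses.

```python
import re

STOPWORDS = {"a", "an", "and", "of", "the", "in", "on", "for", "to"}  # tiny sample list


def crude_stem(word):
    """Strip a few common English suffixes (a stand-in for a real stemmer)."""
    for suffix in ("ing", "es", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Lowercase, tokenize on letter runs, drop stopwords, then stem each token."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOPWORDS]


print(preprocess("The Structure and Dynamics of Complex Networks"))
```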
The database plugin is not currently available for the most recent version of Sci2 (v1.0 alpha). However, the plugin that allows files to be loaded as databases is available for Sci2 v0.5.2 alpha and older. Please check the Sci2 news page (https://sci2.cns.iu.edu/user/news.php). We will update this page when a database plugin becomes available for the latest version of the tool.
The Sci2 Tool supports the creation of databases from ISI files. Database loading improves the speed and functionality of data preparation and preprocessing. While the initial loading can take quite some time for larger datasets (see sections 3.4 Memory Allocation and 3.5 Memory Limits), it results in much faster and more powerful data processing and extraction.
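Python's built-in `sqlite3` module can illustrate why loading pays off. The sketch below uses a hypothetical two-table schema with made-up rows, because Sci2's actual database layout is not described here. Once documents and their references sit in indexed tables, a question such as "how often is each document cited within the set?" becomes a single query rather than a pass over the text file.

```python
import sqlite3

# Hypothetical schema; Sci2's real database layout is not shown in this manual.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE documents (id TEXT PRIMARY KEY, year INTEGER, journal TEXT);
CREATE TABLE citations (citing_id TEXT, cited_id TEXT);
CREATE INDEX idx_citations_cited ON citations (cited_id);
""")
conn.executemany("INSERT INTO documents VALUES (?, ?, ?)",
                 [("P1", 1999, "Science"), ("P2", 2002, "Nature"), ("P3", 2005, "Science")])
conn.executemany("INSERT INTO citations VALUES (?, ?)",
                 [("P2", "P1"), ("P3", "P1"), ("P3", "P2")])

# Count citations each document receives from documents in the set.
rows = conn.execute("""
SELECT d.id, COUNT(c.citing_id) AS local_citations
FROM documents d LEFT JOIN citations c ON c.cited_id = d.id
GROUP BY d.id ORDER BY d.id
""").fetchall()
print(rows)
```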
View the file "Burst detection analysis (Publication Year, Reference): maximum burst level 1". On a PC running Windows, right-click this table and select 'View' to see the data in Excel. On a Mac or Linux system, right-click and save the file, then open it with the spreadsheet program of your choice. See Burst Detection for the meaning of each field in the output.
An empty value in the "End" field indicates that the burst lasted until the last date present in the dataset. Wherever the "End" field is empty, manually add the last year present in the dataset, in this case 2007.
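If you prefer to script this step instead of editing the file by hand, a minimal Python sketch follows. It assumes the exported table has a header row with a column named `End`. The other column names and the sample rows are hypothetical; only the year 2007 comes from this example.

```python
import csv
import io

LAST_YEAR = 2007  # last year present in the FourNetSciResearchers dataset


def fill_missing_end(csv_text, last_year=LAST_YEAR):
    """Return the burst table as CSV text with empty 'End' cells set to last_year."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        if not row["End"].strip():
            row["End"] = str(last_year)  # burst lasted until the end of the dataset
        writer.writerow(row)
    return out.getvalue()


sample = "Word,Level,Weight,Length,Start,End\nnetwork,1,5.2,3,2004,\nscale,1,3.1,2,1999,2001\n"
print(fill_missing_end(sample))
```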
After you add this information manually, save the .csv file somewhere on your computer. Load the .csv file back into Sci2 using 'File > Load' and select 'Standard csv format' in the pop-up window. A new table will appear in the Data Manager. To visualize this table, which contains the results of the Burst Detection algorithm, select it in the Data Manager and run 'Visualization > Temporal > Horizontal Bar Graph' with the following parameters:
Figure 5.21: Longitudinal study of 'FourNetSciResearchers,' visualized in GUESS
Using Sci2's database functionality allows for several network extractions that cannot be achieved with the text-based algorithms. For example, extracting journal co-citation networks reveals which journals are cited together most frequently. Run 'Data Preparation > Database > ISI > Extract Document Co-Citation Network (Core and References)' on the database to create a network of co-cited journals, and then prune it using 'Preprocessing > Networks > Extract Edges Above or Below Value' with the parameters:
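Two journals are co-cited when the same paper cites both. The Python sketch below counts co-citations from per-paper lists of cited journals, then keeps only edges at or above a weight threshold, which mirrors the pruning step. The data and the `min_weight` parameter are hypothetical. The sketch also simplifies by counting each journal pair at most once per citing paper, which may differ from how Sci2 weights edges.

```python
from collections import Counter
from itertools import combinations


def cocitation_network(reference_lists, min_weight=1):
    """Count how often two cited journals appear together in one paper's references.

    reference_lists: one list of cited journal names per citing paper.
    Edges with weight below min_weight are dropped, like an edge threshold filter.
    """
    weights = Counter()
    for refs in reference_lists:
        for a, b in combinations(sorted(set(refs)), 2):
            weights[(a, b)] += 1
    return {pair: w for pair, w in weights.items() if w >= min_weight}


papers = [["Nature", "Science", "Phys Rev E"], ["Nature", "Science"], ["Science", "Phys Rev E"]]
print(cocitation_network(papers, min_weight=2))
```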