...

Note that only the top 1000 edges (by weight) in this large network appear in the above visualization, which leaves some nodes looking like isolates. To remove nodes that are not connected by any of the top 1000 edges, run 'Preprocessing > Networks > Delete Isolates' on the "top 1000 edges by weight" network and visualize the result using the workflow described above.
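
The effect of the two steps above (keeping the heaviest edges, then deleting isolates) can be sketched in a few lines of Python. This is only an illustration on a hypothetical toy network, not the Sci2 implementation; the function names `top_edges_by_weight` and `delete_isolates` and the sample data are assumptions made for this example.

```python
from typing import List, Set, Tuple

Edge = Tuple[str, str, float]  # (source, target, weight)


def top_edges_by_weight(edges: List[Edge], n: int) -> List[Edge]:
    """Return the n heaviest edges (ties keep their input order)."""
    return sorted(edges, key=lambda e: e[2], reverse=True)[:n]


def delete_isolates(nodes: Set[str], edges: List[Edge]) -> Set[str]:
    """Keep only nodes that are an endpoint of at least one edge."""
    connected = {endpoint for s, t, _ in edges for endpoint in (s, t)}
    return nodes & connected


# Hypothetical toy network, not taken from the Sci2 sample data.
nodes = {"A", "B", "C", "D", "E"}
edges = [("A", "B", 5.0), ("B", "C", 3.0), ("C", "D", 1.0), ("D", "E", 0.5)]

kept_edges = top_edges_by_weight(edges, 2)       # plays the role of "top 1000"
kept_nodes = delete_isolates(nodes, kept_edges)  # D and E lose all their edges
print(sorted(kept_nodes))
```

In this toy case, nodes D and E are dropped because none of the retained edges touch them, which is the same reason isolates appear in the filtered visualization.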

Database Extractions

Note: Extended Version

This workflow uses the extended version of the Sci2 Tool. To learn how to extend Sci2, see Section 3.2 Additional Plugins.

The Sci2 Tool supports the creation of databases from ISI files. Database loading improves the speed and functionality of data preparation and preprocessing. While the initial loading can take quite some time for larger datasets (see sections 3.4 Memory Allocation and 3.5 Memory Limits), it results in much faster and more powerful data processing and extraction.

...

As before, it is important to clean the database before running any extractions by merging and matching authors, journals, and references. Run 'Data Preparation > Database > ISI > Merge Identical ISI People', followed by 'Data Preparation > Database > ISI > Merge Document Sources' and 'Data Preparation > Database > ISI > Match References to Papers'. Wait until each cleaning step is complete before beginning the next one.

...

Extracting different tables will provide different views of the data. Run 'Data Preparation > Database > ISI > Extract Authors' to view all the authors from FourNetSciResearchers.isi. The table includes the number of papers each person in the dataset authored, their Global Citation Count (how many times they have been cited according to ISI), and their Local Citation Count (how many times they were cited in the current dataset).
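
The difference between the two citation counts can be made concrete with a small sketch. It assumes that an author's Global Citation Count is the sum of the ISI "times cited" values of their papers, and that the Local Citation Count only counts citations coming from papers inside the dataset; the record layout, field names, and the `author_table` function are hypothetical and do not reflect the Sci2 database schema.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical records: paper id, authors, ids of cited papers, and the
# global "times cited" value as reported by the data provider.
papers = [
    {"id": "p1", "authors": ["Alice"], "cites": [], "times_cited": 40},
    {"id": "p2", "authors": ["Bob"], "cites": ["p1"], "times_cited": 10},
    {"id": "p3", "authors": ["Alice", "Bob"], "cites": ["p1", "p2"], "times_cited": 2},
]


def author_table(records: List[dict]) -> Dict[str, Dict[str, int]]:
    """Build per-author paper, global citation, and local citation counts."""
    in_dataset = {p["id"] for p in records}
    # Local citations: how often each paper is cited by papers in the dataset.
    local_per_paper = Counter(
        cited for p in records for cited in set(p["cites"]) if cited in in_dataset
    )
    table: Dict[str, Dict[str, int]] = {}
    for p in records:
        for author in p["authors"]:
            row = table.setdefault(author, {"papers": 0, "global": 0, "local": 0})
            row["papers"] += 1
            row["global"] += p["times_cited"]         # assumption: sum over papers
            row["local"] += local_per_paper[p["id"]]  # citations from the dataset only
    return table


for name, row in sorted(author_table(papers).items()):
    print(name, row)
```

Because the local count only sees citations from papers in the dataset, it is typically much smaller than the global count.
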
The queries can also output data specifically tailored for the burst detection algorithm (see section 4.6.1 Burst Detection).
Run 'Data Preparation > Database > ISI > Extract References by Year for Burst Detection' on the cleaned "with references and papers matched" database, followed by 'Analysis > Topical > Burst Detection' with the following parameters:

...

Visualize the burst analysis with 'Visualization > Temporal > Horizontal Bar Graph' with the following parameters:

...

The largest speed increases from the database functionality can be found in the extraction of networks. First, compare the results of a co-authorship extraction with those from section 5.1.4.2 Author Co-Occurrence (Co-Author) Network. Run 'Data Preparation > Database > ISI > Extract Co-Author Network' followed by 'Analysis > Networks > Network Analysis Toolkit (NAT)'. Notice that both networks have 247 nodes and 891 edges. Visualize the extracted co-author network in GUESS using 'Visualization > Networks > GUESS' and reformat the visualization using 'Layout > GEM' and 'Layout > Bin Pack'. To apply the default co-authorship theme, go to 'Script > Run Script' and find 'yoursci2directory/scripts/GUESS/co-author-nw_database.py'. The resulting network will look like Figure 5.21.
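
For readers who want to see what a co-author extraction computes, the following is a minimal sketch in Python. It assumes that each paper contributes one node per author and one undirected edge per pair of co-authors, with the edge weight counting shared papers; the function name `co_author_network` and the sample author lists are hypothetical, and the Sci2 database query may differ in detail.

```python
from collections import Counter
from itertools import combinations
from typing import Counter as CounterType, List, Set, Tuple

# Hypothetical author lists, one per paper (not the FourNetSciResearchers data).
papers = [["Alice", "Bob", "Carol"], ["Alice", "Bob"], ["Dave"]]


def co_author_network(
    author_lists: List[List[str]],
) -> Tuple[Set[str], CounterType[Tuple[str, str]]]:
    """Return (nodes, weighted edges) of an undirected co-author network."""
    nodes: Set[str] = set()
    edge_weights: CounterType[Tuple[str, str]] = Counter()
    for authors in author_lists:
        unique = sorted(set(authors))  # sorted so (a, b) and (b, a) coincide
        nodes.update(unique)
        for a, b in combinations(unique, 2):
            edge_weights[(a, b)] += 1  # weight = number of co-authored papers
    return nodes, edge_weights


nodes, edge_weights = co_author_network(papers)
print(len(nodes), "nodes,", len(edge_weights), "edges")
```

The node and edge counts printed here play the same role as the figures reported by the Network Analysis Toolkit, which makes it easy to check that two extraction routes produce the same network.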

...