...

In the Sci2 Tool, select "361 unique ISI Records" from the 'FourNetSciResearchers' dataset in the Data Manager. Run 'Preprocessing > Topical > Lowercase, Tokenize, Stem, and Stopword Text' using the following parameters:

[Image: parameter settings for 'Lowercase, Tokenize, Stem, and Stopword Text']

Text normalization uses the Standard Analyzer provided by Lucene (http://lucene.apache.org). It separates text into word tokens, normalizes word tokens to lower case, removes "s" from the end of words, removes dots from acronyms, deletes stop words, and applies the English Snowball stemmer (http://snowball.tartarus.org/algorithms/english/stemmer.html), a version of the Porter2 stemmer designed for the English language.
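For readers who want to approximate these steps outside the Sci2 Tool, the following Python sketch mimics the pipeline using NLTK (an assumption on our part; Sci2 itself calls Lucene's Java analyzer, so results may differ in edge cases):

# Approximate sketch of the normalization pipeline using NLTK (assumed
# installed, with the 'stopwords' corpus downloaded); Sci2 itself uses
# Lucene's StandardAnalyzer, so edge cases may differ.
import re
from nltk.corpus import stopwords
from nltk.stem.snowball import SnowballStemmer

STOP = set(stopwords.words("english"))
STEMMER = SnowballStemmer("english")     # English Snowball (Porter2) stemmer

def normalize(text):
    # Remove dots from acronyms, e.g. "U.S.A." -> "USA".
    text = re.sub(r"\b(?:[A-Za-z]\.){2,}",
                  lambda m: m.group().replace(".", ""), text)
    tokens = re.findall(r"[A-Za-z]+(?:'s)?", text)   # rough word tokenizer
    result = []
    for tok in tokens:
        tok = tok.lower()                 # normalize to lower case
        if tok.endswith("'s"):            # drop possessive "s"
            tok = tok[:-2]
        if tok in STOP:                   # delete stop words
            continue
        result.append(STEMMER.stem(tok))  # apply the Snowball stemmer
    return result

print(normalize("The U.S.A.'s networks are evolving."))
# -> ['usa', 'network', 'evolv'] (approximate; Lucene may differ)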

The result is a derived table – "with normalized Abstract" – in which the text in the abstract column is normalized. Select this table and run 'Data Preparation > Extract Word Co-Occurrence Network' using the following parameters:

[Image: parameter settings for 'Extract Word Co-Occurrence Network']

The outcome is a network in which nodes represent words and edges denote their joint appearance in a paper. Word co-occurrence networks are rather large and dense. Running 'Analysis > Networks > Network Analysis Toolkit (NAT)' reveals that the network has 2,821 word nodes and 242,385 co-occurrence edges.
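Conceptually, the extraction links every pair of distinct words that appear in the same abstract and weights each edge by the number of papers the pair shares. A minimal Python sketch (illustrative only, not Sci2's actual implementation):

# Sketch of word co-occurrence extraction (illustrative, not Sci2's code).
from collections import Counter
from itertools import combinations

def cooccurrence_network(abstracts):
    # abstracts: list of token lists, one per paper (see normalize above)
    nodes = set()
    edges = Counter()
    for tokens in abstracts:
        words = sorted(set(tokens))        # unique words in this paper
        nodes.update(words)
        for pair in combinations(words, 2):
            edges[pair] += 1               # weight = number of shared papers
    return nodes, edges

nodes, edges = cooccurrence_network([["network", "scienc", "model"],
                                     ["network", "model", "growth"]])
print(len(nodes), len(edges))              # node and edge counts, as in NAT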

...

The result is one giant component with 2,467 nodes and 242,385 edges. To visualize this rather large network, begin by running 'Visualization > Networks > DrL (VxOrd)' with default values:

[Image: default parameter settings for 'DrL (VxOrd)']
Note that the DrL algorithm is computationally intensive and takes some time to run, even on powerful systems. See the console window for progress details.
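As a sanity check outside Sci2, the giant-component figures above can be reproduced with networkx, assuming the network has been saved as a weighted edge list (the file name below is hypothetical):

# Sketch: confirm the giant component's size with networkx (assumed
# installed); "cooccurrence.edgelist" is a hypothetical export.
import networkx as nx

G = nx.read_weighted_edgelist("cooccurrence.edgelist")
giant = G.subgraph(max(nx.connected_components(G), key=len))
print(giant.number_of_nodes(), giant.number_of_edges())  # expect 2467 242385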

To keep only the strongest edges in the "Laid out with DrL" network, run 'Preprocessing > Networks > Extract Top Edges' on the new network using the following parameters:
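Conceptually, this step ranks edges by weight and keeps only the heaviest ones; a minimal sketch (the cutoff below is a placeholder, the real value is set in the parameter window):

# Sketch of "extract top edges" (illustrative, not Sci2's code): keep the
# n heaviest edges; n = 1000 is a placeholder, not the dialog's value.
def top_edges(edges, n=1000):
    # edges: dict mapping (word_a, word_b) -> co-occurrence weight
    ranked = sorted(edges.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:n])

strongest = top_edges(edges)               # 'edges' from the sketch above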

...