CIShell Manual : Extract Word Co-Occurrence Network
This page last changed on Jan 11, 2011 by barbosaa.
Description
Extract Word Co-Occurrence Network creates a weighted network in which each node is a word and each edge connects two words that occur together in the same body of text. The weight of an edge represents how often the two words co-occur. This algorithm is a shortcut for extracting a directed network using Extract Directed Network and then performing bibliographic coupling using Extract Reference Co-Occurrence (Bibliographic Coupling) Network.

Parameters
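To make the edge weighting from the Description concrete, here is a minimal Python sketch that counts, for every pair of words, how many texts contain both. It is only an illustration and not the CIShell implementation: the function name cooccurrence_edges, the lowercasing, and the whitespace tokenization are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(texts):
    """Count, for each unordered word pair, how many texts contain both words."""
    weights = Counter()
    for text in texts:
        # Assumption: lowercase and split on whitespace; real tokenization may differ.
        words = sorted(set(text.lower().split()))
        # Each pair is stored as (a, b) with a < b, so the network is undirected.
        for a, b in combinations(words, 2):
            weights[(a, b)] += 1
    return weights


# Hypothetical input: three short bodies of text.
texts = ["network analysis tools", "network science tools", "word analysis"]
edges = cooccurrence_edges(texts)
```

In this sketch the pair ("network", "tools") gets weight 2 because both words appear together in two of the three texts.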
Implementation Details
Because this algorithm uses Extract Directed Network, it has an unfortunate side effect: the titles of the original papers (or whatever your original records are) appear as nodes in the network. Fortunately, these nodes are listed as occurring 0 times, so they can be removed by running Extract Nodes Above or Below Value and keeping all nodes that occur more than zero times.

Usage Hints
It may be useful to run DrL (VxOrd) on the resulting network, which lays out the nodes in a force-directed manner, so that words which are similar end up relatively close to each other. This extraction also produces an edge between every pair of words that ever occur together, which can yield more edges than can easily be visualized or otherwise manipulated. After running DrL (VxOrd), or before doing any other processing, it is therefore often a good idea to run Extract Edges Above or Below Value and remove all but the strongest edges until the number of edges is manageable. You can determine the number of edges in a graph by running the Network Analysis Toolkit.

Links
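The two clean-up steps described under Implementation Details and Usage Hints (dropping nodes that occur 0 times, then keeping only the strongest edges) can be sketched in Python as follows. This is a rough illustration on plain dictionaries, not the behavior of the CIShell algorithms themselves: the function names, the data layout, and the sample values are hypothetical.

```python
def drop_zero_occurrence_nodes(node_counts, edge_weights):
    """Keep nodes that occur more than zero times, and edges between kept nodes."""
    kept = {node for node, count in node_counts.items() if count > 0}
    edges = {
        (a, b): w for (a, b), w in edge_weights.items() if a in kept and b in kept
    }
    return kept, edges


def keep_edges_at_or_above(edge_weights, min_weight):
    """Keep only edges whose weight is at least min_weight."""
    return {pair: w for pair, w in edge_weights.items() if w >= min_weight}


# Hypothetical data: "Paper A" stands in for a leftover title node with count 0.
node_counts = {"Paper A": 0, "network": 2, "tools": 2, "analysis": 1}
edge_weights = {
    ("network", "tools"): 2,
    ("analysis", "network"): 1,
    ("analysis", "tools"): 1,
}

nodes, edges = drop_zero_occurrence_nodes(node_counts, edge_weights)
strong = keep_edges_at_or_above(edges, min_weight=2)
```

Raising min_weight step by step and checking len(strong) each time mirrors the advice to prune until the number of edges is manageable.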
Document generated by Confluence on May 31, 2011 16:37