Indiana University, University of Rome, Yale University, Leiden University, International Center for Theoretical Physics, University of Paris-Sud
Informatics, Complex Network Science and System Research, Physics, Statistics, Epidemics
18.104.22.168 Burst Detection
A scholarly dataset can be understood as a discrete time series: a sequence of events or observations ordered along one dimension, time. Observations exist for regularly spaced intervals, e.g., each month or year.
The burst detection algorithm (see Section 4.6.1 Burst Detection) identifies sudden increases or "bursts" in the frequency-of-use of character strings over time. This algorithm identifies topics, terms, or concepts important to the events being studied that increased in usage, were more active for a period of time, and then faded away.
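To make the "frequency-of-use over time" idea concrete, the following minimal sketch counts, per year, how many records contain a given term and how many records exist in total. The function name `term_counts_by_year` and the sample records are hypothetical illustrations, not part of Sci2.

```python
from collections import Counter

def term_counts_by_year(records, term):
    """Return (year, records containing term, total records) per year.

    `records` is an iterable of (year, tokens) pairs, where tokens are
    already normalized words. This layout is an assumption made for the
    example, not a Sci2 data format.
    """
    hits = Counter()
    totals = Counter()
    for year, tokens in records:
        totals[year] += 1
        if term in tokens:
            hits[year] += 1
    # Counter returns 0 for years in which the term never appears.
    return [(year, hits[year], totals[year]) for year in sorted(totals)]

# Hypothetical, hand-made records for illustration only.
records = [
    (2001, ["epidem", "spread", "network"]),
    (2001, ["sandpil", "model"]),
    (2002, ["epidem", "threshold"]),
    (2002, ["epidem", "immun"]),
    (2003, ["protein", "interact", "network"]),
]
series = term_counts_by_year(records, "epidem")
print(series)  # [(2001, 1, 2), (2002, 2, 2), (2003, 0, 1)]
```

A series like this, one observation per regularly spaced interval, is the kind of input a burst detection method works on.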
Load Alessandro Vespignani's ISI publication history using 'File > Load' and following this path: 'yoursci2directory/sampledata/scientometrics/isi/AlessandroVespignani.isi' (if the file is not in the sample data directory it can be downloaded from 2.5 Sample Datasets).
New ISI File Format
Web of Science changed its output format in September 2011. Older versions of the Sci2 tool (older than v0.5.2 alpha) may refuse to load these new files, with an error like "Invalid ISI format file selected."
If you are using an older version of the Sci2 tool, you can download the WOS-plugins.zip file and unzip the JAR files into your sci2/plugins/ directory. Restart Sci2 to activate the fixes. You can then load the downloaded ISI files into Sci2 without any additional steps. If you are using the old Sci2 tool without these plugins, you will need to follow the guidelines below before you can load the new WOS format file into the tool.
You can fix this problem for individual files by opening them in Notepad (or your favorite text editor). The file will start with the words:
Just add the word ISI.
And then Save the file.
The ISI file should now load properly. More information on the ISI file format is available here (http://wiki.cns.iu.edu/display/CISHELL/ISI+%28*.isi%29).
The "Gamma" parameter is the value to which state transition costs are proportional. This parameter controls how easily the automaton can change states. The higher the "Gamma" value, the smaller the list of bursts generated.
The "Text Column" parameter is the name of the column whose values (tokens separated by a delimiter) are analyzed for bursts.
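The effect of "Gamma" can be illustrated with a simplified two-state automaton in the spirit of Kleinberg's burst model. This is a sketch under stated assumptions, not Sci2's implementation: the function name `detect_bursts`, the scaling parameter `s`, and the sample counts are hypothetical, and the cost of moving into the burst state is assumed to be `gamma * ln(n)` for `n` time slices.

```python
import math

def detect_bursts(relevant, totals, s=2.0, gamma=1.0):
    """Label each time slice 0 (baseline) or 1 (burst).

    relevant[t] is the number of records containing the term in slice t,
    totals[t] the number of all records in slice t. Assumes the term
    appears at least once and not in every record.
    """
    n = len(relevant)
    p0 = sum(relevant) / sum(totals)      # baseline rate of the term
    p1 = min(s * p0, 0.9999)              # elevated rate in the burst state
    probs = (p0, p1)
    enter_cost = gamma * math.log(n)      # assumed cost of entering a burst

    def fit_cost(state, r, d):
        # Negative log-likelihood of r hits in d records (binomial model).
        p = probs[state]
        log_binom = math.lgamma(d + 1) - math.lgamma(r + 1) - math.lgamma(d - r + 1)
        return -(log_binom + r * math.log(p) + (d - r) * math.log(1 - p))

    # Dynamic programming (Viterbi) over the cheapest state sequence.
    cost = [0.0, math.inf]                # the automaton starts in state 0
    back = []
    for r, d in zip(relevant, totals):
        new_cost, pointers = [], []
        for j in (0, 1):
            candidates = [cost[i] + (enter_cost if j > i else 0.0) for i in (0, 1)]
            best = 0 if candidates[0] <= candidates[1] else 1
            new_cost.append(candidates[best] + fit_cost(j, r, d))
            pointers.append(best)
        cost = new_cost
        back.append(pointers)

    # Trace the cheapest path backwards from the last slice.
    state = 0 if cost[0] <= cost[1] else 1
    states = [state]
    for pointers in reversed(back[1:]):
        state = pointers[state]
        states.append(state)
    states.reverse()
    return states

# Hypothetical counts: the term is rare except in slices 3 to 5.
hits = [1, 1, 1, 10, 10, 10, 1, 1, 1]
docs = [20] * 9
print(detect_bursts(hits, docs, gamma=1.0))  # [0, 0, 0, 1, 1, 1, 0, 0, 0]
print(detect_bursts(hits, docs, gamma=6.0))  # [0, 0, 0, 0, 0, 0, 0, 0, 0]
```

With the larger "Gamma", entering the burst state costs more than the better fit saves, so the same data yields no burst. This matches the behavior described above: higher values produce a shorter list of bursts.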
And the "End" field indicates when the burst stopped. An empty value in the "End" field indicates that the burst lasted until the last date present in the dataset. Where the "End" field is empty, manually add the last year present in the dataset; in this case, 2006.
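If you prefer not to edit the file by hand, the same fix can be scripted. The sketch below fills empty "End" cells with a chosen year. The helper name `fill_missing_end` and the sample rows (including the column set and weight values) are assumptions for illustration; check the column names in your own burst table.

```python
import csv
import io

def fill_missing_end(csv_text, last_year, end_column="End"):
    """Return the CSV text with empty `end_column` cells set to `last_year`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    for row in rows:
        # Treat missing or whitespace-only cells as empty.
        if not (row[end_column] or "").strip():
            row[end_column] = str(last_year)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()

# Hypothetical burst table: the second burst has no end year.
sample = "Word,Weight,Start,End\nepidem,4.2,2001,2003\nnetwork,3.1,2004,\n"
filled = fill_missing_end(sample, 2006)
print(filled)
```

To use this on a real file, read its contents into a string, pass the last year present in your dataset, and write the returned text back to a .csv file.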
After you manually add this information, save this .csv file somewhere on your computer. Reload the .csv file into Sci2 using 'File > Load'. Select 'Standard csv format' in the pop-up window. A new table will appear in the Data Manager. To visualize the table that contains the results of the Burst Detection algorithm, select the table you just loaded in the Data Manager and run 'Visualization > Temporal > Temporal Bar Graph' with the following parameters:
Temporal bar graphs are used to visualize numeric data over time, generating labeled horizontal bars. A PostScript file containing the horizontal bar graph will appear in the Data Manager.
Again, where the "End" field is empty, manually add the last year present in the dataset; in this case, 2006.
After you manually add this information, save this .csv file somewhere on your computer. Reload this .csv file into Sci2 using 'File > Load'. Select 'Standard csv format' in the pop-up window. A new table will appear in the Data Manager. To visualize the table that contains these new results for the Burst Detection algorithm, select the table you just loaded in the Data Manager and run 'Visualization > Temporal > Horizontal Bar Graph (not included version)' with the same parameters.
As expected, a larger number of bursts appear, and the new bursts have a smaller weight than those depicted in the first graph. These smaller, more numerous bursting terms permit a more detailed view of the dataset and allow the identification of trends. The "protein" burst starting in 2003, for example, indicates the year in which Alessandro Vespignani started to work with "protein-protein interaction networks," while the burst "epidem" - also from 2001 - is related to the application of complex networks to the analysis of epidemic phenomena in biological networks.
The original dataset for Alessandro Vespignani was created in 2006. If you wish to update the dataset to understand how his research has changed and evolved since 2006, you can obtain a new dataset from Web of Science, see 22.214.171.124 ISI Web of Science. However, another way to obtain an individual researcher's publication information is to use their Google Scholar profile, if they have one. One of the biggest benefits of using a Google Scholar profile is that you will get publications not indexed in Web of Science, such as some book chapters. In this example, we will obtain the publication information for Alessandro Vespignani using Google Scholar:
Open Google Scholar in a web browser and search for "Alessandro Vespignani":
If the author or investigator you have searched for has a Google Scholar profile, you will see a link to their profile at the top of the results page:
Keep in mind that not every author you search will necessarily have a Google Scholar profile, but for those that do, this is a very useful way to get their publication information. Click on the link to view Alessandro Vespignani's profile, and then select all publications and click the export button at the top of his publication list to export the citation information:
The easiest way to import the citation data into Sci2 is to export the data as a CSV file:
After you have specified the export format you can save the CSV file to your desired location by clicking the "Export all articles by Alessandro Vespignani" button. Save the file to your desktop and then load it into Sci2 in the standard CSV format:
Once the data is in Sci2, you will need to normalize the text for the titles before you can run Burst Detection. Run 'Preprocessing > Topical > Lowercase, Tokenize, Stem, and Stopword Text' and select the title parameter:
After you normalize the text for the title field you will notice a "with normalized Title" file in the data manager. You will likely need to edit this file before you can run Burst Detection. Right click on the file in the data manager and select view:
This will open the dataset in Excel (or your preferred spreadsheet editor). You will notice that the Lowercase, Tokenize, Stem, and Stopword Text algorithm has placed brackets around the years. You will need to remove these before you can run the Burst Detection algorithm. In Excel, press 'Ctrl-F' on the keyboard. This will bring up the Find and Replace tool. Highlight the column of years and then perform a find and replace:
You will have to repeat this for the other bracket symbol. Together, the two replacements remove the brackets around the years. Next, you will need to remove those publications for which there is no year information. Burst Detection will not run if there are empty values in the date column. You can search for the publications and find the proper date, but the year value could be empty because these are forthcoming publications. In this example, we will just remove all publications without a value in the year column:
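The same cleanup can also be scripted instead of done in a spreadsheet. The following minimal sketch strips square brackets from year values and drops records with no year. The helper names and the "Year" and "Title" column names are assumptions for illustration; adjust them to match your exported file.

```python
def clean_year(value):
    """Strip surrounding whitespace and square brackets from a year cell."""
    return value.strip().strip("[]").strip()

def clean_records(rows, year_column="Year"):
    """Return copies of rows with cleaned years, skipping rows without one.

    `rows` is a list of dicts, for example as produced by csv.DictReader.
    """
    cleaned = []
    for row in rows:
        year = clean_year(row.get(year_column, "") or "")
        if year:
            cleaned.append({**row, year_column: year})
    return cleaned

# Hypothetical rows for illustration only.
rows = [
    {"Year": "[2009]", "Title": "a"},
    {"Year": "", "Title": "b"},
    {"Year": "[]", "Title": "c"},
]
print(clean_records(rows))  # [{'Year': '2009', 'Title': 'a'}]
```

Rows are copied rather than modified in place, so the original data stays available if you later want to look up the missing years.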
You will need to save this file to your desktop and re-load it into Sci2. Then, select the file you have just loaded and run 'Analysis > Topical > Burst Detection' and enter the following parameters:
This will result in a "Burst detection analysis (Year, Title): maximum burst level 1" file in the data manager. Right click on this file to view the data:
You will need to edit the data before you can run the Temporal Bar Graph algorithm to visualize the results of the burst detection. First, you should make sure every record has an "End" date or the Temporal Bar Graph will not run properly. We know that this dataset contains records that are labeled with the year of 2013, so that will be our end date for those bursts that are still continuing:
Before you visualize the results with the Temporal Bar Graph, it is important to know that if you size bars based on weight, the weight value will be distributed across the length of the burst. In other words, the total area of the bar corresponds to the weight value. This means a bar with a high weight value can appear thinner than a bar with a lower weight value if the former burst occurs over a longer period than the latter. Finally, before you visualize this dataset, you can add some categories to allow you to color your bars. For example, you can sort the records from largest to smallest based on the "total weight" column and assign strong, medium, and weak categories to these records based on the "total weight" values:
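The relationship between weight, duration, and bar thickness, and one possible way of assigning categories, can be sketched as follows. Both helpers are hypothetical illustrations: counting the duration in inclusive years and splitting the ranked records into equal thirds are assumptions, not rules taken from Sci2.

```python
def bar_thickness(weight, start, end):
    """Thickness of a bar whose area equals `weight`.

    Assumes the burst covers the years start..end inclusive.
    """
    duration = end - start + 1
    return weight / duration

def weight_categories(weights, labels=("strong", "medium", "weak")):
    """Label each weight by rank, splitting the ranking into equal groups."""
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    categories = [None] * len(weights)
    for rank, i in enumerate(order):
        categories[i] = labels[rank * len(labels) // len(weights)]
    return categories

# A heavier but longer burst ends up with the thinner bar.
print(bar_thickness(10, 2000, 2009))  # 1.0
print(bar_thickness(6, 2005, 2006))   # 3.0

# Hypothetical "total weight" values for six bursts.
print(weight_categories([9, 1, 5, 7, 3, 2]))
# ['strong', 'weak', 'medium', 'strong', 'medium', 'weak']
```

Any other thresholds work equally well, as long as the category column you add is the one you select for coloring the bars.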
Now, save the file to your desktop and reload it into Sci2 in the standard CSV format and run 'Visualization > Temporal > Temporal Bar Graph', entering the following parameters:
Note that if you select the "Simplified Layout" option, no legend will be created for the visualization. This allows you to create your own legend that accurately reflects the weight categories you assigned. To learn how to create a legend for your visualization, see 2.4 Saving Visualizations for Publication.
To view the visualization, save the file from the data manager by right-clicking and selecting save:
Make sure to save the visualization as a PostScript file:
Save the PostScript file to your desktop, and if you have a version of the Adobe Creative Suite on your machine you can simply double-click the PostScript file to launch Adobe Distiller and automatically convert the PostScript file into a PDF for viewing. However, if you do not have a copy of the Adobe Creative Suite installed on your machine, you can use an online version of GhostScript to convert PostScript files to PDF files: http://ps2pdf.com/. The resulting visualization should look similar to the following:
Remember that the weight for the bars is equal to the total area, not simply the thickness, so including the color categories will help users make more sense of the visualization. You will notice that this burst analysis for Alessandro Vespignani's publications looks similar to the one created in the previous section. However, this new burst analysis takes into consideration his more recent publications and interests in human mobility networks and epidemiology. This workflow can easily be repeated using any author who has a profile in Google Scholar. Give it a try for yourself!
126.96.36.199 Visualizing Burst Detection in Excel
It is possible to generate a visualization for burst analysis in MS Excel. To do so, open the results of the first burst analysis conducted ('Burst detection analysis (Publication Year, Title): maximum burst level 1') in MS Excel by right-clicking on this table in the Data Manager and selecting View.
4. Once both formatting rules have been established, select 'Conditional Formatting > Manage Rules', highlight the first formatting rule, and move it to the top of the list:
5. Make sure both formatting rules are selected and apply them to the current selection. Apply the format to all cells in the word-by-year matrix by dragging the box around cell G2 to highlight all cells in the matrix. The result for the given example is shown in Figure 5.33.
Figure 5.32.1: Visualizing burst results in MS Excel