Indiana University, University of Rome, Yale University, Leiden University, International Center for Theoretical Physics, University of Paris-Sud
Informatics, Complex Network Science and System Research, Physics, Statistics, Epidemics
A scholarly dataset can be understood as a discrete time series: in other words, a sequence of events or observations that are ordered in one dimension, time. Observations exist for regularly spaced intervals, e.g., each month or year.
The burst detection algorithm (see Section 4.6.1 Burst Detection) identifies sudden increases or "bursts" in the frequency-of-use of character strings over time. This algorithm identifies topics, terms, or concepts important to the events being studied that increased in usage, were more active for a period of time, and then faded away.
An analysis of publications authored or co-authored by Alessandro Vespignani from 1990 to 2006 will be used to illustrate the "burst" concept. Alessandro Vespignani is an Italian physicist and Professor of Informatics and Cognitive Science at Indiana University, Bloomington. In his publications, it is possible to see a change in research focus - from Physics to Complex Networks - beginning in 2001.
Load Alessandro Vespignani's ISI publication history using 'File > Load' and following this path: 'yoursci2directory/sampledata/scientometrics/isi/AlessandroVespignani.isi' (if the file is not in the sample data directory it can be downloaded from 2.5 Sample Datasets).
This analysis will detect the "bursty" terms used in the titles of papers in the dataset. Since the burst detection algorithm is case-sensitive, it is necessary to normalize the field to be analyzed before running the algorithm. Select the table "101 Unique ISI Records" and run 'Preprocessing > Topical > Lowercase, Tokenize, Stem, and Stopword Text.' Check the "Title" box to indicate that you want to normalize this field.
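To make the normalization step concrete, the following is a minimal sketch of what lowercasing, tokenizing, and stopword removal do to a title. It is not the Sci2 implementation: the stopword list, the "|" separator, and the function name `normalize_title` are assumptions chosen for illustration, and stemming is deliberately left out.

```python
import re

# Tiny illustrative stopword list (an assumption; Sci2 ships its own list).
STOPWORDS = {"a", "an", "and", "in", "of", "on", "the", "to", "with"}


def normalize_title(title, separator="|"):
    """Lowercase a title, split it into tokens, and drop stopwords.

    Stemming is omitted here, so "networks" stays "networks"
    instead of being reduced to a stem such as "network".
    """
    tokens = re.findall(r"[a-z0-9]+", title.lower())
    kept = [token for token in tokens if token not in STOPWORDS]
    return separator.join(kept)


print(normalize_title("Epidemic Spreading in Scale-Free Networks"))
```

Because every token is lowercased before counting, "Network" and "network" are treated as the same term by a case-sensitive burst detector.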
Select the resulting "with normalized Title" table in the Data Manager and run 'Analysis > Topical > Burst Detection' with the following parameters:
The "Gamma" parameter is the value to which state transition costs are proportional. It controls how easily the automaton can change states. The higher the "Gamma" value, the shorter the list of bursts generated.
The "Density Scaling" parameter determines how much 'more bursty' each level is than the previous one. The higher the scaling value, the more active (bursty) an event must be to reach each level.
The "Bursting States" parameter determines how many bursting states there will be, beyond the non-bursting state. A value of i bursting states corresponds to i + 1 automaton states.
The "Date Column" parameter is the name of the column containing the date/time at which the events or topics occur.
The "Date Format" specifies how the date column will be interpreted as a date/time. See http://java.sun.com/j2se/1.4.2/docs/api/java/text/SimpleDateFormat.html for details.
The "Text Column" parameter is the name of the column with values (delimiter and tokens) to be computed for bursting results.
The "Text Separator" parameter determines the separator that was used to delimit the tokens in the text column.
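The interplay of "Gamma," "Density Scaling," and "Bursting States" can be illustrated with a small Kleinberg-style automaton over yearly batches. This is a hedged sketch, not the code Sci2 runs: the function name `burst_states`, the use of `gamma * ln(number of batches)` as the cost of moving up one state, and the sample counts are all assumptions made for illustration.

```python
import math


def burst_states(relevant, totals, gamma=1.0, scaling=2.0, bursting_states=1):
    """Return the cheapest state sequence (0 = not bursting) for one term.

    relevant[t] is the number of records in batch t that contain the term,
    totals[t] is the number of all records in batch t.
    """
    if sum(relevant) == 0:
        raise ValueError("the term never occurs")
    n = len(relevant)
    k = bursting_states + 1  # i bursting states give i + 1 automaton states
    base = sum(relevant) / sum(totals)
    # Each level expects `scaling` times the rate of the level below it.
    probs = [min(base * scaling ** i, 0.9999) for i in range(k)]

    def emit_cost(state, r, d):
        p = probs[state]
        log_likelihood = (math.log(math.comb(d, r))
                          + r * math.log(p) + (d - r) * math.log(1 - p))
        return -log_likelihood

    def move_cost(i, j):
        # Moving up costs gamma per level (assumed form); moving down is free.
        return (j - i) * gamma * math.log(n) if j > i else 0.0

    cost = [0.0] + [math.inf] * (k - 1)  # the automaton starts in state 0
    back = []
    for r, d in zip(relevant, totals):
        new_cost, pointers = [], []
        for j in range(k):
            best = min(range(k), key=lambda i: cost[i] + move_cost(i, j))
            new_cost.append(cost[best] + move_cost(best, j) + emit_cost(j, r, d))
            pointers.append(best)
        cost = new_cost
        back.append(pointers)

    # Trace the cheapest path backwards from the best final state.
    state = min(range(k), key=lambda j: cost[j])
    path = []
    for pointers in reversed(back):
        path.append(state)
        state = pointers[state]
    path.reverse()
    return path


# Hypothetical counts: a term that is frequent in three consecutive years.
counts = [0, 0, 0, 5, 6, 5, 0, 0]
papers = [10] * 8
print(burst_states(counts, papers, gamma=1.0))
print(burst_states(counts, papers, gamma=4.0))
```

In this sketch, raising gamma makes it more expensive to enter a bursting state, so the same counts that produce a burst at a low gamma can produce none at a high gamma, which is the behavior described for the "Gamma" parameter above.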
View the file "Burst detection analysis (Publication Year, Title): maximum burst level 1". On a PC running Windows, right-click on this table and select 'View' to see the data in Excel. On a Mac or a Linux system, right-click and save the file, then open it using the spreadsheet program of your choice.
In this table, there are six columns: "Word," "Level," "Weight," "Length," "Start," and "End."
The "Word" field identifies the specific character string which was detected as a "burst." The "Length" field indicates how long the burst lasted (over the selected time parameter).
The "Level" is the burst level of this burst. The higher the burst level, the more frequently the event or topic occurs.
The "Weight" field is the weight of this burst over its "Length." A higher weight can result from a longer "Length," a higher "Level," or both.
The "Length" is the duration of the burst. It is computed as (End - Start + 1).
The "Start" field identifies when the burst began (again, according to the specified time parameter).
The "End" field indicates when the burst stopped. An empty value in the "End" field indicates that the burst lasted until the last date present in the dataset. Where the "End" field is empty, manually enter the last year present in the dataset, in this case 2006.
After adding this information manually, save the .csv file somewhere on your computer. Load the .csv file back into Sci2 using 'File > Load' and select 'Standard csv format' in the pop-up window. A new table will appear in the Data Manager. To visualize this table, which contains the results of the Burst Detection algorithm, select the table you just loaded in the Data Manager and run 'Visualization > Temporal > Temporal Bar Graph' with the following parameters:
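Filling in the empty "End" values can also be scripted instead of edited by hand. The sketch below assumes the burst table has been saved as a .csv file with the six column names listed above; the function name `fill_missing_end` and the sample rows are hypothetical and serve only as illustration.

```python
import csv
import io


def fill_missing_end(rows, last_year):
    """Return copies of the rows with every empty "End" set to last_year."""
    filled = []
    for row in rows:
        row = dict(row)
        if not row["End"].strip():
            row["End"] = str(last_year)
        filled.append(row)
    return filled


# Hypothetical sample in the shape of the burst table (values are made up).
sample = io.StringIO(
    "Word,Level,Weight,Length,Start,End\n"
    "fractal,1,5.2,4,1990,1993\n"
    "network,1,9.8,6,2001,\n"
)
rows = fill_missing_end(csv.DictReader(sample), last_year=2006)

# Write the completed table back out as CSV text.
output = io.StringIO()
writer = csv.DictWriter(output, fieldnames=list(rows[0].keys()))
writer.writeheader()
writer.writerows(rows)
print(output.getvalue())
```

To use this on a real export, replace the `io.StringIO` objects with files opened via `open(path, newline="")`.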
Horizontal bar graphs are used to visualize numeric data over time, generating labeled horizontal bars. A PostScript file containing the horizontal bar graph will appear in the Data Manager.
Open and view the file using the workflow from Section 2.4 Saving Visualizations for Publication.
The resulting analysis indicates a change in the research focus of Alessandro Vespignani for publications beginning in 2001. For example, the bursting terms "fractal," "growth," "transform," and "fix" starting in 1990 are related to Vespignani's Ph.D., entitled "Fractal Growth and Self-Organized Criticality" in Physics. Other bursts also related to Physics follow these, such as "sandpil." After 2001, bursting terms like "complex," "network," "free," and "weight" appear, signifying a change in Vespignani's research area from Physics to Complex Networks, with a larger number of publications on topics like "weighted networks" and "scale-free networks."
Now, let's run the Burst Detection algorithm again on the same dataset but with a different value for the 'Gamma' parameter. Select the table 'with normalized Title' in the Data Manager and run 'Analysis > Topical > Burst Detection' with the following parameters:
Notice that the value for the gamma parameter is now set to 0.5. The parameter gamma controls the ease with which the automaton can change states. With a smaller gamma value, more bursts will be generated. Running the algorithm with these parameters will generate a new table named "Burst detection analysis (Publication Year, Title): maximum burst level 1.2" in the Data Manager.
Again, where the "End" field is empty, manually enter the last year present in the dataset, in this case 2006.
After adding this information manually, save the .csv file somewhere on your computer. Load the .csv file back into Sci2 using 'File > Load' and select 'Standard csv format' in the pop-up window. A new table will appear in the Data Manager. To visualize this table, which contains the new results of the Burst Detection algorithm, select the table you just loaded in the Data Manager and run 'Visualization > Temporal > Horizontal Bar Graph (not included version)' with the same parameters.
A new PostScript file containing the horizontal bar graph will appear in the Data Manager. Once more, open and view the file using the workflow from Section 2.4 Saving Visualizations for Publication.
As expected, a larger number of bursts appear, and the new bursts have a smaller weight than those depicted in the first graph. These smaller, more numerous bursting terms permit a more detailed view of the dataset and allow the identification of trends. The "protein" burst starting in 2003, for example, indicates the year in which Alessandro Vespignani started to work with "protein-protein interaction networks," while the burst "epidem" - also from 2001 - is related to the application of complex networks to the analysis of epidemic phenomena in biological networks.
Visualizing Burst Detection in Excel
It is possible to generate a visualization of the burst analysis in MS Excel. To do so, open the results of the first burst analysis ('Burst detection analysis (Publication Year, Title): maximum burst level 1') in MS Excel by right-clicking on this table in the Data Manager and selecting 'View'.
To generate a visual depiction of the bursts in MS Excel, perform the following steps:
1. Sort the data in ascending order by burst start year.
2. Add column headers for all years: enter the first start year, here 1990, in cell G1, then continue to the right, e.g., using the formula '=G1+1', until the highest burst end year, here 2006 in cell W1. As stated before, an empty "End" field indicates that the burst lasted until the last date present in the dataset.
3. In the resulting word-by-burst-year matrix, select the upper left blank cell (G2) and select 'Conditional Formatting' from the ribbon. Then select 'Data Bars > More Rules > Use a formula to determine which cells to format.' To color cells for years with a burst weight value of 10 or more in red and cells with a higher value in dark red, use the following formulas and format patterns:
Select 'OK' and then repeat step three, using the formula below:
4. Once both formatting rules have been established, select 'Conditional Formatting > Manage Rules', highlight the first formatting rule, and move it to the top of the list:
5. Make sure both formatting rules are selected and apply them to the current selection. Apply the format to all cells in the word-by-year matrix by dragging the box around cell G2 to highlight all cells in the matrix. The result for the given example is shown in Figure 5.32.1.
Figure 5.32.1: Visualizing burst results in MS Excel
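The word-by-year matrix built in the steps above can also be sketched outside Excel. The following minimal example marks, for each word, the years that fall between its "Start" and "End" values. The function names `burst_matrix` and `render` and the sample bursts are hypothetical, and the weight-based coloring from step 3 is not reproduced.

```python
def burst_matrix(bursts, first_year, last_year):
    """Map each word to a list of booleans, one per year in the range.

    bursts is a list of (word, start_year, end_year) tuples with the
    empty "End" values already filled in.
    """
    years = list(range(first_year, last_year + 1))
    matrix = {}
    for word, start, end in bursts:
        matrix[word] = [start <= year <= end for year in years]
    return years, matrix


def render(years, matrix):
    """Draw the matrix as text, using '#' for burst years and '.' otherwise."""
    lines = []
    for word, flags in matrix.items():
        cells = "".join("#" if flag else "." for flag in flags)
        lines.append(f"{word:<10}{cells}")
    return "\n".join(lines)


# Hypothetical bursts, sorted ascending by start year as in step 1.
sample_bursts = [("fractal", 1990, 1993), ("network", 2001, 2006)]
years, matrix = burst_matrix(sample_bursts, 1990, 2006)
print(render(years, matrix))
```

Each row of the text output corresponds to one row of the spreadsheet, and each character to one year column from G through W.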