
Description

This burst detection algorithm is an implementation of Jon Kleinberg's "Bursty and Hierarchical Structure in Streams". A burst is a period of increased activity, determined by minimizing a cost function over a set of possible states (not bursting and various degrees of burstiness) with increasing event frequencies, where it is expensive to go up a level and cheap (zero-cost) to go down a level. It is useful for analyzing text streams (such as emails, corpora, or publication datasets) when you want to know how active the stream is over a period of time. Given a table with a Text Column (the events or topics to track), a date/timestamp column (when each event happens), and a delimiter (to separate multiple events / topics within one cell), the algorithm detects bursts for each event / topic.
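For reference, the cost described above can be written out for one variant of the model in Kleinberg's paper, where the input is a sequence of n gaps x_1, ..., x_n between consecutive events spanning a total time T. The notation below follows the paper, not Sci2. Reading γ as the Gamma parameter and s as the Density Scaling parameter is an assumption, and the Sci2 implementation may use a different variant of the model.

```latex
c(q \mid x) \;=\; \sum_{t=0}^{n-1} \tau(i_t, i_{t+1}) \;+\; \sum_{t=1}^{n} -\ln f_{i_t}(x_t),
\qquad i_0 = 0,

f_i(x) = \alpha_i e^{-\alpha_i x}, \qquad \alpha_i = \frac{n}{T}\, s^{i},

\tau(i, j) =
\begin{cases}
  (j - i)\,\gamma \ln n & \text{if } j > i,\\
  0 & \text{if } j \le i.
\end{cases}
```

Here q = (i_1, ..., i_n) is the state sequence, state 0 is the non-bursting state, and the burst structure is read off the sequence q that minimizes c.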

The algorithm takes the following parameters.

  • Gamma is the value that state transition costs are proportional to. A higher Gamma value results in higher transition costs. Use this parameter to control how easily the automaton can change states.
  • Density Scaling determines how much 'more bursty' each level is than the previous one. The higher the scaling value, the more active (bursty) the event must be at each level.
  • Bursting States determines how many bursting states there will be, beyond the non-bursting state. A value of i bursting states corresponds to i+1 automaton states.
  • Date Column is the name of the column holding the date/time at which each event / topic occurs.
  • Date Format specifies how the date column will be interpreted as a date/time. See http://java.sun.com/j2se/1.4.2/docs/api/java/text/SimpleDateFormat.html for details.
  • Text Column is the name of the column whose values (tokens separated by the delimiter) are analyzed for bursts.
  • Text Separator delimits the tokens in the text column. When constructing your tables, do not use a separator that is all or part of any token.
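To make the roles of Gamma, Density Scaling, and Bursting States more concrete, here is a minimal Python sketch of a burst automaton in the style of Kleinberg's paper, applied to the gaps between numeric timestamps of a single token. It is an illustration under stated assumptions, not the Sci2 implementation: the function name `detect_burst_states` and its argument names are hypothetical, and Sci2 may handle dates, batching, and output differently.

```python
import math


def detect_burst_states(timestamps, gamma=1.0, scaling=2.0, bursting_states=1):
    """Return the minimum-cost state (0 = not bursting) for each gap between events.

    Hypothetical helper: a sketch of a Kleinberg-style automaton, not Sci2 code.
    """
    times = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    if not gaps:
        return []
    n = len(gaps)
    total = sum(gaps)
    if total <= 0:
        raise ValueError("timestamps must span a positive interval")

    states = bursting_states + 1  # i bursting states -> i + 1 automaton states
    # State i expects events `scaling ** i` times more often than the average rate.
    rates = [(n / total) * scaling ** i for i in range(states)]

    def emit_cost(state, gap):
        # Negative log of the exponential density rate * exp(-rate * gap).
        return rates[state] * gap - math.log(rates[state])

    def move_cost(src, dst):
        # Moving up is expensive and proportional to gamma; moving down is free.
        return (dst - src) * gamma * math.log(n) if dst > src else 0.0

    # Dynamic programming (Viterbi) over states; the automaton starts in state 0.
    cost = [move_cost(0, j) + emit_cost(j, gaps[0]) for j in range(states)]
    back_pointers = []
    for gap in gaps[1:]:
        previous = cost
        pointers, cost = [], []
        for j in range(states):
            best = min(range(states), key=lambda i: previous[i] + move_cost(i, j))
            pointers.append(best)
            cost.append(previous[best] + move_cost(best, j) + emit_cost(j, gap))
        back_pointers.append(pointers)

    # Trace the cheapest state sequence backwards.
    state = min(range(states), key=lambda j: cost[j])
    path = [state]
    for pointers in reversed(back_pointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Example: sparse events, then a dense run, then sparse events again.
example_times = [0, 10, 20, 30, 40, 50] + list(range(51, 61)) + [70, 80, 90, 100, 110]
example_states = detect_burst_states(example_times)
```

In this sketch, raising `gamma` makes the automaton more reluctant to enter a bursting state, while raising `scaling` demands a sharper increase in event frequency before a gap counts as bursty.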

Usually you will only need to change Date Column, Date Format, Text Column, and Text Separator when using Burst Detection. See 'Usage Hints' for further guidance.

Pros & Cons

Because each value gets its own state machine, bursts are detected for each value independently of the others. This makes the algorithm suitable primarily when changes in the usage patterns of individual values are the area of interest. Cross-value comparisons of bursts are still possible, because a burst 'strength' is calculated.
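The by-value approach can be pictured as first splitting the input table into one event stream per token and then analyzing each stream separately. The Python sketch below shows only that splitting step. The function `events_by_token`, the sample column names, and the row format (a list of dictionaries) are assumptions made for illustration, and the date format here uses Python's `strptime` syntax rather than the Java SimpleDateFormat syntax that Sci2 expects.

```python
from collections import defaultdict
from datetime import datetime


def events_by_token(rows, date_column, text_column, separator, date_format):
    """Group event dates by token so each token can be analyzed on its own.

    Hypothetical helper for illustration; not part of Sci2.
    """
    grouped = defaultdict(list)
    for row in rows:
        when = datetime.strptime(row[date_column], date_format)
        for token in row[text_column].split(separator):
            token = token.strip()
            if token:  # skip empty tokens such as those from a trailing separator
                grouped[token].append(when)
    return {token: sorted(times) for token, times in grouped.items()}


# Hypothetical sample table with a year column and a '|'-separated keyword column.
sample_rows = [
    {"Year": "2001", "Keywords": "network|burst"},
    {"Year": "2003", "Keywords": "burst"},
]
streams = events_by_token(sample_rows, "Year", "Keywords", "|", "%Y")
```

Each entry of `streams` is an independent, time-ordered list of events for one token, which is the unit a per-value state machine would operate on.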

Applications

Burst detection is particularly useful for examining the trends in collections of texts or communities of conversation. Even words that are used comparatively little, but that change in frequency of usage over time, stand out, unlike in burst detection algorithms based on thresholds.

Implementation Details

This algorithm exposes every option of the original C program that has any effect.

Usage Hints

Please read the Description section before continuing.

This burst algorithm is a text-based burst detection that reports bursts in a hierarchical structure. However, it can also simply detect whether bursts exist if you set Bursting States to 1.

Steps:

1. Choose Analysis > Topical > Burst Detection from the Sci2 menu bar.

2. A window will pop up listing the input parameters. Change the parameters to suit your needs.

The algorithm takes the following input parameters.

  • Gamma is the value that state transition costs are proportional to. A higher Gamma value results in higher transition costs. Use this parameter to control how easily the automaton can change states.
  • Density Scaling determines how much 'more bursty' each level is than the previous one. The higher the scaling value, the more active (bursty) the event must be at each level.
  • Bursting States determines how many bursting states there will be, beyond the non-bursting state. A value of i bursting states corresponds to i+1 automaton states.
  • Date Column is the name of the column holding the date/time at which each event / topic occurs.
  • Date Format specifies how the date column will be interpreted as a date/time. See http://java.sun.com/j2se/1.4.2/docs/api/java/text/SimpleDateFormat.html for details.
  • Text Column is the name of the column whose values (tokens separated by the delimiter) are analyzed for bursts.
  • Text Separator delimits the tokens in the text column. When constructing your tables, do not use a separator that is all or part of any token.

Note that the burst detection algorithm does not modify the words in the Text Column. Tokens that differ in any way, such as author, Author, and authors, are treated as different tokens. The values in the Text Column must be a list of textual tokens. You can use Lowercase, Tokenize, Stem, and Stopword Text to normalize free-form text into this shape.
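As a rough picture of what such normalization does, the Python sketch below lowercases free-form text, splits it into word tokens, drops a few stopwords, and joins the result with a separator. It is not the Sci2 preprocessing code: the function `normalize` and the tiny `STOPWORDS` set are hypothetical, and stemming is deliberately left out, so 'author' and 'authors' remain distinct tokens.

```python
import re

# Hypothetical, deliberately tiny stopword list for illustration only.
STOPWORDS = {"a", "an", "and", "of", "the", "in", "on", "for", "to"}


def normalize(text, separator="|"):
    """Lowercase, tokenize, and remove stopwords; no stemming is applied."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    kept = [word for word in words if word not in STOPWORDS]
    return separator.join(kept)


normalized = normalize("The Author and the authors")
```

Choose a separator that cannot occur inside any token. The regular expression above only keeps letters and digits, so '|' is safe in this sketch.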

The default parameter values are typically good choices, but more sophisticated models can be fitted by adjusting them.

References

J. Kleinberg. Bursty and Hierarchical Structure in Streams. Proc. 8th ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining, 2002.

See Also
