Summarizer
Summary

This tool creates a summary of statistical information on an HTML document, either from a web address or uploaded from the user’s files. High frequency words can be represented in a number of ways, including lists of words, lists of sentences containing top words, concordances, collocates, or distribution.

Note: This tool is processing-intensive, and the time it takes to analyze a text grows in proportion to the size of the text. For this reason, it is strongly recommended that this tool be used only for texts of short story length or less. The larger the text, the more likely the tool is to overburden the server and return an error. For texts of up to novel length, TAPoR recommends converting the document to .txt and using the Plain Text version of this tool instead.

Please click the ? buttons at the bottom right of each set of options for more information on that set.

For further information on this tool, please see the TADA Wiki's Summarizer entry here. A glossary of terms is also available here.

Walkthrough

To generate a summary from http://www.w3.org/ that extracts only the text found between <body> and </body>, lists the top five high frequency words, lists all sentences containing those words, generates both a concordance and collocates for those words, and displays the results by frequency as an HTML page:
  1. Source text
    1. Enter ‘http://www.w3.org/’ into the ‘URL’ field.
  2. Subtext limited to
    1. Enter ‘body’ in the ‘Elements’ field.
  3. Summary limited to
    1. Enter ‘5’ in ‘List top ___ frequency words’.
    2. Click the radio button next to ‘From all words’.
    3. Enter ‘1’ in ‘List sentences that have ___ or more high frequency words’.
    4. Choose ‘first’ and enter ‘5’ in ‘For each high frequency word, list ______ context(s) with context length of ___ words before and after’.
    5. Enter ‘1’ in ‘List collocation within ___ words of high frequency words’.
  4. Results
    1. Set ‘Sort’ to ‘by frequency’.
    2. Set the ‘Display as’ field to ‘HTML’.
    3. Click the ‘Submit’ button to process the text.
» Source text *
  Example: http://taporware.ualberta.ca/einstein-bio.html

Summary

This section determines the source of the document you wish the tool to process. HTML can be obtained either from a web address or by uploading a file.

Fields

Source URL
To use content from a web page, enter a full web address (URL) in the field provided. Copy and paste from your browser’s address bar for best results.

Local file
To upload an HTML (.html) file from your computer, choose ‘Local file,’ click ‘Browse,’ and select the file you wish to use from your directory.
» Subtext limited to
Summary

This section determines which HTML tags to extract text from.

Fields

Elements
Use this field to specify which HTML tag(s) to extract text from. Multiple tags must be separated by commas (ex: 'p, h1, h2'). This field defaults to 'body'.
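
As a rough illustration of what limiting the subtext to certain elements means, the sketch below collects only the text that sits inside the requested tags. It uses Python's standard `html.parser` module. The class and function names (`ElementTextExtractor`, `extract_text`) are hypothetical, and the tool's actual extraction rules may differ.

```python
from html.parser import HTMLParser


class ElementTextExtractor(HTMLParser):
    """Collect text found inside any of the requested (non-void) tags."""

    def __init__(self, elements):
        super().__init__()
        # 'p, h1, h2' -> {'p', 'h1', 'h2'}
        self.wanted = {name.strip().lower() for name in elements.split(",")}
        self.depth = 0      # number of requested elements currently open
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.wanted:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.wanted and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Keep text only while at least one requested element is open.
        if self.depth > 0 and data.strip():
            self.chunks.append(data.strip())


def extract_text(html, elements="body"):
    """Return the text inside the given comma-separated elements."""
    parser = ElementTextExtractor(elements)
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

With the default of 'body', everything in the page body is kept; passing 'p, h1' keeps only paragraph and top-level heading text.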
» Summary limited to
Summary

This section determines what to include in the text summary.

Fields

List top ___ frequency words
Determines how many of the top frequency words to include in the analysis.

From all words
Searches the full text for the top high frequency words.
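
A minimal sketch of counting the top high frequency words from all words, assuming a word is any run of letters (optionally with an internal apostrophe) and that case is ignored. The tool's own tokenization rules are not documented here, so `top_words` is a hypothetical helper and not the tool's implementation.

```python
import re
from collections import Counter

# Assumed word pattern: letters, optionally followed by an apostrophe part.
WORD = re.compile(r"[a-z]+(?:'[a-z]+)?")


def top_words(text, n=5):
    """Return the n most frequent words as (word, count) pairs."""
    words = WORD.findall(text.lower())
    return Counter(words).most_common(n)
```

For example, `top_words(text, 5)` corresponds to entering ‘5’ in ‘List top ___ frequency words’ with ‘From all words’ selected.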

Matching pattern
Use this field to specify a search pattern. Only words matching that pattern will be included in the top high frequency words. The pattern may be a string of characters or a regular expression.
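
To illustrate how a matching pattern narrows the candidate words, the sketch below keeps only words in which a regular expression finds a match before counting. Whether the tool matches anywhere in the word or requires the whole word to match is an assumption here (this sketch matches anywhere), and `top_matching_words` is a hypothetical name.

```python
import re
from collections import Counter


def top_matching_words(text, pattern, n=5):
    """Count only the words in which the pattern matches somewhere."""
    regex = re.compile(pattern)
    words = re.findall(r"[a-z]+", text.lower())
    matching = (w for w in words if regex.search(w))
    return Counter(matching).most_common(n)
```

A plain string such as `"run"` behaves as a substring test, while a regular expression such as `"ing$"` restricts the results to words with that ending.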

From word list
Use this field to limit words to those found on a user-specified word list, either typed directly into the field provided or contained within a .txt file of comma-separated words. Note: The text field and ‘from local file’ options are located under ‘Not from word list’.

Not from word list
Use this field to exclude words found on a user-specified word list from the results. The list may be typed directly into the field provided or contained within a .txt file of comma-separated words. If desired, the provided modified Glasgow Stop Words list may be applied instead.

Type in
This field enables the user to enter a comma-separated list of words, for use with the ‘From/Not from word list’ options.

From local file
Use this field to upload a .txt file containing a comma-separated list of words, for use with the From word list or Not from word list options.

Use modified Glasgow Stop Words
Click the radio button to exclude words on the modified Glasgow Stop Words list from the results. For use with the ‘Not from word list’ option.
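
The word list options can be pictured as a filter applied before counting. The sketch below parses a comma-separated list, as typed in or read from a .txt file, and either keeps only listed words or drops them. The names are hypothetical, and `SAMPLE_STOP_WORDS` is a tiny stand-in for illustration only, not the modified Glasgow Stop Words list.

```python
import re
from collections import Counter

# Stand-in list for illustration; NOT the modified Glasgow Stop Words list.
SAMPLE_STOP_WORDS = "the, a, an, and, of, on"


def parse_word_list(raw):
    """Turn 'cat, mat ,Sat' into {'cat', 'mat', 'sat'}."""
    return {w.strip().lower() for w in raw.split(",") if w.strip()}


def top_words_with_lists(text, n=5, only=None, exclude=None):
    """Count words, optionally limited to or excluding a word list."""
    words = re.findall(r"[a-z]+", text.lower())
    if only is not None:        # 'From word list'
        allowed = parse_word_list(only)
        words = [w for w in words if w in allowed]
    if exclude is not None:     # 'Not from word list'
        blocked = parse_word_list(exclude)
        words = [w for w in words if w not in blocked]
    return Counter(words).most_common(n)
```

To use a list stored in a file, the file's contents could be read into a string and passed as `only` or `exclude`.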

List sentences that have ___ or more high frequency words
Enter a number in the box provided to specify the minimum number of high frequency words a sentence must have to appear in the results.
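
A sketch of the sentence filter, under two assumptions: sentences end at '.', '!' or '?' followed by whitespace, and the minimum counts distinct high frequency words rather than repeated occurrences. The tool may define both differently, and `sentences_with_top_words` is a hypothetical helper.

```python
import re


def sentences_with_top_words(text, top_words, minimum=1):
    """Return sentences containing at least `minimum` distinct top words."""
    top = {w.lower() for w in top_words}
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    hits = []
    for sentence in sentences:
        found = top & set(re.findall(r"[a-z]+", sentence.lower()))
        if len(found) >= minimum:
            hits.append(sentence)
    return hits
```

Raising the minimum from 1 to 2 keeps only sentences in which two or more of the top words occur together.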

For each high frequency word, list (first/first three/all) context(s) with context length of ___ words before and after
This pair of fields determines how many instances of each high frequency word to display contexts for in the results, and how many words to include in that context.
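
The two context fields can be sketched as a simple keyword-in-context listing: `limit` plays the role of first/first three/all (1, 3, or `None`), and `width` is the number of words shown on each side. Splitting on whitespace and stripping common punctuation is an assumption of this sketch, and `contexts` is a hypothetical name.

```python
def contexts(text, keyword, width=5, limit=1):
    """List up to `limit` contexts of `keyword`, `width` words each side."""
    tokens = text.split()
    # Normalized copies used only for matching; output keeps the originals.
    normalized = [t.strip(".,;:!?\"'()").lower() for t in tokens]
    results = []
    for i, word in enumerate(normalized):
        if word == keyword.lower():
            start = max(0, i - width)
            results.append(" ".join(tokens[start:i + width + 1]))
            if limit is not None and len(results) == limit:
                break
    return results
```

Contexts near the start or end of the text are simply shorter, since there are fewer words available on that side.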

List collocation within ___ words of the high frequency words
This field sets the maximum distance, in words, that a collocate may be from a high frequency word and still be included in the results.
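
As an illustration of the collocation window, the sketch below counts every word that appears within `window` words on either side of a given high frequency word. The tokenization (runs of letters, case ignored) is an assumption, and `collocates` is a hypothetical helper rather than the tool's own routine.

```python
import re
from collections import Counter


def collocates(text, keyword, window=1):
    """Count words occurring within `window` words of `keyword`."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for i, word in enumerate(words):
        if word == keyword.lower():
            before = words[max(0, i - window):i]
            after = words[i + 1:i + 1 + window]
            counts.update(before + after)
    return counts
```

A window of 1 considers only the immediate neighbours of each occurrence; larger windows admit more distant collocates.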

Elements against text distribution
This option reports how many occurrences of a high frequency word there are per element, and the average number of words found within each unique element. NOTE: This option is currently unavailable.
» Results
Summary

This section allows users to choose how the results will be formatted, and whether to display them in a new browser window.

Fields

Sort
This drop-down list enables users to choose from several sort options: Alphabetically, Order of first appearance, By frequency, or By reversed alphabetic order.
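
The four sort options might be pictured as follows, given word counts stored in the order the words first appeared. This sketch reads ‘By reversed alphabetic order’ as Z to A; that is an assumption, since the option could instead mean sorting by each word's reversed spelling. `sort_results` is a hypothetical name.

```python
def sort_results(counts, order="by frequency"):
    """Order (word, count) pairs according to the chosen sort option."""
    items = list(counts.items())  # dicts preserve first-appearance order
    if order == "alphabetically":
        return sorted(items)
    if order == "order of first appearance":
        return items
    if order == "by frequency":
        return sorted(items, key=lambda item: item[1], reverse=True)
    if order == "by reversed alphabetic order":
        # Assumed meaning: descending alphabetical order (Z to A).
        return sorted(items, reverse=True)
    raise ValueError(f"unknown sort option: {order!r}")
```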

Display as
This drop-down list enables users to choose between results formatted in HTML or as tab delimited text.
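
Tab delimited text is convenient for pasting results into a spreadsheet. The sketch below writes (word, count) pairs in that form using Python's standard `csv` module. The column headings `word` and `count` are assumptions, as the exact layout of the tool's output is not described here.

```python
import csv
import io


def to_tab_delimited(rows):
    """Format (word, count) pairs as tab delimited text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["word", "count"])  # assumed column headings
    writer.writerows(rows)
    return buffer.getvalue()
```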

Open results in new window
Check this box to display the results in a new window or browser tab. This option is selected by default. Some pop-up blockers may prevent a new window from being opened; if so, uncheck the box to open the results in the same window instead.
‘*’ indicates a required field.

TAPoRware Project, McMaster University