Raining Words
Summary

This tool generates a list of the top 20 high-frequency words from an HTML document, either located at a specified web address or uploaded from the user’s files, and displays them in a Java applet. The higher the frequency of a given word, the larger it appears and the more slowly it moves. The source text can be filtered with a stop word list or narrowed to include only words appearing within specified HTML elements.
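The counting step described above can be sketched in a few lines of Python. This is an illustration only, not the tool's actual implementation: the function names `top_words` and `font_size`, the tokenizing pattern, and the linear size scaling are all assumptions made for the example.

```python
import re
from collections import Counter


def top_words(text, n=20, stop_words=frozenset()):
    """Return the n most frequent words in text as (word, count) pairs."""
    # Assumed tokenizer: lowercase runs of letters, with an optional
    # internal apostrophe (so "don't" stays one word).
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    counts = Counter(w for w in words if w not in stop_words)
    return counts.most_common(n)


def font_size(count, max_count, smallest=10, largest=48):
    """Hypothetical mapping: display size grows linearly with frequency."""
    return smallest + (largest - smallest) * count // max_count


sample = "the cat and the dog and the bird"
ranked = top_words(sample, n=2, stop_words={"and"})
for word, count in ranked:
    print(word, count, font_size(count, ranked[0][1]))
```

With `n=20` and the text of a full page as input, `top_words` yields the kind of list the applet displays; the stop word set plays the role of the modified Glasgow list.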

Note: This tool requires the Java plug-in version 1.4.2_08 or newer to view the applet.

Please click the ? buttons at the bottom right of each set of options for more information on that set.

For further information on this tool, please see the Raining Words entry on the TADA Wiki. A glossary of terms is also available.

Walkthrough

To generate a list of the top high-frequency words from http://www.ucc.ie/celt/published/E726000-001.html, limit the words to those appearing between <body> and </body>, filter out the words on the modified Glasgow Stop Words list, and display the results as an HTML page:
  1. Source text
    1. Enter 'http://www.ucc.ie/celt/published/E726000-001.html' in the URL field.
  2. Subtext limited to
    1. Enter 'body' in the 'Elements' field.
    2. Click the radio button next to 'Words not in the list below'.
    3. Click the radio button next to 'Use modified Glasgow stop words list'.
  3. Results
    1. Click the 'Submit' button to process the text.
» Source text
  Example: http://en.wikipedia.org/wiki/Socrates
Summary

This section determines the source of the document. Note: the source text must be hosted at a URL.

Fields

URL
To use content from a web page, enter a full web address (URL) in the field provided. Copy and paste from your browser’s address bar for best results.
» Subtext limited to
Summary

This section determines how the text will be limited.

Fields

Elements
Specify which HTML elements to look within, such as ‘body’ or ‘p’. Multiple elements may be listed, separated by a comma. This field defaults to ‘body,’ which will provide all body text from the source.
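As a rough sketch of what limiting the text to certain elements involves, the following uses Python's standard `html.parser` module. The class `ElementText` and the function `text_within` are hypothetical names for this example; how the tool itself handles nesting or malformed markup is not documented here.

```python
from html.parser import HTMLParser


class ElementText(HTMLParser):
    """Collect text that appears inside any of the named elements."""

    def __init__(self, elements):
        super().__init__()
        self.elements = set(elements)
        self.depth = 0      # how many selected elements are currently open
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.elements:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.elements and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


def text_within(html, elements_field="body"):
    """Parse a comma-separated 'Elements' value and return the matching text."""
    names = [e.strip().lower() for e in elements_field.split(",") if e.strip()]
    parser = ElementText(names)
    parser.feed(html)
    parser.close()
    return " ".join(c.strip() for c in parser.chunks if c.strip())


page = ("<html><head><title>Skip me</title></head>"
        "<body><h1>Heading</h1><p>First para.</p></body></html>")
print(text_within(page, "p"))
print(text_within(page, "h1, p"))
```

This sketch assumes every selected element has a closing tag; pages that leave `<p>` unclosed would need more careful handling.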

All words
All words from within the specified element will be included in the word list.

Words in the list below
Only words specified by the user will be included in the list. Users can type the list of desired words in manually or upload a text (.txt) file containing their list.

Words not in the list below
Words in a user-specified list, or on the modified Glasgow Stop Words list, will be excluded; all other words will be included in the final list. To use a custom list, users can type it in manually or upload a text (.txt) file containing their list.

Words matching pattern
Only words matching a regular expression provided by the user will be included in the word list.
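A pattern filter of this kind might look like the sketch below. It assumes the pattern must match the whole word and uses Python's `re` syntax; the tool's own regular expression dialect and matching rule may differ, and `words_matching` is a name invented for the example.

```python
import re


def words_matching(words, pattern):
    """Keep only the words that the pattern matches in full."""
    regex = re.compile(pattern)
    return [w for w in words if regex.fullmatch(w)]


candidates = ["king", "kingdom", "queen", "ring"]
print(words_matching(candidates, r"k.*"))
print(words_matching(candidates, r".*ing"))
```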

Word list typed in
Users can type a comma-separated list of words to include or exclude into this field. If they also choose ‘Words in the list below,’ only the words they have specified will be included in the final list. If they choose ‘Words not in the list below,’ the words they have specified will be excluded from the final list.

Text file with words
Users can upload a text (.txt) file of words to include or exclude. The list must be separated by commas (ex: red, orange, green, purple). If they also choose ‘Words in the list below,’ only the words they have specified will be included in the final list. If users choose ‘Words not in the list below,’ the words they have specified will be excluded from the final list.
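The comma-separated format and the two list modes can be illustrated as follows. The names `parse_word_list` and `apply_word_list`, the mode strings, and the case-insensitive comparison are assumptions for this sketch rather than documented behaviour of the tool.

```python
def parse_word_list(raw):
    """Turn 'red, orange, green' into a set of lowercase words."""
    return {w.strip().lower() for w in raw.split(",") if w.strip()}


def apply_word_list(words, word_list, mode):
    """Filter words by the list; mode is 'in' or 'not_in' (hypothetical labels)."""
    if mode == "in":        # 'Words in the list below'
        return [w for w in words if w.lower() in word_list]
    if mode == "not_in":    # 'Words not in the list below'
        return [w for w in words if w.lower() not in word_list]
    raise ValueError("mode must be 'in' or 'not_in'")


colours = parse_word_list("red, orange, green, purple")
text_words = ["Red", "sky", "green", "field"]
print(apply_word_list(text_words, colours, "in"))
print(apply_word_list(text_words, colours, "not_in"))
```

The same parsing would apply to the contents of an uploaded .txt file once it has been read into a string.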

Use modified Glasgow stop words list
Choose this option to remove the words on the modified Glasgow Stop Words list from the final list.
» Results
TAPoRware Project, McMaster University,