List Words
Summary

This tool lists the words found in a plain text document, either fetched from a specified web address or uploaded from the user’s files. Users can also submit XML or HTML documents; the tool strips all tags and treats the remaining content as plain text.

The document’s text can be filtered by applying a word pattern (a regular expression), by keeping only the words on a user-provided list, or by excluding the words on a user-provided list or on TAPoR’s modified Glasgow Stop Words list.

The query results can be displayed alphabetically, by frequency, by order of appearance, or in reversed alphabetical order.

Note: When a custom list of words is applied to the text, the tool’s resultant word list order will match the order of that custom list regardless of the sort criteria specified under Results.

Please click the ? buttons at the bottom right of each set of options for more information on that set.

For further information on this tool, please see the TADA Wiki's List Words entry here. A glossary of terms is also available here.

Walkthrough

To extract words from http://tada.mcmaster.ca/wikita/pub/Main/ToolTestingTexts/GulliversTravels.txt, filter out words that appear in the modified Glasgow Stop Words list and sort the results by frequency:
  1. Source text
    1. Enter ‘http://tada.mcmaster.ca/wikita/pub/Main/ToolTestingTexts/GulliversTravels.txt’ into the ‘URL’ field.
  2. Words limited to
    1. Click the radio button next to ‘Words not in the list below’.
    2. Click the radio button next to ‘Use modified Glasgow Stop Words’.
  3. Results
    1. Select ‘By Frequency’ from the ‘Sort’ drop-down menu.
  4. Click the ‘Submit’ button to process the text.
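
For readers who want to see what the walkthrough above amounts to, here is a minimal Python sketch of the same steps: split a text into words, drop stop words, and sort by frequency. It is an illustration, not the tool's implementation. The tokenization rule, the function name, and the tiny stop word set are all assumptions; the real modified Glasgow Stop Words list is much longer and is not reproduced here. The sample sentence is made up and stands in for the text at the URL.

```python
import re
from collections import Counter

# Hypothetical, tiny stand-in for a stop word list (assumption).
STOP_WORDS = {"the", "of", "and", "a", "to", "in", "i", "my"}

def list_words_by_frequency(text, stop_words=STOP_WORDS):
    """Return (word, count) pairs, most frequent first, skipping stop words."""
    # Assumed tokenization: lowercase runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in stop_words)
    # Sort by descending count, then alphabetically to break ties.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

sample = "My father had a small estate; my father sent me to the college."
result = list_words_by_frequency(sample)
print(result[:3])
```

To try this on a real document, the text would first have to be downloaded or read from a file; that step is left out so the sketch runs without network access.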
*
» Source text
  Example: http://taporware.ualberta.ca/sampleDocs/plainText.txt


Summary

This section determines the source of the document you wish the tool to process.

Fields

Source URL
To use content from a web page, enter a full web address (URL) in the field provided. Copy and paste from your browser’s address bar for best results. If the web address directs to an HTML or XML document instead of plain text, the tool will strip all tags and process it as plain text.
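
The tag stripping described above can be approximated with Python's standard library. This is a sketch under assumptions: the class and function names are made up for illustration, and the tool's actual handling of scripts, comments, and entities is not documented here.

```python
from html.parser import HTMLParser

class TagStripper(HTMLParser):
    """Collect only the text between tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called for text content; tags themselves never reach this method.
        # Note: the contents of <script> and <style> also arrive here.
        self.chunks.append(data)

def strip_tags(markup):
    """Return the markup with all tags removed."""
    parser = TagStripper()
    parser.feed(markup)
    parser.close()  # flush any buffered text
    return "".join(parser.chunks)

plain = strip_tags("<p>Hello <b>world</b></p>")
print(plain)
```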

Local file
To upload a plain text (.txt) file from your computer, choose ‘Local file,’ click ‘Browse,’ and select the file you wish to use from your directory.

» Words limited to




Summary

This section applies limits to the source text.

Fields

All words
All words from the source text will be included in the word list.

Words matching pattern
Only words matching a regular expression provided by the user will be included in the word list.
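
As an illustration of pattern filtering, the sketch below keeps only the words that match a regular expression. It assumes that the whole word must match the pattern and uses Python's regular expression syntax; the tool's own matching rule and regex dialect may differ. The function name and word list are hypothetical.

```python
import re

def words_matching(words, pattern):
    """Keep only the words that match the pattern in full (assumed rule)."""
    regex = re.compile(pattern)
    return [w for w in words if regex.fullmatch(w)]

words = ["travel", "travels", "voyage", "trade", "captain"]
# 'tra' followed by any number of word characters.
matched = words_matching(words, r"tra\w*")
print(matched)
```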

Words in the list below
Only words specified by the user will be included in the list. The list of desired words can be typed into the provided field, or uploaded as a text (.txt) file.

Words not in the list below
Only words not in a user-specified list or the modified Glasgow Stop Words list will be included in the final list. To use a custom list, type it in manually or upload it as a text (.txt) file.

Word list typed in
Type a list of words separated by commas into this field to include or exclude. If ‘Words in the list below’ is also selected, only the words specified here will be included in the final list. If ‘Words not in the list below’ is selected, the specified words will be excluded from the final list instead.

Text file with words
Click the 'Browse' button to upload a text (.txt) file of words to include or exclude. The list must be separated by commas (ex: red, orange, green, purple). If ‘Words in the list below’ is also selected, only the words specified here will be included in the final list. If ‘Words not in the list below’ is selected, the specified words will be excluded from the final list instead.
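
The comma-separated format and the two list modes can be sketched as follows. The function names are hypothetical, and two behaviors are assumptions rather than documented facts: matching is case-insensitive, and listed words that never occur in the text are left out. The included words are reported in the order of the custom list rather than the order of the text.

```python
from collections import Counter

def parse_word_list(raw):
    """Split 'red, orange, green' into ['red', 'orange', 'green']."""
    return [item.strip().lower() for item in raw.split(",") if item.strip()]

def count_listed_words(words, word_list):
    """Count only listed words, reported in the order of the custom list."""
    counts = Counter(w.lower() for w in words)
    return [(w, counts[w]) for w in word_list if counts[w] > 0]

def drop_listed_words(words, word_list):
    """Return the words that are NOT on the list, in text order."""
    listed = set(word_list)
    return [w for w in words if w.lower() not in listed]

colours = parse_word_list("red, orange, green, purple")
text_words = ["green", "door", "red", "roof", "green"]
included = count_listed_words(text_words, colours)
excluded = drop_listed_words(text_words, colours)
print(included, excluded)
```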

Use modified Glasgow Stop Words
Select this option to use TAPoR's modified Glasgow Stop Words list instead of a custom word list. If ‘Words in the list below’ is also selected, only the words on the stop words list will be included in the final list. If ‘Words not in the list below’ is selected, the words on the stop words list will be excluded from the final list instead.
» Results

Summary

This section allows users to choose how the results will be sorted and formatted, and whether to display them in a new browser window.

Fields

Sort
This drop-down list enables users to choose from several sort options: Alphabetically, Order of first appearance, By frequency, or By reversed alphabetic order.
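
The four sort options can be pictured with a short sketch. The function name and the option strings are hypothetical. In particular, "reversed alphabetic order" is assumed here to mean Z to A; it could instead mean sorting words by their endings, and the source does not say which.

```python
from collections import Counter

def sort_word_list(words, order="alphabetical"):
    """Return (word, count) pairs in one of four assumed sort orders."""
    counts = Counter(words)  # keys keep the order of first appearance
    if order == "alphabetical":
        keys = sorted(counts)
    elif order == "first_appearance":
        keys = list(counts)
    elif order == "frequency":
        # Most frequent first; ties broken alphabetically (assumption).
        keys = sorted(counts, key=lambda w: (-counts[w], w))
    elif order == "reversed_alphabetical":
        keys = sorted(counts, reverse=True)  # assumed to mean Z to A
    else:
        raise ValueError(f"unknown sort order: {order}")
    return [(w, counts[w]) for w in keys]

words = ["the", "cat", "and", "the", "hat"]
by_frequency = sort_word_list(words, "frequency")
by_appearance = sort_word_list(words, "first_appearance")
print(by_frequency)
```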

Apply inflectional stemmer
Click this box to have the tool process each word as its root (ex: waited, waiting, waits would be processed as ‘wait’). Please note that this option will slow the tool down in proportion to the size of the text being processed.
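
To show the idea of reducing inflected forms to a root, here is a deliberately naive suffix stripper. It is not the stemmer the tool uses, which is not described here; the function name, the suffix list, and the minimum root length are assumptions, and real stemmers handle many cases this one gets wrong (for example ‘running’ or ‘tried’).

```python
from collections import Counter

def naive_stem(word):
    """Strip one common inflectional suffix (illustrative only)."""
    for suffix in ("ing", "ed", "s"):
        # Require at least three letters to remain, so 'is' stays 'is'.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

forms = ["waited", "waiting", "waits", "wait"]
stems = [naive_stem(w) for w in forms]
stem_counts = Counter(stems)
print(stem_counts)
```

Because every word is passed through the stemmer before counting, the extra work grows with the number of words in the text.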

Display as
This drop-down list enables users to choose from several output formats: HTML, XML text in HTML, XML tree, and Tab delimited text.

Display top ___ words distribution over each 5% of text
This drop-down list enables users to choose whether to include word distribution statistics for the top words in the final list. Options include None, 5, 10, 20 or 50 words.
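
One way to picture a distribution "over each 5% of text" is to divide the word sequence into 20 equal segments and count how often each top word falls in each segment. The sketch below does that. The function name and the segmenting rule (by word position rather than by characters or lines) are assumptions.

```python
from collections import Counter

def distribution(words, top_n=5, segments=20):
    """Count occurrences of the top_n words in each of `segments` slices."""
    top = [w for w, _ in Counter(words).most_common(top_n)]
    table = {w: [0] * segments for w in top}
    total = len(words)
    for position, word in enumerate(words):
        if word in table:
            # Map the word's position to a segment index 0..segments-1.
            segment = position * segments // total
            table[word][segment] += 1
    return table

# Toy text: 'a' fills the first half, 'b' the second half.
words = ["a"] * 20 + ["b"] * 20
table = distribution(words, top_n=2)
print(table["a"])
```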

Open results in new window
Check this box to display the results in a new window or browser tab. This option is selected by default. Some pop-up blockers may prevent a new window from being opened; if so, un-check the box to open the results in the same window instead.
‘*’ indicates a required field

TAPoRware Project, McMaster University,