Word List Comparison with Control Corpus
Summary

This tool generates a list of the highest-frequency words in a source text document by comparing it with a control corpus. The source document can be located at a specified web address or uploaded from the user’s files. The results can be limited by applying a modified Glasgow Stop Words list, and sorted either by frequency or by ratio over the control corpus. Users can use the provided Brown corpus as the control or supply their own.

Note: If an HTML or XML text is submitted, the tool will strip all tags and process it as plain text.
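
The tag stripping described in the note can be illustrated with a minimal sketch. This is not the tool's own code; it assumes Python's standard html.parser module, and the names TagStripper and strip_tags are hypothetical.

```python
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Collects only the text between tags (hypothetical helper)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        # Called for every run of text found between tags.
        self.parts.append(data)


def strip_tags(markup: str) -> str:
    parser = TagStripper()
    parser.feed(markup)
    parser.close()
    # Join the text runs as-is, then collapse runs of whitespace.
    return " ".join("".join(parser.parts).split())


print(strip_tags("<p>Hello <b>world</b>!</p>"))
```

Text that contains no tags passes through unchanged apart from whitespace normalisation, which matches the idea of treating every input as plain text.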

Please click the ? buttons at the bottom right of each set of options for more information on that set.

For further information on this tool, please see the Compare with Control entry on the TADA Wiki. A glossary of terms is also available.
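
The comparison summarised above can be sketched roughly as follows. This page does not document the tool's actual formula, so the sketch makes two assumptions: the ratio is a word's relative frequency in the source divided by its relative frequency in the control corpus, and add-one smoothing is used for words missing from the control. The function names and the tokenizing rule are also hypothetical.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumed rule: lowercase, keep runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def compare_with_control(source: str, control: str) -> list[tuple[str, int, float]]:
    """Return (word, source count, ratio over control) rows."""
    src = Counter(tokenize(source))
    ctl = Counter(tokenize(control))
    src_total = sum(src.values())
    ctl_total = sum(ctl.values())
    rows = []
    for word, count in src.items():
        src_rel = count / src_total
        # Add-one smoothing (an assumption) avoids division by zero for
        # words that never occur in the control corpus.
        ctl_rel = (ctl[word] + 1) / (ctl_total + 1)
        rows.append((word, count, src_rel / ctl_rel))
    return rows


rows = compare_with_control("the whale the sea", "the the the sea land")
for row in rows:
    print(row)
```

In this toy input, "whale" never appears in the control text, so it receives the highest ratio even though "the" is more frequent. That is the kind of word a ratio sort is meant to surface.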

Walkthrough

To compare the text from http://tada.mcmaster.ca/wikita/pub/Main/ToolTestingTexts/GulliversTravels.txt with the Brown corpus, filter out words that appear in the modified Glasgow Stop Words list, sort by ratio over the control corpus and view the results as HTML:
  1. Source text
    1. Enter ‘http://tada.mcmaster.ca/wikita/pub/Main/ToolTestingTexts/GulliversTravels.txt’ into the ‘URL’ field.
  2. Control text
    1. Click the radio button next to ‘Choose control corpus’.
  3. Words limited to
    1. Click the radio button next to ‘Use modified Glasgow Stop Words’.
  4. Results
    1. Select ‘Ratio over control corpus’ from the ‘Sort by’ drop-down menu.
    2. Select ‘HTML’ from the ‘Display as’ drop-down menu.
  5. Click the ‘Submit’ button to process the text.
» Source text
  Example: http://taporware.ualberta.ca/sampleDocs/plainText.txt


Summary

This section determines the source of the document you wish the tool to process.

Fields

Source URL
To use content from a web page, enter a full web address (URL) in the field provided. Copy and paste from your browser’s address bar for best results. If the web address directs to an HTML or XML document instead of plain text, the tool will strip all tags and process it as plain text.

Local file
To upload a plain text (.txt) file from your computer, choose ‘Local file,’ click ‘Browse,’ and select the file you wish to use from your directory.

» Control text




Summary

This section allows the user to choose the control corpus: either the provided Brown corpus or a corpus of their own.

Fields

Choose control corpus
Select this option to use the provided Brown corpus.

User's corpus
This option allows the user to upload a custom corpus from their computer. The corpus must be in plain text format (.txt).
» Words limited to

Summary

This section limits which words from the source text are included in the word list.

Fields

All words
All words from the source text will be included in the word list.

Use modified Glasgow Stop Words
Select this option to use TAPoR's modified Glasgow Stop Words list instead of a custom word list. If ‘Words in the list below’ is also selected, only the words specified here will be included in the final list. If ‘Words not in the list below’ is selected, the specified words will be excluded from the final list instead.
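
The three behaviours described in this section (all words, only listed words, listed words excluded) can be sketched as a simple filter. The function name, the mode labels and the three-word stop list are hypothetical stand-ins; the actual modified Glasgow Stop Words list is not reproduced here.

```python
def limit_words(words, word_list=None, mode="all"):
    """mode is 'all', 'include' (words in the list) or 'exclude' (words not in the list)."""
    if mode == "all" or word_list is None:
        return list(words)
    listed = {w.lower() for w in word_list}
    if mode == "include":
        return [w for w in words if w.lower() in listed]
    if mode == "exclude":
        return [w for w in words if w.lower() not in listed]
    raise ValueError(f"unknown mode: {mode}")


# Tiny stand-in list, NOT the real modified Glasgow Stop Words list.
STOP_WORDS = {"the", "and", "of"}

tokens = ["the", "yahoo", "and", "horse"]
print(limit_words(tokens, STOP_WORDS, mode="exclude"))
```

Excluding stop words is the usual choice when the goal is to find content words, since function words such as "the" otherwise dominate a frequency list.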
» Results
Summary

This section affects how the tool’s final results will be displayed.

Fields

Sort by
This drop-down list allows the user to sort the results by frequency in the source text, or by the ratio of each word's frequency in the source text over its frequency in the control corpus.

Display as
This drop-down list offers a choice of displaying the tool results as HTML or as tab-delimited text.

Open results in new window
Check this box to display the results in a new window or browser tab. This option is selected by default. Some pop-up blockers may prevent a new window from being opened; if so, un-check the box to open the results in the same window instead.
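
The two sort orders and the tab-delimited display option can be sketched as below. The row layout (word, count, ratio), the column headings, the function names and the sample values are all invented for illustration; the tool's real output format may differ.

```python
def sort_rows(rows, sort_by="frequency"):
    # rows are (word, count, ratio) tuples; both orders are descending,
    # with ties broken alphabetically by word.
    key_index = {"frequency": 1, "ratio": 2}[sort_by]
    return sorted(rows, key=lambda row: (-row[key_index], row[0]))


def as_tab_delimited(rows):
    lines = ["word\tcount\tratio"]
    for word, count, ratio in rows:
        lines.append(f"{word}\t{count}\t{ratio:.2f}")
    return "\n".join(lines)


# Made-up sample rows, not real results from any text.
sample = [("the", 120, 0.9), ("yahoo", 15, 42.0), ("horse", 30, 3.5)]
print(as_tab_delimited(sort_rows(sample, sort_by="ratio")))
```

Tab-delimited output of this kind can be pasted directly into a spreadsheet, which is one likely reason to choose it over HTML.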

 

 

TAPoRware Project, McMaster University