Find Text — Collocation
Summary

This tool searches a plain text (.txt) document for a word or pattern and shows each match within a user-defined context of words, sentences, lines or paragraphs. The document can be located at a user-specified web address or uploaded from the user’s files. The results can be sorted alphabetically, by frequency or by Z-score (a measure of how far, and in which direction, a word’s count deviates from the mean of its distribution, expressed in units of that distribution’s standard deviation).
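
The Z-score described above can be illustrated with a short calculation. This page does not state the exact formula the tool uses, so the sketch below assumes a simple binomial model: each word position in a context contains the collocate with probability equal to its share of the whole text. The function name and its parameters are hypothetical.

```python
import math

def collocate_z_score(observed, collocate_freq, corpus_size, context_slots):
    """Z-score of a collocate under an assumed binomial model.

    observed       -- times the collocate appears inside the contexts
    collocate_freq -- times the collocate appears in the whole text
    corpus_size    -- total number of word tokens in the text
    context_slots  -- total number of word positions across all contexts
    """
    p = collocate_freq / corpus_size            # chance of the word at any position
    expected = context_slots * p                # mean of the assumed distribution
    std_dev = math.sqrt(context_slots * p * (1 - p))
    if std_dev == 0:
        return 0.0                              # no variation, so no deviation to report
    return (observed - expected) / std_dev

# A word seen 10 times in 400 context positions, and 100 times in a
# 10,000-word text, scores roughly 3.02 under this model.
print(round(collocate_z_score(10, 100, 10000, 400), 2))
```

A positive score means the word appears near the search term more often than its overall frequency would suggest; a negative score means it appears less often.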

Note 1: This tool works best when the input is plain text. If an HTML or XML text is submitted, the tool will strip all tags and process it as plain text.

Note 2: When the set context is large, it is likely that the word or pattern searched will appear more than once in a given context. In this instance, the word or pattern will be counted more than once, decreasing the accuracy of the count and Z-score. For this reason, the best results are returned from small contexts.
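
The effect described in Note 2 can be reproduced with a few lines of code. The sketch below is one reading of the note, not the tool's actual implementation: it counts the words around every hit, so when two hits fall inside each other's context the words between them (and the search word itself) are counted twice. The function name `count_collocates` is hypothetical.

```python
from collections import Counter

def count_collocates(tokens, keyword, span):
    """Count the words within `span` positions of each occurrence of keyword."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != keyword:
            continue
        # Words before and after the hit, excluding the hit itself.
        window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
        counts.update(window)
    return counts

tokens = "a great man in a great country".split()

small = count_collocates(tokens, "great", 1)
large = count_collocates(tokens, "great", 4)

print(small["man"])    # 'man' occurs once in the text and is counted once
print(large["man"])    # with the larger context, both hits include 'man'
print(large["great"])  # each hit now counts the other hit as a collocate
```

With a context of one word, the two contexts never overlap and each word is counted once per occurrence. With a context of four words, they overlap and the counts are inflated.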

Please click the ? buttons at the bottom right of each set of options for more information on that set.

For further information on this tool, please see the TADA Wiki's Collocation entry here. A glossary of terms is also available here.

Walkthrough

To generate collocates for the text at http://tada.mcmaster.ca/wikita/pub/Main/ToolTestingTexts/GulliversTravels.txt (searching for occurrences of ‘great’, ‘country’, ‘time’ and ‘people’, listing the ten words before and after each instance, sorting the results by frequency and displaying them as HTML), follow these steps:
  1. Source text
    1. Enter ‘http://tada.mcmaster.ca/wikita/pub/Main/ToolTestingTexts/GulliversTravels.txt’ into the ‘URL’ field.
  2. Subtext limited to
    1. Enter ‘body’ in the ‘Elements’ field.
  3. What to find
    1. Enter ‘great, country, time, people’ into the ‘Word/Pattern’ field.
  4. Context for concordance
    1. Set the ‘Context’ drop list to ‘Words’.
    2. Enter ‘10’ into the ‘Context length’ field.
  5. Results
    1. Set the ‘Sort’ drop menu to ‘Co-occurring words by frequency’.
    2. Set the ‘Display as’ drop menu to ‘HTML’.
  6. Click the ‘Submit’ button to process the text.
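
For readers who want to see what the walkthrough computes, the steps above can be approximated in a short script. This is a minimal sketch under stated assumptions, not the tool's own code: words are taken to be runs of letters and apostrophes, matching is case-insensitive, and ties in frequency are broken alphabetically. The function name is hypothetical, and a short sample sentence stands in for the full text.

```python
import re
from collections import Counter

def collocates_by_frequency(text, terms, span=10):
    """List (word, count) pairs for words within `span` words of any term."""
    tokens = re.findall(r"[a-z']+", text.lower())   # assumed tokenisation
    targets = {t.lower() for t in terms}
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok in targets:
            counts.update(tokens[max(0, i - span):i])    # words before the hit
            counts.update(tokens[i + 1:i + 1 + span])    # words after the hit
    # Most frequent first; alphabetical order within the same count.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

sample = "The people of that country had a great regard for time."
result = collocates_by_frequency(sample, ["great", "country", "time", "people"], span=2)
for word, count in result:
    print(word, count)
```

To try it on a full document, read the file's contents into a string and pass it as `text` with `span=10`.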
» Source text *
  Example: http://taporware.ualberta.ca/sampleDocs/plainText.txt


Summary

This section determines the source of the document you wish the tool to process.

Fields

Source URL
To use content from a web page, enter a full web address (URL) in the field provided. Copy and paste from your browser’s address bar for best results. If the web address directs to an HTML or XML document instead of plain text, the tool will strip all tags and process it as plain text.

Local file
To upload a plain text (.txt) file from your computer, choose ‘Local file,’ click ‘Browse,’ and select the file you wish to use from your directory.
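
The tag stripping mentioned under Source URL can be pictured as follows. This sketch uses Python's standard `html.parser` module and keeps only the text between tags; how the tool itself strips tags is not documented here, so treat the class and function names as hypothetical.

```python
from html.parser import HTMLParser

class TagStripper(HTMLParser):
    """Collect the text content of a document and ignore the tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        # Called for every run of text found between tags.
        self.parts.append(data)

def strip_tags(markup):
    parser = TagStripper()
    parser.feed(markup)
    parser.close()   # flush any text still buffered by the parser
    return "".join(parser.parts)

plain = strip_tags("<p>A <b>great</b> country</p>")
print(plain)
```

Because only the text survives, words that appear in attribute values or tag names do not take part in the search.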

» What to find *
Summary

This section determines what to search for within the document.

Fields

Word/Pattern
Users can filter results based on the word, phrase or pattern (regular expression) entered here. To search multiple words without the tool treating them as a phrase, separate words with commas (Ex: red, orange, purple). To search for a phrase, enter it as it appears in the text - quotation marks are not needed (Ex: wine-dark sea, not "wine-dark sea"). Unix-style searching may also be used.

Exclude modified Glasgow Stop Words
Check this box to filter the words from TAPoR's modified Glasgow Stop Words List out of the final results.
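
The comma and phrase rules for the Word/Pattern field can be sketched as a small function that turns the field's contents into a single regular expression. It covers only literal words and phrases, not user-supplied regular expressions, and it assumes case-insensitive, whole-word matching, which this page does not confirm. The name `build_pattern` is hypothetical.

```python
import re

def build_pattern(field_value):
    """Turn 'red, orange' or 'wine-dark sea' into one compiled pattern."""
    terms = [t.strip() for t in field_value.split(",") if t.strip()]
    if not terms:
        raise ValueError("enter at least one word or phrase")
    alternatives = []
    for term in terms:
        # Words of a phrase may be separated by any run of whitespace.
        words = [re.escape(w) for w in term.split()]
        alternatives.append(r"\s+".join(words))
    return re.compile(r"\b(?:" + "|".join(alternatives) + r")\b", re.IGNORECASE)

colours = build_pattern("red, orange, purple")
print(colours.findall("Red sky, orange sun, purple sea"))

phrase = build_pattern("wine-dark sea")
print(bool(phrase.search("over the wine-dark sea")))
```

Commas produce alternatives, so any one of the listed words is a match. Without commas, the words must appear together and in order.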
» Context for concordance *
Summary

This section allows the user to define the context type and how many of that type to show on either side of each instance of the word or pattern.

Fields

Context
Context can be set to one of four options from the drop menu: words, lines, sentences and paragraphs. Note: Paragraphs is not available for the Raw Grep tool.

Context Length
This field sets how many words, lines, sentences or paragraphs to show before and after each instance of the word or pattern.
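
The interaction between Context and Context Length can be sketched as follows. The splitting rules are assumptions made for illustration (for example, a sentence is taken to end at '.', '!' or '?' followed by whitespace, and a paragraph at a blank line); the tool's own rules are not given on this page, and the names below are hypothetical.

```python
import re

# Assumed definitions of where each kind of unit ends.
SPLITTERS = {
    "words": r"\s+",
    "lines": r"\n",
    "sentences": r"(?<=[.!?])\s+",
    "paragraphs": r"\n\s*\n",
}

def contexts(text, keyword, unit="words", length=1):
    """Return, for each unit containing keyword, that unit plus `length` units either side."""
    units = [u for u in re.split(SPLITTERS[unit], text) if u.strip()]
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    found = []
    for i, u in enumerate(units):
        if pattern.search(u):
            found.append(units[max(0, i - length):i + length + 1])
    return found

text = "It rained. The country was great. We left."
print(contexts(text, "country", "sentences", 1))
print(contexts("a great country here", "great", "words", 1))
```

A length of 1 with sentences returns three sentences per hit (one before, the matching one, one after), while a length of 1 with words returns three words.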
» Results
Summary

This section allows users to choose how the results will be formatted, and whether to display them in a new browser window.

Fields

Sort
This drop menu allows users to select from three sort options for co-occurring words: by frequency, alphabetically or by Z-score.

Display as
This drop menu lets users choose from several output formats: HTML, XML text in HTML, XML tree and tab-delimited text.

Open results in new window
Check this box to display the results in a new window or browser tab. This option is selected by default. Some pop-up blockers may prevent a new window from being opened; if so, un-check the box to open the results in the same window instead.
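
The three sort options and the tab-delimited output can be pictured with a small example. It assumes that frequency and Z-score are sorted from highest to lowest and that the tab-delimited output has one row per word; the column names, function names and sample values are hypothetical and do not come from the tool.

```python
def sort_collocates(rows, order="frequency"):
    """Sort (word, count, z_score) tuples by one of the three options."""
    if order == "alphabetical":
        return sorted(rows, key=lambda r: r[0])
    if order == "frequency":
        return sorted(rows, key=lambda r: (-r[1], r[0]))   # highest count first
    if order == "z-score":
        return sorted(rows, key=lambda r: (-r[2], r[0]))   # highest score first
    raise ValueError(f"unknown sort order: {order}")

def to_tab_delimited(rows):
    """Render rows as tab-separated lines with a header row."""
    lines = ["word\tcount\tz_score"]
    lines += [f"{word}\t{count}\t{z:.2f}" for word, count, z in rows]
    return "\n".join(lines)

# Made-up values, for illustration only.
rows = [("man", 2, 0.5), ("country", 1, 1.75), ("a", 3, -0.2)]

print(to_tab_delimited(sort_collocates(rows, "frequency")))
```

Sorting by Z-score can produce a very different order from sorting by frequency: a common word such as ‘a’ may have the highest count and still the lowest score.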
* indicates a required field

TAPoRware Project, McMaster University,