Find Text — Concordance
Summary

This tool finds text anywhere within an HTML document, either located at a specified web address or in a document uploaded from the user’s files, and generates a concordance for each instance.

Users can limit the search to text within specified HTML tags, and choose whether to situate their desired text within a context of words, lines, sentences or paragraphs. The search terms can be a word or a regular expression.

Note: When 'Paragraphs' is selected as the context, any element names entered by the user are replaced by the 'p' (<p></p>) tag. Because not all HTML texts use the paragraph tag to mark paragraphs, this may cause problems with the tool’s results. If so, use a context of words, lines or sentences instead.

Please click the ? buttons at the bottom right of each set of options for more information on that set.

For further information on this tool, please see the Concordance entry on the TADA Wiki. A glossary of terms is also available.

Walkthrough

To generate a concordance for the word 'web' in the text found between <body> and </body> at http://www.w3.org/, showing up to 10 words immediately before and after each instance and displaying the results as a new HTML page:
  1. Source text
    1. Enter ‘http://www.w3.org/’ into the ‘URL’ field.
  2. Subtext limited to
    1. Enter ‘body’ in the ‘Elements’ field.
  3. What to find
    1. Enter 'web' in the 'Word/Pattern' field.
  4. Context for concordance
    1. Choose 'Words' from the 'Context' drop-down menu.
    2. Enter '10' into the 'Context length' field.
  5. Results
    1. Select ‘HTML’ from the ‘Display as’ drop-down menu.
    2. Check the box next to 'Display words before and after the pattern'.
  6. Click the ‘Submit’ button to process the text.
» Source text *
  Example: http://taporware.ualberta.ca/einstein-bio.html

Summary

This section determines the source of the document you wish the tool to process. HTML can be obtained either from a web address or by uploading a file.

Fields

Source URL
To use content from a web page, enter a full web address (URL) in the field provided. Copy and paste from your browser’s address bar for best results.

Local file
To upload an HTML (.html) file from your computer, choose ‘Local file,’ click ‘Browse,’ and select the file you wish to use from your directory.
» Subtext limited to *
(separate multiple elements with a comma)
Summary

This section determines which HTML tags to extract text from.

Fields

Elements
Use this field to specify which HTML tag(s) to extract text from. Multiple tags must be separated by commas (ex: 'p, h1, h2'). This field defaults to 'body'.
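
The behaviour of the 'Elements' field can be sketched in a few lines of Python. This is only an illustration built on the standard library's html.parser, not the tool's own code; the names ElementTextExtractor, parse_elements and extract_text are hypothetical. It assumes well-formed markup in which every listed element has a closing tag.

```python
from html.parser import HTMLParser


class ElementTextExtractor(HTMLParser):
    """Collect text found inside any of the requested elements."""

    def __init__(self, elements):
        super().__init__()
        self.wanted = set(elements)
        self.depth = 0      # number of wanted elements currently open
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.wanted:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.wanted and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Keep text only while at least one wanted element is open.
        if self.depth > 0:
            self.chunks.append(data)


def parse_elements(field):
    """Split the comma-separated field; fall back to 'body' when empty."""
    names = [name.strip().lower() for name in field.split(",") if name.strip()]
    return names or ["body"]


def extract_text(html, field=""):
    """Return the whitespace-normalised text inside the listed elements."""
    parser = ElementTextExtractor(parse_elements(field))
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())
```

For example, extract_text(page, "p, h1") would keep only paragraph and top-level heading text, while an empty field keeps everything inside <body>. Unclosed tags (such as a <p> with no </p>) would confuse the depth counter in this sketch.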
» What to find * **
Summary

This section allows the source text to be searched for words, phrases or patterns.

Fields

Word/Pattern
Use this field to filter results based on the word, phrase or pattern (regular expression) entered here. To search multiple words without the tool treating them as a phrase, separate words with commas (Ex: red, orange, purple). To search for a phrase, enter it as it appears in the text - quotation marks are not needed (Ex: wine-dark sea, not "wine-dark sea"). Unix-style searching may also be used.

Get Synonyms
If searching for a single word, this button generates a list of synonyms in a new window. To search for the original word plus its synonyms, copy the comma-separated list from the text box in the new window and paste it into the Word/Pattern field. Note: Concordance only.

Exclude modified Glasgow Stop Words
Choose this option to filter the modified Glasgow Stop Words list out of the final list. Note: Collocation only.
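
One way to picture how a comma-separated Word/Pattern entry could be handled is to turn it into a single regular expression with one alternative per term. The sketch below is an assumption, not the tool's implementation: build_pattern is a hypothetical helper, every term is treated as a literal word or phrase (not as a user-supplied regular expression), and matching is assumed to ignore case.

```python
import re


def build_pattern(field):
    """Turn a comma-separated list of words or phrases into one regex.

    Assumptions: commas separate alternatives, each alternative is a
    literal matched on word boundaries, and case is ignored.
    """
    terms = [term.strip() for term in field.split(",") if term.strip()]
    if not terms:
        raise ValueError("no search term given")
    alternatives = "|".join(re.escape(term) for term in terms)
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)
```

With this sketch, build_pattern("red, orange, purple") matches any of the three words separately, and build_pattern("wine-dark sea") matches the phrase as a whole without quotation marks.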
» Context for concordance
Summary

This section allows the user to define the context type (words, lines, sentences or paragraphs), and how many of that type (context length) to show on either side of each instance of the word or pattern.

Fields

Context

Context can be set to one of four options from the drop-down menu: words, lines, sentences or paragraphs.

Words
This option places the search term or pattern in context by a specified number of words.

Lines
This option places the search term or pattern in context by a specified number of lines.

Sentences
This option places the search term or pattern in context by a specified number of sentences.

Paragraphs
This option places the search term or pattern in context by a specified number of paragraphs.

Context Length
Use this option to set how many words, lines, sentences or paragraphs to show before and after the word or pattern. For example, if the context length is 5 words, every instance of the search term or pattern is shown surrounded by a total of ten words: the five that precede it and the five that follow it.
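
The effect of a context length measured in words can be illustrated with a minimal keyword-in-context sketch. The function name concordance and its parameters are hypothetical, and the whitespace tokenisation and punctuation stripping are simplifying assumptions rather than the tool's actual rules.

```python
def concordance(text, word, context_length=5):
    """Return (before, match, after) tuples for each instance of `word`."""
    tokens = text.split()
    target = word.lower()
    rows = []
    for i, token in enumerate(tokens):
        # Strip surrounding punctuation so that "Web," still matches "web".
        if token.strip(".,;:!?\"'()[]").lower() == target:
            before = tokens[max(0, i - context_length):i]
            after = tokens[i + 1:i + 1 + context_length]
            rows.append((" ".join(before), token, " ".join(after)))
    return rows
```

An instance near the start or end of the text simply gets fewer context words on that side, which is why a context length of 10 yields up to 10 words before and after each match.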
» Results

Summary

This section allows the user to choose the format of the aggregated text.

Fields

Display as
This drop-down list enables users to choose from several output formats: HTML, XML text in HTML, XML tree, and tab-delimited text. Note: XML outputs are not available for the Find Dates tool.

Display words before and after the pattern
Check this box to show the words that appear before and after the pattern in the results. Note: Concordance only.

Open results in new window
Check this box to display the results in a new window or browser tab. This option is selected by default. Some pop-up blockers may prevent a new window from being opened; if so, un-check the box to open the results in the same window instead.
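
As a rough illustration of the tab-delimited output format, concordance rows could be written with Python's csv module using a tab as the delimiter. The column names 'before', 'match' and 'after' and the helper to_tab_delimited are assumptions made for this sketch; the tool's real column layout is not documented here.

```python
import csv
import io


def to_tab_delimited(rows):
    """Serialise (before, match, after) rows as tab-delimited text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["before", "match", "after"])  # hypothetical header row
    writer.writerows(rows)
    return buffer.getvalue()
```

Text in this shape can be pasted or imported directly into a spreadsheet, with each part of the concordance line in its own column.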

'*' indicates a required field
'**' Thesaurus service provided by words.bighugelabs.com

TAPoRware Project, McMaster University,