This section limits the source text to specific HTML elements and filters which words are included in the word list.
Specify which HTML elements to look within, such as ‘body’ or ‘p’. Multiple elements may be listed, separated by commas. This field defaults to ‘body’, which provides all body text from the source.
All words from within the specified element will be included in the word list.
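As a rough illustration of restricting text to the listed elements, the sketch below uses Python's standard html.parser module. It is not the tool's actual implementation: the names ElementTextCollector and words_in_elements, and the simple letters-and-apostrophes tokenizer, are assumptions made for this example.

```python
from html.parser import HTMLParser
import re


class ElementTextCollector(HTMLParser):
    """Collect text found inside any of the named elements (hypothetical helper)."""

    def __init__(self, elements):
        super().__init__()
        # "body, p" -> {"body", "p"}
        self.targets = {e.strip().lower() for e in elements.split(",") if e.strip()}
        self.depth = 0    # number of target elements currently open
        self.chunks = []  # text fragments seen inside a target element

    def handle_starttag(self, tag, attrs):
        if tag in self.targets:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.targets and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


def words_in_elements(html, elements="body"):
    """Return lowercase words found inside the given comma-separated elements."""
    collector = ElementTextCollector(elements)
    collector.feed(html)
    collector.close()
    # Assumed tokenizer: runs of ASCII letters and apostrophes.
    return re.findall(r"[A-Za-z']+", " ".join(collector.chunks).lower())


page = ("<html><head><title>Skip me</title></head>"
        "<body><h1>Title</h1><p>Red and green.</p></body></html>")
print(words_in_elements(page, "p"))
print(words_in_elements(page))
```

This sketch assumes well-formed markup with explicit closing tags, and it does not skip script or style content inside the chosen element.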
Words matching pattern
Only words matching a regular expression provided by the user will be included in the word list.
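A minimal sketch of pattern filtering, using Python's re module. The function name filter_by_pattern is hypothetical, and requiring the pattern to match the whole word (rather than any part of it) is an assumption; the tool itself may behave differently.

```python
import re


def filter_by_pattern(words, pattern):
    """Keep only the words that the regular expression matches in full."""
    regex = re.compile(pattern)
    # fullmatch: the entire word must match (an assumption for this sketch).
    return [w for w in words if regex.fullmatch(w)]


print(filter_by_pattern(["red", "reed", "green", "bread"], r"re+d"))
```

With whole-word matching, ‘bread’ is dropped even though it contains ‘red’; switching to regex.search would keep it.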
Words in the list below
Only words specified by the user will be included in the list. Type the list of desired words in manually or upload a text (.txt) file containing the list.
Words not in the list below
Only words that appear neither in a user-specified list nor in the modified Glasgow Stop Words list will be included in the final list. To use a custom list, type it in manually or upload a text (.txt) file containing the list.
Word list typed in
Type a list of words separated by commas into this field to include or exclude those words from the final list. To include the typed words, choose ‘Words in the list below’, and to exclude them, choose ‘Words not in the list below’.
Text file with words
Use this field to upload a text (.txt) file of words to include or exclude. The words in the file must be separated by commas (e.g., red, orange, green, purple). To include only the words contained in the text file in the final list, choose ‘Words in the list below’. To exclude those words instead, choose ‘Words not in the list below’.
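The include and exclude options can be sketched as follows. The function names parse_word_list and apply_word_list, the mode values ‘include’ and ‘exclude’, and the case-insensitive comparison are all assumptions made for illustration, not the tool's documented behavior.

```python
def parse_word_list(text):
    """Split a comma-separated list into a set of lowercase words."""
    return {w.strip().lower() for w in text.split(",") if w.strip()}


def apply_word_list(words, listed, mode="include"):
    """Keep words in the list ("include") or words not in it ("exclude")."""
    if mode == "include":
        return [w for w in words if w.lower() in listed]
    if mode == "exclude":
        return [w for w in words if w.lower() not in listed]
    raise ValueError("mode must be 'include' or 'exclude'")


listed = parse_word_list("red, orange, green, purple")
source_words = ["red", "blue", "green", "yellow"]
print(apply_word_list(source_words, listed, "include"))
print(apply_word_list(source_words, listed, "exclude"))
```

An uploaded file would be handled the same way: read its contents as a string and pass that string to parse_word_list.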
Use modified Glasgow Stop Words
Choose this option to remove the words on the modified Glasgow Stop Words list from the final list.