Compare Two Documents
Summary

This tool is used to compare words within two XML documents, either located at a specified web address or uploaded from the user’s files.

Words for comparison can be limited to specific XML elements and then filtered further: by matching a particular word or pattern, by including only the words on a user-specified list, or by excluding the words on a user-specified list or on TAPoR’s modified Glasgow Stop Words list.

Results can be sorted by the top words in the first document or by the ratio of relative counts between the two documents. The list can then be displayed as HTML or as tab-delimited text.
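
The page does not spell out how the ratio of relative counts is calculated. The following Python sketch shows one plausible reading, assuming that a word's relative count is its count divided by the total number of words kept from that document, and that the ratio divides the first document's relative count by the second's. The function names and the handling of words missing from the second document are assumptions made for illustration, not the tool's documented behaviour.

```python
from collections import Counter


def relative_counts(words):
    """Map each word to its share of the document's total word count."""
    total = len(words)
    if total == 0:
        return {}
    return {word: count / total for word, count in Counter(words).items()}


def ratio_of_relative_counts(words_a, words_b):
    """For each word in the first document, divide its relative count
    there by its relative count in the second document."""
    rel_a = relative_counts(words_a)
    rel_b = relative_counts(words_b)
    ratios = {}
    for word, share_a in rel_a.items():
        share_b = rel_b.get(word)
        # Assumption: a word absent from the second document gets an
        # infinite ratio; the real tool may treat this case differently.
        ratios[word] = share_a / share_b if share_b else float("inf")
    return ratios
```

Under this reading, a ratio above 1 marks a word that is relatively more frequent in the first document, and a ratio below 1 marks one that is relatively more frequent in the second.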

Please click the ? buttons at the bottom right of each set of options for more information on that set.

For further information on this tool, please see the TADA Wiki's Comparator entry here. A glossary of terms is also available here.

Walkthrough

To compare texts from http://www.ucc.ie/celt/texts/E850003-007.xml and http://www.ucc.ie/celt/texts/E850003-008.xml, search only the text between the <p> and </p> tags, filter out words that appear in the modified Glasgow Stop Words list and sort the results by ratio of relative word count:
  1. Source text
    1. Enter ‘http://www.ucc.ie/celt/texts/E850003-007.xml’ into the first ‘URL’ field.
    2. Enter ‘http://www.ucc.ie/celt/texts/E850003-008.xml’ into the second ‘URL’ field.
  2. Words to compare
    1. Enter ‘p’ in the ‘Elements’ field.
    2. Click the radio button next to ‘Words not in the list below’.
    3. Click the radio button next to ‘Use modified Glasgow Stop Words’.
  3. Results
    1. Select ‘Ratio of relative count’ from the ‘Sort’ drop-down list.
    2. Select ‘HTML’ from the ‘Display as’ drop-down menu.
  4. Click the ‘Submit’ button to process the text.
» Source text
  Example: http://taporware.ualberta.ca/sampleDocs/interact2.xml

Summary

This section determines the source of the document you wish the tool to process. XML can be obtained either from a web address or by uploading a file.

Fields

Source URL
To use an XML document from the web, enter its full web address (URL), ending in .xml, in the field provided. Copy and paste from your browser’s address bar for best results.

Local file
To upload an XML (.xml) file from your computer, choose ‘Local file,’ click ‘Browse,’ and select the file you wish to use from your directory.
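
For readers who want to reproduce the two source options outside the web form, here is a minimal Python sketch that reads XML from either a web address or a local file. The function name `load_xml` is hypothetical, and the sketch assumes the source is well-formed XML; it does not describe how the tool itself retrieves documents.

```python
import urllib.request
import xml.etree.ElementTree as ET


def load_xml(source):
    """Parse XML from a web address or a local file path and return
    the root element."""
    if source.startswith(("http://", "https://")):
        # Web address: download the raw bytes.
        with urllib.request.urlopen(source) as response:
            data = response.read()
    else:
        # Anything else is treated as a path to a local .xml file.
        with open(source, "rb") as handle:
            data = handle.read()
    return ET.fromstring(data)
```

Calling `load_xml` once per document gives the two parsed texts to compare.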
» Second Source text
  Example: http://taporware.ualberta.ca/sampleDocs/ddh.xml

Summary

This section determines the source of the second document you wish the tool to compare. As with the first document, XML can be obtained either from a web address or by uploading a file.

Fields

Source URL
To use an XML document from the web, enter its full web address (URL), ending in .xml, in the field provided. Copy and paste from your browser’s address bar for best results.

Local file
To upload an XML (.xml) file from your computer, choose ‘Local file,’ click ‘Browse,’ and select the file you wish to use from your directory.
» Words to compare
Summary

This section applies limits to the source texts.

Fields

Elements
Specify which XML elements to search within. Multiple elements may be listed, separated by a comma. Note: Element attributes cannot be included in this search.
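
As an illustration of restricting the word list to named elements, the Python sketch below collects the words found inside a comma-separated list of element names. It matches on element name only, in line with the note that attributes cannot be part of the search. The function name, the lower-casing, and the `\w+` tokenisation are assumptions for this example; the tool's own word-splitting rules are not documented here.

```python
import re
import xml.etree.ElementTree as ET


def words_in_elements(xml_text, element_names):
    """Return the words found inside the named elements, in document order.

    element_names is a comma-separated string such as "p" or "head, p".
    """
    wanted = {name.strip() for name in element_names.split(",") if name.strip()}
    root = ET.fromstring(xml_text)
    words = []
    for elem in root.iter():
        # Drop a namespace prefix such as "{http://example.org/ns}p".
        local_name = elem.tag.rsplit("}", 1)[-1]
        if local_name in wanted:
            # itertext() also gathers text from child elements.
            text = "".join(elem.itertext())
            words.extend(re.findall(r"\w+", text.lower()))
    # Caveat: if one wanted element is nested inside another, its words
    # are collected twice by this simple sketch.
    return words
```

For example, passing `"p"` keeps only the text between `<p>` and `</p>` tags, as in the walkthrough.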

All words
All words from within the specified element will be included in the word list.

Words matching pattern
Only words matching the text string or regular expression provided by the user will be included in the word list.
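
A pattern filter of this kind can be sketched in Python with the standard `re` module. The sketch assumes the pattern must match the whole word; whether the tool matches whole words or substrings, and which regular expression dialect it accepts, is not stated on this page. The function name is hypothetical.

```python
import re


def words_matching(words, pattern):
    """Keep only the words that the pattern matches in full."""
    regex = re.compile(pattern)
    return [word for word in words if regex.fullmatch(word)]
```

With this sketch, the plain string `ring` keeps only the word ‘ring’, while the expression `king.*` keeps both ‘king’ and ‘kingdom’.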

Words in the list below
Only words specified by the user will be included in the list. Users can type the list of desired words in manually or upload a text (.txt) file containing their list.

Words not in the list below
Only words not in a user-specified list or the modified Glasgow Stop Words list will be included in the final list. To use a custom list, users can type it in manually or upload a text (.txt) file containing their list.

Word list typed in
Users can type a comma-separated list of words into this field. If they also choose ‘Words in the list below,’ only the words they have specified will be included in the final list. If they choose ‘Words not in the list below,’ the words they have specified will be excluded from the final list.

Text file with words
Users can upload a text (.txt) file of words to include or exclude. The words must be separated by commas (e.g., red, orange, green, purple). If they also choose ‘Words in the list below,’ only the words they have specified will be included in the final list. If they choose ‘Words not in the list below,’ the words they have specified will be excluded from the final list.

Use Glasgow Stop Words
Choose this option to exclude the words on the modified Glasgow Stop Words list from the results.
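
The include and exclude options described above can be sketched in Python as follows. The function names and the `"include"` / `"exclude"` mode labels are hypothetical, and the case-insensitive comparison is an assumption. A stop-word list behaves like any other exclude list, so the modified Glasgow list, which is not reproduced here, would simply be passed in as the word list.

```python
def parse_word_list(text):
    """Turn comma-separated text such as "red, orange, green" into a set."""
    return {word.strip().lower() for word in text.split(",") if word.strip()}


def filter_words(words, word_list, mode):
    """Apply an include list ("Words in the list below") or an exclude
    list ("Words not in the list below") to a sequence of words."""
    if mode == "include":
        return [word for word in words if word.lower() in word_list]
    if mode == "exclude":
        return [word for word in words if word.lower() not in word_list]
    raise ValueError("mode must be 'include' or 'exclude'")
```

The same `parse_word_list` helper works for a list typed into the form and for the contents of an uploaded .txt file, since both use commas as the delimiter.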
» Results
Summary

This section allows users to choose how the results will be formatted, and whether to display them in a new browser window.

Fields

Sort by
This drop-down list allows users to choose between sorting the results by the word count of the first text or by the ratio of relative count between texts.
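
The two sort orders can be sketched in Python as below. The row layout of (word, count in first text, ratio), the `"count"` / `"ratio"` labels, and the descending order are all assumptions for illustration; the page does not say in which direction the tool sorts.

```python
def sort_rows(rows, sort_by="count"):
    """Sort (word, count_in_first_text, ratio) rows, largest value first."""
    if sort_by == "count":
        key = lambda row: row[1]  # word count in the first text
    elif sort_by == "ratio":
        key = lambda row: row[2]  # ratio of relative counts
    else:
        raise ValueError("sort_by must be 'count' or 'ratio'")
    return sorted(rows, key=key, reverse=True)
```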

Display as
This drop-down list offers a choice of displaying the tool results as HTML or as tab-delimited text.
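
Tab-delimited output is convenient for pasting into a spreadsheet. As a rough illustration of the format, the Python sketch below writes result rows with the standard `csv` module. The column names are hypothetical and are not taken from the tool's actual output.

```python
import csv
import io


def to_tab_delimited(rows, header=("word", "count_1", "count_2", "ratio")):
    """Render result rows as tab-delimited text, one row per line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()
```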

Open results in new window
Checking this box will display the results in a new window. This option is selected by default. Some pop-up blockers may prevent a new window from being opened; if so, un-check the box to open the results in the same window instead.

TAPoRware Project, McMaster University,