TAPoRware Manual

The TAPoRware manual is currently in development. If you have any questions that you would like to appear on this page, please send them to lyan (at) mcmaster (dot) ca


Introduction to TAPoRware tools
» What is TAPoRware?
» Where do I start with TAPoRware?
» TAPoRware demo: using the HTML concordance tool
» Are there any specific requirements needed in order to run the TAPoRware tools?
» Where can I get help for individual tools?

Advanced techniques
» Using regular expressions

Troubleshooting
» I chose to view results in a new window but I can't see the window!?
» Problems running Java applets under OS X


What is TAPoRware?

TAPoRware is a set of text analysis tools that enables users to perform text analysis on HTML, XML and plain text files, using documents from the user's machine or from the web.

The TAPoRware tools were developed with support from the Canada Foundation for Innovation and the McMaster University Faculty of Humanities. These tools are being developed by Geoffrey Rockwell, Lian Yan and Matt Patey of the TAPoR Project for a TAPoR Portal which we expect to open in 2005.

Where do I start with TAPoRware?

The first thing you must do is determine what kind of data you will be working with. Each TAPoRware tool has been designed to work with a specific type of document. Currently HTML, XML and plain text are supported. The tools are organised in a hierarchical menu on the left-hand side of the page.

Once you have determined the type of data you will be working with you will need to select a tool from the appropriate tool section, i.e. HTML Tools, XML Tools or Plain Text Tools. After clicking one of the tools you will be presented with an interface to the tool. Individual help and walkthroughs have been written for each tool. See the entry below on where help and walkthroughs can be accessed for each tool.

TAPoRware demo: using the HTML concordance tool

The HTML Concordance tool finds concordances in an HTML document. You specify an HTML document (online, or offline via upload) and a keyword, and the tool returns each occurrence of the keyword together with its surrounding context. Here we will go through an example of how one might use this tool. Open a new window and follow along.

First step: defining a source text

Accessing the Concordance tool is easy. Click the `+' beside HTML in the main menu to expand the list of HTML tools, then click Concordance in the expanded list. We first need to define the text on which the tool will operate. Using the Source text form, select the URL radio button and enter http://www.w3.org/ in the adjacent text field. This tells the tool to fetch HTML from http://www.w3.org/ and use it as the source text for the following operations.



Second step: limiting the search

Here we have the opportunity to define a limiter that will allow us to refine the scope of our concordances. The value provided at this stage can be any of several things, such as HTML tags, keywords or punctuation. In this case we will use the <body> tag because we are interested in the core content of the HTML file. This will ensure that the tool returns only concordances that occur between the <body> and </body> tags. Simply type body in the Elements text field.

n.b.: it is possible to enter multiple elements by separating each one with a comma.
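To make the idea of an element limiter concrete, here is a minimal Python sketch of how text might be restricted to one or more elements before searching. It is not TAPoRware's implementation; the names ElementTextExtractor and text_within and the sample HTML are hypothetical, and only Python's standard html.parser module is used.

```python
from html.parser import HTMLParser


class ElementTextExtractor(HTMLParser):
    """Collect text that appears inside any of the named elements."""

    def __init__(self, elements):
        super().__init__()
        self.elements = {name.strip().lower() for name in elements}
        self.depth = 0    # number of limiter elements currently open
        self.chunks = []  # text fragments found inside those elements

    def handle_starttag(self, tag, attrs):
        if tag in self.elements:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.elements and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


def text_within(html, elements_field):
    """elements_field mimics a comma-separated Elements field, e.g. "title, h1"."""
    parser = ElementTextExtractor(elements_field.split(","))
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace so the result is a single clean string.
    return " ".join(" ".join(parser.chunks).split())


sample = (
    "<html><head><title>Web page title</title></head>"
    "<body><h1>The Web</h1><p>Leading the web.</p></body></html>"
)
print(text_within(sample, "body"))       # The Web Leading the web.
print(text_within(sample, "title, h1"))  # Web page title The Web
```

With body as the limiter, the text of the <title> element is excluded because it sits in <head>. Entering several elements separated by commas simply widens the set of regions that are searched.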



Third step: defining which word(s) or pattern to find

Here is where you define which word(s) or pattern you are interested in finding. Any word(s) or pattern found within the document will be presented with its corresponding context. For the purpose of this example, select the Word(s) radio button and enter web in the corresponding text field. This tells the tool to search for all occurrences of the word web in the source document and present each one with the words that surround it.



Fourth step: defining the context for concordance

In this step we define how much of the text surrounding each occurrence of the word or pattern is presented in the results. For example, entering 20 in the Context length field returns twenty words before and after each occurrence of the given word/pattern. We can also choose the unit of context: the tool can return up to n words, lines or sentences before and after each occurrence of the word/pattern we are interested in finding.
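The following Python sketch illustrates what a concordance with a word-based context length looks like. It is an illustration only, not the tool's own code: the function name concordance, the sample sentence, and the choices to match whole words case-insensitively are all assumptions, and contexts measured in lines or sentences are not covered.

```python
import re


def concordance(text, word, context_length=5):
    """Return (left, keyword, right) tuples for each occurrence of word.

    Context is counted in words on either side of the keyword.
    """
    tokens = re.findall(r"\w+", text)
    target = word.lower()
    hits = []
    for i, token in enumerate(tokens):
        if token.lower() == target:
            left = tokens[max(0, i - context_length):i]
            right = tokens[i + 1:i + 1 + context_length]
            hits.append((" ".join(left), token, " ".join(right)))
    return hits


text = (
    "The World Wide Web Consortium develops standards "
    "so the web stays open to everyone."
)
for left, keyword, right in concordance(text, "web", context_length=3):
    print(f"{left:>20} [{keyword}] {right}")
```

Each result line shows up to three words to the left of the keyword, the keyword itself in brackets, and up to three words to the right. Raising context_length widens that window in the same way a larger Context length value would.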



You may now press the Submit button to begin the analysis. Note that larger web pages can take several seconds to process, so please do not submit several times in a row; it won't speed up the process. Finally, tools are generally set by default to display results in a new window. If you have a pop-up blocker installed or are using a browser with pop-up blocking enabled (e.g. Firefox), you may have to enable pop-ups for the TAPoRware site or simply choose to open results in the tool window by unchecking the Open results in new window option. For more information on enabling pop-up windows, please see the troubleshooting entry on pop-up windows below.

Are there any specific requirements needed in order to run the TAPoRware tools?

All you will need is a web browser with graphical support, i.e. not a text-based web browser (e.g. Lynx). Most of the tools also make extensive use of JavaScript and at times employ Java applets for displaying interactive graphs etc. You should therefore enable JavaScript in your web browser when using the tools, and consider installing a recent version of the Java runtime environment (JRE) if you plan to use certain tools (e.g. Distribution Graph, Raining Words).

Where can I get help for individual tools?

Each tool has an embedded help system. By clicking the `?' at the top-right-hand corner of each tool, you can read a summary of what the tool does and follow a mini tutorial on how you might use the tool. Furthermore, tool components include their own help interface which explains the significance of each field. These are accessed by clicking the `?' in the lower-right-hand corner of each component.

[Screenshot: tool help widget]

Using regular expressions

Although you can use the TAPoRware suite of tools without any knowledge of regular expressions, knowing how to use them can augment the quality of results produced by the tools. Stephen Ramsay's introduction to regular expressions is an excellent place to begin learning how to use them.
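As a small taste of what a pattern can do that a plain word cannot, the Python sketch below matches every word that begins with web. It uses Python's re syntax; the sample sentence is made up, and the regular expression dialect accepted by the TAPoRware tools may differ in its details.

```python
import re

text = "Web standards help every website and web browser work together on the Web."

# \b anchors the match at a word boundary; \w* then allows any word ending,
# so the pattern finds "web" as well as longer words such as "website".
pattern = re.compile(r"\bweb\w*", re.IGNORECASE)

matches = [m.group(0) for m in pattern.finditer(text)]
print(matches)  # ['Web', 'website', 'web', 'Web']
```

A search for the plain word web would miss website, whereas the pattern catches both forms in a single pass.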

I chose to view results in a new window but I can't see the window!?

Chances are that you have a pop-up blocker running, which prevents new windows from being created. Firefox and Internet Explorer both have built-in mechanisms which, by default, block new windows. If you are using either of these browsers you will have to configure it to allow pop-up windows from http://taporware.mcmaster.ca/. Instructions for Firefox and Internet Explorer are available online. If you are using other pop-up blocker software, you will have to add http://taporware.mcmaster.ca/ to the list of sites for which pop-ups should not be blocked.

Problems running Java applets under OS X

OS X (prior to 10.4, aka Tiger) does not natively support Java 1.4.x for web browsers other than Safari. Because the applets used in some of the TAPoRware tools have been compiled using Java version 1.4.2, they will not work on OS X systems that use earlier versions of Java. There is a workaround that will allow Firefox users to use applets compiled with Java 1.4.2 which can be found here.

 

 

TAPoRware Project, McMaster University