HTML Document Analysis Tools

(Note: These tools are intended for HTML files that consist mainly of text content. If your input contains forms, graphics components, etc., the results may not be as good as those for plain text content.)
List Words within Specified HTML tags
Count, sort, and list the words of the source text in different orders. It can strip user-specified words from the list, or ...
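As an illustration of the counting and sorting described here, a minimal Python sketch follows. The function name `word_list`, its parameters, and the simple letters-and-apostrophes word pattern are assumptions made for this example, not the tool's actual implementation.

```python
import re
from collections import Counter

def word_list(text, exclude=(), order="frequency"):
    """Count words in text, drop excluded words, and return sorted (word, count) pairs."""
    # Assumption: a "word" is a run of letters or apostrophes, compared case-insensitively.
    words = re.findall(r"[a-z']+", text.lower())
    skip = {w.lower() for w in exclude}
    counts = Counter(w for w in words if w not in skip)
    if order == "alphabetical":
        return sorted(counts.items())
    # Default: highest count first, ties broken alphabetically.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

if __name__ == "__main__":
    for word, n in word_list("The cat and the dog and the bird", exclude=["and"]):
        print(word, n)
```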
List HTML Tags
List and count the HTML tags of the input HTML document in different orders.
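One way such a tag listing could be produced is sketched below with Python's standard `html.parser` module. The class `TagCounter` and the function `count_tags` are hypothetical names for this example; only opening tags are counted.

```python
from collections import Counter
from html.parser import HTMLParser

class TagCounter(HTMLParser):
    """Counts every opening tag seen while parsing."""
    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names before calling this method.
        self.counts[tag] += 1

def count_tags(html):
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    return parser.counts

if __name__ == "__main__":
    sample = "<html><body><p>One</p><p>Two</p><br></body></html>"
    # List tags by descending count, then alphabetically.
    for tag, n in sorted(count_tags(sample).items(), key=lambda kv: (-kv[1], kv[0])):
        print(tag, n)
```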
Extract Text Using HTML Tags
Extract text from the input HTML document based on user specified HTML tag(s).
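A rough sketch of tag-based extraction, again using the standard `html.parser` module. The names `TagTextExtractor` and `extract_text` are assumptions for illustration, and the sketch assumes the requested tags have explicit closing tags.

```python
from html.parser import HTMLParser

class TagTextExtractor(HTMLParser):
    """Collects the text found inside any of the wanted tags."""
    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.depth = 0      # how many wanted tags are currently open
        self.chunks = []    # text pieces of the element being collected
        self.results = []

    def handle_starttag(self, tag, attrs):
        if tag in self.wanted:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.wanted and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                # Join the pieces and collapse runs of whitespace.
                self.results.append(" ".join("".join(self.chunks).split()))
                self.chunks = []

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)

def extract_text(html, tags):
    parser = TagTextExtractor(tags)
    parser.feed(html)
    parser.close()
    return parser.results

if __name__ == "__main__":
    page = "<h1>Title</h1><p>Hello <b>world</b></p><div>skip</div>"
    print(extract_text(page, ["h1", "p"]))
```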
Find Text — Concordance
Find user specified word/pattern anywhere in the text with its context. Display the result in KWIC format.
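KWIC (keyword in context) output lines up every match in a fixed column with a slice of text on either side. The sketch below shows the idea; the function name `kwic`, the bracket markers, and the character-based `width` parameter are choices made for this example only.

```python
import re

def kwic(text, pattern, width=30):
    """Return one aligned line per match: left context, [match], right context."""
    # Collapse whitespace so line breaks in the source do not disturb alignment.
    text = " ".join(text.split())
    lines = []
    for m in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-justify the left context so the keyword starts in the same column.
        lines.append(f"{left:>{width}} [{m.group(0)}] {right}")
    return lines

if __name__ == "__main__":
    sample = "The cat sat on the mat. The dog saw the cat."
    for line in kwic(sample, r"\bcat\b", width=10):
        print(line)
```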
Find Co-occurring Patterns
Find user specified primary pattern and co-pattern in a given length of context. Highlight the patterns in the result.
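A possible reading of this search, sketched in Python: for each word matching the primary pattern, look for the co-pattern within a window of neighbouring words and mark both. The function name `co_occurrences`, the word-based `window`, and the use of upper case as the highlight are assumptions for this example.

```python
import re

def co_occurrences(text, primary, co_pattern, window=5):
    """Return contexts where co_pattern occurs within `window` words of primary."""
    words = text.split()
    p = re.compile(primary, re.IGNORECASE)
    c = re.compile(co_pattern, re.IGNORECASE)
    hits = []
    for i, word in enumerate(words):
        if not p.search(word):
            continue
        lo, hi = max(0, i - window), min(len(words), i + window + 1)
        # The co-pattern must match some word other than the primary hit itself.
        if any(c.search(words[j]) for j in range(lo, hi) if j != i):
            marked = [w.upper() if (p.search(w) or c.search(w)) else w
                      for w in words[lo:hi]]
            hits.append(" ".join(marked))
    return hits

if __name__ == "__main__":
    sample = "Strong tea is served hot while strong coffee is served cold"
    print(co_occurrences(sample, "tea", "hot", window=3))
```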
Find Collocates
Find and list words before and after the user specified word in a given length of context. Sort the result in different ways.
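Collocate listing can be sketched as two counters, one for words preceding the node word and one for words following it. The function name `collocates` and the `span` parameter (number of words on each side) are hypothetical, and the word pattern is a simplification.

```python
import re
from collections import Counter

def collocates(text, node, span=2):
    """Count words appearing up to `span` positions before and after `node`."""
    words = re.findall(r"[a-z']+", text.lower())
    node = node.lower()
    before, after = Counter(), Counter()
    for i, word in enumerate(words):
        if word == node:
            before.update(words[max(0, i - span):i])
            after.update(words[i + 1:i + 1 + span])
    return before, after

if __name__ == "__main__":
    before, after = collocates("the black cat sat; the white cat ran", "cat", span=2)
    print("before:", before.most_common())
    print("after:", after.most_common())
```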
Tokenize HTML Document
Split text into tokens, which can be words, lines, sentences, etc., or split text on user-specified separators.
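The splitting modes could look roughly like the following. The function name `tokenize`, the `unit` values, and the regular expressions are assumptions for this sketch; in particular, the sentence rule (split after `.`, `!` or `?` followed by whitespace) is a deliberate simplification.

```python
import re

def tokenize(text, unit="word", separators=None):
    """Split text into words, lines, or sentences, or on explicit separators."""
    if separators is not None:
        # Escape each separator so characters like "|" are taken literally.
        parts = re.split("|".join(re.escape(s) for s in separators), text)
    elif unit == "word":
        parts = re.findall(r"\w+(?:'\w+)?", text)
    elif unit == "line":
        parts = text.splitlines()
    elif unit == "sentence":
        parts = re.split(r"(?<=[.!?])\s+", text)
    else:
        raise ValueError(f"unknown unit: {unit!r}")
    # Drop empty tokens and surrounding whitespace.
    return [p.strip() for p in parts if p.strip()]

if __name__ == "__main__":
    print(tokenize("Hello world. How are you?\nFine.", unit="sentence"))
    print(tokenize("a;b|c", separators=[";", "|"]))
```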
Fixed Phrase
Locate fixed phrases that contain a specific word and display the located phrases in several different ways.
Date Finder
Extract dates along with the sentence containing them. The date can be years, months, weeks, seasons or all of them.
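A simplified sketch of date extraction: split the text into sentences and keep those that contain a year, month name, or season. The function name `find_dates`, the `kinds` parameter, and the patterns are assumptions for illustration. Weeks are left out, and ambiguous words such as "May" or "spring" will produce false matches.

```python
import re

# Assumed patterns: four-digit years 1000-2099, capitalised English month names, seasons.
PATTERNS = {
    "year": r"\b(?:1[0-9]{3}|20[0-9]{2})\b",
    "month": r"\b(?:January|February|March|April|May|June|July|August"
             r"|September|October|November|December)\b",
    "season": r"\b(?:[Ss]pring|[Ss]ummer|[Aa]utumn|[Ff]all|[Ww]inter)\b",
}

def find_dates(text, kinds=("year", "month", "season")):
    """Return (dates_found, sentence) pairs for sentences that contain a date."""
    combined = re.compile("|".join(PATTERNS[k] for k in kinds))
    results = []
    # Naive sentence split: after ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        found = [m.group(0) for m in combined.finditer(sentence)]
        if found:
            results.append((found, sentence))
    return results

if __name__ == "__main__":
    sample = ("The treaty was signed in March 1848. Nothing happened later. "
              "Winter came in 1850.")
    for dates, sentence in find_dates(sample):
        print(dates, "|", sentence)
```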
Perform basic statistics on the text, list top frequency words, and display concordance of the top words etc.
Find and count the user-specified word in different chunks of the text, and display the results in different distributions.
Compare the word distributions of two texts, along with the basic statistics of the texts. The words can be sorted in ...
Link Extractor
Extract all the href links in the input HTML document, make all relative links absolute, then list the links.
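Link extraction with relative-to-absolute conversion can be sketched with `html.parser` and `urllib.parse.urljoin` from the Python standard library. The names `LinkExtractor` and `extract_links`, and the example URLs, are hypothetical. The sketch does not honour a `<base>` element in the document.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    """Collects href attribute values and resolves them against a base URL."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "href" and value:
                # urljoin leaves absolute URLs unchanged and resolves relative ones.
                self.links.append(urljoin(self.base_url, value))

def extract_links(html, base_url):
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links

if __name__ == "__main__":
    page = ('<a href="/about">A</a><a href="page2.html">B</a>'
            '<a href="https://other.org/x">C</a>')
    for link in extract_links(page, "https://example.com/docs/index.html"):
        print(link)
```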



