What is an HtmlTokenizer?
What does an HtmlTokenizer really do?
What is it useful for?
How can I use it in a C# application?
It converts HTML elements to tokens, like this:
<div><b>Tekst!</b></div>
This can be converted to something like this:
TOKEN_DIV TOKEN_B TOKEN_STRING TOKEN_END_B TOKEN_END_DIV
With this, you can create a parser that will parse the document.
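To make the idea concrete, here is a minimal sketch in Java of such a tokenizer. It is an illustration only, not the API of any real library: the class name SimpleHtmlTokenizer and the TOKEN_ naming scheme are assumptions, token names are derived mechanically from the tag name (so <b> becomes TOKEN_B), and it only copes with simple, well-formed markup (no comments, no '>' inside attribute values).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch class; not taken from any real library.
final class SimpleHtmlTokenizer {

    // Turns simple HTML into token names such as TOKEN_DIV, TOKEN_STRING, TOKEN_END_DIV.
    static List<String> tokenize(String html) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < html.length()) {
            if (html.charAt(i) == '<') {
                int close = html.indexOf('>', i);
                if (close < 0) {
                    // Unterminated tag: treat the remainder as plain text.
                    tokens.add("TOKEN_STRING");
                    break;
                }
                String tag = html.substring(i + 1, close).trim();
                boolean isEnd = tag.startsWith("/");
                if (isEnd) {
                    tag = tag.substring(1).trim();
                }
                // Keep only the tag name; attributes are ignored in this sketch.
                String name = tag.split("\\s+")[0].toUpperCase(Locale.ROOT);
                tokens.add((isEnd ? "TOKEN_END_" : "TOKEN_") + name);
                i = close + 1;
            } else {
                // Everything up to the next '<' is one run of text.
                int next = html.indexOf('<', i);
                tokens.add("TOKEN_STRING");
                i = (next < 0) ? html.length() : next;
            }
        }
        return tokens;
    }
}
```

Calling SimpleHtmlTokenizer.tokenize("<div><b>Tekst!</b></div>") yields a flat list of five token names. A real tokenizer would normally also carry the text and attribute values along with each token; they are dropped here to keep the sketch short.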
It parses HTML and exposes the tags (with their data and attributes) through a 'listener' style interface, similar to SAX for XML. That said, I believe there are quite a few different implementations of classes called HtmlTokenizer.
The listener style works by calling methods on the listener whenever the parser encounters certain elements. For example, the listener may have a startTag(...) method; whenever the parser encounters the start of a new tag, it calls this method and passes in the data for that tag. Similarly, when the end of the tag is encountered, it calls a corresponding endTag() method. It is up to the listener to keep track of exactly where the parsing is up to, which is why a parser that simply exposes a DOM tree is often easier to use.
If you can provide more specifics on where this class comes from, a more detailed answer can be given.
Also, I am not aware of any C# class library that has this class; I only know of it in Java.
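The callback flow described above can be sketched in Java as follows. Everything here is hypothetical and for illustration only: the HtmlListener interface, the ListenerHtmlParser class and the PathRecordingListener are invented names, not the API of any particular HtmlTokenizer, and attributes are passed through as one raw string rather than parsed.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical callback interface, loosely modelled on SAX-style handlers.
interface HtmlListener {
    void startTag(String name, String rawAttributes);
    void endTag(String name);
    void text(String content);
}

// Hypothetical push parser: scans the input once and calls the listener.
final class ListenerHtmlParser {
    static void parse(String html, HtmlListener listener) {
        int i = 0;
        while (i < html.length()) {
            int open = html.indexOf('<', i);
            int close = (open < 0) ? -1 : html.indexOf('>', open);
            if (close < 0) {
                // No complete tag remains: report the remainder as text.
                listener.text(html.substring(i));
                return;
            }
            if (open > i) {
                listener.text(html.substring(i, open));
            }
            String tag = html.substring(open + 1, close).trim();
            if (tag.startsWith("/")) {
                listener.endTag(tag.substring(1).trim());
            } else {
                // Split "div class=x" into the name and the raw attribute string.
                String[] parts = tag.split("\\s+", 2);
                listener.startTag(parts[0], parts.length > 1 ? parts[1] : "");
            }
            i = close + 1;
        }
    }
}

// Example listener: it tracks nesting itself, because the parser keeps no tree.
final class PathRecordingListener implements HtmlListener {
    private final Deque<String> openTags = new ArrayDeque<>();
    final List<String> events = new ArrayList<>();

    @Override public void startTag(String name, String rawAttributes) {
        openTags.addLast(name);
        events.add("start " + String.join("/", openTags));
    }

    @Override public void endTag(String name) {
        events.add("end " + String.join("/", openTags));
        openTags.pollLast();
    }

    @Override public void text(String content) {
        events.add("text '" + content + "' in " + String.join("/", openTags));
    }
}
```

To use it, create a PathRecordingListener, pass it to ListenerHtmlParser.parse(...) together with the HTML string, and then read its events list. The stack inside the listener is the bookkeeping that a DOM-style parser would otherwise do for you.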
An HTML tokenizer simply breaks a stream of text into tokens, where each token is a string. Normally each string represents either a run of text or an HTML element.
You can use it in C# like any other class.