What is an HtmlTokenizer?

What does an HtmlTokenizer really do?

What is it useful for?

How can I use it in a C# application?


It converts HTML elements to tokens, like this:

<div><b>Tekst!</b></div>

This can be converted to something like this:

TOKEN_DIV TOKEN_B TOKEN_STRING TOKEN_END_B TOKEN_END_DIV

A parser can then consume this token stream to build a representation of the document.
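
To make the idea concrete, here is a minimal sketch in Python (chosen only for brevity; the same approach carries over to C#). The `tokenize` function and the token naming scheme are assumptions made for illustration, not part of any particular HtmlTokenizer class. It names each token after the tag exactly as written, so <b> becomes TOKEN_B, and it deliberately ignores attributes, comments and self-closing tags.

```python
import re

# Matches an opening tag like <div>, a closing tag like </div>, or a run of text.
# Deliberately simplified: no attributes, comments, or self-closing tags.
TOKEN_PATTERN = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)>|([^<]+)")


def tokenize(html):
    """Return a list of token names for a very small subset of HTML."""
    tokens = []
    for match in TOKEN_PATTERN.finditer(html):
        closing, tag, text = match.groups()
        if text is not None:
            # A run of characters between tags.
            tokens.append("TOKEN_STRING")
        elif closing:
            # The regex captured the "/" of a closing tag.
            tokens.append("TOKEN_END_" + tag.upper())
        else:
            tokens.append("TOKEN_" + tag.upper())
    return tokens


print(" ".join(tokenize("<div><b>Tekst!</b></div>")))
```

For the sample input this prints TOKEN_DIV TOKEN_B TOKEN_STRING TOKEN_END_B TOKEN_END_DIV. A real tokenizer would also keep the text and attribute values alongside each token name.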


It parses HTML and exposes the tags (and their data and attributes) through a 'listener'-style interface similar to SAX for XML. That said, I believe there are quite a few different implementations of classes called HtmlTokenizer.

The listener-style output works by calling methods on the listener when the parser encounters certain elements. For example, the listener may have a startTag(...) method; whenever the parser encounters the start of a new tag, it calls this method and passes in the data for the tag it found. Similarly, when the end of the tag is encountered, it calls a corresponding endTag() method. It is up to the listener to keep track of exactly where the parsing is up to, which is why a parser that simply exposes a DOM tree is often easier to use.
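
As an illustration of the listener style, here is a small sketch using Python's standard `html.parser.HTMLParser`, not the HtmlTokenizer class from the question. In that library the callbacks are methods you override on a subclass rather than a separate listener object; `handle_starttag` and `handle_endtag` play the roles of startTag(...) and endTag() above. The `RecordingListener` class and its `open_tags` stack are hypothetical names used only for this example.

```python
from html.parser import HTMLParser


class RecordingListener(HTMLParser):
    """Hypothetical listener that records every callback it receives."""

    def __init__(self):
        super().__init__()
        self.events = []
        self.open_tags = []  # the listener has to track nesting itself

    def handle_starttag(self, tag, attrs):
        # Called when the parser meets the start of a tag.
        self.open_tags.append(tag)
        self.events.append(("start", tag, dict(attrs)))

    def handle_endtag(self, tag):
        # Called when the parser meets the matching end tag.
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
        self.events.append(("end", tag))

    def handle_data(self, data):
        # Called for text between tags; record which tags are open.
        self.events.append(("text", data, list(self.open_tags)))


listener = RecordingListener()
listener.feed('<div class="note"><b>Tekst!</b></div>')
listener.close()
for event in listener.events:
    print(event)
```

The `open_tags` stack is the bookkeeping mentioned above: the parser only reports events, so knowing that "Tekst!" sits inside a <b> inside a <div> is the listener's job.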

If you can provide more specifics on where this class comes from, I can give more detail in the answer.
Also, I am not aware of any C# class libraries that have this class, only Java ones.


An HTML tokenizer simply breaks a stream of text into tokens, where each token is a string. Normally each string represents either a run of text or an HTML element.

You can use it in C# like any other class.
