Preparsing (X)HTML while reading from a stream

I am building a CGI for an embedded system and need the ability to insert system info into web pages. My plan is to insert into the source XHTML and let the CGI do its magic whenever it sees the FunctionCall "macro". This is not a problem if I can hold the full source XHTML in memory and run a regex over it, but I would rather use less memory and process a stream or chunks while reading. The problem is that I have to be sure I don't split a chunk in the middle of the "" or the regex won't work. Is there a good alternative to regex, or do you have any thoughts that might help?


You're correct in wanting an alternative to regex since (X)HTML is not a 'regular' language.

You might benefit from one of HTML::Parser's subclasses: HTML::TokeParser, HTML::TokeParser::Simple, HTML::TreeBuilder (or HTML::TreeBuilder::XPath), HTML::TableExtract, etc.
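If a full parser turns out to be too heavy for the target, the chunk-boundary worry from the question can also be handled with a small carry-over buffer: emit everything that is certainly plain text, and hold back only an unfinished macro (or a tail that might be the start of one) until the next chunk arrives. Below is a minimal sketch of that idea in Python, not a drop-in implementation. The delimiters `<?sys ` and `?>`, the function name `expand_macros`, and the `handler` callback are all hypothetical, since the actual macro syntax is not shown in the question.

```python
from typing import Callable, Iterable, Iterator


def _partial_suffix(text: str, delim: str) -> int:
    """Length of the longest proper prefix of delim that text ends with."""
    for k in range(min(len(delim) - 1, len(text)), 0, -1):
        if text.endswith(delim[:k]):
            return k
    return 0


def expand_macros(
    chunks: Iterable[str],
    handler: Callable[[str], str],
    open_delim: str = "<?sys ",  # hypothetical opening delimiter
    close_delim: str = "?>",     # hypothetical closing delimiter
) -> Iterator[str]:
    """Yield the input with each macro replaced by handler(macro_body)."""
    buf = ""
    for chunk in chunks:
        buf += chunk
        while True:
            start = buf.find(open_delim)
            if start < 0:
                # No opener in the buffer. Emit the text, but hold back a
                # tail that could be the first part of a split opener.
                keep = _partial_suffix(buf, open_delim)
                cut = len(buf) - keep
                if cut:
                    yield buf[:cut]
                buf = buf[cut:]
                break
            end = buf.find(close_delim, start + len(open_delim))
            if end < 0:
                # Macro opened but not yet closed: emit the text before it
                # and wait for more input.
                if start:
                    yield buf[:start]
                buf = buf[start:]
                break
            if start:
                yield buf[:start]
            yield handler(buf[start + len(open_delim):end].strip())
            buf = buf[end + len(close_delim):]
    if buf:
        # Input ended inside an unterminated macro; pass it through as-is.
        yield buf
```

For example, feeding the chunks `"<p>Up: <?s"`, `"ys uptime"`, `" ?></p>"` with a handler that maps `uptime` to `42s` produces `<p>Up: 42s</p>`, even though both the opener and the macro body are split across chunks. The buffer only grows while a macro is open, so in practice you would probably also want to cap the macro length and give up on an opener that never closes.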
