Preparsing (X)HTML while reading from a stream
I am building a CGI program for an embedded system and need the ability to insert system info into web pages. My plan is to insert a FunctionCall "macro" into the source XHTML and let the CGI do its magic whenever it sees one. Handling this is no problem if I can hold the full source XHTML in memory and run a regex over it, but I would rather use less memory and process a stream, or chunks, while reading. The problem is that I have to be sure I don't split a chunk in the middle of the "", or the regex won't match. Is there a good alternative to regex, or do you have any other ideas that might help?
You're right to want an alternative to regex, since (X)HTML is not a 'regular' language.
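You might benefit from one of the subclasses of Perl's HTML::Parser: HTML::TokeParser, HTML::TokeParser::Simple, HTML::TreeBuilder (or HTML::TreeBuilder::XPath), HTML::TableExtract, etc. HTML::Parser's parse() method can be called repeatedly with successive chunks of the document, followed by eof(), and the parser holds on to any tag that is cut off at a chunk boundary until the rest arrives, so you don't have to care where the chunks split.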
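HTML::Parser is a Perl module; to illustrate the same chunk-feeding idea, here is a minimal sketch in Python using the standard library's html.parser.HTMLParser, whose feed() method likewise accepts arbitrary chunks and buffers any construct that is not yet complete. The macro marker from the question did not survive, so this sketch assumes, hypothetically, that a macro is written as an SSI-style comment such as `<!--#uptime-->`. The names MacroExpander, expand_stream and the uptime macro are made up for the example.

```python
import io
from html.parser import HTMLParser


class MacroExpander(HTMLParser):
    """Copy (X)HTML through to `out`, replacing macro comments on the fly.

    Hypothetical macro syntax (an assumption): <!--#name-->, where `name`
    is a key in the `macros` dict and maps to a zero-argument function.
    """

    def __init__(self, out, macros):
        # convert_charrefs=False reports &amp; etc. as separate events,
        # so they can be written back out instead of being decoded.
        super().__init__(convert_charrefs=False)
        self.out = out
        self.macros = macros

    def handle_starttag(self, tag, attrs):
        # Re-emit the start tag exactly as it appeared in the input.
        self.out.write(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # <br/> etc.: write the original text once, with no extra end tag.
        self.out.write(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.out.write(f"</{tag}>")

    def handle_data(self, data):
        self.out.write(data)

    def handle_entityref(self, name):
        self.out.write(f"&{name};")

    def handle_charref(self, name):
        self.out.write(f"&#{name};")

    def handle_decl(self, decl):
        self.out.write(f"<!{decl}>")

    def handle_pi(self, data):
        self.out.write(f"<?{data}>")

    def handle_comment(self, data):
        name = data.strip()
        if name.startswith("#") and name[1:] in self.macros:
            # A (hypothetical) macro: substitute the function's result.
            self.out.write(self.macros[name[1:]]())
        else:
            # Ordinary comment: pass it through untouched.
            self.out.write(f"<!--{data}-->")


def expand_stream(src, out, macros, chunk_size=64):
    """Read `src` in fixed-size chunks and feed each one to the parser."""
    parser = MacroExpander(out, macros)
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        parser.feed(chunk)  # incomplete constructs are buffered internally
    parser.close()
```

In a CGI this could be called as `expand_stream(open("page.xhtml", encoding="utf-8"), sys.stdout, {"uptime": read_uptime})` (file name and `read_uptime` are placeholders). Output is written as soon as each token is recognised, so the parser only holds back input it cannot yet tokenize, such as a half-read tag or comment. It is a sketch rather than a byte-exact filter: end tags are re-emitted in lowercase, and CDATA sections are not handled.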