
Building a markup parser in PHP

I have created a very simple markup parser in PHP. However, it currently uses str_replace to switch between markup and HTML. How can I make a "code" box of sorts (it will eventually use GeSHi) whose contents are left untouched?

Right now, the following markup: [code][b]Some bold text[/b][/code] winds up parsing as the code box with <b>Some bold text</b>.

I need some advice: which option is best?

  • Have it check each word individually, and parse the word only if it is not inside a [code] box.
  • Leave it as is, and accept that users cannot post markup inside [code].
  • Create another type of code box specifically for HTML markup, and have [code] auto-revert any < or > to [ and ].

Is there maybe even another option? This is a bit tougher than I thought it would be...

EDIT: Is it even worth adding a code box type thing to this parser? I mean, I see how it could be useful, but it is a rather large amount of effort for a small result.


Why would you reinvent the wheel?

There are plenty of markup parsers already.

Anyway, just str_replace won't help much. You'd have to learn regular expressions and as they say, now you've got two problems ;)


You could break the input into multiple strings before using str_replace. Split the string on the [code] and [/code] tags, saving each code box in a separate string, and keep track of where each one sat in the original. Then run str_replace on the remaining text and do whatever parsing you like on the code box strings. Finally, reinsert the parsed code boxes and display the result.

Just a word of warning, though: turning user input into HTML for display strikes me as inherently dangerous. I'd recommend thorough input sanitization and checking before converting it to HTML for redisplay.
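Here is a minimal sketch of the split-and-reinsert idea, with the sanitization step folded in. The tag table MARKUP and the function name render_markup are made up for illustration, since the asker's real str_replace table isn't shown, and the <pre><code> wrapper is just a placeholder for whatever GeSHi would eventually produce. Treat it as one possible shape rather than the definitive way to do it.

```php
<?php
// Hypothetical tag table; substitute the real markup-to-HTML pairs.
const MARKUP = [
    '[b]' => '<b>', '[/b]' => '</b>',
    '[i]' => '<i>', '[/i]' => '</i>',
];

function render_markup(string $input): string
{
    // Split on [code]...[/code]. The capture group keeps each code body in
    // the result, so even indexes are normal text and odd indexes are code.
    $parts = preg_split('~\[code\](.*?)\[/code\]~s', $input, -1, PREG_SPLIT_DELIM_CAPTURE);

    $out = '';
    foreach ($parts as $i => $part) {
        // Escape first so user-supplied HTML is never emitted raw.
        $safe = htmlspecialchars($part, ENT_QUOTES, 'UTF-8');
        if ($i % 2 === 1) {
            // Code box: contents stay untouched apart from escaping.
            $out .= '<pre><code>' . $safe . '</code></pre>';
        } else {
            // Normal text: apply the markup substitutions.
            $out .= str_replace(array_keys(MARKUP), array_values(MARKUP), $safe);
        }
    }
    return $out;
}

echo render_markup('[b]Bold[/b] [code][b]Some bold text[/b][/code]'), "\n";
```

Because the pieces are processed in order and concatenated, there's no need to record positions separately. A [code] tag without a matching [/code] simply doesn't match the pattern and is treated as normal text.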


PHP_Beautifier is pretty sweet: http://pear.php.net/package/PHP_Beautifier . They have a decorator class as well that would probably suit your needs.


To be clear, your problem has two parts. First, you need a lexical analyzer to break the input into the tokens (the keywords) of your markup "language." Once you have a lexical analyzer, you then need a parser: code that accepts those tokens one at a time and applies the rules of the language, usually in a recursive-descent manner.
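To make the two parts concrete, here is a rough sketch of a lexer plus a tiny recursive-descent parser. The function names (lex, parse, raw_until) and the three-tag set (b, i, code) are assumptions chosen for illustration, not anything from the question, and a real parser would need more thought about error handling.

```php
<?php
// Lexer: split the input into tag tokens and text tokens.
// The tag set (b, i, code) is an assumed minimal example.
function lex(string $input): array
{
    return preg_split('~(\[/?(?:b|i|code)\])~', $input, -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
}

// Parser: consume tokens one at a time. $until is the closing tag that
// ends the current nesting level, or null at the top level.
function parse(array &$tokens, ?string $until = null): string
{
    $html = '';
    while ($tokens) {
        $tok = array_shift($tokens);
        if ($tok === $until) {
            return $html;
        }
        if ($tok === '[code]') {
            $html .= '<pre><code>' . raw_until($tokens, '[/code]') . '</code></pre>';
        } elseif ($tok === '[b]' || $tok === '[i]') {
            $name = $tok[1]; // 'b' or 'i'
            // Recurse: the nested call handles everything up to the closing tag.
            $html .= "<$name>" . parse($tokens, "[/$name]") . "</$name>";
        } else {
            // Plain text, or a stray closing tag shown literally.
            $html .= htmlspecialchars($tok, ENT_QUOTES, 'UTF-8');
        }
    }
    return $html; // input ended before $until was seen
}

// Inside [code], every token is literal text until the closing tag.
function raw_until(array &$tokens, string $close): string
{
    $raw = '';
    while ($tokens && ($tok = array_shift($tokens)) !== $close) {
        $raw .= $tok;
    }
    return htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');
}

$tokens = lex('[b]Bold [i]both[/i][/b] [code][b]Some bold text[/b][/code]');
echo parse($tokens), "\n";
```

The upside over plain str_replace is that the parser knows which context it is in, so the [b] inside [code] is never converted, and an unclosed [b] still gets its closing </b> when the input runs out.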
