Building a markup parser in PHP
I have created a very simple markup parser in PHP. However, it currently uses str_replace to convert markup into HTML. How can I make a "code" box of sorts (it will eventually use GeSHi) that leaves its contents untouched?
Right now, the following markup: [code][b]Some bold text[/b][/code]
winds up rendered as a code box containing <b>Some bold text</b>.
I need some advice: which option is best?
- Have it check each word individually, and parse the word only if it is not inside a [code] box.
- Leave it as is, and accept that users cannot post markup inside of [code].
- Create another type of code box specifically for HTML markup, and have [code] auto-revert any < or > to [ and ].
Is there maybe even another option? This is a bit tougher than I thought it would be...
EDIT: Is it even worth adding a code box type thing to this parser? I mean, I see how it could be useful, but it is a rather large amount of effort for a small result.
Why would you reinvent the wheel?
There are plenty of markup parsers already.
Anyway, just str_replace won't help much. You'd have to learn regular expressions and as they say, now you've got two problems ;)
You could break the input into multiple strings so that str_replace only ever sees the parts it should change. Split the string on the [code] and [/code] tags, saving each code box in a separate string, and make note of where it sat in the original string. Then run str_replace on the remaining text and do whatever parsing you like on the code box strings. Finally, reinsert the parsed code boxes and display the result.
Just a word of warning though, turning input into html for display strikes me as inherently dangerous. I'd recommend a large amount of input sanitization and checking before converting to html for redisplay.
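As a rough sketch of the split-and-reinsert idea, with escaping applied before any HTML is produced: the function name render_markup and the small tag table are made up for illustration, and preg_split is used here in place of manual position bookkeeping (with PREG_SPLIT_DELIM_CAPTURE the code box bodies land at the odd indexes, so their position is remembered for free). It assumes [code] boxes are not nested.

```php
<?php
// Hypothetical helper: the name and the tag table are assumptions.
function render_markup(string $input): string
{
    // Split on [code]...[/code]. The capture group keeps each code body,
    // so the array alternates: text, code, text, code, ..., text.
    $parts = preg_split('~\[code\](.*?)\[/code\]~s', $input, -1, PREG_SPLIT_DELIM_CAPTURE);

    $tags = [
        '[b]' => '<b>', '[/b]' => '</b>',
        '[i]' => '<i>', '[/i]' => '</i>',
    ];

    $out = '';
    foreach ($parts as $i => $part) {
        // Escape first, so raw HTML typed by the user is never emitted.
        $safe = htmlspecialchars($part, ENT_QUOTES, 'UTF-8');
        if ($i % 2 === 1) {
            // Odd index: inside a code box, leave the markup untouched.
            $out .= '<pre><code>' . $safe . '</code></pre>';
        } else {
            // Even index: ordinary text, apply the usual str_replace pass.
            $out .= str_replace(array_keys($tags), array_values($tags), $safe);
        }
    }
    return $out;
}

echo render_markup('[b]Hi[/b] [code][b]Some bold text[/b][/code]');
// prints: <b>Hi</b> <pre><code>[b]Some bold text[/b]</code></pre>
```

Escaping with htmlspecialchars is only the minimum; anything that ends up inside an attribute (links, images) would need its own validation.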
PHP_Beautifier is pretty sweet: http://pear.php.net/package/PHP_Beautifier. They have a decorator class as well that would probably suit your needs.
To be clear, your problem is in two parts. The first part is the need for a lexical analyzer to break your "code" into the keywords (tokens) of your "language." Once you have a lexical analyzer, you then need a parser. A parser is code that accepts those tokens one at a time and applies the rules of the language to them, usually by recursive descent.
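A minimal sketch of that two-part split, assuming only [b], [i] and [code] exist: tokenize, parse and render are hypothetical names, and the error handling (unclosed tags are closed implicitly, stray closing tags are printed as text) is just one possible choice. It needs PHP 7.1+ for the nullable type.

```php
<?php
// Hypothetical sketch: function names and the tag list are assumptions.

// Lexer: break the input into tag tokens and plain-text tokens.
function tokenize(string $input): array
{
    return preg_split(
        '~(\[/?(?:b|i|code)\])~',
        $input,
        -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
    );
}

// Parser: consume tokens one at a time, recursing on each opening tag.
// $until is the closing tag that ends the current nesting level.
function parse(array $tokens, int &$pos, ?string $until = null): string
{
    $html = '';
    while ($pos < count($tokens)) {
        $tok = $tokens[$pos++];
        if ($tok === $until) {
            return $html; // the current level has been closed
        }
        if ($tok === '[code]') {
            // Copy tokens verbatim up to [/code]; nothing inside is parsed.
            $raw = '';
            while ($pos < count($tokens) && $tokens[$pos] !== '[/code]') {
                $raw .= $tokens[$pos++];
            }
            $pos++; // skip the [/code] token itself
            $html .= '<pre><code>' . htmlspecialchars($raw, ENT_QUOTES, 'UTF-8') . '</code></pre>';
        } elseif ($tok === '[b]' || $tok === '[i]') {
            $name = $tok[1]; // "b" or "i"
            $html .= "<$name>" . parse($tokens, $pos, "[/$name]") . "</$name>";
        } else {
            // Plain text or a stray closing tag: emit it escaped.
            $html .= htmlspecialchars($tok, ENT_QUOTES, 'UTF-8');
        }
    }
    return $html; // input ended; any open tag is closed by the caller
}

function render(string $input): string
{
    $tokens = tokenize($input);
    $pos = 0;
    return parse($tokens, $pos);
}

echo render('[b]bold [i]both[/i][/b] [code][b]raw[/b][/code]');
// prints: <b>bold <i>both</i></b> <pre><code>[b]raw[/b]</code></pre>
```

Because the parser tracks nesting itself, the [code] case falls out naturally: it is simply a state in which tokens are copied instead of interpreted.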