Django syntax highlighting causing character escaping issues
I've been working on my own Django-based blog (like everyone, I know) to sharpen up my Python, and I thought adding some syntax highlighting would be pretty great. I looked at some of the snippets out there and decided to combine a few and write my own syntax highlighting template filter using Beautiful Soup and Pygments. It looks like this:
from django import template
from BeautifulSoup import BeautifulSoup
import pygments
import pygments.lexers as lexers
import pygments.formatters as formatters

register = template.Library()

@register.filter(name='pygmentize')
def pygmentize(value):
    try:
        formatter = formatters.HtmlFormatter(style='trac')
        tree = BeautifulSoup(value)
        for code in tree.findAll('code'):
            if not code['class']: code['class'] = 'text'
            lexer = lexers.get_lexer_by_name(code['class'])
            new_content = pygments.highlight(code.contents[0], lexer, formatter)
            new_content += u"<style>%s</style>" % formatter.get_style_defs('.highlight')
            code.replaceWith ( "%s\n" % new_content )
        content = str(tree)
        return content
    except KeyError:
        return value
It looks for a code block like this, highlights it, and adds the relevant styles:
<code class="python">
print "Hello World"
</code>
This was all working fine until a block of code I was including had some HTML in it. Now, I know all the HTML I need, so I write my blog posts directly in it and, when rendering to the template, just mark the post body as safe:
{{ post.body|pygmentize|safe }}
This approach results in any HTML in a code block just rendering as HTML (i.e., not showing up as text). I've been playing around with using the Django escape function on the code extracted from the body by my filter, but I can never quite seem to get it right. I think my understanding of content escaping just isn't complete enough. I've also tried writing the escaped version in the post body (e.g. &lt;), but it just comes out as text.
What is the best way to mark the html for display? Am I going about this all wrong?
Thanks.
I've finally found some time to figure it out. When Beautiful Soup pulls in the content and it contains a tag, that tag becomes a separate child node in the code element's contents list. This line is the culprit:
new_content = pygments.highlight(code.contents[0], lexer, formatter)
The [0] keeps only the first child node and cuts off the rest of the code; it isn't being decoded incorrectly. Poor bug spotting on my part. That line needs to be replaced with:
new_content = pygments.highlight(code.decodeContents(), lexer, formatter)
The lessons here: make sure you know what the problem actually is, and know how your libraries work.
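For anyone who wants to see the difference for themselves, here is a minimal sketch. It assumes the newer bs4 package rather than the BeautifulSoup 3 import used in the filter, so the method is spelled decode_contents() instead of decodeContents(). The sample body string is made up for illustration.

```python
# Assumption: the modern bs4 package, not the BeautifulSoup 3 module used above.
from bs4 import BeautifulSoup

# Hypothetical post body: a code block whose content itself contains a tag.
body = '<code class="html">before <b>bold</b> after</code>'

soup = BeautifulSoup(body, "html.parser")
code = soup.find("code")

# contents is a list of child nodes: text, the <b> tag, then more text.
child_count = len(code.contents)

# Indexing with [0] keeps only the text before the first nested tag.
first_child = str(code.contents[0])

# decode_contents() serializes every child back into a single string.
full_markup = code.decode_contents()

print(child_count)
print(repr(first_child))
print(repr(full_markup))
```

Passing the full markup to pygments.highlight instead of the first child is what keeps the nested HTML in the highlighted output.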