Python libxml2 parsing xml having Chinese characters

2023-01-26 11:14 问答作者：

i encountered encoding problems when using libxml2 in python to parse Chinese charactors

# coding=utf8
import libxml2

def output(data):
  doc = libxml2.parseMemory(data, len(data))
  ctxt = doc.xpathNewContext()
  res_rslt = ctxt.xpathEval("/r/e/attribute::Name")
  print res_rslt[0]

data =  '''<r><e RoleID="3247" Name="中文"></e></r>'''

o开发者_StackOverflow社区utput(data)

the out put is

Name="&#x4E2D;&#x6587;"

while i'm expecting

Name="中文"

how could i make it?

With lxml, things are easier and they work. It is Pythonic binding for the libxml2 library and works wonderfully.

>>> from lxml import etree
>>> x = etree.fromstring('''<r><e RoleID="3247" Name="中文"></e></r>''')
>>> name = x[0].get('Name')
>>> print name
中文

And yes, XPath is also supported. The documentation is here.

As for your program, have a look at this:

# -*- coding: utf-8 -*-

import libxml2

def output(data):
  doc = libxml2.parseDoc(data)
  ctxt = doc.xpathNewContext()
  res_rslt = ctxt.xpathEval("/r/e/attribute::Name")
  return res_rslt[0]

data =  u'''<?xml version="1.0" encoding="UTF-8"?><r><e RoleID="3247" Name="中文"></e></r>'''.encode("UTF-8")

print output(data)

My answer to these sorts of things always seems to be "use Beautiful Soup". And I always get upvoted for it, too (which shows, I think, that others agree with me that it's good).

>>> from BeautifulSoup import BeautifulSoup
>>> soup = BeautifulSoup(u'''<r><e RoleID="3247" Name="中文"></e></r>''')
>>> print soup.r.e['name']
中文

The thing is that libxml2 is converting those characters into the proper XML entities which for XML is correct. Beautiful Soup doesn't have any such notions of feeling a need to be correct - so it just gives you what you want.

(Note in this case that using either u'...' or '...' will work; I just put it as a unicode because it feels better that way - whatever you do, Beautiful Soup gives you Unicode.)

继续阅读：encoding libxml2 python utf-8

Python libxml2 parsing xml having Chinese characters

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？