Converting a list of Unicode characters into a Hebrew string in Python
Following the solution in this thread, I managed to get a bunch of lists, each of which looks like:
[u'\u05ea\u05d0\u05de\u05d9\u05df \u05dc\u05d9']
I assume that those are Unicode characters, but for some reason I can't convert them back into Hebrew.
I tried the suggested solution in the comments in the link. I also tried to use ''.join
but it didn't work. The error I get is:
Error Type: exceptions.UnicodeEncodeError
22:42:15 T:2806414192 M:2425589760 ERROR: Error Contents: 'ascii' codec can't encode characters in position 0-4: ordinal not in range(128)
I also tried wrapping the values in unicode(), but the result was the same as the example above.
How do I achieve that?
Note:
I am trying to parse this link.
Edit:
I am trying to convert the list into a string using join
and then print it. Here is the relevant piece of code:
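from BeautifulSoup import BeautifulStoneSoup  # BeautifulSoup 3 (Python 2)
soup = BeautifulStoneSoup(link, convertEntities=BeautifulStoneSoup.XML_ENTITIES)
programs = soup('ul')
for i, prog in enumerate(programs):
    # only the <ul> at the position computed from the letter is of interest
    if i == (4 + getLetterValue(name)):
        j = 0
        while j < len(prog('li')):
            li = prog('li')[j]
            link = li('a')[0]
            url = link['href']
            text = link.contents  # a list of unicode strings
            print ''.join(text)  # this is where the UnicodeEncodeError shows up
            j += 1  # advance to the next <li>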
link is a string, and getLetterValue(name) returns an integer giving the position in the HTML document.
This is a unicode string. It is in Hebrew, and you can even print it directly in a Python interactive shell, e.g.:
>>> print u'\u05ea\u05d0\u05de\u05d9\u05df \u05dc\u05d9'
תאמין לי
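The error quoted in the question is what a conversion to ASCII produces. Printing works in the shell above because the terminal declares an encoding; when the output stream has none, Python 2 falls back to ASCII. That the asker's environment does so is an assumption inferred from the error message. A minimal sketch that reproduces the same failure by encoding explicitly (the variable names are illustrative):

```python
# -*- coding: utf-8 -*-
# Runs unchanged on Python 2 and Python 3.

text = u'\u05ea\u05d0\u05de\u05d9\u05df \u05dc\u05d9'

# ASCII has no Hebrew letters, so encoding to it must fail.
try:
    text.encode('ascii')
except UnicodeEncodeError as err:
    failure = err

# The first five characters (positions 0-4) are the first Hebrew word.
print(failure)
# 'ascii' codec can't encode characters in position 0-4: ordinal not in range(128)
```

The reported range stops at position 4 because position 5 is a plain space, which ASCII can represent.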
If you really need to convert it to a raw string of bytes (a str object) for some reason, you have to specify the encoding of the byte string, because text can be represented in many different encodings.
Short answer: assuming you want to use UTF-8 to encode the text, you can use:
your_unicode_text.encode('utf-8')
If you are going to use a different encoding, just change the encoding name above.
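As a rough sketch of the short answer applied to a list shaped like the one in the question: join the pieces while they are still unicode, then encode once at the end. The names parts, text, utf8_bytes and hebrew_bytes are illustrative, and ISO-8859-8 is only one example of a different encoding.

```python
# -*- coding: utf-8 -*-
# Runs unchanged on Python 2 and Python 3; all names are illustrative.

# A list shaped like the one in the question.
parts = [u'\u05ea\u05d0\u05de\u05d9\u05df \u05dc\u05d9']

# Join with a unicode separator so the result stays unicode text.
text = u''.join(parts)

# Encode to bytes; the byte count depends on the encoding chosen.
utf8_bytes = text.encode('utf-8')         # 2 bytes per Hebrew letter
hebrew_bytes = text.encode('iso-8859-8')  # 1 byte per Hebrew letter

print('%d %d %d' % (len(text), len(utf8_bytes), len(hebrew_bytes)))
# 8 15 8

# Decoding with the matching encoding recovers the original text.
assert utf8_bytes.decode('utf-8') == text
assert hebrew_bytes.decode('iso-8859-8') == text
```

In Python 2, printing utf8_bytes writes the bytes as they are, so no implicit ASCII conversion takes place. Whether they show up as Hebrew then depends on the terminal or log viewer expecting UTF-8.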
For a reference on how Python deals with Unicode text and common problems, see: http://docs.python.org/howto/unicode.html
See also this answer for another short explanation of Unicode and string encodings.