How to fix unicode issue when using a web service with Python Suds
I am trying to work with the HORRIBLE web services at Commission Junction (CJ). I can get the client to connect and receive information from CJ, but their database seems to include a bunch of bad characters that cause a UnicideDecodeError.
Right now I am doing:
from suds.client import Client
wsdlLink = 'https://link-search.api.cj.com/wsdl/version2/linkSearchServiceV2.wsdl'
client = Client(wsdlLink)
result = client.service.searchLinks(developerKey='XXX', websiteId='XXX', promotionType='coupon开发者_StackOverflow社区')
This works fine until I hit a record that has something like 'CorpNet® 10% Off Any Service' then the ® causes it to break and I get
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 758: ordinal not in range(128)" error.
Is there a way to encode the ® on my end so that it does not break when SUDS reads in the result?
UPDATE: To clarify, the ® is coming from the CJ database and is in their response. SO somehow I need to decode the non-ascii characters BEFORE SUDS deals with the response. I am not sure how (or if) this is done in SUDs.
Implicit UnicodeDecodeErrors is something you get when trying to add str and unicode objects. Python will then try to decode the str into unicode, but using the ASCII encoding. If your str then contains anything that is not ascii, you will get this error.
Your solution is the decode it manually like so:
thestring = thestring.decode('utf8')
Try, as much as possible, to decode any string that may contain non-ascii characters as soo as you are handed it from whatever module you get it from, in this case suds.
Then, if suds can't handle Unicode (which may be the case) make sure you encode it back just before handing the text back to suds (or any other library that breaks if you give it unicode).
That should solve things nicely. It may be a big change, as you need to move all your internal processing from str to unicode, but it's worth it. :)
The "registered" character is U+00AE and is encoded as "\xc2\xae"
in UTF-8. It looks like you have a str object encoded in UTF-8 but some code is doing (probably by default) your_str_object.decode("ascii")
which will fail with the error message you showed.
What you need to do is show us a complete example (i.e. ALL the code necessary to get the error), plus the full error message and traceback, so that at least we can guess whether the problem is in your code or in imported code.
I am using SUDS to interface with Salesforce via their SOAP API. I ran into the same situation until I followed @J.F.Sabastian's advice by not mixing str and unicode string types. For example, passing a SOQL string like this does work with SUDS 0.3.9:
qstr = u"select Id, FirstName, LastName from Contact where FirstName='%s' and LastName='%s'" % (u'Jorge', u'López')
I did not seem to need to do str.decode("utf-8") either.
If you're running your script from PyDev on Eclipse, you might want to go into Project => Properties and under Resource, set "Text File Encoding" to UTF-8, on my Mac, this defaults to "MacRoman". I suppose on Windoze, the default is either Cp1252 or ISO-8859-1 (Latin). You could also set this in your Workspace of your Projects inherit this setting from their workspace. This only effects the program source code.
精彩评论