Subclassing db.TextProperty for storing python dict as JSON and setting default encoding to anything but ASCII
Using Google App Engine (Python SDK), I created a custom JSONProperty() as a subclass of db.TextProperty(). My goal is to store a Python dict on the fly as JSON and retrieve it easily. I followed various examples found via Google, and setting up the custom Property class and its methods was pretty easy.
However, some of my dict values (strings) are encoded in UTF-8. When saving the model into the datastore, I get a dreaded Unicode error (the default encoding for a datastore text property is ASCII). Subclassing db.BlobProperty didn't solve the issue.
Basically, my code does the following: it stores Resource entities in the datastore (with the URL as a StringProperty and the POST/GET payloads stored in a dict as a JSONProperty) and fetches them later (code not included). I chose not to use pickle for storing payloads because I'm a JSON freak and have no use for storing objects.
Custom JSONProperty:
class JSONProperty(db.TextProperty):
    def get_value_for_datastore(self, model_instance):
        value = super(JSONProperty, self).get_value_for_datastore(model_instance)
        return json.dumps(value)

    def make_value_from_datastore(self, value):
        if value is None:
            return {}
        if isinstance(value, basestring):
            return json.loads(value)
        return value
Putting the model into the datastore:
res = Resource()
res.init_payloads()
res.url = "http://www.somesite.com/someform/"
res.param = { 'name': "SomeField", 'default': u"éàôfoobarç" }
res.put()
This throws a UnicodeDecodeError related to ASCII encoding. It may be worth noting that I only get this error (every time) on the production server. I'm using Python 2.5.2 on dev.
Traceback (most recent call last):
  File "/base/data/home/apps/delpythian/1.350065314722833389/core/handlers/ResetHandler.py", line 68, in _res_one
    return res_one.put()
  File "/base/python_runtime/python_lib/versions/1/google/appengine/ext/db/__init__.py", line 984, in put
    return datastore.Put(self._entity, config=config)
  File "/base/python_runtime/python_lib/versions/1/google/appengine/api/datastore.py", line 455, in Put
    return _GetConnection().async_put(config, entities, extra_hook).get_result()
  File "/base/python_runtime/python_lib/versions/1/google/appengine/datastore/datastore_rpc.py", line 1219, in async_put
    for pbs in pbsgen:
  File "/base/python_runtime/python_lib/versions/1/google/appengine/datastore/datastore_rpc.py", line 1070, in __generate_pb_lists
    pb = value_to_pb(value)
  File "/base/python_runtime/python_lib/versions/1/google/appengine/api/datastore.py", line 239, in entity_to_pb
    return entity._ToPb()
  File "/base/python_runtime/python_lib/versions/1/google/appengine/api/datastore.py", line 841, in _ToPb
    properties = datastore_types.ToPropertyPb(name, values)
  File "/base/python_runtime/python_lib/versions/1/google/appengine/api/datastore_types.py", line 1672, in ToPropertyPb
    pbvalue = pack_prop(name, v, pb.mutable_value())
  File "/base/python_runtime/python_lib/versions/1/google/appengine/api/datastore_types.py", line 1485, in PackString
    pbvalue.set_stringvalue(unicode(value).encode('utf-8'))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 32: ordinal not in range(128)
My question is the following: is there a way to subclass db.TextProperty() and set or enforce a custom encoding? Or am I doing something wrong? I try to avoid using str() and to follow the "Decode early, Unicode everywhere, encode late" rule.
Update: added code and stack trace.
Here's a minimal example of moving a unicode string from a dictionary to a serialized JSON string to a TextProperty:
class Thing(db.Model):
    json = db.TextProperty()

class MainHandler(webapp.RequestHandler):
    def get(self):
        data = {'word': u"r\xe9sum\xe9"}
        json = simplejson.dumps(data, ensure_ascii=False)
        Thing(json=json).put()
This works for me in both dev and prod.
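To make the effect of ensure_ascii visible, here is a small sketch that runs outside App Engine. It assumes the standard library json module in place of simplejson (the two share this interface, but that substitution is mine, not part of the answer above), and it only inspects the serialized strings without touching the datastore.

```python
import json

data = {'word': u"r\xe9sum\xe9"}

# Default (ensure_ascii=True): non-ASCII characters are written as \uXXXX
# escapes, so the serialized string contains only ASCII characters.
escaped = json.dumps(data)

# ensure_ascii=False: non-ASCII characters are kept as-is in the output.
literal = json.dumps(data, ensure_ascii=False)

# Both forms load back to the same dictionary.
roundtrip_escaped = json.loads(escaped)
roundtrip_literal = json.loads(literal)
```

Either form round-trips; the difference is only whether the stored text contains the accented characters themselves or their escapes.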
Looking at the last frame of the traceback:
    pbvalue.set_stringvalue(unicode(value).encode('utf-8'))
it seems that App Engine expects all string values to be unicode. The call unicode(value) doesn't specify an encoding, so it will probably fall back to ASCII unless value is already a unicode object, e.g.:
>>> u = u"ąęćźż"
>>> s = u.encode('utf-8')
>>> unicode(u) # fine
>>> unicode(s, 'utf-8') # fine
>>> unicode(s) # blows up (tries ascii) (on my interpreter)
json.dumps can hand back a UTF-8 encoded byte string (str) rather than a unicode object, and that's why the bare unicode() call can't handle it.
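The same failure can be reproduced without App Engine by decoding UTF-8 bytes with the ascii codec explicitly, which is what unicode(value) does implicitly in Python 2 when the default encoding is ASCII. This is only a sketch: the sample string is taken from the question, the variable names are made up for illustration, and the syntax is chosen so that it also runs on current Python versions.

```python
# Sample text from the question, written with escapes to avoid
# depending on the source file encoding.
text = u"\xe9\xe0\xf4foobar\xe7"

# Encoding to UTF-8 gives a byte string whose first byte is 0xc3.
raw = text.encode('utf-8')

# Decoding those bytes as ASCII fails, because 0xc3 is outside range(128).
try:
    raw.decode('ascii')
    failed = False
    message = ''
except UnicodeDecodeError as exc:
    failed = True
    message = str(exc)

# Naming the right codec recovers the original text.
decoded = raw.decode('utf-8')
```

The error message names the same byte (0xc3) that appears in the traceback from the question.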
Try this:
    return unicode(json.dumps(...), 'utf-8')
and you should be fine.
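A slightly more defensive variant of that idea is sketched below. The helper name dumps_as_text is hypothetical, and the code uses the standard library json module as an assumption; it decodes only when the serializer actually returned a byte string, so it does not fail when the result is already text.

```python
import json


def dumps_as_text(value):
    """Serialize value to JSON and always return text, never raw bytes.

    Hypothetical helper for illustration, not part of App Engine.
    """
    dumped = json.dumps(value)
    if isinstance(dumped, bytes):
        # Byte string (the Python 2 str case): decode with an explicit
        # codec instead of relying on the interpreter default.
        dumped = dumped.decode('utf-8')
    return dumped


payload = {'name': "SomeField", 'default': u"\xe9\xe0\xf4foobar\xe7"}
result = dumps_as_text(payload)
```

In a custom property, the value returned from get_value_for_datastore could be passed through a helper like this one, so the datastore never has to guess an encoding.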
As for why App Engine blows up while your interpreter is fine, my guess is a difference in local settings: the docstring for unicode says it defaults to the current default encoding, which is apparently UTF-8 for you and ASCII for GAE.
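One way to check that guess is to print the interpreter's default encoding in both environments. This is a minimal sketch using sys.getdefaultencoding() from the standard library; the variable name is only illustrative.

```python
import sys

# The codec Python uses for implicit str/unicode conversions.
# A stock Python 2 reports 'ascii' unless the site configuration changed it;
# Python 3 reports 'utf-8'.
default_encoding = sys.getdefaultencoding()
print(default_encoding)
```

If the two environments report different values, the implicit unicode(value) call will behave differently in each of them.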