
Google App Engine mapper: map over a range of dates

I would like to use the appengine mapper to iterate over a range of dates (from-date and to-date passed as properties to the configuration). For each date in the range, I would retrieve the entities that have this date as a property and operate on this set.

For example, if I have the following set of entities:

Key  Date           Value
a    2011/09/09     323
b    2011/09/09     132
c    2011/09/08     354
d    2011/09/08     432
e    2011/09/08     234
f    2011/09/07     423
g    2011/09/07     543

I would like to specify a date range of 2011/09/09 - 2011/09/07 which would create three mapper instances, for 2011/09/09, 2011/09/08 and 2011/09/07. In turn these would query for entities a+b, c+d+e and f+g respectively, and perform some operations on the values. (Each of the mappers would also make other datastore queries for additional data, hence the 'bonus question' below.)

Presumably I need to create a custom InputFormat class, however I'm quite new to mapreduce/hadoop and I was hoping someone had some examples?

Bonus question: is it "bad form" to use a DAO to load data in a mapper? Other distributed computing platforms I have worked with (e.g. DataSynapse) would require that you parcel all inputs up and provide them with the task, to prevent too much contention on a data server. However, with the App Engine HR datastore I presume this isn't a concern?


It's not currently possible to iterate over a subset of entities of a given kind in App Engine's mapreduce implementation. If the entities you want make up a large proportion of the data, you can simply iterate over everything and ignore the unwanted entities; if they only make up a small proportion, you will have to roll your own update procedure using the task queue.
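For the roll-your-own route, one possible starting point is to expand the date range into individual days and hand each day to its own task. The sketch below only covers that expansion step in plain Java; `DateRangeTasks` and `datesInRange` are hypothetical names, and the actual enqueueing and per-date query are left out because they depend on your task queue and data layer setup.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: expands an inclusive date range into one entry per day. */
final class DateRangeTasks {

    /**
     * Returns every date between the two bounds, inclusive, oldest first.
     * The bounds may be given in either order, since the question writes
     * its range newest-first (2011/09/09 - 2011/09/07).
     */
    static List<LocalDate> datesInRange(LocalDate from, LocalDate to) {
        LocalDate start = from.isBefore(to) ? from : to;
        LocalDate end = from.isBefore(to) ? to : from;

        List<LocalDate> dates = new ArrayList<>();
        for (LocalDate d = start; !d.isAfter(end); d = d.plusDays(1)) {
            // Assumption: each date would become the payload of one task,
            // which then queries for the entities carrying that date.
            dates.add(d);
        }
        return dates;
    }
}
```

With the example data from the question, the range 2011/09/09 - 2011/09/07 would expand to three dates, one per task.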


Based on Nick Johnson's answer, you will need to retrieve your date range from the context using custom parameters. The mapper then filters out (ignores) any entity that falls outside the range before processing it.
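The filtering step itself can be sketched independently of the mapper API. The class below is a hypothetical helper in plain Java, not part of the App Engine mapper library: it assumes the two bounds arrive as strings in the yyyy/MM/dd form used in the question (for instance, read from the job's custom parameters), and that the mapper can turn each entity's date property into a `LocalDate`.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper: decides whether an entity's date lies inside the configured range. */
final class DateRangeFilter {
    // Assumption: parameters use the same yyyy/MM/dd form as the question's table.
    private static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("uuuu/MM/dd");

    private final LocalDate from;
    private final LocalDate to;

    DateRangeFilter(String fromParam, String toParam) {
        LocalDate a = LocalDate.parse(fromParam, FMT);
        LocalDate b = LocalDate.parse(toParam, FMT);
        // Normalise so that from <= to, whichever order the bounds were given in.
        this.from = a.isBefore(b) ? a : b;
        this.to = a.isBefore(b) ? b : a;
    }

    /** True when the date is within the range, both bounds inclusive. */
    boolean accepts(LocalDate entityDate) {
        return !entityDate.isBefore(from) && !entityDate.isAfter(to);
    }
}
```

Inside the map function, the idea would be to build the filter once from the parameters and return early for any entity where `accepts` is false.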

But if you insist on mapping across all entities of a given kind, there is a workaround that, depending on your requirements, may or may not be feasible. Suppose that your date ranges are pretty much fixed (sounds unlikely, but just maybe). Then for each expected range you create a corresponding child entity kind with a parent key pointing to the main entity (a plain reference would also do, but a parent key works better for consistency: think of a transaction across an entity group).

Thus each entity in a range receives a child entity of the kind corresponding to that range. Then set up a mapper on the child entity kind corresponding to the range, and retrieve each child's parent to work on it.

I do something similar, but in the opposite direction and for a single child entity kind, when populating my data for the Relation Index Entity pattern. Hence the answer to your bonus question: go ahead and use a DAO, or whatever your data layer consists of.

While the first approach is more sound, the latter may be feasible when your ranges are not very dynamic and are manageable in number. Given the schema-less nature of the datastore, creating new entity kinds is neither expensive nor bad practice.
