Inequality Filters on a date and a number

2023-02-07 17:44

I am trying to query my Google App Engine datastore (Python), whose entities have the properties item_name, manufacturing_date and number_of_items_shipped. There are about 1.0 million records in the datastore, and the number keeps growing.

The scenario: Get all the item_names which have been shipped more than x_items [user input] and manufactured after some_date [user input]. Basically, kind of an inventory check.

Effectively, this needs inequality filters on two different properties. Because of the restrictions on queries in GAE, I am not able to do this.

I searched SO for this issue, but no luck so far. Have you come across this issue? If so, were you able to resolve it? Please let me know.

Also, at Google I/O 2010, in the Next Gen Queries session, Alfred Fuller mentioned that they were going to remove this restriction soon. It's been more than 8 months, but unfortunately the restriction is still in place.

I would appreciate it if anyone who managed to circumvent this restriction could post an answer.

Thanks a lot.

Building on Sudhir's answer, I'd probably assign each record to a manufacture date "bucket", based on the granularity you care about. If your range of manufacturing dates is over a couple of years, use monthly buckets for example. If your range is just in the last year, weekly.

Now when you want to find records with > n sales and manufacturing date in a given range, do your query once per bucket in that range, and postfilter out the items you are not interested in.

For example (totally untested):

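import datetime

BUCKET_SIZE_DAYS = 10

def put(self):
    # Store a coarse integer bucket next to the exact date so that it can be
    # matched with an equality filter.
    self.manufacture_bucket = self.manufacture_date.toordinal() // BUCKET_SIZE_DAYS
    # Caution: super(self.__class__, self) recurses forever if this model is
    # ever subclassed; name the model class explicitly in real code.
    super(self.__class__, self).put()

def filter_date_after(self, date_start):
    first_bucket = date_start.toordinal() // BUCKET_SIZE_DAYS
    last_bucket = datetime.datetime.today().toordinal() // BUCKET_SIZE_DAYS

    # One equality query per bucket, then post-filter on the exact date.
    # This assumes self.filter() gives a fresh query each time; if it modifies
    # the query in place, copy the query before adding the bucket filter.
    for this_bucket in range(first_bucket, last_bucket + 1):
        for found in self.filter("manufacture_bucket =", this_bucket):
            if found.manufacture_date >= date_start:
                yield found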

You should then be able to use it like this:

widgets.filter("sold >", 7).filter_date_after(datetime.datetime(2010,11,21))

Left as an exercise for the reader:

Making it play nicely with other filters added to the end
Multiple bucket sizes allowing you to always query ln(days in date range) buckets.
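
As a rough sketch of the multiple-bucket-size idea (my own illustration, not tested against the datastore): suppose every record stored one bucket index per level, where a level-k bucket spans 2**k days. A date range can then be covered greedily with the largest aligned buckets that fit, which keeps the number of equality queries roughly logarithmic in the length of the range as long as the range is not much longer than the largest bucket. The name cover_range, the power-of-two sizes and the max_level cap are all assumptions chosen for illustration.

```python
import datetime


def cover_range(start, end, max_level=8):
    """Cover the inclusive day range [start, end] with aligned buckets.

    Returns a list of (level, index) pairs. A bucket at level k with
    index i spans days i * 2**k through (i + 1) * 2**k - 1.
    """
    buckets = []
    day = start
    while day <= end:
        level = 0
        # Grow the bucket while the next size up is still aligned on `day`
        # and still fits entirely inside the range.
        while (level < max_level
               and day % (2 ** (level + 1)) == 0
               and day + 2 ** (level + 1) - 1 <= end):
            level += 1
        buckets.append((level, day >> level))
        day += 2 ** level
    return buckets


# Example with real dates, using ordinals as the day numbers.
start = datetime.date(2010, 11, 21).toordinal()
end = datetime.date(2011, 2, 3).toordinal()
for level, index in cover_range(start, end):
    # Each pair would become one equality filter on a hypothetical
    # per-level property, e.g. "manufacture_bucket_<level> = index".
    print(level, index)
```

Because every bucket lies entirely inside the requested range, no post-filtering on the date is needed with this layout; the cost is one extra stored property per level on each record.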

Unfortunately, you can't circumvent this restriction, but I can help you model the data in a slightly different way.

First off, Bigtable is suited to very fast reads from large databases - the kind you do when you have a million people hitting your app at the same time. What you're trying to do here is a report on historical data. While I would recommend moving the reporting to an RDBMS, there is a way you can do it on Bigtable.

First, override the put() method on your item model to split the date before saving it. What you would do is something like

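def put(self):
    # Split the date into separate properties that equality filters can match.
    self.manufacture_day = self.manufacture_date.day
    self.manufacture_month = self.manufacture_date.month
    self.manufacture_year = self.manufacture_date.year
    # Caution: super(self.__class__, self) recurses forever if this model is
    # ever subclassed; name the model class explicitly in real code.
    super(self.__class__, self).put()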

You can do this to any level of granularity you want, even hours, minutes, seconds, whatever.

You can apply this retroactively to your database by just loading and saving your item entities. The mapper is very convenient for this.

Then change your query to use the inequality only on the item count, and select the days / months / years you want using normal equalities. You can do ranges either by firing multiple queries or by using the IN clause (which does the same thing anyway).
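
To make the "multiple queries" route a bit more concrete, here is a minimal sketch of how the list of (year, month) pairs for a date range could be produced. The helper name months_in_range is my own, and the query shown in the comment uses a hypothetical Item model, so treat it as an illustration rather than working datastore code.

```python
import datetime


def months_in_range(start, end):
    """Yield (year, month) for every calendar month touching [start, end]."""
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        yield year, month
        month += 1
        if month > 12:
            year, month = year + 1, 1


start = datetime.date(2010, 11, 21)
end = datetime.date(2011, 2, 3)
for year, month in months_in_range(start, end):
    # One query per pair, along the lines of (hypothetical model, not run here):
    #   Item.all().filter("manufacture_year =", year)
    #             .filter("manufacture_month =", month)
    #             .filter("number_of_items_shipped >", x_items)
    print(year, month)
```

Note that the first and last months are only partially inside the range, so the results for those two months would still need to be post-filtered on the exact date (or queried day by day using manufacture_day).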

This does seem contrived and tough to do, but keep in mind that your reports will run almost instantaneously if you do this, even when millions of people try to run them at the same time. You might not need this kind of scale, but well... that's what you get :D
