开发者

converting native SQL to SQLAlchemy

I've got the following problen: I know SQL and I don't know how to work with SQLAlchemy but I ne开发者_开发知识库ed to change it in 1 place in the project that I've inherited. So, I've got this:

ModelCategories = request.sa.query(
    Model.category_id
    , Category.name
    , Category.alias).distinct().join(Category).order_by(Category.alias
    , Category.name )

And it generates a rather slow request:

SELECT DISTINCT 
  model.category_id AS model_category_id
  , category.name AS category_name
  , category.alias AS category_alias 
FROM model 
JOIN category ON category.id = model.category_id 
ORDER BY category.alias, category.name

And I need to change it with this:

SELECT 
  model.category_id AS model_category_id
  , category.name AS category_name
  , category.alias AS category_alias 
FROM ( SELECT DISTINCT model_category_id ) as model 
JOIN category ON category.id = model.category_id 
ORDER BY category.alias, category.name

But in terms of SQLAlchemy as is the first request.


First of all check the SQL execution plan. If you have an index on the model.category_id column, the query should not really be slow. Otherwise, following options are available:

Option-1: almost your current solution

# python
ModelCategories = session.query(Category).distinct().join(Model).order_by(Category.alias, Category.name)

# SQL
SELECT DISTINCT category.id AS category_id, category.name AS category_name, category.alias AS category_alias 
FROM category 
JOIN model ON category.id = model.category_id 
ORDER BY category.alias, category.name

This is like your current solution but somewhat cleaner in my view. I assume the performance issue might come from the fact that all table Model is used in the query, and this is also why you need to use distinct.

Option-2: use any() on relationship

# python (assumption: model mapping has relationship defined between Category and Model
mapper(Category, category_table, properties={
    'models': relationship(Model, backref="category") })

# python
ModelCategories = session.query(Category).filter(Category.models.any()).order_by(Category.alias, Category.name)

# SQL
SELECT category.id AS category_id, category.name AS category_name, category.alias AS category_alias 
FROM category 
WHERE EXISTS (SELECT 1 
    FROM model 
    WHERE category.id = model.category_id)
ORDER BY category.alias, category.name

This should boost your performance already. I prefer this to following option-3 as it is again more clean code

Option-3: use subquery

# python
q = select([Model.category_id]).distinct().alias("subq")
ModelCategories = session.query(Category).join(q, Category.id==q.c.category_id)

# SQL
SELECT category.id AS category_id, category.name AS category_name, category.alias AS category_alias 
FROM category
JOIN (SELECT DISTINCT model.category_id AS category_id FROM model) AS subq 
  ON category.id = subq.category_id
ORDER BY category.alias, category.name

This should give you exactly the SQL you asked for. As mentioned, I personally prefer version-2.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜