SQLAlchemy - only one result being returned when count() says there are more
I'm having trouble with a really large result set only returning one row.
Session.query(TestSet).join(Instance).count()
>> 4283878
Session.query(TestSet).j开发者_如何学运维oin(Instance).offset(0).limit(100).count()
>> 100
Session.query(TestSet).join(Instance).offset(0).limit(100).all()
>> [<model.testset.TestSet object at 0x043EC2F0>]
That is, all
returns only one instance of my model, instead of 100. Now, for something even stranger:
len(Session.query(TestSet).join(Instance).offset(0).limit(100).distinct().all())
>> 100
So if I add distinct
before all
, I get back all 100 results. What's going on here?
The Query
object, when asked to iterate through results representing an entity like TestSet
, performs uniquing on the result rows based on object identity, so that if the query were to return 100 rows each with the same TestSet
primary key, you'd get only one result object back. This behavior has its origins in the "eager joining" feature of Query
, where it's often the case that many result rows are being received each with the same primary identity, but also containing a varying secondary identity of a related row that's to be populated into a collection upon each primary identity - only one instance of the primary identity is desirable in this very common case.
Let's then consider what distinct()
does. Suppose your query for 4M objects returns 1000 rows with id=1, 1000 rows with id=2, etc. The query with limit(100) hits the first 100 rows with id=1, Query
uniquifies, and you get one result object back, since they are all id=1. But with distinct()
, suddenly we are getting 100 rows with distinct identities, i.e. "id=1", "id=2", "id=3". Query
then assigns each of these rows to a new TestSet
object in the identity map, and you get 100 rows back.
Setting echo='debug'
on your Engine
temporarily will show the SQL being emitted as well as the result rows coming back. When you see many result rows all with the same primary key, you know that Query
when asked to return full entities is going to unique all those redundant identities down to the single object represented for each row.
精彩评论