Thursday, April 28, 2016

Lazy Evaluation and SQL Queries in the Django Shell

In Django terms, a QuerySet is an iterable of database records. What's nice about them is that they are evaluated only when you're ready for the results.

This means that even if it takes you a few lines of code to chain multiple queries, the Django ORM combines them into a single query. Less queries mean your database doesn't have to work as hard, and your website runs faster.

Evaluating a QuerySet Repeatedly

Imagine that we work for Häagen-Dazs and have access to their Django shell. We can use this to our advantage by hunting for free ice cream promotions.

Here, we get the active Promo objects. We evaluate the results just to see what promos are available. Then we filter them on the word free.

>>> results =

>>> results
[<Promo: Free Flavors on Your Birthday>, <Promo: 10% Off All Cones>, 
<Promo: Buy 1, Get 1 Free>]

>>> results = results.filter(
>>>     Q(name__istartswith='free') |
>>>     Q(description__icontains='free')
>>> )

>>> results
[<Promo: Free Flavors on Your Birthday>]

The queries generated by the above are:

from django.db import connection

>>> connection.queries
[ {'sql': 'SELECT "flavors_promo"."id", "flavors_promo"."name", 
"flavors_promo"."description", "flavors_promo"."status" FROM 
"flavors_promo" WHERE "flavors_promo"."status" = \'active\' 
LIMIT 21',
  'time': '0.000'},
 {'sql': 'SELECT "flavors_promo"."id", "flavors_promo"."name", 
"flavors_promo"."description", "flavors_promo"."status" FROM 
"flavors_promo" WHERE ("flavors_promo"."status" = \'active\' 
AND ("flavors_promo"."name" LIKE \'free%\' ESCAPE \'\\\' OR 
"flavors_promo"."description" LIKE \'%free%\' ESCAPE \'\\\')) 
LIMIT 21',
  'time': '0.001'}]

There are 2 queries because we evaluated the results twice.

The first query was from the first time we retrieved all the active promos. It's pretty short. It just selects Promo records where promo.status is active.

The second query was from the second time we evaluated results, after we filtered for "free" in the promo names and descriptions.

As a side note, there is a bit of extra work in the second query as the second query still has that WHERE 'flavors_promo'.'status' = 'active' part. One might expect filter() to simply filter on the already-retrieved results rather than hitting the database again. But that's alright because the extra time is negligible.

Before we move on, let's reset the list of queries:

>>> from django.db import reset_queries
>>> reset_queries()

Evaluating a QuerySet Once

Now, let's look at what the queries would be if we only evaluated the results QuerySet once. Let's try building the same QuerySet again. Oh wait, just for fun, let's chain another operation so that we can be really sure that lazy evaluation is happening.

>>> results =

>>> results = results.filter(
...     Q(name__istartswith=name) |
...     Q(description__icontains=name)
... )

>>> results = results.exclude(status='melted')

>>> results
[<Promo: Free Flavors on Your Birthday>]

As you can see, there's only one query:

>>> connection.queries
[{'sql': 'SELECT "flavors_promo"."id", "flavors_promo"."name", 
"flavors_promo"."description", "flavors_promo"."status" FROM 
"flavors_promo" WHERE ("flavors_promo"."status" = \'active\' AND
("flavors_promo"."name" LIKE \'free%\' ESCAPE \'\\\' OR 
"flavors_promo"."description" LIKE \'%free%\' ESCAPE \'\\\') AND 
NOT ("flavors_promo"."status" = \'melted\')) LIMIT 21',
  'time': '0.001'}]

Thanks to lazy evaluation, only one query was constructed, despite chaining multiple operations. That was nice.

Sure, the query could have been more optimal without the AND NOT melted part, but arguably that wasn't Django's fault, it was mine. But it gives me a clue about which operation I didn't need to chain in the Python code.

Next Steps

Try this on one of your projects. Open the Django shell, then try out some queries and see how they are evaluated. In particular, look at queries from one of your slower views.

You can also do similar things with Django Debug Toolbar. However, in the shell you can dissect your Python code line by line, which can be very helpful.