
Optimize Django memory usage

Django memory usage is usually quite good, but sometimes – if you use the Django ORM without really knowing what it is doing behind the scenes – you can see a huge spike in RAM usage. Fortunately, there are some simple methods to optimize Django memory usage.

1. The problem

Consider this apparently innocent view:
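A minimal sketch of such a view might look like the following. The view name dump_records, the CSV output and the response are assumptions; FirstModel and the dumb.dump file come up again in the profiling session below.

import csv

from django.http import HttpResponse

from .models import FirstModel


def dump_records(request):
    # Hypothetical view: dump every record of a ~100k row table
    # to a file. The plain queryset iteration below is what
    # triggers the memory spike discussed in this article.
    with open('dumb.dump', 'w') as f:
        writer = csv.writer(f)
        for record in FirstModel.objects.all():
            writer.writerow([record.pk, str(record)])
    return HttpResponse('done')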

As you can see it’s very simple. The peculiarity here is the size of the queryset: the table contains about 100k records, and each record has several ForeignKey fields to other models.

An experienced Django developer should immediately see the problem, but it can be interesting to analyze the problem a little bit more.

If you are impatient, go directly to my solution.

2. A bit of Django memory profiling

When you want to profile memory usage in Python you’ll find several useful tools. After reading a good article on the subject I chose objgraph. The only drawback is that objgraph is designed to work in a Python console, while my code was running in a Django powered website.

So I put together some code to redirect the standard output used by objgraph to my beloved Django logging system.
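A minimal sketch of that redirection, assuming a module-level logger and objgraph’s show_growth() helper, could look like this:

import io
import logging
from contextlib import redirect_stdout

import objgraph

logger = logging.getLogger(__name__)


def log_objgraph_growth():
    # objgraph.show_growth() prints to standard output, so capture
    # it in a buffer and replay it line by line through the Django
    # logging system.
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        objgraph.show_growth()
    for line in buffer.getvalue().splitlines():
        logger.debug(line)

Calling log_objgraph_growth() periodically inside the view’s loop produces entries like the ones below.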

In the Django log I saw something like this (I cut some internal objects from the log that are not so interesting for this example):

[24/09/2014 10:35:31] DEBUG [leaky_app.views:31] dict 106524 +106524
[24/09/2014 10:35:31] DEBUG [leaky_app.views:31] ModelState 101328 +101328
[24/09/2014 10:35:31] DEBUG [leaky_app.views:31] FirstModel 98327 +98327

[24/09/2014 10:36:23] DEBUG [leaky_app.views:31] dict 109526 +3002
[24/09/2014 10:36:23] DEBUG [leaky_app.views:31] ModelState 104329 +3001
[24/09/2014 10:36:23] DEBUG [leaky_app.views:31] SecondModel 1999 +1000
[24/09/2014 10:36:23] DEBUG [leaky_app.views:31] ThirdModel 1999 +1000
[24/09/2014 10:36:23] DEBUG [leaky_app.views:31] FourthModel 2000 +1000

[24/09/2014 10:37:17] DEBUG [leaky_app.views:31] dict 112874 +3348
[24/09/2014 10:37:17] DEBUG [leaky_app.views:31] ModelState 107330 +3001
[24/09/2014 10:37:17] DEBUG [leaky_app.views:31] FourthModel 3000 +1000
[24/09/2014 10:37:17] DEBUG [leaky_app.views:31] SecondModel 2999 +1000
[24/09/2014 10:37:17] DEBUG [leaky_app.views:31] ThirdModel 2999 +1000

FirstModel has ForeignKey fields to SecondModel, ThirdModel and so on. That means that for every 1000 rows, three different kinds of related objects are loaded into memory. But why is Django putting all those objects in memory? After all, I only want to write a record to my dumb.dump file.

It turns out that Django caches the results of each queryset when you iterate over it: each and every object in the queryset is saved in memory. This lets the ORM access the queryset again very efficiently, without hitting the database a second time.
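A quick sketch shows the cache at work: iterating the same queryset object twice hits the database only once.

qs = FirstModel.objects.all()

for obj in qs:  # first pass: runs the query and caches every row
    pass

for obj in qs:  # second pass: served entirely from the cache
    pass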

3. Optimize Django memory usage: using iterator() method

You can avoid the caching and simply iterate over a queryset by using its iterator() method. The Django documentation clearly states:

“For a QuerySet which returns a large number of objects that you only need to access once, this can result in better performance and a significant reduction in memory.”

So the first solution to the problem is simply adding .iterator() to the iterated queryset, like this:
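Applied to the hypothetical view sketched above, the fix is a single method call:

for record in FirstModel.objects.all().iterator():
    writer.writerow([record.pk, str(record)])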

4. Optimize Django memory usage: using pagination

Even using iterator(), the data is still fetched in full on the client side by the database driver, occupying a lot of memory. This happens because Django doesn’t yet support server side cursors.

A workaround for this issue is a utility function like the one sketched below.
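The function follows the chunked-fetch pattern; a version adapted from a widely shared Django snippet is sketched here, assuming the model has an integer primary key:

import gc


def queryset_iterator(queryset, chunksize=1000):
    """
    Iterate over a Django queryset ordered by primary key,
    fetching chunksize rows at a time.

    Each chunk is read through a fresh sliced queryset, and the
    garbage collector runs between chunks, so the rows cached by
    one chunk are freed before the next chunk is fetched.
    """
    try:
        last_pk = queryset.order_by('-pk')[0].pk
    except IndexError:
        return  # empty queryset, nothing to iterate
    queryset = queryset.order_by('pk')
    pk = queryset[0].pk - 1
    while pk < last_pk:
        for row in queryset.filter(pk__gt=pk)[:chunksize]:
            pk = row.pk
            yield row
        gc.collect()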

You can use it by simply passing a normal Django queryset as its single parameter, like this:

for obj in queryset_iterator(HugeQueryset.objects.all()):
    # do something with obj
    pass

This is also useful when you want to delete a high number of objects that have ForeignKey fields pointing to them. In that case the QuerySet .delete() method doesn’t help, because Django has to fetch each object into memory anyway to honor the cascade deletion policy defined for the objects being deleted. You can do something like this to mitigate that effect:

for obj in queryset_iterator(HugeQueryset.objects.all()):
    obj.delete()

5. Conclusion

In this tutorial you learned how to optimize Django memory usage, especially when dealing with large querysets. The Django ORM is really powerful, and the abstraction it provides lets you write complex queries easily. But you have to be careful when using the ORM with large querysets, because the effects on memory and CPU usage can be just as large.

augusto

Freelance developer and sysadmin

This Post Has 3 Comments
  1. Hi, regarding the last point (i.e. deleting a high number of objects), it is better to use HugeQueryset.objects.filter(pk__in=idsicollectedinlist).delete()

    1. Beware that if idsicollectedinlist is a big list of ids, and HugeQueryset objects have foreign keys pointing to them, a large amount of memory will be allocated to collect the objects that should be deleted in cascade. The proposed method is not efficient in terms of database queries executed, but it is in terms of memory allocated.

  2. Oh my! I love you. Thanks! This solved a RAM leak issue I was having with one of my queries.

    Django just decided to cache it EVERY SINGLE TIME it was executed. And it was a rather big query, so my server was running out of RAM rapidly.

    Using .iterator() has resolved it!
