
Optimize Django memory usage

Recently I had a problem of memory usage in Django: when I accessed an apparently innocent view I saw the memory usage of my server grow without rest. The problem turned out to be very trivial to solve, but I think the process I used to find the leak is worth a blog post. 😉

This is the apparently innocent view:
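(A minimal sketch of it; the view name dump_records, the ForeignKey field names second, third and fourth, and the CSV format are illustrative guesses, while FirstModel and the dumb.dump file are the ones discussed below.)

import csv

from django.http import HttpResponse

from .models import FirstModel


def dump_records(request):
    # Dump the whole table (~100k rows) to a file.
    with open('dumb.dump', 'w') as f:
        writer = csv.writer(f)
        for record in FirstModel.objects.all():
            # Accessing a ForeignKey attribute fetches the related
            # object and keeps it referenced by the parent instance.
            writer.writerow([record.pk, record.second, record.third, record.fourth])
    return HttpResponse('done')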

As you can see it's very simple; the peculiarity here is the size of the queryset, because the table contains about 100k records, and each record has several ForeignKey fields to other models. An experienced Django developer would immediately spot the problem, but I'm not one of them ;), so I had to investigate.

If you are impatient, go directly to my solution and have a good time! 🙂

When you want to profile memory usage in Python you'll find several useful tools; after reading this good article I chose to use objgraph. But there is a problem: objgraph is designed to be used in a Python console, while my code was running in a Django-powered website.

So I put together some code to redirect the standard output used by objgraph to my beloved Django logging system. Here is the result:
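(A sketch of the idea; the helper name log_objgraph_growth is mine. objgraph.show_growth() prints to standard output, so the trick is to capture it and re-emit each line through a logger.)

import io
import logging
from contextlib import redirect_stdout

import objgraph

logger = logging.getLogger(__name__)


def log_objgraph_growth(limit=10):
    # objgraph.show_growth() writes to stdout: capture it and
    # forward each line to the Django logging system.
    buf = io.StringIO()
    with redirect_stdout(buf):
        objgraph.show_growth(limit=limit)
    for line in buf.getvalue().splitlines():
        logger.debug(line)

Calling log_objgraph_growth() periodically inside the view loop produced the entries below.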

In the Django log I saw something like this (I cut some internal objects from the log that are not so interesting for this example):

[24/09/2014 10:35:31] DEBUG [leaky_app.views:31] dict 106524 +106524
[24/09/2014 10:35:31] DEBUG [leaky_app.views:31] ModelState 101328 +101328
[24/09/2014 10:35:31] DEBUG [leaky_app.views:31] FirstModel 98327 +98327

[24/09/2014 10:36:23] DEBUG [leaky_app.views:31] dict 109526 +3002
[24/09/2014 10:36:23] DEBUG [leaky_app.views:31] ModelState 104329 +3001
[24/09/2014 10:36:23] DEBUG [leaky_app.views:31] SecondModel 1999 +1000
[24/09/2014 10:36:23] DEBUG [leaky_app.views:31] ThirdModel 1999 +1000
[24/09/2014 10:36:23] DEBUG [leaky_app.views:31] FourthModel 2000 +1000

[24/09/2014 10:37:17] DEBUG [leaky_app.views:31] dict 112874 +3348
[24/09/2014 10:37:17] DEBUG [leaky_app.views:31] ModelState 107330 +3001
[24/09/2014 10:37:17] DEBUG [leaky_app.views:31] FourthModel 3000 +1000
[24/09/2014 10:37:17] DEBUG [leaky_app.views:31] SecondModel 2999 +1000
[24/09/2014 10:37:17] DEBUG [leaky_app.views:31] ThirdModel 2999 +1000

FirstModel has ForeignKey fields to SecondModel, ThirdModel and so on, so every 1000 rows each of them appears in memory. But why is Django keeping all those objects in memory? After all, I only want to write a record to my dumb.dump file.

It turns out that Django caches the results of each queryset as you iterate over it.
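You can see the cache being filled with a couple of lines in a Django shell (_result_cache is a Django internal attribute, so treat this only as an illustration):

qs = FirstModel.objects.all()
for record in qs:   # the first iteration runs the query...
    pass
# ...and every model instance is now held by the queryset itself
print(len(qs._result_cache))   # ~100k objects kept in memory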

That is the default behavior, but you can avoid the caching and simply iterate over a queryset by using its iterator() method. The Django documentation clearly states:

“For a QuerySet which returns a large number of objects that you only need to access once, this can result in better performance and a significant reduction in memory.”

So the solution to my problem was simply adding .iterator() to my queryset, like this:
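Something like this, reusing the sketch above (same illustrative field names):

for record in FirstModel.objects.all().iterator():
    # iterator() streams the results without populating the
    # queryset cache, so each record can be garbage collected
    writer.writerow([record.pk, record.second, record.third, record.fourth])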

Fortunately that was a trivial fix, and it was fun to discover the issue and solve it! 😉

Update:

It happened to me that, even using .iterator(), the data was first fetched on the client side (check the Python process memory) by the database driver, occupying a lot of memory. This happens because Django doesn't yet support server-side cursors.

A workaround for this issue is using a utility function like this:
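(A sketch of such a function, chunking over the primary key; the chunk_size default and the gc.collect() call are my choices, and the queryset is assumed to have an integer pk.)

import gc


def queryset_iterator(queryset, chunk_size=1000):
    # Walk the queryset in primary-key order, chunk_size rows at a
    # time, so the database driver never materializes the whole
    # result set on the client side.
    last_pk = queryset.order_by('-pk').values_list('pk', flat=True).first()
    if last_pk is None:
        return
    pk = 0
    queryset = queryset.order_by('pk')
    while pk < last_pk:
        for row in queryset.filter(pk__gt=pk)[:chunk_size]:
            pk = row.pk
            yield row
        gc.collect()  # release what was cached for the previous chunk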

You can use it simply by passing a normal Django queryset as the single parameter, like this:

for obj in queryset_iterator(HugeQueryset.objects.all()):
    # do something with obj
    pass

This can be useful even when you want to delete a high number of objects that have ForeignKeys pointing to them. In that case using the QuerySet .delete() method doesn't help, because Django has to fetch each object into memory to honor the cascade policy defined for the objects being deleted. You can do something like this to mitigate that effect:

for obj in queryset_iterator(HugeQueryset.objects.all()):
    obj.delete()

Happy coding! 😉

augusto

Freelance developer and sysadmin

This Post Has 3 Comments
  1. Hi, regarding the last point (i.e. deleting a high number of objects), it is better to use HugeQueryset.objects.filter(pk__in=idsicollectedinlist).delete()

    1. Beware that if idsicollectedinlist is a big list of ids, and HugeQueryset objects have foreign keys pointing to them, a big amount of memory will be allocated to collect the objects that should be deleted in a cascade. The proposed method is not efficient in terms of database queries executed, but it is in terms of memory allocated.

  2. Oh my! I love you. Thanks! This solved a RAM leak issue I was having with one of my queries.

    Django just decided to cache it EVERY SINGLE TIME it was executed. And it was a rather big query, so my server was rapidly running out of RAM.

    Using .iterator() has resolved it!

