Optimize Django memory usage


Fri 27 March 2020

Django memory usage is usually quite good, but sometimes – if you use the Django ORM without really knowing what it is doing behind the scenes – you can see a huge spike in RAM usage. Fortunately there are some simple ways to optimize Django memory usage.

1. The problem

Consider this apparently innocent view:

from django.http import HttpResponse

from .models import FirstModel

# this view will make crazy use of the RAM ;)
def my_view(request):
    # this queryset contains about 100k records
    # each of them has many ForeignKeys to other models
    huge_queryset = FirstModel.objects.all()

    with open('dumb.dump', 'w') as f:
        for record in huge_queryset:
            print(record, file=f)

    return HttpResponse('Dumb dump completed!')

As you can see it's very simple; the peculiarity here is the size of the queryset: the table contains about 100k records, and each record has several ForeignKey fields to other models.

An experienced Django developer should immediately spot the problem, but it is interesting to analyze it a little further.

If you are impatient, go directly to my solution.

2. A bit of Django memory profiling

When you want to profile memory usage in Python you'll find several useful tools. I chose objgraph. The only drawback is that objgraph is designed to work in a Python console, while my code runs in a Django-powered website.

So I put together some code to redirect the standard output used by objgraph to my beloved Django logging system, and here is the result.

import logging
import sys

import objgraph

from django.http import HttpResponse

from .models import FirstModel

logger = logging.getLogger(__name__)

class LoggerWriter:
    """A file-like object that forwards writes to a logger."""
    def __init__(self, logger):
        self.logger = logger

    def write(self, message):
        if message != '\n':
            self.logger.debug(message)

    def flush(self):
        # no buffering, nothing to flush
        pass

def my_view(request):
    # redirect the standard output used by objgraph to the Django logger
    sys.stdout = LoggerWriter(logger)

    huge_queryset = FirstModel.objects.all()

    with open('dumb.dump', 'w') as f:
        row = 0
        for record in huge_queryset:
            print(record, file=f)
            row += 1
            if not (row % 1000):
                # log which object types grew since the last call
                objgraph.show_growth()

    return HttpResponse('Dumb dump completed!')
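Note that assigning to sys.stdout replaces it for the whole process, even after the view returns. A safer variant (just a sketch, using contextlib.redirect_stdout from the standard library) restores the original stdout automatically:

import contextlib

def my_view(request):
    huge_queryset = FirstModel.objects.all()

    # stdout is restored as soon as the with-block exits
    with contextlib.redirect_stdout(LoggerWriter(logger)):
        with open('dumb.dump', 'w') as f:
            for row, record in enumerate(huge_queryset, 1):
                print(record, file=f)  # writes to f, not to stdout
                if row % 1000 == 0:
                    objgraph.show_growth()

    return HttpResponse('Dumb dump completed!')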

In the Django log I saw something like this (I cut some internal objects that are not interesting for this example):

[24/09/2014 10:35:31] DEBUG [leaky_app.views:31] dict 106524 +106524
[24/09/2014 10:35:31] DEBUG [leaky_app.views:31] ModelState 101328 +101328
[24/09/2014 10:35:31] DEBUG [leaky_app.views:31] FirstModel 98327 +98327

[24/09/2014 10:36:23] DEBUG [leaky_app.views:31] dict 109526 +3002
[24/09/2014 10:36:23] DEBUG [leaky_app.views:31] ModelState 104329 +3001
[24/09/2014 10:36:23] DEBUG [leaky_app.views:31] SecondModel 1999 +1000
[24/09/2014 10:36:23] DEBUG [leaky_app.views:31] ThirdModel 1999 +1000
[24/09/2014 10:36:23] DEBUG [leaky_app.views:31] FourthModel 2000 +1000

[24/09/2014 10:37:17] DEBUG [leaky_app.views:31] dict 112874 +3348
[24/09/2014 10:37:17] DEBUG [leaky_app.views:31] ModelState 107330 +3001
[24/09/2014 10:37:17] DEBUG [leaky_app.views:31] FourthModel 3000 +1000
[24/09/2014 10:37:17] DEBUG [leaky_app.views:31] SecondModel 2999 +1000
[24/09/2014 10:37:17] DEBUG [leaky_app.views:31] ThirdModel 2999 +1000

FirstModel has ForeignKey fields to SecondModel, ThirdModel and so on. That means that for every 1000 rows, about a thousand instances of each related model are loaded into memory as well. But why is Django keeping all those objects in memory? After all, I only want to write each record to my dumb.dump file.

It turns out that Django caches the results of each queryset when you iterate over it: each and every object in the queryset is saved in memory. This lets the ORM access the queryset again very efficiently, without hitting the database a second time.
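You can see the cache in action with a minimal sketch (assuming DEBUG = True in settings, so that connection.queries records the executed SQL):

from django.db import connection, reset_queries

from .models import FirstModel

reset_queries()
qs = FirstModel.objects.all()

list(qs)  # first iteration: runs the query and fills the result cache
list(qs)  # second iteration: served entirely from the cached objects

print(len(connection.queries))  # -> 1, only one query was executed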

3. Optimize Django memory usage: using the iterator() method

You can avoid the caching and simply iterate over a queryset by using its iterator() method. The Django documentation clearly states:

“For a QuerySet which returns a large number of objects that you only need to access once, this can result in better performance and a significant reduction in memory.”

So the first solution to the problem is simply adding .iterator() to the iterated queryset, like this:

from django.http import HttpResponse

from .models import FirstModel

def my_view(request):
    # this queryset contains about 100k records
    # each of them has many ForeignKeys to other models
    huge_queryset = FirstModel.objects.all().iterator()

    with open('dumb.dump', 'w') as f:
        for record in huge_queryset:
            print(record, file=f)

    return HttpResponse('Dumb dump completed!')

4. Optimize Django memory usage: using pagination

Even using iterator(), the data may first be fetched entirely on the client side by the database driver, occupying a lot of memory. This happens because older Django versions don't support server-side cursors (Django 1.11 added them for some backends, such as PostgreSQL), so the driver loads the whole result set before iterator() yields the first row.
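If you are on a recent Django version with a backend that supports server-side cursors, iterator() already streams results from the database; since Django 2.0 you can also tune how many rows are fetched at a time with the chunk_size parameter. A sketch, with an arbitrary chunk size, where process() stands in for your own per-record logic:

for record in FirstModel.objects.all().iterator(chunk_size=2000):
    # only about 2000 rows are held by the driver at any time
    process(record)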

On older versions, a workaround for this issue is a utility function like this:

import gc

def queryset_iterator(qs, batchsize=500, gc_collect=True):
    # walk over the primary keys only, in a stable order
    iterator = qs.values_list('pk', flat=True).order_by('pk').distinct().iterator()
    eof = False
    while not eof:
        primary_key_buffer = []
        try:
            # collect a batch of primary keys
            while len(primary_key_buffer) < batchsize:
                primary_key_buffer.append(next(iterator))
        except StopIteration:
            eof = True
        # fetch the full objects for the current batch only
        for obj in qs.filter(pk__in=primary_key_buffer).order_by('pk').iterator():
            yield obj
        if gc_collect:
            gc.collect()

You can use it simply by passing a normal Django queryset as the single parameter, like this:

for obj in queryset_iterator(HugeQueryset.objects.all()):
    # do something with obj
    pass

This is useful when you want to delete a high number of objects that have ForeignKeys pointing to them. In that case the QuerySet delete() method doesn't help, because Django has to fetch each object into memory to handle the cascade deletion policy defined for the objects being deleted. You can do something like this to mitigate that effect:

for obj in queryset_iterator(HugeQueryset.objects.all()):
    obj.delete()
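Deleting one object per query is slow, though. If the cascades are simple, a batched variant can keep memory usage low while issuing far fewer queries (delete_in_batches is a hypothetical helper, not part of Django or of the code above):

import gc

def delete_in_batches(qs, batchsize=500):
    while True:
        # grab the next batch of primary keys still in the queryset
        batch = list(qs.values_list('pk', flat=True)[:batchsize])
        if not batch:
            break
        # delete() can't run on a sliced queryset, so filter by pk instead;
        # only this batch (and its cascades) is loaded into memory
        qs.model.objects.filter(pk__in=batch).delete()
        gc.collect()

Each pass removes up to batchsize objects, so the loop terminates once the queryset is empty.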

5. Conclusion

In this tutorial you learned how to optimize Django memory usage, especially when dealing with large querysets. The Django ORM is really powerful, and the abstraction it provides lets you write complex queries easily. But you have to be careful when using it with large querysets, because the effects on memory and CPU usage can be large as well.
