Django is a powerful web framework that excels at handling database queries through its ORM (Object-Relational Mapping). However, inefficient queries can result in performance bottlenecks, leading to slower response times. We can significantly improve our application’s scalability and performance by carefully optimizing queries. In this guide, we’ll explore how we can optimize our database queries to boost performance.

Real-World Performance Improvement: Before and After Optimization

At Riseup Labs, we use our internal Employee Management System (EMS) to effectively manage employee information and streamline various HR processes. In our EMS, we initially faced significant performance challenges, including slow page loading times and a large number of database queries. These challenges made the system inefficient and impacted productivity. To address this, we implemented optimization techniques to reduce queries and improve efficiency. After these optimizations, we observed a dramatic reduction in query count, faster load times, and a noticeably smoother user experience. This transformation has greatly improved EMS performance, allowing us to manage our operations more efficiently.

Example: Impact on Performance

Before Optimization: The dashboard page took 5.9 seconds to load, executing 112+ queries for a single request.
After Optimization: With techniques like select_related(), prefetch_related(), and exists(), the load time was reduced to 1.1 seconds with only 14 queries.

In the following sections, we illustrate how these techniques contributed to such optimizations.

Models Used in This Guide

In this guide, we will use the following models.py file as the database model for our examples.

##
# models.py inside of “project” app [project/models.py]

class Campaign(models.Model):
  STATUS_CHOICES = [
      ('active', 'Active'),
      ('inactive', 'Inactive'),
      ('completed', 'Completed')
  ]

  title = models.CharField(max_length=255)
  description = models.TextField(blank=True, null=True)
  status = models.CharField(max_length=20, choices=STATUS_CHOICES, default='active')
   def __str__(self):
      return self.title

class Shift(models.Model):
  campaign = models.ForeignKey(Campaign, on_delete=models.CASCADE,        related_name='shifts')
  title = models.CharField(max_length=255)
  start_time = models.TimeField(null=True, blank=True)
  end_time = models.TimeField(null=True, blank=True)

  def __str__(self):
      return f'{self.title} ({self.start_time} - {self.end_time})'

1. Efficiently Checking If an Instance Exists

Checking for object existence is a common task in Django applications. Performing unnecessary database queries with get(), first(), and last() here can slow things down. Django’s exists() method is optimized for this task because it only checks for the existence of matching records without loading them into memory.

Example:



from django.contrib.auth.models import User


# Checking if a user with a specific email exists
email_exists = User.objects.filter(email='user@risuplabs.com').exists()
if email_exists:
   # Proceed with logic

Raw SQL Output:


SELECT (1) AS  "a"  FROM  "auth_user"  WHERE "auth_user"."email" = 'user@risuplabs.com' LIMIT 1;

This query will stop at the first matching result, reducing unnecessary data retrieval. In contrast, retrieving the object itself might load the entire object into memory, which is wasteful if we only care about its existence.

When dealing with related objects, it’s easy to fall into the N+1 query trap, where additional queries are fired for every related object. Django offers select_related and prefetch_related to handle associated data more efficiently.

select_related() uses SQL joins to retrieve related objects in a single query, making it ideal for ForeignKey or One-to-One relationships.

Example:


# Fetch Shift with their Campaigns in a single query

shifts = Shift.objects.select_related('campaign').all()

for shift in shifts:
   print(shift.title)
   print(shift.campaign.title)

Raw SQL Output:

SELECT "project_shift"."id", "project_shift"."campaign_id", "project_shift"."title", "project_shift"."start_time", "project_shift"."end_time", "project_campaign"."id", "project_campaign"."title", "project_campaign"."description", "project_campaign"."status" FROM "project_shift" INNER JOIN "project_campaign" ON ("project_shift"."campaign_id" = "project_campaign"."id");

prefetch_related() fetches related objects with separate queries and handles the “join” in Python. This is ideal for reverse ForeignKey and Many-to-Many relationships.

Example:

# Get all campaigns with their related shifts using prefetch_related


campaigns = Campaign.objects.prefetch_related('shifts')


for campaign in campaigns:
   print(f"Campaign : {campaign.title}")
   for shift in campaign.shifts.all():
       print(f" Shift : {shift.title}, {shift.start_time} - {shift.end_time}")

Raw SQL Output:

SELECT "project_campaign"."id", "project_campaign"."title", "project_campaign"."description", "project_campaign"."status" 
FROM "project_campaign"
SELECT "project_shift"."id", "project_shift"."campaign_id", "project_shift"."title", "project_shift"."start_time", "project_shift"."end_time" 
FROM "project_shift" WHERE "project_shift"."campaign_id" IN (2, 1);

Here (2,1) defines the campaign Ids. In this case, prefetch_related executes two queries only one for the campaign and one for the shift. But avoids the N+1 query issue.

3. Aggregating Data Efficiently with annotate()

Performing calculations like counts, sums, or averages in Python is inefficient when our database can handle these operations much more efficiently using the annotate().

Example:



# Efficient: Using annotate to count related shifts per campaign


from django.db.models import Count


Campaigns_with_shift_count = Campaign.objects.annotate(total_shifts=Count('shifts')).all()


for campaign in campaigns_with_shift_count:
   print(campaign.title, campaign.total_shifts)

Raw SQL Output:



SELECT "project_campaign"."id", "project_campaign"."title", "project_campaign"."description", "project_campaign"."status",
COUNT("project_shift"."id") AS "total_shifts" 
FROM "project_campaign"
LEFT OUTER JOIN "project_shift" ON ("project_campaign"."id" = "project_shift"."campaign_id")
GROUP BY "project_campaign"."id";

This query retrieves the total number of shifts for each campaign in a single query, avoiding multiple round trips to the database.

4. Fetching Only Required Fields with values() and values_list()

If we don’t need the entire object, it’s better to fetch only the required fields, reducing memory usage and speeding up query performance.

Example:


# Fetch only specific fields: title and status
campaign_data = Campaign.objects.values('title', 'status')
for campaign in campaign_data:
   print(campaign['title'], campaign['status'])

This query retrieves only the title and status fields, minimizing data transfer from the database. Similarly, values_list() can be used to get lists of specific fields:

# Fetch only campaign titles
campaign_titles = Campaign.objects.values_list('title', flat=True)
for title in campaign_titles:
   print("Title is: ", title)

Raw SQL Output:

SELECT "project_campaign"."title", "project_campaign"."status" FROM "project_campaign";

This helps when we only need specific fields, avoiding the overhead of fetching unnecessary data.

5. Counting Instances Efficiently with count()

When counting objects, avoid loading all objects into memory. Just use count().

Example:

# Efficient: Using count() to get the number of users
user_count = User.objects.count()

Raw SQL Output:

SELECT COUNT(*) AS "count" FROM "auth_user";

This query avoids loading all objects into memory, performing the counting directly in the database and reducing query count.

Conclusion

By following the optimization techniques covered in this guide, such as using exists(), select_related(), prefetch_related(), annotate(), values( ) and count( ) we can significantly reduce the number of queries and improve the performance of our Django application. Profiling our queries with tools like silk,django-debug-toolbar etc. will give us insights into our query count and performance bottlenecks, helping us keep our application running efficiently.

This page was last edited on 28 November 2024, at 10:19 am