- Before and After Optimization
- Models Used in This Guide
- 1. Efficiently Checking If an Instance Exists
- 2. Optimizing Query Relationships with select_related And prefetch_related
- 3. Aggregating Data Efficiently with annotate()
- 4. Fetching Only Required Fields with values() and values_list()
- 5. Counting Instances Efficiently with count()
- Conclusion
Django is a powerful web framework that excels at handling database queries through its ORM (Object-Relational Mapping). However, inefficient queries can result in performance bottlenecks, leading to slower response times. We can significantly improve our application’s scalability and performance by carefully optimizing queries. In this guide, we’ll explore how we can optimize our database queries to boost performance.
Real-World Performance Improvement: Before and After Optimization
At Riseup Labs, we use our internal Employee Management System (EMS) to effectively manage employee information and streamline various HR processes. In our EMS, we initially faced significant performance challenges, including slow page loading times and a large number of database queries. These challenges made the system inefficient and impacted productivity. To address this, we implemented optimization techniques to reduce queries and improve efficiency. After these optimizations, we observed a dramatic reduction in query count, faster load times, and a noticeably smoother user experience. This transformation has greatly improved EMS performance, allowing us to manage our operations more efficiently.
Example: Impact on Performance
- Before Optimization: The dashboard page took 5.9 seconds to load, executing 112+ queries for a single request.
- After Optimization: With techniques like select_related(), prefetch_related(), and exists(), the load time was reduced to 1.1 seconds with only 14 queries.
In the following sections, we illustrate how these techniques contributed to such optimizations.
Models Used in This Guide
In this guide, we will use the following models.py file as the database model for our examples.
##
# models.py inside of “project” app [project/models.py]
class Campaign(models.Model):
STATUS_CHOICES = [
('active', 'Active'),
('inactive', 'Inactive'),
('completed', 'Completed')
]
title = models.CharField(max_length=255)
description = models.TextField(blank=True, null=True)
status = models.CharField(max_length=20, choices=STATUS_CHOICES, default='active')
def __str__(self):
return self.title
class Shift(models.Model):
campaign = models.ForeignKey(Campaign, on_delete=models.CASCADE, related_name='shifts')
title = models.CharField(max_length=255)
start_time = models.TimeField(null=True, blank=True)
end_time = models.TimeField(null=True, blank=True)
def __str__(self):
return f'{self.title} ({self.start_time} - {self.end_time})'
1. Efficiently Checking If an Instance Exists
Checking for object existence is a common task in Django applications. Performing unnecessary database queries with get(), first(), and last() here can slow things down. Django’s exists() method is optimized for this task because it only checks for the existence of matching records without loading them into memory.
Example:
from django.contrib.auth.models import User
# Checking if a user with a specific email exists
email_exists = User.objects.filter(email='user@risuplabs.com').exists()
if email_exists:
# Proceed with logic
Raw SQL Output:
SELECT (1) AS "a" FROM "auth_user" WHERE "auth_user"."email" = 'user@risuplabs.com' LIMIT 1;
This query will stop at the first matching result, reducing unnecessary data retrieval. In contrast, retrieving the object itself might load the entire object into memory, which is wasteful if we only care about its existence.
2. Optimizing Query Relationships with select_related And prefetch_related
When dealing with related objects, it’s easy to fall into the N+1 query trap, where additional queries are fired for every related object. Django offers select_related and prefetch_related to handle associated data more efficiently.
select_related() for ForeignKey Relationships:
select_related() uses SQL joins to retrieve related objects in a single query, making it ideal for ForeignKey or One-to-One relationships.
Example:
# Fetch Shift with their Campaigns in a single query
shifts = Shift.objects.select_related('campaign').all()
for shift in shifts:
print(shift.title)
print(shift.campaign.title)
Raw SQL Output:
SELECT "project_shift"."id", "project_shift"."campaign_id", "project_shift"."title", "project_shift"."start_time", "project_shift"."end_time", "project_campaign"."id", "project_campaign"."title", "project_campaign"."description", "project_campaign"."status" FROM "project_shift" INNER JOIN "project_campaign" ON ("project_shift"."campaign_id" = "project_campaign"."id");
prefetch_related() for Reverse ForeignKey and Many-to-Many Relationships:
prefetch_related() fetches related objects with separate queries and handles the “join” in Python. This is ideal for reverse ForeignKey and Many-to-Many relationships.
Example:
# Get all campaigns with their related shifts using prefetch_related
campaigns = Campaign.objects.prefetch_related('shifts')
for campaign in campaigns:
print(f"Campaign : {campaign.title}")
for shift in campaign.shifts.all():
print(f" Shift : {shift.title}, {shift.start_time} - {shift.end_time}")
Raw SQL Output:
SELECT "project_campaign"."id", "project_campaign"."title", "project_campaign"."description", "project_campaign"."status"
FROM "project_campaign"
SELECT "project_shift"."id", "project_shift"."campaign_id", "project_shift"."title", "project_shift"."start_time", "project_shift"."end_time"
FROM "project_shift" WHERE "project_shift"."campaign_id" IN (2, 1);
Here (2,1) defines the campaign Ids. In this case, prefetch_related executes two queries only one for the campaign and one for the shift. But avoids the N+1 query issue.
3. Aggregating Data Efficiently with annotate()
Performing calculations like counts, sums, or averages in Python is inefficient when our database can handle these operations much more efficiently using the annotate().
Example:
# Efficient: Using annotate to count related shifts per campaign
from django.db.models import Count
Campaigns_with_shift_count = Campaign.objects.annotate(total_shifts=Count('shifts')).all()
for campaign in campaigns_with_shift_count:
print(campaign.title, campaign.total_shifts)
Raw SQL Output:
SELECT "project_campaign"."id", "project_campaign"."title", "project_campaign"."description", "project_campaign"."status",
COUNT("project_shift"."id") AS "total_shifts"
FROM "project_campaign"
LEFT OUTER JOIN "project_shift" ON ("project_campaign"."id" = "project_shift"."campaign_id")
GROUP BY "project_campaign"."id";
This query retrieves the total number of shifts for each campaign in a single query, avoiding multiple round trips to the database.
4. Fetching Only Required Fields with values() and values_list()
If we don’t need the entire object, it’s better to fetch only the required fields, reducing memory usage and speeding up query performance.
Example:
# Fetch only specific fields: title and status
campaign_data = Campaign.objects.values('title', 'status')
for campaign in campaign_data:
print(campaign['title'], campaign['status'])
This query retrieves only the title and status fields, minimizing data transfer from the database. Similarly, values_list() can be used to get lists of specific fields:
# Fetch only campaign titles
campaign_titles = Campaign.objects.values_list('title', flat=True)
for title in campaign_titles:
print("Title is: ", title)
Raw SQL Output:
SELECT "project_campaign"."title", "project_campaign"."status" FROM "project_campaign";
This helps when we only need specific fields, avoiding the overhead of fetching unnecessary data.
5. Counting Instances Efficiently with count()
When counting objects, avoid loading all objects into memory. Just use count().
Example:
# Efficient: Using count() to get the number of users
user_count = User.objects.count()
Raw SQL Output:
SELECT COUNT(*) AS "count" FROM "auth_user";
This query avoids loading all objects into memory, performing the counting directly in the database and reducing query count.
Conclusion
By following the optimization techniques covered in this guide, such as using exists(), select_related(), prefetch_related(), annotate(), values( ) and count( ) we can significantly reduce the number of queries and improve the performance of our Django application. Profiling our queries with tools like silk,django-debug-toolbar etc. will give us insights into our query count and performance bottlenecks, helping us keep our application running efficiently.
This page was last edited on 28 November 2024, at 10:19 am
How can we help you?























