Life rushes back 1 year has passed, remembering the times when the deadline pushed me to code quickly but unexpectedly, it accidentally caused more or less impact on the quality of the whole system later on. One of the problems I used to have that was dragging down the performance of the api was N+1 queries.
1. Why N+1 ?
For example, I want to get list of posts and comment each post by userId.
As a habit, I will write the following handling code:
var result []*Data
posts, err := db.postRepo.GetPostsByUserId(userId)
for _, post := range posts {
data := &Data{}
comment, err := db.commentRepo.GetCommentByPostId(post.Id)
data.Post = post
data.Comment = comment
}
Cost 1 query to get list of posts by userId:
SELECT * FROM posts WHERE user_id = ...
// Result: posts.id [1, 2, 3, ... n]
It takes N query to get the comment corresponding to the post:
SELECT * FROM comments WHERE post_id = 1
SELECT * FROM comments WHERE post_id = 2
SELECT * FROM comments WHERE post_id = 3
...
SELECT * FROM comments WHERE post_id = n
Thus, to get a list of results that meet the above requirements, I had to spend N+1 queries. Calculating roughly if a query takes about 2-3ms, if N is 10k or 100k unlucky, then plowing the database, the time increases linearly like O(n), the system is slow and does not bring a good experience to the user.
2. Solution ?
Gathering from my leaders and past experience of doing wrong, I have 2 ways to overcome the problem of N+1, using: WHERE..IN or JOINS
- Where .. in
var result []*Data
posts, _ := db.postRepo.GetPostsByUserId(userId)
postIds := getPostIds(posts)
comments, err := db.commentRepo.GetCommentsByPostIds(postIds)
Similarly, I also spend 1 query to get the list of posts by userId. But I use Where .. In to get the list of comments but it only costs 1 query
SELECT * FROM posts WHERE user_id = ...
SELECT * FROM comments WHERE post_id in (1, 2, 3, ... n)
- Joins
Where .. in is a simple and effective solution that can overcome N+1, in addition, there is another useful trick that also takes advantage of the database's support, which is to use Join to filter and retrieve data like request that only requires 1 query.
SELECT * FROM posts
JOIN comments on comments.post_id = posts.id
WHERE posts.user_id = ...
At this point, I hope that my little knowledge about the N+1 problem can be shared with everyone on how to improve the system as well as reduce the work for the database. In addition, a tip to help me detect this error in the codebase is that when refactoring, I will note loops, it is usually very easy to set queries to get each item like the example at the beginning of the article I wrote.
The article certainly has many shortcomings, I hope everyone can contribute and share so that I can learn and improve. Thank you so much to everyone who took the time to read my post. If anyone has any other good solutions, please comment below to share with me.