Postgresql query optimization practice
Recently, I was develop a RESTful API for all business data in Australia. There are some thoughts I’d like to mark down for notes.
Guides
- Denormalize if necessary
- Understand sql query order
- Small table join big table
Denormalize
When designing models, we tend to normalize the database into a very detailed level because it makes sense or avoids duplication. However, how to normalize the database must depends on how the query is structured.
SQL query order
As far as I am concerned, SQL query order is essential to optimize a sql query.
(8)SELECT (9)DISTINCT (11)<Top Num> <select list>
(1)FROM [left_table]
(3)<join_type> JOIN <right_table>
(2)ON <join_condition>
(4)WHERE <where_condition>
(5)GROUP BY <group_by_list>
(6)WITH <CUBE | RollUP>
(7)HAVING <having_condition>
(10)ORDER BY <order_by_list>
To begin with, every step will produce a virutal table1.
FROM clause
The from clause is the first step of sql query.
- form t1
Will use t1 as vt1
- from t1, t2, ..t
Will cartesian product t1 to tn and produce vt1. Do not do this
- from (nest sql query)
Will use nest sql query as result for vt1.
ON clause
Based on the on clause, filter vt1 as vt2. On clause need to work with join clause.
JOIN clause
Add out fields on vt2 in terms of on clause and produce vt3
WHERE clause
Filter the vt3 according to where clause and produce vt4
GROUP BY clause
Group vt5 by fields in the cause and produce vt5
WITH clause
Will give a sub-query block a name and add to vt5 > vt6
HAVING clause
Work with GROUP BY filter filter vt6 > vt7
SELECT clause
Select fields from vt7 > vt8
DISTINCT clause
Will remove duplicate on vt8 based on fields from distinct clause > vt9
ORDER BY clause
Will sort vt9 according to order by clause > vt10
TOP Clause
Will return vt10 with certain number > vt11
Small table join big table
This is silver bullet of optimization sql query. Usually in the from clause we need to narrow the results by using subqueries then join other tables. The subquery should include limit clause if it is possible or we need to denormalize the database.
Reference
-
Inside Microsoft SQL Server 2005, Logical Query Processing, http://www.polyteknisk.dk/related_materials/9780735623132_chapter_01.pdf ↩