Relational databases

Tackling performance problems with more hardware usually just delays problems
Instead of just provisioning bigger database instance, one should try to optimize database schema and queries
- normalizing data is usually the first thing to do and check
  - normal forms
- in schema design, one should avoid data duplication
Primary keys can be also short strings instead of integers
- with this, some information can be retained
PostgreSQL has auto compress feature (TOAST)

Using EXPLAIN

shows the query plan
using EXPLAIN ANALYZE actually executes the statement and run time statistics are added to display

To simplify, indices in tables turn sequential scans to index scans
Most of the regularly used queries should have index
- however the come with a cost
  - especially when writing to a table
What if I have sequential scans even after adding an index
- small table (not lots of rows)
- on multi field indices, fields can be omitted only from the end
- using function in where clause
  - however function can be used in index itself
Postgres specific reasons why indices won't work
- VACUUM hasn't been run, so table statistics are not updated

Usually it's a good thing to write simple code with abstractions
- however, abstractions might have a performance cost
Using ORM is usually recommend, but when performance is critical it could be better to write pure SQL

Let the database do the heavy lifting
- when database works, app sleeps
- the is overhead going back and forth between application and database
Prefer small number of big queries instead of big number of small queries

My own notes and interpretations on my colleagues excellent presentation with title "Your Customer Might Not Need a Bigger Database Instances" (17.01.2025)

Last updated 7 months ago