PostgreSQL performance tuning

Imran Ahmad
4 min read · Sep 10, 2022

Understanding Postgres VACUUM.

👋 Say hi to me on Twitter.

There are many mechanisms for fine-tuning PostgreSQL; vacuuming is one of them.

The problem:

  1. In PostgreSQL, whenever rows in a table are deleted, the existing rows (tuples) are only marked as dead; they are not physically removed.
  2. Similarly, during an update PostgreSQL marks the existing tuple as dead and inserts a new tuple.
  3. So technically an UPDATE operation is equivalent to DELETE + INSERT (a small sketch after this list shows it).
  4. These dead tuples consume unnecessary storage, and eventually you end up with a bloated PostgreSQL database.
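You can watch this tuple versioning directly through the ctid and xmin system columns. A minimal sketch, assuming a throwaway table t that is not part of the demo below:

-- Throwaway table, just to show that an UPDATE writes a brand-new tuple
create table t (id int);
insert into t values (1);
select ctid, xmin, id from t;  -- note the physical location (ctid) and creating transaction (xmin)
update t set id = 2;
select ctid, xmin, id from t;  -- ctid and xmin changed: a new tuple was written, the old one is now dead
drop table t;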

I don’t trust you, show it to me!

Below are the commands to set up the demo.

-- Create a demo database and a table named users
create database demo;
CREATE DATABASE
postgres=# \c demo;
You are now connected to database "demo" as user "imranahmad".
demo=# create table users (id SERIAL) with (autovacuum_enabled = off);
CREATE TABLE
-- Insert 100000 records into the table
demo=# insert into users select * from generate_series(1,100000);
INSERT 0 100000
-- Check the size of table
demo=# select pg_size_pretty(pg_relation_size('users'));
pg_size_pretty
----------------
3544 kB
(1 row)
  • After inserting 100000 records, the table size is 3544 kB.
  • Let's update the id of every row and check the size again.
demo=# update users set id = id+1;
UPDATE 100000
demo=# select pg_size_pretty(pg_relation_size('users'));
pg_size_pretty
----------------
7080 kB
(1 row)
  • As we can see, the table size has doubled, which means the existing rows were marked as dead and new tuples were created (the sketch below gives a more direct confirmation).
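If you want more than the table size as evidence, the pgstattuple contrib extension reports dead-tuple counts and space directly. A small sketch, assuming the extension is available on your installation:

-- Inspect live vs dead tuples in the users table
create extension if not exists pgstattuple;
select tuple_count, dead_tuple_count, dead_tuple_percent
from pgstattuple('users');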

Why does Postgres create dead tuples?

Dead tuples are kept around for several reasons, such as:

  1. If an UPDATE transaction fails, PostgreSQL can roll back to the old tuple.
  2. Similarly, if a DELETE fails, it can roll back to the old tuple.
  3. A concurrent transaction might still need to read the old version of the row; this is how MVCC works (a rollback sketch follows this list).
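Point 1 is easy to reproduce. A minimal sketch (the specific id is only illustrative):

-- A rolled-back UPDATE leaves the freshly written tuple dead,
-- while the old version stays visible to everyone
begin;
update users set id = id + 1 where id = 42;
rollback;
-- the row as it was before the transaction is still there;
-- the new version written by the UPDATE is now a dead tuple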

What is the impact of dead tuples on performance?

  1. Table bloat seriously affects PostgreSQL query performance.
  2. In PostgreSQL, tables and indexes are stored as arrays of fixed-size pages (usually 8 kB each).
  3. Whenever a query requests rows, Postgres loads these pages into memory; dead rows inflate the number of pages that must be read, causing extra disk I/O and slower scans (see the EXPLAIN sketch after this list).
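One way to see this page-level I/O is EXPLAIN with the BUFFERS option, which reports how many 8 kB pages (buffers) a query had to read. A quick sketch:

-- Compare the buffer counts before and after vacuuming to see the effect of bloat
explain (analyze, buffers) select count(*) from users;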

How to identify such bloating tables in the database?

The query select * from pg_stat_user_tables; returns a column named n_dead_tup, which holds the estimated number of dead tuples per table (a narrower version of this query is sketched after the output below).

-- Used to print the output properly
demo=# \x
Expanded display is on
-- Find bloating table
demo=# select * from pg_stat_user_tables;
-[ RECORD 1 ]-------+-------
relid               | 73205
schemaname          | public
relname             | users
seq_scan            | 1
seq_tup_read        | 100000
idx_scan            |
idx_tup_fetch       |
n_tup_ins           | 100000
n_tup_upd           | 100000
n_tup_del           | 0
n_tup_hot_upd       | 0
n_live_tup          | 100000
n_dead_tup          | 100000
n_mod_since_analyze | 200000
n_ins_since_vacuum  | 100000
last_vacuum         |
last_autovacuum     |
last_analyze        |
last_autoanalyze    |
vacuum_count        | 0
autovacuum_count    | 0
analyze_count       | 0
autoanalyze_count   | 0
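In practice you rarely need the whole record; a narrower query (a sketch) lists the most bloated tables first:

-- Tables with the most dead tuples, worst first
select relname, n_live_tup, n_dead_tup
from pg_stat_user_tables
order by n_dead_tup desc
limit 10;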

How to get rid of dead tuples?

There are two ways to reclaim the storage occupied by dead tuples.

1: By using plain VACUUM (non-full)
2: By using VACUUM FULL

Time to vacuum

Let’s vacuum the table and observe the behaviour.

Using the vacuum command: VACUUM simply reclaims space and makes it available for re-use within the table; the freed-up space is not returned to the operating system.

demo=# select pg_size_pretty(pg_relation_size('users'));
-[ RECORD 1 ]--+--------
pg_size_pretty | 7080 kB
demo=# vacuum users;
VACUUM
demo=# select pg_size_pretty(pg_relation_size('users'));
-[ RECORD 1 ]--+--------
pg_size_pretty | 7080 kB
demo=# update users set id = id+1;
UPDATE 100000

demo=# select pg_size_pretty(pg_relation_size('users'));
-[ RECORD 1 ]--+--------
pg_size_pretty | 7080 kB
  • As you can see, unlike last time the table size stays the same even after updating every user's id, because the UPDATE re-used the space reclaimed by VACUUM.
  • After running the vacuum command the space was freed up, but it was made available to the database, not to the operating system, hence still 7080 kB (a VERBOSE sketch follows).
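If you want to see what the plain vacuum actually did, the VERBOSE option prints how many dead row versions were removed and how many pages can be re-used. A sketch:

-- Re-run vacuum with verbose output to see the removed dead row versions
vacuum (verbose) users;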

Using the vacuum full command:

  1. VACUUM FULL rewrites the entire contents of the table into a new disk file with no extra space, allowing unused space to be returned to the operating system.
  2. VACUUM FULL is much slower and requires an ACCESS EXCLUSIVE lock on each table while it is being processed.
demo=# select pg_size_pretty(pg_relation_size('users'));
-[ RECORD 1 ]--+--------
pg_size_pretty | 7080 kB
demo=# vacuum full users;
VACUUM

demo=# select pg_size_pretty(pg_relation_size('users'));
-[ RECORD 1 ]--+--------
pg_size_pretty | 3544 kB
  • As you can see, in the case of VACUUM FULL the space was freed up and returned to the operating system, hence 3544 kB.

Autovacuum vs. manual vacuum

  1. Autovacuum is throttled by default and can be too slow to keep up with heavily updated tables.
  2. In some cases autovacuum does not remove the dead tuples. (ref)
  3. When a manual VACUUM hits a table, it usually does so at full speed, which means that cleaning up a table can cause massive I/O bottlenecks.
  4. A single VACUUM job can use hundreds of megabytes of I/O per second, which leaves less capacity for other operations and can, in turn, lead to bad response times (a per-table tuning sketch follows this list).
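Rather than turning autovacuum off, it can also be tuned per table. A sketch with illustrative values (assumptions, not recommendations for your workload):

-- Make autovacuum kick in earlier and throttle it less on a hot table
alter table users set (
  autovacuum_vacuum_scale_factor = 0.05,  -- vacuum after ~5% of rows are dead (default is 0.2)
  autovacuum_vacuum_cost_delay = 2        -- milliseconds to sleep between cost-limited chunks
);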

Conclusion:

  1. Autovacuum is generally recommended; however, in specific cases it can be disabled and vacuuming performed manually.
  2. If your database is too busy during the day, vacuuming can be done at night via a script or cron job (a sketch follows this list).
  3. It's better to keep checking for table bloat even if autovacuum is enabled.
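For point 2, a nightly job can be as simple as a cron entry running vacuumdb. A sketch; the schedule, database name, and log path are assumptions:

# Run a vacuum + analyze on the demo database every night at 02:00
0 2 * * * vacuumdb --analyze --dbname=demo >> /var/log/nightly_vacuum.log 2>&1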

Thanks for the quick read.
Show your love by clapping. Ah! You can clap 50 times.

Imran Ahmad

Tech enthusiast, product visualiser, nothing more or less.