Postgresql Memory Structure

When we compare both I/O and OIPS performances, RAM is many times faster than SSD disk. Magnetic disks are also slower than SSD disks. For performance, it is desirable to perform I/O operations on memory as much as possible. Memory structures in Postgresql have different uses and methods.

For example, one part (memory structure) is used to hold code, while another part is
used to hold data blocks.
Some memory areas are shared for all users, while others are private to the user.

Shared Memory: When the Postgresql instance is opened, it takes the memory allocated for it from the system.

The amount of memory it will take is determined by shared_buffer, wal_buffer, etc. parameters. Some of these areas are also divided into subsections within themselves.

Shared memory structures are shared with all users and backend processes.

Per Backend Memory (local memory area): It is used for SQL query operations. Each backend process takes this memory space from the system for its own use. It is not shared. It is divided into subsections within itself.

It will take memory equal to the product of (session number) * (Backend Memory)
from the system.
So, care should be taken when calculating the number of sessions and Backend
memory areas. You can find details in the following topics.

RAM: Random Access Memory
I/O: Read write
IOPS: Input/output operations per second. The number of operations that can be
performed per second.
(The speed of copying a 1 GB image file and a folder containing 10240 1 KB files are
different.)

Shared Memory

Shared Buffer

alter system set shared_buffers ='24GB';

It is shared. It contains data blocks.
It works integrated with OS Cache.
The shared buffer area is shared by all users and background processes.
The default value is 128MB.
If the unit (MB, GB) is not given when setting shared_buffers, it will perceive the given
number as the number of blocks and multiply it by the block size (size) and configure it
in KB.

Example: alter system set shared_buffers=100;
Since the block size is 8KB (default value), it assigns shared_buffers=800KB.

In order for the change made in shared_buffers to take effect, it is necessary to close
and open the instance.

The recommended value for the shared buffer is 1/4 of the RAM in the system.
In some high-load instances, up to 40% of RAM can be assigned to shared buffer,
since postgresql works as an integrated part of the operating system cache (OS
Cache), it does not mean that more than 40% of memory will have a positive effect on
performance.

Increasing the shared suffer will increase the size of modified (update/delete) and/
or new (insert) data (increasing DML capacity). The max_wal_size value should also
be increased in order to extend the writing process of memory to disk over a longer
period of time.

Note: The operating system has a dependency on the kernel settings. If you get an
error while updating the parameter values, it may be related to the kernel settings of
your operating system.
For detailed information about buffer cache usage, pg_buffercache extension can
be installed.

Shared Buffer and OS Cache

When a query is executed;

The backend process first checks the shared buffer if it is available, logical I/O takes
place.
If it is not in the buffer cache, the OS cache is checked. If it is found in the OS cache,
logical I/O takes place.
If they are not in the OS cache, the requested blocks are read from the disk and
physical I/O is done, and the cost of physical I/O is high.

You do not want to make settings in Postgresql, such as giving most of the RAM to the
buffer cache and ignoring the OS cache, because it will reduce performance at another
point.
The purpose of both cache areas is to keep the data with intensive I/O in memory and
to reduce the I/O operations on the disk as much as possible.

Continuous read and write operations are performed in the pg_xact (before 9.6, pg_clog) folder, where the commit log information is kept. Therefore, the operating
system wants to keep the files in the pg_xact folder in memory (OS Cache). In this
case, keeping the share of OS cache high also contributes to performance.

Postgresql is better at memory (shared buffer) management than OS Cache. Postgresql
assigns values in the range of 0-5 to the blocks, based on the processing frequency.
It gives high value to blocks with more transactions and low value to blocks with
a smaller number of transactions. When it is necessary to free up memory, blocks
with low values are removed from memory. This way of working is called the clock sweep algorithm.
OS Cache usually uses various methods of LRU (Least Recently Used) as an algorithm.
Blocks to be cached are given a chance or two. It does not go into a detailed analysis.
That’s why postgresql’s shared buffer management is better than OS Cache.

WAL Buffer

alter system set wal_buffer = '64MB' ;

It is shared, it keeps track of changes made in the database.
It keeps the information necessary to redo the changes made with DML and DDL.
Postgresql supports the write-ahead log mechanism, which is necessary to ensure data consistency.
WAL (transaction log) keeps the metadata information of the changes made in the database.
When necessary, it regenerates (REDO) or modifies the data using the records here.
WAL mechanism processes the committed data to disk even in case of system errors (crash, power outage, hardware failure, etc.) and prevents data loss and inconsistency by rolling back the uncommitted data.
WAL Buffer is the buffer area, where WAL data that has not yet been written to the disk is kept temporarily.
It is shared, and accessible by all background server and backend processes.
In each commit operation, the data in the WAL buffer area is written to the disk.Even MB level increases can provide performance boosts.
The default is “-1”, which is the automatic tuning setting. Takes 1 out of 32 of shared_buffers. (shared_buffers/32). For example, if shared_buffers=512MB, it takes wal_buffer=16MB.
It cannot be less than 64KB and larger than the WAL segment size.
It is a static parameter, the instance must be closed and opened for the change to take effect.

Note: In order for the WAL mechanism to work properly, your system and database must be adjusted according to the best practice rules.

Clog Buffer

It is the shared memory area where the commit logs are kept.
A transaction has 4 states. The commit log keeps the below statuses.
- in-progress
- committed
- aborted
- sub_committed
Accessible by all background server and backend processes.
It is part of the concurrent access mechanism (Concurrency Control mechanism).
Although there is no parameter that can change its size, it can be increased by
compiling from the source code (CLOGShmemBuffers()) or patching during the first
installation.
Writes to the pg_xact directory. (Before version 10 –> pg_clog).

Lock Space

It is shared. It has no parameters that can be changed directly.
It is the memory area where more than average locks are stored. This lock is shared by
the background and backend user processes.
Changes in max_locks_per_transaction and max_pred_locks_per_transaction parameters also affect lock space.

Per Backend Memory

Work Memory

It is not shared, each session has its own memory space.
Sort is used for order by, distinct, merge joins, and hash table operations.
If the work_mem space is not enough, it writes to temporary files on the disk (physical I/O).
Each session and process uses its own private space. It takes space from the memory space from the system based on the calculation, the number of sessions * work_mem. The default value of work_mem is 4MB. If your instance has 300 sessions, 4MB * 300 = 1200MB = 1GB memory from your operating system (not from shared_buffers).
In complex queries, several sort and hash operations can run in parallel, each of these operations can allocate a separate work_mem space for itself.

# To change system-wide;
alter system set work_mem= '16MB';

# To change the work_mem value of the current session ;
set local work_mem = 128MB;

# Due to a bug in Postgres 13, this setting gives an error. It can be run as follows;
set local work_mem = 102400\;

# To set work_mem specific to a specific user or role;
alter user mehmed set work_mem= '256MB';

Hash-based operations require more memory than sort operations.
In hash table operations, the amount of memory that can be used is calculated by
multiplying the value in the hash_mem_multiplier parameter with the value
of work_mem. Therefore, hash-based operations can use more than the standard
amount of work_mem. The default unit of the work_mem parameter is KB. If no unit is
specified when assigning a value, it assigns a value in KB.

Temp Memory

# To change system-wide;
alter system set set temp_buffers='16MB';

# To change the temp_buffers of the current session;
set local temp_buffers = 128MB;

# Due to a bug in Postgres 13, this setting gives an error.
# It does not give an error as below.
set local temp_buffers = 102400\;

# To set work_mem specific to a specific user or role;
alter user mehmed set work_mem='256MB';

It is the buffer area used for temp tables.
It is not shared.
Each session has its own temp buffer area.
temp_buffers determines the maximum temp buffer space that each session can use.
The default value is 8MB.
If the unit is not given when changing, it calculates the new value by multiplying it by the block size.
It does not reserve the amount of memory given in the parameter as soon as the session starts.
It is a field that is used when needed, even if large values are given. Changes can be made based on session.

Maintenance Work Memory

alter system set maintenance_work_mem= '128MB';

It is the memory area used for maintenance operations like vacuum, create index, alter table add foreign key etc.
It determines the upper limit of the memory that can be used for support and maintenance operations.
The default value is 64 MB.
If the value is changed without giving a unit, the default unit is kilobyte.
A session runs only one maintenance task. So large work_mem can be given. Thus, the performance of vacuum and dump restore operations is increased.
While the autovacuum process is running, it can take up to autovacuum_max_workers times memory.
The autovacuum_max_workers default value is 3, so be careful when increasing it.

Catalog Cache

It is the memory area where the system tables are kept.
Schema, table, view, index etc. in system tables. Contains information about objects.
Most transactions require catalog information. For example, the table select statement needs information like does table really exist, does it have a column queried, does the user requesting has privilege, etc. a lot of information is accessed from the system catalog. Reading this data from the disk every time will cause performance bottlenecks, so it is kept in the catalog cache.

Autovacuum Work Mem

alter system set autovacuum_work_mem = '128MB';

Determines the maximum amount of memory that each autovacuum worker process can use.
If the unit is changed without giving it, it takes KB as the default unit.
Its default value is “-1”. In this case it takes the same value as maintenance_work_mem.
Autovacuum can use a maximum of 1 GB of memory when collecting dead tuple identifiers. So having value of more than 1GB will have no effect