
Disk: IOPS (input/output operations per second) performance is critical for a database.
A RAID 10 array built from fault-tolerant disks with high IOPS performance
(NVMe, SSD, flash) can be preferred.
RAID 10 offers high read and write performance, so it should be preferred
especially for databases with heavy DML activity.
RAID 5 has high read speed but low write speed, so it is more suitable for
reporting (SELECT-heavy) databases.
iostat 2 4   # Shows I/O statistics 4 times, every 2 seconds.
iotop        # Shows the processes currently performing I/O.
top          # An iowait (wa) value of 25% or more indicates a high I/O rate.
sar          # Collects and reports system activity on a schedule.
While disk read and write performance is tested with tools such as dd,
CPU, disk and memory usage can be watched with the monitoring tools above.
dd if=/dev/zero of=/pgdata/test1.img bs=8K count=1024
This performs 1024 sequential writes of 8 kB each (PostgreSQL's block size) to the file test1.img in the /pgdata directory.
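Read performance can be tested the same way by reading the file back; a minimal sketch (the path is moved to /tmp here so it can run anywhere; use the real data disk in practice):

```shell
# Sequential write test: 1024 writes of 8K (PostgreSQL's block size)
dd if=/dev/zero of=/tmp/dd_test1.img bs=8K count=1024 2>/dev/null

# Sequential read test: read the same file back, discarding the data
dd if=/tmp/dd_test1.img of=/dev/null bs=8K 2>/dev/null

# The resulting file is 1024 * 8192 = 8388608 bytes
stat -c %s /tmp/dd_test1.img
```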
Memory: PostgreSQL uses the clock-sweep algorithm, which keeps the most frequently
used data blocks in memory. Memory needs vary according to the purpose of the
database, data size, data types, the application, and the SQL/PL/pgSQL code.
| shared_buffers | About ¼ of physical memory can be allocated. |
| wal_buffers | The default setting is "-1", which allocates 1/32 of shared_buffers (shared_buffers/32); the value cannot be smaller than 64KB or larger than one WAL segment. It is a static parameter: the instance must be restarted for a change to take effect. A fixed value can be set if shared_buffers is very large. Especially on systems with heavy DML, even a small increase can improve performance. |
| effective_cache_size | PostgreSQL uses both its dedicated memory (shared_buffers) and the filesystem cache. The planner decides whether a query should use an index based on the total memory available (OS cache + shared_buffers); although using an index would improve performance, the planner may skip it if this value is set too small. While 50%–75% of physical memory is a common starting point, it is best to arrive at the right value by monitoring the operating system with top and free. |
| work_mem | The memory area used for sorting and hash-table operations in queries. If work_mem is not large enough, the work spills to temporary files on disk ($PGDATA/base/pgsql_tmp), causing physical I/O. It is not shared: each session and process uses its own private area, so as a rough calculation, session count × work_mem is taken from the system. The default value is 4MB. It can be increased with the number of concurrent sessions in mind, and is an important parameter for reporting databases. In complex queries several sort and hash operations can run in parallel, each allocating its own work_mem. Report performance can be improved with a session-specific value: SET LOCAL work_mem = '2GB'; |
| maintenance_work_mem | Determines the amount of memory that can be used for maintenance operations such as VACUUM and REINDEX. The default value is 64MB. |
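Purely as an illustration, the table above might translate into the following postgresql.conf fragment for a hypothetical dedicated server with 16 GB of RAM; every value here is an assumption that must be validated with top and free on the real system:

```
shared_buffers = 4GB            # ~1/4 of physical memory
effective_cache_size = 10GB     # ~50-75% of physical memory (OS cache + shared_buffers)
wal_buffers = -1                # default: shared_buffers/32
work_mem = 16MB                 # per sort/hash operation, per session
maintenance_work_mem = 512MB    # vacuum, reindex
```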
CPU: Parallelism arrived with PostgreSQL version 9.6. At the instance level and/or in
SQL scripts, the workload should be distributed across all CPUs. For this, hints can
be added to the code, and the following parameters can be adjusted:
| max_worker_processes | Determines the maximum number of background processes the system will support. The default value is 8. It can be increased up to the number of cores. |
| max_parallel_workers_per_gather | Determines the maximum number of workers that can be started for each Gather or Gather Merge node (shown as "Workers Planned: 2" in an explain plan). Each worker allocates its own work_mem. Gather is the operation that collects and merges the results of all worker processes into a single result. |
| max_parallel_workers | Specifies the maximum number of workers supported for parallel operations. The default value is 8. It can be increased up to the number of cores. |
| max_parallel_maintenance_workers | Sets the maximum number of parallel workers a single maintenance operation can start. The default value is 2. |
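As an illustrative postgresql.conf fragment for a hypothetical 8-core server (the values are assumptions, not recommendations):

```
max_worker_processes = 8              # up to the number of cores
max_parallel_workers = 8              # up to the number of cores
max_parallel_workers_per_gather = 4   # workers per Gather / Gather Merge node
max_parallel_maintenance_workers = 2  # default
```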
mpstat -P ALL 2 22   # Shows the usage of all cores 22 times, every 2 seconds.
vmstat 3 11          # Shows resource usage 11 times, every 3 seconds.
top                  # Pressing "1" shows the load on each core.
apt install sysstat  # mpstat, iostat and sar are provided by the sysstat package.
With Linux tools such as vmstat, mpstat and top, the distribution of load across the
cores can be monitored.
If a query result is delayed while only a few cores are fully loaded and the others
are idle, the parallelism settings should be reviewed; SQL hints can be used if
needed.
Network

An entry-level network diagram is given in the figure. When the client connects
to the application and starts working, TCP packets pass through all of the network
links 1, 2, 3, 4 and 5. Slowness in any of these links can be perceived as a
database performance problem, or escalated to the database team as such.
Between the application server and the database server:
If possible, there should be a 10 Gbps or faster connection,
Jumbo frames should be enabled (MTU 9000),
If there are active devices in between, routing should be preferred over NAT.
In addition, transfer speed can be tested by sending files between the servers with
tools such as scp, sftp or ftp.
Note: The application and database servers must be separate machines.
ip -s link                    # Shows statistics of all NICs.
netstat -neopa | more         # Shows connection information of applications.
netstat -su                   # Shows UDP statistics, including bad packets.
ethtool -S enp0s31f6          # Shows per-NIC packet and error statistics.
egrep "CPU0|enp0s31f6" /proc/interrupts   # Hard interrupts cost more CPU.
Network problems can be detected by examining the errors, dropped, overruns,
carrier and collisions counters.
The easiest way to confirm that a problem is caused by the network:
connect to the database server with SSH, run the client's SQL locally on the
database with \timing on, and compare the times.
Virtualization layer

Even if every performance improvement has been made on a virtual database server,
the maximum achievable performance is still limited by the virtualization
layer.
For example, servers with SSD disks can be rented from cloud providers, yet their
IOPS capacity may be capped. Even if every improvement is made in the operating
system of the virtual machine, in the instance and in the code, it is highly
likely that the desired values cannot be reached because of the IOPS limit.
Therefore, virtual database servers should use dedicated resources; in particular,
the disks must be dedicated, or the IOPS capacity the application requires must be
guaranteed.
Instance Tuning: Memory, lock, vacuum, checkpoint, parallelism, huge page, connection
and similar parameters can be adjusted for performance according to the needs
of the application and/or the purpose of the database.
It is especially important to keep the statistics of tables with intense DML (insert,
update, delete) activity up to date so that costs and execution plans can be
estimated correctly.
For example, if billions of rows are added to a table that previously held 2-3
thousand rows and vacuum analyze has not run, PostgreSQL still assumes the table
contains only 2-3 thousand rows and performs a full table scan (FTS) instead of
using an index, which can cause a serious performance loss.
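In such a case the statistics can be refreshed by hand right after the bulk load instead of waiting for autovacuum; a minimal sketch (the table name is illustrative):

```
-- Refresh planner statistics only
ANALYZE buyuk_tablo;

-- Or reclaim dead space and refresh statistics together
VACUUM ANALYZE buyuk_tablo;
```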
Index: While indexes increase the performance of reporting (select) operations,
they negatively affect the performance of write operations. Therefore, unused and
duplicate indexes should be detected and deleted.
When an index is added to a table, a record is written to the index with every
insert, causing additional write operations; update and delete operations cause
index fragmentation.
Unused indexes:
SELECT relname AS tablo_ismi, indexrelname AS indeks_ismi,
       idx_scan, idx_tup_read, idx_tup_fetch,
       pg_size_pretty(pg_relation_size(indexrelname::regclass))
FROM pg_stat_all_indexes
WHERE schemaname = 'public'
  AND idx_scan = 0
  AND idx_tup_read = 0
  AND idx_tup_fetch = 0
ORDER BY pg_relation_size(indexrelname::regclass) DESC;
Duplicate indexes:
SELECT pg_size_pretty(SUM(pg_relation_size(idx))::BIGINT) AS boyut,
       (array_agg(idx))[1] AS index1, (array_agg(idx))[2] AS index2,
       (array_agg(idx))[3] AS index3, (array_agg(idx))[4] AS index4
FROM (
    SELECT indexrelid::regclass AS idx,
           (indrelid::text || E'\n' || indclass::text || E'\n' ||
            indkey::text || E'\n' || COALESCE(indexprs::text, '') || E'\n' ||
            COALESCE(indpred::text, '')) AS key
    FROM pg_index) sub
GROUP BY key HAVING COUNT(*) > 1
ORDER BY SUM(pg_relation_size(idx)) DESC;
Partition
CREATE TABLE uretim (
    uid uuid NOT NULL,
    tarih date NOT NULL,
    bant bigint NOT NULL,
    lokasyon bigint NOT NULL,
    tpsno character varying(30) COLLATE pg_catalog."default"
) PARTITION BY RANGE (tarih);

-- Creating the partition tables
CREATE TABLE IF NOT EXISTS t_2000_2001 PARTITION OF uretim (
    CONSTRAINT "t_2000_2001_Unique_" PRIMARY KEY (uid, tarih)
) FOR VALUES FROM ('2000-01-01') TO ('2001-01-01');

CREATE TABLE IF NOT EXISTS t_2001_2002 PARTITION OF uretim (
    CONSTRAINT "t_2001_2002_Unique_" PRIMARY KEY (uid, tarih)
) FOR VALUES FROM ('2001-01-01') TO ('2002-01-01');
Partitioning is the process of dividing a table into smaller parts. In the example,
a range partition is created for each year based on the date column. Thus, when
filtering by year, only the partition for the year given in the WHERE condition is
scanned instead of all years.
PostgreSQL supports range, list and hash partitioning.
Distributing I/O Across Disks: Performance can be improved by placing pg_wal,
temp_tablespaces and the application tablespaces on different disks.
This can be done with symbolic links at the operating-system level or by mounting
these paths on different disks.
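A minimal sketch of the tablespace approach (the path and names are illustrative; the directory must already exist and be owned by the postgres OS user):

```
-- Create a tablespace on a separately mounted fast disk
CREATE TABLESPACE hizli_disk LOCATION '/mnt/ssd1/pg_ts';

-- Direct temporary files (sorts, hashes) to that disk
ALTER SYSTEM SET temp_tablespaces = 'hizli_disk';
SELECT pg_reload_conf();

-- Place an application table on the same disk
CREATE TABLE rapor_log (id bigint, mesaj text) TABLESPACE hizli_disk;
```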
Reports can be moved to hot/logical standbys. A hot standby is an exact copy of the
primary and is open read-only.
A logical standby is open for reading and writing; only the desired tables of the
primary database can be replicated, and since these tables are writable, indexes
can be added to them and views etc. can be created.
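Assuming the logical standby is built with PostgreSQL's built-in logical replication, a minimal publication/subscription sketch with illustrative names, using the uretim table from the partitioning example above:

```
-- On the primary (publisher): replicate only the desired table
CREATE PUBLICATION rapor_pub FOR TABLE uretim;

-- On the reporting server (subscriber): the connection string is illustrative
CREATE SUBSCRIPTION rapor_sub
    CONNECTION 'host=primary01 dbname=uretimdb user=repuser'
    PUBLICATION rapor_pub;

-- The subscriber-side table is writable, so reporting indexes can be added
CREATE INDEX idx_uretim_tarih ON uretim (tarih);
```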
Application Tuning
explain select * from rental where rental_id = 622;
                          QUERY PLAN
---------------------------------------------------------------
 Index Scan using rental_pkey on rental  (cost=0.29..8.30 rows=1 width=36)
   Index Cond: (rental_id = 622)

explain (ANALYZE, BUFFERS, FORMAT JSON) select * from film where film_id = 100;
EXPLAIN shows the execution plan of a SQL statement.
track_io_timing must be enabled. Problems at high-cost stages (nodes) are detected
and fixed.
These statistics can be collected and reported with performance-reporting tools
such as pgBadger.
SQL statements that take longer than the value given in the
log_min_duration_statement parameter can be logged.
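For illustration, the relevant postgresql.conf entries might look like this (the 500 ms threshold is an assumption; choose one that fits the workload):

```
log_min_duration_statement = 500ms  # log statements running longer than this
track_io_timing = on                # needed for I/O timing in EXPLAIN (ANALYZE, BUFFERS)
```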
Performance can be improved:
By using hints to force a lower-cost execution plan.
By keeping frequently used tables in memory with the pg_prewarm extension.
If a whole table is being scanned, by checking that the statistics are up to date
and adding indexes according to the WHERE conditions.
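Keeping a frequently used table in memory with the pg_prewarm extension can be sketched as follows (the table name comes from the EXPLAIN example above):

```
-- Load a frequently used table's blocks into the buffer cache
CREATE EXTENSION IF NOT EXISTS pg_prewarm;
SELECT pg_prewarm('rental');  -- returns the number of blocks read
```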
Explain Options
ANALYZE: Actually executes the SQL.
VERBOSE: Provides more detailed information.
COSTS: Provides the first-row and total costs.
BUFFERS: Provides statistics for shared blocks read from disk and shared block hits.
WAL: Provides information on WAL record generation.
TIMING: Provides the startup time and the time elapsed at each node.
SUMMARY: Provides summary information such as total time.
FORMAT: Determines the output format (TEXT, XML, JSON, YAML).
Lock
Although locks are related to application design and timing, they are perceived as
a performance problem. When a lock occurs, the hourglass appears on the screen.
The user thinks the system is running slowly and cannot figure out that a lock has
occurred. Some locks can be released at the application layer and some at the
instance level. These settings (lock_timeout) should be changed in consultation
with the application developers.
When quick resolution is required, locked sessions can be killed after consulting
the application developers or the support team. For details about locks, Chapter 6
can be checked.