DisCopy


Wednesday 9 October 2024

PostgreSQL Performance Tuning and Optimization

Availability > Functionality > Performance! 

Performance tuning is a continuous process in any database environment. Many organizations spend more time and resources on tuning the database than on the original development, because data growth and usage pile up on top of a poor database design; 80% of performance issues can be avoided with a proper design in the first place. In addition, developing SQL code is very different from developing front-end application code or middle-tier components, so SQL written by front-end or middle-tier experts can often be optimized further, depending on how transparent the code is. Finally, the need to optimize becomes pressing when the volume of operations grows and the application that worked great with 5-10 users and 100-200GB of data no longer lives up to the expectations of terabytes of data and thousands of users (scalability). Yes, as we are all aware, scalability is usually the root cause.

A common place where most people like to start tuning is the server/database configuration. Generally, developers who are unhappy with application performance will demand adding resources such as more CPUs and memory to the server, or reconfiguring the dynamic configuration parameters. But extending the memory, adding more CPUs, or repeatedly changing configuration parameters will only improve performance up to a certain point. In other words, if you keep adding memory and never tune the application or the SQL code in any other way, you will reach the point where additional memory or reconfiguration produces marginal or no performance improvement.

It is also important to realize that improvement in one area often means compromising others. If you can optimize 90% of the critical queries by slowing down the other 10%, it might well be worth your time. Sometimes you can improve the response time of online transactions at the expense of reduced concurrency or throughput. It is therefore important to determine the application's performance requirements first. For example, if we improve the performance of SELECTs by adding more indexes, we usually compromise DML statements/transactions.

It is common to spend more time identifying the cause of a problem than actually troubleshooting and fixing it. If all other areas of the application are working properly and you are sure the problem is in the database code, then you need to investigate your code modules and decide which one is causing trouble. Many times, improving a single stored procedure or trigger fixes most of the issues, because the gain cascades by reducing locking/blocking.

The optimizer of the PostgreSQL database engine takes a query and finds the best way to execute it. The optimization is done based on the statistics and the available indexes for a table or view. The chosen plan stays in effect until the statistics are updated or the query changes. So vacuuming and analyzing frequently, and at the right times, is crucial for the optimizer to choose an optimal query plan.
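
For example, here is a quick way to check whether the statistics behind a plan are fresh (the orders table and its columns are hypothetical):

    -- Refresh planner statistics for a (hypothetical) orders table.
    ANALYZE orders;

    -- Ask the planner how it executes a query; EXPLAIN ANALYZE also shows
    -- how the estimated row counts compare to reality.
    EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;

    -- See when autovacuum/autoanalyze last touched the table.
    SELECT relname, last_autovacuum, last_autoanalyze
    FROM pg_stat_user_tables
    WHERE relname = 'orders';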

Carefully considered indexes, built on top of a good database design, are the foundation of a high-performance PostgreSQL configuration. However, adding indexes without proper analysis can reduce the overall performance of your system. Insert, update, and delete operations can take longer when a large number of indexes need to be updated.
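
As a rough sketch (table and index names are hypothetical), every index should be justified by the queries it serves, and the statistics views reveal indexes that never earn their keep:

    -- A targeted index for a frequent lookup.
    CREATE INDEX idx_orders_customer_id ON orders (customer_id);

    -- Verify the planner actually uses it.
    EXPLAIN SELECT * FROM orders WHERE customer_id = 42;

    -- Indexes that are never scanned only slow down writes;
    -- idx_scan = 0 marks candidates for review.
    SELECT relname, indexrelname, idx_scan
    FROM pg_stat_user_indexes
    ORDER BY idx_scan;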

 

What to observe - high level (We have many more ... :)  Here are the top 5, with a quick sketch after the list:

  1. Server error logs
  2. OS error logs
  3. Configuration files
  4. Workload pattern
  5. Peak-hour and off-peak-hour processes
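
A minimal sketch for checking a few of these from psql (assuming a superuser or a pg_monitor role):

    -- Where do the configuration and log files live?
    SHOW config_file;
    SHOW log_directory;
    SHOW log_filename;

    -- A rough look at the current workload pattern.
    SELECT state, count(*)
    FROM pg_stat_activity
    GROUP BY state;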

 

What to set/configure - high level (We have many ... :)  Here are the top 5:

1.   shared_buffers

The most important parameter: it caches frequently accessed data pages in memory (its effectiveness shows up as the cache hit ratio). It allows PostgreSQL to read data from memory instead of the storage disk, which can speed up query execution significantly.

Recommended value: 15%-25% of total RAM
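
A sketch of how this might be checked and adjusted (the 4GB value assumes a server with roughly 16GB of RAM; shared_buffers needs a restart to take effect):

    -- Current setting and an approximate cache hit ratio per database.
    SHOW shared_buffers;
    SELECT datname,
           round(100.0 * blks_hit / nullif(blks_hit + blks_read, 0), 2) AS hit_pct
    FROM pg_stat_database;

    -- Roughly 25% of an assumed 16GB of RAM; applied on the next restart.
    ALTER SYSTEM SET shared_buffers = '4GB';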

2.   work_mem

Defines the amount of memory used for internal sort operations and hash tables before spilling to disk. Sort operations are used for ORDER BY, DISTINCT, and merge joins; hash tables are used in hash joins and hash-based aggregation, so an insufficient allocation leads to slower queries. Note that a single complex query can use several multiples of work_mem, one per sort or hash node, which is why the formula below is deliberately conservative.

Recommended value: 25% of total RAM divided by max_connections
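
As a worked sketch: with 32GB of RAM and max_connections = 100, the formula gives 32GB * 25% / 100 = roughly 80MB. Whether a sort actually fit in memory or spilled to disk shows up in EXPLAIN ANALYZE output (the orders table is hypothetical):

    -- Per-session override; ALTER SYSTEM SET would make it global.
    SET work_mem = '80MB';

    -- "Sort Method: quicksort  Memory: ..." means the sort fit in work_mem;
    -- "Sort Method: external merge  Disk: ..." means it spilled to disk.
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT * FROM orders ORDER BY created_at;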

3.   effective_cache_size

Informs the planner about how much of the database is likely to be cached by the kernel/OS. With higher values, index scans are more likely to be chosen; with low values, the planner leans toward sequential scans. Setting this value too low can cause the query planner to avoid perfectly good indexes.

Recommended value: 50% of total RAM
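
Unlike shared_buffers, this is only a hint to the planner and does not allocate any memory, so it can be changed without a restart (the 8GB value assumes ~16GB of RAM):

    -- Planner hint only; no memory is actually reserved.
    ALTER SYSTEM SET effective_cache_size = '8GB';
    SELECT pg_reload_conf();   -- pick up the change without restarting
    SHOW effective_cache_size;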

4.   wal_buffers

The wal_buffers setting controls the amount of shared memory used for Write-Ahead Log (WAL) data that has not yet been written to disk. The default (-1) auto-sizes it to 1/32 of shared_buffers, capped at 16 MB, but a higher explicit value can improve performance on a busy, write-heavy server.

Recommended value: 16MB or higher
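
A sketch (wal_buffers requires a restart, and the larger value is only worth trying on a genuinely write-heavy server):

    -- -1 means auto-size to 1/32 of shared_buffers, capped at 16MB.
    SHOW wal_buffers;

    -- Pin it explicitly; takes effect after a restart.
    ALTER SYSTEM SET wal_buffers = '32MB';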

5.   effective_io_concurrency

Defines the number of simultaneous read and write operations that can be handled by the underlying storage. The allowed range is 1 to 1000, or zero to disable asynchronous I/O requests.

Recommended value: HDD - 2, SSD - 200, SAN - 300
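
A sketch; historically this parameter has mainly affected prefetching for bitmap heap scans, and it can also be set per tablespace when disks differ (the tablespace name is hypothetical):

    -- Suited to SSD storage; no restart required.
    ALTER SYSTEM SET effective_io_concurrency = 200;
    SELECT pg_reload_conf();

    -- Per-tablespace override for a faster volume.
    ALTER TABLESPACE fast_ssd SET (effective_io_concurrency = 200);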

 

Query tuning, object optimization, maintenance windows for vacuum and analyze, monitoring and reacting to blocked processes/locks, and analyzing and fine-tuning long-running and slow queries (including index optimization, access paths, and choosing the right join types) will be discussed in the next blog post.

-HTH :)
