Availability > Functionality > Performance!
Performance tuning is a continuous process in any database environment. Many organizations spend more time and resources on tuning the database than on the original development, because data growth and heavy usage land on top of a poor database design; roughly 80% of performance issues can be avoided with a proper database design in the first place. In addition, developing SQL code is very different from developing front-end application code and middle-tier components, so SQL written by front-end or middle-tier experts can often be optimized further, depending on how transparent the code is. Finally, the need to optimize the database grows with the volume of operations: the application that worked great with 5-10 users and 100-200 GB of data no longer lives up to expectations at terabytes of data and thousands of users. Yes, as we are all aware, scalability is the root cause.
A common place where most people like to start tuning is the server/database configuration. Generally, developers who are unhappy with application performance will demand more resources, such as additional CPUs and memory, or keep reconfiguring the dynamic configuration parameters. But extending memory, adding CPUs, or repeatedly changing configuration parameters only helps up to a certain point. In other words, if you keep adding memory but do not tune the application or the SQL code in any other way, you will reach a point where additional memory or reconfiguration produces marginal or no performance improvement.
It is also important to realize that improvement in one area often means compromising another. If you can optimize 90% of the critical queries by slowing down the other 10%, it may well be worth your time. Sometimes you can improve the response time of online transactions at the expense of concurrency or throughput. Therefore it is important to determine the application's performance requirements first. For example, if we improve the performance of SELECTs by adding extra indexes, we compromise the DML statements and transactions that must maintain those indexes.
It is common to spend more time identifying the cause of a problem than actually troubleshooting and fixing it. If all other areas of the application are working properly and you are sure the problem lies in the database code, investigate your code modules and decide which one is causing trouble. Many times, an improvement in a single stored procedure or trigger fixes most of the issues, because the benefit cascades through the system by reducing locking and blocking.
The optimizer of the PostgreSQL database engine takes a query and finds the best way to execute it. Optimization is based on the statistics and the indexes available for a view or table, and the chosen plan stays in effect until the statistics are updated or the query changes. So vacuuming and analyzing frequently, and at the right times, is crucial for the optimizer to choose an optimal query plan.
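For example, you can refresh the planner statistics and then inspect the plan the optimizer picks with standard commands (the table and column names below are hypothetical, for illustration only):

```sql
-- Refresh planner statistics for one table
-- (VACUUM ANALYZE orders; would also reclaim dead rows)
ANALYZE orders;

-- Ask the planner which plan it would now choose
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
```

Comparing EXPLAIN output before and after ANALYZE is an easy way to see how much the statistics influence the chosen plan.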
Carefully considered
indexes, built on top of a good database design, are the foundation of a
high-performance PostgreSQL configuration. However, adding indexes without
proper analysis can reduce the overall performance of your system. Insert,
update, and delete operations can take longer when a large number of indexes
need to be updated.
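As a sketch of that trade-off (again with hypothetical table and column names): an index that speeds up one frequent lookup adds work to every write on the same table.

```sql
-- Speeds up: SELECT ... FROM orders WHERE customer_id = ...
CREATE INDEX idx_orders_customer_id ON orders (customer_id);

-- But every INSERT, UPDATE, and DELETE on orders must now maintain
-- this index too, so add indexes only for access paths you have
-- actually measured as hot.
```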
What to observe, at a high level (we have many more ... :) Here are the top 5:
- Server error logs
- OS error logs
- Configuration files
- Workload pattern
- Peak-hour and off-peak-hour processes
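One quick way to observe the current workload pattern is the pg_stat_activity view that ships with PostgreSQL:

```sql
-- Currently running statements, longest-running first
SELECT pid, state, now() - query_start AS runtime, query
FROM pg_stat_activity
WHERE state <> 'idle'
ORDER BY runtime DESC;
```

Sampling this during peak and off-peak hours gives a rough picture of which queries dominate each window.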
What to set/configure, at a high level (we have many more ... :) Here are the top 5:
1. shared_buffers
The most important parameter: it sets the amount of memory used to cache frequently accessed data blocks, which drives the cache hit ratio. A well-sized cache allows PostgreSQL to read data from memory instead of the storage disk, which can speed up query execution significantly.
Recommended value: 15%-25% of total RAM
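Settings like this can be changed without editing postgresql.conf by hand, via ALTER SYSTEM (shared_buffers itself only takes effect after a restart):

```sql
ALTER SYSTEM SET shared_buffers = '8GB';  -- requires a server restart
SELECT pg_reload_conf();                  -- sufficient for reloadable parameters
```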
2. work_mem
Defines the amount of memory used for internal sort operations and hash tables before spilling to disk. Setting it too low can slow queries, since sort operations are used for ORDER BY, DISTINCT, and merge joins, while hash tables are used in hash joins and hash-based aggregation.
Recommended value: 25% of total RAM divided by max_connections
3. effective_cache_size
Informs the planner how much memory is available for disk caching in the kernel/OS; it does not allocate anything itself. With higher values, index scans are more likely to be chosen; with low values, the planner favors sequential scans and may avoid otherwise useful indexes.
Recommended value: 50% of total RAM
4. wal_buffers
The wal_buffers setting controls the amount of shared memory used for Write-Ahead Log (WAL) data that has not yet been written to disk. By default it is auto-sized to 1/32 of shared_buffers, capped at the size of one WAL segment (typically 16 MB); a higher explicit value can improve performance on a busy, write-heavy server.
Recommended value: 16MB or higher
5. effective_io_concurrency
Defines the number of concurrent disk I/O operations that the underlying storage can service simultaneously. The allowed range is 1 to 1000, or zero to disable asynchronous I/O requests.
Recommended value: HDD - 2, SSD - 200, SAN - 300
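Putting the five together, a postgresql.conf sketch for a hypothetical dedicated server with 32 GB of RAM, max_connections = 100, and SSD storage might look like this (illustrative values only; tune against your own workload):

```
# postgresql.conf - illustrative values for 32 GB RAM, max_connections = 100, SSD
shared_buffers = 8GB              # ~25% of RAM
work_mem = 80MB                   # ~25% of RAM (8192 MB) / 100 connections
effective_cache_size = 16GB       # ~50% of RAM
wal_buffers = 16MB                # raise further on write-heavy servers
effective_io_concurrency = 200    # SSD
```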
Query tuning, object optimization, maintenance windows for VACUUM and ANALYZE, monitoring and reacting to blocked processes/locks, and analyzing and fine-tuning long-running and slow queries (including index optimization, access paths, and the right join type) will be discussed in the next blog post.
-HTH :)