DisCopy


Tuesday, 7 July 2020

PostgreSQL deets for Tyros (especially for Sybase and MSSQL DBAs :))

Most of us love the elephant, which also resembles Lord Ganesha (Kangiten in Japan), believed to be the first god to be offered pujas/prayers. Fittingly for DBAs, with its scale and size the elephant became a favourite, and for the PostgreSQL community, the logo :)

PostgreSQL, commonly pronounced "Post-GRES," is an open source database and the successor of Ingres: POSTGRES was originally developed in 1986 as a follow-up to INGRES, which was begun in the early 1970s, whereas Sybase started its development in 1984.

POSTGRES was the brainchild of Michael Stonebraker, a computer science professor at Berkeley. In 1994, support for SQL was added and the name PostgreSQL eventually came about; it is now one of the top five leading database products. The portability and flexibility of PostgreSQL, which supports both non-relational and relational data types, make it one of the most compliant, stable, and mature relational databases available today.

Today, PostgreSQL continues to evolve, maintained by a worldwide team that is passionate about regularly improving this free and open source database project. Databases are a key component of applications such as websites and portals, and are at the core of how data is stored, maintained, and accessed across the internet. Being open source and able to scale a single table up to 32 TB, PostgreSQL is one of the best choices to choose from.


A Quick tour to understand the landscape of the PostgreSQL database:

PostgreSQL Architecture - PostgreSQL consists of shared memory, background processes and data files.

Shared Memory

Shared Memory refers to the memory reserved for database caching and transaction log caching. The main components in shared memory are Shared Buffer and WAL buffers.

1.   Shared Buffer

It is like the data cache of Sybase. It is designed to:

·         Provide fast access to very large (tens or hundreds of gigabytes) buffers.

·         Minimize contention when many users access it at the same time.

·         Keep frequently used blocks in the buffer for as long as possible.

2.   WAL Buffer

It is like the Sybase transaction log: it temporarily stores changes to the database. WAL buffer contents are written to the WAL file at predetermined points in time. WAL buffers and WAL files are very important for recovery.
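The current sizes of these two memory areas can be checked from any session; a quick sketch (the values in the comments are the usual defaults, not guaranteed):

```sql
-- Inspect the main shared memory settings.
SHOW shared_buffers;   -- commonly 128MB out of the box
SHOW wal_buffers;      -- -1 means sized automatically (1/32 of shared_buffers)
```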

 

PostgreSQL has four process types.

1.   Postmaster Process

It's like the dataserver component of Sybase.

The Postmaster process is the main process started when we start PostgreSQL. It performs recovery, initializes shared memory, and starts the background processes. It also forks a backend process whenever there is a connection request from a client process.

2.   Background Process

The main background processes of PostgreSQL are listed below.

Process – Functionality

1.         logger – Writes error messages to the log file.

2.         checkpointer – When a checkpoint occurs, writes all dirty buffers to the data files.

3.         writer – Periodically writes dirty buffers to the data files.

4.         wal writer – Writes the WAL buffer to the WAL file.

5.         autovacuum launcher – Forks autovacuum workers when autovacuum is enabled. It is the responsibility of the autovacuum daemon to carry out vacuum operations on bloated tables on demand.

6.         archiver – When archive mode is on, copies completed WAL files to the specified directory.

7.         stats collector – Collects DBMS usage statistics such as session execution information ( pg_stat_activity ) and table usage statistical information ( pg_stat_all_tables ).


    3.   Backend Process

 It's like the number of user sessions in Sybase.

  • The maximum number of backend processes is set by the max_connections parameter; the default value is 100.
  • Each backend needs some memory structures for query execution, which is called local memory. The main parameters associated with local memory are:
  • work_mem – space used for sorting, bitmap operations, hash joins, and merge joins. The default setting is 4 MB.
  • maintenance_work_mem – space used for VACUUM and CREATE INDEX. The default setting is 64 MB.
  • temp_buffers – space used for temporary tables. The default setting is 8 MB.
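These local-memory settings can be checked, and work_mem even raised, per session; a small sketch (the 32MB value is just an example):

```sql
-- Check the current local-memory settings.
SHOW work_mem;              -- default 4MB
SHOW maintenance_work_mem;  -- default 64MB
SHOW temp_buffers;          -- default 8MB

-- Raise work_mem for the current session only, e.g. before a heavy sort.
SET work_mem = '32MB';
```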

4.   Client Process

The client process is the user application (frontend) that requests a connection to the database. For every such connection, the postmaster forks a dedicated backend child process to serve it.


Data Files/Items for the Database/Instance

1.    The pg_default and pg_global tablespaces are created immediately after initdb.

2.    If we do not specify a tablespace at table-creation time, the table is stored in the pg_default tablespace.

3.    Tables managed at the database cluster level are stored in the pg_global tablespace.

4.    The physical location of the pg_default tablespace is $PGDATA/base.

5.    The physical location of the pg_global tablespace is $PGDATA/global.

6.    One tablespace can be used by multiple databases; in that case a database-specific subdirectory is created inside the tablespace directory.

7.    Creating a user tablespace creates a symbolic link to it in the $PGDATA/pg_tblspc directory.
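As a sketch, a user tablespace can be created and listed as follows; the tablespace name and directory are hypothetical, and the directory must already exist and be owned by the postgres OS user:

```sql
-- Create a user tablespace (example path) and place a table in it.
CREATE TABLESPACE app_tbs LOCATION '/data/pg_tbs/app_tbs';
CREATE TABLE app_log (id int) TABLESPACE app_tbs;

-- List all tablespaces and their physical locations.
SELECT spcname, pg_tablespace_location(oid) FROM pg_tablespace;
```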


The data files per table (there are three files):

·         One file stores the table data; its name is the OID of the table.
·         One file manages the table's free space; its name is OID_fsm.
·         One file manages the visibility of the table's blocks; its name is OID_vm.
·         An index has no _vm file, i.e. it consists of only the two files OID and OID_fsm.
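The data file behind a table can be located with pg_relation_filepath(); for a hypothetical table mytab:

```sql
-- Path of the table's data file, relative to $PGDATA (e.g. base/16384/16385).
-- The OID_fsm and OID_vm files sit alongside it in the same directory.
SELECT pg_relation_filepath('mytab');
```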


How to check whether PostgreSQL is installed and running?

PostgreSQL is supported on three major platforms. To connect using psql, we first need to locate the client on the operating system we are using:

1.    Linux

                         [root@localhost]# which psql

 

2.   Windows:

On Windows, we can find psql in the Program Files, and we should be able to launch it in a command prompt simply by clicking on it.

3.   Mac:

On a Mac we should be able to locate psql under the Applications > PostgreSQL (version number) > SQL Shell (psql). 

psql -V                                     –          to check whether PostgreSQL is installed

systemctl status postgresql     –          to find the status of the PostgreSQL server

 

How to connect to PostgreSQL?

The basic parameters to connect to postgres database on any of the platforms are as follows:

Server [localhost]: IPADDRESS of the Host

This is the address for the server. We can use an IP address or the hostname of the machine on which database server is running. If we do not give any value, by default it is localhost.

Database [postgres]: Similar to Master database in Sybase/MSSQL

The name of the database with which we want to connect. The default name of the database is the same as that of the user. (If we are using Advanced Server, the default name would be edb.)

 

Port [5432]: By default the postgres port is always 5432. Similar to 5000 for Sybase

Username [postgres]: The default username is postgres. Similar to sa for Sybase

 

On a Mac or Windows, we are able to connect to the default instance by simply hitting enter at the shell or command prompt when trying to run psql and keying in the password.

On Linux, we will explicitly need to pass these parameters at the shell:

psql -d postgres -U postgres 

(similar to isql; additional optional parameters can be added)

 

Top 10 commands to understand the environment:   (-- Sybase equivalent command)

1)   Finding the version of the PostgreSQL                -- select @@version

SELECT version();    

 

2)   Finding the configuration:                       -- sp_configure

SELECT * from pg_settings;

 

The view pg_settings provides access to run-time parameters of the server. It is essentially an alternative interface to the SHOW and SET commands. It also provides access to some facts about each parameter that are not directly available from SHOW, such as minimum and maximum values.

 

3)  Finding the databases list and sizes:                    -- sp_helpdb

SELECT pg_database.datname AS "dbname",
       pg_database_size(pg_database.datname)/1024/1024 AS "db_size_in_mb"
FROM pg_database ORDER BY 1;

 

4)  Finding what is currently running:                     -- sp_who and kill

-- show all the processes currently running and active

SELECT pid, age(clock_timestamp(), query_start), usename, query
FROM pg_stat_activity
WHERE state <> 'idle' AND query NOT ILIKE '%pg_stat_activity%'
ORDER BY query_start DESC;

 

-- kill a running query which is running long or hung

SELECT pg_cancel_backend(pid);      -- pid taken from pg_stat_activity (it was called procpid before version 9.2)

 

-- kill an idle process

SELECT pg_terminate_backend(pid);

 

-- Identify the resource contention using wait events

SELECT pid, wait_event_type, wait_event FROM pg_stat_activity

WHERE wait_event is NOT NULL;                            

 

5)   Finding the sizes of the tables, largest first:

SELECT
  nspname || '.' || relname AS "Table",
  pg_size_pretty(pg_relation_size(C.oid)) AS "size"
FROM pg_class C
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE nspname NOT IN ('pg_catalog', 'information_schema')
ORDER BY pg_relation_size(C.oid) DESC LIMIT 10;

 

SELECT indexrelname AS "Index", pg_size_pretty(pg_relation_size(indexrelid)) AS "size"
FROM pg_stat_user_indexes
WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
ORDER BY pg_relation_size(indexrelid) DESC LIMIT 10;

 

6)   Finding statistics of the tables:                 --

SELECT relname AS "TableName", n_live_tup AS "LiveTuples", n_dead_tup AS "DeadTuples",
       round(n_dead_tup * 100.0 / NULLIF(n_live_tup + n_dead_tup, 0), 2) AS "Bloat %"
FROM pg_stat_user_tables;

 

7)  Update statistics and ReORg data of the tables:              --

VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ table ]

 

VACUUM reclaims storage occupied by dead tuples. In normal PostgreSQL operation, records/ rows that are deleted or obsoleted by an update are not physically removed from their table; they remain until a VACUUM is done. Therefore it's necessary to do VACUUM periodically, especially on frequently-updated tables.

 

With no parameter, VACUUM processes every table in the current database that the current user has permission to vacuum. With a parameter, VACUUM processes only that table.

 

VACUUM ANALYZE performs a VACUUM and then an ANALYZE for each selected table. This is a handy combination form for routine maintenance scripts.
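A routine-maintenance sketch for a hypothetical table mytab:

```sql
-- Reclaim dead tuples and refresh planner statistics in one pass.
VACUUM ANALYZE mytab;

-- Heavier variant: rewrites the table and returns space to the OS,
-- but takes an ACCESS EXCLUSIVE lock, so use it with care.
VACUUM FULL VERBOSE mytab;

-- Statistics only, similar to update statistics in Sybase.
ANALYZE mytab;
```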

 

8)  View important parameter files:              --  select * from sysconfigures

select * from pg_file_settings;

show all;

show config_file;

show hba_file;

show ident_file;

 

To modify a parameter value, such as work_mem (similar to sp_configure):

ALTER SYSTEM SET work_mem = '8MB';
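ALTER SYSTEM writes the new value to postgresql.auto.conf; for a parameter like work_mem, which does not require a restart, a configuration reload makes it take effect:

```sql
SELECT pg_reload_conf();   -- re-read the configuration files
SHOW work_mem;             -- confirm the new value
```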

 

9)   Dump database and Restore Databases

PostgreSQL provides two backup tools: pg_dump and pg_basebackup.

 

pg_dump is a utility for backing up a PostgreSQL database. It makes consistent backups even if the database is being used concurrently. pg_dump does not block other users accessing the database (readers or writers).

Dumps can be output in script or archive file formats. Script dumps are plain-text files containing the SQL commands required to reconstruct the database to the state it was in at the time it was saved. To restore from such a script, feed it to psql. Script files can be used to reconstruct the database even on other machines and other architectures; with some modifications, even on other SQL database products.

The alternative archive file formats must be used with pg_restore to rebuild the database. They allow pg_restore to be selective about what is restored, or even to reorder the items prior to being restored. The archive file formats are designed to be portable across architectures.

When used with one of the archive file formats and combined with pg_restore, pg_dump provides a flexible archival and transfer mechanism. pg_dump can be used to backup an entire database, then pg_restore can be used to examine the archive and/or select which parts of the database are to be restored. The most flexible output file format is the "custom" format (-Fc). It allows for selection and reordering of all archived items, and is compressed by default.

Some important options of pg_dump:

-F format, --format=format

Selects the format of the output. format can be one of the following:

 

p - plain

Output a plain-text SQL script file (the default).

 

c - custom

Output a custom-format archive suitable for input into pg_restore. Together with the directory output format, this is the most flexible output format in that it allows manual selection and reordering of archived items during restore. This format is also compressed by default.

 

d - directory

Output a directory-format archive suitable for input into pg_restore. This will create a directory with one file for each table and blob being dumped, plus a so-called Table of Contents file describing the dumped objects in a machine-readable format that pg_restore can read. A directory format archive can be manipulated with standard Unix tools; for example, files in an uncompressed archive can be compressed with the gzip tool. This format is compressed by default.

 

t - tar

Output a tar-format archive suitable for input into pg_restore. The tar format is compatible with the directory format: extracting a tar-format archive produces a valid directory-format archive. However, the tar format does not support compression. Also, when using tar format the relative order of table data items cannot be changed during restore.

 

Various options to dump the database and refresh accordingly:

To dump a database called mydb into a SQL-script file:

$ pg_dump mydb > db.sql

 

To reload such a script into a new database named newdb:

$ psql -d newdb -f db.sql

 

To dump a database into a custom-format archive file:

$ pg_dump -Fc mydb > db.dump

 

To dump a database into a directory-format archive:

$ pg_dump -Fd mydb -f dumpdir

 

To reload an archive file into a new database named newdb:

$ pg_restore -d newdb db.dump

 

To dump a single table named mytab:

$ pg_dump -t mytab mydb > db.sql

 

To dump all tables whose names start with emp in the detroit schema, except for the table named employee_log:

$ pg_dump -t 'detroit.emp*' -T detroit.employee_log mydb > db.sql

 

To dump all schemas whose names start with east or west and end in gsm, excluding any schemas whose names contain the word test:

$ pg_dump -n 'east*gsm' -n 'west*gsm' -N '*test*' mydb > db.sql

 

The same, using regular expression notation to consolidate the switches:

$ pg_dump -n '(east|west)*gsm' -N '*test*' mydb > db.sql

 

To dump all database objects except for tables whose names begin with ts_:

$ pg_dump -T 'ts_*' mydb > db.sql

 

To specify an upper-case or mixed-case name in -t and related switches, you need to double-quote the name; else it will be folded to lower case (see Patterns). But double quotes are special to the shell, so in turn they must be quoted. Thus, to dump a single table with a mixed-case name, you need something like

$ pg_dump -t '"MixedCaseName"' mydb > mytab.sql

 

 

pg_basebackup is used to take base backups of a running PostgreSQL database cluster. These are taken without affecting other clients of the database, and can be used both for point-in-time recovery and as the starting point for a log-shipping or streaming-replication standby server.

pg_basebackup makes a binary copy of the database cluster files, while making sure the system is put in and out of backup mode automatically. Backups are always taken of the entire database cluster; it is not possible to back up individual databases or database objects.

For individual database backups, a tool such as pg_dump must be used.

To create a base backup of the server at mydbserver and store it in the local directory /usr/local/pgsql/data:

$ pg_basebackup -h mydbserver -D /usr/local/pgsql/data

 

To create a backup of the local server with one compressed tar file for each tablespace, and store it in the directory backup, showing a progress report while running:

$ pg_basebackup -D backup -Ft -z -P

 

To create a backup of a single-tablespace local database and compress this with bzip2:

$ pg_basebackup -D - -Ft -X fetch | bzip2 > backup.tar.bz2

 

               We can restore a pg_basebackup backup in two scenarios:

·         Restore into the same directory structure as the backed-up database.

·         Restore into a different directory structure from the backed-up database.

  

10)  Some important system functions - Session Information Functions

Name (Return Type) – Description

current_catalog (name) – name of current database (called "catalog" in the SQL standard)

current_database() (name) – name of current database

current_query() (text) – text of the currently executing query, as submitted by the client (might contain more than one statement)

current_role (name) – equivalent to current_user

current_schema[()] (name) – name of current schema

current_schemas(boolean) (name[]) – names of schemas in search path, optionally including implicit schemas

current_user (name) – user name of current execution context

inet_client_addr() (inet) – address of the remote connection

inet_client_port() (int) – port of the remote connection

inet_server_addr() (inet) – address of the local connection

inet_server_port() (int) – port of the local connection

pg_backend_pid() (int) – process ID of the server process attached to the current session

pg_conf_load_time() (timestamp) – configuration load time

pg_listening_channels() (setof text) – channel names that the session is currently listening on

pg_my_temp_schema() (oid) – OID of session's temporary schema, or 0 if none

pg_postmaster_start_time() (timestamp) – server start time

pg_trigger_depth() (int) – current nesting level of PostgreSQL triggers (0 if not called, directly or indirectly, from inside a trigger)

session_user (name) – session user name

user (name) – equivalent to current_user

version() (text) – PostgreSQL version information
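A few of these combined give a quick snapshot of the current session:

```sql
-- Who am I, where am I connected, and which backend serves me?
SELECT current_user,
       current_database(),
       pg_backend_pid(),
       pg_postmaster_start_time();
```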

 

* Thanks much for the PostgreSQL community and bloggers!
