DisCopy


Thursday 23 June 2022

Interesting Behaviour of PostgreSQL character data types!

Every RDBMS supports multiple data types namely numeric, monetary, character, date(time), binary, Boolean and blobs etc. PostgreSQL supports special datatypes like Geometric, Network-Address, UUID, XML, JSON, Text Search types etc.

Out of these 70% of columns are typically character data types. Character data types are strings of ASCII characters. Upper and Lower case alphabetic characters are accepted literally. There are three kinds of character data types as cited:

1.       fixed-length, blank padded - char

2.       variable-length with limit - varchar                       and

3.       variable unlimited length  - text

The storage requirement for a character datatype for a short string (up to 126 bytes) is 1 byte plus the actual string, which includes the space padding in the case of character. Longer strings have 4 bytes of overhead instead of 1 byte. Long strings are compressed by the system automatically, so the physical requirement on disk might be less. Very long values are also stored in background tables so that they do not interfere with rapid access to shorter column values.

BTW, In any case, the longest possible character string that can be stored is about 1 GB. (The maximum value that will be allowed for n in the data type declaration is less than that. It wouldn't be useful to change this because with multibyte character encodings the number of characters and bytes can be quite different. If you desire to store long strings with no specific upper limit, use text or character varying without a length specifier, rather than making up an arbitrary length limit.)

Most importantly interesting behaviour in PostgreSQL is how it handles these character data types.

There is no performance difference among these three data types, apart from increased storage space when using the blank-padded type, and a few extra CPU cycles to check the length when storing into a length-constrained column. While character(n) has performance advantages in some other database systems, there is no such advantage in PostgreSQL; in fact character(n) is usually the slowest of the three because of its additional storage costs. In most situations text or character varying should be used instead.

 

Source: postgresql.org