DisCopy


Thursday 17 January 2019

Optimizing Disk I/O - Controlling Physical Data Placement.


Sybase/SAP ASE is robust and popular for its stability, versatility and performance for mixed data loads (Both OLTP and DSS). As database scalability is a biggest concern over years, continuous optimization of critical resources like CPU, MEMORY, effective QUERY writing and Storage for IO is necessary to satisfy the Clients requirements and keep the response-time and throughput inline with expectations.

I would like to write this article on IO/Storage as Disk waits are a huge performance bottleneck among critical resources. This problem becomes more apparent when the amount of data starts to grow so large that 99%+ logical IO is being impossible and some physical IO is always needed. Increase the number of available disk spindles (and thereby reduce the seek overhead) by either create devices in different disks or striping the disks besides portioning the table.

Let us first see the OS Level optimization options later see Sybase options.
If you do not need to know when files were last accessed (which is not really useful on a database server), you can mount your file systems with the -o noatime option. That skips updates to the last access time in inodes on the file system, which avoids some disk seeks.
On many operating systems, you can set a file system to be updated asynchronously by mounting it with the -o async option. If your computer is reasonably stable, this should give you better performance without sacrificing too much reliability. (This flag is on by default on Linux.)

Striping
Striping means that you have many disks and put the first block on the first disk, the second block on the second disk, and the N-th block on the (N MOD number_of_disks) disk, and so on. This means if your normal data size is less than the stripe size (or perfectly aligned), you get much better performance. Striping is very dependent on the operating system and the stripe size, so benchmark your application with different stripe sizes. The speed difference for striping is very dependent on the parameters. Depending on how you set the striping parameters and number of disks, you may get differences measured in orders of magnitude. You have to choose to optimize for random or sequential access.

For reliability, you may want to use RAID 0+1 (striping plus mirroring), but in this case, you need 2 × N drives to hold N drives of data. This is probably the best. However, you may also have to invest in some volume-management software to handle it efficiently.
A good option is to vary the RAID level according to how critical a type of data is. For example, store semi-important data that can be regenerated on a RAID 0 disk, but store really important data such as host information and logs on a RAID 0+1 or RAID N disk. RAID N can be a problem if you have many writes, due to the time required to update the parity bits.

Specific to Sybase (ASE way of accessing Storage):

To make the most of physical database tuning, understand these distinctions between logical and physical devices:
  • The physical disk or physical device is the hardware that stores the data.
  • A database device or logical device is all or part of a physical disk that has been initialized (with the disk init command) for use by Database Server. A database device can be an operating system file, an entire disk, or a disk partition.
  • A segment is a named collection of database devices used by a database. The database devices that make up a segment can be located on separate physical devices.
  • A partition is a subset of a table. Partitions are database objects that can be managed independently. You can split partitioned tables, so multiple tasks can access it simultaneously. You can place a partition on a specific segment. If each partition is on a different segment and each segment has its own database device, queries accessing these tables benefit from improved parallelism.

Improving performance by controlling object placement

Adaptive Server allows you to control the placement of databases, tables, and indexes across physical storage devices, which can improve performance by equalizing the reads and writes to disk across many devices and controllers. For example, you can:
  • Place database data segments on a specific device or devices, storing the database log on a separate physical device so that reads and writes to the log do not interfere with data access.
  • Spread large, heavily used tables across several devices.
  • Place specific tables or nonclustered indexes on specific devices. For example, you might place a table on a segment that spans several devices and its nonclustered indexes on a separate segment.
  • Place the text and image page chain for a table on a separate device from the table. The table stores a pointer to the actual data value in the separate database structure, so each access to a text or image column requires at least two I/Os.
  • Distribute tables evenly across partitions on separate physical disks to provide optimum parallel query performance and improve insert and update performance.

For multiuser and multi-CPU systems that perform a lot of disk I/O, be especially aware of physical and logical device issues and the distribution of I/O across devices:

·         Plan a balanced separation of objects across logical and physical devices.
·         Use enough physical devices, including disk controllers, to ensure physical bandwidth.
·         Use an increased number of logical devices to ensure minimal contention for internal I/O queues.
·         Determine and use a number of partitions that allows parallel scans and meets query performance goals.

If you experience problems due to disk contention and other problems related to object placement, check for and correct these issues:
·         Random-access (I/O for data and indexes) and serial-access (log I/O) processes use the same disks.
·         Database processes and operating system processes use the same disks.
·         Serial disk mirroring.
·         Database logging or auditing takes place on the same disk as data storage.


ASE provides options to Improve I/O performance by:
  • Using segments
  • Partitioning tables for performance
  • RAID devices and partitioned tables

To improve I/O performance in Adaptive Server, you can also try:
  • ·         Spreading data across disks to avoid I/O contention
  • ·         Isolating server-wide I/O from database I/O
  • ·         Separating data storage and log storage for frequently updated databases
  • ·         Keeping random disk I/O away from sequential disk I/O
  • ·         Mirroring devices on separate physical disks
  • ·         Using partitions to distribute table data across devices
A segment is a label that points to one or more logical devices. Use segments to improve throughput by:
  • Splitting large tables across disks, including tables that are partitioned for parallel query performance
  • Separating tables and their nonclustered indexes across disks
  • Separating table partitions and index across the disks
  • Placing the text and image page chain on a disk other than the one on which the table resides, where the pointers to the text values are stored.
In addition, you can use segments to control space usage:
  • Tables or partitions cannot grow larger than their segment allocation. You can use segments to limit the table or partition size.
  • Tables or partitions on other segments cannot use the space allocated to objects on another segment.
  • The threshold manager monitors space usage.
Partitioning a table can improve performance for several types of processes.
  • Partitioning allows parallel query processing to access each partition of the table. Each worker process in a partitioned-based scan reads a separate partition.
  • Partitioning allows you to load a table in parallel with bulk copy.
  • For more information on parallel bcp, see the Utility Programs manual.
  • Partitioning allows tou to distribute a table’s I/O over multiple database devices.
  • Semantic partitioning (range-, hash- and list-partitioned tables) improves response time because the query processor eliminates some partitions.
  • Partitioning provides multiple insertion points for a heap table.
The tables you choose to partition and the type of partition depend on the performance issues you encounter and the performance goals for the queries on the tables.

RAID

A striped redundant array of independent disks (RAID) device can contain multiple physical disks, but Adaptive Server treats such a device as a single logical device. You can use multiple partitions on the single logical device and achieve good parallel query performance.
To determine the optimum number of partitions for your application mix, start with one partition for each device in the stripe set. Use your operating system utilities (vmstat, sar, and iostat on UNIX; Performance Monitor on Windows) to check utilization and latency.

To check maximum device throughput, use select count(*), using the index table_name clause to force a table scan if a nonclustered index exists. This command requires minimal CPU effort and creates very little contention for other resources.

The Adaptive Server optimizer attempts to find the most efficient access path to your data for each table in a query by estimating the cost of the physical I/O needed to access the data, and the number of times each page must be read while in the data cache.

Performance is critically depends on access methods in ASE.
In most database applications, there are many tables in the database, and each table has one or more indexes. Depending on whether you have created indexes, and what kind of indexes you have created, the optimizer’s access method options include:·  
  • Table scan – reading all the table’s data pages, sometimes hundreds or thousands of pages.
  • Index access – using the index to find only the data pages needed, sometimes as few as three or four page reads in all.
  • Index covering – using only an index to return data, without reading the actual data rows, requiring only a fraction of the page reads required for a table scan.

Make sure your IO is controlled with the above options so as to use optimized way of the storage and better throughput and response-time. 



- Thanks for many blogs/sites, especially SAP/Sybase sites,  helps me write this besides my experience :)
.