How update operations are performed in ASE?
Adaptive Server handles updates in different ways, depending on the changes being made to the data and the indexes used to locate the rows. The two major types of updates are deferred updates and direct updates. Adaptive Server performs direct updates whenever possible.
Direct updates
Adaptive Server performs direct updates in a single pass:
· It locates the affected index and data rows.
· It writes the log records for the changes to the transaction log.
· It makes the changes to the data pages and any affected index pages.
There are three techniques for performing direct updates:
· In-place updates
· Cheap direct updates
· Expensive direct updates
Direct updates require less overhead than deferred updates and are generally faster, as they limit the number of log scans, reduce logging, save traversal of index B-trees (reducing lock contention), and save I/O because Adaptive Server does not have to refetch pages to perform modifications based on log records.
In-place updates
Adaptive Server performs in-place updates whenever possible.
When Adaptive Server performs an in-place update, subsequent rows on the page are not moved; the row IDs remain the same and the pointers in the row offset table are not changed.
For an in-place update, the following requirements must be met:
· The row being changed cannot change its length.
· The column being updated cannot be the key, or part of the key, of a clustered index on an allpages-locked table. Because the rows in a clustered index on an allpages-locked table are stored in key order, a change to the key almost always means that the row location is changed.
· One or more indexes must be unique or must allow duplicates.
· Updates and deletes that involve joins can be performed in direct, deferred_varcol, or deferred_index mode when the table being updated is the outermost table in the join order, or when it is preceded in the join order by tables where only a single row qualifies.
· The affected columns are not used for referential integrity.
· There cannot be a trigger on the column.
· The table cannot be replicated (via Replication Server).
An in-place update is the fastest type of update because it makes a single change to the data page. It changes all affected index entries by deleting the old index rows and inserting the new index row. In-place updates affect only indexes whose keys are changed by the update, since the page and row locations are not changed.
Cheap direct updates
If Adaptive Server cannot perform an update in place, it tries to perform a cheap direct update—changing the row and rewriting it at the same offset on the page. Subsequent rows on the page are moved up or down so that the data remains contiguous on the page, but the row IDs remain the same. The pointers in the row offset table change to reflect the new locations.
A cheap direct update, must meet these requirements:
· The length of the data in the row is changed, but the row still fits on the same data page, or the row length is not changed, but there is a trigger on the table or the table is replicated.
· The column being updated cannot be the key, or part of the key, of a clustered index. Because Adaptive Server stores the rows of a clustered index in key order, a change to the key almost always means that the row location is changed.
· One or more indexes must be unique or must allow duplicates.
· The affected columns are not used for referential integrity.
Cheap direct updates are almost as fast as in-place updates. They require the same amount of I/O, but slightly more processing. Two changes are made to the data page (the row and the offset table). Any changed index keys are updated by deleting old values and inserting new values. Cheap direct updates affect only indexes whose keys are changed by the update, since the page and row ID are not changed.
Expensive direct updates
If the data does not fit on the same page, Adaptive Server performs an expensive direct update, if possible. An expensive direct update deletes the data row, including all index entries, and then inserts the modified row and index entries.
Adaptive Server uses a table scan or an index to find the row in its original location and then deletes the row. If the table has a clustered index, Adaptive Server uses the index to determine the new location for the row; otherwise, Adaptive Server inserts the new row at the end of the heap.
An expensive direct update must meet these requirements:
· The length of a data row is changed so that the row no longer fits on the same data page, and the row is moved to a different page, or the update affects key columns for the clustered index.
· The index used to find the row is not changed by the update.
· The affected columns are not used for referential integrity.
An expensive direct update is the slowest type of direct update. The delete is performed on one data page, and the insert is performed on a different data page. All index entries must be updated, since the row location is changed.
Deferred updates
Adaptive Server uses deferred updates when direct update conditions are not met. A deferred update is the slowest type of update.
In a deferred update, Adaptive Server:
· Locates the affected data rows, writing the log records for deferred delete and insert of the data pages as rows are located.
· Reads the log records for the transaction and performs the deletes on the data pages and any affected index rows.
· Reads the log records a second time, and performs all inserts on the data pages, and inserts any affected index rows.
When deferred updates are required
Deferred updates are always required for:
· Updates that use self-joins
· Updates to columns used for self-referential integrity
· Updates to a table referenced in a correlated subquery.
Deferred updates are also required when:
· The update moves a row to a new page while the table is being accessed via a table scan or a clustered index.
· Duplicate rows are not allowed in the table, and there is no unique index to prevent them.
· The index used to find the data row is not unique, and the row is moved because the update changes the clustered index key or because the new row does not fit on the page.
Deferred updates incur more overhead than direct updates because they require Adaptive Server to reread the transaction log to make the final changes to the data and indexes. This involves additional traversal of the index trees.
For example, if there is a clustered index on title, this query performs a deferred update:
update titles set title = "Portable C Software"
where title = "Designing Portable Software"
Deferred index inserts
Adaptive Server performs deferred index updates when the update affects the index used to access the table or when the update affects columns in a unique index. In this type of update, Adaptive Server:
· Deletes the index entries in direct mode
· Updates the data page in direct mode, writing the deferred insert records for the index
· Reads the log records for the transaction and inserts the new values in the index in deferred mode
Deferred index insert mode must be used when the update changes the index used to find the row or when the update affects a unique index. A query must update a single, qualifying row only once—deferred index update mode ensures that a row is found only once during the index scan and that the query does not prematurely violate a uniqueness constraint.
The update in below pix changes only the last name, but the index row is moved from one page to the next. To perform the update, Adaptive Server:
1. Reads index page 1133, deletes the index row for "Greene" from that page, and logs a deferred index scan record.
2. Changes "Green" to "Hubbard" on the data page in direct mode and continues the index scan to see if more rows need to be updated.
3. Inserts the new index row for "Hubbard" on page 1127.
Below pix shows the index and data pages prior to the deferred update operation, and the sequence in which the deferred update changes the data and index pages.
Assume a similar update to the titles table:
update titles
set title = "Computer Phobic's Manual",
advance = advance * 2
where title like "Computer Phob%"
This query shows a potential problem. If a scan of the nonclustered index on the title column found "Computer Phobia Manual," changed the title, and multiplied the advance by 2, and then found the new index row "Computer Phobic's Manual" and multiplied the advance by 2, the advance wold be very skewed against the reality.
A deferred index delete may be faster than an expensive direct update, or it may be substantially slower, depending on the number of log records that need to be scanned and whether the log pages are still in cache.
During deferred update of a data row, there can be a significant time interval between the delete of the index row and the insert of the new index row. During this interval, there is no index row corresponding to the data row. If a process scans the index during this interval at isolation level 0, it will not return the old or new value of the data row.