Ixora News - September 2001

Welcome to the September 2001 issue of Ixora News.
To unsubscribe send a blank email to quit-news@ixora.com.au.

In this issue ...

Over committed   - Are your commits hurting you?
Making a hash of it   - Tuning hash joins with event 10104
America under attack   - Free performance audits for former WTC tenants
Performance internals   - An Ixora seminar following OpenWorld in December

Over committed

Many applications commit more frequently than necessary, and their performance suffers as a result. In isolation a commit is not a very expensive operation, but lots of unnecessary commits can nevertheless cause severe performance problems. While a few extra commits may not be noticed, the cumulative effect of thousands of extra commits is very noticeable. Try this test. Insert 1,000 rows into a test table -- first as a single transaction, and then committing after every row. Your mileage may vary, but these results on an otherwise idle system show a performance blowout of more than 100% when committing after every row.

The performance difference observed in such tests may be due three possible factors. Each of these factors can have a major performance impact, particularly in high concurrency environments. First, unnecessary commits cause extra redo generation and can trigger contention for the redo allocation latch. Second, users may have to wait for LGWR to perform physical I/O each time the application commits rather than just once per user transaction. And third, extra commits normally cause unnecessary commit cleanout activity, and thus much higher CPU usage and associated delays.

1. Redo generation

Unnecessary commits generate extra redo at both the beginning and end of each transaction. When a transaction begins, a rollback segment is allocated and a change is made to the transaction table and transaction control record in the rollback segment header block. Similarly, when a transaction commits, a change is made to the transaction status and commit SCN recorded in the rollback segment header block. These are database block changes for which redo must be generated -- the redo opcodes are 5.2 and 5.4 respectively. Unnecessary commits can also cause extra redo generation for data block ITL changes if distinct transactions change multiple rows in the same block. Apart from the extra CPU time needed to generate this redo and apply the changes, the extra redo generation can greatly increase the volume of log file I/O.

The redo generated for a commit (often called a commit marker) normally occurs in a separate redo record and thus requires an extra allocation of space in the log buffer under the protection of the redo allocation latch. In general, LGWR is signaled to write just after a commit marker has been copied into the log buffer, so unnecessary commits also increase redo wastage which further increases the volume of log file I/O. Moreover, LGWR needs the redo allocation latch before it writes to determine which log buffer blocks are available for writing, and again after each write to free the space in the log buffer. Thus frequent commits stress the solitary redo allocation latch in several ways. If each user session commits frequently and if there are many user sessions committing concurrently, contention for the redo allocation latch is inevitable. Of course, the severity of that contention depends on the commit rate and degree of concurrency, but it can be a major scalability limitation.

2. Syncing LGWR

When a session commits it must wait for LGWR to flush its commit marker to disk before returning any response to the user. This delay is called a log file sync wait. In general, log file sync waits are brief and one such wait per user transaction will not be noticed by the end user. However, if each user transaction consists of numerous database transactions because of the introduction of extraneous commits by the application, then the combined log file sync time can easily dominate the response time of the user transaction -- particularly if no other physical I/O is required. It is not uncommon to see systems where the total log file sync time accounts for more than 80% of the overall user response time.

Frequent commits also make LGWR over active in that it is constantly signaled to write out tiny amounts of redo. This causes the operating system priority of the LGWR process to be degraded. Then if system CPU usage is high, LGWR can be starved of CPU time with the result that log file sync waits take longer and can even begin to timeout.

The timeout interval for log file sync waits is set at 1 second. If that is exceeded then the waiting process wakes up and takes the redo allocation latch again to check whether the required redo has in fact been written out (because LGWR might not yet have posted the waiting process). Thus log file sync timeouts exacerbate redo allocation latch contention (and vice versa).
Interestingly, commits from within PL/SQL procedures and anonymous blocks do not normally wait for LGWR to flush their commit marker to disk. Instead, a single log file sync wait occurs at the end of the execute call if necessary.

3. Commit cleanout

Oracle's row-level locking mechanism maintains transaction and row lock information together with the data in the data blocks. After a transaction has committed, that row-level locking information can be cleaned out. The cleanout can be delayed indefinitely, but it is most efficient if it is done at commit time. Therefore, when a transaction commits, it revisits the blocks that it has modified most recently and attempts to clean out its row-level locking information in those blocks. This commit cleanout is limited to 10% of the cache and is subject to numerous restrictions. In particular, the blocks must not yet have been written to disk by DBWn or modified by another transaction.

While commit cleanouts are more efficient than delayed block cleanouts, their cost in terms of CPU usage and buffer cache concurrency control is far from trivial. If frequent commits cause the same blocks to be cleaned out repeatedly by the commit cleanout mechanism, then the total cost of the commit cleanouts can easily exceed the cost of equivalent delayed block cleanouts. In the 1,000 row test above we used PL/SQL to avoid extra log file sync delays, and an otherwise idle system to avoid redo allocation latch contention. Thus much of the delay observed in that test was due to the CPU cost of additional commit cleanout activity.

Given that frequent commits damage performance, why do developers so often use a series of small transactions in preference to fewer larger transactions? There are two main reasons. In OLTP applications developers often introduce extra commits in the mistaken belief that they are needed to avoid lock contention and deadlocks. That may be true in non-Oracle database that do not use row-level locking, but in Oracle any lock contention reflects true data contention and committing prematurely to avoid it actually compromises data integrity as well as performance. (This issue is superbly treated in Tom Kyte's new book.) In batch processing applications the main reason why developers introduce extra commits is to reduce rollback segment space usage. Again, this is a false economy. Apart from degrading performance, the periodic commits can compromise data integrity by allowing concurrent processes to see an incomplete set of changes. Rollback segments should be sized to reflect the requirements of the transactions, not vice versa. Transaction scope should be dictated by the requirements of the business rules and the application logic, and nothing else. Every COMMIT that is not absolutely necessary is absolutely superfluous at best, and may cause data integrity and performance problems at worst.

Making a hash of it

What is the best join mechanism for large joins? Of course, for the best "first rows" response time you cannot go past a nested loops join using an index-based access path to the inner relation. Beyond that, conventional wisdom suggests that a hash join may be appropriate if one of the row sources is relatively small, but otherwise a sort merge join is normally recommended. This reluctance to recommend a hash join, unless one of the row sources is small, reflects an historical distrust of hash joins that is no longer appropriate.

A lot of people got their fingers burnt with hash joins in release 7.3. That was in part due to optimizer issues, but more often due to the fact that their hash joins were poorly tuned. Hash join performance is very sensitive to the setting of the hash_area_size and hash_multiblock_io_count parameters. The problem in release 7.3 was that the default value of 8 blocks for the hash_multiblock_io_count parameter was much too large for the modest hash_area_size settings that were typically being used, and there was no adequate user documentation to explain how to tune these parameters. The problem was alleviated somewhat under release 8.0 (and 7.3.4) by reducing the default hash_multiblock_io_count value to just 1 block. But it has been resolved almost entirely by making the hash_multiblock_io_count parameter self-tuning under Oracle8i, and accordingly it has become a hidden parameter under Oracle9i.

Under Oracle8i and Oracle9i hash joins perform very well without careful tuning, and often much better than equivalent sort merge joins. So if ever you are tempted to use a sort merge join, try making a hash of it instead. The only parameter that you may need to tune is the hash_area_size. For small hash joins, the hash_area_size should be large enough allow the hash join to be completed in memory. For this to be possible, the smaller row source must be comfortably less than 75% of the hash_area_size because no more than 75% of the hash_area_size may be used to buffer row data. For larger hash joins that cannot be completed in memory the rows from each of the row sources are sorted into hash buckets (called partitions) based on the hash value of the join key. The sorted data is buffered in memory and written to temporary segment storage as required. Rows from the corresponding pairs of hash buckets are then hash joined. The hash_area_size should be large enough to ensure that each of these second phase hash joins can be completed in memory in one pass if possible.

The numerical event 10104 can be used to check whether the hash_area_size is large enough for a particular hash join. This event causes detailed hash join statistics to be written to the process trace file. If the trace file contains any statistics about "PHASE 2" of the hash join, that indicates that the hash join was too large to be completed in memory. If the trace file contains any PHASE 2 statistics for which the value "Number of rows left to be iterated over (end of function)" is non-zero, that indicates that one or more second phase hash joins were not able to be completed in memory in one pass. In general, the correct tuning action is simply to make the hash_area_size larger. As with sort memory, this memory is only allocated if needed and is released as soon as the hash join has completed. Even if memory is severely constrained such that it is not possible to fully optimize a large hash join, the hash join will still commonly out-perform an equivalent sort merge join using the same amount of memory.

America under attack

Ixora is offering free performance audits to companies directly affected by the terrorist strike of September 11 in New York and Washington. Once your database system is operational again, we will investigate its performance for up to one week and make detailed performance tuning recommendations free of charge. This offer will remain open for 6 months. Please email ixora@ixora.com.au to take up the offer.

Performance internals

Ixora's principle consultant, Steve Adams will be presenting a special one day seminar at the Marriott Hotel in San Francisco on Friday December 7, just after Oracle OpenWorld. The seminar is called "Oracle Internals and Performance Tuning". Topics to be covered include:
How the optimizer works
Tuning execution plans
Optimizing data density
Tuning data caching
Diagnosing and avoiding deadlocks
Library cache performance
Log buffer performance
Please see the Ixora Seminars page on the Ixora web site for more information.


Copyright © Ixora Pty Ltd Send Email Home