| Basic queuing | - An undocumented SQL statement for implementing a simple queuing mechanism. |
| Tuning the redo copy latches under 8.1 | - There are big changes in how the redo copy latches are used in Oracle 8.1 |
| The 8.1 log_buffer "maximum" | - A shrewd documentation illusion. |
| objects_on_hot_latches.sql | - An APT script explained. |
| Scaling Oracle8i | - An important new Oracle performance book. |
The first big change in the use of the redo copy latches under Oracle 8.1 is that they are now held much longer by foreground processes. Previously the redo copy latch was released as soon as the copy into the log buffer was complete. Under Oracle 8.1 the redo copy latch is retained until the change vectors have been applied to the target buffers. The redo can then be marked as valid in the log buffer prior to releasing the redo copy latch. If the application of the change vectors fails (possibly due to a data corruption or Oracle bug) then the error will be caught before the logging of invalid redo jeopardizes recoverability. However, from a performance point of view, this increases the code path for the redo copy latches and thus increases the risk of sleeps on these latches.
The second big change in the use of the redo copy latches under Oracle 8.1 is that LGWR no longer waits for the redo copy latches directly. Instead, LGWR inspect the redo copy data structures without taking the latches to determine whether there are any incomplete copies into the redo buffers that it intends to write. If so, LGWR sleeps on a LGWR wait for redo copy wait event, and is posted when the required copy latches have been released. So, if a foreground process sleeps while holding a redo copy latch, the LGWR activity will not magnify the problem in the latching statistics as under previous releases.
Tuning
1. From Oracle 8.1, sleeps against the redo copy latches
do now indicate a higher degree of concurrency than supported by the current number of latches.
In such cases, the number of redo copy latches may be raised, using the _log_simultaneous_copies parameter.
Note, however, that there is a fixed limit of 32 redo copy latches,
and that some platforms support shareable redo copy latches.
2. Some waits on the LGWR wait for redo copy wait event should be expected.
The risk is proportional to the sync write rate and the degree of redo generation concurrency,
and may be exacerbated if foreground processes are starved for CPU time.
If waits on this event are unacceptably high, check for log file syncs by DBWn and tune down checkpointing if appropriate.
Otherwise, application tuning to reduce the commit rate and redo generation will be required.
It seems that the documentation is making the point that a large log_buffer is unnecessary. In fact, the suggested starting value for busy systems of 65536 bytes is actually the minimum value possible on most platforms. (The minimum is 4 times the maximum database block size supported on the platform.) Indeed, a large log buffer is not only normally unnecessary and a waste of memory, but can in fact be bad for performance because, in the absence of a rapid commit rate, a large log buffer increases the average log file sync time. Many readers of the documentation have concluded that the apparent new maximum for the size of the log buffer was intended to prevent such performance degradation.
Default value: maximum: 500K or 128K * cpu_count, whichever is greater ... In a busy system, the value 65536 or higher would not be unreasonable.
Others, however, have been quick to cite the chief situation in which a large log buffer is needed - namely, to reduce contention for the redo allocation latch under intensive redo generation. The apparent limit on the size of the log buffer would be inadequate for many such cases.
Now look more closely. "500K or 128K * cpu_count, whichever is greater" is actually the "default value" for the log_buffer parameter. The text "maximum:" is redundant, and just conveys the same meaning as the words "whichever is greater". If the log_buffer parameter were omitted from the init.ora file, the value of the parameter would default to either 500K or 128K * cpu_count, whichever is greater. That is, it would take the "maximum" of these two values. (Incidentally, the first value is actually 512K, not 500K). Thus there is in fact no limit on the size of the log buffer, and it is still possible to use a large log buffer when necessary. Could it be that the documentation is intentionally misleading in order to protect people from using too large a log buffer?
This script is intended for cases in which the contention for the child library cache latches, as shown by V$LATCH_CHILDREN.SLEEPS, is not evenly distributed. That is, when at least one of the latches has seen far more sleeps than the others. Such latches are said to be "hot". The script regards any latch that has seen more than twice the average number of sleeps as hot.
Hot latches are due, at least in part, to intensive access to particular library cache objects. These intensively accessed objects are called the "hot" objects. To resolve the contention, it is necessary to identify the hot objects protected by the hot latches, and to control the mapping of hot objects to latches. There are two cases to consider. In some cases, the contention is due to the fact that multiple hot objects have coincidentally been mapped to the same latch. In other cases, the contention is due to a single extremely hot object, normally an SQL statement.
The objects_on_hot_latches.sql script identifies the objects protected by the hot latches. However, there is a significant complicating factor - namely, that the mapping of library cache objects to latches is not static. It is therefore possible for sleeps against a particular child latch to be due to a coincidence of hot objects on that latch for some period earlier in the life of the instance, but for there no longer to be any such coincidence. This is due to the dynamic resizing of the library cache hash table. Therefore, before objects_on_hot_latches.sql can be used effectively, it is best to stabilize the size of the library cache hash table, possibly using kgl_bucket_count.sql, and thus to prevent the dynamic remapping of objects to latches.
If the problem continues to be a coincidence of hot objects on particular child latches, then it can be addressed by renaming some of those objects carefully, so that their new hash values map onto cold latches. However, it is normally just as effective, and much less intrusive, to permute the effect of the hashing algorithm by adjusting the number of child latches using the _kgl_latch_count parameter.
The more difficult problem to resolve is that of a single extremely hot object. In this case it is necessary to modify the application to use a set of differently named equivalent objects that map to different latches. In the case of an extremely hot SQL statement, it is simply a matter of pseudo randomly introducing one of a set of different comments into the statement text so that each variant of the statement will hash to a different latch. In this way the load for the statement can be borne by multiple latches. We have applied the same principle to a hot sequence, and it could conceivably have other applications as well.
| Copyright © Ixora Pty Ltd |
|