| Fingerprints | - Using V$LATCH_MISSES to characterize latching problems |
| Buying time | - How purchasing decisions affect Oracle CPU usage |
| No more double trouble | - The Oracle8i snapshot problems are fixed |
Latch waits only begin to be a problem when there are frequently two or more processes waiting for one latch. Because when that latch becomes free, the waiting processes will contend for the latch. Only one can be immediately successful. The other waiting processes will use CPU time ineffectively as they continue to spin and wait. Latch contention intensifies if the number of waiting processes continues to increase, but it dissipates quickly when the underlying problem is resolved.
There are three types of latch contention problems:
This type of latch contention is sensitive to the value of the _spin_count parameter. If the spin cycle is shorter than the length of time for which the latch is typically retained, then if two processes request the latch in quick succession the second process will stop spinning and go to sleep before the first process has finished using the latch. In these cases adjusting the spin count can be very effective. But this should only be done if it is not feasible to reduce the average time of latch retention.
If the contention affects a set of child latches evenly, such as the cache buffers lru chain latches, then raising the number of child latches will normally help. But the best and often the only way to address this problem is to reduce the number of gets on the latch. Raising the spin count does not help this type of latch contention at all. It may make the spin hit rate statistics look better and reduce dead time on the latch slightly, but the increased cost of spinning will more than negate the savings. If for some reason you cannot reduce load on the latch, and if it is not a "long" latch for which latch posting is enabled by default, then a better way to reduce latch dead time is to set _latch_wait_posting to 2 to enable the latch wait posting facility for all latch types.
These statistics may not be entirely accurate, because they are not always updated under the protection of a latch. Nevertheless, they are not only sufficient to show whether a latch contention problem is due to long retention of a latch or high demand for that latch, but they also show exactly which location in the Oracle kernel is causing the problem. Thus, the V$LATCH_MISSES data is an excellent fingerprint for any latch contention problem. Here are two rather unusual examples from tuning assignments that we have completed in the last month:
What you may not realize is that main memory access is very expensive in terms of CPU cycles, and accounts for most of Oracle's CPU usage. Memory operates at much lower hardware clock speeds than CPUs do, and there is also a recovery time component required after each memory access before that memory bank can service another memory access. This is why computer manufacturers put so much effort into their CPU caching technology. Three types of information are cached in each CPU: user data, executable code, and page table entries.
Page table entries are cached by each CPU in a translation lookaside buffer (TLB). Before an instruction can be executed, all memory references have to converted from virtual memory addresses to physical memory addresses. If the page table entries required for address translation are not in the TLB, then they have to be fetched from the page tables in main memory. Because main memory access is so slow, system performance is very sensitive to the TLB hit rate. You should therefore attempt to optimize the TLB hit rate by keeping the page tables small, and making sure that you don't do anything to defeat process to processor affinity. To keep the page tables small you should be careful not to make the SGA and PGAs any bigger than necessary; use a large page size for the oracle executable if that is supported; and ensure that intimate shared memory is used where possible. Intimate shared memory allows all Oracle processes to share a single set of page table entries for the SGA. This greatly increases the TLB hit rate, and can have a performance impact of 30% or more.
Not only are page table entries cached in the TLB, but portions (called cache lines) of user data and executable code are cached as well. On multi-CPU systems, sophisticated mechanisms are needed to maintain consistency between main memory and these CPU caches. The performance cost of these cache consistency mechanisms increases exponentially with the number of CPUs. This is the primary reason for the scalability limitation of SMP architectures. You can minimize the performance cost of these cache coherency mechanisms by buying fewer, faster CPUs where possible.
Buying fewer, faster CPUs also reduces the risk of memory access contention between CPUs. CPUs sometimes need to wait to access main memory because the target memory bank is busy servicing (or has recently serviced) a memory access by another CPU. You can further reduce the risk of memory access contention between CPUs by buying a large number of small memory boards. Beyond that, you can reduce the cost of memory access for each CPU by buying the fastest available memory. However, if fast memory implies only a few large memory boards, and if you expect to scale beyond six CPUs, then prefer slower memory in more, smaller boards. Note that there are pitfalls associated with mixing different types of memory in the same system. Avoid this, unless your hardware vendor assures you that it is OK.
These purchasing decisions, together with factors such as the breadth and speed of the memory bus, have a significant impact on memory access performance and thus on the amount of CPU time that Oracle will use. So, the right time to start cutting down on Oracle CPU usage is before you order the hardware!
To support this, the first DML on a snapshot master after a snapshot refresh actually now changes the metadata for the base table to record the SCN of the DML operation in TAB$.SPARE3. Snapshot refresh operations record their SCN in SUM$.LASTREFRESHSCN. When a refresh is due, if the last refresh SCN is still higher than the SCN in TAB$, then no work is needed. However, maintaining these SCNs makes it necessary to read and update the metadata for simple DML operations, and that means taking X locks in the library cache and the dictionary cache. These locks were the cause of both the problems we described in August, and also caused a variety of library cache deadlock problems.
Oracle have fixed the null-refresh optimization code for 9i by calling the dictionary cache primitives directly to update the metadata, so that the X lock in the library cache will not be needed any more. The changes are not however feasible to backport to Oracle8i, so two short-term fixes have been provided. Firstly, under bug 1348501 the behavior of calling the dictionary cache primitives directly has been simulated, although the library cache locks are still taken. This avoids the performance problems associated the invalidation of dependent SQL statements when the SCN metadata is updated. This fix is part of 8.1.6.3 and available as a patch on some platforms at 8.1.6.2.
Secondly, for those struggling with snapshot related deadlock problems an event has been introduced at 8.1.7.1 under bug 1376209 to disable the troublesome SCN metadata updates. The event 32333 can be set at 6 different levels to disable certain or all aspects of the null-refresh optimization. This fix is available as a patch at 8.1.6.3 on several platforms. Please contact Oracle support (not Ixora) for help with using this event.
| Copyright © Ixora Pty Ltd |
|