| From: | Steve Adams |
| Date: | 29-Aug-2000 10:58 |
| Subject: | Library cache latch contention problem |
Thanks for all the information.
I suspect that your shared pool may be too large. The output of the
shared_pool_free_lists.sql script from the Ixora web site would help, as would
the output of the latch_where.sql script.
An increase in session_cached_cursors (and maybe open_cursors) might help. Yes,
this parameter is a good simulation of cursor reuse. Please run the
session_cursor_cache.sql script to check, and please also supply the relevant
statistics from V$SYSSTAT.
Also, check for high version counts using the version_count.sql script. For
completeness, please also check for invalidations in V$LIBRARYCACHE.
The spin_count count change can go either way, and it is hard to say from this
distance.
PL/SQL tables reside in session memory and thus are not protected by latches.
However, execute calls do require library cache (and sometimes shared pool)
latching, so a high execute rate can be a problem.
Yes, fewer, faster CPUs suffer less cache line contention and memory access
contention which reduces the impact of latch contention.
-----Original Message-----
Sent: Sunday, 27 August 2000 2:51
I can only hope that my question will be challenging enough for excusing
the fact of disturbing You again. Please note that although I took the
risk of giving You perhaps to many details about the problem I have, I
will certainly be content with a few hints showing me Your opinion or
some 'further investigation' directions.
So, first a description of our system characteristics and behaviour :
Host (Server): DEC TurboLaser (8400), SMP 10 alpha processors
Oracle Version: 7.3.4, 300 GB Enterprise Customer Care Database
Monitoring: ORACLE 'monitoring stuff', some of Your APT (which don't
require access to X$) and a really 'special' monitoring&tuning tool,
Precise/SQL from PreciseSoft.
Application Tx-Profile: komplex OLTP, 3-tier
Bottleneck Analysis:
Typical behaviour: I/O bound (due to hot-spots), peak: I/O bound and
CPU-bound but no theatening CPU-waits (CPU queue max of 4 times the
#CPUs, and we have a lot of SQL gremlins to tune to mitigate this bottleneck)
ORACLE Bottleneck Analysis:
Response time (waits) is dominated by following events:
'db_sequential_read' (I think these are the hot-spots) and 'latch_free'.
Precise/SQL shows us thet latch contention escalates especially in peak
hour behaviour.
Now, for some bottlenecks (CPU, I/O) we have some solutions but we are
not able to solve the latch contention problem. In order to give You a
better chance for a 'remote diagnose' please notice the following details:
1. ALL database accesses are PL/SQL (Packages) based. Even the primitive
operations INSERT, UPDATE, DELTE and primary key-based or array SELECTS go
through PL/SQL packages. So ALL database requests are something of the
form (PL/SQL calling block):
'result := SOME_PACKAGE.some_stored_procedure (its_signature);'
2. ALL packages, sequences, triggers are pinned, all SQLs uses host variables
3. the application is C++ and accesses the database packages using only
ONE generic class; so for every session there is an object of this class
which sends for the lifetime of the session a 'stream' of calls of the
type I described above, using a single cursor (I am not sure if the
cursor is opened and closed for every call, but there is for sure a
'parse', 'bind' and 'execute' OCI call for each package-based stored
procedure). Supported by the Precise/SQL tool we identified that these
PL/SQL calling blocks are the statements which have the most
'latch_free' waits (these calls/cursors are run-time and so are not pinned).
4. SESSION_CACHE_CURSORS is set to 40
5. Shared Pool is 600 MB
6. The output from the APT latch-Scripts is:
LATCH TYPE IMPACT SLEEP RATE WAITS HOLDING LEVEL
------------------------------------- ----------- ---------- ------------- -----
library cache 229572846 3.99% 162843 5
cache buffers chains 5212093 0.01% 0 1
shared pool 2047662 1.90% 273447 7
redo allocation 1955246 0.39% 0 7
row cache objects 1926733 0.58% 1391 4
system commit number 1630958 0.17% 22244 8
cache buffers lru chain 1012002 0.49% 245054 3
latch wait list 515318 0.24% 1091 9
enqueue hash chains 314375 0.08% 183 4
undo global data 284642 0.18% 0 5
enqueues 146500 0.06% 26345 5
global tx hash mapping 104841 0.29% 0 1
multiblock read objects 65429 0.01% 6990 3
sequence cache 38349 0.05% 1 8
session idle bit 31677 0.01% 0 1
transaction allocation 28941 0.03% 54 8
session allocation 25348 0.02% 0 5
dml lock allocation 24348 0.02% 0 3
list of block allocation 15194 0.04% 0 3
messages 14276 0.02% 0 8
modify parameter values 6670 1.31% 0 0
process allocation 1465 1.06% 0 0
user lock 597 0.11% 0 3
library cache load lock 188 0.01% 0 5
global tx free list 78 0.00% 26 2
cache buffer handles 46 0.00% 1 3
NLS data objects 16 80.00% 0 1
NAME SIMPLE_GETS SPIN_GETS SLEEP_GETS
-------------------------------- -------------------- -------------- --------------
archive control 221 100.00% 0 0.00% 0 0.00%
cache buffer handles 721558 100.00% 0 0.00% 7 0.00%
cache buffers chains 48357994431 99.99% 2958784 0.01% 1950363 0.00%
cache buffers lru chain 96277511 99.64% 122345 0.13% 223243 0.23%
dml lock allocation 51931037 99.99% 230 0.00% 6731 0.01%
enqueue hash chains 210806511 99.94% 52789 0.03% 83669 0.04%
enqueues 124326527 99.96% 3263 0.00% 43471 0.03%
global transaction 150406148 100.00% 0 0.00% 0 0.00%
global tx free list 3991908 100.00% 0 0.00% 26 0.00%
global tx hash mapping 20448295 99.82% 3444 0.02% 33868 0.17%
ktm global data 18969 100.00% 0 0.00% 0 0.00%
latch wait list 124983046 99.86% 13956 0.01% 167504 0.13%
library cache 2753968612 97.72% 9427783 0.33% ####### 1.94%
library cache load lock 1555666 100.00% 2 0.00% 47 0.00%
list of block allocation 19309477 99.98% 154 0.00% 3717 0.02%
messages 36205723 99.98% 729 0.00% 4885 0.01%
modify parameter values 123137 99.68% 3 0.00% 395 0.32%
multiblock read objects 301900707 99.99% 1427 0.00% 18331 0.01%
NLS data objects 4 80.00% 0 0.00% 1 20.00%
process allocation 67320 99.47% 2 0.00% 354 0.52%
redo allocation 294725487 99.68% 285652 0.10% 668334 0.23%
row cache objects 161683191 99.67% 74501 0.05% 453653 0.28%
sequence cache 39934606 99.96% 4974 0.01% 10094 0.03%
session allocation 59910657 99.99% 834 0.00% 6140 0.01%
session idle bit 276990602 100.00% 381 0.00% 8487 0.00%
session switching 20121 100.00% 0 0.00% 0 0.00%
shared pool 40841892 99.26% 9169 0.02% 296959 0.72%
sort extent pool 33474 100.00% 0 0.00% 0 0.00%
system commit number 540556635 99.90% 32635 0.01% 520722 0.10%
transaction allocation 44332826 99.98% 263 0.00% 6591 0.01%
undo global data 93704040 99.83% 53278 0.06% 101724 0.11%
user lock 254690 99.94% 37 0.01% 123 0.05%
NAME SPIN_GETS SLEEP_GETS HIT_RATE
--------------------------- --------- ---------- ---------------
library cache 9372958 54465477 14.68%
cache buffers chains 2955952 1946761 60.29%
redo allocation 284592 666259 29.93%
system commit number 32438 518247 5.89%
row cache objects 74170 451718 14.10%
shared pool 9149 296558 2.99%
cache buffers lru chain 122011 222637 35.40%
latch wait list 13877 166780 7.68%
undo global data 53030 101356 34.35%
enqueue hash chains 52463 83222 38.67%
enqueues 3238 43245 6.97%
global tx hash mapping 3417 33583 9.24%
multiblock read objects 1425 18271 7.23%
sequence cache 4951 10063 32.98%
session idle bit 381 8450 4.31%
dml lock allocation 230 6715 3.31%
transaction allocation 262 6580 3.83%
My 'BIG Question' is the following:
What could be the cause of such a severe 'library cache latch'
contention, i.e demand and 'hold-time' respective? Is it parsing
(despite of pining and SESSION_CACHE_CURSORS), is it the allocation of
PL/SQL tables (which are intensively used by the stored procedures
signatures, or the 'execution' of the PL/SQL code itself due to the
PL/SQL engine ('an inherent contention') ?
Should we:
reduce the Shared Pool Size ?
replace the Stored Procedures which fetch the data with REF CURSOR fetches.
Increase SESION_CACHE_CURSORS (has this parameter the same effect as
cursor re-use ? , which we vcan't use because we have only one C++
cursor in the class)
Reduce the 'spin-count' (it is set now on default) also considering the
observed CPU-saturation during peak hours ?
In this context I just want to further ask You if You think that the
following assertion may hold: when going from a SMP host with fewer
but heavier CPUs (like the Dec) to a SMP host with more but ligther CPUs
(like the Sun E10K) is it likely that the latch contention problem will
escalte due to its inherent 'single thread bound' character (or
'common-sense queueing theory')?