Questions and Answers

CPU Usage

System CPU time

3 October 1999

What tools can I use to determine which process is causing increases in system time? I am on Digital Unix 4.0d running on Alpha 8400.

Will increasing the min_free_list (or similar) or decreasing the UBC cache help reduce the system time from the following vmstat snapshot? I know you don't have enough info here ... but please point me in the right direction.

   procs         memory                      pages                     intr          cpu
  r    w   u   act  free  wire  fault  cow  zero react  pin  pout   in   sy   cs  us  sy  id
  7 1109  33  231K    98   24K   225M  32M   63M    2M  23M   69K  152   7K   2K  11   4  86
  7 1104  37  231K   105   24K   1966  113  1664    45   99     0  983  27K   7K  29  22  48
 11 1105  35  231K    96   24K   3982  240  1182   140  130     0   1K  31K   9K  23  26  51
  9 1103  37  231K   119   24K    15K 3177  6147     0 2514     0  928  35K   7K  40  42  18
 11 1105  35  231K   181   24K   4975  486  1745    10  372     0  806  40K   9K  41  28  31
 10 1103  34  231K   156   24K   6618  519  2760   514  354     0  728  27K   6K  36  27  36
  5 1108  34  231K   176   24K   3896  109  1350     8  181     0  633  23K   6K  31  24  46
 11 1103  32  231K   310   24K   2208  149  1314     4  129     0  590  29K   7K  32  28  40
  8 1104  34  231K   137   24K   2989  248  1351     3  169     0  679  26K   6K  46  19  34

A common misconception about system mode CPU usage is that it is caused by system processes rather than user processes. This is not correct. Most system time is normally accounted for by user processes executing in system mode. Whenever a user process makes a true system call, the CPU switches into system mode so that it can access and update kernel data structures that the user process would not otherwise be able to see or change. For example, even a simple read or write system call uses a fair amount of system mode CPU time.
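
To see that system time is charged to an ordinary user process, here is a minimal Python sketch (an illustration only, not a diagnostic tool). It assumes a Unix-like system with /dev/zero and uses the standard resource module to compare the user and system CPU time consumed by a loop of 1-byte read() calls; the function name cpu_split_for_reads is hypothetical.

```python
import os
import resource


def cpu_split_for_reads(n_calls=200_000):
    """Issue n_calls 1-byte read() system calls and return the
    (user_seconds, system_seconds) this process consumed doing so."""
    fd = os.open("/dev/zero", os.O_RDONLY)  # assumes a Unix-like OS
    before = resource.getrusage(resource.RUSAGE_SELF)
    try:
        for _ in range(n_calls):
            os.read(fd, 1)  # each call traps into the kernel (system mode)
    finally:
        os.close(fd)
    after = resource.getrusage(resource.RUSAGE_SELF)
    user = after.ru_utime - before.ru_utime
    system = after.ru_stime - before.ru_stime
    return user, system


if __name__ == "__main__":
    user, system = cpu_split_for_reads()
    print(f"user {user:.2f}s  system {system:.2f}s")
```

The system seconds are reported against this process even though the work was done inside the kernel. In Python the user figure is inflated by interpreter overhead, so only the fact that system time appears at all is the point here.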

The sy column under the intr heading in vmstat shows the number of system calls per second over the interval. If all system calls used exactly the same amount of CPU time, there would be a perfect correlation between that column and the system mode CPU usage column. However, the CPU cost of a system call depends heavily on which call it is, on the kernel workload at the time, and on various features of the user environment. If system mode CPU usage seems out of proportion to user mode CPU usage, you need to narrow down which factor is to blame. It may be an I/O intensive activity, such as backups, using a lot of system mode CPU time for read and write calls. It may be contention for a kernel data structure, such as the semaphore data structures, under heavy concurrent load; this would make each semop() call use more CPU time than normal. Or it may be that fork() calls are taking a long time because the parent process has a badly fragmented memory map, due to a history of poor memory management within the process. In short, this can be very difficult to track down and fix.
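
As a rough check of how loosely system calls and system time track each other, the sketch below correlates the intr sy column with the cpu sy column for the eight interval rows of the vmstat output in the question (the first row, which summarizes the period since boot, is left out). Values vmstat printed as "27K" are taken as 27 thousand, so the inputs are already rounded; the helper names are hypothetical.

```python
import math

# Interval rows from the vmstat output in the question (since-boot row excluded).
syscalls_k = [27, 31, 35, 40, 27, 23, 29, 26]   # intr "sy": thousands of calls/s
system_pct = [22, 26, 42, 28, 27, 24, 28, 19]   # cpu "sy": % time in system mode


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def sys_pct_per_kcalls(xs, ys):
    """System-mode CPU % per thousand system calls/s, for each interval."""
    return [round(y / x, 2) for x, y in zip(xs, ys)]


if __name__ == "__main__":
    print(f"correlation: {pearson(syscalls_k, system_pct):.2f}")
    print("sys% per 1K calls/s:", sys_pct_per_kcalls(syscalls_k, system_pct))
```

On these samples the coefficient comes out at about 0.58: related, but far from the perfect correlation that equal-cost calls would give. The interval with 42% system time stands out at 1.2% per thousand calls per second.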

Firstly, you need to have a good idea of the "normal" ratio of user to system CPU usage for your system under a particular load. If you then notice a variation, your first step should be to see whether it can be accounted for by just one or a few processes. If not, you can try to trace which type of system call is responsible. Under HP-UX, the kernel is instrumented to allow this; I'm not sure whether it is possible under Digital Unix, or how you would do it. If you can nail it down to a particular process, you may then be able to profile that process and see where it is burning its CPU time.
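
A minimal sketch of the "know your normal ratio" check, using the user and system percentages from the vmstat output in the question. The since-boot row stands in as the baseline here only for illustration; it averages over quiet periods too, so a real baseline should come from the same workload. The factor of 2.0 is an arbitrary, hypothetical threshold, and the function names are hypothetical.

```python
# (user %, system %) pairs from the vmstat output in the question.
since_boot = (11, 4)  # crude stand-in for a "normal" baseline
intervals = [(29, 22), (23, 26), (40, 42), (41, 28),
             (36, 27), (31, 24), (32, 28), (46, 19)]


def sys_user_ratio(sample):
    """System-to-user CPU ratio for one (user %, system %) sample."""
    user, system = sample
    return system / user if user else float("inf")


def flag_deviations(baseline, samples, factor=2.0):
    """Return indices of samples whose system/user ratio exceeds
    factor times the baseline ratio."""
    limit = sys_user_ratio(baseline) * factor
    return [i for i, s in enumerate(samples) if sys_user_ratio(s) > limit]


if __name__ == "__main__":
    print(flag_deviations(since_boot, intervals))  # [0, 1, 2, 4, 5, 6]
```

With these numbers six of the eight intervals are flagged. The next step would then be to see whether those intervals can be pinned on a few processes.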

 
 
Process binding

15 October 1999

I'm going to be involved with the implementation of a batch processing system on a four- or six-way HP-UX 10.20 box using Oracle 7.3.4. The system consists of three processes that receive files and translate them to a different format. The translated files are then loaded into the database by three daemon-like processes, and the data is then validated and manipulated. In order to minimize the context switching overhead, we are going to bind the translate processes to one or more processors. I have read in Oracle8 & Unix Performance Tuning that you should bind the background Oracle server processes, with the exception of DBWR and LGWR. Do you concur with this?
 
If I understand you correctly, the translate process is C code that you are developing, and you intend to use mpctl() to do the binding? If so, you should check with HP, because the last time I talked with them about this, I was told that it was undocumented and unsupported; that is, not for customers to use.

I disagree strongly with the idea of binding an Oracle process that is not also in the real-time priority class. However, I have seen dramatic improvements in performance in some cases from making the key Oracle background processes real-time without binding. Foreground Oracle processes should not be bound or made real-time, unless the entire instance is real-time.

 
 
Process binding

5 November 1999

Could you elaborate on your comments in your 15 October answer on process binding? I am interested in the part where you said, "I have seen dramatic improvements in performance in some cases from making the key Oracle background processes real-time".

I thought Oracle always recommends leaving all priorities at default. Which background processes have given rise to the observed improvement? Why does it work? Do you invoke it via HP's rtprio command?

 
I have done this with LGWR and DBWR in most cases. In fact, I did it just last week on an SAP site still running 7.2.3 on HP. Yes, using rtprio. The rationale is explained in my book. Basically, it reduces IPC latencies significantly.
 
 
log file sync waits

24 November 1999

We are benchmarking an insert-intensive application, using Oracle 8.0.5 on HP-UX. We would like to get every ounce of performance possible. Presently our big issue is log file sync waits. What can be done about this? A full report.txt is attached.
 
Firstly, try to limit commits to an absolute minimum in the application. Commit only where it is essential for the application logic. Also, consider running LGWR as a real-time priority process. Use rtprio with a priority of 60. If possible, use hardware mirroring for the online log files, rather than Oracle log file multiplexing. Also, make sure that the online log files are raw and on dedicated disks. Use our hold_logs_open.sh script to improve log switch performance. There are a lot of other issues here, but that should keep you busy for a few days. Note that the rtprio change is very important.
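
Each commit makes the session wait for LGWR to write its redo, which is what shows up as a log file sync wait, so where the application logic allows work to be grouped, fewer commits mean fewer of those waits. The sketch below uses Python's built-in sqlite3 purely as a stand-in for an Oracle connection: it only counts commits and does not measure log file sync time. The names load_rows and batch_size are hypothetical.

```python
import sqlite3


def load_rows(rows, batch_size):
    """Insert rows, committing once per batch_size rows (plus once for any
    remainder). Returns the number of commits issued."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    conn = sqlite3.connect(":memory:")  # stand-in for an Oracle session
    try:
        conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, payload TEXT)")
        commits = 0
        for i, row in enumerate(rows, start=1):
            conn.execute("INSERT INTO t (id, payload) VALUES (?, ?)", row)
            if i % batch_size == 0:
                conn.commit()  # in Oracle, each commit waits on a log file sync
                commits += 1
        if len(rows) % batch_size:
            conn.commit()  # commit the final partial batch
            commits += 1
        return commits
    finally:
        conn.close()


if __name__ == "__main__":
    rows = [(i, "x") for i in range(1000)]
    print(load_rows(rows, 1), load_rows(rows, 100))  # 1000 10
```

For 1,000 rows, a batch size of 1 issues 1,000 commits and a batch size of 100 issues 10. Whether a batch can safely span several business transactions is an application decision, not a tuning one.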
 
 
Single-task export

26 January 2000

I remember reading somewhere that one can speed up exports by running them in single-task mode. Does it work, and if so, why?
 
In normal Oracle connections there are two processes: the client process, which has no special permissions, and the shadow or server process, which runs as the Oracle owner and has operating system permissions to attach to and modify the SGA, read and write the data files, and so on. If the two processes are running on the same box, SQL*Net bequeaths the communication between them to the operating system inter-process communication (IPC) facilities. However, IPC involves scheduling latencies. While one process is working, the other is waiting for it, and vice versa. There is always a delay between one process passing the baton to the other and that process being scheduled to run on a CPU. Single-task merges the two functions into a single process, thereby eliminating the IPC latency. People have reported savings of between 5% and 15% in elapsed time when using single-task export.
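
The IPC latency that single-task avoids can be roughly illustrated with a minimal sketch. It assumes a Unix-like system with os.fork and uses a pair of pipes as a stand-in for the client/shadow connection (SQL*Net's bequeath adapter is not modelled); it compares a 1-byte request/reply between two processes with the same "request" handled by a plain in-process function call. The function names are hypothetical.

```python
import os
import time


def two_task_round_trip_us(n=5_000):
    """Mean microseconds for a 1-byte request/reply between two processes."""
    req_r, req_w = os.pipe()
    rep_r, rep_w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child plays the "shadow" process: echo every request back.
        try:
            os.close(req_w)
            os.close(rep_r)
            while True:
                b = os.read(req_r, 1)
                if not b:  # EOF: the client closed its end
                    break
                os.write(rep_w, b)
        finally:
            os._exit(0)
    os.close(req_r)
    os.close(rep_w)
    start = time.perf_counter()
    for _ in range(n):
        os.write(req_w, b"x")
        os.read(rep_r, 1)  # blocks until the child is scheduled and replies
    elapsed = time.perf_counter() - start
    os.close(req_w)
    os.close(rep_r)
    os.waitpid(pid, 0)
    return elapsed / n * 1e6


def single_task_call_us(n=5_000):
    """Mean microseconds for the same 'request' handled by a local function."""
    def serve(b):
        return b
    start = time.perf_counter()
    for _ in range(n):
        serve(b"x")
    return (time.perf_counter() - start) / n * 1e6


if __name__ == "__main__":
    print(f"two-task round trip: {two_task_round_trip_us():.1f} us")
    print(f"single-task call:    {single_task_call_us():.3f} us")
```

Expect the two-process figure to be far larger, because every round trip includes two wake-ups and the scheduling delay described above; the exact numbers depend entirely on the machine and its load.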
 
 
64-bit Oracle

1 February 2000

We are on HP-9000 V series boxes, running HP-UX 11.0. All production databases are on 7.3.4, 32-bit version. Would there be any major performance improvement in going for the 64-bit version of Oracle?
 
It would be worth going 64-bit if you want a VLM (very large memory) Oracle buffer cache to reduce disk I/O. Other than that, I would expect the impact to be minimal but positive. The reason for the difference is that Oracle does all its expression evaluation using longs to maximize precision and then casts the result back to an integer. Long addition and multiplication are much faster in a 64-bit executable, although division is a little slower. On the other hand, the executable itself would be larger, and that might reduce the TLB hit rate.
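
The VLM point comes down to address-space arithmetic. The sketch below computes the theoretical ceiling for a 32-bit and a 64-bit process, assuming an 8 KB database block size. In practice the usable SGA in a 32-bit process is smaller still, because code, stack and libraries share the same 4 GiB; the function names are hypothetical.

```python
def address_space_gib(bits):
    """Size of a flat virtual address space of the given width, in GiB."""
    return (1 << bits) / (1 << 30)


def max_buffers(bits, block_size=8192):
    """Theoretical ceiling on cached blocks if the entire address space
    could be given to the buffer cache (it never can in practice)."""
    return (1 << bits) // block_size


if __name__ == "__main__":
    for bits in (32, 64):
        print(f"{bits}-bit: {address_space_gib(bits):.0f} GiB, "
              f"at most {max_buffers(bits)} 8 KB buffers")
```

For 32 bits this gives 4 GiB and at most 524,288 buffers of 8 KB; the 64-bit limit is so large that physical memory, not addressing, becomes the constraint.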
 
 
ora_kstat

14 February 2000

One of our developers has just brought to our attention a line in the system startup file, inittab, that I have no clue about. Does this look familiar?
orakstat:2:wait:/etc/loadext -l /etc/ora_kstat
Yes, this is associated with using the post-wait driver under AIX.
 
 
Tuning CPU usage

16 February 2000

I'm trying to tune a system that is CPU bound. In terms of the overhead of AIX having to create shadow processes, is there anything to be gained from moving from the conventional two task architecture to MTS? Also, do you know of any web sites that have decent Unix performance white papers, in particular AIX tuning material?
 
MTS will use more CPU, not less. In an Oracle environment, the relative cost of process creation is insignificant. Your three main strategies for reducing CPU usage should be to reduce physical I/O, buffer gets, and parsing. For AIX tuning information, have a look at the chapter on Monitoring and Tuning CPU Use in the AIX Performance Tuning Guide.
 
 
Real-time LGWR

18 February 2000

You suggested that we make LGWR a real-time priority process to speed up our data loads. HP-UX has real-time priority levels from 0 (highest) to 127 (lowest). Should we use the lowest? Also, a Unix guy here is concerned about this.
 
Yes, 127 is fine, as long as it is in the real-time class so that there is no priority degradation. I understand your Unix guy being concerned. If there were a bug in LGWR, it could chew up all of one CPU. On a single-CPU machine, "best practice" is to have a higher priority real-time shell running on the console, just in case. On your multi-CPU machine that is not really necessary.
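
A toy model of why even the lowest real-time priority is enough. It assumes, as we understand HP-UX's layout, that rtprio priorities 0-127 all rank ahead of time-share priorities 128-255, with lower numbers winning. The decay rule and the base value 178 for the shadow process are hypothetical simplifications, not HP-UX's actual scheduling algorithm.

```python
# Toy model of dispatch order (lower number = higher priority).
# Assumed layout: real-time (rtprio) 0-127, time-share 128-255.
RT_LOWEST = 127
TS_WORST = 255


def effective_priority(proc):
    """Priority the toy dispatcher uses for one process."""
    if proc["cls"] == "rt":
        return proc["base"]  # real-time priority is fixed: no degradation
    # Hypothetical decay rule: time-share priority worsens as the process
    # accumulates recent CPU, but never past the bottom of the range.
    return min(TS_WORST, proc["base"] + proc["recent_cpu"] // 4)


def pick_next(runnable):
    """Run the runnable process with the best (lowest) effective priority."""
    return min(runnable, key=effective_priority)


lgwr = {"name": "lgwr", "cls": "rt", "base": RT_LOWEST, "recent_cpu": 500}
shadow = {"name": "shadow", "cls": "ts", "base": 178, "recent_cpu": 0}

if __name__ == "__main__":
    print(pick_next([lgwr, shadow])["name"])      # lgwr
    busy_ts_lgwr = dict(lgwr, cls="ts", base=178)
    print(pick_next([busy_ts_lgwr, shadow])["name"])  # shadow
```

In the real-time class, LGWR at 127 still beats every time-share process however much CPU it has used; left in the time-share class, a busy LGWR degrades behind an idle shadow process.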
 
 
NT performance glitch

16 March 2000

I'm doing a bit of work on a high-performance web system that can only be bounced about once every 6 weeks. It's Oracle 8.0.5 on NT 4.0. There is an odd performance glitch. The average log file sync time on many of the sessions is about 1 second (which is the timeout), but:
a) The average log file parallel write time is about 1/100 seconds
b) The redo allocation latch is not stressed
c) The average log write size is about 2K
When I try to stress the system, I don't get any trouble. Another interesting feature is that we are getting tens of thousands of buffer busy waits per day, also of about 1 second average. I can't think of any reason why buffer busy waits might be related to log file syncs, but the numbers are similar. Any thoughts? I get one chance to bounce the database next week, and it would be quite nice to know that the problem will disappear.
 
This might be an NT priority inversion problem. The priorities of runnable NT threads do not improve with time spent waiting. Therefore, if the CPU is 100% busy, lower priority threads hardly get a look-in. By default, the Oracle threads under NT are in the variable priority class, which means that their priorities are adjusted by the dispatcher according to its rules. This can mean that processes waiting to be posted don't see the post until their timeout expires. It can also mean that low priority processes holding a resource are unable to free that resource until they get a brief random priority boost. The solution is to set ORACLE_PRIORITY in the registry, as documented in appendix C of the Getting Started manual. If this is your problem, you may want to consider using a low real-time priority for Oracle.
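
The starvation described above can be shown with a toy strict-priority dispatcher. Everything in it is a simplification: the priority numbers, the boost interval and the single CPU-bound "hog" thread are hypothetical, and the real NT dispatcher has more rules than this. It only illustrates that, without aging, a low-priority holder never runs while a higher-priority thread keeps the CPU busy, until a boost arrives or its priority is raised.

```python
# Toy strict-priority dispatcher with no aging (higher number runs first).
# All priority values and the boost interval are hypothetical.
HOLDER_PRIO = 6    # low-priority thread that holds a resource
HOG_PRIO = 8       # CPU-bound thread that is always runnable
BOOST_PRIO = 15    # temporary boost given to a starved thread


def ticks_until_release(holder_prio=HOLDER_PRIO, boost_every=None,
                        max_ticks=10_000):
    """Tick at which the holder first gets the CPU (and so frees the
    resource), or None if it starves for the whole run."""
    for tick in range(1, max_ticks + 1):
        prio = holder_prio
        if boost_every and tick % boost_every == 0:
            prio = BOOST_PRIO  # brief anti-starvation boost
        # Without aging, the hog wins every tick unless the holder outranks it.
        if prio > HOG_PRIO:
            return tick
    return None


if __name__ == "__main__":
    print(ticks_until_release())                 # None: starved throughout
    print(ticks_until_release(boost_every=400))  # 400: freed only when boosted
    print(ticks_until_release(holder_prio=9))    # 1: fixed higher priority
```

The last call mimics giving the holder a fixed priority above the competing thread, which is roughly the effect a raised Oracle priority aims for.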
 
 
Copyright © Ixora Pty Ltd