| The best sort | - Using event 10032 to set sort_area_size |
| What's in a name? | - Why to avoid long file names and big directories |
| Large log buffers | - Not so bad since Oracle8 |
| Copyright issues | - A few people have been naughty |
| Practical Oracle8i | - A new book from Jonathan Lewis |
The performance of a disk sort is largely a factor of the number of merge passes required, because that is what dictates the amount of temporary tablespace I/O. The number of comparisons performed is an almost trivial factor.
In a disk sort the input rows are sorted into a number of "sort runs". These sort runs are then merged. The number of sort runs that can be merged simultaneously is called the "merge width". If the initial number of sort runs is greater than the merge width, then multiple merge passes will be required. The merge width depends both on the sort_area_size and the sort_multiblock_read_count. Up to 90% of the sort_area_size may be used for read buffers during a merge, and the rest is used for write buffers. The size of each read buffer is sort_multiblock_read_count * db_block_size, and 2 read buffers are needed per sort run to enable asynchronous read ahead (but only if disk_asynch_io is TRUE). The same write buffer configuration is used during the initial sort phase as during the merge phase, therefore, the size of each sort run is approximately 90% of the sort_area_size.
This information can be used to calculate the sort_area_size that would be needed to avoid a secondary merge phase for a large sort. For example, if it is known that the largest sort in a certain batch process sorts 12G of data, and the sort_multiblock_read_count is 8 blocks (of 8K each), then the number of initial sort runs will be 12G / (0.9 * sort_area_size), and the maximum merge width will be round(0.9 * sort_area_size / 64K) / 2. Therefore, to ensure that the number of initial sort runs is no greater than the maximum merge width, the sort_area_size must be at least sqrt(12G * 64K * 2 / 0.81) or about 43.6M. There would be virtually no benefit in using a sort_area_size greater than 43.6M but less than 12G.
The only other consideration is the number of comparison operations performed, and thus the CPU time used, during the sort and merge phases. A larger sort_area_size requires more comparisons during the initial sort phase because sorting is anThis technique can be used to set the sort_area_size to the optimum value for batch processes that do large disk sorts. First you need to determine the largest disk sort that is performed by the batch process. This can be done by setting event 10032 at level 1 in the session and then examining the process trace file. The trace file will contain the following information for each disk sort performed.n log(n) complexity operation, but having fewer sort runs saves comparisons during the merge phase (which has linear complexity). These two factors largely cancel each other out.
The size of each disk sort can be determined from the sort_area_size and the number of initial sort runs required. The optimal sort_area_size can then be calculated as explained above. If the batch process is then tuned to use this sort_area_size in its session, the "Number of merges" shown in the trace file will be 1 even for the largest disk sorts. In this case, giving the session its own sort_area_size of 44M resulted in a 71% reduction in elapsed time.---- Sort Parameters ------------------------------ sort_area_size 1048576 sort_area_retained_size 1048576 sort_multiblock_read_count 8 max intermediate merge width 7 ---- Sort Statistics ------------------------------ Initial runs 13553 Intermediate runs 2258 Number of merges 2259 ...
The process of consulting a directory to get the inode number for a particular file is called an iget(). Some inode gets can be resolved without disk access if the inode for the directory is in the inode cache and if the directory has sufficiently few and small entries for the data to be stored directly in the inode. Such directories are called immediate directories. If the directory is not an immediate directory, then its data blocks must be read from disk sequentially until the required file name and inode number have been found. Directory blocks are small and are not necessarily contiguous so directory access can be expensive. For this reason, it is better to have deep directory structures with a low branching factor, rather than a shallow directory structure with many files in each directory. In particular, beware of allowing large numbers of files to accumulate in the archive, audit and dump destination directories. If this does occur, the directory concerned should be entirely removed and then recreated. Ideally, the sum of the lengths of all the file names in a directory should never exceed 16K.
It is also best to ensure that frequently accessed files, particularly sub-directories are created first in each directory so that they can be found more quickly. Many file systems have an administrative utility to perform this optimization for you. Another saving is available if you use symbolic links. The path to a symbolic link can be stored directly in the inode if it is no more than 59 bytes in length.
On Unix, name cache statistics are available from the sar -a command, which reports the number of inode gets, path name lookups and directory block reads per second. Think twice before representing these figures as a hit rate, because multiple inode gets may be needed to resolve a single path name lookup. If your application uses a large number of files, you may be able to improve the name cache performance by enlarging the cache. The size of the name cache can normally be tuned via a kernel parameter, commonly ncache. In some cases, such as HP-UX, the size of the name cache cannot be tuned directly, but is derived from the size of the inode cache.
One more important thing to know about the name cache is that making it bigger may have no effect on performance if you use long directory and file names. Most systems only cache file and directory names of 14 bytes or less. However, this is changing. For example, Solaris now caches names up to 31 bytes in length, and HP-UX now caches names up to 38 bytes in length. Check the system header files on your system to see what the limit is on your operating system, and remember to allow for a terminating null byte. In particular, beware of using long names for the top level directories in your Oracle file systems. A single long directory name can cause name cache misses and thus directory access for every inode below it in the file system.
| Copyright © Ixora Pty Ltd |
|