Ixora News - April 2001

Welcome to the April 2001 issue of "Ixora News".
To unsubscribe send a blank email to quit-news@ixora.com.au.

In this issue ...

Histogram helper   - Help with histogram creation
Dependent SQL   - Profiling current table usage
Keeper fix   - A fix to keeper.sql

Histogram helper

Histograms enable the optimizer to more accurately estimate the selectivity of literal predicates against columns with skewed data distributions. This can help the optimizer to choose a better access path and possibly join order for certain queries than might otherwise be the case. To build a histogram Oracle sorts the non-null values for the column and then groups them into the specified number of "buckets" so that each bucket contains the same number of data points (plus or minus one). It then records the end-point values for each bucket in the data dictionary. This is called a height-balanced histogram as opposed to a width-balanced histogram. Width-balanced histograms have a uniform number of distinct values in each bucket and count the number of data points.

To use a height-balanced histogram to estimate the selectivity of predicates against the column, Oracle regards any value that occurs as the end point for more than one bucket as a "popular" value. For popular values, the selectivity of an equality predicate is estimated to be the proportion of histogram endpoints that it spans. Similarly for range predicates. For non-popular values, a density function is used. For example, if a column has 10 popular values and 75% of the rows in the table have popular values for this column, and if there are 1000 distinct values for the column, then the density is (1 - 0.75) / (1000 - 10). That is, it is the proportion of non-popular values divided by the number of non-popular values. This is taken to be the selectivity of equality predicates for non-popular values.

Prior to release 8.1 character values that were not distinct within the first 5 characters were indistinct within histograms because Oracle only stored a numeric representation of the first 5 characters of each endpoint value in the data dictionary. This sometimes made non-popular values appear to be popular and resulted in poor execution plans. From release 8.1 Oracle now records the actual endpoint values in the data dictionary, making histograms more reliable.
An exception to the use of height-balanced histograms is made for columns with fewer distinct values than the number of buckets. In this case Oracle uses a normal width-balanced histogram with one bucket for each value and stores the number of occurrences in place of the ENDPOINT_NUMBER. In this case, the selectivity of all predicates can be predicted exactly.

Histograms can enable dramatic performance gains if literal predicates are used against columns with skewed data distributions. However, it is difficult to determine which columns are skewed enough to warrant histograms, and even more difficult to determine the optimal number of buckets of buckets to use. So most DBAs just create histograms FOR ALL COLUMNS or FOR ALL INDEXED COLUMNS and use either the default (75) or maximum (254) number of buckets throughout. But, redundant histograms on columns with uniformly distributed data, and unduly large histograms on columns with skewed data distributions just increase the CPU cost of parsing and waste space in the shared pool. Therefore histograms should only be created where they are beneficial, and should not be any larger than necessary.

Firstly, there is no point in creating a histogram unless literal predicates are used against the column. Beyond that, a histogram is only warranted if it can identify one or more popular values, or if it can identify a significant skew in the distribution of non-popular values and literal range predicates are used against the column. Accordingly, the number of buckets must be large enough to identify all possible popular values, and the maximum number of buckets should be used if the non-popular values are skewed and subject to range predicates. Of course, a smaller number of buckets can be used if the column has less than 254 distinct values allowing Oracle to use a width-balanced histogram. Please note that histograms on columns that are not indexed can nevertheless be beneficial because they inform the optimizer of the cardinality of their table for the query and can thus influence the join order. The criterion to consider here is whether the column is subject to WHERE clause predicates, not whether it is indexed.

The new APT script consider_histogram.sql does much of the hard work for you. It reports whether a particular column of a particular table has any popular values or a sufficiently skewed data distribution to warrant a histogram. If so, it recommends the optimal number of histogram buckets to use. This script is intended to be used selectively on columns that are subject to literal predicates in expensive queries. It has to do a full scan of the table (or an index on the column) each time you run it, so you won't want to run it needlessly.

Dependent SQL

When considering histogram creation the first thing to check is how the column is used in WHERE clause predicates. In some cases an effective way of doing that is to look at the SQL currently in the library cache on the assumption that it is representative. Similarly, when considering indexing changes or other physical changes that will affect the optimizer, it is important to consider all the SQL statements that access the table, rather than just the "problem query". Again a look into the library cache may quickly identify other SQL statements that would either benefit from or be harmed by the proposed change.

The new APT script dependent_sql.sql can be used to look into the library cache in situations like these. It identifies all the SQL statements that are currently in the library cache and that are dependent on a particular table. Of course, some SQL statements are never cached or are used so infrequently that they will not normally be present in the cache. Therefore detailed application knowledge is needed to confirm any tentative conclusions you may draw from the information obtained with this script.

Keeper fix

Our thanks are due to Connor McDonald and a few very helpful folk at Oracle Support for explaining an apparent anomaly in the management of large free and permanent memory chunks in the shared pool after instance startup. The explanation can be read at http://www.oracledba.co.uk/tips/shared_pool_perm.htm. This was causing the APT script keeper.sql to be almost ineffective for certain shared pool sizes. The script has now been fixed to exclude very large free chunks from the memory counted as having previously been used. Those who have based their keeping strategy on keeper.sql may want to make the same change in their own versions of the script.


Copyright © Ixora Pty Ltd Send Email Home