| Histogram helper | - Help with histogram creation |
| Dependent SQL | - Profiling current table usage |
| Keeper fix | - A fix to keeper.sql |
To use a height-balanced histogram to estimate the selectivity of predicates against the column, Oracle regards any value that occurs as the end point for more than one bucket as a "popular" value. For popular values, the selectivity of an equality predicate is estimated to be the proportion of histogram endpoints that it spans. Similarly for range predicates. For non-popular values, a density function is used. For example, if a column has 10 popular values and 75% of the rows in the table have popular values for this column, and if there are 1000 distinct values for the column, then the density is (1 - 0.75) / (1000 - 10). That is, it is the proportion of non-popular values divided by the number of non-popular values. This is taken to be the selectivity of equality predicates for non-popular values.
Prior to release 8.1 character values that were not distinct within the first 5 characters were indistinct within histograms because Oracle only stored a numeric representation of the first 5 characters of each endpoint value in the data dictionary. This sometimes made non-popular values appear to be popular and resulted in poor execution plans. From release 8.1 Oracle now records the actual endpoint values in the data dictionary, making histograms more reliable.An exception to the use of height-balanced histograms is made for columns with fewer distinct values than the number of buckets. In this case Oracle uses a normal width-balanced histogram with one bucket for each value and stores the number of occurrences in place of the ENDPOINT_NUMBER. In this case, the selectivity of all predicates can be predicted exactly.
Histograms can enable dramatic performance gains if literal predicates are used against columns with skewed data distributions. However, it is difficult to determine which columns are skewed enough to warrant histograms, and even more difficult to determine the optimal number of buckets of buckets to use. So most DBAs just create histograms FOR ALL COLUMNS or FOR ALL INDEXED COLUMNS and use either the default (75) or maximum (254) number of buckets throughout. But, redundant histograms on columns with uniformly distributed data, and unduly large histograms on columns with skewed data distributions just increase the CPU cost of parsing and waste space in the shared pool. Therefore histograms should only be created where they are beneficial, and should not be any larger than necessary.
Firstly, there is no point in creating a histogram unless literal predicates are used against the column. Beyond that, a histogram is only warranted if it can identify one or more popular values, or if it can identify a significant skew in the distribution of non-popular values and literal range predicates are used against the column. Accordingly, the number of buckets must be large enough to identify all possible popular values, and the maximum number of buckets should be used if the non-popular values are skewed and subject to range predicates. Of course, a smaller number of buckets can be used if the column has less than 254 distinct values allowing Oracle to use a width-balanced histogram. Please note that histograms on columns that are not indexed can nevertheless be beneficial because they inform the optimizer of the cardinality of their table for the query and can thus influence the join order. The criterion to consider here is whether the column is subject to WHERE clause predicates, not whether it is indexed.
The new APT script consider_histogram.sql does much of the hard work for you. It reports whether a particular column of a particular table has any popular values or a sufficiently skewed data distribution to warrant a histogram. If so, it recommends the optimal number of histogram buckets to use. This script is intended to be used selectively on columns that are subject to literal predicates in expensive queries. It has to do a full scan of the table (or an index on the column) each time you run it, so you won't want to run it needlessly.
The new APT script dependent_sql.sql can be used to look into the library cache in situations like these. It identifies all the SQL statements that are currently in the library cache and that are dependent on a particular table. Of course, some SQL statements are never cached or are used so infrequently that they will not normally be present in the cache. Therefore detailed application knowledge is needed to confirm any tentative conclusions you may draw from the information obtained with this script.
| Copyright © Ixora Pty Ltd |
|