From:Steve Adams
Date:27-Mar-2001 23:49
Subject:   Deleted rows and empty blocks in indexes

I'll answer the theoretical question first, and then talk about your test.

To maximize concurrency, and for several other obvious reasons, most changes to the structure of an index have to be performed in recursive transactions that commit independently of the parent transaction (which is also why index leaf blocks need at least 2 transaction slots). Thus structural changes persist even when the parent transaction rolls back. This means that Oracle has to do some moderately surprising things for internal index space management to ensure that blocks cannot be "lost".

When the last key is deleted from an index leaf block, that block is placed on a free list, but it is not yet unlinked from the B*-tree structure. So, whenever a new block is obtained from a free list for an index block split, Oracle first checks that the block contains no extant index entries, and that the interested transactions have all completed. If so, the block is unlinked from its old location in the B*-tree structure and linked into its new location. If not, it is either skipped over or unlinked from the free list, depending on whether it has extant index entries without pending deletes.

Incidentally, blocks are not unlinked from the free list when they are linked into a new location in the B*-tree structure. What actually happens is that the next time a free block is needed, the previously used block will still be on the head of the free list but will now have one or more new index entries, and so it is unlinked from the free list at that time. This can be seen with event 10224.

That should explain why it is sometimes necessary to navigate entirely empty blocks during an index range scan. However, index leaf blocks to the "left" of the minimum extant key, and to the "right" of the maximum extant key will never be visited. This explains why the number of deleted leaf rows shown by INDEX_STATS was much lower than you expected. There were indeed 1.5 million deleted leaf rows in that index, but Oracle did not count those blocks that you would never have to navigate.

Please see the two pieces of the document below headed "Deleting an index entry" and "Deletion of many rows in the B*Tree"

If blocks that are entirely empty are on a free list why do these still have to be accessed when traversing the index?

I did some testing and created a table of 1.5 million rows with each row containing a unique number. I created a unique index on this id and performed an analyze validate structure on this index. After this I did an update of the id field adding the maximum value of this id to it in order to force every leaf row to be moved to a different block. After this I did another analyze validate structure.

I expected to see about 1.5 million deleted rows at a total of 3.0 million rows. But this did not happen I had only about 15,000 deleted rows. But btree bytes increased from about 60Mb to 100Mb and leaf blocks increased from about 8,000 to about 13,000. On the other hand btree bytes used remaind a little blow 60Mb.


This article explains how Oracle B*tree indexes are maintained.

It is intended to assist users who are trying to understand how an Oracle B*tree index is maintained.

RELATED DOCUMENTS Oracle8 Server Concepts Manual, Schema Objects, indexes Oracle8 Server SQL Reference Manual, CREATE INDEX syntax [NOTE:33343.1] Index Space Usage [PR:1019722.6] Script to report information on indexes

Oracle version 8 provides five indexing schemes:

B*tree indexes B*tree cluster indexes hash cluster indexes reverse key indexes bitmap indexes

This article is only concerned with B*tree indexes which are currently the most commonly used. The theory of B*tree indexes is beyond the scope of this article; for more information refer to computer science texts dealing with data structures.

Format of Index Blocks ~~~~~~~~~~~~~~~~~~~~~~

Within a B*tree index, index blocks are either branch blocks, the upper blocks within the B*tree index, or leaf blocks, the lowest level index blocks. Branch blocks contain index data that point to lower level index blocks. Leaf blocks contain every indexed data value and a corresponding ROWID used to locate the actual row.

Index Block Format

|-----------------------------------------------------| | | | Index Block Header | | | ------------------------------------------------------| | | | Space reserved for future updates | | and inserts of new rows with the | | appropriate key values | | | |-----------------------------------------------------| <- PCTFREE say 10 | | | Index Key Data | | | | | | | | | | | | | | | | | | | |-----------------------------------------------------|

B*tree Index Creation ~~~~~~~~~~~~~~~~~~~~~ When a B*tree index is created using the CREATE INDEX statement, the parameter PCTFREE can be specified. PCTFREE specifies the percentage of space to leave free for future updates and insertions to an index block. A value of 0 reserves no space for future inserts and updates. It allows the entire data area of the block to be filled when the index is created. If PCTFREE is not specified, it defaults to 10. This value reserves 10% of each block for updates to existing key values, and inserts of new key values.

Thus PCTFREE is only relevant at initial index creation. It causes optimal splitting of the B*-tree in preparation for subsequent growth. The idea is to do as much splitting as possible during initial creation and avoid having to pay the penalty later during insertion into the table. This is what a high PCTFREE setting on an index gives you. However, if your inserted keys are monotonically increasing (say a date/time field) a PCTFREE=0 is best. Only the rightmost index leaf block will be inserted into, so there's no point leaving room in the other leaf blocks at creation time.

Following index creation, an index block can accommodate keys up to the full available data area including space for ITLs. Thus an index block will not require splitting until the available data area is fully used. The bottom line is PCTFREE is not looked at once you pass the index creation phase. One thing to remember is that each row in the index has only one correct block it can live in, based on the key value.

Inserting an index entry after index creation ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ After index creation, a new table row will create a new index entry. This entry is inserted into the appropriate index leaf block based on the index key values until the leaf block is full. If on insert the index leaf block is full, then an index block split will occur putting half of the index entries into each new index leaf block. Within an index data block, space is reserved for the index block header. The rest of the block is available for index keys.

Where an index key value has multiple entries, these entries are made into the leaf block in ROWID order. So all the blocks for the index key value are scanned in turn until an entry is found with a greater ROWID than the new row, and the row is inserted into that block (which may cause extra block splitting).

The reason for this is for queries such as:

SELECT some_columns FROM table WHERE COL_A = valueA and COL_B = valueB;

If COL_A and COL_B both have non-unique indexes, then because the entries in each index are stored in ROWID order, it makes it easy to find the ROWID's that occur in both indexes. Otherwise we would have to sort the ROWID's before we could find the ROWID's that occur in both indexes.

Updating an index entry ~~~~~~~~~~~~~~~~~~~~~~~ There is really no concept of an UPDATE to an index. When a table row is updated, the old index key is deleted and a new key inserted (at the correct location in the B*tree).

Deleting an index entry ~~~~~~~~~~~~~~~~~~~~~~~ When a index entry is to be deleted, the row is deleted from the index leaf block and the space within the index leaf block released to the block for further inserts with the appropriate key range. If a leaf block has even one entry it is still part of the tree, hence only entries that belong in it positionally can be accommodated. Once a leaf block is completely empty it is put on the free list, at which point it can be used to service an index block split.

Index Fragmentation ~~~~~~~~~~~~~~~~~~~ To ascertain index fragmentation, the following SQL statement can be used:

ANALYZE INDEX &&index_name VALIDATE STRUCTURE;

col name heading 'Index Name' format a30 col del_lf_rows heading 'Deleted|Leaf Rows' format 99999999 col lf_rows_used heading 'Used|Leaf Rows' format 99999999 col ibadness heading '% Deleted|Leaf Rows' format 999.99999

SELECT name, del_lf_rows, lf_rows - del_lf_rows lf_rows_used, to_char(del_lf_rows / (lf_rows)*100,'999.99999') ibadness FROM index_stats where name = upper('&&index_name');

undefine index_name

As a rule of thumb if 10-15% of the table data changes, then you should consider rebuilding the index.

B*Tree Balancing ~~~~~~~~~~~~~~~~ Oracle indexes are implemented as B* Trees which are always balanced.

In an Oracle B*tree the root of the tree is at level 0. In a very small B*tree the root block can also be a Leaf block.

In most cases, blocks on levels 0 through N-2 (where N is the height of the tree) are Branch blocks. Branch blocks do not contain data, they simply contain separators which are used to navigate from the root block to the Leaf blocks.

All Leaf blocks are at level N-1 in Oracle B*trees. All data stored in a B*tree is stored in the Leaf blocks.

The definition of a 'Balanced Tree' is that all the data is on the same level. Which means that the path to all data in the tree is the same length. Since all the data is stored in Leaf blocks and all the Leaf blocks are on the same level the B*trees are always balanced. There is no way to unbalance a B* tree.

Deletion of many rows in the B*Tree ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ If a table has 100,000 rows and 99,999 of 100,000 rows and index entries are deleted. How is the index balanced?

In this case the rows are deleted from the index, and the empty blocks inserted onto the index free list. These blocks are still part of the index and will need to be accessed when traversing the index.

Thus if an index has one entry in it, there is only one block (a root/leaf block) in the tree. When searching the tree only one block needs to be accessed to answer the query. If you load a B* Tree with 100,000 rows and get a tree with say 3 Levels. Levels zero and one are Branch blocks used to access the data in the Leaf blocks on level 2. When querying this tree you first access the root block using the search key to find correct Branch block in level one of the Tree. Next you use the search key and the Branch block to find the correct Leaf block that should contain the key being sought. So it takes three block accesses to answer the same query. Now if 99,999 rows were deleted from the tree the rows are removed from the index but the index is not collapsed. In this case you still have a 3 level index to store the one row and it will still take three block accesses to access that row. The fact that we need three block accesses instead of one does not mean this tree is unbalanced. The tree is still balanced it just contains a lot of empty blocks.

The reason for doing this is that, when data is inserted into a Leaf block, and there is no room for the insert, a very expensive operation called a split occurs. The split creates a new Leaf block and possibly new Branch blocks as well to maintain the balance of the tree. The split operation is by far the most expensive operation that is done in the maintenance of B* trees so we go to great lengths to avoid them. By not collapsing the now unused levels out of the B*Trees after large deletes these levels (with splits) do not have to be recreated during future inserts.

If most of the rows in a table are going to be deleted, and not refilled soon after, it is advisable to drop the index, delete the rows, and then recreate the index.

By dropping the index you save the index maintenance that needs to be done during the delete thus speeding up the delete. By recreating the index after the delete you create an index of optimal height and with the Leaf blocks filled to an optimal level.