Please see the two pieces of the document below headed
"Deleting an index entry" and "Deletion of many rows in the B*Tree"
If blocks that are entirely empty are on a free list why do these still have
to be accessed when traversing the index?
I did some testing and created a table of 1.5 million rows with each row
containing a unique number.
I created a unique index on this id and performed an analyze validate
structure on this index.
After this I did an update of the id field adding the maximum value of this
id to it in order to force
every leaf row to be moved to a different block.
After this I did another analyze validate structure.
I expected to see about 1.5 million deleted rows at a total of 3.0 million
rows.
But this did not happen I had only about 15,000 deleted rows.
But btree bytes increased from about 60Mb to 100Mb and leaf blocks increased
from
about 8,000 to about 13,000. On the other hand btree bytes used remaind a
little blow
60Mb.
This article explains how Oracle B*tree indexes are maintained.
It is intended to assist users who are trying to understand how an Oracle
B*tree index is maintained.
RELATED DOCUMENTS
Oracle8 Server Concepts Manual, Schema Objects, indexes
Oracle8 Server SQL Reference Manual, CREATE INDEX syntax
[NOTE:33343.1] Index Space Usage
[PR:1019722.6] Script to report information on indexes
Oracle version 8 provides five indexing schemes:
B*tree indexes
B*tree cluster indexes
hash cluster indexes
reverse key indexes
bitmap indexes
This article is only concerned with B*tree indexes which are currently the
most commonly used. The theory of B*tree indexes is beyond the scope of
this article; for more information refer to computer science texts dealing
with data structures.
Format of Index Blocks
~~~~~~~~~~~~~~~~~~~~~~
Within a B*tree index, index blocks are either branch blocks, the upper
blocks within the B*tree index, or leaf blocks, the lowest level index
blocks. Branch blocks contain index data that point to lower level index
blocks. Leaf blocks contain every indexed data value and a corresponding
ROWID used to locate the actual row.
Index Block Format
|-----------------------------------------------------|
| |
| Index Block Header |
| |
------------------------------------------------------|
| |
| Space reserved for future updates |
| and inserts of new rows with the |
| appropriate key values |
| |
|-----------------------------------------------------| <- PCTFREE say 10
| |
| Index Key Data |
| |
| |
| |
| |
| |
| |
| |
| |
| |
|-----------------------------------------------------|
B*tree Index Creation
~~~~~~~~~~~~~~~~~~~~~
When a B*tree index is created using the CREATE INDEX statement, the
parameter PCTFREE can be specified. PCTFREE specifies the percentage of
space to leave free for future updates and insertions to an index block.
A value of 0 reserves no space for future inserts and updates. It allows
the entire data area of the block to be filled when the index is created.
If PCTFREE is not specified, it defaults to 10. This value reserves 10%
of each block for updates to existing key values, and inserts of new key
values.
Thus PCTFREE is only relevant at initial index creation. It causes optimal
splitting of the B*-tree in preparation for subsequent growth. The idea is
to do as much splitting as possible during initial creation and avoid having
to pay the penalty later during insertion into the table. This is what a
high PCTFREE setting on an index gives you. However, if your inserted keys
are monotonically increasing (say a date/time field) a PCTFREE=0 is best.
Only the rightmost index leaf block will be inserted into, so there's no
point leaving room in the other leaf blocks at creation time.
Following index creation, an index block can accommodate keys up to the full
available data area including space for ITLs. Thus an index block
will not require splitting until the available data area is fully used.
The bottom line is PCTFREE is not looked at once you pass the index creation
phase. One thing to remember is that each row in the index has only one
correct block it can live in, based on the key value.
Inserting an index entry after index creation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
After index creation, a new table row will create a new index entry. This
entry is inserted into the appropriate index leaf block based on the index
key values until the leaf block is full. If on insert the index leaf block
is full, then an index block split will occur putting half of the index
entries into each new index leaf block. Within an index data block, space
is reserved for the index block header. The rest of the block is available
for index keys.
Where an index key value has multiple entries, these entries are made into
the leaf block in ROWID order. So all the blocks for the index key value
are scanned in turn until an entry is found with a greater ROWID than the
new row, and the row is inserted into that block (which may cause extra
block splitting).
The reason for this is for queries such as:
SELECT some_columns FROM table WHERE COL_A = valueA and COL_B = valueB;
If COL_A and COL_B both have non-unique indexes, then because the entries
in each index are stored in ROWID order, it makes it easy to find the
ROWID's that occur in both indexes. Otherwise we would have to sort the
ROWID's before we could find the ROWID's that occur in both indexes.
Updating an index entry
~~~~~~~~~~~~~~~~~~~~~~~
There is really no concept of an UPDATE to an index. When a table row is
updated, the old index key is deleted and a new key inserted (at the
correct location in the B*tree).
Deleting an index entry
~~~~~~~~~~~~~~~~~~~~~~~
When a index entry is to be deleted, the row is deleted from the index
leaf block and the space within the index leaf block released to the block
for further inserts with the appropriate key range. If a leaf block has
even one entry it is still part of the tree, hence only entries that belong
in it positionally can be accommodated. Once a leaf block is completely
empty it is put on the free list, at which point it can be used to service
an index block split.
Index Fragmentation
~~~~~~~~~~~~~~~~~~~
To ascertain index fragmentation, the following SQL statement can be
used:
ANALYZE INDEX &&index_name VALIDATE STRUCTURE;
col name heading 'Index Name' format a30
col del_lf_rows heading 'Deleted|Leaf Rows' format 99999999
col lf_rows_used heading 'Used|Leaf Rows' format 99999999
col ibadness heading '% Deleted|Leaf Rows' format 999.99999
SELECT name,
del_lf_rows,
lf_rows - del_lf_rows lf_rows_used,
to_char(del_lf_rows / (lf_rows)*100,'999.99999') ibadness
FROM index_stats
where name = upper('&&index_name');
undefine index_name
As a rule of thumb if 10-15% of the table data changes, then you should
consider rebuilding the index.
B*Tree Balancing
~~~~~~~~~~~~~~~~
Oracle indexes are implemented as B* Trees which are always balanced.
In an Oracle B*tree the root of the tree is at level 0. In a very small
B*tree the root block can also be a Leaf block.
In most cases, blocks on levels 0 through N-2 (where N is the height of
the tree) are Branch blocks. Branch blocks do not contain data, they
simply contain separators which are used to navigate from the root block
to the Leaf blocks.
All Leaf blocks are at level N-1 in Oracle B*trees. All data stored in
a B*tree is stored in the Leaf blocks.
The definition of a 'Balanced Tree' is that all the data is on the same
level. Which means that the path to all data in the tree is the same length.
Since all the data is stored in Leaf blocks and all the Leaf blocks
are on the same level the B*trees are always balanced. There is no way
to unbalance a B* tree.
Deletion of many rows in the B*Tree
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If a table has 100,000 rows and 99,999 of 100,000 rows and index entries are
deleted. How is the index balanced?
In this case the rows are deleted from the index, and the empty blocks inserted
onto the index free list. These blocks are still part of the index and will
need to be accessed when traversing the index.
Thus if an index has one entry in it, there is only one block (a root/leaf
block) in the tree. When searching the tree only one block needs to be accessed
to answer the query. If you load a B* Tree with 100,000 rows and get a tree
with say 3 Levels. Levels zero and one are Branch blocks used to access the
data in the Leaf blocks on level 2. When querying this tree you first access
the root block using the search key to find correct Branch block in level
one of the Tree. Next you use the search key and the Branch block to find
the correct Leaf block that should contain the key being sought. So it takes
three block accesses to answer the same query. Now if 99,999 rows were
deleted from the tree the rows are removed from the index but the index
is not collapsed. In this case you still have a 3 level index to store the
one row and it will still take three block accesses to access that row.
The fact that we need three block accesses instead of one does not mean
this tree is unbalanced. The tree is still balanced it just contains a lot
of empty blocks.
The reason for doing this is that, when data is inserted into a Leaf block,
and there is no room for the insert, a very expensive operation called a split
occurs. The split creates a new Leaf block and possibly new Branch
blocks as well to maintain the balance of the tree. The split operation is
by far the most expensive operation that is done in the maintenance of B* trees
so we go to great lengths to avoid them. By not collapsing the now unused levels
out of the B*Trees after large deletes these levels (with splits) do not
have to be recreated during future inserts.
If most of the rows in a table are going to be deleted, and not refilled soon
after, it is advisable to drop the index, delete the rows, and then recreate
the index.
By dropping the index you save the index maintenance that needs
to be done during the delete thus speeding up the delete. By recreating the
index after the delete you create an index of optimal height and with
the Leaf blocks filled to an optimal level.