For most entities identified in a logical data model, there should be at least one set of attributes which constitute a natural candidate key. For example, a customer may be identified by their social security number, or an item of equipment may be identified by its serial number.
However, instead of such natural keys, a synthetic key (also called a surrogate key) of a single numeric column generated from a sequence number generator may well provide a better primary key. There are six factors to consider in making this decision:
In general, it is necessary to index the natural key regardless of whether a synthetic primary key is used. So adding a synthetic key increases the index maintenance overhead for inserts and deletes.
If a synthetic key is used, the natural key is not available in the child tables, and thus an extra join back to the parent table may be needed in some cases.
Because a synthetic key would be smaller, both the primary and foreign key indexes would be smaller, and quite possibly have shallower B*-trees. If so, joins using these indexes would be faster, as would fast full index scans.
Although the addition of a synthetic key may make a table slightly larger, and hence need more multiblock reads for a full table scan, the related tables containing the foreign key would be significantly smaller and would thus scan more efficiently.
Natural key values do sometimes need to be updated, but synthetic keys do not. If a natural key is used as the primary key, then key updates will have to be cascaded. Apart from the extra work involved, for both the developers and the database engine, such cascading updates can dramatically reduce concurrency.
To use a synthetic key in a distributed environment with disconnected nodes that need to insert records, each node needs to generate a mutually exclusive range of key values from its own sequence generator. The allocation of key ranges needs careful management, whereas there is no such consideration if a natural key is used.
Using a synthetic key severely limits the available partitioning strategies.
The net effect of these considerations is normally to use a synthetic key in preference to a natural key.
There is, however, a major scalability danger with automatically generated synthetic keys. Every insertion to a table requires a corresponding insertion to its primary key index. If the primary key values are being generated in ascending order, then all inserts will need to change the high-order leaf block in the B*-tree. There is an obvious danger here of contention for that block of the index, if several users attempt concurrent inserts. (Whereas the inserts to the table itself can easily be distributed to a variety of blocks by using multiple process freelists).
Prior to Oracle8, the standard strategy to avoid this problem was to ensure that the synthetic key values were not generated in order. This was done by permuting the values generated by the sequence number generator before using them. Various permutation schemes such as adding a leading check digit, or reversing the order of the digits, have been used. These schemes have the effect of distributing inserts evenly over the range of values in the index, thus preventing leaf block contention. In Oracle8, the same effect may be obtained by using a reverse key index.
The major disadvantage of distributing inserts in this way is that the data density of index leaf blocks will be typically only 75% of capacity rather than almost 100%, making fast full index scans on the primary key index less efficient. However, this access path is not typical and is seldom performance critical when used. So, reverse key indexes should be used in general for synthetic primary key indexes.
|Copyright © Ixora Pty Ltd||