What are the causes of bucket overflow in a hash file organization? What can be done to reduce the occurrence of bucket overflow?
Hashing normally assumes that, when a record is inserted, the bucket to which it is mapped has space to store the record. If the bucket does not have sufficient space, a bucket overflow is said to occur.
Bucket overflow can occur for various reasons:
Insufficient buckets: The number of buckets, denoted n_b, must be chosen such that n_b > n_r / f_r, where n_r denotes the total number of records that will be stored and f_r denotes the number of records that will fit in a bucket. This choice, of course, assumes that the total number of records is known when the hash function is chosen.
Skew: Some buckets are assigned more records than others, so a bucket may overflow even while other buckets still have space. This situation is known as bucket skew.
Skew can occur for two reasons:
1. Multiple records might have the same search key.
2. The chosen hash function may result in non-uniform distribution of search keys.
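Both causes of skew can be seen in a small experiment. The sketch below is only illustrative: the key format, the bucket count of 8, and the two hash functions (first_char_hash and polynomial_hash) are assumptions made for the example, not part of any particular database system.

```python
from collections import Counter

def bucket_counts(keys, num_buckets, hash_fn):
    """Return a list with the number of keys that land in each bucket."""
    counts = Counter(hash_fn(k) % num_buckets for k in keys)
    return [counts.get(b, 0) for b in range(num_buckets)]

def first_char_hash(key):
    # Deliberately poor (hypothetical) hash: only the first character matters.
    return ord(key[0])

def polynomial_hash(key):
    # Simple polynomial hash that mixes every character of the key.
    h = 0
    for ch in key:
        h = (h * 31 + ord(ch)) % 1_000_003
    return h

NUM_BUCKETS = 8
keys = [f"acct{i:04d}" for i in range(40)]  # 40 distinct search keys

# Reason 2: a poor hash function sends every key to the same bucket.
skewed = bucket_counts(keys, NUM_BUCKETS, first_char_hash)
spread = bucket_counts(keys, NUM_BUCKETS, polynomial_hash)

# Reason 1: records sharing one search key always share one bucket,
# no matter how good the hash function is.
duplicates = bucket_counts(["smith"] * 10, NUM_BUCKETS, polynomial_hash)

print("poor hash:     ", skewed)
print("better hash:   ", spread)
print("duplicate keys:", duplicates)
```

With the poor hash function all 40 records pile into a single bucket while the other seven stay empty. The duplicate-key case shows that even a good hash function cannot separate records that share a search key.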
To reduce the probability of bucket overflow, the number of buckets is chosen to be (n_r / f_r) * (1 + d), with n_r records in total and f_r records per bucket, where d is a fudge factor, typically around 0.2. Some space is wasted: with d = 0.2, about 20 percent more bucket space is allocated than the records strictly require, and that extra space sits empty. The benefit is that the probability of overflow is reduced.
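As a quick arithmetic check, the bucket-count rule can be written as a small helper. This is a minimal sketch: the function name num_buckets and the sample figures (10,000 records, 20 records per bucket) are assumptions chosen for illustration, and rounding up to a whole bucket is a choice of this example.

```python
import math

def num_buckets(n_r, f_r, d=0.2):
    """Bucket count (n_r / f_r) * (1 + d), rounded up to a whole bucket.

    n_r: expected total number of records
    f_r: number of records that fit in one bucket
    d:   fudge factor (0.2 by default)
    """
    if n_r < 0 or f_r <= 0 or d < 0:
        raise ValueError("n_r and d must be non-negative, f_r must be positive")
    return math.ceil((n_r / f_r) * (1 + d))

minimum = num_buckets(10_000, 20, d=0.0)  # no slack at all
padded = num_buckets(10_000, 20)          # with the default fudge factor

print(minimum, padded)
```

For these sample figures the minimum is 500 buckets and the padded count is 600, so the 100 extra buckets are the price paid for a lower chance of overflow.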
Despite the allocation of a few more buckets than required, bucket overflow can still occur. We handle bucket overflow by using overflow buckets. If a record must be inserted into a bucket b, and b is already full, the system provides an overflow bucket for b and inserts the record into the overflow bucket. If the overflow bucket is also full, the system provides another overflow bucket, and so on.
All the overflow buckets of a given bucket are chained together in a linked list. Overflow handling using such a linked list is known as overflow chaining.
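Overflow chaining can be sketched in a few lines. The following is an in-memory illustration under simplifying assumptions: the class names Bucket and HashFile are hypothetical, keys are integers hashed with a plain modulo, and a real system would store buckets as disk blocks rather than Python objects.

```python
class Bucket:
    """A fixed-capacity bucket with a link to its next overflow bucket."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.records = []      # (key, value) pairs stored in this bucket
        self.overflow = None   # next bucket in the overflow chain, if any

class HashFile:
    """Toy hash file: integer keys, hash function h(key) = key mod n_b."""
    def __init__(self, num_buckets, bucket_capacity):
        self.bucket_capacity = bucket_capacity
        self.buckets = [Bucket(bucket_capacity) for _ in range(num_buckets)]

    def _home(self, key):
        return self.buckets[key % len(self.buckets)]

    def insert(self, key, value):
        b = self._home(key)
        # Walk the chain until a bucket with free space is found,
        # appending a new overflow bucket when the chain runs out.
        while len(b.records) >= b.capacity:
            if b.overflow is None:
                b.overflow = Bucket(self.bucket_capacity)
            b = b.overflow
        b.records.append((key, value))

    def lookup(self, key):
        # The home bucket and every overflow bucket must be examined.
        found = []
        b = self._home(key)
        while b is not None:
            found.extend(v for k, v in b.records if k == key)
            b = b.overflow
        return found

    def chain_length(self, index):
        """Number of overflow buckets chained to bucket `index`."""
        n, b = 0, self.buckets[index].overflow
        while b is not None:
            n, b = n + 1, b.overflow
        return n

hf = HashFile(num_buckets=4, bucket_capacity=2)
for key in (0, 4, 8, 12, 16):      # all five keys map to bucket 0
    hf.insert(key, f"r{key}")

print(hf.chain_length(0), hf.lookup(16))
```

Five records mapped to a bucket of capacity two need two overflow buckets, and a lookup has to follow the whole chain. This is why long chains slow down access and why reducing overflow in the first place matters.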