## Balanced Trees.

Balanced trees are traditionally used for indexing linearly ordered data, - a task also known as the problem of maintaining dynamic dictionaries. A number of balance conditions have been developed over time.

The first example of a balanced tree, an AVL-tree, was introduced in 1962 by G.M.Adelson-Velskii and E.M.Landis. In 1970 J.Hopcroft developed 2-3-trees. A generalization of the latter, called B-trees, was proposed by R.Bayer and E.McCreight in 1971. A number of variants of the classical B-trees have been examined in literature by now. B*-trees, B+-trees, symmetric B-trees, and (a,b)-trees are among them.

## The history of balanced trees.

• 1962       AVL-tree             G.M.Adelson-Velskii and E.M.Landis
• 1970       2-3-tree               J.Hopcroft
• 1971       B-tree                     R.Bayer, E.McCreight
• 1971       red-black-tree   R.Bayer
• B*-tree,   B+tree,   (a,b)-tree
• 1992       S(1)-tree               utilization 1/2 - ε
• 1994       S(2)-tree               utilization 2/3 - ε
• 1995       S(b)-tree               utilization 1 - ε

Various balance conditions used in balanced trees are intended to provide fast logarithmic time access to the data.

Utilization characterizes the space overhead of storing data in a particular type of balanced trees. Utilization is defined as the ratio of the size of the original data to the space used to represent the data set as a balanced tree.

The main disadvantage of B-trees is that they provide poor utilization, when used to store variable length keys, e.g. strings. If the average length of keys stored in a B-tree is substantially smaller than the key length maximum, then it is not possible to guarantee any lower bound for utilization of B-trees greater than 0, even though that the number of elements in the tree is always more than a half of its capacity.

## S(b)-tree is a generalization of B-trees.

A new type of balanced trees, called S(b)-trees, was developed in order to efficiently maintain variable length keys. Contrary to B-trees, S(b)-trees provide optimal utilization of keys of variable length, while the data access time remains logarithmic, the same as for B-trees. The difference with B-trees is that a node capacity for S(b)-trees is evaluated by the total length of keys stored in the node, rather than by their number as in B-trees. The balance condition for S(b)-trees is based on their local incompressibility. That is, any sequence consisting of b+1 neighboring nodes of the tree cannot be compressed into b well-formed nodes.

This is the summary of results proven for S(b)-trees:
• S(b)-tree is a generalization of B-trees: a B-tree is a special case of a S(0)-tree.
• The lower bound of utilization for S(b)-trees is 1 - ε, where ε is inversely proportional to the tree branching.
• Search, insertion, and deletion of a key in a n-node S(b)-tree can be performed in time O(log n).
• S(b)-trees form a shrinking hierarchy by parameter b. That is if b < b' then the class of S(b')-trees is strictly contained in the class of S(b)-trees.