Augmenting Data Structures

Zhao Cong

Dynamic order statistics

  • the rank of an element—its position in the linear order of the set
  • An order-statistic tree T is simply a red-black tree with additional information stored in each node.we have another attribute, . This attribute contains the number of (internal) nodes in the subtree rooted at x (including x itself), that is, the size of the subtree. If we define the sentinel’s size to be 0.
  • we have the identity: x.size = x.left.size + x.right.size+1
  • We do not require keys to be distinct in an order-statistic tree.we solve it by defining the rank of an element.

Retrieving an element with a given rank

1
2
3
4
5
6
7
OS-SELECT(x,i) 
1 r = x.left.size + 1
2 if i == r
3 return x
4 elseif i < r
5 return OS-SELECT(x.left, i)
6 else return OS-SELECT(x.right, i-r)
  • If , then node is the smallest element
  • If , then the smallest element resides in x’s left subtree
  • If , then the smallest element resides in x’s right subtree
  • the running time of OS-SELECT is for a dynamic set of elements.

Determining the rank of an element

1
2
3
4
5
6
7
8
OS-RANK(T,x) 
1 r = x.left.size + 1
2 y = x
3 while y ≠ T.root
4 if y == y.p.right
5 r = r+y.p.left.size+1
6 y = y.p
7 return r
  • We can think of node rank as the number of nodes preceding in an inorder tree walk, plus 1 for itself. ## Maintaining subtree sizes

  • insertion into a red-black tree consists of two phases.

    • The first phase goes down the tree from the root, inserting the new node as a child of an existing node.
    • The second phase goes up the tree, changing colors and performing rotations to maintain the red-black properties
  • To maintain the subtree sizes in the first phase, we simply increment for each node x on the simple path traversed from the root down toward the leaves.

  • Referring to the code for LEFT-ROTATE.(T,x) , we add the following lines:

    1
    2
    13 y.size = x.size 
    14 x.size = x.left.size + x.right.size + 1

  • Deletion from a red-black tree also consists of two phases:

    • the first operates on the underlying search tree
    • the second causes at most three rotations and otherwise performs no structural changes
  • To update the subtree sizes, we simply traverse a simple path from node y (starting from its original position within the tree) up to the root, decrementing the size attribute of each node on the path.

  • We handle the rotations in the second phase of deletion in the same manner as for insertion.

  • Thus, both insertion and deletion, including maintaining the size attributes, take time for an n-node order-statistic tree.

How to augment a data structure

  • We can break the process of augmenting a data structure into four steps:
    • Choose an underlying data structure
    • Determine additional information to maintain in the underlying data structure
    • Verify that we can maintain the additional information for the basic modifying operations on the underlying data structure.
    • Develop new operations

Augmenting red-black trees

  • Theorem 14.1 (Augmenting a red-black tree):Let f be an attribute that augments a red-black tree T of n nodes, and suppose that the value of f for each node x depends on only the information in nodes x, x:left, and x:right, possibly including x:left:f and x:right:f. Then, we can maintain the values of f in all nodes of T during insertion and deletion without asymptotically affecting the O.lg n/ performance of these operations.

Interval trees

  • We can represent an interval as an object , with attributes (the low endpoint) and (the high endpoint). We say that intervals and overlap if , that is, if and . Any two intervals andsatisfy the interval trichotomy; that is, exactly one of the following three properties holds:
    • and overlap
    • is to the left of i0(i.e., ),
    • is to the right of i0(i.e.,)
  • An interval tree is a red-black tree that maintains a dynamic set of elements, with each element x containing an interval .
  • In addition to the intervals themselves, each node contains a value , which is the maximum value of any interval endpoint stored in the subtree rooted at .
1
2
3
4
5
6
7
INTERVAL-SEARCH(T,i) 
1 x = T.root
2 while x ≠ T.nil and i does not overlap x.int
3 if x.left ≠ T.nil and x.left.max ≥ i.low
4 x = x.left
5 else x = x.right
6 return x
  • Theorem 14.2:Any execution of INTERVAL-SEARCH.(T, i) either returns a node whose interval overlaps , or it returns and the tree contains no node whose interval overlaps .