Data Structures and Algorithms

Exam Review for Binary Search Trees, Hash Tables, and Red-Black Trees

Binary Search Tree Deep Dive

Understanding BST operations through interactive visualizations

Learning Objectives: Understand BST structure, search, insert, delete operations, and traversal methods

BST Foundation

What is a Binary Search Tree?

A Binary Search Tree (BST) is a hierarchical data structure where:

  • Each node has at most two children
  • Left child < Parent < Right child
  • All operations average O(log n) time
  • In-order traversal gives sorted sequence

BST Search Operation

How Search Works

Starting from root, compare target with current node:

  • If equal: Found!
  • If target < current: Go left
  • If target > current: Go right
  • If null reached: Not found

BST Insert Operation

Building a Tree Step by Step

Watch how values [8,3,10,1,6,14,4,7,13] build our BST:

BST Insert Sequence

8 → (3,10) → (1,6) → (14) → (4,7) → (13)

Each value finds its correct position maintaining BST property

🧩 Visual Problem: Where would value 5 be inserted in our tree?

Answer: Between 4 and 6 as right child of 4

BST Delete Operation

Three Deletion Cases

  • Leaf node: Simply remove
  • One child: Replace with child
  • Two children: Replace with inorder successor

BST Traversals

Three Traversal Methods

Traversal Order Use Case
Inorder Left → Root → Right Sorted sequence
Preorder Root → Left → Right Tree copy
Postorder Left → Right → Root Tree deletion

DFS on Graphs

DFS on Complex Graph Structure

DFS works on graphs similar to tree traversal:

  • Stack-based execution shows deep exploration
  • Creates a spanning tree structure
  • Back edges connect to already visited vertices
  • Similar to pre-order traversal pattern
🧩 DFS Pattern Recognition: What traversal pattern does DFS resemble?

Answer: Pre-order traversal (visit node before children)

Interactive BST & DFS Practice

Question 1: Cascading BST Deletions

Starting from the BST constructed by inserting the keys [8, 3, 10, 1, 6, 14, 4, 7, 13] in that order:

  • Delete the keys in the order 3, 14, 8.
  • For each deletion, identify whether it is a leaf, one-child, or two-child case.
  • Draw the resulting tree after each deletion and give the final in-order traversal.

Idea: Apply the three BST deletion cases and carefully update the tree after each operation.

  1. Step 1 – Original tree.
    • 8
      • 3
        • 1
        • 6
          • 4
          • 7
      • 10
        • 14
          • 13
    This satisfies the BST property: all left keys < parent < all right keys.
  2. Step 2 – Delete 3 (two-child case).
    • Node 3 has two children (1 and 6) ⟶ use its in-order successor.
    • In-order successor = minimum of right subtree of 3 ⟶ node 4.
    • Copy key 4 into node 3, then delete the leaf node 4 under node 6.
    • 8
      • 4
        • 1
        • 6
          • 7
      • 10
        • 13
  3. Step 3 – Delete 14 (one-child case).
    • Node 14 has a single left child 13.
    • Replace 14 by its child 13.
    • 8
      • 4
        • 1
        • 6
          • 7
      • 10
        • 13
  4. Step 4 – Delete 8 (two-child case at the root).
    • Root 8 has two children (4 and 10).
    • In-order successor is the minimum in the right subtree ⟶ node 10.
    • Copy 10 into the root, then delete the original node 10 (one-child node with child 13).
    • 10
      • 4
        • 1
        • 6
          • 7
      • 13
  5. Step 5 – Final in-order traversal.

    In-order (Left → Root → Right) on the final tree gives:

    [1, 4, 6, 7, 10, 13]

Question 2: Search Path and Cost

Using the original BST from Question 1 (before any deletions), trace the search paths for keys 7 and 2.

  • List the nodes visited for each search in order.
  • Count the number of key comparisons.
  • Explain why unsuccessful searches in a BST still take O(h) time (h = tree height).

Idea: Follow the rule “go left if the key is smaller, right if it is larger”.

  1. Step 1 – Search for 7 (successful).

    Path for key 7:

    • Compare with 8 ⟶ 7 < 8 ⟶ go left to 3.
    • Compare with 3 ⟶ 7 > 3 ⟶ go right to 6.
    • Compare with 6 ⟶ 7 > 6 ⟶ go right to 7.
    • Compare with 7 ⟶ match found.

    Visited nodes: 8 → 3 → 6 → 7 ⇒ 4 key comparisons.

  2. Step 2 – Search for 2 (unsuccessful).

    Path for key 2:

    • Compare with 8 ⟶ 2 < 8 ⟶ go left to 3.
    • Compare with 3 ⟶ 2 < 3 ⟶ go left to 1.
    • Compare with 1 ⟶ 2 > 1 ⟶ go right to null.
    • Reaching null means 2 is not in the tree.

    We did 3 key comparisons (with 8, 3, 1) plus the final null check.

  3. Step 3 – Relating cost to height.
    • Each comparison takes us one level down the tree.
    • The longest possible path from the root has length equal to the tree height h.
    • Therefore, both successful and unsuccessful BST searches run in O(h) time.
    • If the tree is balanced, h = O(log n); if it is skewed, h = O(n).
Question 3: Balanced vs. Skewed BST Shapes

Consider inserting the keys [1, 2, 3, 4, 5, 6, 7] into:

  • (a) An initially empty BST by inserting them in that order.
  • (b) A perfectly balanced BST built from the same keys (for example, by inserting the median first, then recursively).

Compare the resulting heights and the worst-case search costs.

  1. Step 1 – Case (a): Sorted insertion → skewed tree.

    Insert 1, 2, 3, 4, 5, 6, 7 into an empty BST in that order:

    • 1
      • 2
        • 3
          • 4
            • 5
              • 6
                • 7
    • Each new key is larger than all previous ones ⟶ always inserted as the right child of the bottom node.
    • The tree degenerates into a linked list of length 7.
    • Height hskewed = 6 (root at level 0, leaf 7 at level 6).
  2. Step 2 – Case (b): Perfectly balanced BST.

    Take 4 as root (the median), then 2 and 6 as its children, etc. One possible perfectly balanced BST is:

    • 4
      • 2
        • 1
        • 3
      • 6
        • 5
        • 7
    • All levels except possibly the last are completely full.
    • Height hbalanced = 2 (levels 0, 1, 2).
  3. Step 3 – Impact on search complexity.
    • Worst-case search in a BST is proportional to the height h.
    • Skewed tree: h = 6 ⇒ O(7) = O(n) steps in the worst case.
    • Balanced tree: h = 2 ⇒ O(log₂ 7) steps ≈ O(log n).
    • This is why self-balancing trees (AVL, Red-Black) are so valuable: they prevent the O(n) worst-case behavior.

Fast Data Retrieval with O(1) Average Time

An introduction to hash tables and their importance in computer science

Learning Objectives: Understand hash functions, collision resolution, and practical applications

Why Hash Tables Matter

Hash tables allow very fast retrieval of data regardless of dataset size.

  • Problem solved: Linear search takes O(n) time
  • Solution: Array access is O(1)
  • Access time is independent of array size

Hash Function Fundamentals

Calculating the Index

For numeric keys:

address = key mod n (where n is array size)

For alphanumeric keys:

Sum each character's ASCII value, then apply mod n

Hash Function Example: Cartoon Characters

Array size = 11

Character Calculation Index
FRED F(70)+r(114)+e(101)+d(100) = 385 mod 11 0
TWEETY T(84)+w(119)+e(101)+e(101)+t(116)+y(121) = 642 mod 11 4
BUGS B(66)+u(117)+g(103)+s(115) = 401 mod 11 5
SCOOBY S(83)+c(99)+o(111)+o(111)+b(98)+y(121) = 623 mod 11 7

Collisions and Resolution

Collision: When two different keys hash to the same index

Linear Probing

If calculated position is occupied, find the next available slot:

Example: "Porky" hashes to index 5 (occupied by Bugs)
  • Check index 6: occupied (Daffy)
  • Check index 7: occupied (Scooby)
  • Check index 8: empty! ✓ Place Porky here

Alternative Collision Strategies

  • Quadratic Probing: Use quadratic distances (1², 2², 3²) to reduce clustering
  • Plus-3 Rehash: Check every 3rd position
  • Double Hashing: Apply second hash function to determine step size
  • Chaining: Each position has linked list for colliding items

Load Factor and Performance

Load Factor = Number of Items / Array Size

Best practice: Keep load factor ≤ 0.7 (70% occupancy)

Dynamic Resizing

When load factor exceeds threshold:

  1. Create new array (typically 2× size)
  2. Rehash all items into new array
  3. Replace old array with new array

Performance Analysis

Scenario Time Complexity Explanation
Best Case O(1) No collisions, direct access
Average Case O(1) Good hash function, load factor < 0.7
Worst Case O(n) Many collisions or poor hash function

Hash Function Design

Good Hash Function Characteristics:

  • Deterministic (same input → same output)
  • Fast to compute
  • Distributes keys uniformly
  • Uses all available address space
  • Minimizes collisions

Perfect Hashing

Perfect hash function: Produces unique index for each key

  • Requirements: All keys must be known in advance
  • Use cases: Compiler keywords, fixed configuration mappings

Real-World Applications

Examples in programming languages:

  • Java: HashMap, HashSet
  • Python: dict, set
  • JavaScript: Object, Map, Set
  • C++: unordered_map, unordered_set
Key Takeaway: Hash tables trade space for time to achieve fast lookups!

Interactive Hash Table Practice

Question 1: Building a Hash Table with Linear Probing

Using a hash table of size 11 and linear probing, insert the keys in this order: FRED, TWEETY, BUGS, SCOOBY, DAFFY, PORKY.

Assume the initial hash indices (from the cartoon example) are: FRED→0, TWEETY→4, BUGS→5, SCOOBY→7, DAFFY→6, PORKY→5.

  • Show how the table evolves as you insert the keys.
  • Count the collisions per key.
  • Compute the final load factor.
  1. Step 1 – Start with an empty table (size = 11).
    0
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    All slots are empty; load factor = 0/11 = 0.
  2. Step 2 – Insert FRED, TWEETY, BUGS, SCOOBY, DAFFY.
    • FRED → 0: slot 0 empty ⟶ place FRED at 0 (0 collisions).
    • TWEETY → 4: slot 4 empty ⟶ place TWEETY at 4 (0 collisions).
    • BUGS → 5: slot 5 empty ⟶ place BUGS at 5 (0 collisions).
    • SCOOBY → 7: slot 7 empty ⟶ place SCOOBY at 7 (0 collisions).
    • DAFFY → 6: slot 6 empty ⟶ place DAFFY at 6 (0 collisions).
    0FRED
    1
    2
    3
    4TWEETY
    5BUGS
    6DAFFY
    7SCOOBY
    8
    9
    10
    So far, 5 items, load factor = 5/11 ≈ 0.45.
  3. Step 3 – Insert PORKY with linear probing.
    • PORKY → 5: slot 5 contains BUGS ⟶ collision #1.
    • Probe index 6: contains DAFFY ⟶ collision #2.
    • Probe index 7: contains SCOOBY ⟶ collision #3.
    • Probe index 8: empty ⟶ place PORKY at 8.
    0FRED
    1
    2
    3
    4TWEETY
    5BUGS
    6DAFFY
    7SCOOBY
    8PORKY
    9
    10
    • PORKY incurred 3 collisions before finding slot 8.
    • Final number of items = 6 ⇒ load factor = 6/11 ≈ 0.545.
Question 2: Successful vs. Unsuccessful Probing Cost

Using the final hash table from Question 1, trace the probe sequence for:

  • A successful search for SCOOBY.
  • An unsuccessful search for some key X that hashes initially to index 5 (same cluster).

Assume hash(X) = 5, and X is not actually in the table.

  1. Step 1 – Successful search for SCOOBY.

    From Question 1, the relevant slots are:

    5BUGS
    6DAFFY
    7SCOOBY
    8PORKY
    • hash(SCOOBY) = 7 ⟶ probe slot 7 first.
    • Slot 7 contains SCOOBY ⟶ found after 1 probe.
  2. Step 2 – Unsuccessful search for X with hash(X) = 5.

    Linear probing sequence starting at index 5:

    • Probe 5: BUGS (≠ X) ⟶ continue.
    • Probe 6: DAFFY (≠ X) ⟶ continue.
    • Probe 7: SCOOBY (≠ X) ⟶ continue.
    • Probe 8: PORKY (≠ X) ⟶ continue.
    • Probe 9: empty ⟶ stop; X not in table.

    The unsuccessful search touches 5 slots (indices 5, 6, 7, 8, 9).

  3. Step 3 – Takeaway about cost.
    • Successful search time is proportional to the length of the probe sequence up to the matching key.
    • Unsuccessful search must scan through an entire cluster until it hits an empty slot.
    • As load factor grows and clusters form, the average probe length (and thus cost) increases.
Question 3: Load Factor, Resizing, and Amortized Cost

Suppose you implement a hash table with:

  • Initial capacity = 8.
  • Load factor threshold = 0.7 (resize when items / capacity > 0.7).
  • Resizing strategy: double capacity (8 → 16 → 32 → ...), rehash all items.

If you insert 12 distinct keys into an initially empty table, when do resizes happen, and how many total hash computations are performed?

  1. Step 1 – Track load factor as we insert.
    • Capacity = 8, threshold = 0.7 ⟶ max allowed items before resize ≈ 0.7 × 8 = 5.6.
    • So after inserting the 6th item, load factor becomes 6/8 = 0.75 > 0.7 ⟶ trigger resize.

    Events so far:

    • Insert items 1–6: 6 hash computations (one per insertion).
    • After item 6, resize to capacity 16 and rehash the 6 existing items (6 more computations).
  2. Step 2 – Continue inserting up to 12 items.
    • Now capacity = 16.
    • Threshold = 0.7 ⇒ max items before next resize ≈ 0.7 × 16 = 11.2.
    • We currently have 6 items; we will insert items 7–12 ⟶ 6 more insertions.

    During these insertions:

    • For each of the 6 new items, we compute its hash once ⟶ 6 more hash computations.
    • After inserting the 12th item, load factor = 12/16 = 0.75 > 0.7, so another resize would be triggered next, but we stop at 12 items per the question.
  3. Step 3 – Total hash computations and amortized cost.
    • 6 hashes for initial inserts (items 1–6).
    • 6 hashes to rehash those 6 items when growing from 8 → 16.
    • 6 hashes for inserts 7–12.

    Total = 6 + 6 + 6 = 18 hash executions for 12 insert operations.

    • Average (amortized) hash computations per insert = 18 / 12 = 1.5.
    • This shows that even with occasional expensive resizes, the amortized cost per insertion remains O(1).

Red-Black Trees: Balanced BST with O(log n) Guarantee

CSE310 Data Structures and Algorithms: Advanced Tree Balancing

Learning Objectives: Understand RB tree properties, rotations, insertion fixup, and deletion fixup

The Problem: Unbalanced BSTs

Aspect Unbalanced BST (Worst Case) Red-Black Tree (Guaranteed)
Height O(n) O(log n)
Insert O(n) O(log n) with ≤2 rotations
Delete O(n) O(log n) with ≤3 rotations

Red-Black Tree Definition

A Red-Black Tree is a Binary Search Tree with an additional color attribute (RED or BLACK) on each node that enforces specific structural properties.

Core Idea: Each node is colored either RED or BLACK. These colors enforce constraints that automatically maintain balance.

RB Tree Properties

Property 1: Color Attribute

Each node is either RED or BLACK

Property 2: Root is Black

The root node is always BLACK

Property 3: Null Leaves are Black

All NIL (null) leaves are BLACK

Property 4: Red Node Constraint

If a node is RED, then both its children must be BLACK (no two consecutive red nodes)

Property 5: Black Height Property

All paths from any node to its descendant leaves contain the same number of BLACK nodes

Why These Properties Ensure Balance

  • Property 4 prevents long chains - no chain of 3 nodes possible without violating constraints
  • Property 5 ensures the tree doesn't become skewed
  • Result: Height ≤ 2·lg(n+1)
Balance Guarantee: Longest path ≤ 2 × shortest path

Since all paths have same # of black nodes, and red nodes can't be consecutive, the tree remains approximately balanced

Rotations: The Balancing Mechanism

Rotation is a local restructuring operation that maintains BST property while changing tree structure.

Left Rotation on x:

  • Right child y moves up
  • y's left subtree becomes x's right subtree
  • Maintains BST property: x < y

Right Rotation on x:

  • Left child y moves up
  • y's right subtree becomes x's left subtree
  • Mirror image of left rotation

Insertion in Red-Black Trees

Strategy: Always insert as RED because this is less likely to violate Property 5 (black height)

The Problem:

New red node might have red parent (violates Property 4)

The Solution:

Walk up tree fixing violations - three main cases based on uncle's color and node position

Case 1: Uncle is RED (Recoloring Case)

Parent P and uncle U are both RED, grandparent G is BLACK, and new node N is RED under P.

Before fix-up

  • G
    • P
      • N
    • U

After recolor

  • G
    • P
      • N
    • U
  1. Recolor parent P and uncle U from RED → BLACK.
  2. Recolor grandparent G from BLACK → RED.
  3. Now treat G as the newly inserted node and move the “fix-up” pointer up to G.
  4. If G is the root, recolor it BLACK at the end to satisfy Property 2.

Case 2: Uncle BLACK, Triangle Configuration

Uncle U is BLACK, and node N forms a “triangle” with parent P and grandparent G. Example: N is the right child of a left child (left-right) or left child of a right child (right-left).

Before fix-up (left-right example)

  • G
    • P
      • N
    • U

N is right child of P, P is left child of G (triangle)

After first rotation (convert to Case 3)

  • G
    • N
      • P
    • U

Rotate left around P ⟶ N becomes child of G

  1. Because N and P form a triangle with G, perform a rotation at P to “straighten” the path.
  2. For the left-right case:
    • Rotate left around P so that N moves up and P becomes N’s left child.
  3. After this rotation, the configuration becomes a “line” (N is now directly under G).
  4. Relabel N as the new “inserted” node and fall through to Case 3 on the new configuration.

Case 3: Uncle BLACK, Line Configuration

Uncle U is BLACK, and node N, parent P, and grandparent G form a straight line (e.g., left-left or right-right).

Before fix-up (left-left example)

  • G
    • P
      • N
    • U

After rotation & recolor

  • P
    • N
    • G
      • U
  1. Perform a rotation at G to bring P up (right rotation in left-left case, left rotation in right-right case).
  2. After the rotation:
    • Color P BLACK.
    • Color G RED.
  3. This fixes the “red-red” violation and preserves the black-height property.
  4. The path from the root down remains short, maintaining O(log n) height.

Insertion Complexity:

  • Each iteration either stops (Cases 2–3) or moves z up 2 levels (Case 1)
  • Maximum iterations: O(log n) because tree height is O(log n)
  • Total time: O(log n) iterations × O(1) per iteration = O(log n)

Deletion in Red-Black Trees

Why it's complex: Deleting a BLACK node violates black-height property

The Core Problem:

When a BLACK node is deleted, one path loses a black node. This violates Property 5 (black-height constraint)

The Solution:

Mark replacement node as "double-black" - conceptually carrying an extra black. Propagate this mark up until it can be eliminated through recoloring and rotations.

Case 1: Sibling is RED

Node x is double-black, its sibling S is RED, and parent P is BLACK. We rotate to transform into a case where the sibling is BLACK.

Before fix-up (x is left child)

  • P
    • x
    • S

After rotation & recolor

  • S
    • P
      • x
  1. Recolor S to BLACK and P to RED.
  2. If x is a left child, perform a left rotation on P (mirror: right rotation if x is a right child).
  3. After rotation, x still has “double-black”, but its new sibling is BLACK.
  4. Now you are in one of Cases 2–4 with a BLACK sibling and can continue the fix-up.

Case 2: Sibling BLACK, Both Sibling's Children BLACK

Sibling S is BLACK and both of S's children are BLACK (or NIL). We push the extra black up.

Before

  • P
    • x
    • S
      • A
      • B

After recolor (x still conceptually double-black)

  • P
    • x
    • S
      • A
      • B
  1. Recolor sibling S from BLACK → RED.
  2. “Remove” one black from both x and S by moving the double-black problem up to the parent P.
  3. If P was RED, recoloring it to BLACK ends the fix-up.
  4. If P was BLACK, it now becomes double-black and you continue the process at P.

Case 3: Sibling BLACK, Near Child RED, Far Child BLACK

Sibling S is BLACK, the child of S closer to x (the “near” child) is RED, and the “far” child is BLACK. We rotate and recolor to transform into Case 4.

Before (x is left child, near child = SL)

  • P
    • x
    • S
      • SL
      • SR

After rotation around S

  • P
    • x
    • SL
      • S
      • SR
  1. Recolor SL to BLACK and S to RED.
  2. Rotate right around S (if x is left child; mirror with left rotation if x is right child).
  3. Now the new sibling of x has a RED far child and BLACK near child ⟶ configuration of Case 4.
  4. Proceed to Case 4 using this new configuration.

Case 4: Sibling BLACK, Far Child RED

Sibling S is BLACK and its far child (farther from x) is RED. This is the final case that lets us remove the double-black and restore all properties with a single rotation.

Before (x is left child, far child = SR)

  • P
    • x
    • S
      • SL
      • SR

After rotation & recolor

  • S
    • P
      • x
      • SL
    • SR
  1. Set S’s color to P’s color (copy parent’s color down into S).
  2. Recolor P and SR to BLACK.
  3. Perform a left rotation at P (if x is left child; mirror with right rotation if x is right child).
  4. This removes the double-black on x and restores all RB properties; the fix-up terminates.

Performance Summary

Operation Time Complexity Details
Search O(log n) Height is O(log n), standard BST search
Insertion O(log n) BST insert O(log n) + fixup O(log n), ≤2 rotations
Deletion O(log n) BST delete O(log n) + fixup O(log n), ≤3 rotations
Space O(n) One color bit per node + parent pointer

Real-World Applications

Java Collections:

  • java.util.TreeMap - sorted key-value storage
  • java.util.TreeSet - sorted set with O(log n) operations

C++ STL:

  • std::map, std::set, std::multimap, std::multiset

System Software:

  • CFS Scheduler: Linux uses RB trees for process scheduling
  • Database Indexing: B+ trees and RB tree variants for in-memory indexing

RB Trees vs AVL Trees

Aspect Red-Black Tree AVL Tree
Balance h ≤ 2·lg(n+1) h ≤ 1.44·lg(n)
Insert Rotations ≤2 O(log n)
Delete Rotations ≤3 O(log n)
Best For Frequent updates Search-heavy workloads

References

  • CLRS: "Introduction to Algorithms" Chapter 13
  • Visualizations: VisuAlgo and Programiz have excellent step-by-step animations
  • Practice: Implement insertion and deletion fixup to master these concepts

Interactive Red-Black Tree Practice

Question 1: Step-by-Step Red-Black Insertions

Starting from an empty Red-Black tree, insert the keys [41, 38, 31, 12, 19, 8] using the standard RB insertion algorithm (new node is RED, then fix-up).

  • For each insertion, state which of the three insertion cases (or their mirror) applies.
  • Identify when rotations occur.
  • Show the final tree with correct colors.
  1. Step 1 – Insert 41.
    • Tree was empty; 41 becomes the root.
    • Root must be BLACK (Property 2).
    • 41
  2. Step 2 – Insert 38.
    • 38 < 41 ⟶ insert as left child of 41, color RED.
    • Parent (41) is BLACK, so no properties are violated.
    • 41
      • 38
  3. Step 3 – Insert 31 (first rotation).
    • 31 < 41, 31 < 38 ⟶ left child of 38, colored RED.
    • Violation: 31 (RED) with parent 38 (RED) ⟶ Property 4 break.
    • Uncle of 31 is NIL (BLACK) and the configuration is a straight line (31 is left child of left child).
    • This is Case 3 (line configuration): right-rotate around 41 and recolor.

    After fix-up:

    • 38 becomes the new root (BLACK).
    • 31 and 41 become its RED children.
    • 38
      • 31
      • 41
  4. Step 4 – Insert 12 (recoloring case).
    • 12 < 38, 12 < 31 ⟶ left child of 31, color RED.
    • Parent 31 is RED and uncle 41 is RED ⟶ Case 1 (uncle RED).
    • Recolor parent and uncle to BLACK, grandparent (38) to RED.
    • Because 38 is the root, we recolor it back to BLACK at the end.

    Resulting tree:

    • 38
      • 31
        • 12
      • 41
  5. Step 5 – Insert 19 (rotation + recolor).
    • 19 < 38, 19 < 31, 19 > 12 ⟶ right child of 12 (RED).
    • Parent 12 is RED, uncle (right child of 31) is NIL (BLACK).
    • Configuration is a triangle (left-right) w.r.t. grandparent 31 ⟶ Case 2 then Case 3.
    • First, rotate left around 12, then rotate right around 31 with recoloring.

    After fix-up, the left subtree of 38 becomes:

    • 19 (BLACK) as left child of 38.
    • 12 (RED) as left child of 19.
    • 31 (RED) as right child of 19.
  6. Step 6 – Insert 8 (final recoloring).
    • 8 < 38, 8 < 19, 8 < 12 ⟶ left child of 12, color RED.
    • Parent 12 is RED and uncle 31 is RED ⟶ Case 1 again.
    • Recolor 12 and 31 to BLACK, recolor 19 to RED.
    • Parent of 19 is 38 (BLACK), so the loop terminates and 38 stays BLACK.

    Final Red-Black tree:

    • 38
      • 19
        • 12
          • 8
        • 31
      • 41

    All RB properties hold: root is BLACK, no two REDs in a row, and every path from a node to NIL has the same number of BLACK nodes.

Question 2: Computing Black Height

Using the final Red-Black tree from Question 1, compute the black height bh(x) for each internal node x, where bh(x) is the number of BLACK nodes on any simple path from x down to a NIL leaf (excluding x itself).

  • Verify that all root-to-leaf paths have the same black height.
  • Explain how this implies the tree is approximately balanced.
  1. Step 1 – Redraw the final tree with colors.
    • 38
      • 19
        • 12
          • 8
        • 31
      • 41
    We treat every missing child pointer as a NIL leaf (BLACK).
  2. Step 2 – Compute black height bottom-up.
    • All NIL children are BLACK with bh(NIL) = 0 by definition.
    • Node 8 is RED, its children are NIL ⟶ the number of BLACK nodes below it is 0, so bh(8) = 0.
    • Node 12 is BLACK with left child 8 and right child NIL: both subpaths from 12 to leaves contain exactly one BLACK node (NIL) after 12, so bh(12) = 1.
    • Node 31 is BLACK with two NIL children ⟶ bh(31) = 1.
    • Node 41 is BLACK with NIL children ⟶ bh(41) = 1.
    • Node 19 is RED with children 12 and 31 ⟶ both subtrees below 19 have black height 1, so bh(19) = 1.
  3. Step 3 – Black height of the root and balance implication.
    • From the root 38 (BLACK), any path to a NIL leaf passes through some number of BLACK nodes below it (here 1) plus the NIL leaf.
    • Thus bh(38) = 2 in the conventional RB definition.
    • Every root-to-leaf path has the same number of BLACK nodes (2), so Property 5 holds.
    • Because reds cannot be consecutive, the longest path from the root can have at most twice as many nodes as the shortest path.
    • Therefore the tree height h is at most 2 × bh(root), which is O(log n), giving logarithmic search, insert, and delete time.
Question 3: Proving h ≤ 2·log₂(n + 1)

Let a Red-Black tree have n internal nodes and height h. Show that h ≤ 2·log₂(n + 1).

  • Use the fact that the black height of the root is at least h/2.
  • Relate black height to a lower bound on the number of nodes.
  1. Step 1 – At least half the nodes on any root-to-leaf path are BLACK.
    • No two RED nodes can be adjacent (Property 4).
    • Along any path, RED nodes must be separated by at least one BLACK node.
    • Therefore, the number of RED nodes on a path is at most the number of BLACK nodes.
    • So the length of the path h is at most 2 × (number of BLACK nodes on that path).

    Let b be the black height of the root (number of BLACK nodes on any root→leaf path, excluding the root in the formal definition); then h ≤ 2b.

  2. Step 2 – Minimum number of nodes for a given black height.
    • Consider a RB tree where every node on every path is BLACK (no RED nodes at all).
    • Such a tree with black height b is a perfectly balanced BLACK-only tree.
    • This tree has at least 2b − 1 internal nodes (full binary tree of height b − 1 with only BLACK nodes).
    • Any RB tree with black height b has at least that many nodes:

    n ≥ 2b − 1.

  3. Step 3 – Combine inequalities to bound h in terms of n.
    • From Step 2: n ≥ 2^b − 1 ⇒ n + 1 ≥ 2^b.
    • Taking log₂ on both sides: log₂(n + 1) ≥ b.
    • From Step 1: h ≤ 2b.
    • Therefore: h ≤ 2 · b ≤ 2 · log₂(n + 1).

    This proves that the height of a Red-Black tree is always O(log n), which is the key to its worst-case guarantees.