Building a Fibonacci Heap

A Fibonacci heap is a heap data structure which leverages laziness to obtain a few favourable time complexities. It is used as a priority queue, most commonly in shortest-path graph algorithms such as A* or Dijkstra's algorithm. Compared to a binary heap, a Fibonacci heap has an amortised time complexity of O(1) for inserts and key decreases, versus a binary heap's O(log n). Popping the minimum item from the heap has the same O(log n) complexity in both. It is important to stress that this cost is amortised; as we'll see, it comes from the lazy way a Fibonacci heap maintains its structure. This blog post delves into building a Fibonacci heap in Rust, and compares its performance with std's BinaryHeap.

Some terminology

The implementation presented will focus on just two operations: pushing and popping. The names of the operations align with std's BinaryHeap; in the literature they are usually called insert and delete-min respectively. We will also use the term degree to denote the number of direct children a tree's root node has. Other operations, such as decrease-key and meld (merge), are out of scope for this post, however they could be added to the code base.

Attempt Version #1

One contrasting feature of a Fibonacci heap versus a binary heap is that a Fibonacci heap contains a collection of 'root' trees, rather than a single tree. Each root tree satisfies the minimum-heap property: for any given parent node P, none of its children are smaller than P. Note that we are designing a min-heap; the reverse holds for a max-heap. So to begin we need a list of root trees; usually this is stored in a doubly linked list. Linked lists are notoriously unwieldy in Rust, so this is likely a spot to come back to. Let's define the heap like so, with a placeholder for the tree structure.

use std::collections::LinkedList;

pub struct FibonacciHeap<T> {
    roots: LinkedList<Tree<T>>,
}

struct Tree<T>(T);

A common operation is to get the heap's minimum value. Known as find-min, this operation is O(1) for Fibonacci heaps (and binary heaps). Rust's BinaryHeap has a method peek which is the find-min equivalent, and our Fibonacci heap will likewise define a peek method returning the minimum value of the whole heap. We could implement it as a search through the roots, since each tree's root is the minimum value for that tree. This would take linear time, so to obtain constant time we want some way of knowing which root holds the minimum value. In other programming languages we could use a pointer or reference to the minimum tree. With Rust, however, we cannot (easily) do this, since we would be self-referencing our struct. We could use an index (slow for a linked list), but we can also leverage the structure of the linked list to uphold an invariant: the first item in the list stores the minimum value. We will see later how to maintain that invariant. Let's define the peek method (we will get to the Tree impl later).

impl<T: Ord> FibonacciHeap<T> {
    pub fn peek(&self) -> Option<&T> {
        self.roots.front().map(Tree::root)
    }
}

impl<T> Tree<T> {
    fn root(&self) -> &T {
        &self.0
    }
}

Pushing operation

Now let us tackle pushing, because with a Fibonacci heap it is really simple. In some ways, the pushing operation is the key change from a binary heap. In a binary heap, when a push occurs the heap is rebalanced, incurring the O(log n) time complexity. A Fibonacci heap, however, delays any restructuring. Instead, let's think of a push as just adding a singleton tree into the roots. That way it is a constant time operation. We do need to uphold the minimum-root invariant, which is fairly simple: if the pushed value is less than or equal to the minimum value, it goes at the front of the list, otherwise it is appended to the back.

impl<T: Ord> FibonacciHeap<T> {
    pub fn push(&mut self, item: T) {
        if self.peek().map(|o| &item <= o).unwrap_or(true) {
            // item is lt or eq to min value, or list is empty
            // push to front, becoming **new min**
            self.roots.push_front(Tree::new(item));
        } else {
            self.roots.push_back(Tree::new(item));
        }
    }
}

Popping operation

Now we come to popping the minimum value, and all our laziness catches up to us. To help explain what is about to occur, we need to delve into the structure of the heap a little. Let us start with a heap like so:

The minimum is node _1_, so popping off that node leaves us with two new trees, _2_ and _4_. So now we can just re-evaluate the minimum root, right? In this heap it would take 3 iterations, however it is pretty easy to see that the worst case is _n_, when every node is its own root tree (which is the case after a new heap and a bunch of pushes).

So in the pop operation, we also rebalance the heap, which is where we obtain an amortised running time of O(log n). The method used is to group trees which have the same degree, in an iterative fashion. Implementations use a temporary array, indexed by degree, to place the trees. When two trees are grouped, the tree with the larger root becomes a child of the other tree (maintaining the min-heap property). This increases the resulting tree's degree by one, and the process repeats.

In our example above, node _2_ has degree 0 and occupies the 0th place of the temporary array. Nodes _3_ and _4_ both have degree 2, so they merge, with node _4_ becoming a child of node _3_.

The final step is to find the minimum value, for which we iterate over the roots. Importantly, the process of grouping like-degree trees reduces the number of roots to O(log n). Thus, finding the minimum is now an O(log n) operation.

To translate this into our Rust code, we need to properly define a tree structure.

struct Tree<T> {
    node: T,
    children: Vec<Tree<T>>
}

impl<T> Tree<T> {
    fn degree(&self) -> usize {
        self.children.len()
    }
}
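
One bit of housekeeping: push calls Tree::new, and the root method from earlier was written against the tuple-struct placeholder, so both need a version for this richer shape. A minimal version looks something like this:

impl<T> Tree<T> {
    // a new tree is a singleton: one node, no children
    fn new(node: T) -> Self {
        Tree { node, children: Vec::new() }
    }

    // the root value of the tree (its minimum, by the heap property)
    fn root(&self) -> &T {
        &self.node
    }
}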

The pop operation works in three phases:

  1. Pop from the front of the root list, since our invariant keeps the minimum value there,
    • If the heap is empty we return None,
    • Otherwise, store the node for returning later, and extend the roots with its child trees,
  2. Perform the rebalance operation,
  3. Relink the list to uphold the minimum-at-front invariant.

The last two phases are done in functions to assist with testing. Both of them manipulate the linked list in some way.

impl<T: Ord> FibonacciHeap<T> {
    pub fn pop(&mut self) -> Option<T> {
        // take the front of the roots, since this is the _minimum_ value
        let Tree { node, children } = match self.roots.pop_front() {
            Some(x) => x,
            None => return None,
        };

        // add the child tree into the roots
        self.roots.extend(children);

        // perform the grouping of like-degrees
        rebalance(&mut self.roots);

        // find the minimum root value
        bring_min_to_front(&mut self.roots);

        Some(node)
    }
}

The rebalance function is defined below. We use some fancy pattern matching and Option::take semantics on the temporary array. By the end of it, there will be no trees left in the roots linked list, and buf will hold a set of trees with unique degrees. The final step is to place these trees back into the linked list.

/// Rebalances the list of roots such that no two roots share the same degree.
/// The method employed uses a temporary array to order the trees by degrees.
/// This has a worst case of `O(n)` but is _amortised_ as `O(log n)`.
fn rebalance<T: Ord>(roots: &mut LinkedList<Tree<T>>) {
    if roots.is_empty() {
        return;
    }

    // a recursive count of the nodes in the heap
    let nodes = count_nodes(roots);

    // NOTE: this will panic if nodes == 0
    let cap = nodes.ilog2() + 1;

    // initialise temp array with log2 of length
    let mut buf: Vec<Option<Tree<T>>> =
        std::iter::repeat_with(|| None).take(cap as usize).collect();

    // iterate through the roots
    while let Some(mut tree) = roots.pop_front() {
        loop {
            let degree = tree.degree();
            // if a tree returns here, we need to repeat the loop since
            // the degrees would have increased by one
            tree = match buf[degree].take() {
                // most simple, slot was unoccupied so we just
                // insert tree into it and stop the loop
                None => {
                    buf[degree] = Some(tree);
                    break;
                }
                // there was already a tree with the same degree
                // and the new tree has a lesser root value
                // make the old tree a child of the new one
                Some(tree_b) if tree.root() <= tree_b.root() => {
                    tree.children.push(tree_b);
                    tree
                }
                // there was already a tree with the same degree
                // and the new tree has a greater root value
                // make the new tree a child of the old one
                Some(mut tree_b) => {
                    tree_b.children.push(tree);
                    tree_b
                }
            };
        }
    }

    // place the roots back into the linked list
    roots.extend(buf.into_iter().filter_map(|x| x));
}

Recycling the list in bring_min_to_front is somewhat less complex. It works by finding the index of the minimum root, iterating linearly through the root nodes. We then split the linked list at that index, leaving two linked lists, one of which has the minimum at its front, upholding our invariant! Simply append the other linked list onto it and our list is ordered the way we need it to be.

fn bring_min_to_front<T: Ord>(roots: &mut LinkedList<Tree<T>>) {
    let min_index = roots
        .iter()
        .enumerate()
        .min_by_key(|(_, t)| t.root())
        .map(|(idx, _)| idx);

    if let Some(idx) = min_index {
        // split off returns a LinkedList with `idx` at the front (our minimum)
        let mut split = roots.split_off(idx);
        // append the remaining roots to the **end** of the split
        split.append(roots);
        // set the roots reference to be the new list
        *roots = split;
    }
}

And so we have a working pop operation! Now would be a good time to implement a few tests. I will not delve into them here; I use a few smoke tests, implement some debug assertions, and run quickcheck.
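
To give a flavour, one smoke test might look like the following (a sketch; it assumes a Default impl that yields an empty heap): push a handful of values, then check they pop back out in ascending order.

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn pops_in_ascending_order() {
        // assumes `FibonacciHeap: Default`, producing an empty heap
        let mut heap = FibonacciHeap::default();
        for x in [5, 3, 8, 1, 4, 1] {
            heap.push(x);
        }
        assert_eq!(heap.peek(), Some(&1));

        // drain via repeated pops and check the ordering
        let mut drained = Vec::new();
        while let Some(x) = heap.pop() {
            drained.push(x);
        }
        assert_eq!(drained, vec![1, 1, 3, 4, 5, 8]);
    }
}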

Performance

Once satisfied that the heap behaves as intended, it is worth testing the performance. We want to confirm a few properties:

  • O(1) time for peek/push operations
  • O(log n) time for pop operations
  • Compare performance against BinaryHeap

Peek performance

Benchmarks were run using criterion, and (surprise) the peek operations of both our Fibonacci heap and BinaryHeap are O(1) and very fast.

| Benchmark | Time |
| --- | --- |
| std::BinaryHeap::peek | 461.87 ps |
| v1::FibonacciHeap::peek | 462.45 ps |

Push performance

To test push performance, two styles of benchmark are used. The first times the push of a single element onto heaps of varying size, pre-filled with random numbers in the range 0..1000. The number pushed is 500, chosen since it is the median of the range. This should highlight the differences in time complexity between the two heaps and somewhat confirm the time complexity of ours. The second benchmark constructs each heap from 10,000 random elements, and is used to compare the overall performance of the two heaps.
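
The from_iter construction assumes a FromIterator impl for our heap, which has not been shown; a minimal version built on repeated pushes might look like this:

impl<T: Ord> FromIterator<T> for FibonacciHeap<T> {
    fn from_iter<I: IntoIterator<Item = T>>(iter: I) -> Self {
        // start from an empty heap and lean on push's O(1) behaviour
        let mut heap = FibonacciHeap { roots: LinkedList::new() };
        for item in iter {
            heap.push(item);
        }
        heap
    }
}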

| Benchmark | Time |
| --- | --- |
| std::BinaryHeap::push one-el n100 | 112.60 ns |
| std::BinaryHeap::push one-el n10_000 | 2.1555 µs |
| std::BinaryHeap::push one-el n10_000_000 | 928.70 µs |
| v1::FibonacciHeap::push one-el n100 | 1.5052 µs |
| v1::FibonacciHeap::push one-el n10_000 | 179.17 µs |
| v1::FibonacciHeap::push one-el n10_000_000 | 239.20 ms |
| std::BinaryHeap::from_iter n10_000 | 33.490 µs |
| v1::FibonacciHeap::from_iter n10_000 | 348.81 µs |

🤔 Hrm. That does not seem right. It is suspicious that the large heaps are taking much longer to run; I mean, ~240 _ms_ for a simple addition of a node? Well, it turns out that we are timing the destructor. Criterion supports avoiding this via a bencher that passes a mutable reference to the heap into the timed routine, so the heap is not cleaned up when the timing routine finishes. Switching to this benching function yields the following.

| Benchmark | Time |
| --- | --- |
| std::BinaryHeap::push one-el n100 | 83.751 ns |
| std::BinaryHeap::push one-el n10_000 | 3.69 µs |
| std::BinaryHeap::push one-el n10_000_000 | 255.79 µs |
| v1::FibonacciHeap::push one-el n100 | 34.670 ns |
| v1::FibonacciHeap::push one-el n10_000 | 71.157 ns |
| v1::FibonacciHeap::push one-el n10_000_000 | 356.00 ns |
| std::BinaryHeap::from_iter n10_000 | 40.691 µs |
| v1::FibonacciHeap::from_iter n10_000 | 252.69 µs |
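
For reference, this is roughly what the adjusted push benchmark might look like using criterion's iter_batched_ref, the bencher that builds the heap in a setup closure and hands the timed routine a mutable reference, so the drop happens outside the measurement. The function name and the stand-in data below are illustrative, not the exact benchmark code:

use criterion::{BatchSize, Criterion};

fn bench_push_one(c: &mut Criterion, n: usize) {
    c.bench_function(&format!("v1::FibonacciHeap::push one-el n{n}"), |b| {
        b.iter_batched_ref(
            // setup: a heap of `n` pseudo-random values in 0..1000
            // (a stand-in for the random data used in the real benchmark)
            || (0..n)
                .map(|i| (i.wrapping_mul(7919) % 1000) as i32)
                .collect::<FibonacciHeap<i32>>(),
            // routine: receives &mut heap, so the heap is not dropped in the timed section
            |heap| heap.push(500),
            BatchSize::LargeInput,
        )
    });
}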

So we show that, yes, our implementation does exhibit a roughly constant-time push operation. This is very promising compared to BinaryHeap, yet a straight construction from an iterator is slower... This suggests that our backing linked list is becoming a bottleneck (indeed, perusing BinaryHeap's source code, there are smarts to reserve enough allocation space for the incoming elements). It should also be noted that BinaryHeap::push states an amortised time complexity of O(1).

Popping performance

Testing the time complexity is harder with our popping operation, since the complexity is amortised: we would see very poor performance on the first pop, where it would likely take linear time. Instead, we benchmark the time taken to drain a heap through repeated pops. Note that this is not using a drain iterator; rather, we manually pop until the heap is empty. We also benchmark a 'general' use case of a random collection of operations (push/pop). This gives a feel for the performance in a 'real world' scenario.
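
The drain routine being timed is just a loop of pops, something along these lines (the function name is only for illustration):

// pop until the heap is empty; this is the routine the drain benchmark times
fn drain_all<T: Ord>(heap: &mut FibonacciHeap<T>) {
    while heap.pop().is_some() {}
}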

| Benchmark | Time |
| --- | --- |
| std::BinaryHeap drain 1000 | 15.282 µs |
| std::BinaryHeap drain 100_000 | 3.3054 ms |
| v1::FibonacciHeap drain 1000 | 1.7628 ms |
| v1::FibonacciHeap drain 100_000 | dnf |
| std::BinaryHeap randomops 10_000 | 143.32 µs |
| v1::FibonacciHeap randomops 10_000 | 10.856 ms |

The drain benchmarks are inconclusive in revealing the time complexity; even BinaryHeap is not performing as expected.

Can we improve? Version 2

Let's start with the drain benchmarks, as these should home in on the popping operation. Running flamegraph-rs over the drain 1000 benchmark, a surprising result is observed: the recursive count_nodes function is taking up the bulk of the time! Oops. Replacing the function with an extra usize field yielded a 3x speed up in the drain 1000 benchmark, and the _drain 100_000_ benchmark finished in ~100 _ms_.

pub struct FibonacciHeap<T> {
     roots: LinkedList<Tree<T>>,
+    len: usize,
 }

 impl<T: Ord> FibonacciHeap<T> {
     pub fn push(&mut self, item: T) {
         // ...
+
+        self.len += 1;
     }

    pub fn pop(&mut self) -> Option<T> {
        // take the front of the roots, since this is the _minimum_ value
        let Tree { node, children } = match self.roots.pop_front() {
            Some(x) => x,
            None => return None,
        };

+       // reduce the number of nodes
+       self.len -= 1;
+

        // add the child tree into the roots
        self.roots.extend(children);

        // perform the grouping of like-degrees
-       rebalance(&mut self.roots);
+       rebalance(&mut self.roots, self.len);

        // find the minimum root value
        bring_min_to_front(&mut self.roots);

        Some(node)
    }
}

-fn rebalance<T: Ord>(roots: &mut LinkedList<Tree<T>>) {
+fn rebalance<T: Ord>(roots: &mut LinkedList<Tree<T>>, nodes: usize) {
     if roots.is_empty() {
         return;
     }

-    let nodes = count_nodes(roots);
-
     // NOTE: this will panic if nodes == 0
     let cap = nodes.ilog2() + 1;

     // ...
}

Using another flamegraph, the bottleneck in popping is now dominated by two items in the rebalance function: LinkedList::pop_front and Vec::push. The linked list popping operation spends much of its time unboxing the value. A change we could make is to use a Vec instead of a LinkedList to house the root nodes. The Vec::push comes from pushing a tree as a child of another one; this will be harder to optimise without sacrificing space, so let's start by replacing our roots list with a Vec. We could use an index to track the minimum root, but we can also use a technique similar to the one we used with the linked list, this time keeping the minimum element as the last element of the vector. This makes it easy to pop and swap out, and saves us having to update an index. The change is pretty simple, almost a drop-in replacement. We take this opportunity to make another change in bring_min_to_front: we rename it to something more generic (order_min), and leverage []::swap instead of split_off. The changes yield around a 30% performance benefit.

-use std::collections::LinkedList;
-
 pub struct FibonacciHeap<T> {
-    roots: LinkedList<Tree<T>>,
+    roots: Vec<Tree<T>>,
     len: usize,
 }

 impl<T: Ord> FibonacciHeap<T> {
     pub fn push(&mut self, item: T) {
-        if self.peek().map(|o| &item <= o).unwrap_or(true) {
-            // item is lt or eq to min value, or list is empty
-            // push to front, becoming **new min**
-            self.roots.push_front(Tree::new(item));
-        } else {
-            self.roots.push_back(Tree::new(item));
+        // item is lt or eq to min value, or list is empty
+        // push to **back**, becoming **new min**
+        let new_min = self.peek().map(|o| &item <= o).unwrap_or(true);
+
+        self.roots.push(Tree::new(item));
+
+        if !new_min {
+            // not a new min, so swap the last 2 elements
+            let i = self.roots.len() - 1;
+            self.roots.swap(i - 1, i);
         }

         self.len += 1;
     }

     pub fn peek(&self) -> Option<&T> {
-        self.roots.front().map(Tree::root)
+        self.roots.last().map(Tree::root)
     }

     pub fn pop(&mut self) -> Option<T> {
-        // take the front of the roots, since this is the _minimum_ value
-        let Tree { node, children } = match self.roots.pop_front() {
+        // take the last of the roots, since this is the _minimum_ value
+        let Tree { node, children } = match self.roots.pop() {
             Some(x) => x,
             None => return None,
         };

         // ...

         // find the minimum root value
-        bring_min_to_front(&mut self.roots);
+        order_min(&mut self.roots);

         Some(node)
     }
 }

-fn rebalance<T: Ord>(roots: &mut LinkedList<Tree<T>>, nodes: usize) {
+fn rebalance<T: Ord>(roots: &mut Vec<Tree<T>>, nodes: usize) {
     if roots.is_empty() {
         return;
     }

     // NOTE: this will panic if nodes == 0
     let cap = nodes.ilog2() + 1;

     // initialise temp array with log2 of length
     let mut buf: Vec<Option<Tree<T>>> =
         std::iter::repeat_with(|| None).take(cap as usize).collect();

     // iterate through the roots
-    while let Some(mut tree) = roots.pop_front() {
+    while let Some(mut tree) = roots.pop() {
        // ...
     }

     // place the roots back into the linked list
     roots.extend(buf.into_iter().filter_map(|x| x));
 }

-fn bring_min_to_front<T: Ord>(roots: &mut LinkedList<Tree<T>>) {
+fn order_min<T: Ord>(roots: &mut [Tree<T>]) {
     let min_index = roots
         .iter()
         .enumerate()
         .min_by_key(|(_, t)| t.root())
         .map(|(idx, _)| idx);

     if let Some(idx) = min_index {
-        // split off returns a LinkedList with `idx` at the front (our minimum)
-        let mut split = roots.split_off(idx);
-        // append the remaining roots to the **end** of the split
-        split.append(roots);
-        // set the roots reference to be the new list
-        *roots = split;
+        let lastidx = roots.len() - 1; // len >= 1
+        roots.swap(idx, lastidx); // min at end
     }
 }

I think I'll leave it there for this post. If you have enjoyed reading and want to tinker with the code, I have pushed the repository to GitHub. The best place to focus for further performance gains is to dream up a smarter Tree structure. The final performance numbers fall a bit short of BinaryHeap; it goes to show the engineering smarts that are in the standard library!

Push performance

| Benchmark | Time |
| --- | --- |
| v2::FibonacciHeap::push one-el n100 | 597.53 ns |
| v2::FibonacciHeap::push one-el n10_000 | 59.385 µs |
| v2::FibonacciHeap::push one-el n10_000_000 | 736.70 µs |
| v2::FibonacciHeap::from_iter n10_000 | 92.424 µs |

Popping performance

| Benchmark | Time |
| --- | --- |
| v2::FibonacciHeap drain 1000 | 381.99 µs |
| v2::FibonacciHeap drain 100_000 | 70.851 ms |
| v2::FibonacciHeap randomops 10_000 | 692.16 µs |