Merkle Mountain Ranges

Merkle mountain ranges $^{[1]}$ are an improvement over conventional merkle trees for growing, potentially unbounded lists. Where conventional merkle tree constructions over growing lists prove very inefficient to compute, as all nodes in the tree must be recomputed. Merkle mountain ranges amortise this cost by growing subtrees incrementally and merging subtrees at the same height, rather than growing the full tree.

MMR tree — You can observe how the tree grows by looking at the order of the nodes.

Properties

Merkle mountain ranges, true to it’s name, are a type of merkle tree that is composed of perfectly balanced merkle trees (ie each sub tree has leaves = a power of 2) such that it does indeed look like a mountain range. This construction provides a few benefits:

Smaller proof sizes when compared to conventional merkle trees for very large sets of data (especially for more recent leaves in the tree).
Better insertion complexity.

First we define the termial function $n?$ $^{[2]}$ as the sum:

\begin{split} n? &= 1 + 2 + 3 \cdots + n\ \\ &= \sum_{k=1}^n k \\ &=\frac{n(n+1)}{2} \, \end{split}

Let’s recall the properties of merkle trees:

$height = log_2(n_{leaves})$

$n_{leaves} = 2^{height}$

For merkle trees, the number of proof nodes for a single item proof is defined as $log_2(n_{leaves})$ . In order to understand the improvements that mmr’s bring to the table, lets consider the number of leaves present in both trees to get a maximum proof size of $64$ nodes. Note that the number of proof items needed for an item in a merkle tree = height of the merkle tree.

Since an mmr is itself composed of perfectly balanced binary trees with decreasing heights. The total number of leaves in an mmr can be expressed as $n?$ , where $n$ is the total number of leaves in the first subtree. We can therefore derive the maximum number of leaves in an mmr where the height of the first subtree is $64$ using the function:

$\displaystyle{ n? =\frac{2^{h}(2^{h}+1)}{2} \, }$

$\displaystyle{ 1.7 * 10^{38} = \frac{2^{64}(2^{64}+1)}{2} \,}$

Whereas for a conventional merkle tree, whose height is $64$ the total number of leaves is:

$n = 2^h$

$1.8 * 10^{19} = 2^{64}$

So now, we see how mmrs enable more efficient for merkle proof sizes on much larger data sets than conventional merkle trees.

The total number of subtrees in a merkle mountian range tree with $n$ leaves is given by the algorithm:

\begin{algorithm}
\caption{Subtree Heights}
\begin{algorithmic}
\INPUT The total number of leaves in the mmr
\OUTPUT An array of the heights of the subtrees in the mmr
\PROCEDURE{SubtreeHeights}{$leafCount$}
\State $\text{indices} \gets \text{[]}$
\State $i \gets 0$
\State $\text{current} \gets \text{leafCount}$
\While{$\text{current} > 0$}
\State $\text{height} \gets \lfloor \log_2(\text{current}) \rfloor$
\State $\text{indices}[i] \gets \text{height}$
\State $\text{current} \gets \text{current} - 2^{\text{height}}$
\State $i \gets i + 1$
\EndWhile
\State \Return $\text{indices}$
\ENDPROCEDURE
\end{algorithmic}
\end{algorithm}

Multi Proofs

Our approach to verifying mmr multi proofs will be to regard each sub tree as an isolated merkle tree, using the $k$ -index model defined in the previous section. The $k$ -index of each node in an mmr, will be the distance from the left-most node in each subtree.

We can calculate the $k$ -index for any node in an mmr using the algorithm:

\begin{algorithm}
\caption{MMR Position to K-Index Conversion}
\begin{algorithmic}
\PROCEDURE{MMRPositionToKIndex}{$leaves, mmr\_size$}
\State $peaks \gets \text{GetPeaks}(mmr\_size)$
\State $leaves\_with\_k\_indices \gets \text{empty list}$

\ForAll{$peak \in peaks$}
\State $leaves \gets \text{take\_while}(leaves, \lambda pos: pos \leq peak)$

\If{$\text{length}(leaves) > 0$}
\ForAll{$pos \in leaves$}
\State $height \gets \text{PosHeightInTree}(peak)$
\State $index \gets 0$
\State $parent\_pos \gets peak$

\For{$h = height$ \textbf{downto} $1$}
\State $left\_child \gets parent\_pos - \text{ParentOffset}(h - 1)$
\State $right\_child \gets left\_child + \text{SiblingOffset}(h - 1)$
\State $index \gets index \times 2$

\If{$left\_child \geq pos$}
\State $parent\_pos \gets left\_child$
\Else
\State $parent\_pos \gets right\_child$
\State $index \gets index + 1$
\EndIf
\EndFor

\State $\text{Append}(leaves\_with\_k\_indices, (pos, index))$
\EndFor
\EndIf
\EndFor

\State \Return $leaves\_with\_k\_indices$
\ENDPROCEDURE
\end{algorithmic}
\end{algorithm}

Given this model, we can re-use the CalculateMerkleRoot function defined in the previous section to verify mmr multi proofs using the algorithm:

\begin{algorithm}
\caption{Calculate the Root of an MMR Tree}
\begin{algorithmic}
\INPUT The known leaves, the proof nodes and the leaf count
\OUTPUT The calculated root of the mmr tree
\PROCEDURE{CalculateRoot}{$leaves, proof, leaf\_count$}
\State $\text{subtrees} \gets \text{subtrees}(\text{leaf\_count})$
\State $\text{peak\_roots} \gets []$
\State $\text{current\_subtree} \gets 0$
\For{$\text{subtree} \in \text{subtrees}$}
\State $\text{height} \gets 2^{\text{subtree}}$
\State $\text{current\_subtree} \gets \text{current\_subtree} + \text{height}$
\State $\text{leaves} \gets \text{take\_while}(\text{leaves}, \lambda (i, \_, \_) . i \leq \text{current\_subtree})$
\If{$\text{length}(\text{leaves}) = 1 \land \text{leaves}[0].0 = \text{peak}$}
\State $\text{peak\_roots} \gets \text{append}(\text{peak\_roots}, \text{leaves}[0].2)$
\State $\text{leaves} \gets \text{pop}(\text{leaves})$
\ElsIf{$\text{length}(\text{leaves}) = 0$}
\If{$\text{length}(\text{proof}) > 0$}
\State $\text{peak\_roots} \gets \text{append}(\text{peak\_roots}, \text{pop}(\text{proof}))$
\Else
\State \textbf{break}
\EndIf
\Else
\State $\text{leaves} \gets \text{map}(\text{leaves}, \lambda (\_, j, l) . (j, l))$
\State $\text{current\_layer} \gets \text{map}(\text{leaves}, \lambda (j, \_) . j)$
\State $\text{sparse\_sub\_tree} \gets []$
\For{$i = 0 \to \text{height}-1$}
\State $\text{siblings} \gets \text{sibling\_indices}(\text{current\_layer})$
\State $\text{diff} \gets \text{difference}(\text{siblings}, \text{current\_layer})$
\If{$\text{length}(\text{diff}) = 0$}
\State $\text{sparse\_sub\_tree} \gets \text{extend}(\text{sparse\_sub\_tree}, [[], \ldots, []] \text{ with } (\text{height}-i) \text{ elements})$
\State \textbf{break}
\EndIf
\State $\text{layer} \gets \text{zip}(\text{diff}, \text{take}(\text{proof}, \text{length}(\text{diff})))$
\State $\text{sparse\_sub\_tree} \gets \text{append}(\text{sparse\_sub\_tree}, \text{layer})$
\State $\text{current\_layer} \gets \text{parent\_indices}(\text{siblings})$
\EndFor
\State $\text{sparse\_sub\_tree}[0] \gets \text{extend}(\text{sparse\_sub\_tree}[0], \text{leaves})$
\State $\text{sparse\_sub\_tree}[0] \gets \text{sort}(\text{sparse\_sub\_tree}[0], \lambda (a, b) . a.0 - b.0)$
\State $\text{peak\_root} \gets \text{CalculateMerkleRoot}(\text{sparse\_sub\_tree})$
\State $\text{peak\_roots} \gets \text{append}(\text{peak\_roots}, \text{peak\_root})$
\EndIf
\EndFor
\While{$\text{length}(\text{peak\_roots}) > 1$}
\State $\text{right\_peak} \gets \text{pop}(\text{peak\_roots})$
\State $\text{left\_peak} \gets \text{pop}(\text{peak\_roots})$
\State $\text{peak\_roots} \gets \text{append}(\text{peak\_roots}, \text{keccak256}([\text{right\_peak}, \text{left\_peak}]))$
\EndWhile
\State \Return $\text{pop}(\text{peak\_roots})$
\ENDPROCEDURE
\end{algorithmic}
\end{algorithm}

Implementations

You can find the implementations of merkle mountain ranges in:

polytope-labs/solidity-merkle-trees
pallet-mmr in polytope-labs/hyperbridge

References

$^{[1]}$ P. Todd. Merkle Mountain Ranges. GitHub, 2012. https://opentimestamps.org.

$^{[2]}$ Donald Knuth in The Art of Computer Programming.

(Third edition, Volume 1, page 48).