Researching Sovrin Ledger 2.0


The team at Evernym working on Indy Node has identified a few deficiencies in the software currently running the Sovrin Network. These problems would take enough effort to address that we want to research taking a fundamentally different approach. You can see our preliminary research here:

Our team briefly reviews that information here:

We want to work on a proposal together that focuses on the Sovrin use case which we can then take to the Indy community.

Here is an Epic in Indy Jira to track this work:

Feel free to create new tasks there and put the results of research into existing tasks.

We look forward to discussing this topic in this thread.


Research on Exonum: It is like Tendermint’s protocol but with a few tweaks, all written in Rust. It initially looked great, with good documentation, and runs well on my computer, but I learned a few unfortunate things when looking through their chat:

I came across a comment on their chat (on Gitter, similar to Rocket.Chat) saying that a reasonable upper bound “for the number of validators would be around 8-15. 16 nodes is maximum for anchoring service”.

And also this: “We’re aiming at private networks, so more than 16 validators will not bring much benefits, but will drastically decrease network performance.” I’m under the impression that we’re hoping to have 25-30+ validator nodes, so that might be an issue. Of course, @alexander.shcherbakov pointed out to me earlier that “this is rather a question whether PBFT-like protocols (in general) are suitable for Sovrin if Sovrin wants to have 25+ Nodes.”

Another comment from their chat: “Additionally, you cannot add new validators easily hence public blockchain is nearly impossible.” So Exonum might not be the best option.

Alexander Shcherbakov: There are no actual limitations on the number of nodes; this is just a question of performance (throughput vs. latency).
Another question is whether Exonum is well-tested on 25-node pools, because our experience showed that behavior with 25 nodes and behavior with 9 nodes is very different.
Private blockchains are the target of almost all existing frameworks (Tendermint may be the only exception), so this may also not be a real limitation.
But we need to check what features a framework has to support a public one (for example, some analogue of Observer nodes out of the box).
The architecture (client-to-node communication in particular) and scalability are important here. The question of deployment is also very important.
So, I still think that Exonum is a good candidate and worth further research.

Sergey Khoroshavin:
Also some notes on Exonum:

  • number of validator nodes - yes, the documentation states that the number of validators should normally be in the 8-15 range, however it’s not clear for which case this assumption is made. Exonum uses the Tendermint protocol under the hood, and there was a thesis from one of the major Tendermint contributors which included test results with as many as 64 nodes. Probably the developers had some specific use case in mind with performance requirements that prohibited the use of more than 15 nodes. Sovrin most likely has very different performance requirements, so the number of nodes will be different. I don’t see any hard cap here.
  • concerning “you cannot add new validators easily” - in fact there is a whole chapter in the documentation on how to change the pool configuration, including the validator set, and there is corresponding source code in their repo

And one more general note on the performance of PBFT-based algorithms (including Tendermint) - they all require O(N^2) network messages for each round (and RBFT requires O(N^3)), so the performance of a naive implementation decreases very quickly as the number of nodes grows. However, all major implementations (Plenum, Exonum, Tendermint) have some form of batching, so during each round they can process a lot of messages and attain very high throughput. Making batches very large can increase latency, but that is probably not a real problem for Sovrin use cases.
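The arithmetic behind this point can be sketched quickly. The sketch below is illustrative only (not a benchmark), and the batch size of 1000 is an assumed figure, not a number from any of these implementations:

```python
# Rough illustration of why batching matters for PBFT-style protocols:
# the per-round message count grows as O(N^2), but one round can order a
# whole batch of transactions, amortizing that cost.

def messages_per_round(n: int) -> int:
    """Approximate all-to-all message count for one PBFT round with n nodes."""
    return n * n

def amortized_messages_per_tx(n: int, batch_size: int) -> float:
    """Consensus message overhead per transaction when batching is used."""
    return messages_per_round(n) / batch_size

# Growing the pool from 9 to 25 validators multiplies round cost ~7.7x...
print(messages_per_round(9))    # 81
print(messages_per_round(25))   # 625
# ...but with an (assumed) batch of 1000 txs per round, the per-transaction
# overhead of a 25-node pool stays below one message.
print(amortized_messages_per_tx(25, 1000))  # 0.625
```

This is also why the latency trade-off mentioned above appears: a larger batch lowers per-transaction overhead but means each transaction waits longer for its round to complete.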


I’d like to bring some insights here.

DDoS Resilience

There was a lot of discussion around resilience to DDoS, and it was mentioned that protocols that run multiple instances are more resilient. I agree with this; however, running multiple instances (as in RBFT) is not the only solution and can actually be quite costly. There are other options that can be applied to almost any consensus protocol to shield it from unwanted traffic:

  • separate the node-to-node and client-to-node network stacks and use whitelists for node-to-node communication - in this case attacks from clients won’t affect consensus, and only other validator nodes could perform such attacks, which is much less likely
  • hide validator nodes behind gatekeepers, make gatekeepers filter unwanted traffic and spawn more gatekeepers if load increases
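The first option above amounts to a simple routing decision at the network boundary. A minimal sketch, with entirely hypothetical node identifiers and queues (real implementations would key the whitelist on verified node keys, not plain IDs):

```python
# Sketch of separating node-to-node and client-to-node traffic with a
# whitelist: only messages from known validators reach the consensus
# stack; everything else goes to the (rate-limited) client-facing stack.

VALIDATOR_WHITELIST = {"node1", "node2", "node3", "node4"}  # assumed pool

def route_message(sender_id: str, payload: bytes,
                  consensus_queue: list, client_queue: list) -> None:
    """Route one inbound message based on the sender whitelist."""
    if sender_id in VALIDATOR_WHITELIST:
        consensus_queue.append(payload)   # trusted node-to-node channel
    else:
        client_queue.append(payload)      # client channel; DDoS lands here

consensus, clients = [], []
route_message("node2", b"PREPREPARE ...", consensus, clients)
route_message("mallory", b"junk", consensus, clients)
print(len(consensus), len(clients))  # 1 1
```

With this split, a flood from non-validator addresses only loads the client-facing stack, which the gatekeeper approach in the second bullet can then scale out horizontally.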

That said, I think choice of consensus protocol shouldn’t be affected by this concern as much.

Async protocols

It was agreed that in the long run fully asynchronous protocols are more robust (in terms of resilience against both internal and external threats). However, there was a concern that these protocols are a very recent development and not widespread, so they may have some yet-unknown problems; it was therefore proposed to concentrate on recent improvements to leader-based protocols. I have several concerns here:

  • recent improvements to leader-based protocols are, well, also recent, so they also could contain some yet unknown problems
  • if we look at leader-based protocols, we can see that PBFT was invented around 1999, then there were some improvements, with RBFT appearing in 2013. HoneyBadger was developed in 2016, so it may look relatively new; however, it is basically an improvement of the SINTRA protocol developed in 2002, and there was also prior research dating back to the 1980s, so in reality the age difference is not so big
  • all leader-based protocols are vulnerable in one way or another to timing attacks from malicious leaders and/or malicious network schedulers (and in my opinion DDoS can be seen as a form of malicious network scheduling), so no matter how smart we try to be, as long as there is some timeout in the system it can potentially be exploited

So, one of the main questions is whether the Sovrin Foundation is okay with the fact that the weak synchrony assumption of leader-based protocols can always be exploited to halt processing. If yes, then almost any PBFT-based protocol with periodic leader rotation can be picked (given it’s also not too new). If no, there is basically no choice but to migrate to some async protocol. This can probably be generalized as a consequence of the FLP theorem, which states that no deterministic, fully asynchronous consensus protocol can tolerate even a single node failure - which is why practical async protocols rely on randomization.

HoneyBadger async protocol and implementation

Regarding HoneyBadger (HBBFT) - while this protocol seems more complex than PBFT, it is very modular, so the complexity is still manageable. Furthermore, this modularity helps a lot with further improvements and tuning for specific use cases. Also, quite a big portion of HBBFT consists of running parallel instances of the reliable broadcast (RBC) subprotocol. RBC is very similar to PBFT in normal operation mode, so HBBFT is somewhat similar to RBFT (which boils down to running multiple redundant instances of PBFT), but with one important twist - different RBC instances do mostly non-redundant work. This makes HBBFT far more efficient than RBFT, and some experiments showed that it can be even more efficient than plain PBFT, especially when the number of validator nodes is large (>16). It also seems that HBBFT can be easily parallelized, so modern multi-core CPUs can be fully utilized.
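The redundant-vs-non-redundant distinction is the crux of the efficiency argument, and it can be shown with a toy sketch. This is a deliberate simplification (instance counts and the slicing scheme are illustrative, not taken from either protocol's spec):

```python
# Toy illustration of the work-split difference: RBFT instances each
# order the SAME batch (redundant work), while HBBFT's parallel RBC
# instances each broadcast a DIFFERENT slice of the batch.

def rbft_instance_inputs(batch: list, num_instances: int) -> list:
    """Every redundant PBFT instance processes the full batch."""
    return [list(batch) for _ in range(num_instances)]

def hbbft_instance_inputs(batch: list, num_instances: int) -> list:
    """Each RBC instance gets a distinct slice: mostly non-redundant work."""
    return [batch[i::num_instances] for i in range(num_instances)]

batch = list(range(12))          # 12 transactions to order
rbft = rbft_instance_inputs(batch, 4)
hbbft = hbbft_instance_inputs(batch, 4)
print(sum(len(x) for x in rbft))   # 48 tx-broadcasts to order 12 txs
print(sum(len(x) for x in hbbft))  # 12 tx-broadcasts to order 12 txs
```

With four instances, the redundant scheme does four times the broadcast work for the same ordered output, which is roughly why HBBFT's parallel RBC instances scale better as the pool grows.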

There are also some PoC implementations of the HoneyBadger protocol, and the most interesting one in my opinion is the hbbft library developed by POA Network. Some good properties:

  • library focused on consensus only
  • written in Rust
  • suitable for embedding into existing code
  • seems to follow TDD rigorously
  • has property-based randomized tests
  • has example node implementation (Hydrabadger) suitable for tests in real environments
  • strong contributors, including one of the original HoneyBadger authors and a researcher from Ripple who developed Cobalt, a protocol that borrowed a lot from HoneyBadger

This library is the only third-party BFT protocol implementation I’m currently aware of that is suitable for embedding into Indy Node without a rewrite of the current business logic. It also doesn’t affect storage, so there is no need for a complex migration of current data. Of course, more research and experiments are needed to understand whether its performance and stability are suitable for Indy and Sovrin.


Have you considered the Avalanche protocol? The paper is here and a great write-up by Murat Demirbas of the implementation and pros and cons is here.

It’s based on a gossip protocol that quickly reaches consensus (2-3 seconds).


I’d highly recommend looking at the Dfinity consensus protocol. Especially seeing Jan Camenisch (CL sigs) is their main crypto guy!


Hi @esplinr,
Do we have clear requirements for the number of nodes and stewards (with write access to the ledger)? Can we still keep the assumption that stewards run high-availability machines, as is required today? What is our target for the number of transactions per second and the latency of operations?


The requirements for performance of the current ledger are documented here:

We haven’t yet defined the requirements for a “Sovrin Ledger 2.0”, except to say that it should scale to a billion users. We will need to decide as a group what those requirements should be.


To my knowledge, the Evernym team has not looked into the Avalanche protocol very much. Thanks for sharing the information.