Researching Sovrin Ledger 2.0


#1

The team at Evernym working on Indy Node have identified a few deficiencies with the software currently running the Sovrin Network. These problems would take enough effort to address that we want to research taking a fundamentally different approach. You can see our preliminary research here:

https://docs.google.com/presentation/d/e/2PACX-1vQetphlWf2TEb-zarMnJFbvkK4r2USIgAYOyeKAMzCL5EU1nUMdZB4q8BSY6bJGpYi8D5TIB2ezEOHo/pub?start=false&loop=false&delayms=60000

Our team briefly reviews that information here:

We want to work on a proposal together that focuses on the Sovrin use case which we can then take to the Indy community.

Here is an Epic in Indy Jira to track this work: https://jira.hyperledger.org/browse/INDY-1691

Feel free to create new tasks there and put the results of research into existing tasks.

We look forward to discussing this topic in this thread.


#2

Research on Exonum: It is like Tendermint’s protocol but with a few tweaks, all written in Rust. It initially looked great: it has good documentation and runs well on my computer. However, I learned a few unfortunate things when looking through their chat:

I came across a comment on their chat (on Gitter, similar to Rocket.Chat) saying that a reasonable upper bound “for the number of validators would be around 8-15. 16 nodes is maximum for anchoring service”.

And also this: “We’re aiming at private networks, so more than 16 validators will not bring much benefits, but will drastically decrease network performance.” I’m under the impression that we’re hoping to have 25-30+ validator nodes, so that might be an issue. Of course, @alexander.shcherbakov pointed out to me earlier that “this is rather a question whether PBFT-like protocols (in general) are suitable for Sovrin if Sovrin wants to have 25+ Nodes.”

Another comment from their chat: “Additionally, you cannot add new validators easily hence public blockchain is nearly impossible.” So Exonum might not be the best option.

Alexander Scherbakov: There is no actual limit on the number of nodes; this is just a question of performance (throughput vs. latency).
Another question is whether Exonum is well tested on 25-node pools, because our experience showed that behavior with 25 nodes is very different from behavior with 9 nodes.
A private blockchain is the goal of almost all existing frameworks (Tendermint may be the only exception), so this may not be a real limitation either.
But we need to check what features a framework has to support a public one (for example, some analogue of Observer nodes out of the box).
The architecture (client-to-node communication in particular) and scalability are important here. The question of deployment is also very important.
So, I still think that Exonum is a good candidate and worth further research.

Sergey Khoroshavin:
Also some notes on Exonum:

  • number of validator nodes - yes, the documentation states that the number of validators should normally be in the 8-15 range (https://exonum.com/doc/advanced/network/), however it’s not clear for which case this assumption is made. Exonum uses the Tendermint protocol under the hood, and there was a thesis from one of the major Tendermint contributors which included test results for as many as 64 nodes. The developers probably had some specific use case in mind, with performance requirements that prohibited the use of more than 15 nodes. Sovrin most likely has very different performance requirements, so the number of nodes will be different. I don’t see any hard cap here.
  • concerning “you cannot add new validators easily” - in fact there is a whole chapter in the documentation on how to change the pool configuration, including the validator set (https://exonum.com/doc/advanced/configuration-updater/), and there is corresponding source code in their repo (https://github.com/exonum/exonum/tree/master/services/configuration)

And one more general note on the performance of PBFT-based algorithms (including Tendermint) - they all require O(N^2) network messages for each round (and RBFT requires O(N^3)), so the performance of a naive implementation will decrease very fast as the number of nodes increases. However, all major implementations (Plenum, Exonum, Tendermint) have some form of batching, so during each round they can process a lot of requests and attain very high throughput. Making batches very large can increase latency, but this is probably not a real problem for Sovrin use cases.
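To make the batching point concrete, here is a rough back-of-the-envelope sketch in Python. It only uses the asymptotic figures quoted above (N^2 for PBFT-like protocols, N^3 for RBFT) and ignores constant factors; the function names and the batch size of 1000 are hypothetical, chosen just for illustration.

```python
def messages_per_round(n: int, exponent: int = 2) -> int:
    """Rough number of network messages in one consensus round for n nodes.

    exponent=2 models PBFT-like protocols, exponent=3 models RBFT.
    Constant factors are ignored, so these are orders of magnitude only.
    """
    return n ** exponent


def messages_per_request(n: int, batch_size: int, exponent: int = 2) -> float:
    """Amortized messages per request when one round orders a whole batch."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return messages_per_round(n, exponent) / batch_size


if __name__ == "__main__":
    # Compare a few pool sizes; the batch size of 1000 is an arbitrary example.
    for n in (4, 9, 16, 25):
        print(
            f"n={n:2d}  PBFT-like: {messages_per_round(n):6d}  "
            f"RBFT: {messages_per_round(n, 3):6d}  "
            f"PBFT-like per request (batch=1000): {messages_per_request(n, 1000):.3f}"
        )
```

The per-round cost still grows quickly with the pool size, but once it is spread over a large batch the cost per request becomes small, which is why batching trades latency for throughput.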


#3

I’d like to bring some insights here.

DDoS Resilience

There was a lot of discussion around resiliency to DDoS, and it was mentioned that protocols that run multiple instances are more resilient. I agree with this, however running multiple instances (like in RBFT) is not the only solution and can actually be quite costly. There are other options that can be applied to almost any consensus protocol to shield it from unwanted traffic:

  • separate node-to-node and client-to-node network stacks and use whitelists for node-to-node communication - in this case attacks from clients won’t affect consensus and only other validator nodes could perform such attacks, which is much less probable
  • hide validator nodes behind gatekeepers, make gatekeepers filter unwanted traffic and spawn more gatekeepers if load increases

That said, I think the choice of consensus protocol shouldn’t be affected by this concern as much.
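The two options above can be sketched roughly as follows. This is a toy model in Python, not how Indy Node or any gatekeeper product actually works: the class names, the addresses and the per-client request cap are all hypothetical, and a real deployment would filter at the network or transport layer.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class NodeStack:
    """Toy node-to-node listener: only whitelisted validators may connect."""
    allowed_peers: Set[str] = field(default_factory=set)

    def accept(self, peer_address: str) -> bool:
        # Traffic from anyone outside the whitelist never reaches consensus.
        return peer_address in self.allowed_peers


@dataclass
class Gatekeeper:
    """Toy client-facing filter placed in front of a validator node.

    The per-client cap is an assumed filtering policy used for illustration.
    """
    max_requests_per_client: int = 100
    seen: Dict[str, int] = field(default_factory=dict, init=False)

    def accept(self, client_address: str) -> bool:
        # Count requests per client and drop everything above the cap.
        count = self.seen.get(client_address, 0) + 1
        self.seen[client_address] = count
        return count <= self.max_requests_per_client


if __name__ == "__main__":
    node_stack = NodeStack(allowed_peers={"10.0.0.1", "10.0.0.2"})
    print(node_stack.accept("10.0.0.1"))     # known validator
    print(node_stack.accept("203.0.113.7"))  # unknown sender

    gatekeeper = Gatekeeper(max_requests_per_client=2)
    print([gatekeeper.accept("client-a") for _ in range(3)])
```

The point of the split is that client traffic and validator traffic never share a listener, so flooding the client side cannot starve node-to-node messages.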

Async protocols

It was agreed that in the long run fully asynchronous protocols are more robust (in terms of resilience against both internal and external threats). However, there was a concern that these protocols are a very recent development and are not widespread, so they may have some yet unknown problems, and it was proposed to concentrate on recent improvements to leader-based protocols. I have several concerns here:

  • recent improvements to leader-based protocols are, well, also recent, so they also could contain some yet unknown problems
  • if we look at leader-based protocols we can see that PBFT was invented around 1999, then there were some improvements, with RBFT appearing in 2013. HoneyBadger was developed in 2016, so it may look relatively new, however it’s basically an improvement of the SINTRA protocol developed in 2002, and there was also prior research dating back to the 1980s, so in reality the age difference is not so big
  • all leader-based protocols are vulnerable in one way or another to timing attacks from malicious leaders and/or malicious network schedulers (and in my opinion DDoS can be seen as a form of malicious network scheduling), so no matter how smart we try to be, as long as there is some timeout in the system it can potentially be exploited

So, one of the main questions is whether the Sovrin Foundation is okay with the fact that the weak synchrony of leader-based protocols can always be exploited to halt processing. If yes, then almost any PBFT-based protocol with periodic leader rotation can be picked (given it’s also not too new). If no, there is basically no choice other than to migrate to some async protocol. This can probably be seen as a consequence of the FLP theorem, which states that no deterministic fully asynchronous consensus protocol that tolerates even a single node failure is possible.
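As a rough illustration of the timeout concern, here is a toy calculation in Python. It assumes the only defence against a slow leader is a fixed timeout that triggers a leader change; real protocols have additional checks (RBFT, for example, compares the master instance against backup instances), so treat the model and all of its numbers as hypothetical.

```python
def throughput(batch_size: int, round_seconds: float) -> float:
    """Requests ordered per second if every round takes round_seconds."""
    if round_seconds <= 0:
        raise ValueError("round_seconds must be positive")
    return batch_size / round_seconds


def slowest_undetected_round(timeout_seconds: float, margin_seconds: float = 0.1) -> float:
    """Longest round a malicious leader can cause without hitting the timeout."""
    if margin_seconds <= 0 or timeout_seconds <= margin_seconds:
        raise ValueError("need 0 < margin_seconds < timeout_seconds")
    return timeout_seconds - margin_seconds


if __name__ == "__main__":
    batch_size = 1000        # hypothetical batch size
    honest_round = 1.0       # hypothetical honest round duration, in seconds
    timeout = 10.0           # hypothetical leader-change timeout, in seconds

    honest = throughput(batch_size, honest_round)
    attacked = throughput(batch_size, slowest_undetected_round(timeout))
    print(f"honest leader:    {honest:.1f} requests/s")
    print(f"malicious leader: {attacked:.1f} requests/s, no leader change triggered")
```

Shortening the timeout limits the damage in this model, but it also makes false leader changes more likely when the network is merely slow, which is the trade-off behind the question above.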

HoneyBadger async protocol and implementation

Regarding HoneyBadger (HBBFT) - while this protocol seems more complex than PBFT, it is very modular, so the complexity is still manageable. Furthermore, this modularity helps a lot with further improvements and tuning for specific use cases. Also, quite a big portion of HBBFT consists of running parallel instances of the reliable broadcast (RBC) subprotocol. RBC is very similar to PBFT in normal operation mode, so HBBFT is somewhat similar to RBFT (which boils down to running multiple redundant instances of PBFT), but with one important twist - different RBC instances are doing mostly non-redundant work. This makes HBBFT far more efficient than RBFT, and some experiments showed that it can be even more efficient than plain PBFT, especially when the number of validator nodes is large (>16). It also seems that HBBFT can be easily parallelized, so modern multi-core CPUs can be fully utilized.

There are also some PoC implementations of the HoneyBadger protocol, and the most interesting one in my opinion is the HBBFT library developed by POA Network. Some good properties:

  • library focused on consensus only
  • written in Rust
  • suitable for embedding into existing code
  • seems to follow TDD rigorously
  • has property-based randomized tests
  • has example node implementation (Hydrabadger) suitable for tests in real environments
  • strong contributors, including one of the original HoneyBadger authors and someone from Ripple research who developed Cobalt - a protocol which borrowed a lot from HoneyBadger

This library is the only 3rd party BFT protocol implementation I’m currently aware of that’s suitable for embedding into Indy Node without a rewrite of the current business logic. Also, it doesn’t affect storage, so there’s no need for a complex migration of current data. Of course, more research and experiments are needed in order to understand whether its performance and stability are suitable for Indy and Sovrin.


#4

Have you considered the Avalanche protocol? The paper is here and a great write-up by Murat Demirbas of the implementation and pros and cons is here.

It’s based on a gossip protocol that quickly reaches consensus (2-3 seconds).


#5

I’d highly recommend looking at the Dfinity consensus protocol. Especially seeing Jan Camenisch (CL sigs) is their main crypto guy!



#6

Hi @esplinr,
Do we have clear requirements when it comes to the number of nodes and stewards (with write access to the ledger)? Can we still keep the assumption of high-availability machines that is required from the stewards today? What is our target for the number of transactions per second and the latency of operations?
Thanks
Herve


#7

The requirements for performance of the current ledger are documented here:
https://jira.hyperledger.org/browse/INDY-1607

We haven’t yet defined the requirements for a “Sovrin Ledger 2.0”, except to say that it should scale to a billion users. We will need to decide as a group what those requirements should be.


#8

To my knowledge, the Evernym team has not looked into the Avalanche protocol very much. Thanks for sharing the information.


#9

It’s great to see people showing interest in this thread.

I want to get us together to discuss how we can make progress on this effort within this calendar year.

It is hard to have a conversation with a global team, but during a week in March we are all an hour closer together than usual. I propose that we meet on Monday, March 11 at 18:00 UTC. That is an early meeting in New Zealand and a late meeting in Moscow, but it shouldn’t prevent anyone from sleeping.

I sent a calendar invitation to the people in this thread. If anyone else is interested in attending, please message me directly.


#10

Please include me in the invite Richard, thank you.

How do you differentiate the “Sovrin Ledger” from the “Indy Ledger”?

Best regards,
Michael


#11

Hello, Richard. Count me in please.


#12

I’m trying to keep the conversation focused on the use cases of the Sovrin network. That is likely to drive changes to the roadmap in Hyperledger Indy, but we don’t need to make that a requirement. Though the Sovrin network currently uses Indy as its ledger, it is possible that the goals of the Sovrin network and the Indy community diverge at some point.


#13

Thank you for everyone’s interest in the future of the Sovrin network! We are looking forward to discussing the future of the Sovrin ledger with you on Monday (Tuesday in New Zealand).

I think I got a calendar invitation sent to each person who expressed interest. In case I missed someone, here is the connection information:

Subject: Coordinate effort for Sovrin Ledger 2.0
Time: Mon Mar 11, 2019 12:00 - 12:50 (MDT)
Corresponding UTC (GMT): Monday, March 11, 2019 at 18:00:00
Location: https://zoom.us/j/715671233
Start time: Monday 12:00 PM
End time: Monday 12:50 PM

Purpose:
An opportunity for people interested in contributing to the architecture of the next generation of the Sovrin Ledger to share plans. Our main goal is to understand what commitments organizations can make to this project so that we can make decisions about next steps and timelines.

Agenda:

  • Make sure everyone knows each other.
  • Get a sense of when organizations can make investments, so that we can align our budgets.
  • Discuss useful areas of research and prototyping that will help us lay the groundwork for our collaboration together.

Join from PC, Mac, Linux, iOS or Android: https://zoom.us/j/715671233
Or iPhone one-tap :
US: +16699006833,715671233# or +14086380986,715671233#
Or Telephone:
Dial(for higher quality, dial a number based on your current location):
US: +1 669 900 6833 or +1 408 638 0986 or +1 646 558 8656 or +1 646 558 8665
Russia: +7 495 283 9788
United Kingdom: +44 (0) 20 3695 0088
Meeting ID: 715 671 233
International numbers available: https://zoom.us/u/WwMumszR

Have a good weekend.


#14

Thank you Richard, I understand the differentiation. From a scope perspective (at a conceptual level, i.e. not Indy specific), what elements in the INDY ARM Ledger viewpoint might be considered potentially in-scope for Sovrin Ledger 2.0 …at least from a vision perspective …subject to available resources? For example, …
35. Ledger Node - design and implementation
39. Ledger Node to Node communication - protocols, etc. (e.g. replica management)
17. Ledger data architecture - logical and physical
41. DID Resolver implications

…or, potentially, all of the above?

Does Sovrin Ledger 2.0 have a written description somewhere?


#15

The coordination meeting on Monday was really valuable. Thank you to everyone who attended!

I took away from the call the following decisions:

  • We agreed to try and have a team assembled by October (Q4), and four organizations said they believe they can receive approval for a budget request. We need to recruit a few more participants to really make progress.
  • We need to do some work to clarify the requirements and the expected effort required so that we can make appropriate budget requests.
  • We will continue our collaboration within the Indy community.
  • We will collaborate using #indy-ledger-next at http://chat.hyperledger.org
  • To support asynchronous collaboration, instead of a regular meeting I will post a request for updates in #indy-ledger-next at the start of each month.
  • We will have a follow up call on October 28, 2019. That call will be listed on the Hyperledger Community Calendar.

I look forward to collaborating with everyone on the Hyperledger chat.