I’d like to share some insights here.
DDoS Resilience
There was a lot of discussion around resilience to DDoS, and it was mentioned that protocols that run multiple instances are more resilient. I agree with this; however, running multiple instances (as in RBFT) is not the only solution and can actually be quite costly. There are other options that can be applied to almost any consensus protocol to shield it from unwanted traffic:
- separate the node-to-node and client-to-node network stacks and use whitelists for node-to-node communication. In this case attacks from clients won’t affect consensus, and only other validator nodes could perform such attacks, which is much less probable
- hide validator nodes behind gatekeepers, have the gatekeepers filter unwanted traffic, and spawn more gatekeepers if load increases
That said, I think the choice of consensus protocol shouldn’t be affected that much by this concern.
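To make the two options above a bit more concrete, here is a minimal sketch in Python of a filter that treats the two network stacks differently. Everything in it (the `Gatekeeper` class, the per-window rate limit, the node names) is hypothetical and only for illustration; it is not how Indy Node or any particular library implements this.

```python
from dataclasses import dataclass, field


@dataclass
class Gatekeeper:
    """Toy traffic filter with separate rules for the two network stacks."""

    validator_whitelist: set      # hypothetical set of known validator IDs
    client_rate_limit: int = 5    # hypothetical max client messages per window
    _client_counts: dict = field(default_factory=dict, init=False)

    def accept_node_message(self, sender_id: str) -> bool:
        # Node-to-node stack: only whitelisted validators get through,
        # so client traffic can never reach the consensus layer here.
        return sender_id in self.validator_whitelist

    def accept_client_message(self, client_id: str) -> bool:
        # Client-to-node stack: anyone may connect, but each client is
        # limited to a fixed number of messages per window.
        count = self._client_counts.get(client_id, 0) + 1
        self._client_counts[client_id] = count
        return count <= self.client_rate_limit

    def reset_window(self) -> None:
        # Called periodically to start a new rate-limiting window.
        self._client_counts.clear()
```

With this split, a flood from an unknown sender is dropped on the node-to-node side regardless of volume, and on the client side it only exhausts that sender's own quota. Scaling out then means running more such filters in front of the validators.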
Async protocols
It was agreed that in the long run fully asynchronous protocols are more robust (in terms of resilience against both internal and external threats). However, there was a concern that these protocols are a very recent development and not widespread, so they may have some yet unknown problems, and it was proposed to concentrate on recent improvements to leader-based protocols instead. I have several concerns here:
- recent improvements to leader-based protocols are, well, also recent, so they could also contain some yet unknown problems
- if we look at leader-based protocols, PBFT was invented around 1999, then there were some improvements, with RBFT appearing in 2013. HoneyBadger was developed in 2016, so it may look relatively new; however, it’s basically an improvement of the SINTRA protocol developed in 2002, and there was also prior research dating back to the 1980s, so in reality the age difference is not that big
- all leader-based protocols are vulnerable in one way or another to timing attacks from malicious leaders and/or malicious network schedulers (and in my opinion DDoS can be seen as a form of malicious network scheduling), so no matter how smart we try to be, as long as there is some timeout in the system it can potentially be exploited
So one of the main questions is whether the Sovrin Foundation is okay with the fact that the weak synchrony assumption of leader-based protocols can always be exploited to halt processing. If yes, then almost any PBFT-based protocol with periodic leader rotation can be picked (provided it’s also not too new). If no, there is basically no choice other than to migrate to some async protocol. This can probably be seen as a consequence of the FLP theorem, which states that no deterministic consensus protocol can guarantee termination in a fully asynchronous system if even a single node can fail. Leader-based protocols get around this by relying on timing assumptions, while async protocols like HoneyBadger rely on randomization instead.
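To illustrate the timeout concern, here is a toy model in Python of a leader that stalls every proposal until just before the view-change timeout would fire. The function name, the parameters and the numbers are all made up for illustration, and the model ignores everything except leader delay; it's a rough sketch, not a simulation of any real protocol.

```python
def batches_ordered(duration_ms, network_latency_ms, view_change_timeout_ms,
                    leader_is_malicious, margin_ms=10):
    """Count batches ordered in `duration_ms` under a toy leader-based model.

    Assumption: an honest leader proposes as fast as the network allows,
    one batch per `network_latency_ms`. A malicious leader waits until
    `margin_ms` before the timeout, so it is never detected and replaced.
    """
    if leader_is_malicious:
        delay_ms = view_change_timeout_ms - margin_ms
    else:
        delay_ms = network_latency_ms
    return duration_ms // delay_ms


# Hypothetical numbers: 100 ms latency, 5 s view-change timeout, 1 minute run.
honest = batches_ordered(60_000, 100, 5_000, leader_is_malicious=False)
malicious = batches_ordered(60_000, 100, 5_000, leader_is_malicious=True)
print(honest, malicious)
```

With these made-up numbers the honest leader orders 600 batches and the stalling leader only 12, without a single view change being triggered. Shrinking the timeout narrows the gap but makes false view changes under normal network jitter more likely, which is the trade-off an attacker can lean on.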
HoneyBadger async protocol and implementation
Regarding HoneyBadger (HBBFT): while this protocol seems more complex than PBFT, it is very modular, so the complexity is still manageable. Furthermore, this modularity helps a lot with further improvements and tuning for specific use cases. Also, quite a big portion of HBBFT consists of running parallel instances of the reliable broadcast (RBC) subprotocol. RBC is very similar to PBFT in normal operation mode, so HBBFT is somewhat similar to RBFT (which boils down to running multiple redundant instances of PBFT), but with one important twist: different RBC instances are doing mostly non-redundant work. This makes HBBFT way more efficient than RBFT, and some experiments showed that it can be even more efficient than plain PBFT, especially when the number of validator nodes is large (>16). It also seems like HBBFT can be easily parallelized, so modern multi-core CPUs can be fully utilized.
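The redundant vs non-redundant difference can be shown with a very rough counting model in Python. It assumes that RBFT runs f + 1 PBFT instances which each order every transaction, and that in HBBFT each node broadcasts its own disjoint share of the batch. The second assumption is idealized, since real proposals are picked randomly and may overlap. The function names are made up, and message and crypto costs are ignored entirely.

```python
def rbft_work(transactions, n_nodes):
    """Return (transactions ordered across all instances, transactions committed).

    Assumption: f + 1 redundant PBFT instances, each ordering everything,
    with only one instance's ordering actually being executed.
    """
    f = (n_nodes - 1) // 3
    instances = f + 1
    return instances * len(transactions), len(transactions)


def hbbft_work(transactions, n_nodes):
    """Return (transactions broadcast across all RBC instances, transactions committed).

    Assumption (idealized): node i proposes every n-th transaction, so the
    shares carried by the N parallel RBC instances do not overlap.
    """
    shares = [transactions[i::n_nodes] for i in range(n_nodes)]
    broadcast = sum(len(share) for share in shares)
    committed = set().union(*shares)
    return broadcast, len(committed)


txs = list(range(100))
print(rbft_work(txs, 4))   # 4 nodes, so f = 1 and 2 instances
print(hbbft_work(txs, 4))
```

In this toy model RBFT handles each transaction twice to commit it once, while HBBFT handles each transaction once, and the gap grows with f. Actual performance depends on many other factors, so this only shows where the efficiency argument comes from.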
There are also some PoC implementations of the HoneyBadger protocol, and the most interesting one in my opinion is the HBBFT library developed by POA Network. Some good properties:
- library focused on consensus only
- written in Rust
- suitable for embedding into existing code
- seems to follow TDD rigorously
- has property-based randomized tests
- has example node implementation (Hydrabadger) suitable for tests in real environments
- strong contributors, including one of the original HoneyBadger authors and a guy from Ripple research who developed Cobalt, a protocol which borrowed a lot from HoneyBadger
This library is the only 3rd party BFT protocol implementation I’m currently aware of that’s suitable for embedding into Indy Node without a rewrite of the current business logic. It also doesn’t affect storage, so there’s no need for a complex migration of current data. Of course, more research and experiments are needed in order to understand whether its performance and stability are suitable for Indy and Sovrin.