Running sovrin on a cluster


#1

There are some instructions for running sovrin nodes on different machines at https://github.com/sovrin-foundation/sovrin/blob/master/setup.md but I’m stuck towards the end, when I want to use the CLI. Here is what I’ve done:

  1. I created 4 virtual machines, each with a different IP and each running Ubuntu 16 and OrientDB.
  2. Following the instructions on the above-mentioned page, I started the 4 nodes, one on each machine (generate_sovrin_pool_transactions --nodes 4 --clients 5 --nodeNum <nnn> followed by start_sovrin_node Node1 <port1> <port2>).
  3. I started a 5th virtual machine (yet another IP). On this one, I would like to use the sovrin CLI.

I cannot find a way to tell the CLI to use the pool of machines created above. I thought that modifying lib/python3.5/site-packages/sovrin/config.py would help… but no luck.


#2

Has anyone tried that?


#3

Hi Fabien - thanks for the reminder - I’ve asked the dev team to answer.

Andy


#4

Assuming you are using the --ips flag, you have to run the same command on the 5th machine as you ran on any of those 4 machines; the value of --nodeNum can be anything. Once that is done you should be able to use the CLI to connect to the nodes. The purpose of generate_sovrin_pool_transactions is to generate the genesis transaction file that specifies where the nodes are located. I know that using --nodeNum on the 5th machine sounds silly; this will be fixed in the next release.


#5

Thanks @lovesh. @fabienpe let us know if that does the trick.


#6

Thanks @lovesh. This works fine.

Note that the --nodeNum of the 5th node cannot be just anything; otherwise you get an assertion failure (nodeNum <= nodeCount), so I used 0.


#7

Right, nodeNum had to be <=5 (this has since been resolved so that you can omit it; the fix will soon be in the release). From the logs it looks like it got connected to Node4, since I see "9vj4toFMbvuzGcDLHJoiRNc7P6VN3SmKPjfb7vuofkdF now connected to Node4C". When you ran the nodes, did you see some traffic between them? I am trying to find out whether the other nodes allow UDP communication.

Update: Can it now connect to the other nodes? Was it a delay issue, where you had to wait for a minute or so to get connected?


#8

On the CLI, when I type status it tells me that I’m Connected to test Sovrin network.

My next issue now is when I accept the Faber invitation. It tells me that it cannot find the remote endpoint (see output below).

Sovrin-CLI (c) 2016 Evernym, Inc.
Node registry loaded.
Node1: 10.132.3.7:9701
Node2: 10.132.3.8:9703
Node3: 10.132.3.6:9705
Node4: 10.132.3.9:9707
Type 'help' for more information.
sovrin> new key with seed 000000000000000000000000Steward1
New keyring Default created
Active keyring set to "Default"
Key created in keyring Default
Identifier for key is FYmoFw55GeQH7SRFa37dkx1d2dZ3zUF8ckg7wmL7ofN4
Current identifier set to FYmoFw55GeQH7SRFa37dkx1d2dZ3zUF8ckg7wmL7ofN4
sovrin> connect test
Client sovrinQ0i1Xh initialized with the following node registry:
Node1C listens at 10.132.3.7 on port 9702
Node2C listens at 10.132.3.8 on port 9704
Node3C listens at 10.132.3.6 on port 9706
Node4C listens at 10.132.3.9 on port 9708
Active client set to sovrinQ0i1Xh
A2Dkd9ikeJknyEDyuqrwYMiZ3VpBSwPbunH2HPeBcZ2K listening for other nodes at 0.0.0.0:8310
A2Dkd9ikeJknyEDyuqrwYMiZ3VpBSwPbunH2HPeBcZ2K looking for Node2C at 10.132.3.8:9704
A2Dkd9ikeJknyEDyuqrwYMiZ3VpBSwPbunH2HPeBcZ2K looking for Node4C at 10.132.3.9:9708
A2Dkd9ikeJknyEDyuqrwYMiZ3VpBSwPbunH2HPeBcZ2K looking for Node1C at 10.132.3.7:9702
A2Dkd9ikeJknyEDyuqrwYMiZ3VpBSwPbunH2HPeBcZ2K looking for Node3C at 10.132.3.6:9706
Connecting to test...
A2Dkd9ikeJknyEDyuqrwYMiZ3VpBSwPbunH2HPeBcZ2K now connected to Node4C
A2Dkd9ikeJknyEDyuqrwYMiZ3VpBSwPbunH2HPeBcZ2K now connected to Node3C
A2Dkd9ikeJknyEDyuqrwYMiZ3VpBSwPbunH2HPeBcZ2K now connected to Node2C
A2Dkd9ikeJknyEDyuqrwYMiZ3VpBSwPbunH2HPeBcZ2K now connected to Node1C
Connected to test.
sovrin@test> send NYM dest=FuN98eH2eZybECWkofW6A9BKJxxnTatBCopfUiNxo6ZB role=SPONSOR
Adding nym FuN98eH2eZybECWkofW6A9BKJxxnTatBCopfUiNxo6ZB
sovrin@test> send ATTRIB dest=FuN98eH2eZybECWkofW6A9BKJxxnTatBCopfUiNxo6ZB raw={"endpoint": "127.0.0.1:5555"}
Adding attributes {"endpoint": "127.0.0.1:5555"} for FuN98eH2eZybECWkofW6A9BKJxxnTatBCopfUiNxo6ZB

(At that point I started the Faber endpoint script on the same machine.)

sovrin@test> load sample/faber-invitation.sovrin
sovrinQ0i1Xh is already started, so start has no effect
1 link invitation found for Faber College.
Creating Link for Faber College.
Generating Identifier and Signing key.

Try Next:
show link "Faber College"
accept invitation from "Faber College"

sovrin@test> accept invitation from "Faber College"
Invitation not yet verified.
Link not yet synchronized.
Attempting to sync...

Synchronizing...
Link Faber College synced
Remote endpoint not found, can not connect to Faber College


#9

I see Adding attributes {"endpoint": ..., but you should also have seen something like Added.... That message confirms that the attribute was written to the ledger. Maybe wait a few more seconds to see the confirmation. I think there is some latency between your machines.


#10

It’s been several minutes since the send ATTRIB message was sent. Is there a way on the node side to disable the verbose output and only get warnings or errors? I see tons of debug information and it’s not easy to spot what could be going wrong there.
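
Until such an option is confirmed, one possible workaround is to save the node output to a file and filter it afterwards. The sketch below is only an illustration: it assumes the log lines contain level names such as WARNING or ERROR (the actual node log format is not shown in this thread), and the script name filter_levels.py and the helper names are hypothetical.

```python
import sys

# Assumed level names; the node's real log format is not shown in this thread,
# so this is a plain substring match and may need adjusting.
LEVELS = ("WARNING", "ERROR", "CRITICAL")


def keep_line(line, levels=LEVELS):
    """Return True if the line mentions one of the wanted level names."""
    return any(level in line for level in levels)


def filter_lines(lines, levels=LEVELS):
    """Return only the lines that mention one of the wanted level names."""
    return [line for line in lines if keep_line(line, levels)]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python filter_levels.py <saved-node-output-file>
    with open(sys.argv[1], errors="replace") as handle:
        for line in handle:
            if keep_line(line):
                sys.stdout.write(line)
```

Matching on substrings keeps the sketch independent of the exact line layout, at the cost of also keeping any debug line that happens to contain the word ERROR.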


#11

You should have been seeing some COMMIT messages recently. Also, in the directory where you run the CLI there should be a file called cli.log; you can check whether it contains any REPLY message for this ATTRIB message.
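
One way to run this check is a small script that scans cli.log for lines mentioning REPLY or REQNACK. This is a minimal sketch: the file name cli.log and the message names come from this thread, but the layout of the log lines is an assumption, so it only does a plain substring match. The helper name find_responses is hypothetical.

```python
from pathlib import Path

# Message names mentioned in this thread. The log line layout is assumed,
# so we only look for these words anywhere in a line.
KEYWORDS = ("REPLY", "REQNACK")


def find_responses(lines, keywords=KEYWORDS):
    """Return (line_number, keyword, text) for each line containing a keyword."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for keyword in keywords:
            if keyword in line:
                hits.append((number, keyword, line.rstrip("\n")))
                break  # report each line only once
    return hits


if __name__ == "__main__":
    log_path = Path("cli.log")  # run from the directory where the CLI was started
    if log_path.exists():
        with log_path.open(errors="replace") as handle:
            for number, keyword, text in find_responses(handle):
                print(f"{number}: [{keyword}] {text}")
    else:
        print("cli.log not found in the current directory")
```

If the script prints nothing for the ATTRIB request, the CLI most likely never received a response for it.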


#12

Here is the CLI log file, converted to PDF because other formats are not allowed here. In any case, I have not seen a COMMIT message and, as I said, I’m not sure what to look for in the output of the node processes or of the Faber process.

cli.pdf (209.0 KB)


#14

It’s the full log for an entire session, from starting the sovrin CLI to accepting the Faber invitation twice.


#15

OK, I see from the log that the NYM request did not get a REPLY. The ATTRIB request was therefore rejected, because it asked to add an attribute to the ledger for a NYM that was not present, and a REQNACK was sent. I am looking through the log further to see if I can find out more.
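
The dependency between the two requests can be pictured with a toy model. This is not Sovrin's implementation: the ToyLedger class and its methods are hypothetical and only reuse the message names from this thread.

```python
class ToyLedger:
    """Toy stand-in for the ledger: an ATTRIB is accepted only for a known NYM."""

    def __init__(self):
        self.nyms = set()
        self.attribs = {}

    def send_nym(self, dest):
        # In this toy model a NYM request always succeeds.
        self.nyms.add(dest)
        return "REPLY"

    def send_attrib(self, dest, raw):
        # An attribute for an identifier that was never written is refused.
        if dest not in self.nyms:
            return "REQNACK"
        self.attribs[dest] = raw
        return "REPLY"


ledger = ToyLedger()
# The NYM never made it to the ledger, so the ATTRIB is refused.
print(ledger.send_attrib("FuN98...", {"endpoint": "127.0.0.1:5555"}))
# Once the NYM is present, the same ATTRIB is accepted.
ledger.send_nym("FuN98...")
print(ledger.send_attrib("FuN98...", {"endpoint": "127.0.0.1:5555"}))
```

In the session above, the first case is what happened: without a REPLY for the NYM there was nothing to attach the endpoint attribute to, so the endpoint lookup later failed.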


#16

@fabienpe Do you see any exceptions on any node? You said you don’t see any COMMIT messages on the nodes; do you see any PROPAGATE, PRE-PREPARE or PREPARE messages there?
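
Since the node output is very verbose, a quick way to answer this is to count how often each message name appears in a saved copy of a node's output. This is a rough sketch under assumptions: it expects the names to appear in the log exactly as written in this post (the real spelling in the node logs may differ), and the script name count_messages.py is hypothetical.

```python
import re
import sys
from collections import Counter

# PRE-PREPARE is listed before PREPARE so that a PRE-PREPARE line is not
# also counted as a PREPARE (regex matches do not overlap).
PATTERN = re.compile(r"PROPAGATE|PRE-PREPARE|PREPARE|COMMIT")
NAMES = ("PROPAGATE", "PRE-PREPARE", "PREPARE", "COMMIT")


def count_messages(lines):
    """Count occurrences of each message name across the given lines."""
    counts = Counter()
    for line in lines:
        counts.update(PATTERN.findall(line))
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_messages.py <saved-node-output-file>
    with open(sys.argv[1], errors="replace") as handle:
        counts = count_messages(handle)
    for name in NAMES:
        print(f"{name}: {counts[name]}")
```

Running it on the output of each of the 4 nodes shows at a glance which of these message types, if any, a node has logged.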


#17

Hi @lovesh. I solved the problem by resetting the Sovrin environment on all nodes, including the machine running the CLI. Thanks for following up on that issue.