The Authenticity of a DID and Sovrin's Dependence on the PKI


#1

I have read most of Sovrin’s white papers, and consulted Sovrin’s codebase on GitHub and the forum where things were not clear. Some of my questions, however, remain unanswered (see below).

Alice has a Sovrin client. Her client contains a wallet, which contains, among other things, her master secret. She embeds her master secret in credential requests, and uses it to prove ownership of the credentials (i.e., claims) she receives from issuers. I’m leaving out a lot of details, but so far, no confusion.

Her wallet also contains a number of public-private key pairs that she uses to communicate securely with other identity owners, such as Faber College. In particular, she uses her private key to sign the messages she sends to the College, which allows the College to verify that messages from someone claiming to be Alice actually are from Alice, and that Eve did not tamper with them.

How did the College obtain Alice’s public key? Well, Alice sent it to the College, along with her DID. Alice visited the College’s website, logged on (or authenticated herself to the College somehow), downloaded a connection request, generated a public-private key pair, derived a DID from her public key (i.e., she took the first 16 bytes), and responded to the connection request by sending the College, among other things, her public key and DID. Eve couldn’t tamper with the connection request, nor with Alice’s response to it, because Alice was talking to the College over a secure channel. What secured the channel? The public key infrastructure (PKI). As soon as the College received Alice’s public key and DID, the College, being a trust anchor, wrote both to the ledger.

In other words, without the PKI, neither the authenticity of Alice’s DID, nor of her public key, can be guaranteed.
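The derivation step described above (a DID taken from the first 16 bytes of the public key) can be sketched as follows. This is a minimal illustration, not Indy's code: it assumes a 32-byte public key and a Base58 text encoding of the truncated bytes, and it uses random bytes as a stand-in for a real Ed25519 public key because the Python standard library cannot generate one. The helper names `b58encode`, `b58decode` and `did_from_verkey` are hypothetical.

```python
import secrets

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def b58encode(data: bytes) -> str:
    """Base58-encode bytes using the Bitcoin alphabet."""
    n = int.from_bytes(data, "big")
    out = ""
    while n > 0:
        n, rem = divmod(n, 58)
        out = B58_ALPHABET[rem] + out
    # Each leading zero byte is written as a leading '1'.
    pad = len(data) - len(data.lstrip(b"\x00"))
    return "1" * pad + out


def b58decode(text: str) -> bytes:
    """Inverse of b58encode."""
    n = 0
    for ch in text:
        n = n * 58 + B58_ALPHABET.index(ch)
    pad = len(text) - len(text.lstrip("1"))
    return b"\x00" * pad + n.to_bytes((n.bit_length() + 7) // 8, "big")


def did_from_verkey(verkey: bytes) -> str:
    """Derive a DID string from the first 16 bytes of a 32-byte public key."""
    if len(verkey) != 32:
        raise ValueError("expected a 32-byte public key")
    return b58encode(verkey[:16])


# ASSUMPTION: random bytes stand in for a real Ed25519 public key.
verkey = secrets.token_bytes(32)
did = did_from_verkey(verkey)
print(did)
```

Note that the DID only commits to half of the key, so whoever receives the DID still needs the full public key from somewhere they trust, which is exactly the question raised in this post.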

What about the College’s public key? How does Alice obtain it? Alice’s client fetches it from the ledger, because it knows the College’s DID. How does it know the College’s DID? It was part of the connection request she downloaded securely, by means of the PKI.

If I am correct, then

  1. why do Alice and the College need a DID that maps to a public key in a ledger? What problem does it solve if, after all, the PKI remains necessary in order to reliably share a DID? And

  2. Alice cannot dispose of her usernames and passwords, because she needs them to authenticate to issuers, such as the College. In Sovrin’s latest white paper, however, it is written that (p. 26): “Sovrin goes one step further: it not only eliminates usernames and passwords in favor of cryptographic authentication, but it adds the ability to exchange verifiable digital credentials for stronger, more flexible, and more resilient identity verification and access control.”

Most of the information above comes from the getting-started.md on GitHub (see https://github.com/hyperledger/indy-node/blob/master/getting-started.md) and from the “First-Time Provisioning” section of the white paper “The Technical Foundations of Sovrin” (p. 14).


#2

Excellent, thoughtful questions!

Today, a Sovrin connection is initiated over non-Sovrin channels. This is necessary, since the world isn’t Sovrin-based yet. You are quite correct that non-Sovrin channels include usernames and passwords, and that they must have some basis for trust. That basis could be PKI, but it could be other things as well. For example, two parties could connect face-to-face, at a conference, by using bluetooth on their phones, with no PKI involved.

In a future world where Sovrin becomes far more common, the dynamic will change. A new connection can be bootstrapped off a mutual introduction by a mutually-connected, Sovrin-based party. Or a connection can be sent using indy-sdk’s anon_crypt() primitive, where one party contacts the other while still unknown. Such workflows will not use traditional PKI at all.

An issuer that switches to a Sovrin-based authentication won’t need traditional login (every message will be authenticated as it arrives, instead of depending on a long-lived session that you activate with a login operation), and won’t need usernames and passwords at all; every action taken by a remote party will be strongly associated with its originating identity because of the encryption at play. This switch away from traditional login is more likely for Sovrin issuers than for random entities in the digital landscape, because issuers will already have made the jump to other aspects of Sovrin usage.

If some of what I’m alluding to here is unfamiliar, it may be helpful to listen to this presentation about Sovrin agent-to-agent communication concepts: https://drive.google.com/file/d/1PHAy8dMefZG9JNg87Zi33SfKkZvUvXvx/view?usp=sharing


#3

Thank you, Daniel, for your prompt reply. I have watched the video, but some ambiguities remain.

“For example, two parties could connect face-to-face, at a conference, by using bluetooth on their phones, with no PKI involved.”

Clear. However, if two parties are somehow able to communicate in a secure manner, why would they exchange DIDs and not just public keys and endpoints? After all, isn’t the only purpose of the DID to point to public keys and possibly endpoints? Sovrin’s ledger secures this mapping, but as I do not understand why the mapping itself is necessary, I also do not understand why Sovrin needs a ledger. Moreover, requesting and proving ownership of credentials can be done without a ledger: Alice’s master secret is (mathematically) completely independent of the public-private key pairs that are also contained inside her wallet.

“A new connection can be bootstrapped off a mutual introduction by a mutually-connected, Sovrin-based party.”

Could you explain this in a bit more detail? Or does this refer to the video?

“Or a connection can be sent using indy-sdk’s anon_crypt() primitive, where one party contacts the other while still unknown. Such workflows will not use traditional PKI at all.”

I had a look at the anon_crypt() function, but one of the function’s arguments appears to be the public key of the receiver. Taken from the code (line 312 of indy-sdk/wrappers/python/indy/crypto.py):

“Sealed boxes are designed to anonymously send messages to a Recipient given its public key.”

But how is the Recipient’s public key obtained? In particular, after it has been obtained, how can its authenticity be guaranteed? The PKI was invented to solve this problem, but what’s Sovrin’s solution? (If it does not want to depend on the PKI, in the future.)

“An issuer that switches to a Sovrin-based authentication won’t need traditional login (every message will be authenticated as it arrives, instead of depending on a long-lived session that you activate with a login operation), and won’t need usernames and passwords at all; every action taken by a remote party will be strongly associated with its originating identity because of the encryption at play. This switch away from traditional login is more likely for Sovrin issuers than for random entities in the digital landscape, because issuers will already have made the jump to other aspects of Sovrin usage.”

Clear. But when you say “every message will be authenticated as it arrives” I suppose that you refer to digital signatures, contained inside these messages. Verifying a digital signature requires a public key, which brings me back to the question above.

To leave no doubt, my line of reasoning is as follows:

  1. Sovrin wants to have application-layer security, in order to securely exchange credential requests, credentials, proof requests, proofs, etc. A noble ambition.
  2. Application-layer security is achieved by means of public key cryptography: messages are signed or decrypted with private keys, and verified or encrypted with public keys.
  3. Public key cryptography introduces the problem of key management. In particular, how does Alice obtain Bob’s public key? The PKI solves this problem by means of digital certificates signed by certificate authorities.
  4. Sovrin wants to be independent of the PKI, at least in the future.
  5. Part of the key management problem, in Sovrin, is solved by its ledger: if Alice has Bob’s DID, the ledger will allow her to reliably obtain Bob’s public key. As Phil Windley wrote elsewhere on the forum:
  6. But how does Alice obtain Bob’s DID? Bob will need to tell her his DID over a secure channel (e.g., face to face).
  7. But wait! If Bob needs to establish a secure channel between himself and Alice anyway, why would he send Alice his DID, and not just his public key? The ledger becomes redundant.

#4

Let me add a few points that I hope will be helpful.

First, the purpose of DIDs is to provide a persistent address for a public key and a service endpoint where you can interact with the identity owner. So DIDs solve a problem that public/private key pairs alone do not—persistence of an identifier and the ability to rotate keys and endpoints without breaking the relationship.

For more on DIDs, see the DID primer and the DID spec (the second generation of which is nearing completion at the W3C Credentials Community Group).

Secondly, the goal of Sovrin infrastructure (and indeed all of DID-based infrastructure) is not to replace PKI, but to migrate gradually from centralized PKI to decentralized PKI (DPKI). In DPKI, the role of CAs evolves into a more generalized, more contextual web of trust model where any DID owner can issue verifiable claims about the trustworthiness of any other DID owner.

We believe this DPKI model will be attractive to many of the same CAs that currently work in the conventional PKI model—indeed, Infocert, the EU’s largest CA, is one of the first Sovrin stewards, and is very active in the Sovrin Trust Framework Working Group.


#5

I’m going to answer individual questions in separate posts…

“if two parties are somehow able to communicate in a secure manner, why would they exchange DIDs and not just public keys and endpoints?”

They absolutely could just exchange keys and endpoints. As long as every time one party rotates their keys, they can securely communicate that rotation to the other party, this can even go on indefinitely. This observation is the basis for a “microledger” feature that is being discussed in Sovrin working groups. (A microledger takes state off the main ledger into tiny files that capture a pairwise state machine, but still preserves merkle trees and state proofs such that tamper evidence and other guarantees are preserved.)

However, if one party rotates their keys and cannot notify the other, the ledger provides a reliable source of truth that can be used to recover the relationship. A DID provides a permanent identifier for a relationship, independent of its key state. You are not known by your key, but by the identifier that the key represents.

This subtlety has consequences for historical validation; knowing what a party’s current key is, today, does not help you if you want to check a digital signature that they created a year ago. What you need to know, today, is what the other party’s key was when they signed. A ledger provides a reliable (tamper-proof) way to answer that question.
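The historical-validation point can be illustrated with a small sketch. It assumes the ledger can be modelled as an append-only list of timestamped key rotations for one DID; the class `KeyHistory`, its methods, and the integer timestamps are hypothetical and are not part of Indy or Sovrin.

```python
from bisect import bisect_right


class KeyHistory:
    """Append-only record of (timestamp, verkey) rotations for a single DID."""

    def __init__(self):
        self._times = []
        self._keys = []

    def rotate(self, timestamp: int, verkey: str) -> None:
        # Rotations are only ever appended, in increasing time order.
        if self._times and timestamp <= self._times[-1]:
            raise ValueError("rotations must be recorded in increasing time order")
        self._times.append(timestamp)
        self._keys.append(verkey)

    def key_at(self, timestamp: int):
        """Return the key that was current at `timestamp`, or None if none existed yet."""
        i = bisect_right(self._times, timestamp)
        return self._keys[i - 1] if i else None


history = KeyHistory()
history.rotate(100, "key-A")
history.rotate(200, "key-B")
history.rotate(300, "key-C")

# A signature made at time 250 must be checked against key-B,
# even though the current key is key-C.
print(history.key_at(250), history.key_at(1000))
```

A verifier that only knows the current key (`key-C` here) would wrongly reject a valid signature from time 250; the identifier plus its recorded history is what makes the check possible.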


#6

Suppose Alice knows Bob, and Bob knows Carol. Bob can introduce Alice to Carol using only Sovrin agent-to-agent communication. Bob ends up not knowing the new DIDs and public keys that Alice and Carol are using in their relationship; Alice and Carol both know that Bob was the trusted intermediary, and that nobody tampered.


#7

I’m a bit puzzled here by the contrast between Sovrin and PKI–as if these two schools of thought are divergent. In its most abstract definition, Sovrin is PKI. It is a bit non-traditional, because it doesn’t use certificates, and it doesn’t depend on certificate authorities as a basis of trust. But it absolutely defines roles, policies, and procedures needed to create, manage, distribute, use, store, and revoke public keys. So Sovrin’s answers to many questions are going to be familiar to someone steeped in PKI.

One distinction is that Sovrin doesn’t expect people to use public keys in multiple contexts. Alice doesn’t have 1 public key that has to be discovered by other parties; she creates a new public key for every party that contacts her. When Bob connects with her, he doesn’t discover her DID or her public key; he sends her a connection request using a DID that he created just for the Alice~Bob relationship, and signed by a new public key that he created just for that purpose. His invitation communicates an endpoint where Alice can respond to him. Alice responds by creating a new DID and keypair for the Alice~Bob relationship, and sending it back to him. In traditional PKI, the expectation is that the keys would be known to belong to a certain individual with properties like an email address and name–but in Sovrin, what’s known is far less. Once this connection is established, Alice and Bob simply know that if they want to talk to a given DID, they are going to do it at a given endpoint, with a particular key. They then proceed to ratchet up trust by exchanging zero-knowledge proofs about their identity.
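A rough sketch of the pairwise idea described above: the wallet hands out a fresh DID and key for every relationship, so two counterparties cannot correlate Alice by comparing identifiers. Everything here is hypothetical (the `Wallet` class, the hex encoding, and the use of random bytes as a stand-in for a real Ed25519 public key); it only shows the bookkeeping, not the cryptography.

```python
import secrets


class Wallet:
    """Keeps one DID/key record per relationship (pairwise identifiers)."""

    def __init__(self):
        self._pairwise = {}

    def did_for(self, relationship: str) -> dict:
        # Create a new identifier the first time a relationship is seen,
        # and reuse it for that relationship afterwards.
        if relationship not in self._pairwise:
            verkey = secrets.token_bytes(32)  # ASSUMPTION: stand-in for a real public key
            self._pairwise[relationship] = {
                "did": verkey[:16].hex(),
                "verkey": verkey.hex(),
            }
        return self._pairwise[relationship]


alice = Wallet()
for_bob = alice.did_for("bob")
for_faber = alice.did_for("faber-college")

# Bob and Faber College each see a different DID and key for Alice.
print(for_bob["did"] != for_faber["did"])
```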

When Phil said that a DID could be looked up on the ledger, I think he was talking about widely used public DIDs (e.g., the DID for an organization that wants to be known to millions of customers). Such DIDs are useful, but they are not how an individual ends up relating to an org; instead, the individual looks up the public DID, sends a connection request to the entity behind that DID, and then the two of them negotiate separate DIDs and keys for their direct relationship.

This strong preference for pairwise DIDs and keys is driven by Sovrin’s stance on privacy; see especially requirement #1 of “Self-Sovereign Privacy By Design” (https://docs.google.com/document/d/1BbmYwvzAyYuhY148IBCSPuxQS0Z8M4bukbxhDzDSouM/edit#heading=h.wwwbyx17du5g)


#8

What I have been wondering about is endpoints. Won’t they enable tracking in practice? It’s not like you can arbitrarily choose any value you like for an endpoint. Most people will be constrained to at least use the same DNS name or IP address in the endpoint for all of their DIDs. Will everyone have to use onion routers all the time? What am I missing?


#9

First, endpoints will be the same for every agent hosted at a given agency (for example, all agent endpoints hosted at agency X might be “https://x.com/inbox”). This is approximately like saying that all customers of a given ISP have the same MX record–but it is better, because there is nothing observable to the outside world that tells how the message will be routed once it arrives at that endpoint. The message is anon_crypt() encrypted for the agency. The agency decrypts it and sees that it is to be routed to the agent for DID X. That’s all the agency knows. It doesn’t know the sender, the message type, etc. It hands the message to the agent. The agent then does a further decryption to peer inside, etc.

Second, endpoints do not need to be recorded on the ledger for private relationships, because there is always exactly one other party who needs to know them. Only public DIDs that are supposed to be correlated (e.g., the endpoint for acme.com’s own public agent) are written to the ledger. All others are written to a microledger instead.
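The two-layer routing in the first point can be sketched structurally. This is only an illustration of the layering: `toy_seal` and `toy_open` are hypothetical, insecure stand-ins (a keyed XOR stream) for anon_crypt(), which in reality encrypts to the recipient's public key. Shared symmetric keys are used here purely because the Python standard library has no public-key encryption.

```python
import hashlib
import json
import secrets


def toy_seal(key: bytes, plaintext: bytes) -> bytes:
    """Toy stand-in for anon_crypt(): XOR with a SHAKE-256 keystream. NOT secure."""
    nonce = secrets.token_bytes(16)
    stream = hashlib.shake_256(key + nonce).digest(len(plaintext))
    return nonce + bytes(a ^ b for a, b in zip(plaintext, stream))


def toy_open(key: bytes, sealed: bytes) -> bytes:
    """Inverse of toy_seal."""
    nonce, body = sealed[:16], sealed[16:]
    stream = hashlib.shake_256(key + nonce).digest(len(body))
    return bytes(a ^ b for a, b in zip(body, stream))


def wrap(agency_key: bytes, agent_key: bytes, to_did: str, message: dict) -> bytes:
    """Sender side: seal the message for the agent, then seal the envelope for the agency."""
    inner = toy_seal(agent_key, json.dumps(message).encode("utf-8"))
    outer = {"to": to_did, "payload": inner.hex()}
    return toy_seal(agency_key, json.dumps(outer).encode("utf-8"))


def agency_route(agency_key: bytes, sealed: bytes):
    """Agency side: learns only the target DID and an opaque payload."""
    outer = json.loads(toy_open(agency_key, sealed))
    return outer["to"], bytes.fromhex(outer["payload"])


def agent_read(agent_key: bytes, payload: bytes) -> dict:
    """Agent side: removes the inner layer to read the actual message."""
    return json.loads(toy_open(agent_key, payload))


agency_key = secrets.token_bytes(32)
agent_key = secrets.token_bytes(32)
message = {"type": "proof-request", "from": "did-of-sender"}

sealed = wrap(agency_key, agent_key, "did-X", message)
to_did, payload = agency_route(agency_key, sealed)
received = agent_read(agent_key, payload)
```

The agency's view is limited to `to_did` and the opaque `payload`; the sender and message type only become visible after the agent removes the inner layer.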


#10

@danielh

RE: First. As I understand it, blinding is primarily to protect against insider threats, so that is what I am focusing on here. Assume I am a service company or aggregator of some kind and I handle exchanges for different web sites which have many different users presenting proofs. The question is: is it possible for me to figure out which DIDs (and therefore their associated properties) are controlled by the same entity?

My first thought was that you could scan the DID records in the Ledger for the same unusual Endpoint and conclude that they represent the same individual or some small group of individuals with something in common.

So you are saying that if I want privacy a) I must use an Agent, b) I better use a Cloud Provider to run my Agent and c) I better use Google or Amazon (or Oracle) rather than somebody like Sprynet. This also suggests that my privacy is at the mercy of the configuration procedures of my vendor, which they can change at any time. For example, what if endpoints leak location information?

Re: Second. This is interesting. If I want to prove to the Foo-club web site that I am a member in good standing, how do I tell them my endpoint if it is not in the Ledger?


#11

a) yes, you must use an Agent. The idea of agents is baked into Sovrin. However, I don’t think you are defining them the way I am. A mobile app is an “agent”. So is software running in the cloud. The former is called an “edge agent”, and the latter is called a “cloud agent”. We have to use agents (software that does our bidding), because humans can’t emit and consume bytes directly, sign things, etc.

b) No, but you will find it much easier to participate in the ecosystem if your endpoint is stable. So you could minimally host a redirector in the cloud that gets people to your phone (edge agent). This redirector could be totally decoupled from your physical location (e.g., it’s based out of an AWS datacenter in Hong Kong, or a smaller ISP like Sprynet, either of which is half a world away from where you really live.) Or, if you really want nothing to do with the cloud, you could update your endpoint each time you turn on your phone and get an IP address–and communicate that address only to the parties you need to interact with at that time. Whatever endpoint you use, it will be better for you not to have a unique endpoint, but rather a common one. This gives you an additional form of privacy (herd privacy). Without that, your endpoint becomes a unique identifier. People may have no idea what is behind that endpoint at first, but as you disclose things about yourself through the endpoint, all data points that you leak can be correlated. For more on correlation, see http://bit.ly/2yXtKYG and http://bit.ly/2mkr3Y8.
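The herd-privacy argument can be made concrete with a toy calculation. The sketch below assumes an observer who can see which endpoint each DID uses; the records, owner names and URLs are invented for illustration, and the owner column is ground truth that the observer does not have.

```python
from collections import defaultdict

AGENCY = "https://agency-x.example/inbox"
HOME = "https://my-home-server.example/inbox"

# (did, true owner, endpoint); the owner is hidden from the observer.
records = [
    ("did-1", "alice", AGENCY),
    ("did-2", "bob", AGENCY),
    ("did-3", "carol", AGENCY),
    ("did-4", "dave", HOME),
    ("did-5", "dave", HOME),
]


def dids_by_endpoint(records):
    """What the observer can compute: DIDs grouped by the endpoint they share."""
    groups = defaultdict(list)
    for did, _owner, endpoint in records:
        groups[endpoint].append(did)
    return dict(groups)


def herd_sizes(records):
    """Ground truth: how many distinct owners hide behind each endpoint."""
    owners = defaultdict(set)
    for _did, owner, endpoint in records:
        owners[endpoint].add(owner)
    return {endpoint: len(people) for endpoint, people in owners.items()}


groups = dids_by_endpoint(records)
sizes = herd_sizes(records)
print(groups[HOME], sizes[HOME], sizes[AGENCY])
```

With a herd of one, grouping by endpoint links `did-4` and `did-5` to the same person; behind the shared agency endpoint the same grouping tells the observer very little.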

Re. Foo-club: you send a message to Foo-club proving what you like (using Sovrin’s anoncreds proof presentation protocol). Foo-club doesn’t need your endpoint at all; you just need theirs.


#12

Hi @danielh

Thank you for your explanation. Can I ask you to elaborate on the “microledger” a little bit more? Or could you provide me a link with further explanation?
Am I right that this microledger is stored in an agent?


#13

I just noticed that no one in the forum ever answered the final question in this thread. Since I know the Sovrin Foundation is working on a paper on microledgers, let me provide at least a basic answer.

A microledger is a private ledger shared only between the parties to a relationship. In fact, each microledger is unidirectional, i.e., it is published by one identity owner and subscribed to by one or more other identity owners. For example, in a two-party relationship, each of the two parties is sharing a dedicated microledger with the other—so the full relationship represents two paired microledgers, one in each direction. To be specific:

  1. When one identity owner adds another as a Sovrin connection, the first identity owner’s agent generates a unique key pair (from which is derived the corresponding Sovrin microledger DID) and a private agent service endpoint.
  2. These are then shared in a DID document only with the other party. In other words, they are all pairwise pseudonymous.
  3. Each party shares the DID document with the other on a microledger—thus the two “paired” microledgers, one going in each direction.
  4. All updates to the DID documents, including key rotations (and potentially shared claims or other transactions that need to be mutually verified) are written to the source microledger. Such changes are “pushed” directly to the subscribed microledger rather than to the Sovrin public ledger.

Note that this contrasts to the use of public DID documents on the Sovrin public ledger, where changes are not “pushed” to anyone but rather must be “pulled” by interested parties.
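Since microledger code is not yet available, the following is only a speculative sketch of the model outlined in the steps above: a one-directional log that its owner writes to and that is pushed to subscribers. The `Microledger` class and its methods are hypothetical, and a simple SHA-256 hash chain stands in for the Merkle trees and state proofs mentioned earlier in the thread.

```python
import hashlib
import json

GENESIS = "0" * 64


def entry_hash(prev_hash: str, event: dict) -> str:
    """Hash an event together with the hash of the entry before it."""
    payload = json.dumps({"prev": prev_hash, "event": event}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class Microledger:
    """One-directional log: written by its owner, pushed to subscribers."""

    def __init__(self, owner: str):
        self.owner = owner
        self.entries = []
        self._subscribers = []

    def subscribe(self, replica: list) -> None:
        # Bring a new subscriber up to date, then keep pushing to it.
        replica.extend(json.loads(json.dumps(self.entries)))
        self._subscribers.append(replica)

    def publish(self, event: dict) -> None:
        prev_hash = self.entries[-1]["hash"] if self.entries else GENESIS
        entry = {"event": event, "prev": prev_hash, "hash": entry_hash(prev_hash, event)}
        self.entries.append(entry)
        for replica in self._subscribers:
            replica.append(json.loads(json.dumps(entry)))  # push an independent copy


def verify(replica: list) -> bool:
    """Check that every entry links to the one before it and matches its own hash."""
    prev_hash = GENESIS
    for entry in replica:
        if entry["prev"] != prev_hash or entry["hash"] != entry_hash(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True


# Two paired microledgers, one in each direction of the relationship.
alice, bob = Microledger("alice"), Microledger("bob")
bobs_copy_of_alice, alices_copy_of_bob = [], []
alice.subscribe(bobs_copy_of_alice)
bob.subscribe(alices_copy_of_bob)

alice.publish({"op": "create", "verkey": "alice-key-1", "endpoint": "https://agency.example/inbox"})
bob.publish({"op": "create", "verkey": "bob-key-1", "endpoint": "https://agency.example/inbox"})
alice.publish({"op": "rotate", "verkey": "alice-key-2"})  # pushed to Bob, not to any public ledger
```

Bob never has to poll for Alice's key rotation; it arrives in his replica, and `verify(bobs_copy_of_alice)` lets him check that the history he holds has not been altered.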

Yes, you are right. In fact one major advantage of microledgers is that they are completely private, i.e., shared only between the agents representing the parties to a relationship. This also makes them highly scalable, since the load of microledger-based transactions does not involve the Sovrin public ledger.

Note that, although I am not a direct contributor to Hyperledger Indy (except for specs), my understanding is that microledger code has not yet been added to the Hyperledger Indy repos. It is on the roadmap so look for that later this year.


#14

Hi @Drummond
Thank you very much for the detailed answer. I think the concept of microledgers is very interesting and highly promising as a more scalable solution. Looking forward to the paper from the Sovrin Foundation.