Identity and package management #2532

Merged
merged 6 commits on Aug 19, 2019

Conversation

Contributor

@oggy- oggy- commented Aug 14, 2019

Hey folks,

here's a write-up of the identity and package management services. This also includes the ledger topology write-up for completeness, but let's keep the discussion on that topic in a separate PR:
#2476

Pull Request Checklist

  • Read and understand the contribution guidelines
  • Include appropriate tests
  • Set a descriptive title and thorough description
  • Add a reference to the issue this PR will solve, if appropriate
  • Add a line to the release notes, if appropriate
  • Normal production system change, include purpose of change in description

NOTE: CI is not automatically run on non-members pull-requests for security
reasons. The reviewer will have to comment with /AzurePipelines run to
trigger the build.

Contributor

@bame-da bame-da left a comment

Very nice piece overall. Just a few bits that confused me given that Canton ledgers can split and merge.

Identity and Package Management
###############################

A DAML Ledger is a software system that enables parties to automate the management of their rights and obligations through smart contract code.
Contributor

Don't think you need to introduce DAML at this point.

Contributor Author

True - just needed an overture for why we need identity and package management. Rephrased


By definition, identifiers identify parties, and are thus unique for a ledger.
They do not, however, have to be unique across different ledgers.
That is, two parties with identical identifiers in two different ledgers are not the same party.
Contributor

Is this true in a Canton world?
If I have two parties with the same identifier in two unconnected Canton domains, this would suggest they are different parties. However, as soon as these domains somehow get connected, even if through several domain-hops, they are the same? Or do you consider all Canton domains in the world to be part of one partitioned ledger?

Contributor Author

I consider Canton the latter (we call it the "virtual global ledger"). I weakened the statement to "not necessarily", though, as what I wrote was not true in general.

For example, a party with a display name "Attorney of Nigerian Prince" might well be controlled by a real-world entity without a bar exam.
However, particular ledger deployments might make stronger guarantees about this link.
Finally, the association of identifiers to display names may change over time.
For example, a party might change its display name from "Bruce" to "Caitlyn" -- as long as the identifier remains the same, so does the party.
Contributor

Are "display names" defined at ledger-level, domain-level or participant-level?

Contributor Author

AFAIK, participant level - added.

docs/source/concepts/identity-and-package-management.rst (outdated)
The ``AllocateParty`` call can take the desired identifier and displayed name as optional parameters, but these are merely hints and the ledger implementation may completely ignore them.

If the call returns a new identifier, the :ref:`participant node <participant-node-def>` serving this call is ready to host the party with this identifier.
The returned identifier is guaranteed to be **unique** in the ledger; namely, no other call of the ``AllocateParty`` method at this or any other ledger participant may return the same identifier.
Contributor

Again, this doesn't make sense to me in a Canton world where ledgers can merge and split.

Contributor Author

I'm not sure what issue you're raising here: is the statement too weak, or too strong?

I don't see any stronger statements we could make: we can only guarantee that the returned identifier is unique within "the ledger". I can easily have two Sandbox instances that return the same identifiers when AllocateParty is called.

I don't think it's too weak either; I believe it should hold for all ledgers with central operators, and also for Canton. Even when Canton is viewed as a single virtual global ledger, AllocateParty will never return the same identifier, even when initiated at two participants who are cut off from each other (at least with the current Canton identity management system).

Contributor Author

@oggy- oggy- Aug 16, 2019

After discussing with Bernhard: it's too strong; in Canton we can have two participants that share the same private key, and thus allocate the same party identifiers. I'll rephrase to something weaker.
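
To make the point of this thread concrete, here is a minimal Python sketch of a toy participant. It is not the Ledger API: the class `ToyParticipant`, the `PartyDetails` record, and the identifier format are all hypothetical, and the sketch simply assumes that identifiers are derived from the participant's key material. It shows that the hints passed to an allocation call may be ignored, and that two participants sharing the same private key can end up allocating the same party identifier.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PartyDetails:
    identifier: str
    display_name: Optional[str]
    is_local: bool


@dataclass
class ToyParticipant:
    """Hypothetical participant node; an illustration, not the real Ledger API."""
    private_key: bytes
    parties: dict = field(default_factory=dict)
    counter: int = 0

    def allocate_party(self, identifier_hint=None, display_name=None):
        # The identifier hint is only a hint: this toy implementation ignores it
        # and derives the identifier from the key and a local counter (assumption).
        self.counter += 1
        seed = self.private_key + str(self.counter).encode()
        identifier = "party-" + hashlib.sha256(seed).hexdigest()[:12]
        details = PartyDetails(identifier, display_name, is_local=True)
        self.parties[identifier] = details
        return details

    def list_known_parties(self):
        # In this toy model a participant only knows the parties it hosts.
        return list(self.parties.values())
```

With `p1 = ToyParticipant(b"key-A")` and `p3 = ToyParticipant(b"key-A")`, the first `allocate_party` call on each returns the same identifier, which is why the uniqueness guarantee has to be phrased more weakly than "no other participant may ever return it".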

The unit of packaging for DAML-LF is the :ref:`.dalf <dar-file-dalf-file>` file.
Each ``.dalf`` file is uniquely identified by its **package identifier**, which is the hash of its contents.
A :ref:`.dar <dar-file-dalf-file>` file is a simple archive containing multiple ``.dalf`` files, and has no identifier of its own.
DAML ledgers support uploading only ``.dar`` files.
Contributor

This seems rather bizarre. We are saying that the deployable unit of code is a dalf file with a unique identifier, but to actually deploy it, you must wrap it in an otherwise function-less container.
Plus, both Sandbox and Canton do allow deployment of dalf files, don't they?

Contributor Author

They do, but the official API only takes .dar files. I added an explanation as to why .dars are still useful.
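
The relationship between `.dalf` and `.dar` files discussed in this thread can be sketched in Python. This is an illustration only: it assumes SHA-256 as the hash and a zip container for the archive, neither of which is stated in the quoted text, and `package_id`, `build_dar`, and `package_ids_in_dar` are hypothetical helper names rather than DAML tooling.

```python
import hashlib
import io
import zipfile


def package_id(dalf_bytes: bytes) -> str:
    # Assumption: the package identifier is the SHA-256 hex digest of the contents.
    return hashlib.sha256(dalf_bytes).hexdigest()


def build_dar(dalfs: dict) -> bytes:
    """Bundle several .dalf payloads into one archive (zip format assumed)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        for name, payload in dalfs.items():
            archive.writestr(name, payload)
    return buf.getvalue()


def package_ids_in_dar(dar_bytes: bytes) -> dict:
    """Map each .dalf entry in the archive to its content-derived identifier."""
    with zipfile.ZipFile(io.BytesIO(dar_bytes)) as archive:
        return {
            name: package_id(archive.read(name))
            for name in archive.namelist()
            if name.endswith(".dalf")
        }
```

The archive itself carries no identifier in this sketch; only the individual `.dalf` entries do, which matches the point that a `.dar` is a convenience wrapper for a package and its dependencies.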

However, the operators of the ledger infrastructure nodes may still wish to review and vet any DAML code before allowing it to execute.
One reason for this is that the DAML interpreter currently lacks a notion of reproducible resource limits.
Thus, executing a DAML contract might result in high memory or CPU usage.
Furthermore, security bugs in the DAML interpreter or JVM might enable malicious code to break out of the sandbox.
Contributor

Sure, that's possible, but I wouldn't suggest this in our docs.

Contributor Author

Removed

docs/source/concepts/identity-and-package-management.rst (outdated)
Contributor

@meiersi-da meiersi-da left a comment

@oggy- : this is good stuff! Forgive my very direct language in the comments... I was minimizing typing 😊

I haven't yet completed the review, but need to make a save-point. See all comments as suggestions and hints at how one reader perceived your text when reading without context top-down. Your call on how to address.

The topologies can impact both the functional and non-functional properties of the resulting ledger.
This document:

1. Provides one useful categorization of the existing implementations' topologies.
Contributor

add headings of the two topologies to give a preview and easy link targets

This document:

1. Provides one useful categorization of the existing implementations' topologies.
The categorization is not the only one possible, and it is not always clear-cut.
Contributor

Suggest dropping this line. It is covered by "one useful" above.


1. Provides one useful categorization of the existing implementations' topologies.
The categorization is not the only one possible, and it is not always clear-cut.
Its main aim is to group the implementations according to their high-level properties.
Contributor

Drop as well. Not clear how it helps reader.

The categorization is not the only one possible, and it is not always clear-cut.
Its main aim is to group the implementations according to their high-level properties.

2. Describes the general ledger properties of each category.
Contributor

Having written the above comments: consider replacing the two items with a short summary of the two categories, and add a forward link to Package and Party Management where they are helpful to know.


.. _trust-domain:

We call a system operated by a single real-world entity a **trust domain**.
Contributor

Add the definition to the Glossary of Background concepts, and reference it from here; or, perhaps better, from the intro of the topologies page.


- it provides no scaling

- it is not highly available
Contributor

these two properties are "not not true" ;-) all the Google Apps are examples of highly available and scalable services operated in a single trust domain.

==> remove these two bullet points


- the real-world entity operating the physical shared ledger is fully trusted with preserving the ledger's integrity

- the real-world entity operating the physical shared ledger has full insight into the entire ledger, and is thus fully trusted with privacy
Contributor

real-world entity vs. single physical copy vs. trust domain. Consider defining "operating entity" as the entity operating the trust domain, and use that in the definition of the Fully Centralized Ledger topology.

The :ref:`DAML Sandbox <sandbox-manual>` uses this topology.
While simple, this topology has certain downsides:

- it provides no scaling
Contributor

suggestion: use ordered lists to make it easy to reference items in discussions


The first four problems can be solved or mitigated as follows:

- scaling by splitting the system up into separate functional components and parallelizing execution
Contributor

consider using ordered list for ease of reference

Contributor

@meiersi-da meiersi-da left a comment

Another save-point. Looking forward to reviewing the remainder of the pkg and identity mngt. Thanks for the good work @oggy- !


.. _participant-node-def:

2. **Participant nodes**, (also called Client nodes in some platforms) which serve the ledger API to a subset of the system parties, which we say are hosted by this participant.
Contributor

s/ledger API/Ledger API

`Canton <http://canton.io>`__ and DAML on `R3 Corda <https://www.corda.net>`__ are two such implementations.
The main drawback of this topology is that availability can be influenced by the participant nodes.
In particular, transactions cannot be committed if they use data that is only stored on unresponsive nodes.
Spreading the data among additional trusted entities can mitigate the problem.
Contributor

Completed review of topology section: very nice decomposition of topologies @oggy- ! This is going to be a good basis for providing a "Which DAML ledger should I use?" section!


#. the minimal behavioral guarantees for identity and package services across all ledger implementations, and

#. guidelines to understand how the :ref:`ledger's topology <daml-ledger-topologies>` influences the unspecified part of the behavior.
Contributor

It would be good if this section clarified

  1. who the audience of the document is
  2. what questions the section will answer.

My take is that the audience is ledger implementors and ledger operators. There seem to be multiple questions:

  1. How should I implement party and package management?
  2. Which ledger should I select provided my requirements on party and package management?

Contributor

@meiersi-da meiersi-da left a comment

@oggy- I've completed the review :) Good stuff! Thanks a lot!

I've added extensive comments, but none of them are critical. Please address as you see fit!

The access to these services is usually more restricted compared to the other Ledger API services, as they are part of the administrative API.
Any implementation of the services is guaranteed to accept inputs and provide outputs of the format specified by these services.
However, the services' *behavior* -- the relationship between the inputs and outputs that the various parties observe -- is largely implementation dependent.
The remainder of the document will presents both:
Contributor

s/will presents/presents

In fact, in such a ledger, the participants ``P1`` and ``P2`` might not have a way to communicate to each other, or might not even be aware of each other's existence.

For diagnostics, the ledger also provides a ``ListKnownParties`` method which lists parties known to the participant node.
The parties can be local (i.e., hosted by the participant) or not.
Contributor

Good section! Thanks @oggy- !

For example, if the operator is a stock exchange, it might guarantee that a real-world exchange participant whose legal name is "Bank Inc." is represented by a ledger party with the identifier "Bank Inc.".
Alternatively, it might use a random identifier, but guarantee that the display name is "Bank Inc.".
Ledgers with :ref:`partitioned topologies <partitioned-topologies>` in general might not have such a single store of identities.
The solutions for linking the identifiers to real-world identities could rely on certificate chains, `verifiable credentials <https://www.w3.org/TR/vc-data-model/>`__, or other mechanisms.
Contributor

Consider adding a sentence like

An additional option is for a DAML-based application to include a Know-Your-Customer workflow in DAML to establish the link from a party to a real-world identity.

Contributor Author

Added

Templates in a ``.dalf`` file can reference templates from other ``.dalf`` files, i.e., ``.dalf`` files can depend on other ``.dalf`` files.
A :ref:`.dar <dar-file-dalf-file>` file is a simple archive containing multiple ``.dalf`` files, and has no identifier of its own.
The archive provides a convenient way to package ``.dalf`` files together with their dependencies.
The Ledger API supports only ``.dar`` file uploads.
Contributor

Add: However, storage of data on-ledger typically happens using .dalf files, as only they have globally unique identifiers.

Contributor Author

Added

@oggy- oggy- force-pushed the identity-and-package-management branch from 52b2763 to c0867c0 August 19, 2019 15:15
Contributor Author

oggy- commented Aug 19, 2019

@meiersi-da thanks for the review and I hope it saves you some discussions in the future :) I think I was able to address all your comments in a satisfactory way

@oggy- oggy- merged commit c71237b into master Aug 19, 2019
@oggy- oggy- deleted the identity-and-package-management branch August 19, 2019 17:09
3 participants