Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FIXED] Some Leafnode issues #1106

Merged
merged 1 commit into from
Aug 26, 2019
Merged

[FIXED] Some Leafnode issues #1106

merged 1 commit into from
Aug 26, 2019

Conversation

kozlovic
Copy link
Member

  • When soliciting a Leafnode connection, if there is an account
    specified and the lookup fails, the server would only print
    as a debug and the connection would never been retried. After
    discussion, it seems that proper behavior would be to produce
    a fatal error.

  • When using basic auth (user/password), it was possible for a
    solicited Leafnode connection to not use user/password when
    trying an URL that was discovered through gossip. The server
    now saves the credentials of a configured URL to use with
    the discovered ones.

Updated RouteRTT test in case RTT does not seem to be updated
because getting always the same value.

Signed-off-by: Ivan Kozlovic ivan@synadia.com

Copy link
Member

@derekcollison derekcollison left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looks good, a few questions.

@@ -499,7 +515,7 @@ func (s *Server) createLeafNode(conn net.Conn, remote *leafNodeCfg) *client {
// FIXME(dlc) - Make this resolve at startup.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is at startup we can remove, but if server starts and then can fail at a later time still need to update this IMO.

@@ -499,7 +515,7 @@ func (s *Server) createLeafNode(conn net.Conn, remote *leafNodeCfg) *client {
// FIXME(dlc) - Make this resolve at startup.
acc, err := s.LookupAccount(remote.LocalAccount)
if err != nil {
c.Debugf("No local account %q for leafnode", remote.LocalAccount)
s.Fatalf("No local account %q for leafnode: %v", remote.LocalAccount, err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above.

@@ -499,7 +515,7 @@ func (s *Server) createLeafNode(conn net.Conn, remote *leafNodeCfg) *client {
// FIXME(dlc) - Make this resolve at startup.
acc, err := s.LookupAccount(remote.LocalAccount)
if err != nil {
c.Debugf("No local account %q for leafnode", remote.LocalAccount)
s.Fatalf("No local account %q for leafnode: %v", remote.LocalAccount, err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this only for local config? Meaning can this fail on operator mode lookup?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's what we discussed yesterday! My initial thought was that we should ERR and retry but you said that the account should always be avail otherwise we should not start (arguably, we could then check the account availability at startup instead of here, but..).
If you are asking if LookupAccount() can fail, of course it can, if it has to fetch and connection is not avail, etc..

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are in createLeafnode() here, which is invoked when soliciting or accepting a LN connection.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think per my comment if we are in local config mode we should lookup any account binding for the leafnode on startup and fail if it does not exist.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For solicited ones its essentially on startup, but not for accepted ones.

Copy link
Member Author

@kozlovic kozlovic Aug 23, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concern was that in operator mode we may not be able to lookup the account at first connect, but may later. But that may be wrong assumption.

Regardless of local/operator mode, lookup on Server A can fail at runtime:

  • Account removed and config reload issued
  • Account expired

If after any of those 2 events Server B is restarted, Server A would try to recreate connection to B and lookup will fail (just tried with removing account + config reload + restart of B).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For Server A that has a solicited connection to Server B, agree that the account can expire, or be removed via config reload. We need to define our desired behavior when that happens. Server should not fail or exit obviously, but probably should not aimlessly try to recreate the connection without a valid local account IMO.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All my discussion has been about Server A here.. the one that has a "remote" configuration in "leafnodes{}".
Let's discuss then later, because I am not sure what is the alternative between [FTL] (exit) and [ERR] and try again... except prevent removal/expiration of such account? (not sure how we would do that because that would leave the config file in different state than the runtime)..

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On startup if a solicited leafnode can not resolve/lookup its local account the server should fail to start IMO. If we subsequently get an error on the account we should shut the leafnode down. We can keep trying to reconnect and failing on account lookup to see if it resolves, or be smarter about it but that feels like we can do that later on.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, updated the PR (push forced). Not sure I have captured what you wanted. Happy to iterate if needed or remove the connect/account part and just leave the fix for username/password.

- On startup, verify that local account in leafnode (if specified
  can be found otherwise fail startup).
- At runtime, print error and continue trying to reconnect.
  Will need to decide a better approach.
- When using basic auth (user/password), it was possible for a
  solicited Leafnode connection to not use user/password when
  trying an URL that was discovered through gossip. The server
  now saves the credentials of a configured URL to use with
  the discovered ones.

Updated RouteRTT test in case RTT does not seem to be updated
because getting always the same value.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Copy link
Member

@derekcollison derekcollison left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@kozlovic kozlovic merged commit 5518cbe into master Aug 26, 2019
@kozlovic kozlovic deleted the fix_leafnode branch August 26, 2019 15:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants