Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix re-login behavior of remote clients to a server #3486

Merged
merged 11 commits into from
Mar 14, 2018

Conversation

adcxyz
Copy link
Contributor

@adcxyz adcxyz commented Feb 1, 2018

*** PR Info text completely redone ***
Thanks to questions by @muellmusik and @brianlheim,
the cases covered in this PR became much clearer to me.
The general issue is handling repeated logins in network performance situations,
to recover as gracefully as possible from loss of connection between client and server,
whether by network glicht or client crash.
(server crash is outside the scope of this PR, but should be fully thought through elsewhere.)

Case 1: network connection is lost,
and the same client, same client-side server object logs in again.

  • the server program remembers the client, and sends its previous clientID back
  • then the client knows it is connected again,
  • and it can keep all its known buffers, nodes, synths as they are.
    -> so, just post that connection is back, and all is well.

Case 2: client crashes, and is restarted;
then the new client logs in again from the same network address:

  • the client-side server object is new, but the server program knows the address,
    so the server program sends its previous clientID back to the new client.
  • because the new server object may not know its previous clientID,
    the server program response may trigger new allocators - which is correct
  • there may be buffers and running nodes and synths on the server program
    that the new server object does not know about:
    depending on the specific setup used,
  • it may make sense to get rid of these by hand ( server.defaultGroup.release )
  • and reinit the server by running server.statusWatcher.prFinalizeBoot by hand.
    -> so, post about successful reconnect, and advice about release/reinit code.

(Note for later: maybe develop functionality to retrieve these resources
from the server program in the client.)

Case 3: loss of connection and login from a different network address:
This will look like a new login to the server, so the old defaultGroup
and everything in it keeps going, and the new client gets a new clientID.

In the future, this case could be handled better by releasing the previous notify registration,
and then asking for the same clientID it had earlier, to continue seamlessly from there.
This would need some more effort to work.

The tests provided also cover cases 1 and 2:
TestServer_clientID_booted:test_repeatedLogin tests case 1
TestServer_clientID_booted:test_remoteLogins tests a first login by the Server:remote method,
and a repeated login from a new server object after a simulated crash.

fork {
	Server.killAll;
	0.1.wait;
	TestServer_clientID_booted.new.test_repeatedLogin;
	0.1.wait;
	TestServer_clientID_booted.new.test_remoteLogins;
};

@nhthn nhthn added this to the 3.9.2 milestone Feb 1, 2018
@adcxyz adcxyz requested review from mossheim and telephon February 2, 2018 09:17
// -> so the response should go thru prHandleNotifyFailString

remote = Server.remote(\remTest, homeServer.addr, homeServer.options);
// Server.default = remote; // to test IDE server display
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove

remote.serverRunning.not and: { timeout > 0 }
} {
timeout = timeout - dt;
dt.wait;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use a condition variable here


// make s play dead now, but leave the process running
homeServer.statusWatcher.stopStatusWatcher.stopAliveThread.serverRunning_(false);
// homeServer.serverRunning.postln; // client thinks it is dead
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove


this.assert(homeServer.clientID == 3,
"homeServer gets requested clientID back from server process."
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be a separate test or removed altogether

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, done

);

// make s play dead now, but leave the process running
homeServer.statusWatcher.stopStatusWatcher.stopAliveThread.serverRunning_(false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

chaining these calls is somewhat confusing IMO

@@ -544,7 +544,8 @@ Server {
case
{ failString.asString.contains("already registered") } {
"% - already registered with clientID %.\n".postf(this, msg[3]);
statusWatcher.notified = true;
statusWatcher.prRecoverRemoteLogin(msg[3]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could use a comment - without looking I don't know what msg[3] is

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

}, AppClock)
}

prRecoverRemoteLogin { |clientID|
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needs to be added to the list of private methods in the help file

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is this name supposed to mean, btw? not immediately apparent to me, sorry

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm, maybe really not a good name.
would you find prHandleRemoteLoginWhenAlreadyRegistered clearer?

// -> this client netaddr is already registered, and should say so!
// -> so the response should go thru prHandleNotifyFailString

remote = Server.remote(\remTest, homeServer.addr, homeServer.options);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

flow of logic would make more sense to me if this server were created right after the homeServer; does this constructor call depend on preceding code in a way I'm not seeing?

"homeServer gets requested clientID back from server process."
);

// make s play dead now, but leave the process running
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s -> homeServer

@muellmusik
Copy link
Contributor

Just two quick questions @adcxyz:

If clientIDs are handed out by the server, then if the server recognises the client address, it can just give the same ID or? Or is this for the backwards compatible manually specified ID case?

Does it make sense that a client necessarily gets the same ID? If you have a lang crash, leaving orphaned nodes, then getting the same ID could lead to resource conflicts. There are various ways of dealing with that of course (e.g. gracefully free orphaned nodes using predictable default group), but you might want different ones for different use cases.

@adcxyz
Copy link
Contributor Author

adcxyz commented Feb 2, 2018

@muellmusik :

If clientIDs are handed out by the server, then if the server recognises the client address, it can just give the same ID or?

Yes, exaclty. This is what the server program has been doing all along, and I think is the clearest and most predictable behavior for most cases, as listed below.

Or is this for the backwards compatible manually specified ID case?
No, not specifically - when you manually ask for a clientID, and it is free, you get that;
when it is not free, you get a free one instead.
When your are recovering a lost login, the recovered previous clientID overrides a manually specified ID. Users always get back a clientID that will work, as long as there are free ones.

Does it make sense that a client necessarily gets the same ID? If you have a lang crash, leaving orphaned nodes, then getting the same ID could lead to resource conflicts. There are various ways of dealing with that of course (e.g. gracefully free orphaned nodes using predictable default group), but you might want different ones for different use cases.

I think keeping previously known IDs is the clearest and most predictable behavior for most cases I can think of:
Case 1: remote server login code accidentally runs twice -> same ID, nothing breaks.
(still needs full testing whether node allocator is being reset in this case)
Case 2: remote client loses network connection, and logs in again:
gets the same clientID, so the same defaultGroup, and all your synths etc should still be there.
(still needs full testing whether node allocator is being reset in this case)
Case 3: remote client crashes, and logs in again:
gets the same clientID and can then decide how to handle things gracefully:
release or free their (same as before) defaultGroup, or whatever else is called for.
(allocator reset is desirable here)
Case 4: a remote client gets lost and leaves sounds running:
Because all default groups are known to all clients, other users can intervene and gracefully end or release everything in that client's defaultGroup.
Case 5: multiple losses of contact in one show: if the server program keeps handing out new clientIDs, it would run out of clientIDs quickly, so handing back the previous ones is also safer for this case.

hope that answers your questions?
best a

@adcxyz
Copy link
Contributor Author

adcxyz commented Feb 2, 2018

hi @brianlheim, thanks for the detailed comments,
will get back to them ASAP - off to a show with Julian and Marcus Schmickler now!

@muellmusik
Copy link
Contributor

hope that answers your questions?

Indeed! Thanks for the detailed and patient explanation. :-)

@mossheim
Copy link
Contributor

mossheim commented Feb 2, 2018

off to a show with Julian and Marcus Schmickler now

Jealous! Have a great time!

@adcxyz
Copy link
Contributor Author

adcxyz commented Feb 5, 2018

Hi @brianlheim and @muellmusik,
thanks for the questions, and comments they made me rethink and rewrite this one quite a bit!
I think it is much clearer now what this PR aims to improve - please see updated top comment
This is the thread that discussed the problem:
https://www.listarc.bham.ac.uk/lists/sc-users/msg59049.html
best adc

Copy link
Member

@telephon telephon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one minor request, should be easy to fix. Looks very good -- it'll be a great relief not having to teach the kids to kill servers on a daily basis.

"// so you may want to release currently running synths by hand:\n".postln;
"%.defaultGroup.release;\n".postf(server.cs);
"// and you may want to redo server boot finalization by hand:\n".postln;
"%.statusWatcher.prFinalizeBoot;\n".postf(server.cs);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to suggest this line as something to actually do, it shouldn't involve a private message. What about adding:

+ Server {
		finalizeBoot { this.statusWatcher.prFinalizeBoot }
}

then the line above would read:
"%.finalizeBoot;\n".postf(server.cs);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about the name ...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC this solution is a hack, so it's ok to call a private function. Adding access to a private function as a public function seems worse because it has to be walked back with deprecation.

Copy link
Member

@telephon telephon Feb 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, makes some sense. My only worry is that the public use of private methods might be perceived as the standard way to go.

So let's note it in the comment that this is a temporary measure which will be replaced in future versions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

like the comment above the method says, the whole thing is about making it possible to recover in an emergency, so yes, it is a hack.

// cleanup
remote1.remove;
remote2.remove;
unixCmd("kill -9" + serverPid);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not portable at all

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Is there a crossplatform version of kill?
or is it OK to use Server.killAll in the testsuite?
I would have preferred that, but assumed it is cleaner to only kill the exact process and nothing else.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a crossplatform version of kill?

Windows uses taskkill.

is it OK to use Server.killAll in the testsuite?

No. The single process approach is much more sound. Since we have Platform.killAll already, it would probably not be such a bad idea to add killProcess that accepts a pid. Just a suggestion. That should be done in a separate PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have near zero windows shell skills - would the methods below work?
in UnixPlatform:
kill { |pid|
("kill -9 " ++ pid).unixCmd;
}
and in WindowsPlatform:
kill { |pid|
("taskkill /F /pid " ++ pid).unixCmd;
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so.. I can't test at the moment

} {
// same clientID, so leave all server resources in the state they were in!
"// This seems to be a login after a loss of network contact - \n"
"// - reconnected with the same clientID as before, so probably all is well.\n".postln;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this really need to be posted? i think this is reaching a point where it's going to be regarded as noise by users

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well, it is trying to be helpful in a rare emergency scenario - you are in the middle of playing a show, network drops out, you try to reconnect, and likely happy to see anything informative posted. I doubt that anyone would find that noisy in that situation.
But maybe that is just me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gotcha, that makes sense. I think it would be nice if the post message was consistent with other messages coming out of this class, though. No other messages that I've seen start with "//", it seems arbitrary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually since it should be noticeable, how about making it begin with *** for bold posting?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just tried it - that does make the post-crash hints pop out nicely in the post window!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all right!

// simulate a remote server process by starting a server process independently of SC
serverPid = unixCmd(Server.program + options.asOptionsString);

1.wait;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is this wait for?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

leftover from an earlier version of the test, unneeded now - can remove it.


this.assert(remote1.clientID == 3,
"after first login, remote client should have its requested clientID."
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are these asserts necessary? they seem to be covered well by other tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK the Server:remote method is not tested anywhere else, so testing both its normal and its emergency use seemed efficient and appropriate.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then it should be in a separate test...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so have a test for remoteLogin, which tests the first login only,
and a test for repeatedRemoteLogin, which will needs a first login as setup,
and then tests the repeated login only?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I think all requests done,
and submitted separate PR for kill method

@adcxyz
Copy link
Contributor Author

adcxyz commented Feb 9, 2018

OK - I believe I addressed all requests for changes on this one,
so I guess the next steps are:
I wait for #3499 addKillMethod to be merged,
then change this one to use killProcessByID,
then it can be merged.
Did I miss anything?

@nhthn nhthn added the comp: class library SC class library label Feb 18, 2018
@mossheim
Copy link
Contributor

mossheim commented Mar 1, 2018

#3499 has been merged, this can be updated now.

@adcxyz
Copy link
Contributor Author

adcxyz commented Mar 1, 2018

ok, updated - thanks @brianlheim !

.notified_(false)
.serverRunning_(false);

// 1.wait;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please remove this? Everything else looks good, thanks for the tests!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, done!

@mossheim
Copy link
Contributor

mossheim commented Mar 1, 2018

Thanks, this is ready for merge IMO. @telephon - want to give it a look since you've got a review already?

Copy link
Member

@joshpar joshpar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good!

@patrickdupuis
Copy link
Contributor

@telephon please approve when you get the chance 👍

@telephon telephon merged commit d835d53 into supercollider:3.9 Mar 14, 2018
@telephon
Copy link
Member

(sorry for the delay, I had somehow missed the message)

@adcxyz adcxyz deleted the fixRecoverRemoteLogin branch March 31, 2018 15:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
comp: class library SC class library
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants