Error checking for #1461 #1462

Open
wants to merge 18 commits into
base: main

Game4Move78
Contributor

@Game4Move78 Game4Move78 commented Jul 11, 2022

#1461

Types of changes

  • Docs change / refactoring / dependency upgrade
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Motivation and Context / Related issue

#1461

How Has This Been Tested (if it applies)

Checklist

  • The documentation is up-to-date with the changes I made.
  • I have read the CONTRIBUTING document and completed the CLA (see CLA).
  • All tests passed, and additional code has been covered with new tests.

@facebook-github-bot

Hi @Game4Move78!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@fb.com. Thanks!

@facebook-github-bot facebook-github-bot added the "CLA Signed" label Jul 11, 2022
@facebook-github-bot

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!

@teytaud
Contributor

teytaud commented Jul 12, 2022

At this line: https://github.com/Game4Move78/nevergrad/blob/7152ca66cc5f66c0427579d486075ba18dc003b7/nevergrad/optimization/differentialevolution.py#L104

We might add
self.llambda = max(self.llambda, self.num_workers)

This should solve the issue, however it means that the user might specify "I want llambda to be 20 and Nevergrad decides to set llambda to 30".
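To make the trade-off concrete, here is a minimal sketch of the clamping rule suggested above. The helper name `effective_llambda` is hypothetical and only mirrors the one-line `max(...)` expression; it is not part of Nevergrad.

```python
def effective_llambda(requested: int, num_workers: int) -> int:
    """Hypothetical helper: population size after clamping to at least num_workers."""
    return max(requested, num_workers)


# The user asks for 20, but with 30 workers the population silently becomes 30.
print(effective_llambda(20, 30))
# A request that already covers the workers is left unchanged.
print(effective_llambda(40, 30))
```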

Nevergrad may ignore user specified llambda if fewer than num_workers
@Game4Move78 Game4Move78 marked this pull request as ready for review July 15, 2022 11:55
@teytaud
Contributor

teytaud commented Jul 22, 2022

Your code looks good to me; the problem might be in MixDeterministicRL. I'll investigate. Thanks for your work.

@@ -158,6 +159,8 @@ def _internal_ask_candidate(self) -> p.Parameter:
self.population[candidate.uid] = candidate
self._uid_queue.asked.add(candidate.uid)
return candidate
# stop queue wrapping around to lineage waiting for a tell
assert self._uid_queue.told, "More untold asks than population size (exceeds num_workers)"
Contributor

@jrapin you are the expert for self._uid_queue.told (among so many things...), do you validate this assert ?

Contributor

Oh I guess the error is in class Portfolio. Let me propose a fix (fingers crossed :-) ).

Contributor Author

@jrapin you are the expert for self._uid_queue.told (among so many things...), do you validate this assert ?

If it helps, my thinking was that there should be a tell preceding every ask after the initialization phase, which keeps the told queue non-empty. Even in the worst case where popsize == num_workers and all workers are evaluating untold points, the worker that beats the others to the tell can use the same point again on the next ask.
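The argument above can be checked on a toy model. The sketch below is an assumption-laden simplification: `MiniUidQueue` and `min_told_at_ask` are hypothetical names, and the queue only mimics the told/asked bookkeeping discussed here, not Nevergrad's actual class.

```python
from collections import deque


class MiniUidQueue:
    """Hypothetical, simplified told/asked queue (not Nevergrad's class)."""

    def __init__(self) -> None:
        self.told: deque = deque()  # lineages that received a tell
        self.asked: set = set()     # lineages currently being evaluated

    def ask(self) -> int:
        uid = self.told.pop()  # IndexError here means an ask without a prior tell
        self.asked.add(uid)
        return uid

    def tell(self, uid: int) -> None:
        self.asked.discard(uid)
        self.told.appendleft(uid)


def min_told_at_ask(popsize: int, steps: int) -> int:
    """Worst case popsize == num_workers: return the smallest told-queue
    length observed right before an ask, after initialization."""
    queue = MiniUidQueue()
    in_flight = deque(range(popsize))  # initialization: every lineage is out
    queue.asked.update(in_flight)
    smallest = popsize
    for _ in range(steps):
        queue.tell(in_flight.popleft())  # one worker finishes first...
        smallest = min(smallest, len(queue.told))
        in_flight.append(queue.ask())    # ...and immediately asks again
    return smallest
```

In this model `min_told_at_ask(4, 20)` returns 1: as long as every ask is preceded by a tell, the told queue holds at least one lineage when it is needed. A second ask without an intervening tell would raise `IndexError` in this toy queue, which is the situation the assert is meant to catch.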

Contributor

I've just sent a message to Jeremy, who knows that code better than anyone else and who might not have been close to github recently. Sorry for the delay; your PR is interesting.

Contributor

I used to be strict about not going beyond num_workers, but I changed my mind a couple of years ago because there are many cases where you don't control all the details of what is happening (e.g. a process dies and you'll never get the result). Most of the time the user won't deal with it, and we should be robust to it to simplify use. The code was then supposed to be robust, but evidently there are corner cases :s
I would therefore rather make it robust to this case (would that just take removing duplicates in UidQueue.told? It should be very fast, so not a problem)

cc @bottler you seemed to disagree and want the user to strictly conform to the "contract", maybe we can discuss and adapt depending if I change your mind or not ;)

@Game4Move78 as a power user, would you rather it failed explicitly, or was robust to those corner cases? (Why did you happen to ask for more points?)
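As a rough illustration of the "remove duplicates in told" idea, the sketch below deduplicates a deque of uids while preserving order. The function name `dedupe_told` is hypothetical, and whether the first or last occurrence should win in the real queue is an open assumption; this version keeps the first.

```python
from collections import deque
from typing import Deque, Hashable, Iterable


def dedupe_told(told: Iterable[Hashable]) -> Deque[Hashable]:
    """Return a new deque keeping the first occurrence of each uid, in order."""
    # dict preserves insertion order, so fromkeys drops later duplicates only.
    return deque(dict.fromkeys(told))


told = deque(["a", "b", "a", "c", "b"])
print(list(dedupe_told(told)))  # ['a', 'b', 'c']
```

This is a single linear pass over the queue, which is consistent with the expectation that the cost would be negligible.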

Contributor Author

@Game4Move78 Game4Move78 Aug 2, 2022

I followed the hyper-parameter settings of papers that used DE for HPO and set popsize to 20 explicitly without providing num_workers, and thought it would be robust. I then asked for more points and handed them to my own adaptive resource allocation + early stopping implementation that evaluated HPO choices with multiple budgets and only provided a tell to the NG optimiser when points were either stopped early or allocated maximum budget.

This would work fine for hundreds of points until it hit that corner case with a point in the told queue that has been deleted from population. My current workaround is to provide feedback immediately on the minimum budget and then treat all evaluations on higher budgets as unasked points, which works fine for DE.

If you want less strict (I do too), how about we allow duplicates in told but at L162 we add

while lineage not in self.population:
    lineage = self._uid_queue.ask()

I believe this would discard the points that were deleted from the population by a better tell that was not asked. Future asks will be biased toward duplicate points. I added a commit that detects a duplicate tell by its absence from the asked queue, although there may be a more intuitive way.
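A standalone sketch of that skip-stale-lineages loop follows. It is a variant rather than the proposed patch: the function name `next_live_lineage` is hypothetical, the queue is a plain deque instead of the optimizer's own queue, and it returns `None` when every queued uid is stale so that the loop cannot run past the end of the queue.

```python
from collections import deque
from typing import Deque, Optional, Set


def next_live_lineage(told: Deque[str], population: Set[str]) -> Optional[str]:
    """Pop uids from the told queue until one is still in the population.

    Stale uids (already deleted from the population) are discarded.
    Returns None if the queue runs out before a live uid is found.
    """
    while told:
        lineage = told.pop()
        if lineage in population:
            return lineage
    return None


told = deque(["c", "b", "stale1", "stale2"])
population = {"b", "c"}
print(next_live_lineage(told, population))  # b
print(list(told))                           # ['c']
```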

Contributor Author

@Game4Move78 Game4Move78 Aug 2, 2022

My personal preference to help users master those details where they can is to copy Ax's client interface with an abandon_tell. For most optimisers this would just tell a large value, and the BO optimisers might do something different to avoid damaging the model.
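A minimal sketch of what such an `abandon_tell` could look like for the "just tell a large value" case. Everything here is hypothetical: `RecordingOptimizer` is a stand-in that only exposes `tell(candidate, loss)`, the default penalty of `1e10` is an arbitrary assumption, and none of this exists in Nevergrad or Ax under these names.

```python
from typing import Any, List, Tuple


class RecordingOptimizer:
    """Hypothetical stand-in for an optimizer; it only records tells."""

    def __init__(self) -> None:
        self.history: List[Tuple[Any, float]] = []

    def tell(self, candidate: Any, loss: float) -> None:
        self.history.append((candidate, loss))


def abandon_tell(optimizer: Any, candidate: Any, penalty: float = 1e10) -> None:
    """Report an abandoned evaluation by telling a large penalty loss.

    The penalty value is an assumption; a model-based optimizer would
    likely need different handling to avoid distorting its model.
    """
    optimizer.tell(candidate, penalty)


opt = RecordingOptimizer()
abandon_tell(opt, "candidate-1")
print(opt.history)
```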

teytaud and others added 11 commits July 25, 2022 09:48
so ParaPortfolio is not really parallel.
Avoid adding uid to queue twice. This handles both cases: 
- More asks than workers (point used twice but added to told queue once)
- Ask without a tell (last worker grabs this point from asked queue)

facebookresearch#1462 (comment)
@Game4Move78
Contributor Author

Game4Move78 commented Sep 27, 2022

@jrapin Any chance of getting this merged 😃? Line

if uid in self.asked:
also uses absence in UidQueue.asked to check for presence in told, and self._uid_queue.asked is configured directly on many lines in _DE already.

I believe this code enforces that for asked points with the same parent, the lineage will be added to the told queue only once in subsequent tells:

if uid in self._uid_queue.asked:  # if taken from queue in multiple asks, add back only once
    self._uid_queue.asked.discard(uid)
    self._uid_queue.tell(uid)
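The effect of that guard can be seen in isolation with the sketch below. `TinyQueue` and `tell_once` are hypothetical names for a stripped-down model of the asked set and told queue; the real optimizer does more around this step.

```python
from collections import deque


class TinyQueue:
    """Hypothetical minimal model: an asked set plus a told queue."""

    def __init__(self) -> None:
        self.told: deque = deque()
        self.asked: set = set()


def tell_once(queue: TinyQueue, uid: str) -> bool:
    """Move uid from asked to told; ignore it if it was already told.

    Returns True when the uid was queued, False for a repeated tell.
    """
    if uid in queue.asked:
        queue.asked.discard(uid)
        queue.told.appendleft(uid)
        return True
    return False


queue = TinyQueue()
queue.asked.add("parent")            # the same lineage was handed out twice
print(tell_once(queue, "parent"))    # True: first tell queues the lineage
print(tell_once(queue, "parent"))    # False: second tell is ignored
print(list(queue.told))              # ['parent']
```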

Labels
CLA Signed Do not delete this pull request or issue due to inactivity.

4 participants