-
Notifications
You must be signed in to change notification settings - Fork 363
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Feature] Add support for full wandb's define_metric arguments #1099
[Feature] Add support for full wandb's define_metric arguments #1099
Conversation
Updated: allow define_metric_cfg to accept dict and list of dict, and remove added define_metric_cfgs argument. I noticed that the define_metric support glob so that we can use the following setting:
https://docs.wandb.ai/guides/track/log/customize-logging-axes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your contribution! But should we make self.wandb.log
accept the step
argument in this PR? Will the current modification resolve the resume problem?
mmengine/runner/log_processor.py
Outdated
# Add global iter. | ||
tag['iter'] = runner.iter + 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current implementation of runner.iter
in mmengine's Runner
triggers the instantiation of runner._train_loop
as seen in these lines of code:
mmengine/mmengine/runner/runner.py
Line 518 in 7924f8b
return self.train_loop.iter |
and
mmengine/mmengine/runner/runner.py
Line 569 in 7924f8b
return self._train_loop |
runner.iter
actually represents the training iteration, and the construction of the train_loop
should be avoided during validation or testing phases. The current design mechanism is somewhat implicit, which could lead to confusion. Recently, I also created a pull request (#1107) to fix this kind of bug 🤣 .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest assigning iter
like this:
if isinstance(runner._train_loop, dict) or runner._train_loop is None:
tag['iter'] = 0
else:
tag['iter'] = runner.iter + 1
@HAOCHENYE Thank you for your review.
We can fix the continuity of the resumed plot by changing the X-axis from the 'step' to 'iter' (or epoch). In this PR, we include the 'iter' in the log target so that the wandb knows how to map from 'step' to 'iter' even if the user did not set define_metric in advance. This means that the user needs to change X-axis by themselves by using define_metric or Web UI.
|
Thanks for your detailed explanation 😄 ! I've got it! |
5ddd4ca
to
9f77ab9
Compare
@HAOCHENYE Thank you for your comment.
|
Motivation
In this PR, add support for other original arguments of wandb.define_metric described in: https://docs.wandb.ai/ref/python/run#define_metric
(The current
WandbVisBackend
implementation supports only the case of thesummary
metric.)Especially the
step_metric
is helpful for defining the x-axis more flexibly.Specifying the "iter" or "epoch" as an x-axis, we can solve the issue of resumption discussed in issue #1072. I think this PR is a better solution because this does not need additional variables like
auto_step
norlast_step
nor extra logic and uses only the wandb's feature.Furthermore, users can specify different x-axes depending on the metrics in the same experiment. See below:
Modification
To allow the visualizers to use the
iter
as a step, I added aniter
key in the tag of the log_processor.And I added a new argument (
define_metric_cfgs
) in the WandbVisBackend for users to configure the metric.Since all visualizers use the same tag, this affects other non-wandb visualizers' behavior.
I think this modification would not cause any negative effect on other visualizers.
But if it causes something bad, I will add a flag like 'add_iter' in the log_processor for user control to include
iter
or not.BC-breaking (Optional)
As I described above, this PR added a new key, "iter," in the output of the log_processor.
A new plot for the "iter" might appear in some visualization, but I think it does not cause serious problems.
Use cases (Optional)
Expected usage is defining all the metrics used by the experiment:
Providing a standard preset of the define_metric_cfgs as default (ex.,
configs/_base_/wandb_metrics.py
) might be great, but I don't think it is indispensable. It is not difficult, though cumbersome, for users to define metrics individually for a particular project. And defining all possible metrics in advance is a bit difficult because the metrics would change depending on the models and tasks.If the user missed defining some metrics, wandb would use the default step as the x-axis and loose nothing.
We can fix the X-axis later in the Web UI.
Checklist