-
Notifications
You must be signed in to change notification settings - Fork 3.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix GpuUsageLogger to work on different platforms #3008
Conversation
Codecov Report
@@ Coverage Diff @@
## master #3008 +/- ##
========================================
- Coverage 90% 81% -9%
========================================
Files 84 84
Lines 8062 9240 +1178
========================================
+ Hits 7244 7503 +259
- Misses 818 1737 +919 |
d9e0683
to
d5b0496
Compare
This pull request is now in conflict... :( |
c57af7d
to
6908dbf
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the docs should say clearly that this is a callback and it uses the logger of the trainer.
Despite the name, we cannot do Trainer(logger=GpuUsageLogger())
which a user might think it is a real logger.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should also add a test that checks that something is actually logged, when used in the proper way. Something like this should work (tested locally):
from pytorch_lightning.loggers import CSVLogger
from pytorch_lightning.loggers.csv_logs import ExperimentWriter
import os
@pytest.mark.skipif(not torch.cuda.is_available(), reason="test requires GPU machine")
def test_gpulogger(tmpdir):
gpu_usage = GpuUsageLogger()
logger = CSVLogger(tmpdir)
model = EvalModelTemplate()
trainer = Trainer(
default_root_dir=tmpdir,
callbacks=[gpu_usage],
max_epochs=1,
logger=logger
)
trainer.fit(model)
path_csv = os.path.join(logger.log_dir, ExperimentWriter.NAME_METRICS_FILE)
with open(path_csv, 'r') as fp:
lines = fp.readlines()
header = lines[0].split()
fields = ['GPU_utilization.gpu',
'GPU_memory.used',
'GPU_memory.free',
'GPU_utilization.memory'
]
for f in fields:
assert any([f in h for h in header])
Should it be renamed to |
I would keep it as it is, since the double big U makes it less readable in my opinion. |
or maybe |
I like GPUStatsLogger more than GpuUsageLogger. Another question is whether it should be called a logger or not. I don't have a better suggestion though 👍 |
Yeah, maybe the name should be tweaked so that on should not confuse it with loggers. |
+1 for GPUStatsMonitor or just GPUStats |
This pull request is now in conflict... :( |
I guess it was named |
Yeah, will make changes in few hours and a will do a sep PR for learning rate logger name change. |
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
b82c4a5
to
6da4857
Compare
Ok then, let's merge this and will continue changes on another PR :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rohitgr7 need to ask @williamFalcon because he wants to do refactors which blocks all PRs, but this one is not touching any Trainer stuff so I think it should be fine.
yeah, It's just a callback, completely isolated from those things. |
What does this PR do?
Fixes GpuUsageLogger to work on all platforms.
Also renamed it to
GPUStatsMonitor
: #3008 (comment)Todo:
Before submitting
PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.
Did you have fun?
Make sure you had fun coding 🙃