Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

repair CI for Win #2358

Merged
merged 46 commits into from
Jun 27, 2020
Merged

repair CI for Win #2358

merged 46 commits into from
Jun 27, 2020

Conversation

Borda
Copy link
Member

@Borda Borda commented Jun 25, 2020

What does this PR do?

it seems like Win CI was falling but the checks were passed
https://github.com/PyTorchLightning/pytorch-lightning/pull/2356/checks?check_run_id=805809911

The issue was reported in nedbat/coveragepy#1003
Fixes #2347
Fixes #2357

Before submitting

  • Was this discussed/approved via a Github issue? (no need for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together? Otherwise, we ask you create a separate PR for every change.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?
  • Did you verify new and existing tests pass locally with your changes?
  • If you made a notable change (that affects users), did you update the CHANGELOG?

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

@Borda Borda added the ci Continuous Integration label Jun 25, 2020
@mergify mergify bot requested a review from a team June 25, 2020 07:51
@codecov
Copy link

codecov bot commented Jun 25, 2020

Codecov Report

Merging #2358 into master will decrease coverage by 0%.
The diff coverage is 20%.

@@          Coverage Diff           @@
##           master   #2358   +/-   ##
======================================
- Coverage      88%     88%   -0%     
======================================
  Files          70      70           
  Lines        5555    5567   +12     
======================================
+ Hits         4889    4893    +4     
- Misses        666     674    +8     

@Borda
Copy link
Member Author

Borda commented Jun 25, 2020

@awaelchli, failing for Windows:

C:\hostedtoolcache\windows\Python\3.6.8\x64\lib\site-packages\py\_path\local.py:704: in pyimport
    __import__(modname)
pytorch_lightning\metrics\__init__.py:1: in <module>
    from pytorch_lightning.metrics.converters import numpy_metric, tensor_metric
pytorch_lightning\metrics\converters.py:220: in <module>
    reduce_op: Optional[torch.distributed.ReduceOp] = None,
E   AttributeError: module 'torch.distributed' has no attribute 'ReduceOp'

@Borda Borda added the bug Something isn't working label Jun 25, 2020
@awaelchli awaelchli linked an issue Jun 25, 2020 that may be closed by this pull request
@pep8speaks
Copy link

pep8speaks commented Jun 25, 2020

Hello @Borda! Thanks for updating this PR.

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-06-26 22:50:53 UTC

@justusschock
Copy link
Member

@Borda the distributed part is fixed. only the horovod part still persists

@Borda
Copy link
Member Author

Borda commented Jun 25, 2020

@tgaddair we may have some problem with Horovod on Win, is it supported?
Otherwise, we may skip it for now :] @justusschock ^^ @awaelchli

@Borda Borda marked this pull request as ready for review June 25, 2020 14:09
@Borda Borda added the priority: 0 High priority task label Jun 25, 2020
Copy link
Member

@justusschock justusschock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the horovod test, don't we also have tp either skip it or change the backend and remove the horovod size asserts?

@mergify mergify bot requested a review from a team June 25, 2020 14:24
@tgaddair
Copy link
Contributor

Hey @Borda, Horovod is not supported on Windows. Thanks for checking!

@Borda
Copy link
Member Author

Borda commented Jun 25, 2020

@tgaddair this is very strange as it is in try/except

tests\models\data\horovod\train_default_model.py:25: in <module>
    import horovod.torch as hvd
E   ModuleNotFoundError: No module named 'horovod

@Borda
Copy link
Member Author

Borda commented Jun 25, 2020

@awaelchli @justusschock @PyTorchLightning/core-contributors any idea why it fails even during the test collection before running any test? just for Windows... :/

@Borda Borda requested a review from justusschock June 25, 2020 22:05
@Borda
Copy link
Member Author

Borda commented Jun 25, 2020

fixed the failing but for some reason, the Win CI is super slow...

@awaelchli
Copy link
Contributor

The tests became slow for me since the ddp mode was switched from spawn. I'm pretty sure that's it.

@Borda
Copy link
Member Author

Borda commented Jun 26, 2020

The tests became slow for me since the ddp mode was switched from spawn. I'm pretty sure that's it.

so shall we test by default ddp_spawn?

@williamFalcon
Copy link
Contributor

sure that would work. but we need at least one ddp test.

@awaelchli
Copy link
Contributor

awaelchli commented Jun 26, 2020

try num_workers=0 in tests. it will speed up on windows. but probably on linux and mac it will be slower then.
this is an empirical finding, i can not explain why num_workers=0 on windows is faster :)

@Borda
Copy link
Member Author

Borda commented Jun 26, 2020

sure that would work. but we need at least one ddp test.

it runs all tests for all ddp as before just replace the default one, so all functionality is tested

@Borda Borda added this to the 0.9.0 milestone Jun 26, 2020
@awaelchli awaelchli changed the title repaire CI for Win repair CI for Win Jun 26, 2020
Borda and others added 4 commits June 27, 2020 00:29
…on Windows (#2294)

* Fix load_from_checkpoint() not working with URL on Windows

* Update CHANGELOG

* Update CHANGELOG.md

Co-authored-by: Peter Yu <2057325+yukw777@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Peter Yu <2057325+yukw777@users.noreply.github.com>
Copy link
Contributor

@awaelchli awaelchli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was a tough one :)

@mergify mergify bot requested a review from a team June 26, 2020 22:51
@Borda Borda added the ready PRs ready to be merged label Jun 26, 2020
@williamFalcon williamFalcon merged commit f1c9693 into master Jun 27, 2020
@awaelchli awaelchli deleted the tests/ci-win branch June 27, 2020 09:46
@yukw777
Copy link
Contributor

yukw777 commented Jun 27, 2020

@awaelchli, sorry for the late response! I’ve been traveling for the past few days and just saw this. It seems like you guys figured it out though. What was the issue with Windows?

@@ -48,7 +48,7 @@ def translate_path(self, path):
class ThreadingHTTPServer(ThreadingMixIn, HTTPServer):
daemon_threads = True

with ThreadingHTTPServer(('', 0), Handler) as server:
with ThreadingHTTPServer(('localhost', 0), Handler) as server:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah man!!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hey no problem.
yep that was it, Windows didn't like the 0.0.0.0 address :)

@Borda
Copy link
Member Author

Borda commented Jun 27, 2020

There were a couple of them as all Win tests were ignored for a few weeks or months...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working ci Continuous Integration priority: 0 High priority task ready PRs ready to be merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

accuracy metric dosen't support windows Error when importing metrics on Windows without DDP support
8 participants