Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

URL fixes and URL check fixes #2692

Merged
merged 28 commits into from
Sep 24, 2024
Merged

Conversation

asumagic
Copy link
Collaborator

@asumagic asumagic commented Sep 19, 2024

What does this PR do?

Fixes #2655

Changes:

  • Only files indexed by Git are now read. This is more in line with what the CI does and excludes a lot of temporary files.
  • Added a file exclusion regex.
  • Added a line exclusion regex.
  • URL checks now cover YAML files and Jupyter Notebooks, including tutorials.
  • Fixed certain URLs that were genuinely down/moved.
    • Some were marked to be ignored, with a notice for the user.
    • Some were updated to their new location, if possible.
    • Some dead domains were made to point to the latest known copy to the Web Archive instead.
  • Fixed some false positives.
  • Reworked the URL checking script a bit.
  • Sped up the tests by allowing some parallelization in the network requests and slightly relax the delay between requests.

Remaining changes:

  • Fix whatever the hell has happened to testing-refactoring and hf-interface-testing
Before submitting
  • Did you read the contributor guideline?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you list all the breaking changes introduced by this pull request?
  • Does your code adhere to project-specific code style and conventions?

PR review

Reviewer checklist
  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified
  • Confirm that the changes adhere to compatibility requirements (e.g., Python version, platform)
  • Review the self-review checklist to ensure the code is ready for review

@asumagic asumagic added this to the v1.1.0 milestone Sep 19, 2024
This AWS bucket link is used in `recipes/ESC50` and is linked to in http://wham.whisper.ai/ which seems to be treated as the official source for that dataset elsewhere in SB.

Not going to bother moving it to `fetch`.
I am not sure what this is actually intended to point to, as the URL seemingly referred to `torchvision`. Removing for now...
This is the actual README, i'll avoid tampering with it beyond adding a notice. Better indicate to the user the link might be dead.
This fixes the URL check for `https://sail.usc.edu`. We don't really care about the MITM risk here as we do nothing with the data.
@asumagic
Copy link
Collaborator Author

asumagic commented Sep 19, 2024

Remaining issues (in theory):

WARNING: https://github.com/speechbrain/speechbrain/tree/testing-refactoring/updates_pretrained_models is DOWN! got 404
         link appears in docs/guidance.md
WARNING: https://github.com/speechbrain/speechbrain/blob/hf-interface-testing/updates_pretrained_models/asr-wav2vec2-librispeech/test.yaml is DOWN! got 404
         link appears in tests/utils/README.md
WARNING: https://github.com/speechbrain/speechbrain/tree/hf-interface-testing/updates_pretrained_models/asr-wav2vec2-librispeech is DOWN! got 404
         link appears in tests/utils/README.md
WARNING: https://github.com/speechbrain/speechbrain/blob/hf-interface-testing/updates_pretrained_models/emotion-recognition-wav2vec2-IEMOCAP/test.yaml is DOWN! got 404
         link appears in tests/utils/README.md
WARNING: https://github.com/speechbrain/speechbrain/blob/hf-interface-testing/updates_pretrained_models/ssl-wav2vec2-base-librispeech/test.yaml is DOWN! got 404
         link appears in tests/utils/README.md
WARNING: https://github.com/speechbrain/speechbrain/tree/hf-interface-testing/updates_pretrained_models is DOWN! got 404
         link appears in tests/utils/README.md
WARNING: https://github.com/speechbrain/speechbrain/tree/hf-interface-testing is DOWN! got 404
         link appears in tests/utils/refactoring_checks.py
TEST FAILED!

These are only references to testing-refactoring and hf-interface-testing branches on the main SpeechBrain repo. Either those branches have never existed, or someone incorrectly nuked from the repository at some point during some branch cleanup... I don't seem to have a copy since I never had to clone them, I think. Wait no, I might have. I will look into it later. I suspect they haven't been used in a while if ever, though...

@asumagic
Copy link
Collaborator Author

Copies of the branches:

For now I will just remove references to them, they didn't end up seeing a lot of use...

@asumagic
Copy link
Collaborator Author

Ok, I will take on the docs issue separately. Considering this ready to review with the above caveat. All other URLs now work!

@asumagic asumagic marked this pull request as ready for review September 20, 2024 11:24
Copy link
Collaborator

@TParcollet TParcollet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@TParcollet TParcollet merged commit 4e62cf9 into speechbrain:develop Sep 24, 2024
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

URL checks: false positives, filtering and fixes to do
2 participants