
Use thread-local RNG to generate IDs #839

Merged · 1 commit merged into tracel-ai:main on Oct 2, 2023

Conversation

@dae (Contributor) commented Oct 2, 2023

We've been exploring dividing our data set up into multiple batches, and training those batches in parallel. I noticed that performance did not scale with core count, and after some digging, found that this was mainly due to the Mutex being used to generate IDs. With the following change, training across 16 cores went from 21s to 4.2s.

thread_rng was previously discussed on #703, but I don't believe that applies here, as this is just used for UUID creation.

Using string UUIDs here is also not the most efficient - would there be any interest in a follow-up PR that uses the bytes directly instead of converting them to a string?
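
For a rough picture of what the change does, here is a minimal, illustrative sketch (not the actual burn-common code; the `gen_id` name and hex formatting are made up for this example):

```rust
use rand::Rng;

// Before: a process-wide Mutex<StdRng> serialized every ID generation
// across threads. After: each thread uses its own thread-local RNG, so
// parallel training batches no longer contend on a lock.
fn gen_id() -> String {
    // thread_rng() hands back a per-thread RNG; no cross-thread locking.
    let bits: u128 = rand::thread_rng().gen();
    format!("{bits:032x}")
}

fn main() {
    println!("{}", gen_id());
}
```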

Pull Request Template

Checklist

  • Confirm that run-checks script has been executed.

I'm getting intermittent test flakes with burn_wgpu, which happened even before this change. Maybe it's my GPU/driver?

---- tests::module_backward::tests::test_embedding_backward stdout ----
thread 'tests::module_backward::tests::test_embedding_backward' panicked at 'assertion failed: `(left == right)`
  left: `Data { value: [NaN, NaN, NaN, NaN, NaN, NaN], shape: Shape { dims: [2, 3] } }`,
 right: `Data { value: [3.0, 9.0, 7.0, 21.0, 35.0, 27.0], shape: Shape { dims: [2, 3] } }`', burn-wgpu/src/lib.rs:48:5

@louisfd (Member) left a comment

Hi @dae
Thank you for this contribution. If there is a lot of ID generation, then I can well see how the mutex may affect performance.

I think I can summarize the discussion in #703 as follows (sketched below):
- use StdRng if you need to sometimes set the seed deterministically (like for testing)
- use ThreadRng for better performance (no shared mutex), but it will be impossible to set the seed

There are no tests within Burn that rely on ID generation being deterministic, but I wonder if it would impact users in the future to not be able to set their seed.

That said, your code is fine; I'm not explicitly approving simply because I'll wait for @nathanielsimard's input.
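
As a hedged illustration of the trade-off summarized above (plain rand usage, not Burn code):

```rust
use rand::{rngs::StdRng, Rng, SeedableRng};

fn main() {
    // StdRng: can be seeded, so values are reproducible across runs,
    // but sharing one instance between threads requires a Mutex.
    let mut seeded = StdRng::seed_from_u64(42);
    let a: u64 = seeded.gen();

    // ThreadRng: one RNG per thread, no lock contention,
    // but there is no way to set its seed.
    let b: u64 = rand::thread_rng().gen();

    println!("seeded: {a}, thread-local: {b}");
}
```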

@antimora (Collaborator) left a comment

Looks good from my end. I had worked on a statically generated seed for no-std use cases; I think your changes still work for that use case.

I was also going to switch to a thread-local RNG but deferred it for later. Glad you took care of it.

    Standard: Distribution<T>,
{
    use crate::stub::Mutex;
    static RNG: Mutex<Option<StdRng>> = Mutex::new(None);
A collaborator left a comment on this diff:

Good move to bring the static variable inside the function

@antimora (Collaborator) commented Oct 2, 2023

@dae said:

Using string UUIDs here is also not the most efficient - would there be any interest in a follow-up PR that uses the bytes directly instead of converting them to a string?

@nathanielsimard and I were discussing shortening the IDs if possible. One thing to keep in mind: these IDs are exported and visible in the JSON file.
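
For a rough sense of the size trade-off (assuming the `uuid` crate with the `v4` feature; not a proposal for Burn's exact serialization format):

```rust
use uuid::Uuid;

fn main() {
    let id = Uuid::new_v4();

    // Hyphenated string form: 36 characters when exported to JSON, human-readable.
    let as_string = id.to_string();

    // Raw form: 16 bytes, but not directly readable in a JSON export.
    let as_bytes: &[u8; 16] = id.as_bytes();

    println!("string: {} ({} chars)", as_string, as_string.len());
    println!("raw: {} bytes", as_bytes.len());
}
```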

@nathanielsimard (Member) left a comment

It seems right; we don't care about deterministic ID generation, since it has nothing to do with the weights and won't impact reproducibility!

I had in mind to change the ID generation game in Burn. The more efficient, the better. We could even have more strategies! UUID is very good when we need to serialize the data (like with parameter IDs), but it's way less important for tensor IDs that are only runtime-dependent.
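
One possible cheaper strategy for runtime-only tensor IDs, sketched purely for illustration (not what Burn implements):

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Process-wide counter: cheap, lock-free, and unique for the lifetime of
// the process, but not stable across runs, so only suitable for IDs that
// are never serialized (unlike parameter IDs, which can stay as UUIDs).
static NEXT_TENSOR_ID: AtomicU64 = AtomicU64::new(0);

fn next_tensor_id() -> u64 {
    // Relaxed ordering is enough: we only need uniqueness, not to
    // synchronize any other memory with the increment.
    NEXT_TENSOR_ID.fetch_add(1, Ordering::Relaxed)
}

fn main() {
    println!("{} {} {}", next_tensor_id(), next_tensor_id(), next_tensor_id());
}
```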

@nathanielsimard (Member) commented:
@dae I have the same NaN problem with embedding backward, I'm looking into it!

@nathanielsimard nathanielsimard merged commit e363813 into tracel-ai:main Oct 2, 2023
dae added a commit to open-spaced-repetition/fsrs-rs that referenced this pull request Oct 2, 2023
@dae dae mentioned this pull request Oct 2, 2023
L-M-Sherlock pushed a commit to open-spaced-repetition/fsrs-rs that referenced this pull request Oct 3, 2023