Use thread-local RNG to generate IDs #839
Conversation
We've been exploring splitting our dataset into multiple batches and training those batches in parallel. I noticed that performance did not scale with core count, and after some digging, found that this was mainly due to the Mutex being used to generate IDs. With the following change, training across 16 cores went from 21s to 4.2s. thread_rng was previously discussed in tracel-ai#703, but I don't believe that applies here, as this is just used for UUID creation.
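As a rough sketch of the before/after idea (this is not the actual Burn code, and the names below are made up for illustration): the shared, mutex-guarded StdRng is what forces the threads to serialize, while rand's thread_rng gives each thread its own generator.

```rust
use rand::{rngs::StdRng, Rng, SeedableRng};
use std::sync::Mutex;

// Before: one RNG shared behind a mutex, so every ID generation across all
// training threads contends for the same lock.
static SHARED_RNG: Mutex<Option<StdRng>> = Mutex::new(None);

fn gen_id_shared() -> u128 {
    let mut guard = SHARED_RNG.lock().unwrap();
    let rng = guard.get_or_insert_with(StdRng::from_entropy);
    rng.gen()
}

// After: each thread uses its own thread-local RNG, so there is no contention.
fn gen_id_thread_local() -> u128 {
    rand::thread_rng().gen()
}

fn main() {
    println!("shared:       {:032x}", gen_id_shared());
    println!("thread-local: {:032x}", gen_id_thread_local());
}
```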
Hi @dae
Thank you for this contribution. If there is a lot of ID generation, then I can well see how the mutex may affect performance.
I think I can summarize the discussion in #703 as:
- use StdRng if you sometimes need to set the seed deterministically (e.g. for testing)
- use ThreadRng for better performance (no shared mutex), but it will be impossible to set the seed
There are no tests within Burn that rely on ID generation being deterministic, but I wonder whether not being able to set the seed would affect users in the future.
That said, your code is fine; I'm not explicitly approving simply because I'll wait for @nathanielsimard's input.
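A small hypothetical illustration of that trade-off (these functions are not Burn's API):

```rust
use rand::{rngs::StdRng, Rng, SeedableRng};

// StdRng can be seeded, so a test can reproduce the exact same sequence of IDs.
fn deterministic_id(seed: u64) -> u64 {
    StdRng::seed_from_u64(seed).gen()
}

// thread_rng avoids any shared lock, but there is no way to set its seed.
fn fast_id() -> u64 {
    rand::thread_rng().gen()
}

fn main() {
    assert_eq!(deterministic_id(42), deterministic_id(42)); // reproducible
    println!("{}", fast_id()); // fast, but different on every run
}
```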
Looks good from my end. I had worked on a statically generated seed for no-std use cases; I think your changes still work for that use case.
I was also going to switch to a thread-local RNG, but I deferred it for later. Glad you took care of it.
Standard: Distribution<T>,
{
    use crate::stub::Mutex;
    static RNG: Mutex<Option<StdRng>> = Mutex::new(None);
Good move to bring the static variable inside the function.
@nathanielsimard and I were discussing shortening the IDs if possible. One thing to keep in mind: these IDs are exported and visible in the JSON file.
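For a rough sense of the size difference being discussed (the values and output below are invented, not Burn's actual JSON):

```rust
fn main() {
    // A UUID rendered as a string costs 36 characters per occurrence in the
    // exported file, while e.g. a u64 rendered as hex costs at most 16.
    let uuid_id = "a1b2c3d4-e5f6-7890-abcd-ef1234567890";
    let short_id = format!("{:x}", 0x1b2c_3d4e_5f60_7890_u64);
    println!("uuid form:  {uuid_id} ({} chars)", uuid_id.len());
    println!("short form: {short_id} ({} chars)", short_id.len());
}
```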
That seems right: we don't care about deterministic ID generation; it has nothing to do with weights and won't impact reproducibility!
I had in mind to change the ID generation game in Burn. The more efficient, the better. We could even have more strategies! UUID is very good when we need to serialize the data (like with parameter ID), but it's way less important for tensor IDs that are only runtime-dependent.
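One such alternative strategy, sketched purely hypothetically (not something this PR adds), is a plain atomic counter for IDs that only need to be unique within a running process:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Unique within a single run, meaningless across runs, and never serialized:
// good enough for purely runtime-dependent IDs.
static NEXT_TENSOR_ID: AtomicU64 = AtomicU64::new(0);

fn next_tensor_id() -> u64 {
    // Relaxed ordering suffices: each call only needs to return a distinct value.
    NEXT_TENSOR_ID.fetch_add(1, Ordering::Relaxed)
}

fn main() {
    println!("{} {} {}", next_tensor_id(), next_tensor_id(), next_tensor_id());
}
```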
@dae I have the same NaN problem with embedding backward; I'm looking into it!
Using string UUIDs here is also not the most efficient - would there be any interest in a follow-up PR that uses the bytes directly instead of converting them to a string?
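A hedged sketch of what that follow-up could look like (the type and function names here are hypothetical):

```rust
use rand::RngCore;

// Keep the 16 random bytes as-is instead of building a 36-character
// hyphenated string for every ID.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct RawId([u8; 16]);

fn new_raw_id() -> RawId {
    let mut bytes = [0u8; 16];
    rand::thread_rng().fill_bytes(&mut bytes);
    RawId(bytes)
}

fn main() {
    println!("{:?}", new_raw_id());
}
```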
Pull Request Template
Checklist
The run-checks script has been executed.
I'm getting intermittent flakes with burn_wgpu, which happened even before this change. Maybe it's my GPU/driver?