[Feature] Preproc for datasets #1989

Merged
merged 34 commits into from
Mar 18, 2024

vmoens (Contributor) commented Mar 4, 2024

This PR makes it possible to preprocess datasets once and for all.

One can use transforms of any kind (function or regular transforms).

TODO:

  • Complete tests for other datasets
  • Document the feature
  • Consider allowing non-picklable functions
  • download=True must erase the previous dataset

cc @nicklashansen: I have a couple of questions for you:

  • Do you think it makes sense to replace the dataset, or should we leave the option of not replacing it by default? For instance we could have a dest argument in preprocess that would default to dataset_path.
  • If the dataset is replaced, you can get the original by doing MyDataSet(..., download="force"). Does that seem intuitive?
  • Does this feature satisfy your needs? Have a look at TensorDict.map to see the available options, which is essentially what we're using here.
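To make the num_workers / num_chunks options used in the example concrete, here is a minimal standard-library sketch of a chunked parallel map. It is not the TensorDict.map implementation: split_into_chunks and chunked_map are hypothetical names, plain lists stand in for tensordicts, and threads are used instead of processes so the function does not need to be picklable.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def split_into_chunks(items: Sequence, num_chunks: int) -> List[Sequence]:
    """Split `items` into at most `num_chunks` contiguous, near-equal pieces."""
    num_chunks = max(1, min(num_chunks, len(items)))
    base, extra = divmod(len(items), num_chunks)
    chunks, start = [], 0
    for i in range(num_chunks):
        # The first `extra` chunks take one additional item.
        stop = start + base + (1 if i < extra else 0)
        chunks.append(items[start:stop])
        start = stop
    return chunks


def chunked_map(
    fn: Callable[[Sequence], list],
    items: Sequence,
    num_workers: int = 2,
    num_chunks: int = 4,
) -> list:
    """Apply `fn` to each chunk in a worker pool and concatenate the results in order."""
    chunks = split_into_chunks(items, num_chunks)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # pool.map preserves the order of the input chunks.
        results = list(pool.map(fn, chunks))
    return [row for chunk in results for row in chunk]
```

With more chunks than workers, each worker handles several smaller pieces, which keeps per-task memory low at the cost of more dispatch overhead.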

Example usage with OpenX:

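import os

import torch
from tensordict import TensorDict
# Import paths for OpenXExperienceReplay and CloudpickleWrapper are assumed; adjust to your torchrl version.
from torchrl.data.datasets import OpenXExperienceReplay
from torchrl.data.utils import CloudpickleWrapper
from torchrl.envs import Compose, RenameTransform, Resize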

dataset = OpenXExperienceReplay(
    "cmu_stretch",
    download="force",
    streaming=False,
    batch_size=64,
    shuffle=True,
    num_slices=8,
    slice_len=None,
)

t = Compose(
    Resize(
        64,
        64,
        in_keys=[("observation", "image"), ("next", "observation", "image")],
    ),
    RenameTransform(
        in_keys=[
            ("observation", "image"),
            ("next", "observation", "image"),
            ("observation", "state"),
            ("next", "observation", "state"),
        ],
        out_keys=["pixels", ("next", "pixels"), "state", ("next", "state")],
    ),
)

def fn(data: TensorDict):
    data.unlock_()
    data = data.select(
        "action",
        "done",
        "episode",
        ("next", "done"),
        ("next", "observation"),
        ("next", "reward"),
        ("next", "terminated"),
        ("next", "truncated"),
        "observation",
        "terminated",
        "truncated",
    )
    data = t(data)
    data = data.select(*data.keys(True, True))
    return data

dataset.preprocess(
    CloudpickleWrapper(fn),
    num_workers=max(1, os.cpu_count() - 2),
    num_chunks=500,
    mp_start_method="fork",
)
sample = dataset.sample(32)
assert "observation" not in sample.keys()
assert "pixels" in sample.keys()
assert ("next", "pixels") in sample.keys(True)
assert "state" in sample.keys()
assert ("next", "state") in sample.keys(True)
assert sample["pixels"].shape == torch.Size([32, 3, 64, 64])

# Re-create the dataset without forcing a download: the preprocessed data is loaded from disk.
dataset = OpenXExperienceReplay(
    "cmu_stretch",
    download=True,
    streaming=False,
    batch_size=64,
    shuffle=True,
    num_slices=8,
    slice_len=None,
)
sample = dataset.sample(32)
assert "observation" not in sample.keys()
assert "pixels" in sample.keys()
assert ("next", "pixels") in sample.keys(True)
assert "state" in sample.keys()
assert ("next", "state") in sample.keys(True)

After the transform, the data looks like this:

 OpenXExperienceReplay(
    storage=TensorStorage(
        data=TensorDict(
            fields={
                action: MemoryMappedTensor(shape=torch.Size([25016, 8]), device=cpu, dtype=torch.float64, is_shared=True),
                done: MemoryMappedTensor(shape=torch.Size([25016, 1]), device=cpu, dtype=torch.bool, is_shared=True),
                episode: MemoryMappedTensor(shape=torch.Size([25016]), device=cpu, dtype=torch.int32, is_shared=True),
                image: MemoryMappedTensor(shape=torch.Size([25016, 3, 64, 64]), device=cpu, dtype=torch.uint8, is_shared=True),
                next: TensorDict(
                    fields={
                        done: MemoryMappedTensor(shape=torch.Size([25016, 1]), device=cpu, dtype=torch.bool, is_shared=True),
                        image: MemoryMappedTensor(shape=torch.Size([25016, 3, 64, 64]), device=cpu, dtype=torch.uint8, is_shared=True),
                        reward: MemoryMappedTensor(shape=torch.Size([25016, 1]), device=cpu, dtype=torch.float32, is_shared=True),
                        state: MemoryMappedTensor(shape=torch.Size([25016, 4]), device=cpu, dtype=torch.float64, is_shared=True),
                        terminated: MemoryMappedTensor(shape=torch.Size([25016, 1]), device=cpu, dtype=torch.bool, is_shared=True),
                        truncated: MemoryMappedTensor(shape=torch.Size([25016, 1]), device=cpu, dtype=torch.bool, is_shared=True)},
                    batch_size=torch.Size([25016]),
                    device=cpu,
                    is_shared=False),
                state: MemoryMappedTensor(shape=torch.Size([25016, 4]), device=cpu, dtype=torch.float64, is_shared=True),
                terminated: MemoryMappedTensor(shape=torch.Size([25016, 1]), device=cpu, dtype=torch.bool, is_shared=True),
                truncated: MemoryMappedTensor(shape=torch.Size([25016, 1]), device=cpu, dtype=torch.bool, is_shared=True)},
            batch_size=torch.Size([25016]),
            device=cpu,
            is_shared=False), 
        shape=torch.Size([25016]), 
        len=25016, 
        max_size=25016), 
    sampler=SliceSampler(num_slices=8, slice_len=None, end_key=('next', 'done'), traj_key=episode, truncated_key=('next', 'truncated'), strict_length=True), 
    writer=ImmutableDatasetWriter(), 
    batch_size=64, 
    collate_fn=<function _collate_id at 0x123f3f4c0>)

pytorch-bot bot commented Mar 4, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/1989

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 2 Unrelated Failures

As of commit 04dce54 with merge base 87f3437:

NEW FAILURE - The following job has failed:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

BROKEN TRUNK - The following job failed but was present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the "CLA Signed" label on Mar 4, 2024
@vmoens added the "Data" label (data-related PR, will launch data-related jobs) and removed the "CLA Signed" label on Mar 4, 2024
github-actions bot commented Mar 4, 2024

⚠ Warning: Result of CPU Benchmark Tests

Total Benchmarks: 91. Improved: 10. Worsened: 7.

Detailed results:
Name Max Mean Ops Ops on Repo HEAD Change
test_single 54.2997ms 53.7852ms 18.5925 Ops/s 17.3667 Ops/s $\textbf{\color{#35bf28}+7.06\%}$
test_sync 34.0925ms 29.2683ms 34.1667 Ops/s 31.7049 Ops/s $\textbf{\color{#35bf28}+7.76\%}$
test_async 59.1840ms 28.9558ms 34.5354 Ops/s 37.4065 Ops/s $\textbf{\color{#d91a1a}-7.68\%}$
test_simple 0.3895s 0.3357s 2.9793 Ops/s 2.9899 Ops/s $\color{#d91a1a}-0.36\%$
test_transformed 0.5271s 0.4739s 2.1103 Ops/s 2.0511 Ops/s $\color{#35bf28}+2.89\%$
test_serial 1.2273s 1.1832s 0.8452 Ops/s 0.8252 Ops/s $\color{#35bf28}+2.43\%$
test_parallel 1.0754s 1.0263s 0.9743 Ops/s 0.9594 Ops/s $\color{#35bf28}+1.55\%$
test_step_mdp_speed[True-True-True-True-True] 0.1153ms 21.0809μs 47.4363 KOps/s 47.2412 KOps/s $\color{#35bf28}+0.41\%$
test_step_mdp_speed[True-True-True-True-False] 33.6830μs 13.0248μs 76.7767 KOps/s 75.5557 KOps/s $\color{#35bf28}+1.62\%$
test_step_mdp_speed[True-True-True-False-True] 36.5890μs 12.3387μs 81.0456 KOps/s 79.5258 KOps/s $\color{#35bf28}+1.91\%$
test_step_mdp_speed[True-True-True-False-False] 37.3700μs 7.5412μs 132.6051 KOps/s 128.5434 KOps/s $\color{#35bf28}+3.16\%$
test_step_mdp_speed[True-True-False-True-True] 57.8870μs 22.7339μs 43.9872 KOps/s 43.5012 KOps/s $\color{#35bf28}+1.12\%$
test_step_mdp_speed[True-True-False-True-False] 39.8850μs 14.2459μs 70.1959 KOps/s 68.9665 KOps/s $\color{#35bf28}+1.78\%$
test_step_mdp_speed[True-True-False-False-True] 36.7790μs 13.5388μs 73.8616 KOps/s 72.5068 KOps/s $\color{#35bf28}+1.87\%$
test_step_mdp_speed[True-True-False-False-False] 30.6170μs 8.7045μs 114.8836 KOps/s 112.3330 KOps/s $\color{#35bf28}+2.27\%$
test_step_mdp_speed[True-False-True-True-True] 0.1039ms 23.6092μs 42.3564 KOps/s 41.1577 KOps/s $\color{#35bf28}+2.91\%$
test_step_mdp_speed[True-False-True-True-False] 41.6070μs 15.2456μs 65.5926 KOps/s 63.7281 KOps/s $\color{#35bf28}+2.93\%$
test_step_mdp_speed[True-False-True-False-True] 35.7970μs 13.3570μs 74.8671 KOps/s 72.2604 KOps/s $\color{#35bf28}+3.61\%$
test_step_mdp_speed[True-False-True-False-False] 37.2390μs 8.6498μs 115.6101 KOps/s 113.6318 KOps/s $\color{#35bf28}+1.74\%$
test_step_mdp_speed[True-False-False-True-True] 52.6080μs 24.7808μs 40.3538 KOps/s 39.5775 KOps/s $\color{#35bf28}+1.96\%$
test_step_mdp_speed[True-False-False-True-False] 43.1410μs 16.3713μs 61.0825 KOps/s 59.2397 KOps/s $\color{#35bf28}+3.11\%$
test_step_mdp_speed[True-False-False-False-True] 40.0750μs 14.5574μs 68.6935 KOps/s 67.0363 KOps/s $\color{#35bf28}+2.47\%$
test_step_mdp_speed[True-False-False-False-False] 30.0960μs 9.7334μs 102.7389 KOps/s 98.6640 KOps/s $\color{#35bf28}+4.13\%$
test_step_mdp_speed[False-True-True-True-True] 49.8730μs 23.5996μs 42.3736 KOps/s 41.8132 KOps/s $\color{#35bf28}+1.34\%$
test_step_mdp_speed[False-True-True-True-False] 50.8750μs 15.2467μs 65.5878 KOps/s 63.7620 KOps/s $\color{#35bf28}+2.86\%$
test_step_mdp_speed[False-True-True-False-True] 65.9610μs 15.5368μs 64.3632 KOps/s 62.3148 KOps/s $\color{#35bf28}+3.29\%$
test_step_mdp_speed[False-True-True-False-False] 33.7830μs 9.8046μs 101.9931 KOps/s 99.4591 KOps/s $\color{#35bf28}+2.55\%$
test_step_mdp_speed[False-True-False-True-True] 55.8240μs 25.0787μs 39.8744 KOps/s 39.7428 KOps/s $\color{#35bf28}+0.33\%$
test_step_mdp_speed[False-True-False-True-False] 48.8910μs 16.2979μs 61.3577 KOps/s 59.2606 KOps/s $\color{#35bf28}+3.54\%$
test_step_mdp_speed[False-True-False-False-True] 44.0220μs 16.7552μs 59.6830 KOps/s 58.2532 KOps/s $\color{#35bf28}+2.45\%$
test_step_mdp_speed[False-True-False-False-False] 39.0030μs 10.9163μs 91.6063 KOps/s 88.4127 KOps/s $\color{#35bf28}+3.61\%$
test_step_mdp_speed[False-False-True-True-True] 66.4940μs 25.8561μs 38.6755 KOps/s 37.6484 KOps/s $\color{#35bf28}+2.73\%$
test_step_mdp_speed[False-False-True-True-False] 42.5290μs 17.5965μs 56.8293 KOps/s 54.6928 KOps/s $\color{#35bf28}+3.91\%$
test_step_mdp_speed[False-False-True-False-True] 42.3180μs 16.6939μs 59.9021 KOps/s 57.9330 KOps/s $\color{#35bf28}+3.40\%$
test_step_mdp_speed[False-False-True-False-False] 37.1990μs 10.9415μs 91.3952 KOps/s 87.3939 KOps/s $\color{#35bf28}+4.58\%$
test_step_mdp_speed[False-False-False-True-True] 59.2510μs 26.8121μs 37.2966 KOps/s 36.4367 KOps/s $\color{#35bf28}+2.36\%$
test_step_mdp_speed[False-False-False-True-False] 40.7660μs 18.6266μs 53.6867 KOps/s 52.2497 KOps/s $\color{#35bf28}+2.75\%$
test_step_mdp_speed[False-False-False-False-True] 57.7680μs 17.7015μs 56.4925 KOps/s 54.8069 KOps/s $\color{#35bf28}+3.08\%$
test_step_mdp_speed[False-False-False-False-False] 36.7880μs 11.8198μs 84.6038 KOps/s 80.1569 KOps/s $\textbf{\color{#35bf28}+5.55\%}$
test_values[generalized_advantage_estimate-True-True] 10.2246ms 9.1069ms 109.8064 Ops/s 94.2445 Ops/s $\textbf{\color{#35bf28}+16.51\%}$
test_values[vec_generalized_advantage_estimate-True-True] 36.9512ms 35.2511ms 28.3679 Ops/s 29.5850 Ops/s $\color{#d91a1a}-4.11\%$
test_values[td0_return_estimate-False-False] 0.1939ms 0.1662ms 6.0152 KOps/s 5.7894 KOps/s $\color{#35bf28}+3.90\%$
test_values[td1_return_estimate-False-False] 23.4976ms 22.3260ms 44.7907 Ops/s 42.4968 Ops/s $\textbf{\color{#35bf28}+5.40\%}$
test_values[vec_td1_return_estimate-False-False] 37.0953ms 35.3412ms 28.2956 Ops/s 29.8106 Ops/s $\textbf{\color{#d91a1a}-5.08\%}$
test_values[td_lambda_return_estimate-True-False] 44.2602ms 32.5437ms 30.7279 Ops/s 29.3498 Ops/s $\color{#35bf28}+4.70\%$
test_values[vec_td_lambda_return_estimate-True-False] 38.1390ms 35.3764ms 28.2674 Ops/s 29.9119 Ops/s $\textbf{\color{#d91a1a}-5.50\%}$
test_gae_speed[generalized_advantage_estimate-False-1-512] 11.0848ms 8.0578ms 124.1031 Ops/s 124.0772 Ops/s $\color{#35bf28}+0.02\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 2.7778ms 1.9315ms 517.7418 Ops/s 504.6309 Ops/s $\color{#35bf28}+2.60\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.4561ms 0.3418ms 2.9254 KOps/s 2.8460 KOps/s $\color{#35bf28}+2.79\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 51.2785ms 47.7669ms 20.9350 Ops/s 24.6390 Ops/s $\textbf{\color{#d91a1a}-15.03\%}$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 4.1790ms 3.0320ms 329.8180 Ops/s 328.6655 Ops/s $\color{#35bf28}+0.35\%$
test_dqn_speed 1.6232ms 1.3396ms 746.5171 Ops/s 731.8115 Ops/s $\color{#35bf28}+2.01\%$
test_ddpg_speed 3.3951ms 2.6869ms 372.1694 Ops/s 373.5710 Ops/s $\color{#d91a1a}-0.38\%$
test_sac_speed 9.0660ms 8.1292ms 123.0129 Ops/s 121.2870 Ops/s $\color{#35bf28}+1.42\%$
test_redq_speed 15.7570ms 13.0662ms 76.5331 Ops/s 76.4217 Ops/s $\color{#35bf28}+0.15\%$
test_redq_deprec_speed 15.5111ms 13.4171ms 74.5317 Ops/s 76.7253 Ops/s $\color{#d91a1a}-2.86\%$
test_td3_speed 8.4964ms 8.1117ms 123.2781 Ops/s 121.3793 Ops/s $\color{#35bf28}+1.56\%$
test_cql_speed 0.1166s 39.2764ms 25.4606 Ops/s 27.7061 Ops/s $\textbf{\color{#d91a1a}-8.10\%}$
test_a2c_speed 8.4009ms 7.3821ms 135.4626 Ops/s 136.9662 Ops/s $\color{#d91a1a}-1.10\%$
test_ppo_speed 9.1072ms 7.6066ms 131.4646 Ops/s 131.7554 Ops/s $\color{#d91a1a}-0.22\%$
test_reinforce_speed 7.5396ms 6.5369ms 152.9778 Ops/s 153.0060 Ops/s $\color{#d91a1a}-0.02\%$
test_iql_speed 33.8092ms 32.2324ms 31.0247 Ops/s 30.7649 Ops/s $\color{#35bf28}+0.84\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 3.2424ms 2.1533ms 464.4127 Ops/s 445.8014 Ops/s $\color{#35bf28}+4.17\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.8619ms 0.4973ms 2.0107 KOps/s 2.0051 KOps/s $\color{#35bf28}+0.28\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.7501ms 0.4683ms 2.1353 KOps/s 2.1138 KOps/s $\color{#35bf28}+1.01\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 2.4293ms 2.1189ms 471.9487 Ops/s 444.5872 Ops/s $\textbf{\color{#35bf28}+6.15\%}$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 0.9625ms 0.4866ms 2.0552 KOps/s 2.0425 KOps/s $\color{#35bf28}+0.62\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.6467ms 0.4609ms 2.1698 KOps/s 2.1212 KOps/s $\color{#35bf28}+2.29\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 1.7443ms 1.2615ms 792.7174 Ops/s 779.4177 Ops/s $\color{#35bf28}+1.71\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 1.3495ms 1.1912ms 839.5239 Ops/s 823.3787 Ops/s $\color{#35bf28}+1.96\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 2.4163ms 2.2052ms 453.4644 Ops/s 426.4849 Ops/s $\textbf{\color{#35bf28}+6.33\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 86.8364ms 0.6776ms 1.4757 KOps/s 1.6322 KOps/s $\textbf{\color{#d91a1a}-9.59\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.7755ms 0.5800ms 1.7242 KOps/s 1.7021 KOps/s $\color{#35bf28}+1.30\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 4.4020ms 2.2112ms 452.2379 Ops/s 446.2576 Ops/s $\color{#35bf28}+1.34\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.9393ms 0.4933ms 2.0274 KOps/s 1.9739 KOps/s $\color{#35bf28}+2.71\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.6365ms 0.4703ms 2.1262 KOps/s 2.1153 KOps/s $\color{#35bf28}+0.52\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 2.4728ms 2.1003ms 476.1300 Ops/s 432.6131 Ops/s $\textbf{\color{#35bf28}+10.06\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 0.5902ms 0.4852ms 2.0609 KOps/s 2.0149 KOps/s $\color{#35bf28}+2.28\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 3.5199ms 0.4652ms 2.1496 KOps/s 2.1380 KOps/s $\color{#35bf28}+0.54\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 2.3881ms 2.2339ms 447.6460 Ops/s 430.3738 Ops/s $\color{#35bf28}+4.01\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1.1992ms 0.6370ms 1.5698 KOps/s 1.6412 KOps/s $\color{#d91a1a}-4.35\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 1.2675ms 0.5848ms 1.7100 KOps/s 1.6815 KOps/s $\color{#35bf28}+1.70\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 0.1007s 7.1922ms 139.0398 Ops/s 141.1811 Ops/s $\color{#d91a1a}-1.52\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 13.8771ms 11.8699ms 84.2464 Ops/s 84.1626 Ops/s $\color{#35bf28}+0.10\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 1.5908ms 1.0663ms 937.7957 Ops/s 984.1349 Ops/s $\color{#d91a1a}-4.71\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 86.1089ms 5.2775ms 189.4845 Ops/s 141.6502 Ops/s $\textbf{\color{#35bf28}+33.77\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 13.7299ms 11.7860ms 84.8462 Ops/s 83.6996 Ops/s $\color{#35bf28}+1.37\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 3.7036ms 1.1042ms 905.6449 Ops/s 940.6442 Ops/s $\color{#d91a1a}-3.72\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 85.4454ms 7.2222ms 138.4611 Ops/s 179.7493 Ops/s $\textbf{\color{#d91a1a}-22.97\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 14.2251ms 12.0922ms 82.6982 Ops/s 71.9244 Ops/s $\textbf{\color{#35bf28}+14.98\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 3.9706ms 1.4064ms 711.0472 Ops/s 700.3141 Ops/s $\color{#35bf28}+1.53\%$

github-actions bot commented Mar 4, 2024

⚠ Warning: Result of GPU Benchmark Tests

Total Benchmarks: 94. Improved: 4. Worsened: 6.

Detailed results:
Name Max Mean Ops Ops on Repo HEAD Change
test_single 0.1048s 0.1034s 9.6720 Ops/s 9.2056 Ops/s $\textbf{\color{#35bf28}+5.07\%}$
test_sync 92.4024ms 90.2660ms 11.0784 Ops/s 11.0936 Ops/s $\color{#d91a1a}-0.14\%$
test_async 0.1692s 84.9766ms 11.7679 Ops/s 13.7584 Ops/s $\textbf{\color{#d91a1a}-14.47\%}$
test_single_pixels 0.1124s 0.1120s 8.9248 Ops/s 8.7868 Ops/s $\color{#35bf28}+1.57\%$
test_sync_pixels 68.2384ms 66.2889ms 15.0855 Ops/s 14.8713 Ops/s $\color{#35bf28}+1.44\%$
test_async_pixels 0.1222s 55.4626ms 18.0302 Ops/s 17.7515 Ops/s $\color{#35bf28}+1.57\%$
test_simple 0.6873s 0.6773s 1.4764 Ops/s 1.4630 Ops/s $\color{#35bf28}+0.92\%$
test_transformed 0.9081s 0.9074s 1.1020 Ops/s 1.1163 Ops/s $\color{#d91a1a}-1.28\%$
test_serial 2.2030s 2.1902s 0.4566 Ops/s 0.4717 Ops/s $\color{#d91a1a}-3.21\%$
test_parallel 1.8644s 1.8119s 0.5519 Ops/s 0.5459 Ops/s $\color{#35bf28}+1.10\%$
test_step_mdp_speed[True-True-True-True-True] 0.1033ms 33.7518μs 29.6281 KOps/s 29.7495 KOps/s $\color{#d91a1a}-0.41\%$
test_step_mdp_speed[True-True-True-True-False] 82.3110μs 19.8920μs 50.2715 KOps/s 49.9189 KOps/s $\color{#35bf28}+0.71\%$
test_step_mdp_speed[True-True-True-False-True] 43.9710μs 18.7657μs 53.2888 KOps/s 52.2434 KOps/s $\color{#35bf28}+2.00\%$
test_step_mdp_speed[True-True-True-False-False] 35.1610μs 11.3384μs 88.1961 KOps/s 87.6904 KOps/s $\color{#35bf28}+0.58\%$
test_step_mdp_speed[True-True-False-True-True] 65.4010μs 35.2697μs 28.3529 KOps/s 28.3691 KOps/s $\color{#d91a1a}-0.06\%$
test_step_mdp_speed[True-True-False-True-False] 45.4510μs 21.5797μs 46.3398 KOps/s 45.1528 KOps/s $\color{#35bf28}+2.63\%$
test_step_mdp_speed[True-True-False-False-True] 47.7300μs 20.6114μs 48.5169 KOps/s 48.4483 KOps/s $\color{#35bf28}+0.14\%$
test_step_mdp_speed[True-True-False-False-False] 26.8400μs 13.2198μs 75.6440 KOps/s 75.8492 KOps/s $\color{#d91a1a}-0.27\%$
test_step_mdp_speed[True-False-True-True-True] 0.1138ms 36.7477μs 27.2126 KOps/s 26.8738 KOps/s $\color{#35bf28}+1.26\%$
test_step_mdp_speed[True-False-True-True-False] 42.5700μs 23.6681μs 42.2509 KOps/s 41.3608 KOps/s $\color{#35bf28}+2.15\%$
test_step_mdp_speed[True-False-True-False-True] 41.1300μs 20.5234μs 48.7249 KOps/s 48.1328 KOps/s $\color{#35bf28}+1.23\%$
test_step_mdp_speed[True-False-True-False-False] 97.1110μs 13.3019μs 75.1774 KOps/s 76.4274 KOps/s $\color{#d91a1a}-1.64\%$
test_step_mdp_speed[True-False-False-True-True] 67.4410μs 38.6354μs 25.8830 KOps/s 25.8152 KOps/s $\color{#35bf28}+0.26\%$
test_step_mdp_speed[True-False-False-True-False] 99.5310μs 25.4111μs 39.3529 KOps/s 38.7331 KOps/s $\color{#35bf28}+1.60\%$
test_step_mdp_speed[True-False-False-False-True] 39.5100μs 22.3990μs 44.6448 KOps/s 44.6385 KOps/s $\color{#35bf28}+0.01\%$
test_step_mdp_speed[True-False-False-False-False] 34.8110μs 15.0342μs 66.5152 KOps/s 66.7365 KOps/s $\color{#d91a1a}-0.33\%$
test_step_mdp_speed[False-True-True-True-True] 62.1810μs 37.5242μs 26.6495 KOps/s 26.9976 KOps/s $\color{#d91a1a}-1.29\%$
test_step_mdp_speed[False-True-True-True-False] 50.2300μs 23.8828μs 41.8712 KOps/s 41.7554 KOps/s $\color{#35bf28}+0.28\%$
test_step_mdp_speed[False-True-True-False-True] 79.8910μs 24.4587μs 40.8853 KOps/s 40.3947 KOps/s $\color{#35bf28}+1.21\%$
test_step_mdp_speed[False-True-True-False-False] 37.6310μs 14.9518μs 66.8815 KOps/s 65.9782 KOps/s $\color{#35bf28}+1.37\%$
test_step_mdp_speed[False-True-False-True-True] 64.7310μs 39.1092μs 25.5694 KOps/s 25.3789 KOps/s $\color{#35bf28}+0.75\%$
test_step_mdp_speed[False-True-False-True-False] 44.2700μs 25.5040μs 39.2095 KOps/s 38.8615 KOps/s $\color{#35bf28}+0.90\%$
test_step_mdp_speed[False-True-False-False-True] 0.1989ms 26.0399μs 38.4025 KOps/s 38.2195 KOps/s $\color{#35bf28}+0.48\%$
test_step_mdp_speed[False-True-False-False-False] 42.2610μs 16.7721μs 59.6227 KOps/s 59.4717 KOps/s $\color{#35bf28}+0.25\%$
test_step_mdp_speed[False-False-True-True-True] 0.1077ms 40.3998μs 24.7526 KOps/s 24.5784 KOps/s $\color{#35bf28}+0.71\%$
test_step_mdp_speed[False-False-True-True-False] 53.8100μs 27.2066μs 36.7558 KOps/s 35.9507 KOps/s $\color{#35bf28}+2.24\%$
test_step_mdp_speed[False-False-True-False-True] 51.5010μs 26.4435μs 37.8165 KOps/s 37.9026 KOps/s $\color{#d91a1a}-0.23\%$
test_step_mdp_speed[False-False-True-False-False] 32.4600μs 16.7366μs 59.7492 KOps/s 58.8417 KOps/s $\color{#35bf28}+1.54\%$
test_step_mdp_speed[False-False-False-True-True] 77.3610μs 41.3520μs 24.1826 KOps/s 23.4836 KOps/s $\color{#35bf28}+2.98\%$
test_step_mdp_speed[False-False-False-True-False] 0.2040ms 28.9502μs 34.5420 KOps/s 33.7839 KOps/s $\color{#35bf28}+2.24\%$
test_step_mdp_speed[False-False-False-False-True] 0.2125ms 27.7700μs 36.0101 KOps/s 35.4837 KOps/s $\color{#35bf28}+1.48\%$
test_step_mdp_speed[False-False-False-False-False] 42.4410μs 18.4577μs 54.1779 KOps/s 53.7780 KOps/s $\color{#35bf28}+0.74\%$
test_values[generalized_advantage_estimate-True-True] 25.9838ms 24.7688ms 40.3733 Ops/s 40.2715 Ops/s $\color{#35bf28}+0.25\%$
test_values[vec_generalized_advantage_estimate-True-True] 94.2066ms 3.4419ms 290.5378 Ops/s 309.3948 Ops/s $\textbf{\color{#d91a1a}-6.09\%}$
test_values[td0_return_estimate-False-False] 0.1070ms 63.1283μs 15.8408 KOps/s 15.1125 KOps/s $\color{#35bf28}+4.82\%$
test_values[td1_return_estimate-False-False] 52.5906ms 52.1795ms 19.1646 Ops/s 18.8294 Ops/s $\color{#35bf28}+1.78\%$
test_values[vec_td1_return_estimate-False-False] 2.0350ms 1.7573ms 569.0658 Ops/s 568.1730 Ops/s $\color{#35bf28}+0.16\%$
test_values[td_lambda_return_estimate-True-False] 89.1036ms 87.0897ms 11.4824 Ops/s 11.8170 Ops/s $\color{#d91a1a}-2.83\%$
test_values[vec_td_lambda_return_estimate-True-False] 2.1082ms 1.7567ms 569.2384 Ops/s 568.0686 Ops/s $\color{#35bf28}+0.21\%$
test_gae_speed[generalized_advantage_estimate-False-1-512] 24.4499ms 24.0811ms 41.5264 Ops/s 43.6397 Ops/s $\color{#d91a1a}-4.84\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 0.8816ms 0.6994ms 1.4297 KOps/s 1.4119 KOps/s $\color{#35bf28}+1.26\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.7860ms 0.6432ms 1.5547 KOps/s 1.5536 KOps/s $\color{#35bf28}+0.07\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 1.5851ms 1.4498ms 689.7346 Ops/s 691.7286 Ops/s $\color{#d91a1a}-0.29\%$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 0.9339ms 0.6626ms 1.5092 KOps/s 1.5029 KOps/s $\color{#35bf28}+0.42\%$
test_dqn_speed 9.3838ms 1.4439ms 692.5519 Ops/s 674.3756 Ops/s $\color{#35bf28}+2.70\%$
test_ddpg_speed 2.8378ms 2.7046ms 369.7396 Ops/s 362.4938 Ops/s $\color{#35bf28}+2.00\%$
test_sac_speed 8.4986ms 8.0456ms 124.2911 Ops/s 122.8907 Ops/s $\color{#35bf28}+1.14\%$
test_redq_speed 10.9893ms 10.1926ms 98.1105 Ops/s 97.3556 Ops/s $\color{#35bf28}+0.78\%$
test_redq_deprec_speed 11.6414ms 11.1851ms 89.4050 Ops/s 88.5005 Ops/s $\color{#35bf28}+1.02\%$
test_td3_speed 8.1128ms 7.9728ms 125.4259 Ops/s 123.2039 Ops/s $\color{#35bf28}+1.80\%$
test_cql_speed 26.6641ms 25.3620ms 39.4291 Ops/s 39.3347 Ops/s $\color{#35bf28}+0.24\%$
test_a2c_speed 6.0044ms 5.6287ms 177.6595 Ops/s 177.2549 Ops/s $\color{#35bf28}+0.23\%$
test_ppo_speed 6.1887ms 5.9809ms 167.2003 Ops/s 166.6266 Ops/s $\color{#35bf28}+0.34\%$
test_reinforce_speed 5.0765ms 4.5186ms 221.3054 Ops/s 220.2442 Ops/s $\color{#35bf28}+0.48\%$
test_iql_speed 20.4734ms 19.5576ms 51.1309 Ops/s 50.5559 Ops/s $\color{#35bf28}+1.14\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 3.0970ms 2.8933ms 345.6245 Ops/s 348.1894 Ops/s $\color{#d91a1a}-0.74\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.7162ms 0.5434ms 1.8403 KOps/s 1.8283 KOps/s $\color{#35bf28}+0.65\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 4.2693ms 0.5256ms 1.9027 KOps/s 1.9132 KOps/s $\color{#d91a1a}-0.55\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 3.1250ms 2.9310ms 341.1836 Ops/s 344.6508 Ops/s $\color{#d91a1a}-1.01\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 0.7339ms 0.5350ms 1.8691 KOps/s 1.8596 KOps/s $\color{#35bf28}+0.51\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 4.3960ms 0.5133ms 1.9483 KOps/s 1.9507 KOps/s $\color{#d91a1a}-0.12\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 1.6606ms 1.5172ms 659.1260 Ops/s 648.1918 Ops/s $\color{#35bf28}+1.69\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 5.1153ms 1.4533ms 688.0736 Ops/s 680.2072 Ops/s $\color{#35bf28}+1.16\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 3.3795ms 3.0669ms 326.0600 Ops/s 331.4161 Ops/s $\color{#d91a1a}-1.62\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1.5607ms 0.6682ms 1.4965 KOps/s 1.3094 KOps/s $\textbf{\color{#35bf28}+14.29\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.8148ms 0.6412ms 1.5596 KOps/s 1.5195 KOps/s $\color{#35bf28}+2.64\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 3.1685ms 2.9368ms 340.5053 Ops/s 344.1439 Ops/s $\color{#d91a1a}-1.06\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.1105s 0.6727ms 1.4866 KOps/s 1.8354 KOps/s $\textbf{\color{#d91a1a}-19.00\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.6833ms 0.5175ms 1.9324 KOps/s 1.9238 KOps/s $\color{#35bf28}+0.45\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 3.2132ms 2.9771ms 335.8984 Ops/s 342.5063 Ops/s $\color{#d91a1a}-1.93\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 0.7023ms 0.5374ms 1.8606 KOps/s 1.8600 KOps/s $\color{#35bf28}+0.03\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 4.5459ms 0.5193ms 1.9255 KOps/s 1.9240 KOps/s $\color{#35bf28}+0.08\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 3.2620ms 3.0628ms 326.4959 Ops/s 328.1334 Ops/s $\color{#d91a1a}-0.50\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 0.1065s 0.8124ms 1.2309 KOps/s 1.4769 KOps/s $\textbf{\color{#d91a1a}-16.65\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.8264ms 0.6432ms 1.5547 KOps/s 1.5436 KOps/s $\color{#35bf28}+0.72\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 0.1005s 6.6759ms 149.7934 Ops/s 111.1827 Ops/s $\textbf{\color{#35bf28}+34.73\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 17.2540ms 14.9349ms 66.9572 Ops/s 67.1377 Ops/s $\color{#d91a1a}-0.27\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 2.3486ms 1.1848ms 843.9990 Ops/s 840.4984 Ops/s $\color{#35bf28}+0.42\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 0.1030s 8.6238ms 115.9580 Ops/s 149.4225 Ops/s $\textbf{\color{#d91a1a}-22.40\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 17.3331ms 14.8665ms 67.2655 Ops/s 67.5578 Ops/s $\color{#d91a1a}-0.43\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 7.4481ms 1.2839ms 778.8780 Ops/s 860.2170 Ops/s $\textbf{\color{#d91a1a}-9.46\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 0.1021s 7.0692ms 141.4593 Ops/s 110.9642 Ops/s $\textbf{\color{#35bf28}+27.48\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 17.3268ms 15.0370ms 66.5028 Ops/s 65.7822 Ops/s $\color{#35bf28}+1.10\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 7.2784ms 1.6286ms 614.0111 Ops/s 608.4177 Ops/s $\color{#35bf28}+0.92\%$

@vmoens changed the title from "[Feature] Preproc for datasets and better rb representation" to "[Feature] Preproc for datasets" on Mar 4, 2024
@vmoens added the "CLA Signed" label on Mar 4, 2024
@vmoens vmoens linked an issue Mar 5, 2024 that may be closed by this pull request
dennismalmgren (Contributor) commented

When I last did this, I didn't get the transforms right the first time (or second, or third...), and it took some iterations - sometimes jointly with algorithm development. Hence, I'd prefer an option not to replace the downloaded dataset, to speed up that iterative process.

vmoens (Contributor, Author) commented Mar 6, 2024

Makes sense!
We could:

  • Provide a dest keyword argument for the destination. Either it defaults to the source and replaces it, or we only replace the source if the same dest is explicitly provided.
  • Provide the option to process only X frames (e.g. max_frames). By default, preprocess will run over the entire dataset, but you can ask for less. This would mainly be useful for debugging.
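The two options above can be sketched with the standard library alone. This is only an illustration of the proposed semantics, not the torchrl implementation: the preprocess function, its dest and max_frames parameters, and the data.json layout are all hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Optional


def preprocess(
    src: Path,
    fn: Callable[[dict], dict],
    dest: Optional[Path] = None,
    max_frames: Optional[int] = None,
) -> Path:
    """Apply `fn` to every frame stored in src/data.json and write the result to `dest`.

    dest=None overwrites the source in place; max_frames limits how many
    frames are processed (handy while debugging a transform).
    """
    dest = src if dest is None else dest
    frames = json.loads((src / "data.json").read_text())
    if max_frames is not None:
        frames = frames[:max_frames]
    processed = [fn(frame) for frame in frames]
    dest.mkdir(parents=True, exist_ok=True)
    (dest / "data.json").write_text(json.dumps(processed))
    return dest


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "raw"
    src.mkdir()
    (src / "data.json").write_text(json.dumps([{"obs": i} for i in range(5)]))
    # Process only two frames into a separate directory; the raw data is untouched.
    out = preprocess(src, lambda f: {"pixels": f["obs"] * 2}, dest=Path(tmp) / "debug", max_frames=2)
    print(json.loads((out / "data.json").read_text()))  # prints [{'pixels': 0}, {'pixels': 2}]
```

Writing to a separate dest keeps the downloaded data available, so a transform can be iterated on without re-downloading.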

@vmoens
Contributor Author

vmoens commented Mar 13, 2024

Upon reflection it seems it makes more sense to encourage users to copy the data somewhere else.

What should happen to:

  • Sampler? I guess we can generally keep the same sampler as the one set in the dataset
  • Writer? I would keep the immutable writer by default
  • transforms? Here I would vote for discarding any transform that has been set in the dataset. Transforms are presumably used to do the same job as this preproc

@vmoens
Contributor Author

vmoens commented Mar 14, 2024

@nicklashansen @dennismalmgren I need some input here.

The issue I'm facing is that there is no way to tell what the storage will look like after the transform, and hence the sampler / writer / transforms will in many cases be obsolete.

If we recycle them by default there's a big chunk of cases where the transformed dataset will not be usable anymore.

Therefore, since this functionality needs space for both the dataset and its transformed version anyway (you can't delete the first before you have the second), I think
(1) we should never replace the dataset with its transformed version: it's a bit chaotic and hard to implement because the dataset can come with many other attributes, and on top of that we can't be sure that the signature of the dataset will work with the transformed one
(2) the preprocessing should not return a replay buffer but just a storage, and the user should build the replay buffer from scratch:

dataset = OpenXExperienceReplay(...)
new_storage = dataset.preprocess(func, ...)
# we give users the option to delete the dataset
dataset.clear()
new_dataset = ReplayBuffer(storage=new_storage, sampler=whatever, ...)

A bit more code is needed but at least people know what they're doing. Note that new_dataset = ReplayBuffer(storage=new_storage) will always work with a default writer and sampler.
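The same flow can be tried end to end with toy classes. This is only a sketch of the design, not torchrl code: `ToyStorage`, `ToyDataset` and `ToyReplayBuffer` are hypothetical stand-ins, and the default sampler here is simply uniform sampling without replacement.

```python
import random


class ToyStorage:
    """Hypothetical stand-in for a storage: an indexable container of items."""

    def __init__(self, items):
        self._items = list(items)

    def __len__(self):
        return len(self._items)

    def __getitem__(self, index):
        return self._items[index]

    def __iter__(self):
        return iter(self._items)


class ToyDataset:
    """Hypothetical stand-in for a dataset that owns a storage."""

    def __init__(self, items):
        self.storage = ToyStorage(items)

    def preprocess(self, fn):
        # Returns a new storage, not a new dataset; the original is untouched.
        return ToyStorage(fn(item) for item in self.storage)


class ToyReplayBuffer:
    """Hypothetical stand-in for a replay buffer built around a storage."""

    def __init__(self, storage, sampler=None, batch_size=2):
        self.storage = storage
        # Default sampler: uniform, without replacement within a batch.
        self.sampler = sampler or (lambda n, k: random.sample(range(n), k))
        self.batch_size = batch_size

    def sample(self):
        indices = self.sampler(len(self.storage), self.batch_size)
        return [self.storage[i] for i in indices]


dataset = ToyDataset(range(6))
new_storage = dataset.preprocess(lambda x: x * 10)
# The user picks sampler, batch size, etc. when rebuilding the buffer.
new_dataset = ToyReplayBuffer(storage=new_storage)
batch = new_dataset.sample()
```

Because `preprocess` hands back only a storage, nothing from the old sampler, writer or transforms leaks into the new buffer.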

Wdyt?

@nicklashansen

@vmoens thank you for initiating this! i agree with @dennismalmgren's comment, and in general i see this as being used mostly as a separate step as opposed to being part of the training process / data loading. and i agree that the most convenient solution would be to store the transformed dataset somewhere else by default, but perhaps giving users the option of deleting the original data after preprocessing. im not overly concerned about the (potentially) 2x disk space requirements.

ideally one would be able to download all of the torchrl datasets to disk, preprocess and save them in the desired format (would probably have to be specified manually but so be it), and then whichever training process and data loader that users build downstream (most likely in a different process) will use that new, common (in case of multi-dataset training) format without the added overhead of preprocessing on the fly. that said, i can imagine scenarios in which a user might still want to apply some transforms when sampling, e.g., data augmentation or random slicing, but that seems like something that would be supported as is?

@vmoens
Contributor Author

vmoens commented Mar 14, 2024

Yes, since you're rebuilding the replay buffer you can pass whichever transforms you want.

I made the changes, you can check them here:
https://docs-preview.pytorch.org/pytorch/rl/1989/reference/generated/torchrl.data.datasets.BaseDatasetExperienceReplay.html#torchrl.data.datasets.BaseDatasetExperienceReplay.preprocess

@nicklashansen

Thanks! LGTM at first glance, I'm a bit caught up in other things atm but will give it a try soon!

@vmoens vmoens merged commit 29d9a5b into main Mar 18, 2024
64 of 67 checks passed
@vmoens vmoens deleted the preproc-datasets branch March 18, 2024 16:32
Labels
CLA Signed, Data
Development

Successfully merging this pull request may close these issues.

[Feature Request] Streamline preproc of datasets
4 participants