Fix illegal memory access with multi_tensor_apply size above INT_MAX (#1825)

Currently, multi_tensor_apply causes an illegal memory access due to
an overflow in the `sizes` field of `TensorListMetadata`. This can be
reproduced using the following standalone script:

```python
import torch, amp_C
from apex.multi_tensor_apply import multi_tensor_applier
multi_tensor_adam = amp_C.multi_tensor_adam

size = 2**32 + 1  # element count that no longer fits in a 32-bit int
g_32 = [torch.zeros(size, dtype=torch.float32, device='cuda')]
p_32 = [torch.zeros(size, dtype=torch.float32, device='cuda')]
m_32 = [torch.zeros(size, dtype=torch.float32, device='cuda')]
v_32 = [torch.zeros(size, dtype=torch.float32, device='cuda')]
_dummy_overflow_buf = torch.zeros(1, dtype=torch.int32, device='cuda')

multi_tensor_applier(multi_tensor_adam, _dummy_overflow_buf, [g_32, p_32, m_32, v_32], 0.0, 0.9, 0.95, 1e-08, 1, 1, 1, 0.1)
print(g_32)
```
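The root cause is a narrowing conversion: `numel()` returns a 64-bit count, but the old `sizes` field is a 32-bit `int`, so any count past INT_MAX wraps. A minimal standalone C++ sketch of the truncation (illustrative only, not apex code; converting an out-of-range value is implementation-defined before C++20, but wraps modulo 2^32 on typical targets):

```cpp
#include <cstdint>
#include <cstdio>

int main() {
    int64_t numel  = (int64_t{1} << 32) + 1;   // 2**32 + 1, as in the repro script
    int     narrow = static_cast<int>(numel);  // old field type: keeps only the low 32 bits
    int64_t wide   = numel;                    // new field type: value preserved

    std::printf("numel      = %lld\n", (long long)numel);  // 4294967297
    std::printf("as int     = %d\n",   narrow);            // 1 (wrapped)
    std::printf("as int64_t = %lld\n", (long long)wide);   // 4294967297
}
```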
gdb authored Aug 17, 2024
1 parent 59b80ee commit 79e3dc4
Showing 1 changed file with 1 addition and 1 deletion.
csrc/multi_tensor_apply.cuh (1 addition, 1 deletion):

```diff
@@ -19,7 +19,7 @@ constexpr int depth_to_max_blocks[6] = {320, 320, 320, 320, 320, 320};
 template<int n> struct TensorListMetadata
 {
   void* addresses[n][depth_to_max_tensors[n-1]];
-  int sizes[depth_to_max_tensors[n-1]];
+  int64_t sizes[depth_to_max_tensors[n-1]];
   unsigned char block_to_tensor[depth_to_max_blocks[n-1]];
   int block_to_chunk[depth_to_max_blocks[n-1]]; // I fear this needs to be a full int.
   int start_tensor_this_launch;
```
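For intuition on why a wrapped size breaks the kernel, here is a simplified sketch of the kind of per-chunk bounds arithmetic that `sizes` feeds (an assumed structure for illustration, not the actual apex kernel; the chunk size and chunk index are made up):

```cpp
#include <cstdint>
#include <cstdio>

int main() {
    const int64_t numel      = (int64_t{1} << 32) + 1;  // true element count
    const int64_t chunk_size = 65536;                   // hypothetical chunk size
    const int64_t chunk_idx  = 3;                       // some chunk well inside the tensor

    const int     size32 = static_cast<int>(numel);     // old field: wraps to 1
    const int64_t size64 = numel;                       // new field: exact

    // Elements remaining at the start of this chunk: size - chunk_idx * chunk_size
    std::printf("32-bit bound: %lld\n", (long long)(size32 - chunk_idx * chunk_size)); // -196607: garbage
    std::printf("64-bit bound: %lld\n", (long long)(size64 - chunk_idx * chunk_size)); // 4294770689: correct
}
```

With the wrapped value, every chunk past the first sees a bogus bound, which is how out-of-range indexing can follow. Widening `sizes` to `int64_t`, as this commit does, makes the stored count match `numel()` exactly; `block_to_tensor` and `block_to_chunk` index tensors and chunks rather than elements, so they are left as-is.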
