Hi,
I’m Khalid Ahmed, an experienced PyTorch developer specializing in multi-GPU training, checkpointing, and model reproducibility. I’ve worked extensively with PyTorch and Hugging Face Accelerate across CPU-only, single-GPU, and multi-GPU setups.
For your project, I will:
Add checkpoint-saving code to training_loop() that works on CPU-only, single-GPU, and multi-GPU configurations.
Write code to load those checkpoints on a different machine (COMPUTER B) and confirm that the restored model reproduces identical accuracy to the saved checkpoint, whether COMPUTER B is CPU-only, single-GPU, or multi-GPU.
Ensure smooth execution and consistent results across various hardware configurations using PyTorch and Hugging Face Accelerate.
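As a rough illustration of the approach, here is a minimal sketch in plain PyTorch. It is not your actual code: the helper names (save_checkpoint, load_checkpoint, evaluate), the tiny nn.Linear model, and the random data are all placeholders I am assuming for the example. The idea is to store weights on the CPU without any multi-GPU wrapper prefix, so the same file can be loaded on any machine, and to record the accuracy alongside the weights so it can be checked after loading.

```python
import os
import tempfile

import torch
import torch.nn as nn


def save_checkpoint(model, epoch, accuracy, path):
    """Save a device-independent checkpoint (hypothetical helper)."""
    # Unwrap DataParallel / DistributedDataParallel so the saved keys
    # carry no "module." prefix.
    raw_model = model.module if hasattr(model, "module") else model
    # Copy weights to CPU so the file also loads on machines without a GPU.
    state = {k: v.detach().cpu() for k, v in raw_model.state_dict().items()}
    torch.save({"model": state, "epoch": epoch, "accuracy": accuracy}, path)


def load_checkpoint(model, path, device="cpu"):
    """Load a checkpoint onto whatever device the target machine has."""
    # map_location="cpu" avoids errors when the saving machine had GPUs
    # that the loading machine lacks.
    checkpoint = torch.load(path, map_location="cpu")
    model.load_state_dict(checkpoint["model"])
    model.to(device)
    return checkpoint


@torch.no_grad()
def evaluate(model, inputs, labels):
    """Return classification accuracy as a float in [0, 1]."""
    model.eval()
    preds = model(inputs).argmax(dim=1)
    return (preds == labels).float().mean().item()


# Placeholder model and data standing in for the real training setup.
torch.manual_seed(0)
inputs = torch.randn(64, 10)
labels = torch.randint(0, 3, (64,))
model = nn.Linear(10, 3)
saved_accuracy = evaluate(model, inputs, labels)

# "COMPUTER A": save the checkpoint together with its accuracy.
path = os.path.join(tempfile.mkdtemp(), "checkpoint.pt")
save_checkpoint(model, epoch=1, accuracy=saved_accuracy, path=path)

# "COMPUTER B": build a fresh model, load the file, re-check accuracy.
restored = nn.Linear(10, 3)
checkpoint = load_checkpoint(restored, path)
restored_accuracy = evaluate(restored, inputs, labels)
```

In an Accelerate-based training loop, I would expect to use accelerator.unwrap_model(model) in place of the manual unwrap and to save only when accelerator.is_main_process is true, so that multiple GPU processes do not write the same file.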
I’ve handled similar work in my own image-classification projects with EfficientNet and Transformers, where I ensured reproducibility across different devices.
Rate: $25
Looking forward to working with you to achieve identical accuracy across different setups.
Best regards,
Khalid Ahmed