Skip to content

Commit

Permalink
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Update perf_infer_gpu_one.md: fix a typo (huggingface#35441)
Browse files Browse the repository at this point in the history
martin0258 authored and AlanPonnachan committed Jan 1, 2025
1 parent ef34214 commit 9d359e8
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion docs/source/en/perf_infer_gpu_one.md
Original file line number Diff line number Diff line change
@@ -462,7 +462,7 @@ generated_ids = model.generate(**inputs)
outputs = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)
```

To load a model in 4-bit for inference with multiple GPUs, you can control how much GPU RAM you want to allocate to each GPU. For example, to distribute 1GB of memory to the first GPU and 2GB of memory to the second GPU:
To load a model in 8-bit for inference with multiple GPUs, you can control how much GPU RAM you want to allocate to each GPU. For example, to distribute 1GB of memory to the first GPU and 2GB of memory to the second GPU:

```py
max_memory_mapping = {0: "1GB", 1: "2GB"}

0 comments on commit 9d359e8

Please sign in to comment.