This commit changes two files (66 additions, 0 deletions): the first adds a single line, `fly.toml`; the second adds the deployment guide below.

# Deploy Ollama to Fly.io

## Prerequisites

- Ollama: https://ollama.ai/download
- A Fly.io account. Sign up for a free account: https://fly.io/app/sign-up

## Steps

1. Log in to Fly.io

    ```bash
    fly auth login
    ```
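
    If you want to confirm the session before continuing, `fly auth whoami` prints the account you are signed in as:

    ```bash
    fly auth whoami
    ```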

1. Create a new Fly app

    ```bash
    fly launch --name <name> --image ollama/ollama --internal-port 11434 --vm-size shared-cpu-8x --now
    ```
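
    `fly launch` writes a `fly.toml` describing the app. As a rough sketch, the generated file should contain settings along these lines (the region value is an assumption, and the exact fields Fly generates may differ):

    ```toml
    app = "<name>"
    primary_region = "ord"  # assumption: whichever region you picked at launch

    [build]
    image = "ollama/ollama"

    [http_service]
    internal_port = 11434
    force_https = true
    ```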

1. Pull and run `orca-mini:3b`

    ```bash
    OLLAMA_HOST=https://<name>.fly.dev ollama run orca-mini:3b
    ```
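
    The deployed app also serves Ollama's HTTP API directly, so you can exercise it without the CLI. For example, a generate request against the standard `/api/generate` endpoint:

    ```bash
    curl https://<name>.fly.dev/api/generate -d '{
      "model": "orca-mini:3b",
      "prompt": "Why is the sky blue?"
    }'
    ```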

`shared-cpu-8x` is a free-tier eligible machine type. For better performance, switch to a `performance` or `dedicated` machine type, or attach a GPU for hardware acceleration (see below).
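
For example, to resize the existing app instead of recreating it, something like this should work (`performance-8x` is just one size; `fly platform vm-sizes` lists the currently available ones):

```bash
fly scale vm performance-8x
```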

## (Optional) Persistent Volume

By default, Fly Machines use ephemeral storage, so a pulled model does not survive a restart and has to be downloaded again. To keep models across restarts, create and attach a persistent volume:

1. Create the Fly Volume

    ```bash
    fly volume create ollama
    ```
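
    Models can take several gigabytes, so you may want to size the volume explicitly when creating it. A hedged example (`--size` is in GB; the region value is an assumption and should match your app's region):

    ```bash
    fly volume create ollama --size 20 --region ord
    ```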

1. Update `fly.toml` and add `[mounts]`

    ```toml
    [mounts]
    source = "ollama"
    destination = "/mnt/ollama/models"
    ```

1. Update `fly.toml` and add `[env]`

    ```toml
    [env]
    OLLAMA_MODELS = "/mnt/ollama/models"
    ```
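
    Taken together, the two additions to `fly.toml` look like this (the app and service settings generated by `fly launch` stay as they are):

    ```toml
    [mounts]
    source = "ollama"
    destination = "/mnt/ollama/models"

    [env]
    OLLAMA_MODELS = "/mnt/ollama/models"
    ```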

1. Deploy your app

    ```bash
    fly deploy
    ```
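
    To check that models now persist, restart the machine and list what the server still has (`ollama list` shows the models available to the running server):

    ```bash
    OLLAMA_HOST=https://<name>.fly.dev ollama list
    ```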

## (Optional) Hardware Acceleration

Fly.io GPUs are currently waitlist-only. Sign up for the waitlist: https://fly.io/gpu

Once you've been accepted, create the app with the additional flag `--vm-gpu-kind a100-pcie-40gb` or `--vm-gpu-kind a100-pcie-80gb`.
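
For example, the launch command from the steps above with a 40GB A100 attached might look like this (a sketch; availability depends on your waitlist access):

```bash
fly launch --name <name> --image ollama/ollama --internal-port 11434 --vm-gpu-kind a100-pcie-40gb --now
```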