fly example #951

Merged (2 commits) on May 7, 2024
1 change: 1 addition & 0 deletions examples/flyio/.gitignore
@@ -0,0 +1 @@
fly.toml
67 changes: 67 additions & 0 deletions examples/flyio/README.md
@@ -0,0 +1,67 @@
# Deploy Ollama to Fly.io

> Note: this example exposes a public endpoint and does not configure authentication. Use with care.

## Prerequisites

- Ollama: https://ollama.com/download
- Fly.io account. Sign up for a free account: https://fly.io/app/sign-up
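Before proceeding, it may help to confirm both CLIs are on your `PATH` (a minimal sanity check; depending on how it was installed, the Fly CLI may be available as `fly` or `flyctl`):

```bash
# Confirm the Ollama CLI is installed
ollama --version

# Confirm the Fly CLI is installed (may also be invoked as `flyctl`)
fly version
```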

## Steps

1. Log in to Fly.io

```bash
fly auth login
```

1. Create a new Fly app

```bash
fly launch --name <name> --image ollama/ollama --internal-port 11434 --vm-size shared-cpu-8x --now
```

1. Pull and run `orca-mini:3b`

```bash
OLLAMA_HOST=https://<name>.fly.dev ollama run orca-mini:3b
```

`shared-cpu-8x` is a free-tier eligible machine type. For better performance, switch to a `performance` or `dedicated` machine type or attach a GPU for hardware acceleration (see below).
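To confirm the deployed endpoint responds, you can also call Ollama's HTTP API directly. A minimal sketch using `curl` (assumes the app name chosen during `fly launch` and that `orca-mini:3b` has already been pulled):

```bash
# Request a non-streaming completion from the deployed instance
curl https://<name>.fly.dev/api/generate -d '{
  "model": "orca-mini:3b",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
```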

## (Optional) Persistent Volume

By default, Fly Machines use ephemeral storage, so downloaded models are lost whenever the machine restarts and must be pulled again. Create and attach a persistent volume to store the downloaded models (a quick verification sketch follows these steps):

1. Create the Fly Volume

```bash
fly volume create ollama
```

1. Update `fly.toml` and add `[mounts]`

```toml
[mounts]
source = "ollama"
destination = "/mnt/ollama/models"
```

1. Update `fly.toml` and add `[env]`

```toml
[env]
OLLAMA_MODELS = "/mnt/ollama/models"
```

1. Deploy your app

```bash
fly deploy
```
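To check that the volume was created and that models now persist, a short sketch (assumes the volume name `ollama` and the app name from the earlier steps):

```bash
# List volumes for the app; the `ollama` volume should appear as attached
fly volumes list

# Models pulled through the remote host should now survive machine restarts
OLLAMA_HOST=https://<name>.fly.dev ollama list
```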

## (Optional) Hardware Acceleration

Fly.io GPUs are currently available by waitlist only. Sign up for the waitlist: https://fly.io/gpu

Once you've been accepted, create the app with the additional flag `--vm-gpu-kind a100-pcie-40gb` or `--vm-gpu-kind a100-pcie-80gb`, as sketched below.
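For example, the earlier launch command with a GPU attached might look like this (a sketch only; GPU availability, regions, and pricing depend on your account):

```bash
fly launch --name <name> --image ollama/ollama --internal-port 11434 --vm-gpu-kind a100-pcie-40gb --now
```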