
chore: Update extensions and engines docs #4413

Merged: 22 commits, Jan 14, 2025
c619f8c
Merge branch 'dev' of https://github.com/janhq/jan into dev
imtuyethan Jan 7, 2025
867a51c
Merge branch 'dev' of https://github.com/janhq/jan into dev
imtuyethan Jan 7, 2025
a90287b
[WIP] Updating Extensions & Engines guide
imtuyethan Jan 7, 2025
77ebd68
Merge branch 'dev' into chore/update-extensions-and-engines-docs
imtuyethan Jan 7, 2025
c3076f9
Updated Extensions pages
imtuyethan Jan 7, 2025
d4d6f2c
Merge branch 'chore/update-extensions-and-engines-docs' of https://gi…
imtuyethan Jan 7, 2025
a197dc2
Updated Engines pages
imtuyethan Jan 7, 2025
500529b
Updated Engines pages
imtuyethan Jan 8, 2025
59c6c3f
Merge branch 'dev' into chore/update-extensions-and-engines-docs
imtuyethan Jan 8, 2025
bc44c8e
updated images (app versions, remove tensorRT-LLM, remove Cortex sett…
imtuyethan Jan 13, 2025
f46eb87
Merge branch 'chore/update-extensions-and-engines-docs' of https://gi…
imtuyethan Jan 13, 2025
18f99b1
Merge branch 'dev' into chore/update-extensions-and-engines-docs
imtuyethan Jan 13, 2025
ed131f9
Removed unnecessary pages & files
imtuyethan Jan 13, 2025
5457d34
Merge branch 'chore/update-extensions-and-engines-docs' of https://gi…
imtuyethan Jan 13, 2025
1427f29
Merge branch 'dev' into chore/update-extensions-and-engines-docs
imtuyethan Jan 13, 2025
3b019ca
Resolved conflicts
imtuyethan Jan 13, 2025
1ee0095
updated remote engines pages
imtuyethan Jan 13, 2025
0048fe0
Small updates on Settings page
imtuyethan Jan 14, 2025
54aae45
Changed MMAP to mmap
imtuyethan Jan 14, 2025
1cf9e0e
Merge branch 'dev' into chore/update-extensions-and-engines-docs
imtuyethan Jan 14, 2025
389d83e
Updated Install Engines page with correct commands
imtuyethan Jan 14, 2025
d15f6fe
Merge branch 'dev' into chore/update-extensions-and-engines-docs
imtuyethan Jan 14, 2025
Updated Engines pages
imtuyethan committed Jan 8, 2025
commit 500529b41007b59f99bb2c449ecbe77d3f668d58
Binary file modified docs/src/pages/docs/_assets/extensions-09.png
Binary file added docs/src/pages/docs/_assets/tensorrt-llm-01.png
Binary file added docs/src/pages/docs/_assets/tensorrt-llm-02.png
26 changes: 20 additions & 6 deletions docs/src/pages/docs/install-engines.mdx
@@ -32,8 +32,17 @@ To add a new remote engine:

1. Navigate to **Settings** (<Settings width={16} height={16} style={{display:"inline"}}/>) > **Engines**
1. At **Remote Engine** category, click **+ Install Engine**

<br/>
![Install Remote Engines](./_assets/install-engines-01.png)
<br/>

2. Fill in the following required information:

<br/>
![Install Remote Engines](./_assets/install-engines-02.png)
<br/>

| Field | Description | Required |
|-------|-------------|----------|
| Engine Name | Name for your engine (e.g., "OpenAI", "Claude") | ✓ |
@@ -48,7 +57,14 @@ To add a new remote engine:
> - The conversion functions are only needed for providers that don't follow the OpenAI API format. For OpenAI-compatible APIs, you can leave these empty.
> - For OpenAI-compatible APIs like OpenAI, Anthropic, or Groq, you only need to fill in the required fields. Leave optional fields empty.

4. Click **Install** to complete
4. Click **Install**
5. Once installed, you should see your engine on the **Engines** page:
- You can rename or uninstall your engine
- You can open its own settings page

<br/>
![Install Remote Engines](./_assets/install-engines-03.png)
<br/>

### Examples
#### OpenAI-Compatible Setup
@@ -89,6 +105,9 @@ API Key: your_api_key_here
```

**Conversion Functions:**
> - Request: Convert from Jan's OpenAI-style format to your API's format
> - Response: Convert from your API's format back to OpenAI-style format

1. Request Format Conversion:
```javascript
function convertRequest(janRequest) {
  // ... (request conversion body elided in this diff)
}

function convertResponse(apiResponse) {
  // ... (response conversion body elided in this diff)
}
```
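The diff viewer elides the bodies of these functions. As a hedged illustration only, here is a minimal sketch of what such a pair might look like for a hypothetical provider whose API takes a `prompt` string and returns an `output` string — those field names are assumptions for the example, not from any real API:

```javascript
// Sketch: convert between Jan's OpenAI-style chat format and a
// hypothetical provider API. Field names on the provider side
// (`prompt`, `output`) are illustrative assumptions.
function convertRequest(janRequest) {
  return {
    // Flatten OpenAI-style messages into a single prompt string
    prompt: janRequest.messages
      .map((m) => `${m.role}: ${m.content}`)
      .join('\n'),
    max_tokens: janRequest.max_tokens,
    temperature: janRequest.temperature,
  };
}

function convertResponse(apiResponse) {
  // Wrap the provider's plain-text output back into an
  // OpenAI-style choices array
  return {
    choices: [
      {
        message: { role: 'assistant', content: apiResponse.output },
        finish_reason: 'stop',
      },
    ],
  };
}
```

A response converted this way exposes the OpenAI-style `choices[0].message.content` shape that OpenAI-compatible clients read.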

<Callout type="info">
The conversion functions should:
- Request: Convert from Jan's OpenAI-style format to your API's format
- Response: Convert from your API's format back to OpenAI-style format
</Callout>

**Expected Formats:**

91 changes: 74 additions & 17 deletions docs/src/pages/docs/local-engines/llama-cpp.mdx
@@ -57,26 +57,83 @@ Jan offers different backend variants for **llama.cpp** based on your operating
Choose the backend that matches your hardware. Using the wrong variant may cause performance issues or prevent models from loading.
</Callout>

### macOS
- `mac-arm64`: For Apple Silicon Macs (M1/M2/M3)
- `mac-amd64`: For Intel-based Macs

### Windows
- `win-cuda`: For NVIDIA GPUs using CUDA
- `win-cpu`: For CPU-only operation
- `win-directml`: For DirectML acceleration (AMD/Intel GPUs)
- `win-opengl`: For OpenGL acceleration

### Linux
- `linux-cuda`: For NVIDIA GPUs using CUDA
- `linux-cpu`: For CPU-only operation
- `linux-rocm`: For AMD GPUs using ROCm
- `linux-openvino`: For Intel GPUs/NPUs using OpenVINO
- `linux-vulkan`: For Vulkan acceleration
<Tabs items={['Windows', 'Linux', 'macOS']}>

<Tabs.Tab>
### CUDA Support (NVIDIA GPUs)
- `llama.cpp-avx-cuda-11-7`
- `llama.cpp-avx-cuda-12-0`
- `llama.cpp-avx2-cuda-11-7`
- `llama.cpp-avx2-cuda-12-0`
- `llama.cpp-avx512-cuda-11-7`
- `llama.cpp-avx512-cuda-12-0`
- `llama.cpp-noavx-cuda-11-7`
- `llama.cpp-noavx-cuda-12-0`

### CPU Only
- `llama.cpp-avx`
- `llama.cpp-avx2`
- `llama.cpp-avx512`
- `llama.cpp-noavx`

### Other Accelerators
- `llama.cpp-vulkan`

<Callout type="info">
For detailed hardware compatibility, please visit our guide for [Mac](/docs/desktop/mac#compatibility), [Windows](/docs/desktop/windows#compatibility), and [Linux](docs/desktop/linux).
- For detailed hardware compatibility, please visit our guide for [Windows](/docs/desktop/windows#compatibility).
- AVX, AVX2, and AVX-512 are CPU instruction sets. For best performance, use the most advanced instruction set your CPU supports.
- CUDA versions should match your installed NVIDIA drivers.
</Callout>

</Tabs.Tab>

<Tabs.Tab>
### CUDA Support (NVIDIA GPUs)
- `llama.cpp-avx-cuda-11-7`
- `llama.cpp-avx-cuda-12-0`
- `llama.cpp-avx2-cuda-11-7`
- `llama.cpp-avx2-cuda-12-0`
- `llama.cpp-avx512-cuda-11-7`
- `llama.cpp-avx512-cuda-12-0`
- `llama.cpp-noavx-cuda-11-7`
- `llama.cpp-noavx-cuda-12-0`

### CPU Only
- `llama.cpp-avx`
- `llama.cpp-avx2`
- `llama.cpp-avx512`
- `llama.cpp-noavx`

### Other Accelerators
- `llama.cpp-vulkan`
- `llama.cpp-arm64`

<Callout type="info">
- For detailed hardware compatibility, please visit our guide for [Linux](/docs/desktop/linux).
- AVX, AVX2, and AVX-512 are CPU instruction sets. For best performance, use the most advanced instruction set your CPU supports.
- CUDA versions should match your installed NVIDIA drivers.
</Callout>

</Tabs.Tab>

<Tabs.Tab>
### Apple Silicon
- `llama.cpp-mac-arm64`: For M1/M2/M3 Macs

### Intel
- `llama.cpp-mac-amd64`: For Intel-based Macs

<Callout type="info">
For detailed hardware compatibility, please visit our guide for [Mac](/docs/desktop/mac#compatibility).
</Callout>


</Tabs.Tab>

</Tabs>






66 changes: 25 additions & 41 deletions docs/src/pages/docs/local-engines/tensorrt-llm.mdx
@@ -28,12 +28,9 @@ import { Settings, EllipsisVertical, Plus, FolderOpen, Pencil } from 'lucide-rea
Jan uses **TensorRT-LLM** as an optional engine for faster inference on NVIDIA GPUs. This engine uses [Cortex-TensorRT-LLM](https://github.com/janhq/cortex.tensorrt-llm), which includes an efficient C++ server that executes the [TRT-LLM C++ runtime](https://nvidia.github.io/TensorRT-LLM/gpt_runtime.html) natively. It also includes features and performance improvements like OpenAI compatibility, tokenizer improvements, and queues.

<Callout type="info">
Currently only available for **Windows** users, **Linux** support is coming soon!
The TensorRT-LLM engine is only available for **Windows** users; **Linux** support is coming soon!
</Callout>

You can find its settings in **Settings** (<Settings width={16} height={16} style={{display:"inline"}}/>) > **Local Engine** > **TensorRT-LLM**:



## Requirements
- NVIDIA GPU with Compute Capability 7.0 or higher (RTX 20xx series and above)
@@ -45,59 +42,46 @@ You can find its settings in **Settings** (<Settings width={16} height={16} styl
For detailed setup guide, please visit [Windows](/docs/desktop/windows#compatibility).
</Callout>

## Engine Version and Updates
- **Engine Version**: View current version of TensorRT-LLM engine
- **Check Updates**: Check whether a newer engine version is available and install it when it is

## Available Backends

TensorRT-LLM is specifically designed for NVIDIA GPUs. Available backends include:

**Windows**
- `win-cuda`: For NVIDIA GPUs with CUDA support

<Callout type="warning">
TensorRT-LLM requires an NVIDIA GPU with CUDA support. It is not compatible with other GPU types or CPU-only systems.
</Callout>



## Enable TensorRT-LLM

<Steps>
### Step 1: Install TensorRT-Extension
### Step 1: Install Additional Dependencies
1. Navigate to **Settings** (<Settings width={16} height={16} style={{display:"inline"}}/>) > **Local Engine** > **TensorRT-LLM**:
2. At **Additional Dependencies**, click **Install**

1. Click the **Gear Icon (⚙️)** on the bottom left of your screen.
2. Select the **TensorRT-LLM** under the **Model Provider** section.
<br/>
![Click Tensor](../_assets/tensor.png)
<br/>
3. Click **Install** to install the required dependencies to use TensorRT-LLM.
<br/>
![Install Extension](../_assets/install-tensor.png)
![Click Tensor](../_assets/tensorrt-llm-01.png)
<br/>
3. Check that files are correctly downloaded.

3. Verify that files are correctly downloaded:
```bash
ls ~/jan/data/extensions/@janhq/tensorrt-llm-extension/dist/bin
# Your Extension Folder should now include `nitro.exe`, among other artifacts needed to run TRT-LLM

# Your Extension Folder should now include `cortex.exe`, among other artifacts needed to run TRT-LLM
```
4. Restart Jan

### Step 2: Download a Compatible Model
### Step 2: Download Compatible Models

TensorRT-LLM can only run models in `TensorRT` format. These models, aka "TensorRT Engines", are prebuilt for each target OS+GPU architecture.
TensorRT-LLM can only run models in `TensorRT` format. These models, also known as "TensorRT Engines", are prebuilt specifically for each operating system and GPU architecture.

We offer a handful of precompiled models for Ampere and Ada cards that you can immediately download and play with:
We currently offer a selection of precompiled models optimized for NVIDIA Ampere and Ada GPUs that you can use right away:

1. Restart the application and go to the Hub.
2. Look for models with the `TensorRT-LLM` label in the recommended models list > Click **Download**.
1. Go to **Hub**
2. Look for models with the `TensorRT-LLM` label and make sure they're compatible with your hardware
3. Click **Download**

<Callout type='info'>
This step might take some time. 🙏
<Callout type="info">
This download might take some time as TensorRT models are typically large files.
</Callout>

![image](https://hackmd.io/_uploads/rJewrEgRp.png)
<br/>
![Download TensorRT-LLM Model](../_assets/tensorrt-llm-02.png)
<br/>

### Step 3: Start Threads
Once a model is downloaded, start using it in [Threads](/docs/threads).
</Steps>


3. Click **Download** to download the model.

</Steps>