
refactor(breaking): unify LLM API #283

Merged

merged 9 commits into main from refactor/unify-llm-impl on Sep 1, 2023

Conversation

@aarnphm (Collaborator) commented Sep 1, 2023

  • refactor: initial work to _gen
  • chore: remove bettertransformer
  • fix: run format
  • fix: rename backend and cleanup runtime [wip]
  • refactor: update naming and envvar

Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
@aarnphm requested a review from GutZuFusss as a code owner on September 1, 2023 08:44
@aarnphm merged commit 3e45530 into main on Sep 1, 2023
@aarnphm deleted the refactor/unify-llm-impl branch on September 1, 2023 09:15
@K-Mistele

Hi @aarnphm 👋 I have a quick question about this PR: I noticed that the following line was added to the README:

To use the vLLM backend, you need a GPU with at least the Ampere architecture or newer and CUDA version 11.8.

I have dug through vLLM's documentation, and vLLM supports pre-Ampere architectures such as Volta out of the box.

Is there any documentation on where this limitation comes from? My assumption is that it's because the bfloat16 data type is being used, but could I add a configuration for vLLM that doesn't use that data type? Or does it have to do with pre-compiled kernels being used?
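For context, here is a minimal sketch of the kind of configuration I have in mind, assuming the limitation really does come from a bfloat16 default: vLLM's `LLM` constructor accepts a `dtype` argument, so forcing float16 should let the model load on pre-Ampere GPUs (the model name below is just a placeholder for illustration):

```python
from vllm import LLM, SamplingParams

# bfloat16 requires compute capability >= 8.0 (Ampere); forcing float16
# should allow pre-Ampere GPUs such as Volta (V100) to load the model.
# "facebook/opt-125m" is only a placeholder model, not the one I'm serving.
llm = LLM(model="facebook/opt-125m", dtype="float16")

outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=32))
for out in outputs:
    print(out.outputs[0].text)
```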
