
refactor(breaking): unify LLM API #283

Merged

merged 9 commits into main from refactor/unify-llm-impl on Sep 1, 2023

Conversation

@aarnphm (Collaborator) commented Sep 1, 2023

  • refactor: initial work to _gen
  • chore: remove bettertransformer
  • fix: run format
  • fix: rename backend and cleanup runtime [wip]
  • refactor: update naming and envvar

Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
@aarnphm requested a review from GutZuFusss as a code owner on September 1, 2023 08:44
@aarnphm merged commit 3e45530 into main on Sep 1, 2023
@aarnphm deleted the refactor/unify-llm-impl branch on September 1, 2023 09:15
@K-Mistele

Hi @aarnphm 👋 I have a quick question about this PR: I noticed that the following line was added to the README:

To use the vLLM backend, you need a GPU with at least the Ampere architecture or newer and CUDA version 11.8.

I have dug through vLLM's documentation, and vLLM supports pre-Ampere architectures such as Volta out of the box.

Is there any documentation on where this limitation comes from? My assumption is that it's because the bfloat16 data type is being used, but could I add a configuration for vLLM that doesn't use that data type? Or does it have to do with pre-compiled kernels being used?
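For context, here is a minimal sketch of the kind of configuration I have in mind, assuming the limitation really does come from a bfloat16 default: vLLM's `LLM` constructor accepts a `dtype` argument, so forcing float16 should let the model load on pre-Ampere GPUs (the model name below is just a placeholder for illustration):

```python
from vllm import LLM, SamplingParams

# bfloat16 requires compute capability >= 8.0 (Ampere); forcing float16
# should allow pre-Ampere GPUs such as Volta (V100) to load the model.
# "facebook/opt-125m" is only a placeholder model, not the one I'm serving.
llm = LLM(model="facebook/opt-125m", dtype="float16")

outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=32))
for out in outputs:
    print(out.outputs[0].text)
```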
