set default SAFETENSORS_FAST_GPU and HF_HUB_DISABLE_TELEMETRY in HF Server (#3594)

* set SAFETENSORS_FAST_GPU and HF_HUB_DISABLE_TELEMETRY

Signed-off-by: Lize Cai <lize.cai@sap.com>

* add doc on the default value

Signed-off-by: Lize Cai <lize.cai@sap.com>

---------

Signed-off-by: Lize Cai <lize.cai@sap.com>
lizzzcai authored Apr 29, 2024
1 parent 622f32f commit 1c5b0f9
Showing 2 changed files with 8 additions and 0 deletions.
5 changes: 5 additions & 0 deletions python/huggingface_server.Dockerfile
@@ -52,6 +52,11 @@ COPY --from=builder huggingfaceserver huggingfaceserver

# Set a writable Hugging Face home folder to avoid permission issue. See https://github.com/kserve/kserve/issues/3562
ENV HF_HOME="/tmp/huggingface"
# https://huggingface.co/docs/safetensors/en/speed#gpu-benchmark
ENV SAFETENSORS_FAST_GPU="1"
# https://huggingface.co/docs/huggingface_hub/en/package_reference/environment_variables#hfhubdisabletelemetry
ENV HF_HUB_DISABLE_TELEMETRY="1"

USER 1000
ENTRYPOINT ["python3", "-m", "huggingfaceserver"]
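
The `ENV` lines above only set image defaults; a value supplied when the container is started (for example through the pod's environment) takes precedence. As a minimal sketch, assuming you want to confirm from inside the running container which values are actually in effect, a small check could look like the following. The helper names `effective_settings` and `overridden` are hypothetical and are not part of KServe or this commit; the default values are copied from the Dockerfile hunk above.

```python
import os

# Defaults mirrored from the ENV lines in huggingface_server.Dockerfile.
# This table is an illustration only; the Dockerfile remains the source of truth.
IMAGE_DEFAULTS = {
    "HF_HOME": "/tmp/huggingface",
    "SAFETENSORS_FAST_GPU": "1",
    "HF_HUB_DISABLE_TELEMETRY": "1",
}


def effective_settings(environ=None):
    """Return the value each variable resolves to, preferring the runtime environment."""
    env = os.environ if environ is None else environ
    return {name: env.get(name, default) for name, default in IMAGE_DEFAULTS.items()}


def overridden(environ=None):
    """Return only the variables whose runtime value differs from the image default."""
    env = os.environ if environ is None else environ
    return {
        name: env[name]
        for name, default in IMAGE_DEFAULTS.items()
        if name in env and env[name] != default
    }


if __name__ == "__main__":
    for name, value in effective_settings().items():
        print(f"{name}={value}")
    changed = overridden()
    if changed:
        print("overridden at runtime:", ", ".join(sorted(changed)))
```

Passing a plain dictionary instead of `os.environ` keeps the helpers easy to exercise outside a container.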

3 changes: 3 additions & 0 deletions python/huggingfaceserver/README.md
@@ -36,6 +36,9 @@ curl -H "content-type:application/json" -v localhost:8080/v1/models/bert:predict
## Deploy Huggingface Server on KServe
> 1. `SAFETENSORS_FAST_GPU` is set by default to improve the model loading performance.
> 2. `HF_HUB_DISABLE_TELEMETRY` is set by default to disable the telemetry.
1. Serve the huggingface model using KServe python runtime for both preprocess(tokenization)/postprocess and inference.
```yaml
apiVersion: serving.kserve.io/v1beta1