v0.13.0

@alexagriffith

🌈 What's New?

add support for async streaming in predict by @alexagriffith in #3475
Fix: Support model parallelism in HF transformer by @gavrishp in #3459
Support model revision and tokenizer revision in huggingface server by @lizzzcai in #3558
OpenAI schema by @tessapham in #3477
Support OpenAIModel in ModelRepository by @grandbora in #3590
updated xgboost to support json and ubj models by @andyi2it in #3551
Add OpenAI API support to Huggingfaceserver by @cmaddalozzo in #3582
vLLM support for OpenAI Completions in HF server by @gavrishp in #3589
Add a user friendly error message for http exceptions by @grandbora in #3581
feat: Provide minimal distribution of CRDs by @terrytangyuan in #3492
set default SAFETENSORS_FAST_GPU and HF_HUB_DISABLE_TELEMETRY in HF Server by @lizzzcai in #3594
Enable multiple-domain support on an inference service by @houshengbo in #3615
Add base model for proxying request to an OpenAI API enabled model server by @cmaddalozzo in #3621
Add headers to predictor exception logging by @grandbora in #3658
Enhance controller setup based on available CRDs by @israel-hdez in #3472
Add openai models endpoint by @cmaddalozzo in #3666
feat: Support customizable deployment strategy for RawDeployment mode. Fixes #3452 by @terrytangyuan in #3603
Enable dtype support for huggingface server by @Datta0 in #3613
Add method for checking model health/readiness by @cmaddalozzo in #3673
Unify the log configuration using kserve logger by @sivanantha321 in #3577
Add the field ResponseStartTimeoutSeconds to create ksvc by @houshengbo in #3705
Add FP16 datatype support for OIP grpc by @sivanantha321 in #3695
Add option for returning probabilities in huggingface server by @andyi2it in #3607

⚠️ What's Changed

Remove conversion webhook from manifests by @Jooho in #3476
Remove cluster level list/watch for configmaps, serviceaccounts, secrets by @sivanantha321 in #3469
chore: Remove Seldon Alibi dependencies. Fixes #3380 by @terrytangyuan in #3443
docs: Move Alibi explainer to docs by @terrytangyuan in #3579
Remove generate endpoints by @cmaddalozzo in #3654
Remove conversion webhook from kubeflow manifest patch by @sivanantha321 in #3700

🐛 What's Fixed

Fix: Support parallelism in vLLM runtime by @gavrishp in #3464
fix: Instantiate HuggingfaceModelRepository only when model cannot be loaded. Fixes #3423 by @terrytangyuan in #3424
Fix IsADirectoryError in Azure blob download by @tjandy98 in #3502
Fix bug: Remove redundant helm chart affinity on predictor CRD by @trojaond in #3481
Make the modelcar injection idempotent by @rhuss in #3517
Only pad left for decoder-only architecture models by @sivanantha321 in #3534
fix lint typo on Makefile by @spolti in #3569
fix: Set writable cache folder to avoid permission issue. Fixes #3562 by @terrytangyuan in #3576
Fix model unload in server stop method by @sivanantha321 in #3587
Fix golint errors by @andyi2it in #3552
Fix make deploy-dev-storage-initializer not working by @sivanantha321 in #3617
Fix Pydantic 2 warnings by @cmaddalozzo in #3622
build: Fix CRD copying in generate-install.sh by @terrytangyuan in #3620
Only load from model repository if model binary is not found under model_dir by @sivanantha321 in #3559
build: Remove misleading logs from minimal-crdgen.sh by @terrytangyuan in #3641
Assign device to input tensors in huggingface server with huggingface backend by @saileshd1402 in #3657
Fix Huggingface server stopping criteria by @cmaddalozzo in #3659
Explicitly specify pad token id when generating tokens by @sivanantha321 in #3565
Fix quick install not cleaning up Istio installer by @sivanantha321 in #3660
Fix extracting zip from GCS by @andyi2it in #3510
fix: HPA equality check should include annotations by @terrytangyuan in #3650
Fix: model id and model dir check order by @yuzisun in #3680
Fix: vLLM model supported check throwing circular dependency by @gavrishp in #3688
Fix: Allow null in Finish reason streaming response in vLLM by @gavrishp in #3684
Fix kserve version not being updated properly by python-release.sh by @sivanantha321 in #3707
Add precaution against running v1 endpoints on openai models by @grandbora in #3694
Typos and minor fixes by @alpe in #3429
Fix model_id and model_dir precedence for vLLM by @yuzisun in #3718
Fixup max_length for HF and model info for vLLM by @Datta0 in #3715
Fix prompt token count and provide completion usage in OpenAI response by @sivanantha321 in #3712

⬆️ Version Upgrade

Upgrade orjson to version 3.9.15 by @spolti in #3488
feat: upgrade to new fastapi, update models to handle both pydantic v… by @timothyjlaurent in #3374
Update cert manager version in quick install script by @shauryagoel in #3496
ci: Bump minikube version to work with newer K8s version by @terrytangyuan in #3498
upgrade knative to 1.13 by @andyi2it in #3457
Upgrade Istio to 1.20, which works for GitHub Actions by @houshengbo in #3529
chore: Bump ModelMesh version to v0.12.0-rc0 in Helm chart by @terrytangyuan in #3642
upgrade vllm/transformers version by @johnugeorge in #3671

🔨 Project SDLC

Enhance CI environment by @sivanantha321 in #3440
Fixed go lint error using golangci-lint tool. by @andyi2it in #3378
chore: Update list of reviewers by @ckadner in #3484
build: Add helm docs update to make generate command by @terrytangyuan in #3437
Added v2 infer test for supported model frameworks. by @andyi2it in #3349
Fix quote format to match others, and fix docstrings by @leyao-daily in #3490
remove unnecessary Istio settings from quick_install.sh by @peterj in #3493
Remove GOARCH by @mkumatag in #3523
GH Alert: Potential file inclusion via variable by @spolti in #3520
Update codeQL to v3 by @spolti in #3548
switch e2e test inference graph to raw mode by @andyi2it in #3511
Black lint by @cmaddalozzo in #3568
Fix python linter by @sivanantha321 in #3571
build: Add flake8 and black to pre-commit hooks by @terrytangyuan in #3578
build: Allow pre-commit to keep changes in reformatted code by @terrytangyuan in #3604
Allow rerunning failed workflows by comment by @andyi2it in #3550
add re-run info in the PR templates by @spolti in #3633
Add e2e tests for huggingface by @sivanantha321 in #3600
Test image builds for ARM64 arch in CI by @sivanantha321 in #3629
workflow file for cherry-pick on comment by @andyi2it in #3653
Fix: huggingface runtime in helm chart by @yuzisun in #3679
Copy generated CRDs by kustomize to Helm by @Jooho in #3392

CVE patches

CVE-2024-24762 - update fastapi to 0.109.1 by @spolti in #3556
golang.org/x/net Allocation of Resources Without Limits or Throttling by @spolti in #3596
Fix CVE-2023-45288 for qpext by @sivanantha321 in #3618
Security fix - CVE-2024-24786 by @andyi2it in #3585

📝 Documentation Update

qpext: fix a typo in qpext doc by @daixiang0 in #3491
Update KServe project description by @yuzisun in #3524
Update kserve cake diagram by @yuzisun in #3530
Remove white background for the kserve diagram by @yuzisun in #3531
fix a typo in OPENSHIFT_GUIDE.md by @marek-veber in #3544
Fix typo in README.md by @terrytangyuan in #3575
Update Dockerfile and Readme by @gavrishp in #3676
Update huggingface readme by @alexagriffith in #3678

New Contributors

@leyao-daily made their first contribution in #3490
@peterj made their first contribution in #3493
@timothyjlaurent made their first contribution in #3374
@shauryagoel made their first contribution in #3496
@mkumatag made their first contribution in #3523
@marek-veber made their first contribution in #3544
@trojaond made their first contribution in #3481
@grandbora made their first contribution in #3590
@saileshd1402 made their first contribution in #3657
@Datta0 made their first contribution in #3613
@alpe made their first contribution in #3429

Full Changelog: v0.12.1...v0.13.0