v0.13.0
🌈 What's New?
- add support for async streaming in predict by @alexagriffith in #3475
- Fix: Support model parallelism in HF transformer by @gavrishp in #3459
- Support model revision and tokenizer revision in huggingface server by @lizzzcai in #3558
- OpenAI schema by @tessapham in #3477
- Support OpenAIModel in ModelRepository by @grandbora in #3590
- updated xgboost to support json and ubj models by @andyi2it in #3551
- Add OpenAI API support to Huggingfaceserver by @cmaddalozzo in #3582
- VLLM support for OpenAI Completions in HF server by @gavrishp in #3589
- Add a user friendly error message for http exceptions by @grandbora in #3581
- feat: Provide minimal distribution of CRDs by @terrytangyuan in #3492
- set default SAFETENSORS_FAST_GPU and HF_HUB_DISABLE_TELEMETRY in HF Server by @lizzzcai in #3594
- Enable multiple-domain support on an inference service by @houshengbo in #3615
- Add base model for proxying request to an OpenAI API enabled model server by @cmaddalozzo in #3621
- Add headers to predictor exception logging by @grandbora in #3658
- Enhance controller setup based on available CRDs by @israel-hdez in #3472
- Add openai models endpoint by @cmaddalozzo in #3666
- feat: Support customizable deployment strategy for RawDeployment mode. Fixes #3452 by @terrytangyuan in #3603
- Enable dtype support for huggingface server by @Datta0 in #3613
- Add method for checking model health/readiness by @cmaddalozzo in #3673
- Unify the log configuration using kserve logger by @sivanantha321 in #3577
- Add the field ResponseStartTimeoutSeconds to create ksvc by @houshengbo in #3705
- Add FP16 datatype support for OIP grpc by @sivanantha321 in #3695
- Add option for returning probabilities in huggingface server by @andyi2it in #3607
⚠️ What's Changed
- Remove conversion webhook from manifests by @Jooho in #3476
- Remove cluster level list/watch for configmaps, serviceaccounts, secrets by @sivanantha321 in #3469
- chore: Remove Seldon Alibi dependencies. Fixes #3380 by @terrytangyuan in #3443
- docs: Move Alibi explainer to docs by @terrytangyuan in #3579
- Remove generate endpoints by @cmaddalozzo in #3654
- Remove conversion webhook from kubeflow manifest patch by @sivanantha321 in #3700
🐛 What's Fixed
- Fix: Support parallelism in vLLM runtime by @gavrishp in #3464
- fix: Instantiate HuggingfaceModelRepository only when model cannot be loaded. Fixes #3423 by @terrytangyuan in #3424
- Fix IsADirectoryError in Azure blob download by @tjandy98 in #3502
- Fix bug: Remove redundant helm chart affinity on predictor CRD by @trojaond in #3481
- Make the modelcar injection idempotent by @rhuss in #3517
- Only pad left for decoder-only architecture models by @sivanantha321 in #3534
- fix lint typo on Makefile by @spolti in #3569
- fix: Set writable cache folder to avoid permission issue. Fixes #3562 by @terrytangyuan in #3576
- Fix model unload in server stop method by @sivanantha321 in #3587
- Fix golint errors by @andyi2it in #3552
- Fix make deploy-dev-storage-initializer not working by @sivanantha321 in #3617
- Fix Pydantic 2 warnings by @cmaddalozzo in #3622
- build: Fix CRD copying in generate-install.sh by @terrytangyuan in #3620
- Only load from model repository if model binary is not found under model_dir by @sivanantha321 in #3559
- build: Remove misleading logs from minimal-crdgen.sh by @terrytangyuan in #3641
- Assign device to input tensors in huggingface server with huggingface backend by @saileshd1402 in #3657
- Fix Huggingface server stopping criteria by @cmaddalozzo in #3659
- Explicitly specify pad token id when generating tokens by @sivanantha321 in #3565
- Fix quick install not cleaning up Istio installer by @sivanantha321 in #3660
- Fix zip extraction from GCS by @andyi2it in #3510
- fix: HPA equality check should include annotations by @terrytangyuan in #3650
- Fix: model id and model dir check order by @yuzisun in #3680
- Fix: vLLM model-supported check throwing circular dependency by @gavrishp in #3688
- Fix: Allow null in Finish reason streaming response in vLLM by @gavrishp in #3684
- Fix kserve version not being updated properly by python-release.sh by @sivanantha321 in #3707
- Add precaution against running v1 endpoints on OpenAI models by @grandbora in #3694
- Typos and minor fixes by @alpe in #3429
- Fix model_id and model_dir precedence for vLLM by @yuzisun in #3718
- Fixup max_length for HF and model info for vLLM by @Datta0 in #3715
- Fix prompt token count and provide completion usage in OpenAI response by @sivanantha321 in #3712
⬆️ Version Upgrade
- Upgrade orjson to version 3.9.15 by @spolti in #3488
- feat: upgrade to new fastapi, update models to handle both pydantic v… by @timothyjlaurent in #3374
- Update cert manager version in quick install script by @shauryagoel in #3496
- ci: Bump minikube version to work with newer K8s version by @terrytangyuan in #3498
- upgrade knative to 1.13 by @andyi2it in #3457
- Upgrade Istio to 1.20 for the GitHub Actions by @houshengbo in #3529
- chore: Bump ModelMesh version to v0.12.0-rc0 in Helm chart by @terrytangyuan in #3642
- upgrade vllm/transformers version by @johnugeorge in #3671
🔨 Project SDLC
- Enhance CI environment by @sivanantha321 in #3440
- Fixed go lint error using golangci-lint tool. by @andyi2it in #3378
- chore: Update list of reviewers by @ckadner in #3484
- build: Add helm docs update to make generate command by @terrytangyuan in #3437
- Added v2 infer test for supported model frameworks. by @andyi2it in #3349
- Fix quote format and docstrings to be consistent with the rest of the code by @leyao-daily in #3490
- remove unnecessary Istio settings from quick_install.sh by @peterj in #3493
- Remove GOARCH by @mkumatag in #3523
- GH Alert: Potential file inclusion via variable by @spolti in #3520
- Update codeQL to v3 by @spolti in #3548
- switch e2e test inference graph to raw mode by @andyi2it in #3511
- Black lint by @cmaddalozzo in #3568
- Fix python linter by @sivanantha321 in #3571
- build: Add flake8 and black to pre-commit hooks by @terrytangyuan in #3578
- build: Allow pre-commit to keep changes in reformatted code by @terrytangyuan in #3604
- Allow rerunning failed workflows by comment by @andyi2it in #3550
- add re-run info in the PR templates by @spolti in #3633
- Add e2e tests for huggingface by @sivanantha321 in #3600
- Test image builds for ARM64 arch in CI by @sivanantha321 in #3629
- workflow file for cherry-pick on comment by @andyi2it in #3653
- Fix: huggingface runtime in helm chart by @yuzisun in #3679
- Copy generated CRDs by kustomize to Helm by @Jooho in #3392
CVE patches
- CVE-2024-24762 - update fastapi to 0.109.1 by @spolti in #3556
- golang.org/x/net Allocation of Resources Without Limits or Throttling by @spolti in #3596
- Fix CVE-2023-45288 for qpext by @sivanantha321 in #3618
- Security fix - CVE-2024-24786 by @andyi2it in #3585
📝 Documentation Update
- qpext: fix a typo in qpext doc by @daixiang0 in #3491
- Update KServe project description by @yuzisun in #3524
- Update kserve cake diagram by @yuzisun in #3530
- Remove white background for the kserve diagram by @yuzisun in #3531
- fix a typo in OPENSHIFT_GUIDE.md by @marek-veber in #3544
- Fix typo in README.md by @terrytangyuan in #3575
- Update Dockerfile and Readme by @gavrishp in #3676
- Update huggingface readme by @alexagriffith in #3678
New Contributors
- @leyao-daily made their first contribution in #3490
- @peterj made their first contribution in #3493
- @timothyjlaurent made their first contribution in #3374
- @shauryagoel made their first contribution in #3496
- @mkumatag made their first contribution in #3523
- @marek-veber made their first contribution in #3544
- @trojaond made their first contribution in #3481
- @grandbora made their first contribution in #3590
- @saileshd1402 made their first contribution in #3657
- @Datta0 made their first contribution in #3613
- @alpe made their first contribution in #3429
Full Changelog: v0.12.1...v0.13.0