Python SDK for KServe Server and Client.
The KServe Python SDK can be installed with pip or poetry.
pip install kserve
To install KServe with storage support:
pip install kserve[storage]
Install via Poetry.
make dev_install
To install KServe with storage support:
poetry install -E storage
or
poetry install --extras "storage"
KServe's Python server library provides a standardized base that model serving frameworks such as scikit-learn, XGBoost and PyTorch extend. It encapsulates data plane API definitions and storage retrieval for models.
It provides, among other things:
- Registering a model and starting the server
- Prediction Handler
- Pre/Post Processing Handler
- Liveness Handler
- Readiness Handlers
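The handlers listed above can be pictured as a small request pipeline. The following is a framework-free sketch only: `SketchModel`, `SketchServer`, and all of their methods are hypothetical names invented for illustration and are not the kserve package's API. The `instances`/`predictions` payload keys are likewise an assumption about the request shape.

```python
from typing import Any, Dict, List


class SketchModel:
    """Hypothetical stand-in for a served model (not the kserve API)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.ready = False  # flips to True once load() has run
        self.weights: List[float] = []

    def load(self) -> None:
        # A real server would fetch and deserialize model artifacts here.
        self.weights = [0.5, 2.0]
        self.ready = True

    def preprocess(self, payload: Dict[str, Any]) -> List[List[float]]:
        # Pre-processing handler: extract numeric rows from the request body.
        return [[float(x) for x in row] for row in payload["instances"]]

    def predict(self, instances: List[List[float]]) -> List[float]:
        # Prediction handler: toy linear model, weighted sum of each row.
        return [sum(w * x for w, x in zip(self.weights, row)) for row in instances]

    def postprocess(self, outputs: List[float]) -> Dict[str, Any]:
        # Post-processing handler: wrap raw outputs in a response body.
        return {"predictions": outputs}


class SketchServer:
    """Hypothetical registry that routes requests to registered models."""

    def __init__(self) -> None:
        self.models: Dict[str, SketchModel] = {}

    def register(self, model: SketchModel) -> None:
        self.models[model.name] = model

    def live(self) -> bool:
        # Liveness handler: the process is up and able to answer.
        return True

    def ready(self, name: str) -> bool:
        # Readiness handler: the named model is registered and loaded.
        model = self.models.get(name)
        return model is not None and model.ready

    def handle_predict(self, name: str, payload: Dict[str, Any]) -> Dict[str, Any]:
        if not self.ready(name):
            raise RuntimeError(f"model {name!r} is not ready")
        model = self.models[name]
        return model.postprocess(model.predict(model.preprocess(payload)))


if __name__ == "__main__":
    server = SketchServer()
    demo = SketchModel("demo")
    server.register(demo)
    demo.load()
    print(server.handle_predict("demo", {"instances": [[1, 2]]}))
```

Keeping readiness separate from liveness lets a process report that it is running while a model is still loading.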
It supports the following storage providers:
- Google Cloud Storage with a prefix: "gs://"
  - By default, it uses the GOOGLE_APPLICATION_CREDENTIALS environment variable for user authentication.
  - If GOOGLE_APPLICATION_CREDENTIALS is not provided, an anonymous client is used to download the artifacts.
- S3 Compatible Object Storage with a prefix "s3://"
  - By default, it uses the S3_ENDPOINT, AWS_ACCESS_KEY_ID, and AWS_SECRET_ACCESS_KEY environment variables for user authentication.
- Azure Blob Storage with the format: "https://{$STORAGE_ACCOUNT_NAME}.blob.core.windows.net/{$CONTAINER}/{$PATH}"
  - By default, it uses an anonymous client to download the artifacts.
  - For example: https://kfserving.blob.core.windows.net/triton/simple_string/
- Local filesystem either without any prefix or with a prefix "file://". For example:
  - Absolute path: /absolute/path or file:///absolute/path
  - Relative path: relative/path or file://relative/path
  - For the local filesystem, we recommend using a relative path without any prefix.
- Persistent Volume Claim (PVC) with the format "pvc://{$pvcname}/[path]".
  - The pvcname is the name of the PVC that contains the model.
  - The [path] is the relative path to the model on the PVC.
  - For example: pvc://mypvcname/model/path/on/pvc
- Generic URI, over either HTTP, prefixed with http://, or HTTPS, prefixed with https://. For example:
  - https://<some_url>.com/model.joblib
  - http://<some_url>.com/model.joblib
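The prefix rules above can be summarized as a small dispatch function. This is a minimal sketch, not the SDK's own storage code: `classify_storage_uri` and `split_pvc_uri` are hypothetical helpers, and the provider labels they return are made up for illustration.

```python
import re
from typing import Tuple

# Azure Blob URIs are https URLs on a *.blob.core.windows.net host, so they
# must be checked before the generic http(s) case.
_AZURE_BLOB = re.compile(r"^https://[^/]+\.blob\.core\.windows\.net/.+")


def classify_storage_uri(uri: str) -> str:
    """Return an illustrative provider label for a storage URI."""
    if uri.startswith("gs://"):
        return "gcs"
    if uri.startswith("s3://"):
        return "s3"
    if uri.startswith("pvc://"):
        return "pvc"
    if _AZURE_BLOB.match(uri):
        return "azure_blob"
    if uri.startswith(("http://", "https://")):
        return "http"
    # Local paths either carry file:// or have no scheme at all.
    if uri.startswith("file://") or "://" not in uri:
        return "local"
    raise ValueError(f"unsupported storage URI: {uri}")


def split_pvc_uri(uri: str) -> Tuple[str, str]:
    """Split pvc://{pvcname}/[path] into (pvcname, path)."""
    if not uri.startswith("pvc://"):
        raise ValueError(f"not a PVC URI: {uri}")
    name, _, path = uri[len("pvc://"):].partition("/")
    return name, path


if __name__ == "__main__":
    for example in (
        "gs://bucket/model",
        "https://kfserving.blob.core.windows.net/triton/simple_string/",
        "relative/path",
        "pvc://mypvcname/model/path/on/pvc",
    ):
        print(example, "->", classify_storage_uri(example))
```

The order of the checks matters: an Azure Blob URL also starts with https://, so the more specific pattern is tested first.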
For latency metrics, send a request to /metrics. Prometheus latency histograms are emitted for each of the steps (pre/postprocessing, explain, predict).
Additionally, the latencies of each step are logged per request.
Metric Name | Description | Type |
---|---|---|
request_preprocess_seconds | pre-processing request latency | Histogram |
request_explain_seconds | explain request latency | Histogram |
request_predict_seconds | prediction request latency | Histogram |
request_postprocess_seconds | post-processing request latency | Histogram |
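As a rough illustration of how these histograms might be consumed, the sketch below computes the mean latency per metric from the `_sum` and `_count` series in Prometheus text output. The sample text, its values, and the `model_name` label are assumptions made up for this example, and `mean_latencies` is a hypothetical helper rather than part of the SDK.

```python
import re
from collections import defaultdict
from typing import Dict

# Hypothetical scrape of /metrics; values are invented for illustration.
SAMPLE_METRICS = """\
# HELP request_predict_seconds prediction request latency
# TYPE request_predict_seconds histogram
request_predict_seconds_bucket{model_name="demo",le="0.1"} 3.0
request_predict_seconds_bucket{model_name="demo",le="+Inf"} 4.0
request_predict_seconds_count{model_name="demo"} 4.0
request_predict_seconds_sum{model_name="demo"} 0.5
"""

# Matches "<name>_sum" or "<name>_count", optional {labels}, then the value.
_SERIES = re.compile(r"^(\w+?)_(sum|count)(?:\{[^}]*\})?\s+(\S+)$")


def mean_latencies(text: str) -> Dict[str, float]:
    """Return mean seconds per histogram, aggregated over all label sets."""
    totals: Dict[str, Dict[str, float]] = defaultdict(
        lambda: {"sum": 0.0, "count": 0.0}
    )
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = _SERIES.match(line)
        if match:
            name, kind, value = match.groups()
            totals[name][kind] += float(value)
    return {
        name: parts["sum"] / parts["count"]
        for name, parts in totals.items()
        if parts["count"] > 0
    }


if __name__ == "__main__":
    print(mean_latencies(SAMPLE_METRICS))
```

A mean hides tail latency; the `_bucket` series are the ones to use for quantile estimates.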
KServe's Python client interacts with KServe control plane APIs to execute operations on a remote KServe cluster, such as creating, patching and deleting an InferenceService instance. See the Sample for Python SDK Client to get started.
Please review KServe Client API docs.
- KnativeAddressable
- KnativeCondition
- KnativeURL
- KnativeVolatileTime
- NetUrlUserinfo
- V1alpha1InferenceGraph
- V1alpha1InferenceGraphList
- V1alpha1InferenceGraphSpec
- V1alpha1InferenceGraphStatus
- V1alpha1InferenceRouter
- V1alpha1InferenceStep
- V1alpha1InferenceTarget
- V1beta1AlibiExplainerSpec
- V1beta1Batcher
- V1beta1ComponentExtensionSpec
- V1beta1ComponentStatusSpec
- V1beta1CustomExplainer
- V1beta1CustomPredictor
- V1beta1CustomTransformer
- V1beta1ExplainerConfig
- V1beta1ExplainerSpec
- V1beta1ExplainersConfig
- V1beta1InferenceService
- V1beta1InferenceServiceList
- V1beta1InferenceServiceSpec
- V1beta1InferenceServiceStatus
- V1beta1InferenceServicesConfig
- V1beta1IngressConfig
- V1beta1LoggerSpec
- V1beta1ModelSpec
- V1beta1ONNXRuntimeSpec
- V1beta1PodSpec
- V1beta1PredictorConfig
- V1beta1PredictorExtensionSpec
- V1beta1PredictorSpec
- V1beta1PredictorsConfig
- V1beta1SKLearnSpec
- V1beta1TFServingSpec
- V1beta1TorchServeSpec
- V1beta1TrainedModel
- V1beta1TrainedModelList
- V1beta1TrainedModelSpec
- V1beta1TrainedModelStatus
- V1beta1TransformerConfig
- V1beta1TransformerSpec
- V1beta1TransformersConfig
- V1beta1TritonSpec
- V1beta1XGBoostSpec
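The model classes listed above appear to mirror the fields of the Kubernetes resources they describe (for instance V1beta1InferenceService, V1beta1PredictorSpec and V1beta1SKLearnSpec). As a sketch of that shape, the example below builds an InferenceService manifest as a plain dictionary instead of using the client classes. `build_inference_service` is a hypothetical helper, and the name, namespace and storage URI are placeholders.

```python
import json
from typing import Any, Dict


def build_inference_service(name: str, namespace: str, storage_uri: str) -> Dict[str, Any]:
    """Return a v1beta1 InferenceService manifest with a scikit-learn predictor."""
    return {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "predictor": {
                # One framework-specific block per predictor; sklearn is used
                # here purely as an example.
                "sklearn": {"storageUri": storage_uri},
            }
        },
    }


if __name__ == "__main__":
    manifest = build_inference_service(
        name="sklearn-demo",                     # placeholder name
        namespace="default",                     # placeholder namespace
        storage_uri="gs://example-bucket/model",  # placeholder URI
    )
    print(json.dumps(manifest, indent=2))
```

The printed JSON can be compared against the fields documented for each class when assembling the same resource with the SDK.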