[Enhancement]: Sagemaker endpoint scaling to zero #40606

Closed
@emanation

Description

At re:Invent 2024 it was announced that SageMaker endpoints can scale down to zero instances when no activity is detected.
I was able to reproduce the setup by following the steps in the article, using boto3 v1.35.83.
However, implementing the same setup with Terraform is not yet possible.
First, the provider returns
Error: expected production_variants.0.managed_instance_scaling.0.min_instance_count to be at least (1), got 0
for a configuration like

resource "aws_sagemaker_endpoint_configuration" "sagemaker_configurations" {
  name_prefix = var.endpoint_name
  production_variants {
    variant_name           = "AllTraffic"
    model_name             = aws_sagemaker_model.sagemaker_model.name
    initial_instance_count = 1
    instance_type          = var.sagemaker_instance_type
    managed_instance_scaling {
      status             = "ENABLED"
      min_instance_count = 0
      max_instance_count = 1
    }
  }
}

Second, the provider does not yet have the necessary inference component resource, although boto3 already has the create_inference_component method.
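For comparison, the boto3 flow that the provider would need to mirror can be sketched roughly as follows. This is only an illustration under assumptions: the helper names (`build_endpoint_config_request`, `build_inference_component_request`, `create_scale_to_zero_endpoint`) are hypothetical, the `RuntimeConfig` copy count of 1 is an assumed value, and waiting for the endpoint to be in service before creating the component is an assumed ordering. The numeric values are taken from the proposed Terraform configuration in this issue.

```python
from typing import Any, Dict


def build_endpoint_config_request(
    config_name: str, role_arn: str, instance_type: str, max_instances: int = 1
) -> Dict[str, Any]:
    """Build kwargs for create_endpoint_config with a variant that may scale to zero."""
    return {
        "EndpointConfigName": config_name,
        "ExecutionRoleArn": role_arn,
        "ProductionVariants": [
            {
                # No ModelName here: the model is attached via the inference component.
                "VariantName": "AllTraffic",
                "InstanceType": instance_type,
                "InitialInstanceCount": 1,
                "ManagedInstanceScaling": {
                    "Status": "ENABLED",
                    "MinInstanceCount": 0,  # the value the provider currently rejects
                    "MaxInstanceCount": max_instances,
                },
            }
        ],
    }


def build_inference_component_request(
    component_name: str, endpoint_name: str, model_name: str
) -> Dict[str, Any]:
    """Build kwargs for create_inference_component (values mirror the proposal)."""
    return {
        "InferenceComponentName": component_name,
        "EndpointName": endpoint_name,
        "VariantName": "AllTraffic",
        "Specification": {
            "ModelName": model_name,
            "StartupParameters": {
                "ModelDataDownloadTimeoutInSeconds": 3600,
                "ContainerStartupHealthCheckTimeoutInSeconds": 3600,
            },
            "ComputeResourceRequirements": {
                "MinMemoryRequiredInMb": 1024,
                "NumberOfAcceleratorDevicesRequired": 1,
            },
        },
        # Assumption: start with a single copy of the model.
        "RuntimeConfig": {"CopyCount": 1},
    }


def create_scale_to_zero_endpoint(
    endpoint_name: str, config: Dict[str, Any], component: Dict[str, Any]
) -> None:
    """Issue the AWS calls; needs boto3 and valid credentials (not run here)."""
    import boto3  # imported lazily so the builders above work without boto3

    client = boto3.client("sagemaker")
    client.create_endpoint_config(**config)
    client.create_endpoint(
        EndpointName=endpoint_name,
        EndpointConfigName=config["EndpointConfigName"],
    )
    # Assumed ordering: wait until the endpoint is in service before adding the component.
    client.get_waiter("endpoint_in_service").wait(EndpointName=endpoint_name)
    client.create_inference_component(**component)
```

The request builders are kept separate from the AWS calls so the payload shapes can be inspected without credentials; a Terraform resource would presumably need to expose the same fields.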

Affected Resource(s) and/or Data Source(s)

aws_sagemaker_endpoint_configuration

Potential Terraform Configuration

resource "aws_sagemaker_inference_component" "sagemaker_component" {
  name          = var.component_name
  endpoint_name = aws_sagemaker_endpoint.sagemaker_endpoint.name
  variant_name  = "AllTraffic"
  specification {
    model_name = aws_sagemaker_model.sagemaker_model.name
    startup_parameters {
      model_data_download_timeout_in_seconds            = 3600
      container_startup_health_check_timeout_in_seconds = 3600
    }
    compute_resource_requirements {
      min_memory_required_in_mb              = 1024
      number_of_accelerator_devices_required = 1
    }
  }
}

Also, once the component is added as a resource, model_name in the endpoint configuration should become optional.

resource "aws_sagemaker_endpoint_configuration" "test_configurations" {
  name               = "test-tf-config-zero-scale"
  execution_role_arn = aws_iam_role.iam_for_sagemaker.arn
  production_variants {
    variant_name           = "AllTraffic"
    initial_instance_count = 1
    instance_type          = var.sagemaker_instance_type
    managed_instance_scaling {
      status             = "ENABLED"
      min_instance_count = 0
      max_instance_count = 1
    }
  }
}

As expected, this currently returns the error
Error: Missing required argument. The argument "model_name" is required, but no definition was found.

References

No response

Would you like to implement a fix?

No

Labels

enhancement: Requests to existing resources that expand the functionality or scope.
service/sagemaker: Issues and PRs that pertain to the sagemaker service.