How to reproduce this bug?
Configure and deploy an equivalent Weaviate Fargate service
Add an ALB listener and target group for HTTP/gRPC, changing nothing else about the service
What is the expected behavior?
Weaviate should start without issues, just as it does locally using a docker compose file generated via the configurator
Weaviate should also emit log entries indicating that it is listening on ports 8080 and 50051 (in our experience this usually takes ~3 minutes on Fargate):
"grpc server listening at [::]:50051"
"Serving weaviate at http://[::]:8080"
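For anyone trying to reproduce this, the check for a completed startup can be scripted. The following is a minimal sketch, assuming the task's logs have been exported to a local file; the function name `weaviate_started` and the log file path are hypothetical and not part of Weaviate.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if BOTH startup messages quoted above
# appear in the log file passed as $1. -F treats the brackets literally.
weaviate_started() {
  log_file="$1"
  grep -qF 'grpc server listening at [::]:50051' "$log_file" &&
    grep -qF 'Serving weaviate at http://[::]:8080' "$log_file"
}
```

For example, `weaviate_started ./weaviate.log && echo "startup completed"` prints nothing for a task that is stuck at "completed registering modules".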
What is the actual behavior?
Weaviate seems to get stuck on the following log line: "completed registering modules". It never fulfills requests made to ports 8080 and 50051
Weaviate never progresses beyond the log line: "completed registering modules", even if left for much longer than 3 minutes
The ALB marks the tasks as unhealthy, making it impossible to deploy the standard Weaviate image behind an ALB
Supporting information
We saw a similar issue when attempting to implement the HTTP listener on an ALB. Using the HTTP healthcheck endpoint available in Weaviate caused similar issues where the service seemed to hang. We worked around this by running health-checker alongside Weaviate on a different port, and using this as a healthcheck for the target group:
http_health_check() {
  # listen for health checks on port 8081, execute a dummy command in the
  # background that's always true
  exec /opt/health-checker --log-level warning \
    --listener "0.0.0.0:8081" \
    --port 8081 \
    --script "true"
}
This allowed Weaviate to start up properly; however, it isn't ideal for two reasons:
This doesn't check the health of Weaviate
This requires us to build a new image, complicating our tooling
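To address the first point, the dummy `true` script could in principle be swapped for one that probes Weaviate itself. The sketch below assumes `curl` is present in the image and uses Weaviate's `/v1/.well-known/ready` endpoint; the function name `weaviate_ready` and the `WEAVIATE_READY_URL` variable are hypothetical. We have not verified whether probing Weaviate this way during startup reintroduces the hang described above.

```shell
#!/bin/sh
# Hypothetical readiness probe: exits 0 only when Weaviate answers its
# readiness endpoint with a 2xx status within 2 seconds.
# WEAVIATE_READY_URL is an assumed override for non-default hosts/ports.
weaviate_ready() {
  curl -fsS --max-time 2 -o /dev/null \
    "${WEAVIATE_READY_URL:-http://localhost:8080/v1/.well-known/ready}"
}
```

Saved as a script that ends by calling `weaviate_ready`, this could be passed to `--script` in place of `true`, so the sidecar on port 8081 reports Weaviate's actual state rather than always succeeding.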
To avoid adding too much to our Weaviate image, we attempted to bypass the same issue with the gRPC port by delaying port forwarding to Weaviate by 5 mins:
grpc_check_with_delay() {
  start_delay=300
  listen_port=50052
  forward_to="localhost:50051"
  echo "Delaying GRPC healthcheck for ${start_delay} seconds to allow Weaviate to start"
  sleep "${start_delay}"
  echo "Delay for ${start_delay} seconds complete, initialising port forwarding"
  exec socat "TCP-LISTEN:${listen_port},fork,reuseaddr" "TCP:${forward_to}"
}
We were optimistic that delaying the forwarding of traffic to Weaviate would work, but unfortunately it didn't, and we're not sure why. Weaviate still hangs at the same log line. Even with the ALB pointed at a different port, something still seems to be interfering with Weaviate's startup.
We're confident that this interference is the root cause because the Weaviate helm charts set quite substantial initial delays on their startup and liveness probes, at 300 and 900 seconds respectively! These are large values that have been set explicitly, so our understanding is that this is a known issue with Weaviate's startup. Unfortunately, health checks are required on ALBs whose target is a Fargate service, and there's no way to delay when the health checks start 😢 We wanted to ask if folks know of any better way to work around this issue 🙏
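One related ECS setting is the service's health check grace period, which makes the ECS scheduler ignore failing load balancer health checks for a set time after a task starts. It does not stop the ALB from sending probes, so if the probes themselves are what interferes with startup it may not help; treat the following as an untested sketch. The function name `set_grace_period` and its arguments are hypothetical, and the 900 in the usage note simply mirrors the helm chart value mentioned above.

```shell
#!/bin/sh
# Hypothetical wrapper around the AWS CLI: sets how long ECS ignores
# unhealthy load balancer health checks after a task starts.
# $1 = cluster name, $2 = service name, $3 = grace period in seconds.
set_grace_period() {
  cluster="$1"
  service="$2"
  seconds="$3"
  aws ecs update-service \
    --cluster "$cluster" \
    --service "$service" \
    --health-check-grace-period-seconds "$seconds"
}
```

For example, `set_grace_period my-cluster weaviate 900` would give tasks 15 minutes before ECS replaces them for failing ALB health checks.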
TIA, and LMK if you need additional info!
gRPC workaround
EDIT: I applied the same kind of workaround to gRPC that we used for the HTTP healthcheck: I spun up a gRPC healthcheck on a completely separate port, and this appears to work now! Here is the snippet from the Weaviate Dockerfile:
...
&& git clone -b v1.68.0 --depth 1 https://github.com/grpc/grpc-go \
&& cd grpc-go/examples/helloworld/greeter_server \
&& go build main.go \
&& mv main /opt/grpc-health-checker \
&& chmod +x /opt/grpc-health-checker
...
Simply changing Weaviate to use port 50052 (since the helloworld example in grpc-go uses 50051) and changing the ALB healthcheck to reference the helloworld example is enough to get Weaviate to start up correctly. I'm not clear why the port forwarding implementation doesn't work, so I'm hoping for some clarity here!
Server Version
1.25.10
Weaviate Setup
Single Node
Nodes count
1