[Bug]: Internal Server Error When Fetching More Than 60 Traces in Jaeger Using OpenSearch Backend #5825
### What happened?
I’m seeing an issue when querying Jaeger for traces with an OpenSearch backend. Queries limited to 60 traces work as expected, but requesting more than 60 traces returns an "Internal Server Error." Notably, when I manually hit the OpenSearch `_msearch` API with a 500-trace request, it returned a 200 status, so OpenSearch itself appears capable of handling larger queries. This suggests the issue is in how Jaeger interacts with OpenSearch.
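For comparison, the manual check against OpenSearch looked roughly like the sketch below (written in Go for consistency with the rest of this report). The endpoint, index pattern, and query body are assumptions; only the 500-entry `_msearch` request and the 200 response are what I actually observed.

```go
// Hypothetical sketch of the manual check: send a large _msearch request
// directly to OpenSearch to confirm it can handle many sub-searches.
package main

import (
	"bytes"
	"fmt"
	"net/http"
)

func main() {
	const opensearchURL = "http://localhost:9200" // assumed endpoint
	const subSearches = 500                       // mirrors the 500-trace manual test

	var body bytes.Buffer
	for i := 0; i < subSearches; i++ {
		// _msearch expects NDJSON: a header line followed by a query line.
		body.WriteString(`{"index":"jaeger-span-*"}` + "\n") // index pattern is an assumption
		body.WriteString(`{"query":{"match_all":{}},"size":1}` + "\n")
	}

	req, err := http.NewRequest(http.MethodPost,
		opensearchURL+"/_msearch?rest_total_hits_as_int=true", &body)
	if err != nil {
		panic(err)
	}
	req.Header.Set("Content-Type", "application/x-ndjson")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status) // returned 200 in the manual test
}
```

This kind of request comes back with HTTP 200 even at 500 sub-searches, which is why I suspect the problem sits between Jaeger and OpenSearch rather than in OpenSearch itself.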
### Steps to reproduce
- Deploy Jaeger with an OpenSearch backend.
- Query for traces with a limit of 60. The query succeeds.
- Increase the limit to more than 60 traces (for example via the HTTP query API; see the sketch after this list).
- Observe the "Internal Server Error" response.
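A minimal sketch of step 3, assuming the Jaeger query service is reachable at localhost:16686 and that a service named "my-service" has reported spans; the host, service name, and limit of 100 are illustrative assumptions, not values from my deployment.

```go
// Hypothetical reproduction of step 3: ask the Jaeger query HTTP API
// (the same API the UI uses) for more than 60 traces.
package main

import (
	"fmt"
	"net/http"
	"net/url"
)

func main() {
	params := url.Values{}
	params.Set("service", "my-service") // assumed service name
	params.Set("limit", "100")          // anything above 60 triggers the error for me

	resp, err := http.Get("http://localhost:16686/api/traces?" + params.Encode())
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// With limit <= 60 this returns 200; above 60 it returns the 500 shown in the logs below.
	fmt.Println("status:", resp.Status)
}
```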
### Expected behavior
Jaeger should return more than 60 traces without an internal server error.
### Relevant log output
Jaeger logs:
2024-08-11T06:52:02.699771596Z stderr F {"level":"error","ts":1723359122.6995814,"caller":"app/http_handler.go:505","msg":"HTTP handler, Internal Server Error","error":"elastic: Error 502 (Bad Gateway)","stacktrace":"github.com/jaegertracing/jaeger/cmd/query/app.(*APIHandler).handleError\n\tgithub.com/jaegertracing/jaeger/cmd/query/app/http_handler.go:505\ngithub.com/jaegertracing/jaeger/cmd/query/app.(*APIHandler).search\n\tgithub.com/jaegertracing/jaeger/cmd/query/app/http_handler.go:260\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2171\ngithub.com/jaegertracing/jaeger/cmd/query/app.(*APIHandler).handleFunc.traceResponseHandler.func2\n\tgithub.com/jaegertracing/jaeger/cmd/query/app/http_handler.go:549\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2171\ngo.opentelemetry.io/contrib/instrumentation/net/http/otelhttp.WithRouteTag.func1\n\tgo.opentelemetry.io/contrib/instrumentation/net/http/otelhttp@v0.53.0/handler.go:256\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2171\ngo.opentelemetry.io/contrib/instrumentation/net/http/otelhttp.(*middleware).serveHTTP\n\tgo.opentelemetry.io/contrib/instrumentation/net/http/otelhttp@v0.53.0/handler.go:218\ngo.opentelemetry.io/contrib/instrumentation/net/http/otelhttp.NewMiddleware.func1.1\n\tgo.opentelemetry.io/contrib/instrumentation/net/http/otelhttp@v0.53.0/handler.go:74\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2171\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2171\ngithub.com/gorilla/mux.(*Router).ServeHTTP\n\tgithub.com/gorilla/mux@v1.8.1/mux.go:212\ngithub.com/jaegertracing/jaeger/cmd/query/app.createHTTPServer.additionalHeadersHandler.func4\n\tgithub.com/jaegertracing/jaeger/cmd/query/app/additional_headers_handler.go:28\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2171\ngithub.com/jaegertracing/jaeger/cmd/query/app.createHTTPServer.CompressHandler.CompressHandlerLevel.func6\n\tgithub.com/gorilla/handlers@v1.5.2/compress.go:141\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2171\ngithub.com/gorilla/handlers.recoveryHandler.ServeHTTP\n\tgithub.com/gorilla/handlers@v1.5.2/recovery.go:80\nnet/http.serverHandler.ServeHTTP\n\tnet/http/server.go:3142\nnet/http.(*conn).serve\n\tnet/http/server.go:2044"}
OpenSearch logs:
TaskCancelledException[The parent task was cancelled, shouldn't start any child tasks, channel closed]
	at org.opensearch.tasks.TaskManager$CancellableTaskHolder.registerChildNode(TaskManager.java:671)
	at org.opensearch.tasks.TaskManager.registerChildNode(TaskManager.java:344)
	at org.opensearch.action.support.TransportAction.registerChildNode(TransportAction.java:78)
	at org.opensearch.action.support.TransportAction.execute(TransportAction.java:97)
	at org.opensearch.client.node.NodeClient.executeLocally(NodeClient.java:112)
	at org.opensearch.client.node.NodeClient.doExecute(NodeClient.java:99)
	at org.opensearch.client.support.AbstractClient.execute(AbstractClient.java:476)
	at org.opensearch.client.support.AbstractClient.search(AbstractClient.java:607)
	at org.opensearch.action.search.TransportMultiSearchAction.executeSearch(TransportMultiSearchAction.java:180)
	at org.opensearch.action.search.TransportMultiSearchAction$1.handleResponse(TransportMultiSearchAction.java:203)
	at org.opensearch.action.search.TransportMultiSearchAction$1.onFailure(TransportMultiSearchAction.java:188)
	at org.opensearch.action.support.TransportAction$1.onFailure(TransportAction.java:124)
	at org.opensearch.core.action.ActionListener$5.onFailure(ActionListener.java:277)
	at org.opensearch.action.search.AbstractSearchAsyncAction.raisePhaseFailure(AbstractSearchAsyncAction.java:797)
	at org.opensearch.action.search.AbstractSearchAsyncAction.onPhaseFailure(AbstractSearchAsyncAction.java:770)
	at org.opensearch.action.search.FetchSearchPhase$1.onFailure(FetchSearchPhase.java:127)
	at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:54)
	at org.opensearch.threadpool.TaskAwareRunnable.doRun(TaskAwareRunnable.java:78)
	at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
	at org.opensearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:59)
	at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:941)
	at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	at java.base/java.lang.Thread.run(Thread.java:1583)
Envoy logs:
2024-08-11T06:52:10.495976131Z stdout F [2024-08-11T06:51:59.504Z] "GET /_msearch?rest_total_hits_as_int=true HTTP/1.1" 502 UPE 165089 87 3194 - "-" "elastic/6.2.37 (linux-amd64)" "0b9806da-fc57-4572-b860-3c31a31b922a"
### Screenshot
_No response_
### Additional context
_No response_
### Jaeger backend version
v1.58.0
### SDK
_No response_
### Pipeline
_No response_
### Storage backend
OpenSearch 2.15
### Operating system
_No response_
### Deployment model
_No response_
### Deployment configs
_No response_