[Bug]: Internal Server Error When Fetching More Than 60 Traces in Jaeger Using OpenSearch Backend #5825

Open
@raman-goel

Description

What happened?

I’m experiencing an issue when querying Jaeger for traces with an OpenSearch backend. Queries limited to 60 traces work as expected, but any query for more than 60 traces fails with an "Internal Server Error". When I manually sent a request for 500 traces to the OpenSearch _msearch API, it returned a 200 status, so OpenSearch itself can handle larger queries. This suggests the problem lies in how Jaeger interacts with OpenSearch.
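For anyone who wants to repeat the manual _msearch check, here is a minimal sketch of building a multi-search body with one sub-search per trace ID. The _msearch API expects newline-delimited JSON made of header/body pairs and terminated by a newline. The index pattern `jaeger-span-*`, the `traceID` term query, and the `size` value are assumptions for illustration and only approximate what Jaeger sends.

```python
import json


def build_msearch_body(trace_ids, index="jaeger-span-*", size=10000):
    """Build an NDJSON _msearch body with one sub-search per trace ID.

    The index pattern, query shape, and size are assumptions; Jaeger's
    real sub-queries carry more fields than this.
    """
    lines = []
    for trace_id in trace_ids:
        # Header line: selects the index for the sub-search that follows.
        lines.append(json.dumps({"index": index}))
        # Body line: match every span that belongs to this trace.
        lines.append(json.dumps({
            "query": {"term": {"traceID": trace_id}},
            "size": size,
        }))
    # The _msearch API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical trace IDs, used only to measure the payload size.
    ids = [f"{n:032x}" for n in range(500)]
    body = build_msearch_body(ids)
    print(len(body.encode("utf-8")), "bytes for", len(ids), "sub-searches")
```

Comparing the printed size with the byte count in the Envoy access log line may show whether the manual request and Jaeger's request are of similar size.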

Steps to reproduce

  1. Deploy Jaeger with an OpenSearch backend.
  2. Query for traces with a limit of 60. The query succeeds.
  3. Increase the limit to more than 60 traces.
  4. Observe the "Internal Server Error" response.
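The steps above can be sketched as a pair of requests against the HTTP API that the Jaeger UI itself uses (`/api/traces`, which is internal and not a stable public API). The base URL and the service name `my-service` are placeholders, not values from this report.

```python
from urllib.parse import urlencode


def build_search_url(base_url, service, limit):
    """Return a Jaeger query-service search URL for the given trace limit."""
    query = urlencode({"service": service, "limit": limit})
    return f"{base_url.rstrip('/')}/api/traces?{query}"


if __name__ == "__main__":
    # Placeholder endpoint and service name; adjust to the deployment.
    for limit in (60, 61):
        print(build_search_url("http://localhost:16686", "my-service", limit))
```

Fetching the two printed URLs with any HTTP client should show the first succeeding and the second returning the error described here.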

Expected behavior

Jaeger should successfully return more than 60 traces without encountering an internal server error.

Relevant log output

Jaeger logs:

2024-08-11T06:52:02.699771596Z stderr F {"level":"error","ts":1723359122.6995814,"caller":"app/http_handler.go:505","msg":"HTTP handler, Internal Server Error","error":"elastic: Error 502 (Bad Gateway)","stacktrace":"github.com/jaegertracing/jaeger/cmd/query/app.(*APIHandler).handleError\n\tgithub.com/jaegertracing/jaeger/cmd/query/app/http_handler.go:505\ngithub.com/jaegertracing/jaeger/cmd/query/app.(*APIHandler).search\n\tgithub.com/jaegertracing/jaeger/cmd/query/app/http_handler.go:260\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2171\ngithub.com/jaegertracing/jaeger/cmd/query/app.(*APIHandler).handleFunc.traceResponseHandler.func2\n\tgithub.com/jaegertracing/jaeger/cmd/query/app/http_handler.go:549\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2171\ngo.opentelemetry.io/contrib/instrumentation/net/http/otelhttp.WithRouteTag.func1\n\tgo.opentelemetry.io/contrib/instrumentation/net/http/otelhttp@v0.53.0/handler.go:256\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2171\ngo.opentelemetry.io/contrib/instrumentation/net/http/otelhttp.(*middleware).serveHTTP\n\tgo.opentelemetry.io/contrib/instrumentation/net/http/otelhttp@v0.53.0/handler.go:218\ngo.opentelemetry.io/contrib/instrumentation/net/http/otelhttp.NewMiddleware.func1.1\n\tgo.opentelemetry.io/contrib/instrumentation/net/http/otelhttp@v0.53.0/handler.go:74\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2171\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2171\ngithub.com/gorilla/mux.(*Router).ServeHTTP\n\tgithub.com/gorilla/mux@v1.8.1/mux.go:212\ngithub.com/jaegertracing/jaeger/cmd/query/app.createHTTPServer.additionalHeadersHandler.func4\n\tgithub.com/jaegertracing/jaeger/cmd/query/app/additional_headers_handler.go:28\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2171\ngithub.com/jaegertracing/jaeger/cmd/query/app.createHTTPServer.CompressHandler.CompressHandlerLevel.func6\n\tgithub.com/gorilla/handlers@v1.5.2/compress.go:141\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2171\ngithub.com/gorilla/handlers.recoveryHandler.ServeHTTP\n\tgithub.com/gorilla/handlers@v1.5.2/recovery.go:80\nnet/http.serverHandler.ServeHTTP\n\tnet/http/server.go:3142\nnet/http.(*conn).serve\n\tnet/http/server.go:2044"}

OpenSearch logs:

TaskCancelledException[The parent task was cancelled, shouldn't start any child tasks, channel closed]
    at org.opensearch.tasks.TaskManager$CancellableTaskHolder.registerChildNode(TaskManager.java:671)
    at org.opensearch.tasks.TaskManager.registerChildNode(TaskManager.java:344)
    at org.opensearch.action.support.TransportAction.registerChildNode(TransportAction.java:78)
    at org.opensearch.action.support.TransportAction.execute(TransportAction.java:97)
    at org.opensearch.client.node.NodeClient.executeLocally(NodeClient.java:112)
    at org.opensearch.client.node.NodeClient.doExecute(NodeClient.java:99)
    at org.opensearch.client.support.AbstractClient.execute(AbstractClient.java:476)
    at org.opensearch.client.support.AbstractClient.search(AbstractClient.java:607)
    at org.opensearch.action.search.TransportMultiSearchAction.executeSearch(TransportMultiSearchAction.java:180)
    at org.opensearch.action.search.TransportMultiSearchAction$1.handleResponse(TransportMultiSearchAction.java:203)
    at org.opensearch.action.search.TransportMultiSearchAction$1.onFailure(TransportMultiSearchAction.java:188)
    at org.opensearch.action.support.TransportAction$1.onFailure(TransportAction.java:124)
    at org.opensearch.core.action.ActionListener$5.onFailure(ActionListener.java:277)
    at org.opensearch.action.search.AbstractSearchAsyncAction.raisePhaseFailure(AbstractSearchAsyncAction.java:797)
    at org.opensearch.action.search.AbstractSearchAsyncAction.onPhaseFailure(AbstractSearchAsyncAction.java:770)
    at org.opensearch.action.search.FetchSearchPhase$1.onFailure(FetchSearchPhase.java:127)
    at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:54)
    at org.opensearch.threadpool.TaskAwareRunnable.doRun(TaskAwareRunnable.java:78)
    at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
    at org.opensearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:59)
    at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:941)
    at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
    at java.base/java.lang.Thread.run(Thread.java:1583)

Envoy logs:

2024-08-11T06:52:10.495976131Z stdout F [2024-08-11T06:51:59.504Z] "GET /_msearch?rest_total_hits_as_int=true HTTP/1.1" 502 UPE 165089 87 3194 - "-" "elastic/6.2.37 (linux-amd64)" "0b9806da-fc57-4572-b860-3c31a31b922a"
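To make the Envoy line easier to read, the sketch below splits it into named fields. It assumes Envoy's default access log format, in which the three numbers after the response flags are bytes received, bytes sent, and duration in milliseconds; a customized log format would need a different pattern. Under that assumption, the failing request carried a 165089-byte body and took 3194 ms, and the `UPE` flag is documented by Envoy as an upstream protocol error.

```python
import re

# Field order assumes Envoy's default access log format (an assumption;
# the deployment's actual format is not shown in this report).
ENVOY_DEFAULT = re.compile(
    r'\[(?P<start_time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d+) (?P<flags>\S+) '
    r'(?P<bytes_received>\d+) (?P<bytes_sent>\d+) (?P<duration_ms>\d+) '
    r'(?P<upstream_time>\S+) '
    r'"(?P<forwarded_for>[^"]*)" "(?P<user_agent>[^"]*)" "(?P<request_id>[^"]*)"'
)

INT_FIELDS = ("status", "bytes_received", "bytes_sent", "duration_ms")


def parse_envoy_line(line):
    """Return the access log fields as a dict, or None if the line does not match."""
    match = ENVOY_DEFAULT.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    for key in INT_FIELDS:
        fields[key] = int(fields[key])
    return fields


# The log line quoted in this report, including the container runtime prefix.
SAMPLE = (
    '2024-08-11T06:52:10.495976131Z stdout F [2024-08-11T06:51:59.504Z] '
    '"GET /_msearch?rest_total_hits_as_int=true HTTP/1.1" 502 UPE 165089 87 3194 - '
    '"-" "elastic/6.2.37 (linux-amd64)" "0b9806da-fc57-4572-b860-3c31a31b922a"'
)

if __name__ == "__main__":
    for name, value in parse_envoy_line(SAMPLE).items():
        print(f"{name}: {value}")
```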


### Jaeger backend version

v1.58.0

### Storage backend

OpenSearch 2.15
