
[Bug]: CLI flags for es and es-archive unintentionally affect each other #3948

Open
@naveedyahyazadeh

Description

What happened?

If the values of es.num-shards and es-archive.num-shards don't match, new indices alternate between the two shard counts, depending on which CLI flag was applied most recently. The same happens with replicas (es.num-replicas and es-archive.num-replicas). The cause is that the Jaeger primary and archive index types are both created from the same index template pattern, and the CLI flags in both the es and es-archive namespaces update that same template, creating a race condition.
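
A toy Go model of the failure mode above, assuming (as step 4 suggests) that both namespaces write the single template named jaeger-span and that each later template write replaces the earlier one. templateStore and putTemplate are hypothetical stand-ins, not Jaeger or Elasticsearch client APIs; the shard values match the example in step 2.

```go
package main

import "fmt"

// templateStore is a hypothetical stand-in for Elasticsearch's template
// registry: writing an existing template name replaces its previous body.
type templateStore map[string]int // template name -> number_of_shards

// putTemplate models one Jaeger process writing its template at startup.
func (s templateStore) putTemplate(name string, shards int) {
	s[name] = shards
}

func main() {
	store := templateStore{}
	// Assumption: es and es-archive both resolve to this one template name.
	const shared = "jaeger-span"

	store.putTemplate(shared, 6) // Ingester starts with es.num-shards=6
	store.putTemplate(shared, 5) // Query restarts with es-archive.num-shards=5
	fmt.Println(store[shared])   // new primary indices would now get 5 shards

	store.putTemplate(shared, 6) // Ingester restarts
	fmt.Println(store[shared])   // back to 6: the last writer wins
}
```

Because both writers share one key, whichever pod started last decides the shard count for every index created afterwards.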

Steps to reproduce

  1. Set the CLI flags es-archive.num-shards and es.num-shards to different values.
  2. Let a deployment spin up with mismatched values for the Jaeger Query and Ingester pods (for example, set es-archive.num-shards to 5 on Query and es.num-shards to 6 on Ingester).
  3. After the deployment is up, trigger an Elasticsearch index template update by restarting either the Query or the Ingester pod. Restarting the Query pod makes es-archive.num-shards take precedence; restarting the Ingester pod makes es.num-shards take precedence.
  4. View the current state of the Elasticsearch index template by running GET /_template/jaeger-span against Elasticsearch (I used Kibana Dev Tools, but the request can also be sent directly to the cluster's HTTP endpoint, e.g. with curl). The "number_of_shards" key alternates depending on which pod was restarted most recently, and new indices are created with whatever value the template holds at that time.
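
To check step 4 without Kibana, the response body of GET /_template/jaeger-span can be parsed directly. A minimal Go sketch, assuming the legacy template response shape where index settings are returned as strings (e.g. "number_of_shards": "5"); legacyTemplate and shardsFromTemplate are hypothetical helpers, and the sample body below is abbreviated with made-up values rather than captured from a real cluster.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// legacyTemplate holds only the fields this sketch reads from a
// GET /_template/<name> response (assumed shape, settings as strings).
type legacyTemplate struct {
	IndexPatterns []string `json:"index_patterns"`
	Settings      struct {
		Index struct {
			NumberOfShards   string `json:"number_of_shards"`
			NumberOfReplicas string `json:"number_of_replicas"`
		} `json:"index"`
	} `json:"settings"`
}

// shardsFromTemplate returns number_of_shards for the named template.
func shardsFromTemplate(body []byte, name string) (int, error) {
	var resp map[string]legacyTemplate
	if err := json.Unmarshal(body, &resp); err != nil {
		return 0, err
	}
	tmpl, ok := resp[name]
	if !ok {
		return 0, fmt.Errorf("template %q not found in response", name)
	}
	return strconv.Atoi(tmpl.Settings.Index.NumberOfShards)
}

func main() {
	// Hypothetical, abbreviated body; real responses also include
	// fields such as "order" and "mappings".
	body := []byte(`{"jaeger-span":{"index_patterns":["*jaeger-span-*"],
		"settings":{"index":{"number_of_shards":"5","number_of_replicas":"1"}}}}`)
	n, err := shardsFromTemplate(body, "jaeger-span")
	if err != nil {
		panic(err)
	}
	fmt.Println("number_of_shards:", n)
}
```

Running this after each pod restart should show the value flipping between the two configured shard counts.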

Expected behavior

es-archive.num-shards and es-archive.num-replicas should not affect the shard or replica counts of the primary indices, and es.num-shards and es.num-replicas should not affect the archive indices.
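
For contrast, a sketch of the expected behavior: if each namespace derives its own template name, a write from one can no longer overwrite the other. The name jaeger-span-archive and the templateName helper are hypothetical; this only illustrates the intent of decoupling the patterns, not the implementation in the linked PR.

```go
package main

import "fmt"

// indexKind distinguishes the two CLI flag namespaces.
type indexKind string

const (
	primary indexKind = "es"
	archive indexKind = "es-archive"
)

// templateName (hypothetical) gives each kind a distinct template name,
// so settings from one namespace only ever touch their own template.
func templateName(kind indexKind) string {
	if kind == archive {
		return "jaeger-span-archive"
	}
	return "jaeger-span"
}

func main() {
	store := map[string]int{} // template name -> number_of_shards
	store[templateName(primary)] = 6 // es.num-shards on Ingester
	store[templateName(archive)] = 5 // es-archive.num-shards on Query
	store[templateName(archive)] = 5 // Query restart rewrites only the archive template
	fmt.Println(store[templateName(primary)], store[templateName(archive)])
}
```

Restarting either pod now rewrites only its own template, so the primary indices keep 6 shards regardless of the archive setting.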

Relevant log output

No response

Screenshot

This screenshot shows our primary indices being created with alternating shard counts after pod restarts, depending on which CLI flag was applied most recently (7 for primary vs. 5 for archive).

Additional context

Here's a PR to decouple the primary and archive index patterns: #3947

Jaeger backend version

v1.30.0

SDK

No response

Pipeline

Collector -> Kafka -> Ingester -> ES

Storage backend

ES 7.16.1

Operating system

Linux

Deployment model

Kubernetes

Deployment configs

No response

Metadata

Labels: bug, stale