[Stack Monitoring] PoC for kibana instrumentation using opentelemetry metrics sdk #128755

Closed
@matschaffer

Description

We discussed a number of possible implementations for ongoing Kibana instrumentation in the (internal) issue https://github.com/elastic/observability-dev/issues/2054

In this issue we'll build a proof of concept for how that might work.

Here are the two options we'd like to PoC. They should be very similar at the code level; the main difference is the collection mechanism (pull by Metricbeat vs. push to apm-server).

option 2: OpenTelemetry Metrics API prometheus endpoint with Elastic Agent prometheus input

Here we use the official OpenTelemetry metrics SDK and expose the metrics via the Prometheus protocol, so that elastic-agent can poll them through the underlying Metricbeat prometheus module.

graph LR

subgraph ElasticDeployment["Elastic Deployment"]
  subgraph kibana
    OtelMetricsSDK["Otel Metrics SDK"]
    OtelMetricsPrometheusExporter["/metrics (prometheus-protocol)"]
    OtelMetricsSDK-->OtelMetricsPrometheusExporter

    click OtelMetricsSDK "https://opentelemetry.io/docs/instrumentation/js/getting-started/nodejs/#metrics"
  end

  subgraph elastic-agent
    Metricbeat["metrics/prometheus"]
  end

  Metricbeat-->|"poll (prometheus protocol)"|OtelMetricsPrometheusExporter
  Metricbeat-->|_bulk|elasticsearch
end
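
To make the pull model concrete, here is a minimal sketch of the kind of text a `/metrics` endpoint would serve for Metricbeat to poll. It is a stand-in, not the OpenTelemetry SDK API: the `Counter` class, `renderPrometheus`, and the metric name `kibana_requests_total` are all hypothetical, and in the actual PoC the SDK's Prometheus exporter would produce this output.

```typescript
// Hypothetical stand-in for what a Prometheus exporter serves at /metrics.
// None of these names come from the OpenTelemetry SDK.

type Labels = { [name: string]: string };

interface CounterSample {
  labels: Labels;
  value: number;
}

class Counter {
  readonly name: string;
  readonly help: string;
  private samples: { [key: string]: CounterSample } = {};

  constructor(name: string, help: string) {
    this.name = name;
    this.help = help;
  }

  // Add to the series identified by this label set (order-independent).
  add(value: number, labels: Labels = {}): void {
    const key = JSON.stringify(
      Object.keys(labels).sort().map((k) => [k, labels[k]])
    );
    const existing = this.samples[key];
    if (existing !== undefined) {
      existing.value += value;
    } else {
      this.samples[key] = { labels, value };
    }
  }

  collect(): CounterSample[] {
    return Object.keys(this.samples).map((k) => this.samples[k]);
  }
}

// Label values must escape backslash, double quote, and newline.
function escapeLabelValue(v: string): string {
  return v.replace(/\\/g, '\\\\').replace(/"/g, '\\"').replace(/\n/g, '\\n');
}

// Render counters in the Prometheus text exposition format.
function renderPrometheus(counters: Counter[]): string {
  const lines: string[] = [];
  for (const counter of counters) {
    lines.push(`# HELP ${counter.name} ${counter.help}`);
    lines.push(`# TYPE ${counter.name} counter`);
    for (const sample of counter.collect()) {
      const labelText = Object.keys(sample.labels)
        .sort()
        .map((k) => `${k}="${escapeLabelValue(sample.labels[k])}"`)
        .join(',');
      const series = labelText ? `${counter.name}{${labelText}}` : counter.name;
      lines.push(`${series} ${sample.value}`);
    }
  }
  return lines.join('\n') + '\n';
}
```

The instrumentation side only calls `add`; rendering happens when the endpoint is polled, which is why the application does not need to know where the collector lives.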

option 3: OpenTelemetry Metrics API exported as OpenTelemetry Protocol

Here we use the official OpenTelemetry metrics SDK and push the metrics via the OpenTelemetry Protocol (OTLP). OTLP is natively supported by Elastic APM, so we use apm-server to receive the data. There are some caveats for OTel collection, but none of them should hinder the collection of platform observability metrics today.

Ideally this apm-server is managed by elastic-agent, but that work is still TBD. See 2022-01 - Elastic Agent Pipeline Runtime Environment for the latest info.

graph LR

subgraph ElasticDeployment["Elastic Deployment"]
  subgraph kibana
    OtelMetricsSDK["Otel Metrics SDK"]
  end

  subgraph elastic-agent
    APMServer["apm-server"]
  end

  OtelMetricsSDK-->|"push (OTLP)"|APMServer["apm-server"]
  APMServer-->|_bulk|elasticsearch
end
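
For the push model, the sketch below builds the kind of request body an OTLP metrics exporter would send for one cumulative counter, using the JSON encoding of OTLP. It is illustrative only: `buildCounterPayload`, the scope name `kibana.poc`, and the metric name are hypothetical, and in the PoC the SDK's OTLP exporter would handle encoding and transport. Whether the receiving apm-server accepts the JSON encoding (rather than protobuf) is an assumption not confirmed here.

```typescript
// Hypothetical sketch of an OTLP/JSON metrics body for a single counter.
// Field names follow the OTLP JSON mapping; helper names are illustrative.

interface KeyValue {
  key: string;
  value: { stringValue: string };
}

interface NumberDataPoint {
  attributes: KeyValue[];
  startTimeUnixNano: string; // 64-bit integers are encoded as strings in JSON
  timeUnixNano: string;
  asDouble: number;
}

interface OtlpMetricsPayload {
  resourceMetrics: Array<{
    resource: { attributes: KeyValue[] };
    scopeMetrics: Array<{
      scope: { name: string };
      metrics: Array<{
        name: string;
        unit: string;
        sum: {
          dataPoints: NumberDataPoint[];
          aggregationTemporality: number;
          isMonotonic: boolean;
        };
      }>;
    }>;
  }>;
}

// Enum value of AGGREGATION_TEMPORALITY_CUMULATIVE in the OTLP metrics proto.
const AGGREGATION_TEMPORALITY_CUMULATIVE = 2;

function toKeyValues(attrs: { [name: string]: string }): KeyValue[] {
  return Object.keys(attrs)
    .sort()
    .map((key) => ({ key, value: { stringValue: attrs[key] } }));
}

// Nanoseconds exceed Number.MAX_SAFE_INTEGER, so append zeros as text.
// Assumes a positive millisecond timestamp.
function msToUnixNano(ms: number): string {
  return String(Math.trunc(ms)) + '000000';
}

function buildCounterPayload(
  serviceName: string,
  metricName: string,
  value: number,
  attrs: { [name: string]: string },
  startMs: number,
  nowMs: number
): OtlpMetricsPayload {
  return {
    resourceMetrics: [
      {
        resource: { attributes: toKeyValues({ 'service.name': serviceName }) },
        scopeMetrics: [
          {
            scope: { name: 'kibana.poc' }, // hypothetical scope name
            metrics: [
              {
                name: metricName,
                unit: '1',
                sum: {
                  dataPoints: [
                    {
                      attributes: toKeyValues(attrs),
                      startTimeUnixNano: msToUnixNano(startMs),
                      timeUnixNano: msToUnixNano(nowMs),
                      asDouble: value,
                    },
                  ],
                  aggregationTemporality: AGGREGATION_TEMPORALITY_CUMULATIVE,
                  isMonotonic: true,
                },
              },
            ],
          },
        ],
      },
    ],
  };
}
```

Under OTLP/HTTP, a body like this would be posted to the receiver's `/v1/metrics` path. Compared with the pull model, the application needs to know the collector's address, but no inbound endpoint has to be exposed.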

Some consumers to keep in mind (see internal companion issue):

  • Stack Monitoring
  • High Level Health API
  • APM instrumentation of stack
  • Telemetry (Event based telemetry) - could maybe leave this as its own entity; the above are more critical to align

Steps

AC: Recording of PoC as walkthrough

Labels

  • Feature:Stack Monitoring
  • Platform Observability (Platform Observability WG issues https://github.com/elastic/observability-dev/issues/2055)
  • Team:Infra Monitoring UI - DEPRECATED (DEPRECATED - Label for the Infra Monitoring UI team. Use Team:obs-ux-infra_services)
