Kubernetes operator that alerts based on Elasticsearch log queries

SearchRuler

🔥 Meet searchruler: The Log Alerting Engine You Didn’t Know You Needed!


Ever wished Prometheus Ruler had a cool cousin for log searches? Say hello to searchruler! This Kubernetes operator lets you define, run, and manage log search rules (alerts) for platforms like Elasticsearch or Opensearch—all from the comfort of your K8s cluster. 🚀

Think of it as the rule engine your logs have been craving.

And here’s the best part: defining alerts with searchruler is totally free (ehem, ehem…) and, yes, everything is as code! You get to send webhook notifications wherever you want, just like Alertmanager. Flexibility, power, and no sneaky fees.

Your logs are about to get a whole lot smarter. 💡

Motivation

🕵️‍♂️ Say Goodbye to Expensive Log Alert Subscriptions!

Tired of shelling out big bucks for premium log alerting features? You know the drill: Want to set up rules or get notified? Pay up. Want to avoid endless click, click in a fancy UI? Too bad.

Well, no more! searchruler is here to save the day. This Kubernetes operator lets you define connectors, webhooks, rules, and alerts—right in your own cluster. And the best part? It’s free and code-driven! Finally, you can version-control your alerts like a pro. 🎉

🛠️ How It Works

Setting up searchruler is a breeze. Here are the three main building blocks that’ll make your log life so much easier:

  • 🔗 QueryConnector: This is where the magic starts. Connect to your log source—whether it’s Elasticsearch, Opensearch, or something cool we’re cooking up for the future.

  • 🚀 RulerAction: When a rule is triggered, where should the alert go? Set up webhooks, Slack channels, or anything else you need. We keep it simple, starting with a generic webhook (because everyone loves webhooks).

  • 📜 SearchRule: The heart of it all! Define your rules, set the conditions, and craft the message to send when something’s off. This is where you turn log data into actionable alerts.

🎉 Ready to Rule Your Logs?

No more hidden fees. No more manual clicks. Just pure, versioned, code-driven log alerting—right in Kubernetes. 🚀

Deployment

We have designed this project to be deployed remotely using Kustomize or Helm. This makes it possible to follow a GitOps approach, using tools such as ArgoCD or FluxCD.

If you prefer Kustomize, just make a Kustomization manifest referencing the tag of the version you want to deploy as follows:

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- https://github.com/prosimcorp/searchruler/releases/download/v0.1.0/install.yaml

🧚🏼 Hey, listen! If you prefer to deploy using Helm, go to the Helm registry

Flags

Some configuration parameters can be defined by flags passed to the controller. They are described in the following table:

| Name | Description | Default |
|------|-------------|---------|
| `--metrics-bind-address` | The address the metrics endpoint binds to. `0` disables the server | `0` |
| `--health-probe-bind-address` | The address the probe endpoint binds to | `:8081` |
| `--leader-elect` | Enable leader election for the controller manager | `false` |
| `--metrics-secure` | If set, the metrics endpoint is served securely | `false` |
| `--enable-http2` | If set, HTTP/2 will be enabled for the metrics server | `false` |
| `--webserver-address` | Webserver listen address. `0` disables the webserver | `0` |
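If you deploy with Kustomize, one way to set these flags is a patch over the controller Deployment. Here is a sketch; the Deployment name and container layout below are assumptions, so check the actual manifests in install.yaml before using it:

```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - https://github.com/prosimcorp/searchruler/releases/download/v0.1.0/install.yaml
patches:
  # Append an extra flag to the manager container's args
  - patch: |-
      - op: add
        path: /spec/template/spec/containers/0/args/-
        value: --leader-elect
    target:
      kind: Deployment
      name: searchruler-controller-manager  # assumed name, verify in install.yaml
```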

Examples

After deploying this operator, you will have new resources available. Let's talk about them.

Tip

You can find the spec samples for all the versions of the resource in the examples directory

🔗 QueryConnector

A QueryConnector is where it all starts! It defines the "source" where your log search rules (defined in SearchRules) will run. Right now, searchruler supports Elasticsearch-like sources, but we’re cooking up more integrations—stay tuned! 👀

Here’s a quick example to show you how it works:

apiVersion: searchruler.prosimcorp.com/v1alpha1
kind: QueryConnector
metadata:
  labels:
    app.kubernetes.io/name: search-ruler
    app.kubernetes.io/managed-by: kustomize
  name: queryconnector-sample
spec:

  # URL for the query connector. We will execute the queries in this URL
  url: "https://127.0.0.1:9200"

  # Additional headers if needed for the connection
  headers: {}

  # Skip certificate verification if the connection is HTTPS
  tlsSkipVerify: true

  # Secret reference to get the credentials if needed for the connection
  credentials:

    # Interval to check secret credentials for any changes
    # Default value is 1m
    #syncInterval: 1m

    secretRef:
      name: elasticsearch-main-credentials
      keyUsername: username
      keyPassword: password
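The credentials block above references a standard Kubernetes Secret; keyUsername and keyPassword point to the keys inside it. A minimal sketch of such a Secret (the values here are placeholders):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: elasticsearch-main-credentials
type: Opaque
stringData:
  # Key names must match keyUsername/keyPassword in the QueryConnector
  username: elastic
  password: changeme
```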

🚀 RulerAction

A RulerAction defines where your alerts will be sent when a SearchRule is triggered (a.k.a. "firing"). Whether it’s a Slack channel, a webhook endpoint, Alertmanager, or another notification service—you’re in control! 🛠️

Here’s a quick example:

apiVersion: searchruler.prosimcorp.com/v1alpha1
kind: RulerAction
metadata:
  labels:
    app.kubernetes.io/name: search-ruler
    app.kubernetes.io/managed-by: kustomize
  name: ruleraction-sample
spec:

  # Webhook integration configuration to send alerts.
  # Note: the webhook integration is the only one implemented so far.
  webhook:

    # URL to send the webhook message
    url: http://127.0.0.1:8080

    # HTTP method to send the webhook message
    verb: POST

    # Skip certificate verification if the connection is HTTPS
    tlsSkipVerify: false

    # Additional headers if needed for the connection
    headers: {}

    # Validator configuration to validate the response of the webhook.
    # Only the alertmanager validator is available so far.
    # If you use alertmanager validator, message data must be in alertmanager format:
    # https://prometheus.io/docs/alerting/latest/clients/
    # validator: alertmanager

    # Credentials to authenticate in the webhook if needed
    # credentials:   
    #   secretRef:
    #     name: alertmanager-credentials
    #     keyUsername: username
    #     keyPassword: password

📜 SearchRule

This is where the magic happens! SearchRules define the conditions to check in your log sources (via queryconnectors) and specify where to send alerts (using ruleractions). You get to decide what matters and how to act on it. 🎯

Here are two quick examples to show you what’s possible:

1️⃣ Simple Match Count Alert. Trigger an alert when the number of matching documents exceeds a threshold:

apiVersion: searchruler.prosimcorp.com/v1alpha1
kind: SearchRule
metadata:
  labels:
    app.kubernetes.io/name: search-ruler
    app.kubernetes.io/managed-by: kustomize
  name: searchrule-sample
spec:

  # Description for the Rule. It is not used in the rule execution, but is useful for the
  # message template in the RuleAction.
  description: "Alert when there is a high error rate in the application."

  # QueryConnector reference to execute the queries for the rule evaluation.
  queryConnectorRef:
    name: queryconnector-sample

  # Interval time for checking the value of the query. For example, every 30s we will
  # execute the query value to elasticsearch
  checkInterval: 30s

  # Elasticsearch configuration for the query execution.
  # Only Elasticsearch is implemented so far.
  elasticsearch:

    # Index, index pattern or alias where the query will be executed
    # It will be appended to <URL>/<index>/_search endpoint
    index: "kibana_sample_data_logs"

    # Elasticsearch query to execute.
    # Normally it is a JSON query, but we are using YAML format for the manifest ;D
    # so please, transform your JSON query to YAML in the manifest.
    # This option will execute the query: {"_source": [""], "query": { "bool": { "must": [ { "range": { "response": { "gte": 499 } } } ] } } }
    query:
      _source: [""]
      query: 
        bool:
          must:
            - range:
                response:
                  gte: 499

    # Okay, if you don't like YAML format, you can use the queryJSON field to put the JSON query
    # directly in the manifest. It will be parsed to the query field. But, if you use both fields,
    # the operator will fail.
    # queryJSON: >
    #   {
    #     "_source": [""],
    #     "query": {
    #       "bool": {
    #         "must": [
    #           {
    #             "range": {
    #               "response": {
    #                 "gte": 499
    #               }
    #             }
    #           }
    #         ]
    #       }
    #     }
    #   }

    # Response JSON field to watch for the condition check. Each query to Elasticsearch
    # returns a JSON response like:
    # { "hits": { "total": { "value": 100 }, "hits": [ ... ] } }
    # hits.total.value checks the total hits of the query.
    # Under the hood, searchruler uses GJSON to resolve this conditionField, so if you
    # want to get a value from an array you can use expressions like
    # aggregations.hosts.buckets.#.total_response_time.value@values|#(>100)
    conditionField: "hits.total.value"

  # Condition for the rule evaluation. It will check the conditionField value with the
  # operator and threshold. If the condition is true, the RuleAction will be executed.
  condition:
    # Available options: greaterThan, greaterThanOrEqual, lessThan, lessThanOrEqual or equal
    operator: "greaterThan"
    # Threshold value to check the condition
    threshold: "100"
    # Time window to check the condition. For example, if the condition is greaterThan 100 for 1m
    for: "1m"

  # RuleAction reference to execute when the condition is true.
  actionRef:
    name: ruleraction-sample
    # Message template to send in the RuleAction execution. It is a Go template that
    # receives the object and value variables and, when the query has aggregations,
    # an aggregations variable. The object variable is the SearchRule object and the
    # value variable is the value of the conditionField.

    # If the RulerAction is an Alertmanager webhook, the message must be in Alertmanager format:
    # https://prometheus.io/docs/alerting/latest/clients/
    data: |
      {{- $object := .object -}}
      {{- $value := .value -}}
      {{ printf "Hi, I'm on fire!" }}
      {{ printf "Name: %s" $object.Name }}
      {{ printf "Description: %s" $object.Spec.Description }}
      {{ printf "Current value: %v" $value }}
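The condition above can be read as: every checkInterval, run the query, extract conditionField, and compare it against threshold with the given operator; the rule fires when the comparison holds for the `for` window. The comparison step can be sketched like this (an illustrative reimplementation, not searchruler's actual code; note the threshold is a string in the manifest, hence the parsing):

```go
package main

import (
	"fmt"
	"strconv"
)

// evaluate mimics the condition check: it compares the value extracted
// from conditionField against the threshold using the given operator.
// The operators mirror the manifest options: greaterThan,
// greaterThanOrEqual, lessThan, lessThanOrEqual, equal.
func evaluate(operator string, value float64, threshold string) (bool, error) {
	t, err := strconv.ParseFloat(threshold, 64)
	if err != nil {
		return false, fmt.Errorf("invalid threshold %q: %w", threshold, err)
	}
	switch operator {
	case "greaterThan":
		return value > t, nil
	case "greaterThanOrEqual":
		return value >= t, nil
	case "lessThan":
		return value < t, nil
	case "lessThanOrEqual":
		return value <= t, nil
	case "equal":
		return value == t, nil
	default:
		return false, fmt.Errorf("unknown operator %q", operator)
	}
}

func main() {
	// e.g. hits.total.value = 150, operator greaterThan, threshold "100"
	firing, err := evaluate("greaterThan", 150, "100")
	if err != nil {
		panic(err)
	}
	fmt.Println("firing:", firing)
}
```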

Tip

Under the hood, searchruler resolves the conditionField field with the GJSON library, so you can use any GJSON expression to query the JSON responses returned by Elasticsearch. There is a handy debugger here --> https://gjson.dev/

Important

For the moment, conditionField MUST return a single value (integer or float); it is not prepared for array elements. But it's just for the moment, we are working hard to implement it :D

📩 Customizing Alert Messages for Alertmanager

In the actionRef.data field, you define the message that gets sent to your webhook. If your webhook is Alertmanager, you'll need to structure the message according to Alertmanager's format. Plus, you can enable the validator in the RulerAction to ensure everything’s correctly formatted.

Here’s an example to show how to configure an Alertmanager-compatible message:

  # RuleAction reference to execute when the condition is true.
  actionRef:
    name: ruleraction-sample
    # Message template to send in the RuleAction execution. It is a Go template that
    # receives the object and value variables and, when the query has aggregations,
    # an aggregations variable. The object variable is the SearchRule object and the
    # value variable is the value of the conditionField.

    # If the RulerAction is an Alertmanager webhook, the message must be in Alertmanager format:
    # https://prometheus.io/docs/alerting/latest/clients/
    data: |
        {{- $now := now | date "2006-01-02T15:04:05Z07:00" }}
        {{- $object := .object -}}
        {{- $value := .value -}}
        {{- $aggregations := .aggregations -}}

        {{- $alertList := list }}

        {{- $bucketsStr := "" }}
        {{- range .aggregations.hosts.buckets }}
          {{- $bucketInfo := printf "Number of documents -> %v, Response time -> %v" .doc_count .total_response_time.value }}
          {{- $bucketsStr = printf "%s\n%s" $bucketsStr $bucketInfo }}
        {{- end }}

        {{- $description := printf `

        Description: %s
        Value: %v
        Aggregations values: %s

        -------------------------------
        Name: %s
        Namespace: %s
        -------------------------------
        ` .object.Spec.Description .value $bucketsStr .object.Name .object.Namespace }}

        {{- $description = ((regexReplaceAll "(?m)^[ \\t]+" $description "") | trim) }}

        {{- $annotations := dict
        "sent_by" "searchruler"
        "summary" "There are rules firing"
        "description" $description }}

        {{- $labels := dict
        "alertname" .object.Name
        "namespace" .object.Namespace
        "name" .object.Name
        "severity" "warning"
        "type" "searchruler-alert" }}

        {{- $alert := dict "startsAt" $now "annotations" $annotations "labels" $labels "generatorURL" "string-placeholder" }}
        {{- $alertList = append $alertList $alert }}

        {{- $alertJson := toJson $alertList }}
        {{- $alertJson }}

Tip

🔍 Why Use This?: By customizing the alert message to fit Alertmanager’s structure, you ensure seamless integration and make sure your alerts get delivered exactly the way you need. Plus, with validation enabled, you won’t have to worry about formatting errors—everything’s checked before it’s sent! 🚀

2️⃣ Average Field Value Alert. Alert if the average value of a field exceeds a limit (e.g., high response times):

apiVersion: searchruler.prosimcorp.com/v1alpha1
kind: SearchRule
metadata:
  labels:
    app.kubernetes.io/name: search-ruler
    app.kubernetes.io/managed-by: kustomize
  name: searchrule-sample
spec:

  # Description for the Rule. It is not used in the rule execution, but is useful for the
  # message template in the RuleAction.
  description: "Alert when there is high latency in the application."

  # QueryConnector reference to execute the queries for the rule evaluation.
  queryConnectorRef:
    name: queryconnector-sample

  # Interval time for checking the value of the query. For example, every 30s we will
  # execute the query value to elasticsearch
  checkInterval: 30s

  # Elasticsearch configuration for the query execution.
  # Only Elasticsearch is implemented so far.
  elasticsearch:

    # Index, index pattern or alias where the query will be executed
    # It will be appended to <URL>/<index>/_search endpoint
    index: "kibana_sample_data_logs"

    # Another example for queries with aggregations
    query:
      _source: [""]
      query:
        bool:
          must:
            - range:
                timestamp:
                  gte: "now-5m/m"
                  lte: "now/m"
      aggs:
        average_response_time:
          avg:
            field: "upstream_response_time_f"
    conditionField: "aggregations.average_response_time.value"

  # Condition for the rule evaluation. It will check the conditionField value with the
  # operator and threshold. If the condition is true, the RuleAction will be executed.
  condition:
    # Available options: greaterThan, greaterThanOrEqual, lessThan, lessThanOrEqual or equal
    operator: "greaterThan"
    # Threshold value to check the condition
    threshold: "5"
    # Time window to check the condition. For example, if the condition is greaterThan 100 for 1m
    for: "1m"

  # RuleAction reference to execute when the condition is true.
  actionRef:
    name: ruleraction-sample
    # Message template to send in the RuleAction execution. It is a Go template that
    # receives the object and value variables and, when the query has aggregations,
    # an aggregations variable. The object variable is the SearchRule object and the
    # value variable is the value of the conditionField.

    # If the RulerAction is an Alertmanager webhook, the message must be in Alertmanager format:
    # https://prometheus.io/docs/alerting/latest/clients/
    data: |
      {{- $object := .object -}}
      {{- $value := .value -}}
      {{ printf "Hi, I'm on fire!" }}
      {{ printf "Name: %s" $object.Name }}
      {{ printf "Description: %s" $object.Spec.Description }}
      {{ printf "Current value: %v" $value }}

Templating engine

❤️ Special mention to Notifik

What you can use

In the actionRef.data field you can use everything you already know from Helm templates.

How to use collected data

When a rule is firing, the data field is the message that the RulerAction sends to the webhook. You can access several variables when crafting the message template:

  • .object: The SearchRule manifest.
  • .value: The value of the query that triggered the alert.
  • .aggregations: The Elasticsearch aggregations response, if present. We transform the JSON response of Elasticsearch into a structure that can be queried in your template. For example, for queries with aggregations, the value of this field will look like:
    aggregationName:
      doc_count_error_upper_bound: 2
      sum_other_doc_count: 120
      buckets:
      - key: key1
        doc_count: 100
      - key: key2
        doc_count: 200

This means that the objects can be accessed or stored in variables in the following way:

apiVersion: searchruler.prosimcorp.com/v1alpha1
kind: SearchRule
metadata:
  name: searchrule-sample-simple
spec:
  .
  .
  .
  actionRef:
    name: ruleraction-sample
    data: |
      {{- $object := .object -}}
      {{- $value := .value -}}    
      {{- $aggregationValues := .aggregations.aggregationName.buckets }}
      {{ printf "Name: %s" $object.Name }}
      {{ printf "Description: %s" $object.Spec.Description }}
      {{ printf "Current value: %v" $value }}
      {{- range $aggregationValues }}
      {{ printf "%s: %v" .key .doc_count }}
      {{- end }}
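Since these are plain Go templates, you can experiment with them locally using Go's text/template and mock data shaped like the variables searchruler injects. A sketch (searchruler additionally exposes the Sprig function set, like Helm; this example uses only built-ins, and "aggregationName" stands in for whatever name your query gives the aggregation):

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
	"text/template"
)

// render executes a template like the example above against mock data.
func render() (string, error) {
	// Mock of the variables searchruler injects into the template.
	data := map[string]any{
		"object": map[string]any{
			"Name": "searchrule-sample-simple",
			"Spec": map[string]any{"Description": "High error rate"},
		},
		"value": 150,
		"aggregations": map[string]any{
			"aggregationName": map[string]any{
				"buckets": []map[string]any{
					{"key": "key1", "doc_count": 100},
					{"key": "key2", "doc_count": 200},
				},
			},
		},
	}

	tmpl, err := template.New("data").Parse(`{{- $object := .object -}}
{{- $value := .value -}}
{{ printf "Name: %s" $object.Name }}
{{ printf "Description: %s" $object.Spec.Description }}
{{ printf "Current value: %v" $value }}
{{- range .aggregations.aggregationName.buckets }}
{{ printf "%s: %v" .key .doc_count }}
{{- end }}
`)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, data); err != nil {
		return "", err
	}
	return buf.String(), nil
}

// contains is a small helper for checking the rendered output.
func contains(s, sub string) bool { return strings.Contains(s, sub) }

func main() {
	out, err := render()
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```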

Remember: with great power comes great responsibility. You can even capture the whole template context in a single variable:

{{- $source := . -}}

How to debug

Templating issues are reported in the controller logs, but you can also see the SearchRule resource in the EvaluateTemplateError state if there is any error evaluating the template.

To debug templates easily, we recommend using helm-playground. You can create a template on the left side, put your manifests in the middle, and the result is shown on the right side.

How to develop

Prerequisites

  • Kubebuilder v4.0.0+
  • go version v1.22.0+
  • docker version 17.03+.
  • kubectl version v1.11.3+.
  • Access to a Kubernetes v1.11.3+ cluster.

The process

We recommend using a development tool like Kind or Minikube to launch a lightweight Kubernetes cluster on your local machine for development purposes.

For learning purposes, we will assume you are going to use Kind. The first step is to create a Kubernetes cluster on your local machine by executing the following command:

kind create cluster

Once you have launched a safe playground, execute the following command. It will install the custom resource definitions (CRDs) into the cluster configured in your ~/.kube/config file and run the controller locally against the cluster:

make install run

If you would like to test the operator against some resources, our examples can be applied to see the result in your Kind cluster:

kubectl apply -k config/samples/

Remember that your kubectl should be pointing to your Kind cluster. In any case, always review the context your kubectl CLI is pointing to.

How releases are created

Each release of this operator is created following several steps carefully in order not to break things for anyone. Reliability is important to us, so we have automated the whole release process. For a better understanding of the process, the steps are described in the following recipe:

  1. Test the changes on the code:

    make test

    A release is not done if this stage fails

  2. Define the package information

    export VERSION="0.0.1"
    export IMG="ghcr.io/prosimcorp/searchruler:v$VERSION"
  3. Generate and push the Docker image (published on GitHub Container Registry).

    make docker-build docker-push
  4. Generate the manifests for deployments using Kustomize

     make build-installer

How to collaborate

This project is built on top of Kubebuilder, so read about that project before collaborating. Of course, we are open to external collaborations on this project. To do so, you must fork the repository, make your changes to the code, and open a PR. The code will be reviewed and tested (always).

We are developers and hate bad code. For that reason, we ask for the highest quality in each line of code to improve this project with each iteration.

License

Copyright 2022.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.