Ever wished Prometheus Ruler had a cool cousin for log searches? Say hello to searchruler! This Kubernetes operator lets you define, run, and manage log search rules (alerts) for platforms like Elasticsearch or Opensearch—all from the comfort of your K8s cluster. 🚀
Think of it as the rule engine your logs have been craving.
And here’s the best part: defining alerts with searchruler is totally free (ehem, ehem…) and, yes, everything is as code! You get to send webhook notifications wherever you want, just like Alertmanager. Flexibility, power, and no sneaky fees.
Your logs are about to get a whole lot smarter. 💡
Tired of shelling out big bucks for premium log alerting features? You know the drill: Want to set up rules or get notified? Pay up. Want to avoid endless click, click in a fancy UI? Too bad.
Well, no more! searchruler is here to save the day. This Kubernetes operator lets you define connectors, webhooks, rules, and alerts—right in your own cluster. And the best part? It’s free and code-driven! Finally, you can version-control your alerts like a pro. 🎉
Setting up searchruler is a breeze. Here are the three main building blocks that’ll make your log life so much easier:
- 🔗 QueryConnector: This is where the magic starts. Connect to your log source—whether it’s Elasticsearch, Opensearch, or something cool we’re cooking up for the future.
- 🚀 RulerAction: When a rule is triggered, where should the alert go? Set up webhooks, Slack channels, or anything else you need. We keep it simple, starting with a generic webhook (because everyone loves webhooks).
- 📜 SearchRule: The heart of it all! Define your rules, set the conditions, and craft the message to send when something’s off. This is where you turn log data into actionable alerts.
No more hidden fees. No more manual clicks. Just pure, versioned, code-driven log alerting—right in Kubernetes. 🚀
We have designed this project to be deployed remotely using Kustomize or Helm. This makes it possible to use a GitOps approach with tools such as ArgoCD or FluxCD.
If you prefer Kustomize, just make a Kustomization manifest referencing the tag of the version you want to deploy as follows:
```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - https://github.com/prosimcorp/searchruler/releases/download/v0.1.0/install.yaml
```
🧚🏼 Hey, listen! If you prefer to deploy using Helm, go to the Helm registry
Some configuration parameters can be defined by flags passed to the controller. They are described in the following table:

| Name | Description | Default |
|---|---|---|
| `--metrics-bind-address` | The address the metrics endpoint binds to. `0` disables the server | `0` |
| `--health-probe-bind-address` | The address the probe endpoint binds to | `:8081` |
| `--leader-elect` | Enable leader election for controller manager | `false` |
| `--metrics-secure` | If set, the metrics endpoint is served securely | `false` |
| `--enable-http2` | If set, HTTP/2 will be enabled for the metrics server | `false` |
| `--webserver-address` | Webserver listen address. `0` disables the webserver | `0` |
After deploying this operator, you will have new resources available. Let's talk about them.
Tip
You can find the spec samples for all the versions of the resource in the examples directory
A QueryConnector is where it all starts! It defines the "source" where your log search rules (defined in SearchRules) will run. Right now, searchruler supports Elasticsearch-like sources, but we’re cooking up more integrations—stay tuned! 👀
Here’s a quick example to show you how it works:
```yaml
apiVersion: searchruler.prosimcorp.com/v1alpha1
kind: QueryConnector
metadata:
  labels:
    app.kubernetes.io/name: search-ruler
    app.kubernetes.io/managed-by: kustomize
  name: queryconnector-sample
spec:
  # URL for the query connector. We will execute the queries against this URL
  url: "https://127.0.0.1:9200"
  # Additional headers if needed for the connection
  headers: {}
  # Skip certificate verification if the connection is HTTPS
  tlsSkipVerify: true
  # Secret reference to get the credentials if needed for the connection
  credentials:
    # Interval to check secret credentials for any changes
    # Default value is 1m
    #syncInterval: 1m
    secretRef:
      name: elasticsearch-main-credentials
      keyUsername: username
      keyPassword: password
```
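Under the hood, a connector like this boils down to an authenticated HTTP client. As a rough illustrative sketch (not the operator's actual Go code), here is how the username/password pair from the referenced secret typically becomes a Basic auth header attached to every query request:

```python
import base64

def build_auth_header(username: str, password: str) -> dict:
    """Build the HTTP Basic auth header a connector would attach to each query."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return {"Authorization": f"Basic {token}"}

# Hypothetical credentials, as would be read from the referenced Kubernetes secret
headers = build_auth_header("elastic", "changeme")
print(headers["Authorization"])
```

The `syncInterval` field exists precisely because these credentials are re-read periodically: if the secret is rotated, the header is rebuilt without restarting the operator.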
A RulerAction defines where your alerts will be sent when a SearchRule is triggered (a.k.a. "firing"). Whether it’s a Slack channel, a webhook endpoint, Alertmanager or another notification service—you’re in control! 🛠️
Here’s a quick example:
```yaml
apiVersion: searchruler.prosimcorp.com/v1alpha1
kind: RulerAction
metadata:
  labels:
    app.kubernetes.io/name: search-ruler
    app.kubernetes.io/managed-by: kustomize
  name: ruleraction-sample
spec:
  # Webhook integration configuration to send alerts.
  # Note: the webhook integration is the only one implemented yet.
  webhook:
    # URL to send the webhook message to
    url: http://127.0.0.1:8080
    # HTTP method to use for the webhook message
    verb: POST
    # Skip certificate verification if the connection is HTTPS
    tlsSkipVerify: false
    # Additional headers if needed for the connection
    headers: {}
    # Validator configuration to validate the response of the webhook.
    # Only the alertmanager validator is available yet.
    # If you use the alertmanager validator, message data must be in Alertmanager format:
    # https://prometheus.io/docs/alerting/latest/clients/
    # validator: alertmanager
    # Credentials to authenticate against the webhook if needed
    # credentials:
    #   secretRef:
    #     name: alertmanager-credentials
    #     keyUsername: username
    #     keyPassword: password
```
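Conceptually, firing this action is just assembling an HTTP request from the manifest's `url`, `verb` and `headers` fields and sending the rendered message as the body. A minimal Python sketch of that assembly (illustrative only; the operator itself is written in Go):

```python
import json
import urllib.request

def build_webhook_request(url: str, verb: str, headers: dict, message: str) -> urllib.request.Request:
    """Assemble the HTTP request a RulerAction-style webhook sender would fire."""
    return urllib.request.Request(
        url,
        data=message.encode(),          # the rendered "data" template becomes the body
        method=verb,                    # HTTP method from the "verb" field
        headers={"Content-Type": "application/json", **headers},
    )

# Mirror the sample manifest: POST to the local endpoint, no extra headers
req = build_webhook_request("http://127.0.0.1:8080", "POST", {}, json.dumps({"text": "rule firing"}))
print(req.get_method(), req.full_url)
```

Sending it would be a single `urllib.request.urlopen(req)` call; `tlsSkipVerify` would map to a relaxed SSL context on HTTPS endpoints.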
This is where the magic happens! SearchRules define the conditions to check in your log sources (via queryconnectors) and specify where to send alerts (using ruleractions). You get to decide what matters and how to act on it. 🎯
Here are two quick examples to show you what’s possible:
1️⃣ Simple Match Count Alert. Trigger an alert when the number of matching documents exceeds a threshold:
```yaml
apiVersion: searchruler.prosimcorp.com/v1alpha1
kind: SearchRule
metadata:
  labels:
    app.kubernetes.io/name: search-ruler
    app.kubernetes.io/managed-by: kustomize
  name: searchrule-sample
spec:
  # Description for the rule. It is not used in the rule execution, but it is useful for the
  # message template in the RulerAction.
  description: "Alert when there is a high error rate in the application."
  # QueryConnector reference used to execute the queries for the rule evaluation.
  queryConnectorRef:
    name: queryconnector-sample
  # Interval for checking the value of the query. For example, every 30s we will
  # execute the query against Elasticsearch
  checkInterval: 30s
  # Elasticsearch configuration for the query execution.
  # Only elasticsearch is implemented yet.
  elasticsearch:
    # Index, index pattern or alias where the query will be executed.
    # It will be appended to the <URL>/<index>/_search endpoint
    index: "kibana_sample_data_logs"
    # Elasticsearch query to execute.
    # Normally it is a JSON query, but we are using YAML format for the manifest ;D
    # so please, transform your JSON query to YAML in the manifest.
    # This option will execute the query: {"_source": [""], "query": { "bool": { "must": [ { "range": { "response": { "gte": 499 } } } ] } } }
    query:
      _source: [""]
      query:
        bool:
          must:
            - range:
                response:
                  gte: 499
    # Okay, if you don't like the YAML format, you can use the queryJSON field to put the JSON query
    # directly in the manifest. It will be parsed into the query field. But if you use both fields,
    # the operator will fail.
    # queryJSON: >
    #   {
    #     "_source": [""],
    #     "query": {
    #       "bool": {
    #         "must": [
    #           {
    #             "range": {
    #               "response": {
    #                 "gte": 499
    #               }
    #             }
    #           }
    #         ]
    #       }
    #     }
    #   }
    # Response JSON field to watch for the condition check. Each query to Elasticsearch
    # returns a JSON response like:
    # { "hits": { "total": { "value": 100 }, "hits": [ ... ] } }
    # hits.total.value checks the total hits of the query.
    # Under the hood, searchruler uses GJSON to resolve this conditionField, so if you
    # want to get a value from an array you can use expressions like
    # aggregations.hosts.buckets.#.total_response_time.value@values|#(>100)
    conditionField: "hits.total.value"
  # Condition for the rule evaluation. It will check the conditionField value against the
  # operator and threshold. If the condition is true, the RulerAction will be executed.
  condition:
    # Available options: greaterThan, greaterThanOrEqual, lessThan, lessThanOrEqual or equal
    operator: "greaterThan"
    # Threshold value to check the condition against
    threshold: "100"
    # Time window the condition must hold. For example: the condition is greaterThan 100 for 1m
    for: "1m"
  # RulerAction reference to execute when the condition is true.
  actionRef:
    name: ruleraction-sample
    # Message template to send in the RulerAction execution. It is a Go template with the
    # object, value and, if it exists, elasticsearch aggregations field variables. The object
    # variable is the SearchRule object and the value variable is the value of the conditionField.
    # If the RulerAction is an alertmanager webhook, the message must be in Alertmanager format:
    # https://prometheus.io/docs/alerting/latest/clients/
    data: |
      {{- $object := .object -}}
      {{- $value := .value -}}
      {{ printf "Hi, I'm on fire!" }}
      {{ printf "Name: %s" $object.Name }}
      {{ printf "Description: %s" $object.Spec.Description }}
      {{ printf "Current value: %v" $value }}
```
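To make the evaluation flow concrete, here is an illustrative Python sketch (the real operator is written in Go and uses the GJSON library, which supports far richer path expressions) of how a dotted conditionField is resolved against the search response and compared with the operator/threshold pair:

```python
# Map of the condition operators supported by the manifest
OPERATORS = {
    "greaterThan": lambda v, t: v > t,
    "greaterThanOrEqual": lambda v, t: v >= t,
    "lessThan": lambda v, t: v < t,
    "lessThanOrEqual": lambda v, t: v <= t,
    "equal": lambda v, t: v == t,
}

def resolve_field(document: dict, path: str):
    """Walk a dotted path like 'hits.total.value' through a JSON document."""
    value = document
    for key in path.split("."):
        value = value[key]
    return value

def evaluate(document: dict, condition_field: str, operator: str, threshold: str) -> bool:
    """Return True when the extracted value satisfies the condition (rule fires)."""
    value = float(resolve_field(document, condition_field))
    return OPERATORS[operator](value, float(threshold))

# Simulated Elasticsearch response: 150 documents matched the query
response = {"hits": {"total": {"value": 150}, "hits": []}}
print(evaluate(response, "hits.total.value", "greaterThan", "100"))  # True: the rule fires
```

The `for` window adds one more layer on top of this: the condition must keep evaluating to true across consecutive checks for that duration before the RulerAction is actually executed.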
Tip
Under the hood, searchruler uses the GJSON library to resolve the conditionField, so you can use any GJSON expression to check the JSON responses returned by Elasticsearch. There is a handy debugger here --> https://gjson.dev/
Important
For the moment, conditionField MUST return a single value (integer or float); it is not prepared for array elements. But that's only for now—we are working hard to implement it :D
In the actionRef.data field, you define the message that gets sent to your webhook. If your webhook is Alertmanager, you'll need to structure the message according to Alertmanager's format. Plus, you can enable the validator in the RulerAction to ensure everything’s correctly formatted.
Here’s an example to show how to configure an Alertmanager-compatible message:
```yaml
# RulerAction reference to execute when the condition is true.
actionRef:
  name: ruleraction-sample
  # Message template to send in the RulerAction execution. It is a Go template with the
  # object, value and, if it exists, elasticsearch aggregations field variables. The object
  # variable is the SearchRule object and the value variable is the value of the conditionField.
  # If the RulerAction is an alertmanager webhook, the message must be in Alertmanager format:
  # https://prometheus.io/docs/alerting/latest/clients/
  data: |
    {{- $now := now | date "2006-01-02T15:04:05Z07:00" }}
    {{- $object := .object -}}
    {{- $value := .value -}}
    {{- $aggregations := .aggregations -}}
    {{- $alertList := list }}
    {{- $bucketsStr := "" }}
    {{- range .aggregations.hosts.buckets }}
      {{- $bucketInfo := printf "Number of documents -> %v, Response time -> %v" .doc_count .total_response_time.value }}
      {{- $bucketsStr = printf "%s\n%s" $bucketsStr $bucketInfo }}
    {{- end }}
    {{- $description := printf `
    Description: %s
    Value: %v
    Aggregations values: %s
    -------------------------------
    Name: %s
    Namespace: %s
    -------------------------------
    ` .object.Spec.Description .value $bucketsStr .object.Name .object.Namespace }}
    {{- $description = ((regexReplaceAll "(?m)^[ \\t]+" $description "") | trim) }}
    {{- $annotations := dict
      "sent_by" "searchruler"
      "summary" "There are rules firing"
      "description" $description }}
    {{- $labels := dict
      "alertname" .object.Name
      "namespace" .object.Namespace
      "name" .object.Name
      "severity" "warning"
      "type" "searchruler-alert" }}
    {{- $alert := dict "startsAt" $now "annotations" $annotations "labels" $labels "generatorURL" "string-placeholder" }}
    {{- $alertList = append $alertList $alert }}
    {{- $alertJson := toJson $alertList }}
    {{- $alertJson }}
```
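For reference, the JSON that a template like this ultimately emits is a list of alert objects in the Alertmanager client format (https://prometheus.io/docs/alerting/latest/clients/). Here is an illustrative Python sketch of an equivalent payload, with hypothetical name/namespace/description values:

```python
import json
from datetime import datetime, timezone

def build_alert(name: str, namespace: str, description: str) -> dict:
    """Build one alert object in the Alertmanager client format."""
    return {
        "startsAt": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "annotations": {
            "sent_by": "searchruler",
            "summary": "There are rules firing",
            "description": description,
        },
        "labels": {
            "alertname": name,
            "namespace": namespace,
            "name": name,
            "severity": "warning",
            "type": "searchruler-alert",
        },
        "generatorURL": "string-placeholder",
    }

# Alertmanager's alert-ingestion endpoint expects a JSON *list* of alerts,
# which is why the template above appends the alert to $alertList before serializing
payload = json.dumps([build_alert("searchrule-sample", "default", "High error rate")])
print(payload)
```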
Tip
🔍 Why Use This?: By customizing the alert message to fit Alertmanager’s structure, you ensure seamless integration and make sure your alerts get delivered exactly the way you need. Plus, with validation enabled, you won’t have to worry about formatting errors—everything’s checked before it’s sent! 🚀
2️⃣ Average Field Value Alert. Alert if the average value of a field exceeds a limit (e.g., high response times):
```yaml
apiVersion: searchruler.prosimcorp.com/v1alpha1
kind: SearchRule
metadata:
  labels:
    app.kubernetes.io/name: search-ruler
    app.kubernetes.io/managed-by: kustomize
  name: searchrule-sample
spec:
  # Description for the rule. It is not used in the rule execution, but it is useful for the
  # message template in the RulerAction.
  description: "Alert when there is high latency in the application."
  # QueryConnector reference used to execute the queries for the rule evaluation.
  queryConnectorRef:
    name: queryconnector-sample
  # Interval for checking the value of the query. For example, every 30s we will
  # execute the query against Elasticsearch
  checkInterval: 30s
  # Elasticsearch configuration for the query execution.
  # Only elasticsearch is implemented yet.
  elasticsearch:
    # Index, index pattern or alias where the query will be executed.
    # It will be appended to the <URL>/<index>/_search endpoint
    index: "kibana_sample_data_logs"
    # Another example, this time a query with aggregations
    query:
      _source: [""]
      query:
        bool:
          must:
            - range:
                timestamp:
                  gte: "now-5m/m"
                  lte: "now/m"
      aggs:
        average_response_time:
          avg:
            field: "upstream_response_time_f"
    conditionField: "aggregations.average_response_time.value"
  # Condition for the rule evaluation. It will check the conditionField value against the
  # operator and threshold. If the condition is true, the RulerAction will be executed.
  condition:
    # Available options: greaterThan, greaterThanOrEqual, lessThan, lessThanOrEqual or equal
    operator: "greaterThan"
    # Threshold value to check the condition against
    threshold: "5"
    # Time window the condition must hold. For example: the condition is greaterThan 5 for 1m
    for: "1m"
  # RulerAction reference to execute when the condition is true.
  actionRef:
    name: ruleraction-sample
    # Message template to send in the RulerAction execution. It is a Go template with the
    # object, value and, if it exists, elasticsearch aggregations field variables. The object
    # variable is the SearchRule object and the value variable is the value of the conditionField.
    # If the RulerAction is an alertmanager webhook, the message must be in Alertmanager format:
    # https://prometheus.io/docs/alerting/latest/clients/
    data: |
      {{- $object := .object -}}
      {{- $value := .value -}}
      {{ printf "Hi, I'm on fire!" }}
      {{ printf "Name: %s" $object.Name }}
      {{ printf "Description: %s" $object.Spec.Description }}
      {{ printf "Current value: %v" $value }}
```
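The avg aggregation above asks Elasticsearch to compute the mean of upstream_response_time_f over the documents matched in the five-minute window; the value then surfaces at aggregations.average_response_time.value. Conceptually it is nothing more than this, sketched in Python with hypothetical sample values:

```python
def average_response_time(samples: list) -> float:
    """What the Elasticsearch avg aggregation computes over the matched window."""
    return sum(samples) / len(samples)

# Hypothetical response times (seconds) collected in the 5-minute window
window = [0.8, 1.2, 9.5, 11.0, 7.5]
avg = average_response_time(window)
print(avg > 5)  # with threshold "5", this window would fire the rule
```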
❤️ Special mention to Notifik
In the actionRef.data field you can use everything you already know from Helm templating. When a rule is firing, the data field is the message that the RulerAction will send to the webhook. You can access several variables when crafting the message template:

- .object: The SearchRule manifest.
- .value: The value of the query that triggered the alert firing.
- .aggregations: The Elasticsearch aggregations response, if it exists. We transform the JSON response of Elasticsearch into a structure that can be queried in your template. For example, for queries with aggregations, the value of this field will look like:

```yaml
aggregationName:
  doc_count_error_upper_bound: 2
  sum_other_doc_count: 120
  buckets:
    - key: key1
      doc_count: 100
    - key: key2
      doc_count: 200
```
This means that the objects can be accessed or stored in variables in the following way:
```yaml
apiVersion: notifik.freepik.com/v1alpha1
kind: SearchRule
metadata:
  name: searchrule-sample-simple
spec:
  .
  .
  .
  actionRef:
    name: ruleraction-sample
    data: |
      {{- $object := .object -}}
      {{- $value := .value -}}
      {{- $aggregationValues := .aggregations.buckets }}
      {{ printf "Name: %s" $object.Name }}
      {{ printf "Description: %s" $object.Spec.Description }}
      {{ printf "Current value: %v" $value }}
      {{- range $aggregationValues }}
      {{ printf "%s: %v" .key .doc_count }}
      {{- end }}
```
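The range block above simply iterates over the aggregation buckets and prints one line per bucket. The same rendering logic, sketched in Python for clarity (with bucket values taken from the example aggregation structure above):

```python
def render_buckets(buckets: list) -> str:
    """Render one 'key: doc_count' line per aggregation bucket."""
    return "\n".join(f"{b['key']}: {b['doc_count']}" for b in buckets)

# Sample buckets as they would appear under .aggregations in the template context
buckets = [
    {"key": "key1", "doc_count": 100},
    {"key": "key2", "doc_count": 200},
]
print(render_buckets(buckets))  # key1: 100
                                # key2: 200
```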
Remember: with great power comes great responsibility
{{- $source := . -}}
Templating issues are thrown in the controller logs, but you can also check the State of your SearchRule: it is set to the EvaluateTemplateError state if there is any error evaluating the template.
To debug templates easily, we recommend using helm-playground. You can create a template on the left side, put your manifests in the middle, and the result is shown on the right side.
- Kubebuilder v4.0.0+
- go version v1.22.0+
- docker version 17.03+.
- kubectl version v1.11.3+.
- Access to a Kubernetes v1.11.3+ cluster.
We recommend using a development tool like Kind or Minikube to launch a lightweight Kubernetes cluster on your local machine for development purposes.
For learning purposes, we will assume you are going to use Kind. So the first step is to create a Kubernetes cluster on your local machine by executing the following command:
kind create cluster
Once you have launched a safe playground, execute the following command. It will install the custom resource definitions (CRDs) in the cluster configured in your ~/.kube/config file and run the controller locally against the cluster:
make install run
If you would like to test the operator against some resources, our examples can be applied to see the result in your Kind cluster:
kubectl apply -k config/samples/
Remember that your kubectl is pointing to your Kind cluster. However, you should always review the context your kubectl CLI is pointing to before applying resources.
Each release of this operator follows several careful steps in order not to break things for anyone. Reliability is important to us, so we have automated the whole release process. For a better understanding of the process, the steps are described in the following recipe:
- Test the changes on the code:
make test
A release is not done if this stage fails
- Define the package information:
export VERSION="0.0.1"
export IMG="ghcr.io/prosimcorp/searchruler:v$VERSION"
- Generate and push the Docker image (published on GitHub Container Registry):
make docker-build docker-push
- Generate the manifests for deployments using Kustomize:
make build-installer
This project is built on top of Kubebuilder, so read about that project before collaborating. Of course, we are open to external collaborations on this project. To do so, you must fork the repository, make your changes to the code and open a PR. The code will always be reviewed and tested.
We are developers and hate bad code. For that reason, we ask for the highest quality in each line of code to improve this project on each iteration.
Copyright 2022.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.