File Integrity Monitoring with Zabbix
We have often seen Zabbix used as a simple tool for monitoring network assets as well as Information and Communication Technology (ICT) infrastructure. While this concept is not incorrect, it is equally important to understand that with the advancement of Zabbix versions, more and more functionalities have been made available for other types of monitoring, enabling advanced data analysis and stunning visualizations through new and modern widgets in the frontend layer.

In this short blog post, we will explore some of the existing yet under-discussed features of Zabbix that contribute to the maturity of the cybersecurity discipline within organizations — a topic that is becoming increasingly critical in the corporate environment.

FIM – File Integrity Monitoring

FIM is a very common concept among information security tools, specifically in tools like SIEM/XDR (Security Information Event Management/Extended Detection and Response). The name is quite suggestive of its usability, but while some tools highlight this feature as one of their main functionalities, it is also available for those who use Zabbix – just not explicitly labeled under this name.
Here, we will approach FIM as a concept rather than just a functionality, because we aim to achieve a result – not merely a menu with a name to claim compliance while using our tool. The outcome needs to matter more than the marketing.

What should we expect from FIM?

Imagine that your servers have certain directories and/or files so critical that you cannot afford to neglect monitoring them for changes, insertions, or deletions. Additionally, these files may have owners and properties that must not be altered – otherwise, the systems that depend on them might lose the ability to read or execute their functions. This, at a minimum, is what we expect from FIM as a functionality.
To illustrate this a bit further, consider a database service like MariaDB:

# ls -lR /etc/mysql/
/etc/mysql/:
total 24
drwxr-xr-x 2 root root 4096 Jun 25 18:40 conf.d
-rwxr-xr-x 1 root root 1740 Nov 30 2023 debian-start
-rw------- 1 root root 544 Jun 25 18:43 debian.cnf
-rw-r--r-- 1 root root 1126 Nov 30 2023 mariadb.cnf
drwxr-xr-x 2 root root 4096 Sep 30 16:36 mariadb.conf.d
lrwxrwxrwx 1 root root 24 Oct 20 2020 my.cnf -> /etc/alternatives/my.cnf
-rw-r--r-- 1 root root 839 Oct 20 2020 my.cnf.fallback

/etc/mysql/conf.d:
total 8
-rw-r--r-- 1 root root 8 Oct 20 2020 mysql.cnf
-rw-r--r-- 1 root root 55 Oct 20 2020 mysqldump.cnf

/etc/mysql/mariadb.conf.d:
total 40
-rw-r--r-- 1 root root 575 Nov 30 2023 50-client.cnf
-rw-r--r-- 1 root root 231 Nov 30 2023 50-mysql-clients.cnf
-rw-r--r-- 1 root root 927 Nov 30 2023 50-mysqld_safe.cnf
-rw-r--r-- 1 root root 3795 Sep 30 16:36 50-server.cnf
-rw-r--r-- 1 root root 570 Nov 30 2023 60-galera.cnf
-rw-r--r-- 1 root root 76 Nov 8 2023 provider_bzip2.cnf
-rw-r--r-- 1 root root 72 Nov 8 2023 provider_lz4.cnf
-rw-r--r-- 1 root root 74 Nov 8 2023 provider_lzma.cnf
-rw-r--r-- 1 root root 72 Nov 8 2023 provider_lzo.cnf
-rw-r--r-- 1 root root 78 Nov 8 2023 provider_snappy.cnf

All the files, directories, and subdirectories listed above have already been configured, and the system (whatever it may be) is functioning perfectly. However, if someone suddenly decides to alter a configuration in the file /etc/mysql/mariadb.conf.d/50-server.cnf, this could be disastrous for the service. Regardless, the important thing to do is to monitor this scope and notify the relevant stakeholders so that an appropriate analysis can be conducted.

Zabbix can help with that. Let’s see how.

Zabbix and File Integrity Monitoring functions

Consider that the Zabbix agent is installed on the server to be monitored:

vfs.dir.count[/etc/mysql]

With this key, we can count the objects present within the /etc/mysql directory. Subsequently, we can create a trigger to be activated if there is any change related to the initial collection count, such as someone deleting or adding a file or directory in this location.
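For example, a trigger expression like the following fires whenever the two most recent counts differ (assuming the item lives on a host named MySQLDB, as in the trigger prototype shown later in this post):

last(/MySQLDB/vfs.dir.count[/etc/mysql],#1)<>last(/MySQLDB/vfs.dir.count[/etc/mysql],#2)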


vfs.dir.size[/etc/mysql]

With this key, we can determine the total size in bytes used by the directories and configuration files. In the future, we can create a trigger that activates when this size changes, indicating the deletion or addition of a file.

vfs.file.exists[/etc/mysql/mariadb.conf.d/50-server.cnf]

Among several important files, we may have a greater interest in some configuration files, and we can validate their existence by creating a trigger that activates when such a file ceases to exist. This will clearly indicate that something important has disappeared.

In this case, the value “1” represents “OK” for the existence of the file.

vfs.file.cksum[/etc/mysql/mariadb.conf.d/50-server.cnf,sha256]

In addition to verifying the existence of the configuration file we consider important, we need to be informed if anything in it changes. This key handles that by generating a hash in a variety of possible formats, allowing a trigger to be activated in case of a hash change, which would reflect a file modification (unfortunately, we won’t know what exactly was altered).
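A matching trigger expression could simply compare the two most recent hashes (same MySQLDB host assumption as above):

last(/MySQLDB/vfs.file.cksum[/etc/mysql/mariadb.conf.d/50-server.cnf,sha256],#1)<>last(/MySQLDB/vfs.file.cksum[/etc/mysql/mariadb.conf.d/50-server.cnf,sha256],#2)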

vfs.file.regmatch[/etc/mysql/mariadb.conf.d/50-server.cnf,^max_connections\s+=\s+(\d+)]

We might have a specific parameter of interest – for example, the maximum number of connections allowed to the database. Monitoring this is important because if the configuration is set to the default value, it means that no “tuning” has been applied to the database. Alternatively, it could mean that someone simply deleted or commented out this line, causing it to be ignored by the system. Therefore, verifying whether the parameter exists and is properly configured is crucial.

In this case, the value “1” indicates that the regular expression was successfully found, meaning that the configuration or parameter we need to exist is indeed present.

vfs.file.regexp[/etc/mysql/mariadb.conf.d/50-server.cnf,^max_connections\s+=\s+(\d+),,,,\1]

Beyond verifying the existence and integrity of the file, it is also possible to determine what was changed within it. However, we would need to specify the configuration of interest using a regular expression. For example, considering that the maximum number of connections allowed by the database system is “x,” we can be alerted by a trigger if it changes to “y,” “z,” or any other value different from “x.” This setup allows us to monitor the parameter of interest with precision. This logic can be applied to any other parameter you consider important. Of course, there is another way to automate this process, but we will not cover that automation here.

In this case, the parameter defining the maximum number of connections is not only present, but we also know the exact number of connections. This way, we will have a history of the applied parameterization in case it is changed at any point.

vfs.file.owner[/etc/mysql/mariadb.conf.d/50-server.cnf]

vfs.file.owner[/etc/mysql/mariadb.conf.d/50-server.cnf,group]

The two keys above allow us to determine the owner of a file and (in the case of a Linux system) the owning group. We can also choose to monitor the user’s name or their UID in the system. Naturally, a trigger can be activated to alert us in case of an ownership change, indicating that someone might be “taking over” an important file in the system.

vfs.file.permissions[/etc/mysql/mariadb.conf.d/50-server.cnf]

The key above allows us to determine a file’s permissions—read, write, read and write, execution, or a special permission bit. Naturally, a trigger can be activated to alert us if there is any permission change in the file.

vfs.file.attr[/etc/mysql/mariadb.conf.d/50-server.cnf]

The key above does not exist by default. It was created with a UserParameter – a customization that runs a command which, in this case, checks the attributes of a specific file. Consider the following command executed directly in your system’s terminal:

# lsattr /etc/mysql/mariadb.conf.d/50-server.cnf
--------------e------- /etc/mysql/mariadb.conf.d/50-server.cnf

What interests us are the attributes:

--------------e-------

If an intruder modifies a file’s attributes using, for example, this command…

# chattr +A /etc/mysql/mariadb.conf.d/50-server.cnf
# lsattr /etc/mysql/mariadb.conf.d/50-server.cnf
-------A------e------- /etc/mysql/mariadb.conf.d/50-server.cnf

…it could mean that someone does not want the system to log when this file was accessed (refer to the chattr command manual). Additionally, any other attribute can be added or removed, which poses a risk to the system because these attributes can alter how files are accessed, stored on disk, and later read. Therefore, we can create a UserParameter as follows:

# cd /etc/zabbix/zabbix_agent2.d/
# echo "UserParameter=vfs.file.attr[*],lsattr \$1 | cut -d\" \" -f1" > attr.conf
# zabbix_agent2 -R userparameter_reload

Finally, we can test the reading of attributes directly from the terminal:

# zabbix_agent2 -t vfs.file.attr[/etc/mysql/mariadb.conf.d/50-server.cnf]
vfs.file.attr[/etc/mysql/mariadb.conf.d/50-server.cnf][s|-------A------e-------]

You can also try this now through the frontend.

When creating the item, don’t forget to create the trigger that should be activated in case there is a change in the attribute of a file, whatever it may be.
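A minimal sketch of such a trigger (again assuming the MySQLDB host):

last(/MySQLDB/vfs.file.attr[/etc/mysql/mariadb.conf.d/50-server.cnf],#1)<>last(/MySQLDB/vfs.file.attr[/etc/mysql/mariadb.conf.d/50-server.cnf],#2)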

Paying attention to file access and modification times

To delve a bit deeper into the concept of FIM, we should ask ourselves if we are monitoring file access and modifications concerning their timestamps. In a way, if we have implemented everything proposed above, the answer is yes.

That said, there is an easier way to keep track of all the things we’ve discussed. It involves using this key:

vfs.dir.get[/etc/mysql]

When creating an item with this key, we will recursively obtain all its objects, such as subdirectories and files. The output format will be a JSON, which allows us to create LLD (Low-level Discovery) rules to automate FIM. Below is a small snippet of the monitoring output:

{
    "basename": "mariadb.cnf",
    "pathname": "/etc/mysql/mariadb.cnf",
    "dirname": "/etc/mysql",
    "type": "file",
    "user": "root",
    "group": "root",
    "permissions": "0644",
    "uid": 0,
    "gid": 0,
    "size": 1126,
    "time": {
        "access": "2024-11-30T23:01:01-0300",
        "modify": "2023-11-30T01:42:37-0300",
        "change": "2024-06-25T18:41:01-0300"
    },
    "timestamp": {
        "access": 1733018461,
        "modify": 1701319357,
        "change": 1719351661
    }
...

Considering that the output includes all objects from the main directory, this would be the most sensible approach to configure our FIM. However, it is necessary to create the LLD and prototypes. We will not cover this in detail in this article, but this is the path I recommend you follow.

Below is a “blueprint” for an LLD to create automated File Integrity Monitoring:

The “Master item”:

The “Dependent rule”:

The LLD Macro:

The item prototypes:
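As a rough sketch (inferred from the trigger prototype below and the keys shown earlier in this post – adjust to your own template), the prototypes could be items with keys such as:

vfs.file.cksum["{#PATHNAME}",sha256]
vfs.file.owner["{#PATHNAME}"]
vfs.file.permissions["{#PATHNAME}"]

Each one resolves {#PATHNAME} from the LLD macro, so every discovered file gets its own set of integrity items.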

Below are the components of a trigger prototype (I created just one to symbolize a type of alert for file modification):

Name: Object: {#BASENAME} just changed

Event name: Object: {#BASENAME} just changed. Last hash: {ITEM.VALUE} The previous one: {?last(/MySQLDB/vfs.file.cksum["{#PATHNAME}",sha256],#2)}

Severity: Warning

Expression: last(/MySQLDB/vfs.file.cksum["{#PATHNAME}",sha256],#1)<>last(/MySQLDB/vfs.file.cksum["{#PATHNAME}",sha256],#2)

And then, some results:

Conclusion

The implementation of a robust File Integrity Monitoring system helps to ensure the security of IT infrastructure. Detecting unauthorized changes in critical files helps prevent attacks, identify security breaches, and ensure the integrity and availability of systems. With Zabbix, we have an effective solution to implement FIM, enabling process automation and the real-time visualization of changes. This monitoring not only reinforces protection against intrusions but also facilitates auditing and compliance with regulatory standards.

The main benefits of integrating File Integrity Monitoring with Zabbix include:

1. Early detection of changes in critical files, enabling quick responses.
2. Enhanced compliance with security regulations and internal policies.
3. Protection against malware and ransomware by identifying changes in essential files.
4. Ease of auditing with automated reports and modification histories.
5. Greater visibility and control over the integrity of data and systems in real time.
6. Operational efficiency through the automation of alerts and reports.
7. Improved proactive security, helping prevent attacks before they become critical.

By using Zabbix, organizations can strengthen their security posture and optimize risk management, ensuring that any unauthorized changes are detected and promptly corrected.

 

 

See what’s possible in Zabbix 7.2!
Zabbix 7.2 is out now and available for download! The latest Zabbix major release introduces a range of new visualization features and widgets while adding a variety of updated monitoring features to support new use cases and scenarios. Read more to find out about the latest Zabbix features and improvements.

Top items widget

The previously deprecated Data overview widget has been converted to the new Top items widget. The Top items widget enables item selection via item patterns. The selected items are then displayed for hosts based on host and host group filters. This means that users are not limited to explicitly selected items or hosts, which enables dynamically matching items in rapidly changing environments.

 

Items can be matched using pattern matching in the Top items widget

The widget supports Bar, Indicator, Sparkline, and As-is value visualization as well as defining value thresholds, enabling value highlighting for values exceeding the defined threshold.

Top items widget supports As-is, Bar, Indicator, and Sparkline value visualization

Host card widget

The Host card widget adds the ability to display host information on Zabbix dashboards. The widget configuration supports selecting and ordering fields containing a variety of information about the host.

The Host card widget allows for selecting and ordering host information fields

The widget also supports a multi-column layout. Host information can be displayed in 1-3 columns, depending on how the widget is placed on the dashboard.

The host card widget layout can be customized by resizing the widget

Sparkline chart

Sparkline charts have been introduced in Zabbix 7.2 as an additional visualization option for existing widgets. The goal of a sparkline chart is to provide additional over-time context when viewing collected values in widgets, such as the Item value widget. Sparkline charts are supported in Top items, Top hosts, and Item value widgets.

Sparkline charts can be displayed in Item value, Top Items, and Top hosts widgets

NVIDIA GPU monitoring template and Zabbix agent 2 plugin

Starting with Zabbix release 7.2.1, the newly released NVIDIA GPU monitoring template and Zabbix agent 2 plugin will allow agent 2 to automatically discover NVIDIA GPUs on Windows and Linux environments and start monitoring items such as GPU temperature, power usage, memory, frequency, and much more. The list of discovered and supported metrics may vary depending on the GPU model.

GPU metrics can be automatically discovered and displayed on Zabbix dashboards

NETCONF monitoring with SSH item subsystem support

SSH subsystems are a set of remote commands predefined on the monitored endpoint. A common use case of an SSH subsystem is the NETCONF subsystem, used to manage network device configuration.

Zabbix 7.2 introduces a new parameter for the SSH monitoring item – ssh.run[unique short description,<ip>,<port>,<encoding>,<ssh options>,<subsystem>]

The subsystem parameter is used to specify an SSH subsystem and can be used to execute commands via SSH subsystems such as NETCONF or SFTP.
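As a hypothetical example (the description and IP are placeholders; 830 is the standard NETCONF-over-SSH port), an item key targeting a device’s NETCONF subsystem could look like this:

ssh.run[netconf get-config,192.0.2.1,830,,,netconf]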

New and updated macros

  • New {*.TIMESTAMP} macros can be used to populate alerts with the UNIXTIME value of problem detection, recovery, and update timestamps.
  • The {EVENT.UPDATE.ACTIONJSON} macro resolves to a JSON array containing details of the actions performed during a problem update. This JSON value can be later used in integrations or scripts.
  • The {SERVICE.ID} macro resolves to the numeric ID of the service that triggered the action.
  • The {HOST.PORT} macro can now be used in the same locations as the {HOST.CONN} macro.
  • The new {FUNCTION.VALUE<1-9>} and {FUNCTION.RECOVERY.VALUE<1-9>} macros can be used in expression macros to display a value of the Nth item-based function in the trigger expression. This can be used to display values in map labels or graph names.

VMware monitoring improvements

VMware monitoring has received multiple improvements and fixes in Zabbix 7.2:

  • In addition to the previously supported VMware hypervisor discovery workflow, the VMware Hypervisor template can now be manually linked to a stand-alone hypervisor host.
  • There is now a new item used to monitor the VMware virtual machine hypervisor maintenance status: vmware.vm.hv.maintenance[url,uuid]
  • VMware event collection has been improved by adding the support of pagination. This reduces memory consumption resulting from a large number of collected VMware events.

New and updated templates

Zabbix 7.2 introduces multiple new templates:

  • A variety of templates for LAMP stack monitoring by Zabbix agent active
  • NVIDIA GPU
  • Juniper MX series
  • Huawei OceanStor V6 Dorado
  • Nutanix Prism Element
  • Website certificate by Zabbix agent 2 active

The following existing templates have also received fixes and updates:

  • Dell iDrac and PowerEdge updated to use SNMP walk items
  • Proxmox VE by HTTP – new disk space usage items/triggers
  • MSSQL by ODBC performance counter query fixes
  • Linux and Nextcloud – removed unnecessary discard unchanged preprocessing from LLD rules
  • Microsoft 365 reports by HTTP description fixes

 

Additional changes and improvements

Additional changes and improvements introduced in Zabbix 7.2:

  • Added support for CP_SPIN CPU state on OpenBSD
  • Implemented new column configuration options in the Top hosts widget and support for binary item display
  • Added support for LLD Macro {#UNIT.SERVICETYPE} in systemd.unit.discovery for Zabbix agent 2
  • Updated maximum supported TimescaleDB version to 2.17
  • Updated maximum supported PostgreSQL version to 17
  • Added PubkeyAcceptedKeyTypes SSH public key algorithm configuration option
  • Items now become unsupported when there are no pollers
  • Removed support for Oracle DB
  • Removed the dependent item count limit
  • Added support of logarithmic Y-axis scaling in graphs
  • Increased the max number of rows for some widgets, such as Top hosts
  • Enabled usage of the mediatype.get method for users with the User role with a limited field scope
  • Added the ability to assign override host (Widget, Dashboard) for graph widget data sets
  • Implemented automatic selection of the first element of a broadcast-capable widget
  • Implemented a new filter in the media type list view to filter media types by their usage in actions

Download and install Zabbix 7.2

You can find instructions and download the new version on the download page.

In order to upgrade to Zabbix 7.2, you need to upgrade your repository package and download and install the new Zabbix component packages (Zabbix server, proxy, frontend, and other Zabbix components). When you start the Zabbix server, an automatic database schema upgrade will be performed. Zabbix agents are backward compatible, so installing the new agent versions is not required. Agent upgrade can be performed at a later time.

You can find detailed step-by-step upgrade instructions on our Upgrade procedure page.

Learn about new features and changes introduced in Zabbix 7.2 by visiting the “What’s new in Zabbix 7.2” page.

A detailed description of the new features can be found in the “What’s new” documentation section.

Take a look at the release notes to see the full list of new features and improvements.

 

NetBox as Home CMDB and Integrated with Zabbix
Welcome to another episode of What’s up, home? weirdness! Who wouldn’t have their own NetBox at home – and who wouldn’t think of it as a home CMDB? I’ve just started experimenting with it. For those who do not know, a Configuration Management Database (CMDB) is the source of truth for your inventory of stuff. In data centers, it keeps track of your servers, their cables, and everything else, telling you in which data center and which rack they are.

For me… well, take a look for yourself. One picture says more than a thousand words of my storytelling.

What is it good for?

Well… in the real business world, it’s good for many things – from knowing about your assets, their serial numbers, purchase dates, and hardware configuration, to so much else. I could go that deep, but there’s a limit to how far even I want to go with these little experiments. Today’s case is merely to demonstrate the flexibility of Zabbix, yet again.

How did I do this?

I quickly threw the data into NetBox by hand. It looks like a lot of work, but in fact it wasn’t too bad – it took me about 45 minutes to do the following:

  • Create a Site called “What’s up, home?”
  • Create the rooms by adding new locations and making the previously created site their parent
  • Add some manufacturers
  • Add some device roles
  • Add some device types

After that, adding the devices themselves is a breeze. If you have not used NetBox, this is what adding a new device looks like. Yes yes, in the real business world there would have been many more items for me to fill in, but for this case I only added the mandatory items and even those I could do just by choosing from the drop-down menus. Not a big deal.

…and the Zabbix integration?

Actually, this is something I created many years ago for other purposes, but it still seems to work with today’s versions of NetBox. My little template queries NetBox over its API and asks if it has anything that matches the host name that’s in Zabbix. If it does, it gets the rack location and other details.

How this then works is pretty standard stuff. Retrieve a master item…

…and the dependent items then gather the data, parse some JSONPaths with Zabbix item preprocessing, and at least some of the items also populate bits and pieces in the Zabbix inventory. This is handy in the real world, as your alerts can then contain the exact rack location and so forth about your failing devices. Add them as tags or add them as part of the alert text – your imagination is the limit.
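As an illustration only (this is not the author’s exact template – the endpoint and field names below are assumptions based on NetBox’s public REST API), the pattern could look like this:

Master item (HTTP agent):
  URL:     https://netbox.example.com/api/dcim/devices/?name={HOST.HOST}
  Headers: Authorization: Token <your NetBox API token>

Dependent item, JSONPath preprocessing step (e.g., rack location):
  $.results[0].rack.name

Each dependent item extracts one field from the master item’s JSON and can then feed an inventory field via the item’s “Populates host inventory field” setting.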

Does it work?

Of course it does! Here’s the inventory grouped by manufacturer:

If I click on any of them, I get this:

Of course I can also browse the data through the latest data, for example…

…or I could just create some dashboards for visualizing all this. I have not done that yet, as this is what I did tonight so far and now I’m going to bed. To be continued – maybe! For now, the template only pulls data from NetBox, but I’d like to push data towards it as well, to also tell if a light bulb is powered on or not, for example. Stay tuned!

 

 

An Introduction to Browser Monitoring
Website and web application monitoring can vary from simple use cases to complex multi-step scenarios. To fully cover the scope of modern website monitoring requirements, Zabbix has introduced the Browser item, a new item type that brings with it multiple accompanying improvements for simulating browser behavior and collecting website metrics.

What is browser monitoring?

Browser monitoring allows users to monitor complex websites and web applications using an actual browser. It involves the constant tracking and analysis of the performance, reliability, and functionality of a website or web application from a real user perspective. This process ensures that key pages, features, and user navigation work as expected. By monitoring critical pages and flows specific to different businesses, companies can ensure optimal user experience, resolve potential or ongoing issues, and proactively address any potential problems.

Browser monitoring can be split into two main approaches:

  • Browser real user monitoring – Monitors how your web page or web application is performing, using real user data to analyze overall performance and user experience.
  • Browser synthetic monitoring – Analyzes application availability and performance, using scheduled testing to analyze website availability and emulate real user experience.

Since Zabbix is not a real person (yet) but is fully capable of emulating real user behavior on a website very precisely, we will focus on browser synthetic monitoring.

What business goals can we achieve with browser monitoring?

There are a multitude of goals that can be achieved, depending on what business we are running or expect to monitor, but some examples include:

Improving user experience

Browser monitoring helps ensure that users have a fast, smooth, and reliable experience on a website or web application. A positive user experience leads to higher user satisfaction and a greater likelihood of repeated visits or purchases.

Ensuring cross-browser and cross-device compatibility

Users access websites from a host of browsers and devices. Browser monitoring helps to detect compatibility issues that could affect certain users (e.g., JavaScript errors on specific browsers or layout shifts on mobile). By monitoring these scenarios, we can deliver a consistent experience across platforms, which is essential as multi-device usage continues to grow.

E-commerce checkout monitoring

Retailers can ensure a smooth checkout process by monitoring page load times, form interactions, and payment processing to confirm that users can easily complete purchases.

Form performance

Browser monitoring makes it easy to detect any issues preventing form completion, such as slow response times or broken validation. It also ensures a smooth, error-free experience to improve lead capture and gain more conversions.

Subscription renewal page monitoring

Subscription-based businesses rely on customers regularly renewing or upgrading their plans. Monitoring the subscription renewal page for load speed, usability, and any payment processing issues is essential, as issues on this page can directly affect the number of renewals and lead to customer loss.

Support portal uptime

Many businesses provide a customer support portal where users can submit requests or use a knowledge database. Downtime or slow response times can lead to frustrated customers and an increased number of complaints.

How to set up browser monitoring

There are a lot of goals we can reach, but the question remains – how can we reach them with Zabbix? The answer is that we can use the already mentioned and newly introduced Browser item.

Browser item configuration window

Browser items gather information by running custom JavaScript code and fetching data via HTTP or HTTPS protocols. These items can mimic browser activities like clicking buttons, typing text, navigating across webpages, and performing other user interactions within websites or web applications.

Along with the script, users can specify optional parameters (name-value pairs) and set a timeout limit for the actions. But before we can actually use the item, we will need to configure Zabbix server or Zabbix proxy with a WebDriver, so that Zabbix can actually control a browser through scripts.

What is a WebDriver? A WebDriver controls a browser directly, mimicking user interactions through a local machine or on a remote server, enabling full browser automation. The term WebDriver includes both the language-specific bindings and the individual browser control implementations, often simply called WebDriver. WebDriver is designed to offer a straightforward and streamlined programming interface through an object-oriented API which efficiently manages and drives browser actions.

In this guide, for instance, we’ll use a WebDriver with Chrome within a Docker container and make a script that includes actions like button clicks and text entry.

WebDriver installation

One of the simplest ways to install a WebDriver is to use containers. To install a chrome WebDriver on a local or remote machine, you can use Docker or any other preferred container engines:

# podman run --name webdriver -d \
-p 4444:4444 \
-p 7900:7900 \
--shm-size="2g" \
--restart=always docker.io/selenium/standalone-chrome:latest

Port 4444 will be the port on which the WebDriver will be listening and port 7900 will be used by NoVNC, which allows us to observe browser behavior in case a browser with a GUI is used.
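As a quick sanity check (assuming the container runs on the same machine), you can query the WebDriver’s status endpoint, which is part of the standard Selenium server API – a response containing "ready": true means the driver is accepting sessions:

# curl -s http://localhost:4444/status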

Zabbix server/proxy configuration

After WebDriver is installed, we need to set up the communication between Zabbix and the driver. This can be done by editing the Zabbix server/proxy configuration file and updating the following parameters:

### Option: WebDriverURL
#       WebDriver interface HTTP[S] URL. For example http://localhost:4444 used with 
#       Selenium WebDriver standalone server.
#
# WebDriverURL=
WebDriverURL=http://localhost:4444

### Option: StartBrowserPollers
#       Number of pre-forked instances of browser item pollers.
#
# Range: 0-1000
# StartBrowserPollers=1
StartBrowserPollers=5

With the configuration parameters in place, we will now configure our Browser item to collect and monitor the list of upcoming Zabbix trainings from the training schedule page.

Creating a host

First, we need to navigate to the “Data collection” > “Hosts” section and create a host that represents our web page. This is, more than anything, a logical representation, which means we don’t need any specific interfaces or additional configuration. The host in our example will look like this:

Training page monitoring host
Creating a browser item

Since the data collection is done by items, we need to navigate to the “Items” section on the “Zabbix training schedule” host and create an item with the type “Browser.” It should look something like this:

Training schedule browser item

Now comes the most important part – creating the script to monitor the schedule. Click on the “Script” field.

First, we will need to define what browser we will use, and any extra options we might want to specify, like screen resolution or whether the browser should run in headless mode or not. This can be done using the Browser object. The Browser object manages WebDriver sessions and initializes a session upon creation, then terminates it upon destruction. A single script can support up to four Browser objects.

var browser, result;
var opts = Browser.chromeOptions();
opts.capabilities.alwaysMatch['goog:chromeOptions'].args = [];
browser = new Browser(opts);
browser.setScreenSize(Number(1980), Number(1020));

In this snippet, we defined that we will use the Chrome browser with a GUI. As you can see, the screen size is set to the pretty common 1980x1020p.

Now we will need to define what the browser will be doing. This can be done by using such Browser object methods as navigate – to point to the correct URL of the web page or application and (for example) findElement/findElements to return some element of the web page.

findElement/findElements methods allow us to define strategies to locate an element and selectors that specify what to look for. Strategies and selectors can be of multiple kinds:

  • strategy – (string) CSS selector/link text/partial link text/tag name/XPath
  • selector – (string) Element selector using the specified location strategy

Let’s take a look at the next snippet:

try {
    browser.navigate("https://www.zabbix.com/");
    browser.collectPerfEntries("open page");

    el = browser.findElement("xpath", "//span[text()='Training']");
    if (el === null) {
        throw Error("cannot find the Training menu item");
    }
    el.click();

    el = browser.findElement("link text", "Schedule");
    if (el === null) {
        throw Error("cannot find the Schedule link");
    }
    el.click();

In this snippet,

  1. I am using a browser to navigate to the Zabbix page.
  2. I collect a range of performance entries related to opening the page (download speed, response time, etc.).
  3. I look for an element with the text “Training” using the XPath strategy, and the selector “Training.”
  4. I click on it, which is a method to interact with elements.
  5. In the next part, I use the strategy “link text” to find a link with the text selector “Schedule.”
  6. I click on it

A visual description would look like this:

Browser interaction with the zabbix.com website

Now, let’s do some more clicking to filter out all other trainings and leave only trainings in Korean and Dutch:

    el = browser.findElement("link text", "English");
    if (el === null) {
        throw Error("cannot find the English language filter");
    }
    el.click();

    el = browser.findElement("xpath", "//span[text()='English']");
    if (el === null) {
        throw Error("cannot find the English language filter");
    }
    el.click();

    el = browser.findElement("xpath", "//span[text()='Korean']");
    if (el === null) {
        throw Error("cannot find the Korean language filter");
    }
    el.click();

    el = browser.findElement("xpath", "//span[text()='Dutch']");
    if (el === null) {
        throw Error("cannot find the Dutch language filter");
    }
    el.click();

    Zabbix.sleep(2000);

English is selected by default, so the script “unclicks” it. Then it selects Korean and Dutch and uses the sleep function to have some extra time for the page to load and make a screenshot of the currently opened page:

List of trainings with language filters applied on it

Now let’s get the list of dates so we can monitor which trainings we have left in 2024:

el = browser.findElements("xpath", "//*[contains(text(), ' 20')]");
var dates = [];
for (var n = 0; n < el.length; n++) {
    dates.push(el[n].getText());
}

// Keep only entries that contain "2024"
dates = dates.filter(function(date) {
    return date.includes('2024');
});

// uniq() is a small deduplication helper defined in the full host export linked below
dates = uniq(dates);

Here we make a bit of a jump: we search for all elements whose text contains “ 20” (to include all years), then filter the results down to the year 2024 specifically, which can later easily be replaced with 2025. The end result contains all the upcoming training dates:

Items containing the upcoming training dates

The full host export with the script snippet can be found by following this link.

An additional example

But what if I want to fill in a form? Maybe to make a purchase, create an order, or just test a contact form? Good news – that’s an even simpler operation! Let’s take a look at this snippet:

// enter name
var el = browser.findElement("xpath", "//label[text()='First Name']/following::input");
if (el === null) {throw Error("cannot find name input field");}
el.sendKeys("Aleksandrs");

// enter last name
var el = browser.findElement("xpath", "//label[text()='Last name']/following::input");
if (el === null) {throw Error("cannot find name input field");}
el.sendKeys("Petrovs-Gavrilovs");

// enter cert number
var el = browser.findElement("xpath", "//label[text()='Certificate number']/following::input");
if (el === null) {throw Error("cannot find name input field");}
el.sendKeys("CT-2404-003");

// select version
var el = browser.findElement("css selector", "form#certificate_validation>fieldset>div:nth-of-type(5)>select");
if (el === null) {throw Error("cannot find name input field");}
el.sendKeys("7.0");

// check certificate
var el = browser.findElement("xpath", "//button[text()='Check Certificate']");
if (el === null) {throw Error("cannot find name input field");}
el.click();

This way, I can validate that my certificate is still valid! 

As you can see, there are multiple ways to make a browser emulate user behavior and allow us to validate whether our pages and businesses are performing the way we expect them to! You can find even more examples in Zabbix documentation and Zabbix Certified Training, which I welcome you to attend!

What’s Up, Home? – Zabbix Plays Rock-Paper-Scissors
Zabbix 7.0 is so fast that in a small environment such as What’s up, home? it gets bored. Very bored.

What does Zabbix do when it gets bored? It uses its new Selenium-based Browser item type and plays some Rock-Paper-Scissors against this blog site.

But how does that work?

The idea is simple. My website hosts a very simple PHP script which returns a random value of “Rock”, “Paper”, or “Scissors”. Likewise, my Zabbix Selenium test picks a random word out of those. Then, the Selenium test compares both answers and determines the result.

So, in all seriousness, this blog post demonstrates you how the new Browser item type can react to different responses.

Backend code

Here’s the PHP script in all its g(l)ory:

<html>
<head>
<title>Whatsuphome.fi :: rock-paper-scissors</title>
</head>
<body>
<p>
<?php
$choices = ["Rock", "Scissors", "Paper"];
$random_choice = $choices[array_rand($choices)];
echo $random_choice;
?>
</p>
</body>
</html>

Nothing to call home about in that script: array with three choices, pick a random choice, print the result, done.

Zabbix side

I created a new Browser type item like this:

… and here’s the script part I just hammered in, so there might or might not be bugs. I really did not test this very thoroughly.

var browser = new Browser(Browser.chromeOptions());

const moves = ["Rock", "Scissors", "Paper"];
// Zabbix picks its move at random
const zabbixMove = moves[Math.floor(Math.random() * moves.length)];

try {
    // The PHP script prints the opponent's random move inside a <p> element
    browser.navigate("https://whatsuphome.fi/rps.php");
    var opponentMove = browser.findElement("xpath", "//p").getText();

    // Determine the winner; var declarations are hoisted, so "winner"
    // is visible in the finally block below
    if (zabbixMove === opponentMove) {
        var winner = "Draw";
    } else if (
        (zabbixMove === "Rock" && opponentMove === "Scissors") ||
        (zabbixMove === "Scissors" && opponentMove === "Paper") ||
        (zabbixMove === "Paper" && opponentMove === "Rock")
    ) {
        var winner = "Zabbix";
    } else {
        var winner = "Opponent";
    }
} finally {
    return ("Winner is " + winner + ". Zabbix move was " + zabbixMove + " and opponent move was " + opponentMove);
}

That’s it! From now on my Zabbix will play the game once per hour, although for this blog post I did manually click the Execute now button a few times. Again, here’s the same screenshot that was also in the beginning of this blog post.

Happy gaming!

Open Source: The Option for a Connected and Collaborative World
In my previous article, where we explored the TCO and ROI of open-source software, I raised topics that sparked substantive discussions, new research, and renewed insights. It is undeniable that we live in an era where collaboration and connectivity go beyond trends. They represent the foundation of current technology, especially in a world based on APIs.

In this context, open-source software stands out and positions itself as a logical and natural choice for companies and organizations (both public and private) that seek innovation, flexibility, security, and agility. Over the last two decades, the technology sector has validated this direction. Recently, the Open Source Program Office (OSPO) appeared in Gartner’s Hype Cycle for Emerging Technologies report, reinforcing its relevance and emerging as a maturing trend within 2 to 5 years.

Open Source in Gartner’s Hype Cycle

Gartner’s Hype Cycle for Emerging Technologies is a well-known tool for illustrating the phases of maturity, adoption, and impact of new technologies. In the current cycle, the Open Source Program Office (OSPO) appears as an emerging technology with the potential for corporate transformation in the coming years.

This highlights that open source is not only a viable alternative to proprietary software, but an engine of innovation within organizations. The OSPO is, essentially, an internal structure in companies dedicated to promoting and managing the use of open-source software, ensuring compliance and governance.

With the strengthening of these structures, organizations not only maximize the benefits of open source but also foster a culture of continuous innovation and active collaboration with communities, whether through service contracts, participation in working groups, or even funding new functionalities.

A Natural Strategic Choice

Experience shows that open source is a strategic path for organizations aiming to thrive in an increasingly interconnected and competitive market. The transparency, flexibility, and scalability offered by such solutions surpass the limitations of proprietary solutions, facilitating a more adaptable and agile adoption.

Additionally, the collaborative approach of this model aligns with today’s reality, where knowledge sharing and co-creation are essential for technological development within organizations. Companies like Google, Microsoft, and Red Hat have already recognized this reality and invest in their own Open Source Program Offices. These initiatives not only underline the commitment to open innovation but also highlight tangible benefits in terms of efficiency, cost reduction, and speed in the development of innovations.

The Future is Open Source

The inclusion of OSPO in Gartner’s Hype Cycle indicates that companies that have not yet embarked on this journey need to reconsider their strategies. In an environment where constant adaptation and innovation are essential for growth and efficiency, open source has ceased to be optional and has become a necessity. As adoption expands across various sectors and applications, companies that build a solid framework for evaluating and maximizing the benefits of these technologies will be in a privileged position to lead their markets.

At Zabbix, we understand the importance of open source not just as a technological solution, but as a philosophy aimed at democratizing technology, fostering continuous innovation, and cultivating a culture of collaboration—a vision that OSPOs have been solidifying in companies across multiple industries. The discussion about the Total Cost of Ownership (TCO) and Return on Investment (ROI) in open-source solutions is just the starting point.

Tools like Zabbix prove that this is an effective strategy for monitoring and maintaining critical environments. Open source is, and will continue to be, the driving force behind the innovations that will transform the way companies sustain their businesses and interact with customers and users. The future is already open source, and the time to embrace this transformation is now.

Monitoring VMware vSphere with Zabbix
Zabbix is an open-source monitoring tool designed to oversee multiple IT infrastructure components, including networks, servers, virtual machines, and cloud services. It operates using both agent-based and agentless monitoring methods. Agents can be installed on monitored devices to collect performance data and report back to a centralized Zabbix server.

Zabbix provides comprehensive integration capabilities for monitoring VMware environments, including ESXi hypervisors, vCenter servers, and virtual machines (VMs). This integration allows administrators to effectively track performance metrics and resource usage across their VMware infrastructure.

In this post, I will show you how to set up Zabbix monitoring with a VMware vSphere infrastructure.

Requirements:

  • Zabbix server
  • Access to the VMware vCenter Server

Step one: Create a Zabbix service user in the vCenter

First things first, let’s create a service user on the vCenter that will be used by the Zabbix server to collect data. To make life easier, in my lab setup the user zabbix@vsphere.local will have full Administrator privileges. Read-only permissions should be enough, however.

1. In the vSphere Client, choose Menu -> Administration -> Users and Groups. From the Users tab, select Domain vsphere.local, and click the ADD button to add a new user.

2. Type a username and password. Click ADD to create a new user.

3. Change the tab to Groups and select the Administrators group.

4. Find a new user zabbix, click on it and save. The user is added to the Administrators group.

5. From the Host and Clusters view, choose vCenter name and go to the Permissions tab. Click the Add button.

6. Choose a proper domain (vsphere.local), find the user zabbix, set the role to Administrator, and check Propagate to children. Click OK to give those permissions.

Step two: Make changes on the Zabbix server

Next, we need to edit zabbix_server.conf. In this file we need to enable the VMware collector processes, which are necessary to start VMware monitoring. FYI, I have installed Zabbix server version 7.0.4.

1. Edit a configuration file zabbix_server.conf

vim /etc/zabbix/zabbix_server.conf

2. Find the StartVMwareCollectors parameter, delete “#” before it and change the value from 0 to at least 2. Save the file and exit.

Except for StartVMwareCollectors, which is mandatory, it’s possible to enable and modify additional VMware parameters. You can find more details about them HERE.

  • VMwareCacheSize
  • VMwareFrequency
  • VMwarePerfFrequency
  • VMwareTimeout

3. Restart the zabbix-server service.

systemctl restart zabbix-server

Step three: Configure the VMware template on Zabbix

1. Log in to the Zabbix server via the GUI – http://zabbix_server/zabbix. Go to the Hosts section under the Monitoring tab.

2. Create a new “Host.” Click Create Host in the right upper corner.

3. In the Host tab provide the following details:

Host name – type the name of the system that we want to monitor – here it is VMware Infrastructure.
Templates – type/find the template name “VMware”; more info about the VMware template can be found HERE.
Host groups – find/type “VMware(new)” host group.

At this point,  go to the Macros tab.

4. In the Macros tab you need to provide 3 values/macros. These macros describe the data that is needed to connect Zabbix to the VMware vCenter:

{$VMWARE.URL} – VMware service (vCenter or ESXi hypervisor) SDK URL (https://servername/sdk) that we want to connect to.
{$VMWARE.USERNAME} – VMware service username created in step one.
{$VMWARE.PASSWORD} – VMware service user password created in step one.
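For example (placeholder values matching the lab setup from step one):

{$VMWARE.URL} = https://vcenter.example.com/sdk
{$VMWARE.USERNAME} = zabbix@vsphere.local
{$VMWARE.PASSWORD} = <the password set in step one>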

Click the Add button.

5. A new Host was created and data collection is in progress.

6. Depending on the size of the infrastructure, data collection takes different amounts of time. Once configured, Zabbix will automatically discover VMs and begin collecting performance data. You can find an overview of the latest data in the Dashboard screen.

7. More specific and detailed data can be found in Latest data under the Monitoring tab.

In Host groups or Hosts choose the name of the item you are looking for (you can also click the “Select” button). Select the name of the ESXi host, the virtual machine, the vCenter name, the datastore, or all VMware information.

Zabbix can collect multiple metrics from VMware using its built-in templates. These metrics include:

– CPU usage
– Memory consumption
– Disk I/O statistics
– Network traffic
– Datastore capacity

In conclusion

Integrating Zabbix with VMware provides a robust solution for monitoring virtualized environments and enhancing visibility into system performance and resource utilization, while enabling timely alerts and responses to operational issues.

Using the zabbix_utils Library for Tool Development

In this article, we will explore a practical example of using the zabbix_utils library to solve a non-trivial task – obtaining a list of alert recipients for triggers associated with a specific Zabbix host. You will learn how to easily automate the process of collecting this information, and see examples of real code that can be adapted to your needs.

Over the last year, the zabbix_utils library has become one of the most popular tools for working with the Zabbix API. It is a convenient tool that simplifies interacting with the Zabbix server, proxy, or agent, especially for those who automate monitoring and management tasks.

Due to its ease of use and extensive functionality, zabbix_utils has found a following among system administrators, monitoring engineers, and DevOps engineers. According to data from PyPI, the library has already been downloaded over 140,000 times since its release, confirming its demand within the community. It’s all thanks to you and your attention to zabbix_utils!

Task Description

Administrators often need to check which Zabbix users receive alerts for specific triggers in the Zabbix monitoring system. This can be useful for auditing, configuring new notifications, or simply for a quick diagnosis of issues. The task becomes especially relevant when you have plenty of hosts containing numerous triggers, and manually checking the recipients for each trigger through the Zabbix interface becomes very time-consuming. 

In such cases, it is advisable to use a custom solution based on the Zabbix API. You can directly access all the required data using the API, and then use additional logic to determine the final alert recipients. The zabbix_utils library makes working with the Zabbix API more convenient and allows you to automate this process. In this project, we use the zabbix_utils library to write a Python script that collects a list of alert recipients for the triggers of the selected Zabbix host. This will allow you to obtain the necessary information faster and with minimal effort.

Environment Setup and Installation

To get started with zabbix_utils, you need to install the library and configure the connection to the Zabbix API. This article provides more details and examples on getting started with the library, but I will describe the basic steps to prepare the environment here as well.

The library supports several installation methods described in the official README, making it convenient for use in different environments.

1. Installation via pip

The simplest and most common installation method is using the pip package manager. To do this, execute the command:

~$ pip install zabbix_utils

To install all necessary dependencies for asynchronous work, you can use the command:

~$ pip install zabbix_utils[async]

This method is suitable for most users, as pip automatically installs all required dependencies.

2. Installation from Zabbix Repository

Since writing the previous articles, we have added one more installation method – from the official Zabbix repository. First and foremost, you need to add the repository to your system if it has not been installed yet. Official Zabbix packages for Red Hat Enterprise Linux and Debian-based distributions are available on the Zabbix website.

For Red Hat Enterprise Linux and derivatives:

~# dnf install python3-zabbix-utils

For Debian / Ubuntu and derivatives:

~# apt install python3-zabbix-utils

3. Installation from Source Code

If you require the latest version of the library that has not yet been published on PyPI, or you want to customize the code, you can install the library directly from GitHub:

1. Clone the repository from GitHub:

~$ git clone https://github.com/zabbix/python-zabbix-utils

2. Navigate to the project folder:

~$ cd python-zabbix-utils/

3. Install the library by executing the command:

~$ python3 setup.py install

4. Testing the Connection to Zabbix API

After installing zabbix_utils, it is a good idea to check the connection to your Zabbix server via the API. To do this, use the URL to the Zabbix server, the token, or the username and password of the user who has permission to access the Zabbix API.

Example code for checking the connection:

from zabbix_utils import ZabbixAPI

ZABBIX_AUTH = {
    "url": "your_zabbix_server",
    "user": "your_username",
    "password": "your_password"
}

api = ZabbixAPI(**ZABBIX_AUTH)

hosts = api.host.get(
    output=['hostid', 'name']
)
print(hosts)

api.logout()

Main Steps of the Task Solution

Now that the environment is set up, let’s look at the main steps for solving the task of retrieving the list of alert recipients for triggers associated with a specific Zabbix host in Zabbix.

In zabbix_utils, asynchronous API interaction support is built in through the AsyncZabbixAPI class. This allows multiple requests to be sent simultaneously and their results to be handled as they become ready, significantly reducing latencies when making multiple API calls. Therefore, we will use the AsyncZabbixAPI class and the asynchronous approach in this project.
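
As a quick illustration of the asynchronous approach, here is a minimal sketch (the credentials are placeholders, and the two requests shown are arbitrary examples – the point is that they run concurrently):

import asyncio
from zabbix_utils import AsyncZabbixAPI

async def main():
    api = AsyncZabbixAPI(url="your_zabbix_server")
    await api.login(user="your_username", password="your_password")
    # Independent requests can be sent concurrently and awaited together
    hosts, actions = await asyncio.gather(
        api.host.get(output=["hostid", "name"]),
        api.action.get(output=["actionid", "name"])
    )
    print(hosts, actions)
    await api.logout()

asyncio.run(main())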

Below are the main steps for solving the task, and code examples for each step. Please note that the code in this project is for demonstration purposes, may not be optimal, or could contain errors. Use it as an example or a base for your project, but not as a complete tool.

Step 1. Obtain Host ID

The first step is to identify the host for which we will retrieve information about triggers and alerts. To do this, we need to find the hostid using the host’s name. The Zabbix API provides a method to obtain this information, and using zabbix_utils makes this process much simpler.

Example of obtaining the host ID by its name:

host = api.host.get(
    output=["hostid"],
    filter={"name": "your_host_name"}
)

This method returns a unique identifier for the host, which can be used further. However, for our test project, we will use a manually specified host identifier.

Step 2. Retrieve Host Triggers

With the hostid in hand, the next step is to retrieve all triggers associated with this host. Triggers define the conditions under which alerts are generated. We need to collect information about all triggers so that we can then use it to select actions that match all the conditions.

Example of retrieving host triggers:

triggers = api.trigger.get(
    hostids=[hostid],
    selectTags="extend",
    selectHosts=["hostid"],
    selectHostGroups=["groupid"],
    selectDiscoveryRule=["templateid"],
    output="extend",
)

This request returns complete information about the triggers for the host. We get not only the triggers but also their tags, associated host and host groups, and discovery rule information. All this information will be necessary to check the conditions of the actions.

Step 3. Initialize Trigger Metadata

At this stage, objects for each trigger are created to store their metadata. This is done using the Trigger class, which includes information about the trigger such as its name, ID, associated host groups, hosts, tags, templates, and operations.

Here’s the code defining the Trigger class:

import copy  # needed for the deep copies used in check_operations


class Trigger:
    def __init__(self, trigger):
        self.name = trigger["description"]
        self.triggerid = trigger["triggerid"]
        self.hostgroups = [g["groupid"] for g in trigger["hostgroups"]]
        self.hosts = [h["hostid"] for h in trigger["hosts"]]
        self.tags = {t["tag"]: t["value"] for t in trigger["tags"]}
        self.tmpl_triggerid = self.triggerid
        self.lld_rule = trigger["discoveryRule"] or {}
        if trigger["templateid"] != "0":
            self.tmpl_triggerid = trigger["templateid"]
        self.templates = []
        self.messages = []
        self._conditions = {
            "0": self.hostgroups,
            "1": self.hosts,
            "2": [self.triggerid],
            "3": trigger["event_name"] or trigger["description"],
            "4": trigger["priority"],
            "13": self.templates,
            "25": self.tags.keys(),
            "26": self.tags,
        }

    def eval_condition(self, operator, value, trigger_data):
        # equals or does not equal
        if operator in ["0", "1"]:
            equals = operator == "0"
            if isinstance(value, dict) and isinstance(trigger_data, dict):
                if value["tag"] in trigger_data:
                    if value["value"] == trigger_data[value["tag"]]:
                        return equals
            elif value in trigger_data and isinstance(trigger_data, list):
                return equals
            elif value == trigger_data:
                return equals
            return not equals
        # contains or does not contain
        if operator in ["2", "3"]:
            contains = operator == "2"
            if isinstance(value, dict) and isinstance(trigger_data, dict):
                if value["tag"] in trigger_data:
                    if value["value"] in trigger_data[value["tag"]]:
                        return contains
            elif value in trigger_data:
                return contains
            return not contains
        # is greater/less than or equals
        if operator in ["5", "6"]:
            greater = operator != "5"
            try:
                if int(value) < int(trigger_data):
                    return not greater
                if int(value) == int(trigger_data):
                    return True
                if int(value) > int(trigger_data):
                    return greater
            except (ValueError, TypeError):
                raise ValueError(
                    "Values must be numbers to compare them"
                )

    def select_templates(self, templates):
        for template in templates:
            if self.tmpl_triggerid in [
                t["triggerid"] for t in template["triggers"]]:
                self.templates.append(template["templateid"])
            if self.lld_rule.get("templateid") in [
                d["itemid"] for d in template["discoveries"]
            ]:
                self.templates.append(template["templateid"])

    def select_actions(self, actions):
        selected_actions = []
        for action in actions:
            conditions = []
            if "filter" in action:
                conditions = action["filter"]["conditions"]
                eval_formula = action["filter"]["eval_formula"]
            # Add actions without conditions directly
            if not conditions:
                selected_actions.append(action)
                continue
            condition_check = {}
            for condition in conditions:
                if (
                    condition["conditiontype"] != "6"
                    and condition["conditiontype"] != "16"
                ):
                    if (
                        condition["conditiontype"] == "26"
                        and isinstance(condition["value"], str)
                    ):
                        condition["value"] = {
                            "tag": condition["value2"],
                            "value": condition["value"],
                        }
                    if condition["conditiontype"] in self._conditions:
                        condition_check[
                            condition["formulaid"]
                        ] = self.eval_condition(
                            condition["operator"],
                            condition["value"],
                            self._conditions[
                                condition["conditiontype"]
                            ],
                        )
                else:
                    condition_check[
                        condition["formulaid"]
                    ] = True
            for formulaid, bool_result in condition_check.items():
                eval_formula = eval_formula.replace(
                    formulaid, str(bool_result))
            # Evaluate the final condition formula
            if eval(eval_formula):
                selected_actions.append(action)
        return selected_actions

    def select_operations(self, actions, mediatypes):
        messages_metadata = []
        for action in self.select_actions(actions):
            messages_metadata += self.check_operations(
                "operations", action, mediatypes
            )
            messages_metadata += self.check_operations(
                "update_operations", action, mediatypes
            )
            messages_metadata += self.check_operations(
                "recovery_operations", action, mediatypes
            )
        return messages_metadata

    def check_operations(self, optype, action, mediatypes):
        messages_metadata = []
        optype_mapping = {
            "operations": "0",  # Problem event
            "recovery_operations": "1",  # Recovery event
            "update_operations": "2",  # Update event
        }
        operations = copy.deepcopy(action[optype])
        # Processing "notify all involved" scenarios
        for idx, _ in enumerate(operations):
            if operations[idx]["operationtype"] not in ["11", "12"]:
                continue
            # Copy operation as a template for reuse
            op_template = copy.deepcopy(operations[idx])
            del operations[idx]
            # Checking for message sending operations
            for key in [
                k for k in ["operations", "update_operations"] if k != optype
            ]:
                if not action[key]:
                    continue
                # Checking for message sending type operations
                for op in [
                    o for o in action[key] if o["operationtype"] == "0"
                ]:
                    # Copy template for the current operation
                    operation = copy.deepcopy(op_template)
                    operation.update(
                        {
                            "operationtype": "0",
                            "opmessage_usr": op["opmessage_usr"],
                            "opmessage_grp": op["opmessage_grp"],
                        }
                    )
                    operation["opmessage"]["mediatypeid"] = op[
                        "opmessage"
                    ]["mediatypeid"]
                    operations.append(operation)
        for operation in operations:
            if operation["operationtype"] != "0":
                continue
            # Processing "all mediatypes" scenario
            if operation["opmessage"]["mediatypeid"] == "0":
                for mediatype in mediatypes:
                    operation["opmessage"]["mediatypeid"] = mediatype[
                        "mediatypeid"
                    ]
                    messages_metadata.append(
                        self.create_messages(
                            optype_mapping[optype], action, operation, [
                                mediatype
                            ]
                        )
                    )
            else:
                messages_metadata.append(
                    self.create_messages(
                        optype_mapping[optype],
                        action,
                        operation,
                        mediatypes
                    )
                )
        return messages_metadata

    def create_messages(self, optype, action, operation, mediatypes):
        message = Message(optype, action, operation)
        message.select_mediatypes(mediatypes)
        self.messages.append(message)
        return message

The code for creating Trigger class objects for each of the retrieved triggers:

for trigger in triggers:
    triggers_metadata[trigger["triggerid"]] = Trigger(trigger)

This loop iterates through all triggers and saves them in a dictionary called triggers_metadata, where the key is the triggerid and the value is the trigger object.

Step 4. Retrieve Template Information

The next step is to obtain data about the templates associated with all the triggers:

templates = api.template.get(
    triggerids=list(set([t.tmpl_triggerid for t in triggers_metadata.values()])),
    selectTriggers=["triggerid"],
    selectDiscoveries=["itemid"],
    output=["templateid"],
)

This request returns information about all templates linked to the triggers of the host being examined. Executing a single query for all triggers is more efficient than making individual requests for each trigger. This information will be needed for evaluating the “Template” condition in actions.

Step 5. Get Actions and Media Types

Next, we obtain the list of actions and media types configured in the system:

actions = api.action.get(
    selectFilter="extend",
    selectOperations="extend",
    selectRecoveryOperations="extend",
    selectUpdateOperations="extend",
    filter={"eventsource": 0, "status": 0},
    output=["actionid", "esc_period", "eval_formula", "name"],
)

mediatypes = api.mediatype.get(
    selectUsers="extend",
    selectActions="extend",
    selectMessageTemplates="extend",
    filter={"status": 0},
    output=["mediatypeid", "name"],
)

Here we retrieve actions that define how and to whom alerts are sent, and mediatypes through which users can receive notifications (for example, email or SMS).

Step 6. Match Triggers with Templates and Actions

At this stage, each trigger is associated with the corresponding templates and actions:

for trigger in triggers_metadata.values():
    trigger.select_templates(templates)
    messages += trigger.select_operations(actions, mediatypes)

Here, for each trigger, we update information about its templates and configured actions for sending notifications. The list of associated actions is determined by checking the conditions specified in them against the accumulated data for each trigger.

For each operation of the corresponding trigger action, a Message class object is created:

import copy  # needed for the deep copies in select_recipients
import re    # needed for multiply_time


class Message:
    def __init__(self, optype, action, operation):
        self.optype = optype
        self.mediatypename = ""
        self.actionid = action["actionid"]
        self.actionname = action["name"]
        self.operationid = operation["operationid"]
        self.mediatypeid = operation["opmessage"]["mediatypeid"]
        self.subject = operation["opmessage"]["subject"]
        self.message = operation["opmessage"]["message"]
        self.default_msg = operation["opmessage"]["default_msg"]
        self.users = [u["userid"] for u in operation["opmessage_usr"]]
        self.groups = [g["usrgrpid"] for g in operation["opmessage_grp"]]
        self.recipients = []
        # Use action's escalation period if not specified in the operation
        self.esc_period = operation.get("esc_period", "0")
        if self.esc_period == "0":
            self.esc_period = action["esc_period"]
        self.esc_step_from = self.multiply_time(
            self.esc_period, int(operation.get("esc_step_from", "1")) - 1
        )
        if operation.get("esc_step_to", "0") != "0":
            self.repeat_count = str(
                int(operation["esc_step_to"]) - int(operation["esc_step_from"]) + 1
            )
        # If not a problem event, set repeat count to 1
        elif self.optype != "0":
            self.repeat_count = "1"
        # Infinite repeat count if esc_step_to is 0
        else:
            self.repeat_count = "&infin;"

    def multiply_time(self, time_str, multiplier):
        # Multiply numbers within the time string
        result = re.sub(
            r"(\d+)",
            lambda m: str(int(m.group(1)) * multiplier),
            time_str
        )
        if result[0] == "0":
            return "0"
        return result

    def select_mediatypes(self, mediatypes):
        for mediatype in mediatypes:
            if mediatype["mediatypeid"] == self.mediatypeid:
                self.mediatypename = mediatype["name"]
                # Select message templates related to operation type
                msg_template = [
                    m
                    for m in mediatype["message_templates"]
                    if (
                        m["recovery"] == self.optype
                        and m["eventsource"] == "0"
                    )
                ]
                # Use default message if applicable
                if msg_template and self.default_msg == "1":
                    self.subject = msg_template[0]["subject"]
                    self.message = msg_template[0]["message"]

    def select_recipients(self, user_groups, recipients):
        for groupid in self.groups:
            if groupid in user_groups:
                self.users += user_groups[groupid]
        for userid in self.users:
            if userid in recipients:
                recipient = copy.deepcopy(recipients[userid])
                if self.mediatypeid in recipient.sendto:
                    recipient.mediatype = True
                self.recipients.append(recipient)

Each such object represents a separate message sent to users (recipients) and will contain all message information – its subject, text, recipients, and escalation parameters.

Step 7. Collect User and Group Identifiers

After matching the triggers with actions, the process of collecting unique identifiers for users and groups starts:

userids = set()
groupids = set()

for message in messages:
    userids.update(message.users)
    groupids.update(message.groups)

This code snippet collects the IDs of all users and groups involved in the operations for each trigger. This is necessary to perform only one request to the Zabbix API for all involved users and their groups, rather than making separate requests for each trigger.

Step 8. Obtain User and Group Information

The next step is to collect detailed information about users and user groups:

usergroups = {
    group["usrgrpid"]: group
    for group in api.usergroup.get(
        selectUsers=["userid"],
        selectHostGroupRights="extend",
        output=["usrgrpid", "role"],
    )
}

users = {
    user["userid"]: user
    for user in api.user.get(
        selectUsrgrps=["usrgrpid"],
        selectMedias=["mediatypeid", "active", "sendto"],
        selectRole=["roleid", "type"],
        filter={"status": 0},
        output=["userid", "username", "name", "surname"],
    )
}

Here we gather data about users, including their role and media types through which they receive notifications, as well as data about user groups, including access rights to host groups and the list of users in each group. All this information will be needed to check access to the host with the triggers we are working with.

Step 9. Match Users and Groups with Triggers

After obtaining user information, we match users and groups with their respective rights to receive notifications. Here we also link users with groups, updating the information regarding rights and groups for each user.

for userid in userids:
    if userid in users:
        user = users[userid]
        recipients[userid] = Recipient(user)
        for group in user["usrgrps"]:
            if group["usrgrpid"] in usergroups:
                recipients[userid].permissions.update([
                    h["id"]
                    for h in usergroups[group["usrgrpid"]]["hostgroup_rights"]
                    if int(h["permission"]) > 1
                ])

for groupid in groupids:
    if groupid in usergroups:
        group = usergroups[groupid]
        user_groups[group["usrgrpid"]] = []
        for user in group["users"]:
            user_groups[group["usrgrpid"]].append(user["userid"])
            if user["userid"] in recipients:
                recipients[user["userid"]].groups.update(group["usrgrpid"])
            elif user["userid"] in users:
                recipients[user["userid"]] = Recipient(users[user["userid"]])
            recipients[user["userid"]].permissions.update([
                h["id"]
                for h in group["hostgroup_rights"]
                if int(h["permission"]) > 1
            ])

This code fragment connects each user with their groups and vice versa, creating a complete list of users with their access rights to the host, and thus their eligibility to receive notifications about events for this host.

For each recipient, a Recipient class object is created containing data about the recipient, such as the notification address, access rights to hosts, configured mediatypes, etc.

Here’s the code that describes the Recipient class:

class Recipient:
    def __init__(self, user):
        self.userid = user["userid"]
        self.username = user["username"]
        self.fullname = "{name} {surname}".format(**user).strip()
        self.type = user["role"]["type"]
        self.groups = set([g["usrgrpid"] for g in user["usrgrps"]])
        self.has_right = False
        self.permissions = set()
        self.sendto = {
            m["mediatypeid"]: m["sendto"] for m in user["medias"] if m["active"] == "0"
        }
        # Check if the user is a super admin (type 3)
        if self.type == "3":
            self.has_right = True

Step 10. Match Messages with Recipients

Finally, we match recipients with specific messages from Step 6:

for message in messages:
    message.select_recipients(user_groups, recipients)

This step completes the main process – each message is assigned to the relevant recipients.

Step 11. Check Recipient Access Rights and Output the Result

Before outputting the final list of recipients, we can check each recipient’s rights and keep only those who are actually entitled to receive notifications for the events related to the trigger, or those who have all the configured media types specified and active. After these actions, the information can be output in any convenient way – whether it be exporting to a file or displaying it on the screen:

for trigger in triggers_metadata.values():
    for message in trigger.messages:
        for recipient in message.recipients:
            recipient.show = True
            if not recipient.has_right:
                recipient.has_right = (len([
                    gid
                    for gid in trigger.hostgroups
                    if gid in recipient.permissions
                ]) > 0)
            if not recipient.has_right and not show_unavail:
                recipient.show = False

Example Implementation

All the examples and code snippets described above have been compiled into a solution demonstrating the algorithm for obtaining notification recipients for triggers associated with the selected host. We have implemented this algorithm as a simple web interface to make the result more illustrative and easier to explore.

This interface allows users to enter the host’s ID. The script then processes the data and provides a list of notification recipients associated with the triggers on that host. The web interface uses asynchronous requests to the Zabbix API and the zabbix_utils library to ensure fast data processing and ease of use with many triggers and users.

This lets you familiarize yourself with the theoretical steps and code examples and also try to put this solution into action.

Please note once again that the code in this project is for demonstration purposes, may not be optimal, or could contain errors. Use it as an example or a base for your project, but not as a complete tool.

The web interface’s complete source code and installation instructions can be found on GitHub.

Conclusion

In this article, we explored a practical example of using the zabbix_utils library to solve the task of obtaining alert recipients for triggers associated with a selected Zabbix host using the Zabbix API. We detailed the key steps, from setting up the environment and initializing trigger metadata to working with notification recipients and optimizing performance with asynchronous requests.

Using zabbix_utils allowed us to optimize and accelerate interaction with the Zabbix API, expanding the capabilities of the Zabbix web interface and increasing efficiency when working with large volumes of data. Thanks to support for asynchronous processing and selective API requests, it is possible to significantly reduce the load on the server and improve system performance when working with Zabbix, which is especially important in large infrastructures.

We hope this example will assist you in implementing your own solutions based on the Zabbix API and zabbix_utils, and demonstrate the possibilities for optimizing your interaction with the Zabbix API.

The post Using the zabbix_utils Library for Tool Development appeared first on Zabbix Blog.

]]>
https://blog.zabbix.com/python-zabbix-utils-alert-tracker-tool/29010/feed/ 0 29010
Maximizing TCO and ROI with Open-Source Solutions https://blog.zabbix.com/maximizing-tco-and-roi-with-open-source-solutions/29019/ https://blog.zabbix.com/maximizing-tco-and-roi-with-open-source-solutions/29019/#respond Wed, 06 Nov 2024 09:00:40 +0000 https://blog.zabbix.com/?p=29019 In recent years, the debate around total cost of ownership (TCO) and return on investment (ROI) for open-source solutions…

The post Maximizing TCO and ROI with Open-Source Solutions appeared first on Zabbix Blog.

]]>
In recent years, the debate around total cost of ownership (TCO) and return on investment (ROI) for open-source solutions has intensified, particularly within the scope of technology operations. With increasingly complex IT infrastructures and pressure to optimize costs, the choice between open-source and proprietary solutions has become a crucial strategic decision. By using a platform like Zabbix, which is both open-source and low-maintenance, multiple operational needs can be met, increasing returns.

The use of open-source tools by area of operation

Let’s explore how different areas (disciplines or approaches) related to information technology can benefit from adopting Zabbix.

IT Operations (ITOps)

ITOps is the foundation of daily IT operations, responsible for maintaining and monitoring an organization’s technological infrastructure. Using a platform like Zabbix allows ITOps teams to continuously monitor the entire IT infrastructure, identifying and solving problems before they impact the business. Zabbix stands out as a cost-effective alternative by eliminating the need for expensive licenses while supporting the fulfillment of SLAs and improving operational efficiency.

Operational Technology (OT)

In the context of Operational Technology, which encompasses the supervision and control of industrial processes, Zabbix excels in its ability to monitor critical equipment and systems in real-time. Zabbix’s ability to integrate with a wide variety of devices and protocols makes it ideal for complex industrial environments where reliability and operational continuity are crucial. Moreover, Zabbix can be configured to send personalized alerts and reports, ensuring that all stakeholders are informed about the monitored environment.

IT Infrastructure Management

Managing physical or virtual IT infrastructure involves overseeing all components that keep the IT environment running, from servers and network equipment to cloud applications and services. Zabbix, with its ability to monitor both on-premise and cloud environments, offers a unified solution for managing and optimizing the entire technology infrastructure. Zabbix’s scalability also ensures that it can grow along with the company’s needs, but without the additional costs that often come with proprietary solutions.

IT Service Management (ITSM)

In ITSM disciplines, the focus is on efficiently delivering IT services that meet business needs. Zabbix integrates well with ITSM frameworks and tools, offering valuable data and insights that can be used to improve incident, problem, and change management. Zabbix’s ability to provide real-time monitoring and trend analysis can also directly contribute to the continuous improvement of IT services, resulting in a higher ROI.

Technology Operations

A broader term that encompasses both ITOps and OT, technology operations benefit from Zabbix through its versatility in monitoring a wide range of systems and devices. Whether supporting infrastructure evolution or managing critical configurations, Zabbix offers integrations with tools used to ensure that technology aligns with business goals, minimizing risks and maximizing operational efficiency.

Why going open-source is a winning strategy

Going open-source is a winning strategy for monitoring and operating critical environments because it offers transparency, security, flexibility, and rapid innovation through collaboration with a wide developer community. Let’s explore the details of each benefit.

Licensing Costs

One of the greatest advantages of open source solutions is the absence of licensing costs. Unlike proprietary solutions, which require significant initial and recurring investments, open source platforms allow companies to redirect those resources to other critical areas, such as infrastructure improvement and internal skills development.

Flexibility and Customization

In today’s dynamic environments, the ability to customize and adapt tools to specific business needs is a competitive differentiator. Open source solutions like Zabbix, for example, offer flexibility that is often lacking in proprietary alternatives. This customization not only meets operational demands but also avoids vendor lock-in, a common concern with closed solutions.

Support and Documentation

While both proprietary and open source solutions, like Zabbix, offer professional support and services to clients, open source communities have proven increasingly effective in creating content that shares knowledge and use cases for tools. IDC studies confirm that organizations adopting open source can achieve a positive ROI in less time, especially when they have or develop the necessary skills to manage these solutions internally. In the case of Zabbix, there is a career path with courses and certifications for interested professionals.

Integration and Scalability

Integrating open source tools into mission-critical environments can be more seamless and less costly in terms of both time and money, especially when organizations possess the necessary internal technical skills. Zabbix is also scalable, allowing growth without significant additional costs, in contrast to proprietary solutions that often require paid upgrades.

TCO and ROI: The Zabbix case

A recent comparative study by Gartner highlighted that open source solutions (such as Zabbix) often outperform proprietary alternatives in terms of TCO, particularly in long-term implementations. Furthermore, IDC reinforces that the ROI of open source solutions can be maximized when companies invest in training teams to effectively use and explore these tools.

Internal data shows that 80% of Zabbix users (non-clients) do not use more than 15% of the platform’s existing features. This same data also demonstrates that team training and the hiring of official services increase operational efficiency by over 35%. The discussion about TCO and ROI of open source solutions in technology operations is not just a trend but a reality that more and more organizations are exploring to maximize resources and increase competitiveness.

The post Maximizing TCO and ROI with Open-Source Solutions appeared first on Zabbix Blog.

]]>
https://blog.zabbix.com/maximizing-tco-and-roi-with-open-source-solutions/29019/feed/ 0 29019
Monitoring a Complex Infrastructure Environment with Zabbix https://blog.zabbix.com/monitoring-a-complex-infrastructure-environment-with-zabbix/28954/ https://blog.zabbix.com/monitoring-a-complex-infrastructure-environment-with-zabbix/28954/#respond Wed, 30 Oct 2024 12:00:33 +0000 https://blog.zabbix.com/?p=28954 Inviting the members of our global community to share their Zabbix dashboards with us prompted a flood of fascinating…

The post Monitoring a Complex Infrastructure Environment with Zabbix appeared first on Zabbix Blog.

]]>
Inviting the members of our global community to share their Zabbix dashboards with us prompted a flood of fascinating responses, and we’re highlighting a few of the most interesting submissions here on our blog. This week’s entry comes to us from Nyein Chan Zaw, who is based in Bangkok, Thailand and works as an Infrastructure Specialist for Green Will Solution. Read on to see how he uses his Zabbix dashboard to monitor a highly intricate infrastructure in real time. 

I appreciate the chance to share my dashboard, and I would also like to share a use case that demonstrates the practical implementation of Zabbix for real-time infrastructure monitoring.

This Zabbix dashboard provides a comprehensive view of the network’s real-time health, server availability, traffic patterns, and key performance metrics of essential infrastructure components. It is designed for monitoring production, office, and virtual server zones, including network devices, physical servers, and virtual machines. The current view is the first page of a two-page dashboard, which focuses on general network monitoring:

The second page is dedicated solely to monitoring infrastructure nodes:

Key features monitored

Traffic Monitoring: The dashboard tracks real-time traffic from critical network uplinks, including AIS and TRUE, offering visibility into bandwidth usage (e.g., 64.50 Kbps and 13.05 Kbps). It also monitors internal traffic and key devices like the FortiGate firewall, helping ensure optimal network performance and security.

Host Health Monitoring: CPU and memory utilization for top hosts (e.g., GW-WINDOW11, GW-AD-DOMAIN) are displayed, enabling efficient resource management. Alerts are triggered for high resource usage, allowing for a proactive response to performance issues.

Disk Usage: Disk space on key hosts, such as the Zabbix virtual machine and other core servers, is monitored to avoid file system over-utilization, which can lead to potential service interruptions.

Availability Overview: The dashboard provides a summary of host availability, including how many are available, unavailable, or have unknown statuses. Monitoring methods like active agent and SNMP are also shown, giving an overall view of network health.

Visual Topology Map: A detailed network map shows the production, office, virtual, and test zones, along with devices and connections. This visualization aids in quickly identifying problem areas and understanding how systems are interlinked.

Severity and Problem Monitoring: The dashboard classifies issues by severity, from critical problems to warnings. Real-time issues (such as VM downtime or system failures) are highlighted, enabling the team to resolve issues quickly.

Performance Metrics: Graphs display performance metrics, such as bandwidth usage and CPU load, offering insights into system bottlenecks or overuse, particularly in critical devices like firewalls.

Impact

This Zabbix dashboard enables an infrastructure team to efficiently monitor network performance, manage resource usage, and ensure device availability. The clear visual interface helps quickly identify issues, reducing downtime and ensuring higher reliability of critical services.

Conclusion

The first page of the dashboard demonstrates Zabbix’s capabilities for centralized monitoring across large infrastructures. By integrating data from network devices, servers, and virtual machines, it empowers IT teams to make informed decisions and address issues before they escalate. The second page provides a detailed focus on the infrastructure nodes, ensuring that all critical systems are effectively monitored for optimal operation across the IT environment.

The post Monitoring a Complex Infrastructure Environment with Zabbix appeared first on Zabbix Blog.

]]>
https://blog.zabbix.com/monitoring-a-complex-infrastructure-environment-with-zabbix/28954/feed/ 0 28954
Monitoring Failed Jobs in NetBackup with Zabbix https://blog.zabbix.com/monitoring-failed-jobs-in-netbackup-with-zabbix/28539/ https://blog.zabbix.com/monitoring-failed-jobs-in-netbackup-with-zabbix/28539/#respond Wed, 23 Oct 2024 08:00:08 +0000 https://blog.zabbix.com/?p=28539 Monitoring backup solutions can be an arduous task – especially since many backup tools don’t provide APIs and simply…

The post Monitoring Failed Jobs in NetBackup with Zabbix appeared first on Zabbix Blog.

]]>

Monitoring backup solutions can be an arduous task – especially since many backup tools don’t provide APIs and simply are not easy to work with. One such solution – NetBackup – provides its own set of challenges, but fortunately we have Zabbix, with its low-level discovery (LLD) features and the possibility of leveraging user parameters to extend the Zabbix agent.

How does LLD work?

For those not familiar with LLD: Zabbix is able to create items, triggers, graphs, and other entities based on LLD rules. The discovered entities are described to Zabbix as JSON.

https://www.zabbix.com/documentation/current/en/manual/discovery/low_level_discovery/custom_rules

If we create a script that returns this information to Zabbix, we can automatically create items based on the received low-level discovery macros and their values. In this example from the Zabbix website, Zabbix will map {#FSNAME} to each of the detected logical volumes.

[    
{ "{#FSNAME}":"/",                           "{#FSTYPE}":"rootfs"   },
{ "{#FSNAME}":"/sys",                        "{#FSTYPE}":"sysfs"    },
{ "{#FSNAME}":"/proc",                       "{#FSTYPE}":"proc"     },
{ "{#FSNAME}":"/dev",                        "{#FSTYPE}":"devtmpfs" },
{ "{#FSNAME}":"/dev/pts",                    "{#FSTYPE}":"devpts"   },
{ "{#FSNAME}":"/lib/init/rw",                "{#FSTYPE}":"tmpfs"    },
{ "{#FSNAME}":"/dev/shm",                    "{#FSTYPE}":"tmpfs"    },
{ "{#FSNAME}":"/home",                       "{#FSTYPE}":"ext3"     },
{ "{#FSNAME}":"/tmp",                        "{#FSTYPE}":"ext3"     },
{ "{#FSNAME}":"/usr",                        "{#FSTYPE}":"ext3"     },
{ "{#FSNAME}":"/var",                        "{#FSTYPE}":"ext3"     },
{ "{#FSNAME}":"/sys/fs/fuse/connections",    "{#FSTYPE}":"fusectl"  }
]

Zabbix can automatically create items with this information. If we then create another script that sends the values for each of the volumes, we can return, for example, the free space of the “/” volume as a value – and do the same for all the other volumes.
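
To make this more concrete, the discovery half could be produced by a script along these lines – a minimal sketch for Linux that simply reads /proc/mounts, so the discovered entries may differ slightly from what Zabbix’s built-in filesystem discovery returns:

import json

# Emit Zabbix LLD JSON for mounted filesystems,
# using the same {#FSNAME}/{#FSTYPE} macros as the example above
discovery = []
with open("/proc/mounts") as mounts:
    for line in mounts:
        device, mountpoint, fstype = line.split()[:3]
        discovery.append({"{#FSNAME}": mountpoint, "{#FSTYPE}": fstype})

print(json.dumps(discovery, indent=4))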

With this knowledge, we can create a solution to monitor our backups. We will optimize this approach further, because we don’t want to rely on multiple scripts – one script that sends us a list of failed backups, another script that returns the status codes, and so on. Instead, we will use the dependent item feature, which allows us to create one master item that collects all the values and then process them further in Zabbix.

Monitoring with Python and user parameters

To format our data in JSON, we first need to extract it from the NetBackup API. For this, we can create a script and call it through a user parameter in our Zabbix agent. The Python script we will use can be copied to “/etc/zabbix” or another place that is accessible to the Zabbix user on our system.

https://github.com/Trikke76/Zabbix/blob/master/Netbackup/netbackup-failed-jobs-zabbix.py

Don’t forget to adapt the script and update settings like user name, password, URL, and page limit!

# NetBackup API configuration
BASE_URL = "https://<netbackup-url>:1556/netbackup"
USERNAME = ""
PASSWORD = ""
PAGELIMIT = "100" # adapt to your needs

The page limit restricts the search to the last 100 jobs.

If you want, you can also adapt how many days to look back in history; the default is 7 days:

# Set the time range for job retrieval (last 7 days)
end_time = datetime.utcnow()
start_time = end_time - timedelta(hours=168)


The script will collect errors in backups and the resulting output will display a list of failed backups over the last 100 jobs:

{
  "data": [
    {
      "{#JOBID}": 257086,
      "JOBTYPE": "DBBACKUP",
      "STATUSCODE": 11,
      "STATE": "DONE",
      "POLICYNAME": "NBU-Catalog",
      "CLIENTNAME": "NetBackup-server",
      "STARTTIME": "2024-07-29T12:46:34.000Z",
      "ENDTIME": "2024-07-29T12:47:53.000Z",
      "ELAPSEDTIME": "PT1M19S",
      "KILOBYTESTRANSFERRED": 0
    }
  ]
}

This data is perfect for our LLD rules in Zabbix. Once we have copied our script to the server, we have to define our Zabbix user parameter. You can download an example here:

https://github.com/Trikke76/Zabbix/blob/master/Netbackup/Userparameter-netbackup.conf

Copy this file to your Zabbix agent configuration folder – usually “/etc/zabbix/zabbix_agentd.d/” for Zabbix agent or “/etc/zabbix/zabbix_agent2.d/” for Zabbix agent 2.

Don’t forget to modify the file permissions so that only the agent can read it, and restart the Zabbix agent. Also, make sure that the user parameter points to the correct location of the Python script. The last thing we have to do now is create or import our Zabbix template:
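
For reference, a user parameter definition boils down to a single line like the one below. The key name and paths here are illustrative – use the ones from the downloaded configuration file:

UserParameter=netbackup.failed.jobs,/usr/bin/python3 /etc/zabbix/netbackup-failed-jobs-zabbix.py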

https://github.com/Trikke76/Zabbix/blob/master/Netbackup/Templates_Netbackup.yaml

How does it work?

The first thing we have to do is create a master item that collects the data from our script.


Because the error check is executed every 15 minutes, we can use throttling preprocessing to discard duplicate data – most of the time there will be no errors in our backups.

Also, if our script fails to connect to the API, our data collection will fail. Therefore, we can use custom on fail pre-processing and set a custom, more human-readable error message.
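
Since the screenshots are not reproduced here, the master item’s preprocessing could be set up roughly as follows (the heartbeat value and error text are assumptions – adjust them to your environment):

1. Discard unchanged with heartbeat, with a parameter such as 1h, so identical results are dropped but at least one value per hour is kept.
2. A validation step with “Custom on fail” enabled and “Set error to” filled in with a human-readable message, for example “Could not retrieve job data from the NetBackup API”.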


Now we have to create a discovery rule in Zabbix based on this data. In this discovery rule we will extract the required data and map it to custom LLD macros.

Those macros can be used later in our items. As you can see, we use .first() at the end of our JSONPath expression – otherwise, we would get all our matching data between the [ ], as our data comes in a list. By making use of .first(), we filter out all the other data we don’t need.


To create our LLD items, we need to create an item prototype so that items can be generated when they are detected. Our item will be a dependent item, so it will get its data from the master item.


In our item prototype we can make use of the Zabbix LLD macros we created before. To get the values we need, we first have to add a preprocessing rule that extracts them from the master item’s data.


The first line looks for the “JOBID” and uses the LLD macro we created before. Remember that we used .first()? If we had not done this, our ID here would have been a list [ ] instead of just the ID number.
We also have to remove the [ ] around the extracted value – this we can do with Trim. Since our data is returned as text, we also add some JavaScript to convert it to an integer. This allows us to create triggers based on the error code we have received.

Monitoring with an http item

There is another way to do the same thing in Zabbix without writing complex Python scripts. Since Zabbix 4.0, we have had the “HTTP agent” item type, which allows us to connect to the API and retrieve the required data directly. Combined with LLD and dependent items, this becomes a very powerful way to collect metrics.

The first thing we have to do is create a master item to retrieve the data from the API. This item is of the type “HTTP agent”, and we have to fill in the URL of the API endpoint. To authenticate, we have to pass information like the authentication token in the headers; for this, you first need to create a token in NetBackup. As you can see, I used a macro {$BEARER.TOKEN} – this is so we can keep it secret.
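
Since the screenshot is not reproduced here, the master item configuration boils down to something like the following sketch. The endpoint path, query field, and header values are assumptions based on the NetBackup REST API documentation – verify them against your NetBackup version:

Type: HTTP agent
URL: https://<netbackup-url>:1556/netbackup/admin/jobs
Query fields: page[limit] = 100
Headers:
    Authorization: {$BEARER.TOKEN}
    Accept: application/vnd.netbackup+json;version=4.0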


So the next step is to add our secret token. Let’s create our macro in the template under the Macros section, where we can choose to keep it hidden from everyone. An even more secure way to store sensitive information like authentication tokens would be to use a secret vault.


The data we get back from our API is a bit different from what we have seen in the output of the Python script we defined previously, but not by much.

{
  "data": [
    {
      "type": "job",
      "id": "260136",
      "attributes": {
          "jobId": 260136,
          "parentJobId": 0,
          "jobType": "DBBACKUP",
          "policyType": "NBU_CATALOG",
          "policyName": "NBU-Catalog",
          "scheduleType": "DIFFERENTIAL_INCREMENTAL_BACKUP",
          "scheduleName": "-",
      …

With this knowledge and what we know from our first try with Python, we can now make a dependent discovery rule.


The same logic applies again – we need to map our data to LLD macros so that we can use them later in our LLD items and triggers.


These LLD macros can later be used in our item prototypes and triggers. We only need JOBID and STATE, but you can create some extra mappings in case you like to use the extra information later. With our JSON path we will once again extract the data from our master item.

The next step is to create the LLD item prototype. Here we can use the macros we extracted earlier.


The item is dependent on our master item, so without any pre-processing the data will be exactly the same as in our master item. Therefore, we can add some rules to get the data we need.


Here, we use the JSONPath to extract the data. With our LLD macros we can extract the data dynamically for every item we have discovered. With Trim, we remove the [ ] that surround our data.
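
As an illustration – an assumption based on the API output above, not the literal contents of the template – the preprocessing of a job state item prototype could look like this:

1. JSONPath: $.data[?(@.id == '{#JOBID}')].attributes.state, which returns something like ["DONE"]
2. Trim with the parameter []", which strips the brackets and quotes, leaving DONE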

If there are backup errors, the end result will look something like this:


The steps can look a bit abstract, so the best thing to do is to try and perform everything step-by-step and use the Test button in Zabbix to test every step before you continue.

Websites like https://jsonpath.com/ and https://jsonformatter.org/ can also be helpful to beautify your data and do some testing with your JSONPath pre-processing.

If you want to test the template, feel free to download it from my github:

https://github.com/Trikke76/Zabbix/blob/master/Netbackup/Templates_Netbackup_HTTP.yaml

In conclusion

That’s it! If you’ve set up everything correctly, you should now get a list of failed jobs collected from NetBackup. Once the failed jobs are gone, Zabbix will disable the related entities and clean them up after some time.

If you need help optimizing your Zabbix environment, or you need a support contract, some consultancy, or training, feel free to contact sales@open-future.be or visit us at https://www.open-future.be.

We are always available to help!

The post Monitoring Failed Jobs in NetBackup with Zabbix appeared first on Zabbix Blog.

]]>
https://blog.zabbix.com/monitoring-failed-jobs-in-netbackup-with-zabbix/28539/feed/ 0 28539
Monitoring My Home Network with Zabbix https://blog.zabbix.com/monitoring-my-home-network-with-zabbix/28921/ https://blog.zabbix.com/monitoring-my-home-network-with-zabbix/28921/#respond Wed, 16 Oct 2024 11:15:40 +0000 https://blog.zabbix.com/?p=28921 Recently, we reached out to the members of our global community with an invitation to share their dashboards and…

The post Monitoring My Home Network with Zabbix appeared first on Zabbix Blog.

]]>
Recently, we reached out to the members of our global community with an invitation to share their dashboards and give us a quick tour of what they do with our product. The response was so incredible that we have decided to highlight a few of the most interesting submissions here on our blog.

First up is Cesar Caceres, an independent IT consultant with nearly 10 years of experience in critical system monitoring within the banking sector. Cesar enjoys being alerted to changes within his home network so much that he composed a custom song to let him know when a new alert arrives!

My environment

My environment includes ping monitoring for multiple devices (Google Nest, Smart LEDs, Smart Lights, and TV). I also track home network devices: one personal MikroTik router and two belonging to my colleague Alejandro Velasquez, along with the temperature of these devices. Additionally, I monitor WAN consumption from my internet provider, as well as the bandwidth consumption of a connected client, my colleague, and the VPN.

I have a MikroTik and TP-Link router. When I connect the TP-Link to a port on the MikroTik, I can capture information about any devices connected to my home network. Using SNMP v2, I can then retrieve detailed information from these devices. From the WinBox console of the MikroTik router, I can navigate to IP > DHCP Server to locate the active hostnames to monitor.

In WinBox, I navigate to IP > SNMP Settings. Here, I assign a community name for identification, select SNMP version 2, and enter the IP address of the MikroTik device.

Once configured, I verify from the Zabbix server that communication has been successfully established through the SNMP v2 protocol.

On the Zabbix server, I verify the host name of the device to make sure it’s visible. Since version 6.0, Zabbix includes a template specifically for the RB4011GS device, which simplifies the monitoring process.

Temperature monitoring for my location (Maracaibo, Venezuela) is integrated with OpenWeatherMap. I also monitor my phone using an agent from the Android Play Store. The template for this is available on this GitHub repository, but customization will always depend on individual needs.

The temperature of my Zabbix Server is monitored using a repository available on GitHub. It’s important to know the operating temperature of the Zabbix server.

If possible, I adjust the default parameters to suit the specific environment.

I also monitor the performance of our Zabbix server and database using the MySQL integration with the Zabbix agent, focusing on key elements like buffer usage.

I track the behavior of my portfolio (ccaceresoln.com) with web scenarios, including certificate monitoring. To query SSL for my portfolio, I create a folder on the Zabbix server with a script called checkssl.sh inside it, and then grant the script execution permissions with chmod +x.

In the configuration of these items, the call will be made to the URL. Each hosting provider may automatically generate a new SSL certificate periodically. In my case, I don’t use a trigger for certificate renewal.

On the right side, there is a new widget for navigating based on alerts, which allows me to view more details about these issues.

Alerts

Alerts are delivered through WhatsApp, using a repository available on GitHub. This repository is based on the WhatsApp Web + Multi-Device API library. It’s important to ensure that the Mudslide libraries are up-to-date. Step-by-step instructions can be found in the Zabbix forums.

The assistant is based on a custom GitHub repository, customizing the language model using the Gemini 1.5 API. I chose this because it’s free to use and doesn’t require installation on the server. With the emergence of artificial intelligence, I’m hopeful that this could act as a proof of concept and an idea to help people understand how to resolve such alerts and learn from them. It’s more than just having everything in one place! Why MARIA? MARIA stands for:
M: Machine
A: Assistant
R: Reasoning
I: Intelligence
A: Artificial

Additional features

I had the idea to create a Zabbix song in order to have a sound that greets me every morning. Just a reminder that it’s a new day and Zabbix is here for alerts.
Song with sunoai:

Conclusion

Having a home network monitoring environment offers advantages such as receiving alerts about device status or specific equipment behavior even when you’re away from home. This allows for continuous supervision and proactive issue resolution.

The post Monitoring My Home Network with Zabbix appeared first on Zabbix Blog.

]]>
https://blog.zabbix.com/monitoring-my-home-network-with-zabbix/28921/feed/ 0 28921