System Monitoring with Monit and Monitorix

Background

Previously, an advanced monitoring scheme using Grafana, Influxdb and Telegraf was used at the Ma_Sys.ma: dashboards_with_docker(37). As of 04/2021, this ceased to work due to errors in the Influxdb database: It would fail to start and reject all attempts to send data with 401 Unauthorized errors.

The issue might have been caused by submitting a lot of metrics while creating the backup tool comparison backup_tests_borg_bupstash_kopia(37), but the actual reasons can only be speculated about.

The advanced monitoring scheme had been under testing for some time, but was still lacking some checks like MDADM array health. Given the failure of the Influxdb, three ideas came to mind:

Hence, the search for an alternative monitoring program began.

A Variety of less complicated Monitoring Programs for Linux

This section collects all the candidates considered for monitoring that were not used for the purpose in the end. It serves as a reference should there be need to reconstruct the monitoring again :)

Atop

Links: https://atoptool.nl/, Debian package atop

Atop screenshot with 100% CPU load caused by a Java process

Atop is one of my favourite live monitoring tools for console usage on Linux. Atop displays the activity on a system in a seemingly intimidating text-based fashion squeezing as much information on the screen as possible.

Here are the reasons for this being my favourite:

Atop collects the most important performance metrics and provides a means to view their history. There are still some reasons why Atop is not the (only) monitoring tool of choice for unattended systems:

Here are some references about how one might boost Atop for monitoring purposes:

To conclude: Although I know this tool and like it very much, I did not set it up for monitoring this time, as it has some rough edges for that use case.

Glances

Links: https://github.com/nicolargo/glances/, Debian package glances

Glances screenshot with 100% CPU load and small window

Glances serves a similar purpose to Atop but is aimed at larger screen spaces. It does not monitor I/O as well as Atop, but adds information on sensor readings and filesystem usage if enough screen space is available. Glances does not (by itself) offer a means to store and retrieve historic data.

In addition to its console-based live monitoring, Glances can send metrics to time-series databases, produce alerts and display its output on a web interface. This makes it a candidate for use in unattended monitoring scenarios, but the necessity of a time-series database for historic data would again complicate things too much.

Netdata

Links: https://github.com/netdata/netdata, Debian package netdata

Netdata screenshot

Netdata graphs a huge number of metrics in a web interface (on port 19999). Similar to Glances, Netdata can send metrics to an external server. Similar to Grafana, it allows displaying the metrics as graphs over time or gauges for showing the current value.

Some notes about Netdata’s features with links to the respective documentation:

To summarize: Although it seems a little odd to present all of the system’s health status as time-value-plots, Netdata can monitor just about everything. It is a viable alternative to the Monit+Monitorix combination explained further below.

phpSysInfo

Links: https://github.com/phpsysinfo/phpsysinfo, Debian package phpsysinfo

Screenshot of phpSysInfo running in a Docker container

phpSysInfo displays the system status in a tabular fashion. It focuses on the machine’s properties combined with memory and file system usage. While it gathers some information on load, phpSysInfo does not provide any diagrams with historical data. Instead, it focuses on displaying the current system health state.

phpSysInfo could be used as an alternative to Monit for the Ma_Sys.ma monitoring purposes. Given that it runs entirely in PHP, configuring phpSysInfo to have access to the system metrics as well as running securely on the web server seems to be a little more complicated than with the other tools.

Other Options

The following is a list of tools that were considered, but not tested extensively for one reason or another:

Combining Monit and Monitorix for a comprehensive and simple System Monitoring

As no single tool seemed to fit all the needs, two good ones were combined:

Monit

Links: https://mmonit.com/monit/, Debian package monit

Monit running on the Ma_Sys.ma main system

Monit creates a status page that quickly allows one to identify if anything is wrong with the running system. If the system is working normally, all items display a green “OK”.

It does not log historic information and does not visualize it graphically. Monit does, however, allow presenting the current status as text, which is used by a series of custom scripts described in the following.

The complete configuration and scripts can be found in the repository. See file masysma-default.

System Resources

Monit checks that system resources are not constantly overloaded. This is especially important for disk space, but also for CPU load on small systems whose cooling might not be up to constant 100% CPU loads.
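To give an idea of what such resource checks look like, here is a small sketch. It is written as a shell function that prints a Monit snippet so that it can be redirected into a file under /etc/monit/conf.d. The function name, the check name rootfs and all thresholds are assumptions made for this example; the values actually used are in masysma-default.

```shell
#!/bin/sh -eu
# Sketch only: prints an example Monit snippet for resource checks.
# All thresholds and the check name "rootfs" are made up for illustration.
print_resource_checks() {
    # The quoted 'EOF' keeps $HOST literal so that Monit expands it itself.
    cat <<'EOF'
check system $HOST
    if loadavg (5min) > 4 for 10 cycles then alert
    if cpu usage > 95% for 10 cycles then alert
    if memory usage > 90% for 5 cycles then alert

check filesystem rootfs with path /
    if space usage > 90% then alert
EOF
}

print_resource_checks
```

The "for N cycles" part is what keeps short load spikes from raising an alert: only a condition that persists over several polling cycles is reported.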

Services

Monit ensures that important services are running. The choice of services was hand-crafted to make sure only the meaningful ones are included. Monit mostly checks only that the services are running and that some of their important files exist with correct permissions. The associated monitoring instructions have been derived from the examples supplied by Debian.

RAID

Monit runs a custom script to check the RAID status. It is “dead simple” and needs only two lines of code:

#!/bin/sh -eu
# see statusscripts/mdadm.sh
cat /proc/mdstat
! grep -qF _ /proc/mdstat

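Here, cat ensures the web interface will show the whole /proc/mdstat, and grep -qF _ finds any failed drives, which /proc/mdstat marks with an underscore in the status field (e.g. [U_] instead of [UU]). The leading ! inverts the result so that the script fails exactly when such a marker is found.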
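To try the check without breaking a real array, the same grep can be pointed at a sample file. The following is a minimal sketch: the function name mdstat_ok and the sample content (a two-disk RAID1 that lost one member) are made up for this example.

```shell
#!/bin/sh -eu
# Same test as in mdadm.sh, but with the file name as a parameter so that
# it can be run against sample data. The function name is hypothetical.
mdstat_ok() {
    # Succeeds only if the given file contains no underscore.
    ! grep -qF _ "$1"
}

# Hypothetical sample: RAID1 with one of two members missing.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
Personalities : [raid1]
md0 : active raid1 sda1[0]
      976630464 blocks super 1.2 [2/1] [U_]

unused devices: <none>
EOF

if mdstat_ok "$sample"; then
    echo "RAID OK"
else
    echo "RAID degraded"
fi
rm -f "$sample"
```

For this sample, the script reports the array as degraded because of the [U_] marker.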

ZFS

Similar to RAID, ZFS’ status is also monitored by a custom script:

#!/bin/sh -eu
# see statusscripts/zpool.sh
zpoolstatus="$(LANG=en_US.UTF-8 zpool status)"
printf '%s\n' "$zpoolstatus"
printf '%s\n' "$zpoolstatus" | { ! grep -qE "(DEGRADED|UNAVAIL)"; }

Here, the first printf is to make the data visible in the web interface. The second one scans for any signs of failed drives.

Docker

A custom script for Docker checks that there are no “unhealthy” services and adds the output of docker ps to the web interface.

#!/bin/sh -eu
# see statusscripts/docker.sh
dockerinfo="$(LANG=en_US.UTF-8 docker ps)"
printf '%s\n' "$dockerinfo"
printf '%s\n' "$dockerinfo" | { ! grep -qF "unhealthy"; }
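Scripts like the ones above only become checks once Monit is told to run them. Monit's "check program" statement runs a program and tests its exit status. The following sketch prints such stanzas for the three scripts shown so far. The helper name, the check names and the installation directory are assumptions for this example and may differ from what masysma-default uses.

```shell
#!/bin/sh -eu
# Prints a Monit "check program" stanza for one status script.
# $1: check name shown in the web interface, $2: absolute script path
monit_program_check() {
    printf 'check program %s with path "%s"\n' "$1" "$2"
    printf '    if status != 0 then alert\n\n'
}

# Hypothetical installation directory; adjust to where the scripts live.
scriptdir=/usr/local/lib/statusscripts
for name in mdadm zpool docker; do
    monit_program_check "$name" "$scriptdir/$name.sh"
done
```

The output can be redirected into a file under /etc/monit/conf.d. Monit then shows the text printed by each script in the web interface and raises an alert whenever a script exits with a non-zero status.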

S.M.A.R.T.

A more complicated script monitors the drive health as reported by the hardware. Its basic form is as follows:

#!/bin/sh -eu
# see statusscripts/smartctl.sh
smartoutput="$(LANG=en_US.UTF-8 smartctl -H -A "$@")"
printf '%s\n' "$smartoutput"
printf '%s\n' "$smartoutput" | grep -qF ": PASSED"

Again, it's the same pattern: the output is printed and the return code indicates the success of the script's execution. However, if you look at the actual implementation in statusscripts/smartctl.sh, you may notice there is a quirk to save the output of smartctl to a file. This is intended to be used in conjunction with MDVL package mdvl-infobg to display the S.M.A.R.T. attribute names.
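The pattern shared by these scripts can also be written down once as a generic helper. This is only a sketch to make the idea explicit, not how the scripts in the repository are organized; the function name report_and_scan is made up.

```shell
#!/bin/sh -eu
# Generic form of the pattern: run a command, print its output for the web
# interface, and fail if the output matches a "bad" pattern.
# $1: extended regular expression signalling a problem; rest: the command
report_and_scan() {
    badpattern="$1"
    shift
    output="$(LANG=en_US.UTF-8 "$@")"
    printf '%s\n' "$output"
    printf '%s\n' "$output" | { ! grep -qE "$badpattern"; }
}

# Demonstration with a stand-in for "zpool status":
if report_and_scan "(DEGRADED|UNAVAIL)" printf '%s\n' "state: ONLINE"; then
    echo "check passed"
fi
```

The ZFS and Docker scripts fit this form directly. The S.M.A.R.T. script works the other way round: it requires a "good" pattern (": PASSED") to be present, so the grep is not negated there.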

Notifications

On non-interactive systems, Monit should be configured to send e-mail reports in case of detected failures. This can be enabled by providing a file such as masysma-16-email under /etc/monit/conf.d:

set mailserver smtp.example.com port 587
    username "machine@example.com" password "123456"
    using tlsv13
    with timeout 30 seconds

set mail-format {
    from: machine@example.com
}

set alert admin@example.com

These settings configure the sender to be machine@example.com (whose mail password is 123456) and the receiver to be admin@example.com.

Monitorix

Links: https://www.monitorix.org/, Debian package monitorix

Monitorix running on the Ma_Sys.ma main system, inter-day graphs for System Load Average

Monitorix is responsible for creating graphs and showing historic data. It does not display any text information, but provides a large number of graphs for different time intervals: day, week, month and year.

The Ma_Sys.ma configuration can be found as 70-masysma-default.conf in the repository. It keeps a lot of graphs disabled by default to make sure that all of the graphs provided by default are meaningful. If more graphs are needed, they can be enabled by providing another file such as 80-masysma-18.conf under /etc/monitorix/conf.d. The values from the second file will then override the defaults from 70-masysma-default.conf.

A very nice property of Monitorix’ approach to the graphs is their simplicity:

Conclusion

So far, the new monitoring scheme has been running flawlessly for a month. The only issue that Monit reported was a high load on a non-interactively used system during the transfer of a large backup.

Both tools, Monit and Monitorix, seem capable of running unattended for long periods of time, which is their key advantage over more powerful and more complex systems. Also, while the setup required a lot of tweaking and some auxiliary scripts, the configuration is easily versioned as small text files (snippets) in repositories.

Future Directions

It would be really nice if a check for the system’s patch level could be included in Monit, i.e.: Are the most recent security updates installed?
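One conceivable starting point, following the same pattern as the status scripts above, is to scan the output of a simulated upgrade. This is an untested idea rather than part of the current setup. It assumes a Debian-like system where apt-get -s upgrade prints one line starting with "Inst" per package that would be upgraded, and it assumes that the package lists are refreshed by some other means. The function name and the sample line are made up.

```shell
#!/bin/sh -eu
# Sketch: fail if a simulated upgrade lists any pending package.
# stdin: output of "apt-get -s upgrade"
check_patchlevel() {
    pending="$(grep '^Inst ' || true)"
    if [ -n "$pending" ]; then
        printf 'Pending upgrades:\n%s\n' "$pending"
        return 1
    fi
    echo "No pending upgrades."
}

# Demonstration with a made-up sample line instead of real apt-get output:
printf '%s\n' \
    'Reading package lists...' \
    'Inst examplepkg [1.0-1] (1.0-2 Debian-Security:12/stable-security [amd64])' \
    | check_patchlevel || echo "check failed"

# On a real system, the call would be something like:
# LANG=en_US.UTF-8 apt-get -s upgrade | check_patchlevel
```

As written, this reports any pending upgrade. Restricting it to security updates would need an additional filter on the origin shown in each line, which is left open here.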

Neither Monit nor Monitorix incorporates information from the various system logs. logwatch is used for this on non-interactive systems, but it might be interesting to explore whether some of the important functions of logwatch could be integrated by means of custom Monit scripts.

Short note on getting to run Cacti

For Ma_Sys.ma monitoring purposes, Cacti was found to be too complex. Its installation procedure was tried and roughly consists of the following steps and components.

MySQL

CREATE DATABASE <database>;
USE <database>;
CREATE USER <user> IDENTIFIED BY <password>;
GRANT ALL PRIVILEGES ON *.* TO <user>;
GRANT SELECT ON mysql.time_zone_name TO <user>;
mysql -u root -p cacti < /usr/share/doc/cacti/cacti.sql
mysql_tzinfo_to_sql /usr/share/zoneinfo/ | mysql -u root mysql

PHP

provide DB connection in config.php
/etc/php/7.3/cgi/php.ini date.timezone=Europe/Berlin
/etc/php/7.3/cli/php.ini date.timezone=Europe/Berlin

GUI

admin/admin -> admin/_0FAIL0_
Theme: Sunrise

OS

# Temporary, revert after installation?
chown www-data:www-data -R /usr/share/cacti

Created: 2021/05/03 22:43:10 | Revised: 2021/06/14 01:44:58 | Tags: monitoring, linux | Version: 1.0.1 | SRC (Pandoc MD) | GPL

Copyright (c) 2021 Ma_Sys.ma. For further info send an e-mail to Ma_Sys.ma@web.de.

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see <http://www.gnu.org/licenses/>.