InfluxDB’s Checks and Notifications System

Components

The InfluxDB 2.0 Checks and Notifications system comprises five components: checks, statuses, notification rules, notification endpoints, and notifications. These components function as building blocks that can be combined to create the precise functionality you need, which makes the system more powerful and flexible than monolithic alerting systems.

Checks in InfluxDB

Checks in InfluxDB are a type of task. A task is a Flux query that runs on a schedule or at a defined frequency. Flux is the functional data scripting language for InfluxDB; it allows you to query, transform, and analyze your time series data in almost any way you need. A check queries data and applies a status, or level, to each data point based on specified conditions. The output of a check is a status, which is written to the "_monitoring" bucket, a default internal bucket.
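To make that concrete, the skeleton of any task is just a task option record plus a Flux query. Here is a minimal sketch of a plain task (not a check), assuming a hypothetical bucket named "telegraf":

// run this query once per hour
option task = {name: "Hourly CPU mean", every: 1h, offset: 0s}

from(bucket: "telegraf")
  |> range(start: -task.every) // query only the data since the last run
  |> filter(fn: (r) => r["_measurement"] == "cpu")
  |> mean()

The results of a bare query like this are discarded; a real task would typically write them somewhere with to() or, in the case of a check, pass them to monitor.check().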

Threshold checks in the InfluxDB UI

The easiest way to create a check is through the UI. You can create a threshold check or a deadman check through the UI. To create either, navigate to the Alerts page in the UI from the right-hand navigation menu, click + Create, and select either a threshold or deadman check under the Checks panel. There are two steps to creating a threshold check through the UI:

  1. Define Query
  2. Configure Check

Once the check runs, the status it outputs contains the following columns:

  • _check_id: each check is assigned a check ID. Check IDs can be used for debugging and rerunning failed checks.
  • _check_name: the name of your check (for example, "CPU Check").
  • _level: either CRIT, WARN, OK, or INFO for UI checks (or configurable for custom checks).
  • _source_measurement: the measurement of your source data (for example, "cpu").
  • _type: the type of check (for example, "threshold").
  • custom tags added to the query output.
  • columns included in the query: to view all of these columns, hover over a point in your query.

The Flux behind threshold checks

To view the corresponding Flux script that’s generated for your threshold check, you can use the InfluxDB v2 API or the UI. To view the corresponding Flux script for the UI-generated threshold check, look at the default "_tasks" system bucket.

First, find the task ID for your check by querying the "_tasks" bucket for task names:

from(bucket: "_tasks")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_field"] == "name")

Then use that task ID to view the logs for the check:

from(bucket: "_tasks")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["taskID"] == "0754bba32a9ae300")
  |> filter(fn: (r) => r["_field"] == "logs")

Alternatively, use the InfluxDB v2 API to fetch the Flux for a check by its check ID:

curl -X GET \
  https://us-west-2-1.aws.cloud2.influxdata.com/api/v2/checks/074...my_check_ID...000/query \
  -H 'authorization: Token Oxxx….my_token...xxxBg=='
Either way, the generated Flux for the example threshold check looks like this:

package main

import "influxdata/influxdb/monitor"
import "influxdata/influxdb/v1"

data = from(bucket: "telegraf")
  |> range(start: -5s)
  |> filter(fn: (r) => r["_measurement"] == "cpu")
  |> filter(fn: (r) => r["_field"] == "usage_system")
  |> filter(fn: (r) => r["cpu"] == "cpu-total")
  |> aggregateWindow(every: 5s, fn: mean, createEmpty: false)

option task = {name: "CPU Check", every: 5s, offset: 0s}

check = {
  _check_id: "074bba32a48c3000",
  _check_name: "CPU Check",
  _type: "threshold",
  tags: {},
}
crit = (r) => r["usage_system"] > 5.0
messageFn = (r) => "Check: ${r._check_name} is: ${r._level}"

data
  |> v1["fieldsAsCols"]()
  |> monitor["check"](data: check, messageFn: messageFn, crit: crit)

Deadman checks in the InfluxDB UI

Creating deadman checks through the InfluxDB UI is almost identical to creating a threshold check. However, rather than specifying thresholds, you specify the duration of the deadman signal under the check properties in step 2, Configure Check.

The Flux behind deadman checks

To view the corresponding Flux script that’s generated for your deadman check, you can use the InfluxDB v2 API or the UI. The process for getting the corresponding Flux script for a deadman check is almost identical to that of the threshold check, as described above. Just make sure to gather the deadman check ID instead. The resulting deadman Flux script is very similar to the threshold check as well.

package main

import "influxdata/influxdb/monitor"
import "experimental"
import "influxdata/influxdb/v1"

data = from(bucket: "telegraf")
  |> range(start: -15s)
  |> filter(fn: (r) => r["_measurement"] == "cpu")
  |> filter(fn: (r) => r["_field"] == "usage_system")
  |> filter(fn: (r) => r["cpu"] == "cpu-total")

option task = {name: "Deadman", every: 1m, offset: 0s}

check = {
  _check_id: "074bdadac4429000",
  _check_name: "Deadman",
  _type: "deadman",
  tags: {deadman: "deadman"},
}
crit = (r) => r["dead"]
messageFn = (r) => "Check: ${r._check_name} is: ${r._level}"

data
  |> v1["fieldsAsCols"]()
  |> monitor["deadman"](t: experimental["subDuration"](from: now(), d: 5s))
  |> monitor["check"](data: check, messageFn: messageFn, crit: crit)

Create a custom check

The InfluxDB UI makes creating simple deadman and threshold checks very easy, but it falls short of conveying the sophistication of the check system within InfluxDB. Flux allows you to transform your data in almost any way you see fit and to define custom conditions, and writing a custom check lets you incorporate any of those transformations and conditions. The steps for creating a custom check mirror the check creation steps in the UI. First, query your data and assign that query to a variable. Assigning your initial query to a variable isn’t required, just recommended. Next, implement the levels of your check. As described in The Flux behind threshold checks above, you can define custom conditions by changing the function definitions for the levels, for example: crit = (r) => r["usage_system"] > 5.0 and r["usage_user"] > 5.0.

package main

import "influxdata/influxdb/monitor"
import "influxdata/influxdb/v1"

solarBatteryData = from(bucket: "solar")
  |> range(start: -task.every)
  |> filter(fn: (r) => r["_measurement"] == "battery")
  |> filter(fn: (r) => r["_field"] == "kWh")
  |> derivative(unit: 3s, nonNegative: false, columns: ["_value"], timeColumn: "_time")

option task = {name: "Battery Check", every: 5s, offset: 0s}

check = {
  _check_id: "0000000000000001", // alphanumeric, 16 characters
  _check_name: "Battery Check",
  _type: "threshold",
  tags: {},
}
ok = (r) => r["_value"] > 0.0
crit = (r) => r["_value"] <= 0.0
messageFn = (r) => "Check: ${r._check_name} is: ${if r._level == "crit" then "DH" else "CH"}"

solarBatteryData
  |> v1["fieldsAsCols"]()
  |> monitor["check"](data: check, messageFn: messageFn, crit: crit, ok: ok)

Common problem with InfluxDB checks

Frequently, users create a check only to notice that no status is being written to the "_monitoring" bucket, even though the underlying task is running successfully. This behavior is most likely caused by a read/write conflict: the check runs before the data it needs has been written. This hurdle is probably the most common gotcha with InfluxDB checks (and notifications), but the solution is simple: adding an offset to your check will most likely resolve the conflict.
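For example, if data takes up to 15 seconds to arrive in InfluxDB, a sketch of the fix (assuming a hypothetical check that runs every minute) is simply to set that latency as the task offset:

// the offset delays each run by 15s without changing the queried time range,
// giving late-arriving data time to land before the check reads it
option task = {name: "CPU Check", every: 1m, offset: 15s}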

Conclusion on InfluxDB checks

If there’s one takeaway the reader should gather from this document, it’s that all checks, notification rules, and tasks are just Flux executed on a schedule. If you’re worried about performance differences between checks and tasks, don’t be: a check is just a task under the hood.

Statuses

A status is a type of time series data that contains the results of a check. Statuses are written to the “_monitoring” bucket after a check has successfully run. As previously described in the Threshold Checks and Create a Custom Check sections above, a status contains a combination of data from the source bucket and information about the results of the check. The schema of a status is as follows:

  • _time: the time at which the check was executed.
  • _check_id: each check is assigned a check ID. Check IDs can be used for debugging and rerunning failed checks — a tag.
  • _check_name: the name of your check ("CPU Check" for example) — a tag.
  • _level: either "CRIT", "WARN", "OK", or "INFO" for UI checks (or configurable for custom checks) — a tag.
  • _source_measurement: the measurement of your source data ("cpu" for example) — a tag.
  • _measurement: the measurement of the status data ("statuses") — a measurement.
  • _type: the type of check ("threshold" for example) — a tag.
  • _field: the field keys: _message, _source_timestamp, and ${your source field}.
  • _message: the status message.
  • _source_timestamp: the timestamp of the source field that is being checked, in ns precision.
  • ${your source field}: the source field value that is being checked.
  • dead: an additional field specific to deadman checks, a boolean that represents whether the deadman signal has met the deadman check conditions.
  • custom tags added to the query output during check configuration.
  • any additional tags included in the query, resulting from the v1.fieldsAsCols function. To view all of these tags, hover over a point in your query in the Data Explorer.

Viewing statuses in the InfluxDB UI

There are two places to view statuses in the InfluxDB UI: through the Alerts page and through the Data Explorer. To view the statuses through the Alerts page, click View History under the eye icon in the Checks panel.
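To view statuses through the Data Explorer, you can query the "_monitoring" bucket directly. A minimal sketch, assuming the "CPU Check" from earlier:

from(bucket: "_monitoring")
  |> range(start: -1h)
  |> filter(fn: (r) => r["_measurement"] == "statuses")
  |> filter(fn: (r) => r["_check_name"] == "CPU Check")
  |> filter(fn: (r) => r["_field"] == "_message") // view the status messages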

Striking a balance between tasks and checks

At this point in the document, hopefully you recognize that checks are just tasks under the hood, but also that checks have been isolated in the InfluxDB API. You might also wonder why finding the underlying tasks in the UI is cumbersome and why that obfuscation exists. This decision was made to encourage users to execute transformation work in a separate task, and to discourage them from stuffing transformation and analytics work into a check. In other words, keep your checks simple, as in the sketch below.
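For example, instead of embedding heavy aggregation inside a check, let a task materialize the derived data and point a simple check at the result. A sketch, assuming hypothetical buckets named "telegraf" and "downsampled":

// the task does the transformation work on a schedule...
option task = {name: "Downsample CPU", every: 1m, offset: 0s}

from(bucket: "telegraf")
  |> range(start: -task.every)
  |> filter(fn: (r) => r["_measurement"] == "cpu" and r["_field"] == "usage_system")
  |> aggregateWindow(every: 10s, fn: mean, createEmpty: false)
  |> to(bucket: "downsampled")
// ...and the check only needs to run a plain threshold query against "downsampled"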

Notification rules in InfluxDB

Notification rules in InfluxDB are a type of task, similar to a check. A notification rule queries statuses from the "_monitoring" bucket, applies the rule to each status, logs a notification to the "_monitoring" bucket, and sends a notification message to a notification endpoint. A notification is a type of time series data. Like statuses, notifications are written to the "_monitoring" bucket and are the output of a Flux task; while statuses are the output of a check, notifications are the output of a notification rule. A notification endpoint is metadata in a notification rule that enables notification messages to be sent to a third-party destination such as a generic HTTP endpoint, Slack, or PagerDuty. To summarize, it might be helpful to think of notification rules as synonymous with alerts. However, these tasks are referred to as notification rules in the InfluxDB API and InfluxDB UI.

Notification endpoints in the InfluxDB UI

The first step in creating a notification rule in the InfluxDB UI is to configure a notification endpoint. A notification endpoint is the third-party service that you want to send your notification message to. The easiest way to configure a notification endpoint for your notification rule is through the InfluxDB UI. To create a notification endpoint, navigate to the Alerts page in the UI from the right-hand navigation menu, and click + Create. There are two steps to creating a notification endpoint through the UI:

  1. Specify your Destination
  2. Specify the Options
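If you'd rather define the endpoint in Flux than in the UI, and avoid hard-coding credentials, you can pull them from the secrets store. A sketch, assuming you've stored your Slack webhook URL under the hypothetical secret key "SLACK_WEBHOOK":

import "influxdata/influxdb/secrets"
import "slack"

// fetch the webhook URL from the secrets store rather than embedding it
webhook = secrets["get"](key: "SLACK_WEBHOOK")
slack_endpoint = slack["endpoint"](url: webhook)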

Notification rules in the InfluxDB UI

The second step in creating a notification rule in the InfluxDB UI is to configure the notification rule conditions. These user-defined conditions describe which statuses should be converted to notifications and, in turn, which notification messages should be sent to the notification endpoint.

  1. Include About information.
  2. Specify the notification rule Conditions.
  3. Configure the notification Message options.
For example, the default notification message template looks like this:

Notification Rule: ${ r._notification_rule_name } triggered by check: ${ r._check_name }: ${ r._message }

The Flux behind notification rules

To view the corresponding Flux script that’s generated, you can use the InfluxDB v2 API or the UI. To view the corresponding Flux script for the UI-generated notification rule, please follow the instructions under The Flux behind threshold checks section, as the process is identical.

curl -X GET \
  https://us-west-2-1.aws.cloud2.influxdata.com/api/v2/notificationRules/075...my_notification_rule_id...000/query \
  -H 'authorization: Token Oxxx….my_token...xxxBg=='
The generated Flux for a Slack notification rule looks like this:

package main
// CPU Notification Rule

import "influxdata/influxdb/monitor"
import "slack"
import "influxdata/influxdb/secrets"
import "experimental"

option task = {name: "CPU Notification Rule", every: 10m, offset: 0s}

slack_endpoint = slack["endpoint"](url: "https://hooks.slack.com/services/xxx/xxx/xxx")
notification = {
  _notification_rule_id: "0758afea09061000",
  _notification_rule_name: "CPU Notification Rule",
  _notification_endpoint_id: "0754bad181a5b000",
  _notification_endpoint_name: "My slack",
}
statuses = monitor["from"](start: -10s, fn: (r) => r["_check_name"] == "CPU Check")
crit = statuses
  |> filter(fn: (r) => r["_level"] == "crit")
all_statuses = crit
  |> filter(fn: (r) => r["_time"] >= experimental["subDuration"](from: now(), d: 10m))

all_statuses
  |> monitor["notify"](
    data: notification,
    endpoint: slack_endpoint(mapFn: (r) => ({
      channel: "",
      text: "Notification Rule: ${r._notification_rule_name} triggered by check: ${r._check_name}: ${r._message}",
      color: if r["_level"] == "crit" then "danger" else if r["_level"] == "warn" then "warning" else "good",
    })),
  )
Two details about the Flux behind notification endpoints are worth noting:

  • The message function sends a single message to the destination.
  • The endpoint function is a factory function that outputs another function.
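To illustrate that layering, here is a minimal sketch (the names are hypothetical and the webhook URL is a placeholder):

import "slack"

// layer 1: the factory binds the destination and returns a function
endpoint = slack["endpoint"](url: "https://hooks.slack.com/services/xxx/xxx/xxx")

// layer 2: binding a mapFn yields the function that monitor.notify
// invokes to build and send a message for each status
send = endpoint(mapFn: (r) => ({channel: "", text: r._message, color: "warning"}))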

Creating a custom notification rule

The InfluxDB UI makes creating simple notification rules very easy, but it falls short of conveying the sophistication of the notification system within InfluxDB. Flux allows you to transform your data in almost any way you see fit, and the same logic shown above can be applied to create your own custom notification rule; the steps are similar to those defined by the UI. While the UI only lets you configure notification endpoints for HTTP, Slack, and PagerDuty, Flux has many more notification endpoint packages to take advantage of.

package main
// CPU Notification Rule for Telegram

import "influxdata/influxdb/monitor"
// import the correct package
import "contrib/sranka/telegram"
import "influxdata/influxdb/secrets"
import "experimental"

option task = {name: "CPU Notification Rule for Telegram", every: 10m, offset: 0s}

telegram_endpoint = telegram["endpoint"](
  url: "https://api.telegram.org/bot",
  token: "S3crEtTel3gRamT0k3n",
)
notification = {
  _notification_rule_id: "0000000000000002", // alphanumeric, 16 characters
  _notification_rule_name: "CPU Notification Rule for Telegram",
  _notification_endpoint_id: "0000000000000002", // alphanumeric, 16 characters
  _notification_endpoint_name: "My Telegram",
}
statuses = monitor["from"](start: -10s, fn: (r) => r["_check_name"] == "CPU Check")
critOrWarn = statuses
  |> filter(fn: (r) => r["_level"] == "crit" or r["_level"] == "warn") // altered to include two levels
all_statuses = critOrWarn
  |> filter(fn: (r) => r["_time"] >= experimental["subDuration"](from: now(), d: 10m))

all_statuses
  |> monitor["notify"](
    data: notification,
    endpoint: telegram_endpoint(mapFn: (r) => ({
      // telegram's mapFn returns a channel (chat ID), text, and silent flag,
      // rather than the Slack-specific channel/text/color record
      channel: "", // set this to your Telegram chat ID
      text: "Notification Rule: ${r._notification_rule_name} triggered by check: ${r._check_name}: ${r._message}",
      silent: true,
    })),
  )
Two monitor package functions are especially useful in custom notification rules (see the sketch after this list):

  • monitor.stateChanges(): allows you to detect changes in the "_level" column from one level to another specific level.
  • monitor.stateChangesOnly(): allows you to detect all changes in the "_level" column from any level to any other level.
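For example, to notify only on transitions between levels rather than on every CRIT status, a sketch using the same "CPU Check" might look like this:

import "influxdata/influxdb/monitor"

monitor["from"](start: -10m, fn: (r) => r["_check_name"] == "CPU Check")
  |> monitor["stateChangesOnly"]() // keep only rows where _level changed
  // or, to catch only transitions into crit:
  // |> monitor["stateChanges"](toLevel: "crit")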

Notifications

A notification is the time series data that is the output of a notification rule. It’s helpful to recognize the parallel between statuses and notifications. Statuses are to checks as notifications are to notification rules. It follows that a notification closely matches a status and contains the following schema:

  • _time: the time at which the notification rule was executed.
  • _check_id: each check is assigned a check ID. Check IDs can be used for debugging and rerunning failed checks — a tag.
  • _check_name: the name of your check ("CPU Check" for example) — a tag.
  • _notification_rule_id: each notification rule is assigned a notification rule ID. Notification rule IDs can be used for debugging and rerunning failed notification rules — a tag.
  • _notification_rule_name: the name of your notification rule ("CPU Check Notification Rule" for example) — a tag.
  • _notification_endpoint_id: each notification endpoint is assigned a notification endpoint ID. Notification endpoint IDs can be used for debugging and rerunning failed notification rules — a tag.
  • _source_measurement: the measurement of your source data ("cpu" for example) — a tag.
  • _measurement: the measurement of the notification data ("notifications") — a measurement.
  • _sent: a boolean that represents whether the notification message has been sent to the notification endpoint — a tag (see the query sketch after this list).
  • _type: the type of check that your notification rule is applied to ("threshold" for example) — a tag.
  • _level: the level of the corresponding check ("CRIT" for example) — a tag.
  • _field: the field keys: _message, _status_timestamp, _source_timestamp, and ${your source field}.
  • _message: the notification message.
  • _status_timestamp: the timestamp of the status assigned to your source field by the check, in ns precision.
  • _source_timestamp: the timestamp of the source field that is being checked, in ns precision.
  • ${your source field}: the source field value that is being checked.
  • custom tags added to the query output during notification rule configuration.
  • any additional columns included in the query, resulting from the v1.fieldsAsCols function from the corresponding check.
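Because _sent is a tag, you can audit delivery by querying the "_monitoring" bucket directly. A minimal sketch that surfaces notifications that never reached their endpoint:

from(bucket: "_monitoring")
  |> range(start: -24h)
  |> filter(fn: (r) => r["_measurement"] == "notifications")
  |> filter(fn: (r) => r["_sent"] == "false") // tag values are strings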

Further reading and resources on the InfluxDB checks and notifications system

While this post aims to provide a comprehensive overview of the capabilities of the checks and notifications system in InfluxDB, the following resources might also interest you:

  1. TL;DR InfluxDB Tech Tips — Monitoring Tasks and Finding the Source of Runaway Cardinality: This post describes the advantages of applying the Operational Monitoring Template. This template can help you ensure that your tasks are running successfully. For example, imagine that you’ve included custom tags as a part of your UI-generated check and those tags are included in your status message. If the underlying task responsible for adding those tags fails, then your check will fail. Applying the Operational Monitoring Template can help you discover root causes for custom task or check failure.
  2. TL;DR InfluxDB Tech Tips — Using Tasks and Checks for Monitoring with InfluxDB: This post describes the basics of creating UI-generated checks as well as their limitations in a little more detail, in case you need it.
  3. TL;DR InfluxDB Tech Tips — How to Monitor States with InfluxDB: This post describes some of the intricacies and functions of the Flux InfluxDB Monitoring Package — a necessary tool for any developer looking to execute meaningful tasks with InfluxDB.
  4. TL;DR InfluxDB Tech Tips: Configuring a Slack Notification with InfluxDB: This post demonstrates how to create a Slack Notification through the UI. Specifically, how to enable Slack integrations and gather a Slack Webhook URL.
  5. Custom Checks: Another custom check example from the docs. Use this custom check as a template to help you craft a custom check with Flux.
  6. Telegram Flux Package: This tutorial demonstrates how to set up a Telegram bot and use the Telegram Flux package for a custom notification rule.
  7. Contributing Third Party Flux Packages: A Discord Endpoint Flux Function: This post describes how to contribute your own third party Flux package. Specifically, it describes how to contribute a Discord Endpoint package. If a Flux notification endpoint package that you need isn’t available, consider contributing one yourself!

Conclusion on the InfluxDB Checks and Notifications system

InfluxDB’s Checks and Notifications system is highly customizable and enables users to take action on their time series data. All checks and notifications are Flux tasks under the hood. The main difference between checks and notifications and other tasks is that checks and notifications read and write data to and from the “_monitoring” bucket. They also use specialized Flux packages like the Flux InfluxDB Monitor Package and notification endpoint packages.

Anais Dotis, Developer Advocate at InfluxData