InfluxDB’s Checks and Notifications System

Components

Checks in InfluxDB

Threshold checks in the InfluxDB UI

  1. Define Query
  2. Configure Check
  • _check_id: each check is assigned a check ID. Check IDs can be used for debugging and rerunning failed checks
  • _check_name: the name of your check — or “CPU Check” from the example above
  • _level: either CRIT, WARN, OK, or INFO for UI checks (or configurable for custom checks)
  • _source_measurement: the measurement of your source data — or “cpu” from the example above
  • _type: the type of check — or “threshold” from the example above
  • custom tags added to the query output
  • columns included in the query: to view all of these columns, simply hover over a point in your query.

The Flux behind threshold checks

// List the names of the tasks behind your checks
from(bucket: "_tasks")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_field"] == "name")

// Fetch the logs for the underlying task of a specific check, by task ID
from(bucket: "_tasks")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["taskID"] == "0754bba32a9ae300")
|> filter(fn: (r) => r["_field"] == "logs")

# Retrieve the Flux generated for a check through the /api/v2/checks endpoint
curl -X GET \
https://us-west-2-1.aws.cloud2.influxdata.com/api/v2/checks/074...my_check_ID...000/query \
-H 'authorization: Token Oxxx...my_token...xxxBg=='
package main

import "influxdata/influxdb/monitor"
import "influxdata/influxdb/v1"

data = from(bucket: "telegraf")
|> range(start: -5s)
|> filter(fn: (r) => r["_measurement"] == "cpu")
|> filter(fn: (r) => r["_field"] == "usage_system")
|> filter(fn: (r) => r["cpu"] == "cpu-total")
|> aggregateWindow(every: 5s, fn: mean, createEmpty: false)

option task = {name: "CPU Check", every: 5s, offset: 0s}

check = {_check_id: "074bba32a48c3000",
_check_name: "CPU Check",
_type: "threshold",
tags: {}}

crit = (r) => r["usage_system"] > 5.0
messageFn = (r) => "Check: ${r._check_name} is: ${r._level}"

data
|> v1["fieldsAsCols"]()
|> monitor["check"](data: check, messageFn: messageFn, crit: crit)

Deadman Check in the InfluxDB UI

The Flux behind deadman checks

package main

import "influxdata/influxdb/monitor"
import "experimental"
import "influxdata/influxdb/v1"

data = from(bucket: "telegraf")
|> range(start: -15s)
|> filter(fn: (r) => r["_measurement"] == "cpu")
|> filter(fn: (r) => r["_field"] == "usage_system")
|> filter(fn: (r) => r["cpu"] == "cpu-total")

option task = {name: "Deadman", every: 1m, offset: 0s}

check = {_check_id: "074bdadac4429000",
_check_name: "Deadman",
_type: "deadman",
tags: {deadman: "deadman"}}

crit = (r) => r["dead"]
messageFn = (r) => "Check: ${r._check_name} is: ${r._level}"

data
|> v1["fieldsAsCols"]()
|> monitor["deadman"](t: experimental["subDuration"](from: now(), d: 5s))
|> monitor["check"](data: check, messageFn: messageFn, crit: crit)

Create a custom check

import "influxdata/influxdb/monitor"
import "influxdata/influxdb/v1"

option task = {name: "CPU Check", every: 5s, offset: 0s}

solarBatteryData = from(bucket: "solar")
|> range(start: -task.every)
|> filter(fn: (r) => r["_measurement"] == "battery")
|> filter(fn: (r) => r["_field"] == "kWh")
|> derivative(unit: 3s, nonNegative: false, columns: ["_value"], timeColumn: "_time")

check = {_check_id: "0000000000000001", // alphanumeric, 16 characters
_check_name: "CPU Check",
_type: "threshold",
tags: {}}

ok = (r) => r["_value"] > 0.0
crit = (r) => r["_value"] <= 0.0
messageFn = (r) => "Check: ${r._check_name} is: ${if r._level == "crit" then "DH" else "CH"}"

solarBatteryData
|> v1["fieldsAsCols"]()
|> monitor["check"](data: check, messageFn: messageFn, crit: crit, ok: ok)

Common problem with InfluxDB checks

Conclusion on InfluxDB checks

Statuses

  • _time: the time at which the check was executed.
  • _check_id: each check is assigned a check ID. Check IDs can be used for debugging and rerunning failed checks — a tag.
  • _check_name: the name of your check (“CPU Check” for example) — a tag.
  • _level: either “CRIT”, “WARN”, “OK”, or “INFO” for UI checks (or configurable for custom checks) — a tag.
  • _source_measurement: the measurement of your source data (“cpu” for example) — a tag.
  • _measurement: the measurement of the status data (“statuses”) — a measurement.
  • _type: the type of check (“threshold” for example) — a tag.
  • _field: the field keys: _message, _source_timestamp, and ${your source field}.
  • _message: the status message.
  • _source_timestamp: the timestamp of the source field that is being checked, in ns precision.
  • ${your source field}: the source field that is being checked.
  • dead: an additional field specific to deadman checks. A boolean that represents whether the deadman signal has met the deadman check conditions.
  • custom tags added to the query output during check configuration.
  • any additional tags included in the query, resulting from the v1.fieldsAsCols function. To view all of these tags, simply hover over a point in your query in the Data Explorer.
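
Because statuses are just time series stored in the statuses measurement of the _monitoring bucket, you can inspect them with an ordinary Flux query. Here is a minimal sketch that reuses the “CPU Check” example from above:

from(bucket: "_monitoring")
|> range(start: -1h)
|> filter(fn: (r) => r["_measurement"] == "statuses")
|> filter(fn: (r) => r["_check_name"] == "CPU Check")
|> filter(fn: (r) => r["_field"] == "_message")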

Viewing statuses in the InfluxDB UI

Striking a balance between tasks and checks

Notification rules in InfluxDB

Notification endpoints in the InfluxDB UI

  1. Specify your Destination
  2. Specify the Options

Notification rules in the InfluxDB UI

  1. Include About information.
  2. Specify the notification rule Conditions.
  3. Configure the notification Message options.
Notification Rule: ${ r._notification_rule_name } triggered by check: ${ r._check_name }: ${ r._message }

The Flux behind notification rules

# Retrieve the Flux generated for a notification rule through the /api/v2/notificationRules endpoint
curl -X GET \
https://us-west-2-1.aws.cloud2.influxdata.com/api/v2/notificationRules/075...my_notification_rule_id...000/query \
-H 'authorization: Token Oxxx...my_token...xxxBg=='
package main
//CPU Notification Rule
import "influxdata/influxdb/monitor"
import "slack"
import "influxdata/influxdb/secrets"
import "experimental"
option task = {name: "CPU Notification Rule",
every: 10m,
offset: 0s}
slack_endpoint = slack["endpoint"](url: "https://hooks.slack.com/services/xxx/xxx/xxx")
notification = {_notification_rule_id: "0758afea09061000",
_notification_rule_name: "CPU Notification Rule",
_notification_endpoint_id: "0754bad181a5b000",
_notification_endpoint_name: "My slack"}
statuses = monitor["from"](start: -10s, fn: (r) => r["_check_name"] == "CPU Check")
crit = statuses
|> filter(fn: (r) => r["_level"] == "crit")
all_statuses = crit
|> filter(fn: (r) => r["_time"] >= experimental["subDuration"](from: now(), d: 10m))
all_statuses
|> monitor["notify"](data: notification,
endpoint: slack_endpoint(mapFn: (r) => ({channel: "",
text: "Notification Rule: ${r._notification_rule_name} triggered by check: ${r._check_name}: ${r._message}",
color: if r["_level"] == "crit" then "danger" else if r["_level"] == "warn" then "warning" else "good"})))
  • The message function sends a single message to the destination.
  • The endpoint function is a factory function that outputs another function (see the sketch below).
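
To make the distinction concrete, here is a minimal sketch using the Flux slack package (the webhook URL is a placeholder): slack.message() posts a single message on its own, whereas slack.endpoint() only builds a reusable sender that monitor.notify() later invokes with a mapFn for each status record.

import "slack"

// Message function: posts one message immediately.
slack.message(url: "https://hooks.slack.com/services/xxx/xxx/xxx", channel: "", text: "A single ad hoc message", color: "warning")

// Endpoint function: a factory that returns another function,
// which expects a mapFn and is what monitor.notify() consumes.
slack_endpoint = slack.endpoint(url: "https://hooks.slack.com/services/xxx/xxx/xxx")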

Creating a custom notification rule

package main
//CPU Notification Rule for Telegram
import "influxdata/influxdb/monitor"
// import the correct package
import "contrib/sranka/telegram"
import "influxdata/influxdb/secrets"
import "experimental"
option task = {name: "CPU Notification Rule for Telegram ",
every: 10m,
offset: 0s}
telegram_endpoint = telegram["endpoint"](
url: "https://api.telegram.org/bot",
token: "S3crEtTel3gRamT0k3n"
)
notification = {_notification_rule_id: "0000000000000002", // alphanumeric, 16 characters
_notification_rule_name: "CPU Notification Rule for Telegram",
_notification_endpoint_id: "0000000000000002", // alphanumeric, 16 characters
_notification_endpoint_name: "My Telegram"}
statuses = monitor["from"](start: -10s, fn: (r) => r["_check_name"] == "CPU Check")
critOrWarn = statuses
|> filter(fn: (r) => r["_level"] == "crit" or r["_level"] == "warn") // altered to include two levels
all_statuses = critOrWarn
|> filter(fn: (r) => r["_time"] >= experimental["subDuration"](from: now(), d: 10m))
all_statuses
|> monitor["notify"](data: notification,
endpoint: telegram_endpoint(mapFn: (r) => ({channel: "",
text: "Notification Rule: ${r._notification_rule_name} triggered by check: ${r._check_name}: ${r._message}",
// the Telegram mapFn returns channel, text, and silent rather than Slack's color field
silent: true})))
  • monitor.stateChanges(): allows you to detect changes in the “_level” column from one level to another specific level.
  • monitor.stateChangesOnly(): allows you to detect all changes in the “_level” column from any level to any other level (see the sketch below).
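
Here is a minimal sketch of both functions, assuming statuses already exist in the _monitoring bucket (note that level values are lowercase in the stored status data):

import "influxdata/influxdb/monitor"

// Only statuses that just transitioned into the crit level.
monitor.from(start: -1h)
|> monitor.stateChanges(toLevel: "crit")
|> yield(name: "to_crit")

// Every transition from one level to a different level.
monitor.from(start: -1h)
|> monitor.stateChangesOnly()
|> yield(name: "any_change")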

Notifications

  • _time: the time that the notification rule was executed.
  • _check_id: each check is assigned a check ID. Check IDs can be used for debugging and rerunning failed checks — a tag.
  • _check_name: the name of your check (“CPU Check” for example) — a tag.
  • _notification_rule_id: each notification rule is assigned a notification rule ID. Notification rule IDs can be used for debugging and rerunning failed notification rules — a tag.
  • _notification_rule_name: the name of your notification rule (“CPU Check Notification Rule” for example) — a tag.
  • _notification_endpoint_id: each notification endpoint is assigned a notification endpoint ID. Notification endpoint IDs can be used for debugging and rerunning failed notification rules — a tag.
  • _source_measurement: the measurement of your source data (“cpu” for example) — a tag.
  • _measurement: the measurement of the notification rule data (“notifications”) — a measurement.
  • _sent: a boolean that represents whether the notification message has been sent to the notification endpoint — a tag.
  • _type: the type of check that your notification rule is applied towards (“threshold” for example) — a tag.
  • _level: the level of the corresponding check (“CRIT” for example) — a tag.
  • _field: the field keys: _message, _status_timestamp, _source_timestamp, and ${your source field}.
  • _message: the notification message.
  • _status_timestamp: the timestamp of the status assigned to your source field by the check, in ns precision.
  • _source_timestamp: the timestamp of the source field that is being checked in ns precision.
  • ${your source field}: the source field value that is being checked.
  • custom tags added to the query output during notification configuration.
  • any additional columns included in the query, resulting from the v1.fieldsAsCols function from the corresponding check.
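
Notifications land in the notifications measurement of the _monitoring bucket as well, which makes it easy to audit what was actually sent. A minimal sketch, reusing the rule name from above (since _sent is a tag, its values are the strings "true" and "false"):

from(bucket: "_monitoring")
|> range(start: -24h)
|> filter(fn: (r) => r["_measurement"] == "notifications")
|> filter(fn: (r) => r["_notification_rule_name"] == "CPU Notification Rule")
|> filter(fn: (r) => r["_sent"] == "true")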

Further reading and resources on the InfluxDB checks and notifications system

  1. TL;DR InfluxDB Tech Tips — Monitoring Tasks and Finding the Source of Runaway Cardinality: This post describes the advantages of applying the Operational Monitoring Template. This template can help you ensure that your tasks are running successfully. For example, imagine that you’ve included custom tags as a part of your UI-generated check and those tags are included in your status message. If the underlying task responsible for adding those tags fails, then your check will fail. Applying the Operational Monitoring Template can help you discover root causes for custom task or check failure.
  2. TL;DR InfluxDB Tech Tips — Using Tasks and Checks for Monitoring with InfluxDB: This post describes the basics of creating UI-generated checks as well as their limitations in a little more detail, in case you need it.
  3. TL;DR InfluxDB Tech Tips — How to Monitor States with InfluxDB: This post describes some of the intricacies and functions of the Flux InfluxDB Monitoring Package — a necessary tool for any developer looking to execute meaningful tasks with InfluxDB.
  4. TL;DR InfluxDB Tech Tips: Configuring a Slack Notification with InfluxDB: This post demonstrates how to create a Slack Notification through the UI. Specifically, how to enable Slack integrations and gather a Slack Webhook URL.
  5. Custom Checks: Another custom check example from the docs. Use this custom check as a template to help you craft a custom check with Flux.
  6. Telegram Flux Package: This tutorial demonstrates how to set up a Telegram bot and use the Telegram Flux package for a custom notification rule.
  7. Contributing Third Party Flux Packages: A Discord Endpoint Flux Function: This post describes how to contribute your own third party Flux package. Specifically, it describes how to contribute a Discord Endpoint package. If a Flux notification endpoint package that you need isn’t available, consider contributing one yourself!

Conclusion on the InfluxDB Checks and Notifications system
