BIRCH for Anomaly Detection with InfluxDB

Jul 20, 2020

In this tutorial, we’ll use the BIRCH (balanced iterative reducing and clustering using hierarchies) algorithm from scikit-learn with the ADTK (Anomaly Detection Tool Kit) package to detect anomalous CPU behavior. We’ll use the InfluxDB 2.0 Python Client to query our data in InfluxDB 2.0 and return it as a Pandas DataFrame.
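As a rough sketch of that query step, the code below builds a Flux query for Telegraf's CPU data and runs it with the client's query_data_frame method. The bucket name, the one-hour range, the usage_user field, and the cpu-total tag value are assumptions based on Telegraf's default CPU input. The helper names build_cpu_query and fetch_cpu_frame are hypothetical, and the client is imported inside the function so the query builder works even without influxdb-client installed.

```python
from typing import Any


def build_cpu_query(bucket: str, start: str = "-1h", field: str = "usage_user") -> str:
    """Return a Flux query for Telegraf's aggregate CPU stats.

    Assumes Telegraf's CPU input writes measurement "cpu" with a "cpu" tag
    whose value "cpu-total" holds the all-core aggregate.
    """
    return (
        f'from(bucket: "{bucket}")\n'
        f"  |> range(start: {start})\n"
        '  |> filter(fn: (r) => r._measurement == "cpu")\n'
        f'  |> filter(fn: (r) => r._field == "{field}")\n'
        '  |> filter(fn: (r) => r.cpu == "cpu-total")\n'
        '  |> keep(columns: ["_time", "_value"])'
    )


def fetch_cpu_frame(url: str, token: str, org: str, query: str) -> Any:
    """Run a Flux query with the InfluxDB 2.0 Python client; return a pandas DataFrame."""
    # Imported here so build_cpu_query can be used without influxdb-client installed.
    from influxdb_client import InfluxDBClient

    client = InfluxDBClient(url=url, token=token, org=org)
    try:
        df = client.query_api().query_data_frame(query)
    finally:
        client.close()
    # ADTK detectors expect a DatetimeIndex, so index the values by timestamp.
    return df.set_index("_time")[["_value"]]
```

For example, fetch_cpu_frame("http://localhost:8086", "my-token", "my-org", build_cpu_query("my-bucket")), where the token, org, and bucket are placeholders for your own setup.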

This tutorial assumes that you have InfluxDB and Telegraf installed and configured on your local machine to gather CPU stats. The easiest way to gather system stats locally is to install InfluxDB and let it automatically configure Telegraf with the System plugin.

We recommend running the code in this post inside a virtual environment with Python 3.6+. The requirements.txt for this project is:

adtk==0.6.2
influxdb-client
pandas==0.23.4
scikit-learn==0.23.1

A brief explanation of BIRCH

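BIRCH (balanced iterative reducing and clustering using hierarchies) is an unsupervised clustering algorithm designed for large datasets. Instead of keeping every point in memory, it makes a single pass over the data, summarizes it in a compact tree of subclusters, and then clusters those summaries, which keeps memory use and run time low. Because points in sparse regions can be set aside as outliers rather than forced into a cluster, it also tends to be robust to noise, which helps it find meaningful patterns. Like the more popular k-means algorithm, it groups points around cluster centroids.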
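Much of BIRCH's efficiency comes from how it summarizes subclusters: each one is stored as a clustering feature (CF), the triple of point count N, per-dimension linear sum LS, and sum of squared norms SS. Two CFs merge by plain addition, and a subcluster's centroid and radius follow directly from the triple, so they can be computed without revisiting the raw points. The minimal pure-Python sketch below shows only that summary plus the threshold test applied when a point is absorbed into a leaf subcluster; it is not the full CF tree (no node splitting and no final global clustering step), and the names ClusteringFeature and try_absorb are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class ClusteringFeature:
    """BIRCH clustering feature: a compact (N, LS, SS) summary of a subcluster."""
    n: int            # number of points summarized
    ls: List[float]   # linear sum of the points, per dimension
    ss: float         # sum of squared norms of the points

    @classmethod
    def from_point(cls, point: Sequence[float]) -> "ClusteringFeature":
        return cls(1, [float(x) for x in point], float(sum(x * x for x in point)))

    def merge(self, other: "ClusteringFeature") -> "ClusteringFeature":
        # CF additivity: merging two subclusters is element-wise addition.
        return ClusteringFeature(
            self.n + other.n,
            [a + b for a, b in zip(self.ls, other.ls)],
            self.ss + other.ss,
        )

    def centroid(self) -> List[float]:
        return [x / self.n for x in self.ls]

    def radius(self) -> float:
        # Root-mean-square distance of members from the centroid:
        # R = sqrt(SS / N - ||LS / N||^2), computed from the summary alone.
        c = self.centroid()
        variance = self.ss / self.n - sum(x * x for x in c)
        return math.sqrt(max(variance, 0.0))  # clamp tiny negative rounding error


def try_absorb(cf: ClusteringFeature, point: Sequence[float],
               threshold: float) -> Optional[ClusteringFeature]:
    """Absorb point into cf if the merged radius stays within threshold, else None."""
    merged = cf.merge(ClusteringFeature.from_point(point))
    return merged if merged.radius() <= threshold else None
```

In BIRCH, a point that fails this test starts a new subcluster instead; scikit-learn's Birch estimator exposes the same cut-off as its threshold parameter.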

An introduction to ADTK and scikit-learn

ADTK (Anomaly Detection Tool Kit) is a Python package for unsupervised anomaly detection in time series data. According to the documentation, “This package offers a set of common detectors…
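One way ADTK connects to scikit-learn clustering is its MinClusterDetector, which treats each timestamp's values as a point, clusters the points with a supplied model, and flags the points in the smallest cluster as anomalous. The sketch below reproduces that rule in plain Python so the logic is visible. min_cluster_anomalies is a hypothetical helper, and model is assumed to be any object with a scikit-learn-style fit_predict method, such as sklearn.cluster.Birch(n_clusters=2). With ADTK installed, the equivalent is roughly MinClusterDetector(Birch(n_clusters=2)).fit_detect(df) on a DataFrame with a DatetimeIndex.

```python
from collections import Counter
from typing import Any, List, Sequence


def min_cluster_anomalies(model: Any, rows: Sequence[Sequence[float]]) -> List[bool]:
    """Flag rows that fall in the smallest cluster found by model.

    model: any clusterer with a scikit-learn-style fit_predict(X) method
    (assumed, e.g. sklearn.cluster.Birch(n_clusters=2)).
    rows: one feature vector per timestamp, e.g. [[cpu_usage], ...].
    """
    labels = list(model.fit_predict([list(r) for r in rows]))
    sizes = Counter(labels)
    # The smallest cluster is treated as anomalous; ties go to the label seen first.
    smallest = min(sizes, key=sizes.__getitem__)
    return [bool(label == smallest) for label in labels]
```

The clustering model, not this helper, decides where cluster boundaries fall, so parameters such as n_clusters and threshold on Birch control how coarse the split between normal and anomalous CPU behavior is.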

Written by Anais Dotis