BIRCH for Anomaly Detection with InfluxDB

Anais Dotis
5 min read · Jul 20, 2020

In this tutorial, we’ll use the BIRCH (balanced iterative reducing and clustering using hierarchies) algorithm from scikit-learn with the ADTK (Anomaly Detection Tool Kit) package to detect anomalous CPU behavior. We’ll use the InfluxDB 2.0 Python Client to query our data in InfluxDB 2.0 and return it as a Pandas DataFrame.

This tutorial assumes that you have InfluxDB and Telegraf installed and configured on your local machine to gather CPU stats. The easiest way to gather system stats locally is to install InfluxDB and have it automatically generate a Telegraf configuration that includes the System plugin.
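Once Telegraf is writing CPU stats, pulling them into a Pandas DataFrame with the InfluxDB 2.0 Python Client might look roughly like the sketch below. It assumes the `influxdb-client` package is installed (it is not in the requirements listed in this post) and that Telegraf's default `cpu` measurement with the `usage_system` field is being collected. The helper names `build_cpu_query` and `query_cpu_dataframe`, along with the bucket, org, token, and URL values, are hypothetical placeholders to replace with your own.

```python
def build_cpu_query(bucket: str, start: str = "-1h") -> str:
    """Build a Flux query for total system CPU usage (hypothetical helper)."""
    return (
        f'from(bucket: "{bucket}")\n'
        f"  |> range(start: {start})\n"
        '  |> filter(fn: (r) => r._measurement == "cpu")\n'
        '  |> filter(fn: (r) => r._field == "usage_system")\n'
        '  |> filter(fn: (r) => r.cpu == "cpu-total")\n'
    )


def query_cpu_dataframe(url: str, token: str, org: str, bucket: str):
    """Run the query and return the result as a Pandas DataFrame."""
    # Imported here so the query builder can be used without the client installed.
    from influxdb_client import InfluxDBClient

    client = InfluxDBClient(url=url, token=token, org=org)
    try:
        return client.query_api().query_data_frame(build_cpu_query(bucket), org=org)
    finally:
        client.close()


# Example call; every value below is a placeholder assumption:
# df = query_cpu_dataframe("http://localhost:9999", "my-token", "my-org", "my-bucket")
```

The query is built separately from the client call so you can print and inspect the Flux before sending it to InfluxDB.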

We recommend running the code in this blog inside a virtual environment with Python 3.6+. The requirements.txt for this project looks like:

adtk==0.6.2
pandas==0.23.4
scikit-learn==0.23.1

A brief explanation of BIRCH

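BIRCH (balanced iterative reducing and clustering using hierarchies) is an unsupervised clustering algorithm designed to perform well on large datasets. It summarizes the data in a compact tree of cluster summaries (a clustering feature tree, or CF tree) that it builds in a single pass, so it doesn’t need to revisit every point repeatedly. It’s also good at reducing noise in the dataset to find meaningful patterns and produce accurate models. Like the more popular k-means clustering algorithm, it groups points around centroids.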
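To make the idea concrete, here is a minimal sketch of BIRCH from scikit-learn on synthetic data, not the CPU data used in this tutorial. The two Gaussian blobs, the single far-away point, and the `threshold=0.5` setting are all assumptions chosen for illustration. With `n_clusters=None`, scikit-learn skips the final global clustering step and returns the subclusters from the tree directly.

```python
import numpy as np
from sklearn.cluster import Birch

rng = np.random.default_rng(0)

# Two tight synthetic clusters plus one point far away from both.
cluster_a = rng.normal(loc=0.0, scale=0.1, size=(50, 2))
cluster_b = rng.normal(loc=5.0, scale=0.1, size=(50, 2))
outlier = np.array([[20.0, 20.0]])
X = np.vstack([cluster_a, cluster_b, outlier])

# threshold caps the radius of each subcluster in the tree;
# n_clusters=None returns those subclusters without a final merge step.
model = Birch(threshold=0.5, branching_factor=50, n_clusters=None)
labels = model.fit_predict(X)

# Count how many points ended up in each cluster.
sizes = np.bincount(labels)
print("cluster sizes:", sizes)
print("label of the far-away point:", labels[-1])
```

A point that ends up alone in its own cluster, like the far-away point here, is a natural candidate for an anomaly. The right `threshold` depends on the scale of your data, so expect to tune it.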

An introduction to ADTK and scikit-learn

ADTK (Anomaly Detection Tool Kit) is a Python package for unsupervised anomaly detection for time series data. According to the documentation, “This package offers a set of common detectors…

Written by Anais Dotis

Developer Advocate at InfluxData
