BIRCH for Anomaly Detection with InfluxDB
In this tutorial, we’ll use the BIRCH (balanced iterative reducing and clustering using hierarchies) algorithm from scikit-learn with the ADTK (Anomaly Detection Tool Kit) package to detect anomalous CPU behavior. We’ll use the InfluxDB 2.0 Python Client to query our data in InfluxDB 2.0 and return it as a Pandas DataFrame.
This tutorial assumes that you have InfluxDB and Telegraf installed and configured on your local machine to gather CPU stats. The easiest way to gather system stats locally is to install InfluxDB and have it generate a Telegraf configuration that includes the System plugin.
We recommend running the code in this blog inside a virtual environment with Python 3.6+. The requirements.txt for this project looks like:
adtk==0.6.2
pandas==0.23.4
scikit-learn==0.23.1
A brief explanation of BIRCH
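BIRCH (balanced iterative reducing and clustering using hierarchies) is an unsupervised clustering algorithm designed to perform well on large datasets. Rather than holding every point in memory, it incrementally summarizes the data into a tree of compact subclusters, which also helps it reduce noise in the dataset and surface meaningful patterns. Like the more popular k-means algorithm, it groups points around centroids, but it clusters the subcluster summaries instead of the raw points.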
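To make the idea concrete, here is a minimal sketch of BIRCH on its own, using scikit-learn's `Birch` estimator. The data is synthetic and stands in for CPU readings; the two groups, the `threshold` value, and the variable names are assumptions chosen for illustration, not values from this tutorial's dataset.

```python
import numpy as np
from sklearn.cluster import Birch

# Synthetic stand-in for CPU readings (an assumption for illustration):
# two well-separated groups of 2-D points.
rng = np.random.RandomState(0)
low_load = rng.normal(loc=[10.0, 10.0], scale=1.0, size=(100, 2))
high_load = rng.normal(loc=[80.0, 80.0], scale=1.0, size=(100, 2))
X = np.vstack([low_load, high_load])

# threshold limits how large a subcluster may grow before a new one is started;
# n_clusters is the number of final clusters built from those subclusters.
model = Birch(threshold=1.0, branching_factor=50, n_clusters=2)
labels = model.fit_predict(X)

# Each point gets the label of the final cluster it was assigned to.
print(labels[:5], labels[-5:])
print("subclusters found:", len(model.subcluster_centers_))
```

A smaller `threshold` produces more, tighter subclusters; a larger one summarizes the data more aggressively. The value that suits real CPU data will likely differ from the one used here.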
An introduction to ADTK and scikit-learn
ADTK (Anomaly Detection Tool Kit) is a Python package for unsupervised anomaly detection for time series data. According to the documentation, “This package offers a set of common detectors…