TL;DR InfluxDB Tech Tips — Optimizing Flux Performance in InfluxDB Cloud

General recommendations for Flux performance optimization

  1. Take advantage of pushdown patterns.
  2. Schema mutation functions should be applied at the end of your query.
  3. Use variables to avoid querying data multiple times.
  4. Divide processing work across multiple tasks when needed.

Taking advantage of pushdown patterns
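Pushdown patterns are combinations of functions that the underlying storage layer can execute directly, so only the matching data is ever loaded into memory by the Flux engine. The most common pattern is from() |> range() |> filter(). Below is a minimal sketch, assuming a hypothetical bucket "my-bucket" and measurement "my_measurement"; in InfluxDB Cloud, a bare aggregate such as count() applied immediately after that pattern typically pushes down as well.

// from() |> range() |> filter() is evaluated by the storage layer,
// so only matching rows reach the Flux engine.
from(bucket: "my-bucket")
    |> range(start: -1h)
    |> filter(fn: (r) => r._measurement == "my_measurement" and r._field == "my_field")
    // A bare aggregate directly after the pattern can also push down.
    |> count()

Once a non-pushdown function (such as map() or a schema mutation) appears in the pipeline, everything after it runs in memory, which is why the ordering recommendations in the following sections matter.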

Using schema mutations properly

import "array"
import "experimental"
start = experimental.subDuration(
d: -10m,
from: now(),
)
bucket1 = array.from(rows: [{_start: start, _stop: now(), _time: now(), _measurement: "mymeas", _field: "myfield", _value: "foo1"}])
    |> yield(name: "bucket1")

bucket2 = array.from(rows: [{_start: start, _stop: now(), _time: now(), _measurement: "mymeas", _field: "myfield", _value: "foo2"}])
    |> yield(name: "bucket2")
// Bad: joining on _time only duplicates the other shared columns (suffixed with the table keys), which then have to be dropped.
join(tables: {bucket1: bucket1, bucket2: bucket2}, on: ["_time"], method: "inner")
    |> drop(columns: ["_start_bucket1", "_stop_bucket1", "_measurement_bucket1", "_field_bucket1"])
    |> yield(name: "bad_join")

// Good: join on all shared columns so no duplicate columns need to be dropped afterwards.
join(tables: {bucket1: bucket1, bucket2: bucket2}, on: ["_start", "_stop", "_time", "_measurement", "_field"], method: "inner")
    |> yield(name: "good_join")

Using variables to avoid querying data multiple times

from(bucket: "my-bucket")
|> range(start: -1h)
|> filter(fn: (r) => r._measurement == "my_measurement")
|> mean()
|> set(key: "agg_type",value: "mean_temp")
|> to(bucket: "downsampled", org: "my-org", tagColumns:["agg_type"])
from(bucket: "my-bucket")
|> range(start: -1h)
|> filter(fn: (r) => r._measurement == "my_measurement")
|> count()
|> set(key: "agg_type",value: “count_temp")
|> to(bucket: "downsampled", org: "my-org", tagColumns: ["agg_type"])
// With a variable, the pushed-down base query is defined once and reused.
data = from(bucket: "my-bucket")
    |> range(start: -1h)
    |> filter(fn: (r) => r._measurement == "my_measurement")

data
    |> mean()
    |> set(key: "agg_type", value: "mean_temp")
    |> to(bucket: "downsampled", org: "my-org", tagColumns: ["agg_type"])

data
    |> count()
    |> set(key: "agg_type", value: "count_temp")
    |> to(bucket: "downsampled", org: "my-org", tagColumns: ["agg_type"])

Dividing processing work across multiple tasks
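For example, the downsampling work above could be split into two independent tasks, one per aggregation, so each task processes less data per run. The sketch below is only an illustration: the task name, the every interval, and the reuse of the hypothetical "my-bucket" and "downsampled" buckets are placeholders to adapt to your setup.

// Task 1: downsample the mean. A second, analogous task would handle count().
option task = {name: "downsample-mean", every: 1h}

from(bucket: "my-bucket")
    |> range(start: -task.every)
    |> filter(fn: (r) => r._measurement == "my_measurement")
    |> mean()
    |> set(key: "agg_type", value: "mean_temp")
    |> to(bucket: "downsampled", org: "my-org", tagColumns: ["agg_type"])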

What is the Flux Profiler package?

import "profiler"option profiler.enabledProfilers = ["query", "operator"]from(bucket: "noaa")
|> range(start: 2019-08-17T00:00:00Z, stop: 2019-08-17T00:30:00Z)
|> filter(fn: (r) =>
r._measurement == "h2o_feet" and
r._field == "water_level" and
r.location == "coyote_creek"
)
|> map(fn: (r) => ({ r with
_value: r._value * 12.0,
_measurement: "h2o_inches"
}))
|> drop(columns: ["_start", "_stop"])
  • The first table provides information about your entire query, including the total time it took to execute as well as the time spent compiling, queueing, and so on.
  • The second table shows where the query spends the most time, broken down by operation.
from(bucket: "noaa")
|> range(start: 2019-08-17T00:00:00Z, stop: 2019-08-17T00:30:00Z)
|> filter(fn: (r) =>
r._measurement == "h2o_feet" and
r._field == "water_level" and
r.location == "coyote_creek"
)
|> drop(columns: ["_start", "_stop"])
|> set(key: "_measurement",value: "h2o_inches")
|> map(fn: (r) => ({ r with
_value: r._value * 12.0,
}))

Using the Flux extension for Visual Studio Code to streamline Flux optimization discovery

Other tips

  1. Can you use the experimental.join() function instead of the join() function?
  2. Can you apply any groups that will reduce the number of rows in your table(s) before applying a map() function?
  3. Can you tune any regexes to be as specific as possible?
  4. Does |> sort(columns: ["_time"], desc: true) |> limit(n: 1) perform better than |> last()? (See the sketch after this list.)
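To illustrate that last tip, here are the two forms side by side, reusing the hypothetical "my-bucket" data from the earlier examples; run both with the Profiler enabled to see which performs better on your data.

// Option A: bare last() after the pushdown pattern.
from(bucket: "my-bucket")
    |> range(start: -1h)
    |> filter(fn: (r) => r._measurement == "my_measurement")
    |> last()

// Option B: sort() |> limit() returning the same most recent point per series.
from(bucket: "my-bucket")
    |> range(start: -1h)
    |> filter(fn: (r) => r._measurement == "my_measurement")
    |> sort(columns: ["_time"], desc: true)
    |> limit(n: 1)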

Best practices for receiving help

  • What is the query that’s having an issue?
  • Make sure to share it, along with the output from the Profiler.
  • What is the cardinality of your data? (how many series)
  • Try using the InfluxDB Operational Monitoring Template to help you find the cardinality of your data.
  • Alternatively, try using the influxdb.cardinality() function to help you find the cardinality of your data (the sketch after this list shows how).
  • What is the density of your data? (how many points per unit time in each series)
  • Try using the InfluxDB Cloud Usage Template to help you gauge the rate at which data is being written to your InfluxDB Cloud account.
  • General information about how your data is structured (which measurements, fields and tag keys exist) is always helpful.
  • Try using the schema package to share your data structure, as in the sketch after this list.
  • What is your expectation of how fast a query should run? What’s the basis for that expectation?
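As a minimal sketch (again assuming a hypothetical bucket named "my-bucket"), the following query reports approximate series cardinality and the basic shape of your schema, both of which are useful to share when asking for help:

import "influxdata/influxdb"
import "influxdata/influxdb/schema"

// Approximate series cardinality of the bucket over the last 30 days.
influxdb.cardinality(bucket: "my-bucket", start: -30d)
    |> yield(name: "cardinality")

// Which measurements, field keys, and tag keys exist in the bucket.
schema.measurements(bucket: "my-bucket")
    |> yield(name: "measurements")
schema.fieldKeys(bucket: "my-bucket")
    |> yield(name: "field_keys")
schema.tagKeys(bucket: "my-bucket")
    |> yield(name: "tag_keys")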

Final thoughts on optimizing Flux performance in InfluxDB Cloud

Further reading

  1. Top 5 Hurdles for Flux Beginners and Resources for Learning to Use Flux: This post describes common hurdles for Flux beginners and how to tackle them by using the InfluxDB UI, understanding Annotated CSV, and more.
  2. Top 5 Hurdles for Intermediate Flux Users and Resources for Optimizing Flux: This post describes common hurdles for intermediate and advanced Flux users while providing more detail on pushdown patterns, how the Flux engine works, and more.
  3. Using and Understanding the InfluxDB Cloud Usage Template: This post describes how to use the InfluxDB Cloud Usage Template, which can help you determine how much data is going into your InfluxDB account and can help the Flux team better understand whether your Flux performance is reasonable.
  4. TL;DR InfluxDB Tech Tips — Monitoring Tasks and Finding the Source of Runaway Cardinality: This post describes how to use the InfluxDB Operational Monitoring Template which can help you understand the cardinality of your data.
