Introducing 2-second metric resolution in Dripstat

The more complex our systems become, and the more we rely on highly available cloud services, the greater the need for high-resolution, real-time data. With 2-second resolution, streamed without delay, you can see the effects of actions on your system instantly. More granular data also gives you a better understanding of your running system: what looks like a flat line at a lower resolution can turn out to be full of tiny spikes, and those spikes can help you diagnose issues more quickly.

Today we are launching 2-second resolution metrics in Dripstat, starting with Infra Integrations.

We wanted to push the boundary of how real-time a monitoring system can be. Our goals for the new high-resolution metrics were:

  • Collect data every 2 seconds
  • Make data available for display immediately after collection (not after another, longer processing interval of, say, 20 seconds)
  • Auto-refresh all dashboards
  • Allow seamless zooming from a longer time range, say 1 week, down to the highest resolution, i.e. 2 seconds

Looking around the observability landscape, we couldn't find any solution that meets all of the above criteria.

  • Datadog and New Relic collect data every 15 seconds
  • AWS CloudWatch's 'high resolution' metrics can ingest per-second data, but CloudWatch collects it only every 10 seconds. Its charts therefore refresh only every 10 seconds, which effectively limits the resolution to 10 seconds for the most recent data, exactly where high resolution matters most.
  • Google Stackdriver collects every 1 minute
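To see why the collection and refresh interval, and not just the sample interval, bounds how fresh the newest point on a chart can be, here is a simplified Python model. It assumes, as a hypothetical simplification, that a backend makes data visible only at whole multiples of its interval, and it ignores network and processing delay; the function names are illustrative and are not part of any vendor's API. The intervals are the ones listed above.

```python
# Simplified model (assumption): data becomes visible only at whole multiples of
# the collection/refresh interval; network and processing delay are ignored.


def newest_visible_ms(now_ms: int, interval_ms: int) -> int:
    """Timestamp of the newest point a chart can show at time now_ms."""
    return (now_ms // interval_ms) * interval_ms


def worst_case_staleness_ms(interval_ms: int, horizon_ms: int = 300_000,
                            step_ms: int = 100) -> int:
    """Largest age of the newest visible point, sampling 'now' every step_ms."""
    return max(
        now - newest_visible_ms(now, interval_ms)
        for now in range(0, horizon_ms, step_ms)
    )


if __name__ == "__main__":
    # Intervals as stated in this post.
    for name, interval_ms in [
        ("Dripstat high resolution", 2_000),
        ("CloudWatch high resolution", 10_000),
        ("Datadog / New Relic", 15_000),
        ("Stackdriver", 60_000),
    ]:
        stale = worst_case_staleness_ms(interval_ms)
        print(f"{name:28} newest point can be up to {stale / 1000:.1f} s old")
```

Because "now" is sampled every 0.1 s, the worst case comes out just under the full interval: right before the next refresh, the newest visible point is almost one whole interval old, no matter how finely the underlying data was sampled.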

Below is a chart showing metrics streamed at different resolutions (higher-resolution video here). Each chart refreshes at the same interval as its resolution.

Notice how much more granular the 2-second chart is compared to the 10- and 14-second charts. The 14-second chart looks almost still, and the 1-minute chart refreshes too slowly to update at all within the time frame of this video! The math makes the difference obvious: 2-second resolution produces 5 times as many data points as 10-second resolution (400% more, about 0.7 of an order of magnitude) and 7 times as many as 14-second resolution (600% more, about 0.85 of an order of magnitude). Compared with 1-minute resolution, it produces 30 times as much data (2,900% more, roughly one and a half orders of magnitude).
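These multiples follow from simple division, assuming evenly spaced samples: an interval of T seconds yields T/2 times fewer points than a 2-second interval. The Python sketch below reproduces the comparison; `compare` is a hypothetical helper written only for this illustration.

```python
import math

HIGH_RES_S = 2  # Dripstat's new resolution, per this post


def compare(interval_s: int) -> tuple[float, float, float]:
    """Compare 2-second data against data sampled every interval_s seconds.

    Returns (times as many points, percent more points, orders of magnitude).
    """
    ratio = interval_s / HIGH_RES_S          # e.g. 10 s / 2 s = 5x the points
    percent_more = (ratio - 1) * 100         # 5x the points is 400% more, not 500%
    orders = math.log10(ratio)               # how many powers of ten apart
    return ratio, percent_more, orders


for interval_s in (10, 14, 60):
    ratio, pct, orders = compare(interval_s)
    print(f"vs {interval_s:>2} s: {ratio:.0f}x as many points, {pct:,.0f}% more, "
          f"{orders:.2f} orders of magnitude")
```

Running it prints 5x / 400% / 0.70 for 10-second data, 7x / 600% / 0.85 for 14-second data, and 30x / 2,900% / 1.48 for 1-minute data.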

Here is the Dripstat MySQL integration dashboard with all metrics refreshing every 2 seconds (higher-resolution video):

The new high-resolution metrics are part of the new dashboard experience we introduced with Kubernetes Monitoring. Today we are making them available for all Infra Integrations, with more integrations coming soon.

High-resolution metrics will be coming to Server metrics and APM shortly.

For now, update your Infra agents to version 5.0 and configure the integrations to get 2-second, real-time metrics on your databases!
