Creating Datadog Alerts Based on Log Metrics

A tutorial on setting up Datadog monitoring processes


As a cloud developer at Octobot, I’m dedicated to providing support to the different development teams whenever they need it. This article was born from one of those experiences, when I joined a project with VES-Artex and started using the monitoring tool Datadog.

Why Datadog

The platform we developed and continue to work on for VES-Artex is the DairyBOS app, used to manage environmental conditions in dairy barns, always with a strong focus on animal welfare. DairyBOS also provides reports that help dairy farmers make better, more insightful decisions. You can learn more about this project here.

In this project, we used Kibana on Elasticsearch to monitor system health and logs. Some time ago, as part of a bigger consolidation effort on the client’s side, we started migrating our log collection and metrics to Datadog, where all operations could be centrally overseen and monitored.

We came across a couple of problems that required extracting metrics from log messages. In the following sections, I’ll share how we overcame these challenges in order to successfully implement Datadog.

Datadog alerts based on log messages

Let’s get started by identifying which messages are relevant to our problem. In our case, the filtering was done with a log search query.

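As a rough sketch, a search query along these lines does that filtering (the service name and message text here are placeholders, not our actual values):

    service:dairybos-api "Response sent"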

And here is an example of one of those logs:

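It was roughly along these lines, with the response time in milliseconds at the end of the message (the HTTP method and endpoint are made up for illustration):

    Response sent for GET /api/v1/reports/environment in 81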

In my case, I wanted to extract the response time in milliseconds, expressed by the number 81 at the end of the line. This way, I can send an alert every time it exceeds a given threshold. Let’s go with 5 seconds (5,000 ms) for this example.

Pipelines and processors

Once you have the log query, go to Logs > Configuration and create a new pipeline. In the filter box, paste the same query you used in the last step and give the pipeline a name. You can add tags or a description as desired.

Now, under your new pipeline, create a new processor. Since we want to parse logs based on patterns, choose the Grok Parser processor type. A Grok Parser processor lets you provide a handful of sample logs alongside the parsing patterns, also called rules, that should match them. I recommend editing those rules so that one or two of them match every log you want to consider.

For example:

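A rule along these lines would match the sample log above (the log format is illustrative; http.method, http.url and duration are the attribute names I chose to extract):

    response_sent_rule Response sent for %{word:http.method} %{notSpace:http.url} in %{number:duration}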

The log samples you provide at the top of the processor editor should exactly match your rule.

It’s important to name the attributes in your rule, because you are going to use them in the metrics section later on. The syntax to define an attribute is %{matcher:attribute}, as in the rule above. Below the rules box, you can preview how the attributes defined by your rule will look.

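With the sketch rule above, the attributes extracted from the sample log would look roughly like this:

    {
      "http": { "method": "GET", "url": "/api/v1/reports/environment" },
      "duration": 81
    }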

From now on, every new log that matches your rule will carry the attributes defined in it.

Note: this rule will not be applied to older logs, only to new ones.

The final step before moving on to the metrics is to create a measure based on your attribute. To do it, simply click on the gear icon next to your attribute and select Create Measure.

Metrics

You are almost there! Once your measure is created, go to Logs > Configuration > Generate Metrics. Create a new metric, and in the query field use the same query as before, but add your newly created measure. This way you still filter on the same logs, but the resulting value is your attribute.

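As a sketch, the metric definition combines the same filter with the new measure (the query, measure and metric name below are placeholders matching the earlier examples):

    Define query:  service:dairybos-api "Response sent"
    Measure:       @duration
    Metric name:   dairybos.response.duration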

Choose a name for your metric and check it in the Metrics Explorer to confirm it is reporting alongside your other metrics.


Monitor

Now you are ready to create your alert. Go to Monitors > New Monitor. You can create the monitor from Metrics if you are only interested in the result of your measure, or from Logs if you also want to include a sample of the matching logs in your alert. In my case, I chose the second option.

By just selecting your measure, the query automatically returns the result even if you don’t type the search query above it, but I suggest writing the query anyway for reference. You can also group your results with the avg by option if you want to register multiple alerts, depending on your needs.

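For reference, the monitor’s query section ends up looking roughly like this (threshold in milliseconds, names still the placeholder ones from earlier):

    Search query:     service:dairybos-api "Response sent"
    Measure:          avg of @duration
    Group by:         @http.url
    Alert threshold:  > 5000 (5 seconds)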

In my case, I grouped the results by http.url (the URL attribute defined in my processor rule), so I’m able to set a multi alert, which triggers a separate alert for each URL.


You can also trigger a simple alert across all URLs if you prefer. After setting your trigger conditions and notification settings, you are good to go. In our case, the alerts are delivered to a Slack channel.

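The notification itself comes from the monitor’s message; here is a minimal sketch of a message that produces that kind of Slack alert (the channel handle and wording are my own choices, and {{http.url.name}} is available because the monitor is grouped by http.url):

    {{#is_alert}}
    Response time for {{http.url.name}} went above 5 s (current value: {{value}} ms).
    {{/is_alert}}
    @slack-dairybos-alerts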

Keep learning

This is how we solved the need presented by our client; I hope you found it helpful. As we adapt to change and look for innovative solutions to the situations we face in our projects, we get the opportunity to learn new technologies and tools and to expand our knowledge as a team.

If you want to continue learning about this tool, I recommend these links:
