Introduction

Running a rollup node can be complex, with many moving parts that need to work together seamlessly. This is why observability is crucial—it helps you understand how the system operates, identify issues, and ensure smooth performance.

In this tutorial, we’ll explain how to set up observability for a Sovereign SDK node, giving you the tools to monitor, debug, and optimize your rollup node.

To achieve observability, the Sovereign SDK uses three essential data types:

  1. Logs: Logs are discrete, timestamped records of events or messages generated by the rollup node and its dependencies. They provide detailed insights into what happened at specific points in time, helping with debugging and error analysis.
  2. Traces: Traces represent the lifecycle of critical operations within the rollup, broken down into spans. They help visualize the flow of requests across components and pinpoint bottlenecks or performance issues.
  3. Metrics: Metrics are numerical measurements of the node's execution. Some are rollup-specific, such as transaction count, transaction size, and processing time; others are higher-level, such as memory usage, transaction throughput, and latency. They provide a high-level view of system health and performance trends.

Having all three components (logs, traces, and metrics) gives you full visibility into the rollup node, enabling quick troubleshooting, component dependency analysis, and performance monitoring.
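To make the distinction between the three data types concrete, here is a minimal sketch that models each one with plain Rust standard-library types. Every name in it (`LogRecord`, `Span`, `Counter`, `timed_span`, `process_batch`) is hypothetical and chosen for illustration only; it is not the Sovereign SDK's API, and a real node would emit this data through its own instrumentation rather than hand-rolled structs.

```rust
use std::time::{Duration, Instant, SystemTime};

/// A log: one discrete, timestamped record of an event.
struct LogRecord {
    timestamp: SystemTime,
    level: &'static str,
    message: String,
}

/// A span: one timed step in the lifecycle of an operation.
/// A trace is the set of spans that share a `trace_id`.
struct Span {
    trace_id: u64,
    name: &'static str,
    duration: Duration,
}

/// A metric: a numeric measurement aggregated over time (here, a counter).
#[derive(Default)]
struct Counter {
    value: u64,
}

impl Counter {
    fn add(&mut self, n: u64) {
        self.value += n;
    }
}

/// Runs `work` and records how long it took as a span.
fn timed_span<T>(trace_id: u64, name: &'static str, work: impl FnOnce() -> T) -> (T, Span) {
    let start = Instant::now();
    let result = work();
    let span = Span { trace_id, name, duration: start.elapsed() };
    (result, span)
}

/// Hypothetical batch processing step that produces all three signals:
/// one log per transaction, one span for the batch, and a counter update.
fn process_batch(
    txs: &[&str],
    trace_id: u64,
    tx_counter: &mut Counter,
) -> (Vec<LogRecord>, Span) {
    let (logs, span) = timed_span(trace_id, "process_batch", || {
        txs.iter()
            .map(|tx| LogRecord {
                timestamp: SystemTime::now(),
                level: "INFO",
                message: format!("applied tx {tx}"),
            })
            .collect::<Vec<_>>()
    });
    tx_counter.add(txs.len() as u64);
    (logs, span)
}
```

In this sketch the logs answer "what happened to each transaction", the span answers "how long did the batch take and which operation did it belong to", and the counter answers "how many transactions has the node processed overall".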

This tutorial is divided into two parts:

  1. Logs and Traces: The first section focuses on setting up logs and traces to capture detailed event data and execution lifecycle.
  2. Metrics: The second section explains how to export and analyze metrics for high-level monitoring of your rollup node.

By the end of this tutorial, you’ll have a complete observability setup for your Sovereign SDK node, empowering you to ensure its stability and performance.

High-Level Overview

The observability stack in the Sovereign SDK supports exporting logs and traces in the OpenTelemetry format (see the OpenTelemetry documentation). Grafana Loki stores the exported logs and Grafana Tempo stores the exported traces, with Grafana Alloy acting as a local agent: it runs on the rollup server, collects logs and traces from the rollup node, and exports them to Loki and Tempo.

For metrics, we rely on InfluxDB as the storage backend and Telegraf as the local agent responsible for collecting and buffering metric data before forwarding it to InfluxDB.
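As an illustration of the kind of data that flows through this metrics pipeline, the sketch below formats a single data point in InfluxDB line protocol, the text format that InfluxDB ingests and Telegraf can parse. It assumes, without confirmation from this tutorial, that metrics are expressed in line protocol; the function names, the measurement `rollup_batch`, and the tag and field names are hypothetical. It handles only integer fields and expects at least one field per point.

```rust
use std::collections::BTreeMap;

/// Escapes commas, equals signs, and spaces, as line protocol requires
/// for tag keys, tag values, and field keys.
fn escape_key(raw: &str) -> String {
    let mut out = String::with_capacity(raw.len());
    for c in raw.chars() {
        if c == ',' || c == '=' || c == ' ' {
            out.push('\\');
        }
        out.push(c);
    }
    out
}

/// Measurement names escape only commas and spaces.
fn escape_measurement(raw: &str) -> String {
    let mut out = String::with_capacity(raw.len());
    for c in raw.chars() {
        if c == ',' || c == ' ' {
            out.push('\\');
        }
        out.push(c);
    }
    out
}

/// Builds one line of the form:
/// `measurement,tag=value field=1i,other=2i timestamp_ns`
/// Integer field values carry the `i` suffix; the timestamp is in nanoseconds.
fn to_line_protocol(
    measurement: &str,
    tags: &BTreeMap<&str, &str>,
    fields: &BTreeMap<&str, i64>,
    timestamp_ns: u64,
) -> String {
    let mut line = escape_measurement(measurement);
    for (key, value) in tags {
        line.push(',');
        line.push_str(&escape_key(key));
        line.push('=');
        line.push_str(&escape_key(value));
    }
    line.push(' ');
    let rendered: Vec<String> = fields
        .iter()
        .map(|(key, value)| format!("{}={}i", escape_key(key), value))
        .collect();
    line.push_str(&rendered.join(","));
    line.push(' ');
    line.push_str(&timestamp_ns.to_string());
    line
}
```

A `BTreeMap` is used so that tags are emitted in sorted key order, which keeps the output deterministic and matches the ordering InfluxDB recommends for tags.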

Why InfluxDB Instead of Prometheus?