1. Introduction: Why monitor the monitor?
Your telemetry collector is a critical component of your observability infrastructure. If it crashes or silently degrades, you lose all visibility into your applications. Self-monitoring means treating Grafana Alloy like any other critical application: monitoring its health, performance, and errors.
The risks of an unmonitored collector
- Silent data loss: A misconfigured component may stop sending data without generating a visible error.
- Excessive resource consumption: An inefficient log pipeline can lead to excessive CPU or RAM consumption, impacting other applications on the same host.
- Configuration errors: A syntax error in a .alloy configuration file can prevent the configuration from reloading.
Self-monitoring lets you detect these issues proactively.
2. Collect Alloy metrics
Alloy can expose its own health and performance metrics, as well as host metrics, in Prometheus format.
Host metrics (CPU, RAM, Disk)
The prometheus.exporter.unix component (similar to node_exporter) is ideal for collecting basic system metrics on the host where Alloy is installed.
// Expose system metrics (CPU, memory, disk, network)
prometheus.exporter.unix "local_system" {}
// Scrape the metrics exposed by the exporter above
prometheus.scrape "scrape_system" {
  targets    = prometheus.exporter.unix.local_system.targets
  forward_to = [prometheus.remote_write.mimir.receiver]
}
prometheus.remote_write "mimir" {
  // ... your Mimir/Prometheus endpoint configuration
}
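If the full default collector set is more than you need, prometheus.exporter.unix accepts a disable_collectors argument that switches off individual node_exporter collectors. This is a minimal sketch that assumes the listed collectors are irrelevant on your hosts. The names are examples only. The block would replace the empty prometheus.exporter.unix "local_system" block above rather than sit beside it, because two components cannot share the same label.

```alloy
// Variant of the exporter above with some node_exporter collectors turned off.
// The collector names are illustrative; choose the ones irrelevant to your hosts.
prometheus.exporter.unix "local_system" {
  disable_collectors = ["btrfs", "infiniband", "ipvs", "xfs", "zfs"]
}
```

Fewer collectors means fewer series to scrape and store, which also keeps the exporter's own CPU cost down.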
Alloy internal metrics
Alloy exposes its own internal health and performance metrics.
WARNING: Do not collect these metrics by scraping the UI address directly (e.g., {"__address__" = "localhost:12345", "job" = "alloy"}), because the UI can be disabled.
Instead, use the prometheus.exporter.self component and add its targets to your scrape configuration. The prometheus.scrape component below takes the place of "scrape_system" from the previous example; keeping both would scrape the host metrics twice.
// Expose Alloy's internal metrics
prometheus.exporter.self "alloy_internal" {}
// Scrape both the host exporter and Alloy's own exporter
prometheus.scrape "scrape_alloy_and_system" {
  targets = array.concat(
    prometheus.exporter.unix.local_system.targets,
    prometheus.exporter.self.alloy_internal.targets,
  )
  forward_to = [prometheus.remote_write.mimir.receiver]
}
// Key metrics to monitor:
// - alloy_component_controller_running_components: Number of running components, by health_type (healthy, unhealthy, unknown, exited)
// - process_cpu_seconds_total: CPU time consumed by Alloy
// - process_resident_memory_bytes: Resident memory (RSS) used by Alloy
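prometheus.scrape uses a 60-second scrape interval by default. That can be too coarse to catch short CPU or memory spikes in Alloy itself. This minimal sketch shortens the interval with the scrape_interval argument. The 15s value is an arbitrary example rather than a documented recommendation, and the block would replace the "scrape_alloy_and_system" component above.

```alloy
// Same scrape component as above, with a shorter scrape interval
prometheus.scrape "scrape_alloy_and_system" {
  targets = array.concat(
    prometheus.exporter.unix.local_system.targets,
    prometheus.exporter.self.alloy_internal.targets,
  )
  // Default is "60s"; "15s" is an example value only
  scrape_interval = "15s"
  forward_to      = [prometheus.remote_write.mimir.receiver]
}
```

A shorter interval gives finer resolution at the cost of more samples stored in Mimir/Prometheus.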
3. Collect Alloy logs
Capturing the logs produced by Alloy is essential for diagnosing configuration errors or runtime issues.
Recommended method: via journald
If Alloy runs as a systemd service, the best method is to read its logs directly from the system journal.
// Relabeling rules that keep only journal entries from 'alloy.service'
discovery.relabel "journal_filter" {
  // targets is a required argument; only the exported rules are used here
  targets = []
  rule {
    source_labels = ["__journal__systemd_unit"]
    regex         = "alloy\\.service"
    action        = "keep"
  }
}
// Read logs from journald and apply the rules above
loki.source.journal "read_alloy_logs" {
  forward_to    = [loki.write.loki_endpoint.receiver]
  relabel_rules = discovery.relabel.journal_filter.rules
}
loki.write "loki_endpoint" {
  // ... your Loki endpoint configuration
}
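Instead of relabel rules, loki.source.journal can let journald filter the entries itself through its matches argument, and attach static labels through its labels argument. This is a minimal sketch. The component label "read_alloy_logs_matched" and the job = "alloy" label are illustrative choices. The block would replace "read_alloy_logs" rather than run alongside it; otherwise every Alloy log line would be shipped twice.

```alloy
// Alternative: filter at the journal level instead of with relabel rules
loki.source.journal "read_alloy_logs_matched" {
  // Only read entries whose _SYSTEMD_UNIT field equals alloy.service
  matches    = "_SYSTEMD_UNIT=alloy.service"
  // Static label attached to every entry (illustrative)
  labels     = {job = "alloy"}
  forward_to = [loki.write.loki_endpoint.receiver]
}
```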
Alternative: via a local file
If you redirect Alloy's log output to a file (e.g., /var/log/alloy.log), you can use loki.source.file:
// Match the log file Alloy writes to
local.file_match "alloy_log_file" {
  path_targets = [{"__path__" = "/var/log/alloy.log"}]
}
// Tail the matched file and send its lines to Loki
loki.source.file "read_from_file" {
  targets    = local.file_match.alloy_log_file.targets
  forward_to = [loki.write.loki_endpoint.receiver]
}
Don't forget to configure log rotation to avoid filling up the disk.
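A third option avoids both journald and an intermediate file: the top-level logging block, whose write_to argument sends Alloy's own log lines straight to a Loki receiver. This is a minimal sketch that reuses the loki.write "loki_endpoint" component defined earlier. The level and format values are shown explicitly for clarity.

```alloy
// Send Alloy's own log lines directly to Loki, without a file or journald in between
logging {
  level    = "info"
  format   = "logfmt"
  write_to = [loki.write.loki_endpoint.receiver]
}
```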
4. Collect Traces and Profiles (Profiling)
For advanced performance debugging, Alloy can expose performance profiles via its web interface.
Enable profiling with pprof
Profiling endpoints (pprof) are available on Alloy's web interface.
WARNING: Just like with metrics, relying on the local UI address (e.g., {"__address__" = "localhost:12345"}) for profiling can be risky if the UI is disabled in production. Make sure it is enabled if you use pyroscope.scrape.
// Scrape Alloy's own pprof endpoints
pyroscope.scrape "alloy_profiling" {
  targets = [
    {"__address__" = "localhost:12345", "job" = "alloy"},
  ]
  forward_to = [pyroscope.write.pyroscope_endpoint.receiver]
}
pyroscope.write "pyroscope_endpoint" {
  endpoint {
    // Base URL of the Pyroscope server
    url = "http://pyroscope:4040"
  }
}
// Profiles collected with the default profiling_config include:
// - process_cpu: CPU profile
// - memory: heap memory allocation profile
// - goroutine, mutex, and block profiles
Collecting Alloy's internal traces is a very advanced use case, generally reserved for development of Alloy itself. For most users, metrics and profiles are sufficient for self-monitoring.
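If you do want Alloy's internal traces, the top-level tracing block can forward them to an OpenTelemetry exporter. This is a minimal sketch under several assumptions:
- An OTLP-compatible backend such as Tempo is reachable at the hypothetical address tempo:4317.
- The connection uses plaintext rather than TLS.
- The exporter label "tempo" is also hypothetical.

```alloy
// Export a sample of Alloy's internal traces over OTLP/gRPC
tracing {
  // Fraction of traces to keep (0.1 = 10%)
  sampling_fraction = 0.1
  write_to          = [otelcol.exporter.otlp.tempo.input]
}
otelcol.exporter.otlp "tempo" {
  client {
    // Hypothetical OTLP gRPC endpoint
    endpoint = "tempo:4317"
    // Assumes a plaintext connection; remove this block if the endpoint uses TLS
    tls {
      insecure = true
    }
  }
}
```

Keeping sampling_fraction low limits the overhead that tracing adds to the collector you are trying to observe.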
5. Conclusion: A loop of trust
By configuring Grafana Alloy to monitor itself, you create a loop of trust. You can not only validate that your collector is working properly, but also optimize its performance and be alerted immediately in case of an issue. This is a fundamental step in building a robust and reliable observability platform.