Observability with OpenTelemetry
The Helium server includes optional OpenTelemetry (OTel) integration for comprehensive observability. The integration is entirely opt-in: without it, the server works fine with basic structured logging.
What is OpenTelemetry?
OpenTelemetry provides distributed tracing, metrics collection, and contextual logging for production systems. Use it when:
- You run multiple worker instances and need distributed tracing
- You need detailed performance analysis and troubleshooting
- You want centralized observability dashboards
Skip it for simple deployments, development environments, or when basic logging is sufficient.
Configuration
Enable OpenTelemetry by setting the OTEL_COLLECTOR environment variable:
```bash
export OTEL_COLLECTOR="http://otel-collector:4317"
./helium-server
```
If the variable is not set, or initialization fails, the server automatically falls back to basic logging.
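In a Kubernetes deployment the variable is usually set on each worker's container. The manifest below is only a minimal sketch: the Deployment name, labels, image, and collector address are placeholders, not values defined by Helium.

```yaml
# Hypothetical worker Deployment excerpt; adjust names, image, and the
# collector address to match your cluster.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: helium-grpc
spec:
  replicas: 1
  selector:
    matchLabels:
      app: helium-grpc
  template:
    metadata:
      labels:
        app: helium-grpc
    spec:
      containers:
        - name: helium-server
          image: helium-server:latest
          env:
            - name: OTEL_COLLECTOR
              value: "http://otel-collector.observability.svc.cluster.local:4317"
```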
Service Names
Each worker mode reports with a distinct service name:
| Worker Mode | Service Name |
|---|---|
| grpc | Helium.grpc |
| subscribe_api | Helium.subscribe-api |
| webhook_api | Helium.webhook-api |
| consumer | Helium.consumer |
| mailer | Helium.mailer |
| cron_executor | Helium.cron-executor |
Recommended Stack: Grafana Observability
For production deployments, we recommend the Grafana observability stack — an open-source, Kubernetes-native solution with unified dashboards for traces, metrics, and logs.
Components
- OpenTelemetry Collector: Receives and routes telemetry
- Grafana Tempo: Distributed tracing storage
- Prometheus: Metrics collection
- Grafana Loki: Log aggregation
- Grafana: Unified visualization
Deployment
Deploy the Grafana stack alongside your Kubernetes cluster:
1. Add Helm Repositories
```bash
helm repo add grafana https://grafana.github.io/helm-charts
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm repo update
```
2. Create Namespace
```bash
kubectl create namespace observability
```
3. Deploy OpenTelemetry Collector
Create otel-collector-values.yaml:
```yaml
mode: deployment

config:
  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: 0.0.0.0:4317
        http:
          endpoint: 0.0.0.0:4318

  processors:
    batch:
      timeout: 10s
      send_batch_size: 1024

  exporters:
    # Traces to Tempo
    otlp/tempo:
      endpoint: tempo.observability.svc.cluster.local:4317
      tls:
        insecure: true

    # Metrics to Prometheus
    prometheus:
      endpoint: 0.0.0.0:8889
      namespace: helium

    # Logs to Loki
    loki:
      endpoint: http://loki.observability.svc.cluster.local:3100/loki/api/v1/push

  service:
    pipelines:
      traces:
        receivers: [otlp]
        processors: [batch]
        exporters: [otlp/tempo]
      metrics:
        receivers: [otlp]
        processors: [batch]
        exporters: [prometheus]
      logs:
        receivers: [otlp]
        processors: [batch]
        exporters: [loki]

ports:
  otlp-grpc:
    enabled: true
    containerPort: 4317
    servicePort: 4317
    protocol: TCP
  otlp-http:
    enabled: true
    containerPort: 4318
    servicePort: 4318
    protocol: TCP
  metrics:
    enabled: true
    containerPort: 8889
    servicePort: 8889
    protocol: TCP
```
```bash
helm install otel-collector open-telemetry/opentelemetry-collector \
  --namespace observability \
  --values otel-collector-values.yaml
```
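Before pointing Helium at the collector, it is worth confirming the release came up and the OTLP ports are exposed. The exact resource names depend on the chart's release naming; the commands below assume a release named otel-collector.

```bash
# Pods and services created by the release (exact names depend on the chart's naming).
kubectl -n observability get pods,svc -l app.kubernetes.io/instance=otel-collector

# Tail the collector logs and confirm the traces/metrics/logs pipelines started cleanly.
# Substitute the deployment name shown by the previous command.
kubectl -n observability logs deploy/otel-collector-opentelemetry-collector --tail=50
```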
4. Deploy Tempo, Loki, and Prometheus
```bash
# Tempo for traces
helm install tempo grafana/tempo \
  --namespace observability \
  --set tempo.receivers.otlp.protocols.grpc.endpoint=0.0.0.0:4317

# Loki for logs
helm install loki grafana/loki-stack \
  --namespace observability \
  --set loki.enabled=true \
  --set promtail.enabled=false
```
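The collector values above export to tempo.observability.svc.cluster.local:4317 and loki.observability.svc.cluster.local:3100, so once both charts are installed it is worth checking that services with those names actually exist; names can vary with chart versions and release names.

```bash
# The exporter endpoints in otel-collector-values.yaml reference these services by name.
kubectl -n observability get svc tempo loki

# If the names or ports differ, update the exporter endpoints and upgrade the collector release.
```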
5. Deploy Prometheus
```bash
helm install prometheus prometheus-community/kube-prometheus-stack \
  --namespace observability \
  --set grafana.enabled=false
```
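kube-prometheus-stack discovers scrape targets through ServiceMonitor resources, so the collector's metrics endpoint on port 8889 is not scraped automatically. The sketch below shows one way to wire it up; the selector labels, port name, and the release: prometheus label are assumptions that must match your collector Service and your kube-prometheus-stack release.

```yaml
# Hypothetical ServiceMonitor for the collector's Prometheus exporter (:8889).
# Check the collector Service's labels and port names with
# `kubectl -n observability get svc --show-labels` and adjust the selector.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: otel-collector
  namespace: observability
  labels:
    release: prometheus   # kube-prometheus-stack only selects labeled ServiceMonitors by default
spec:
  selector:
    matchLabels:
      app.kubernetes.io/instance: otel-collector
  endpoints:
    - port: metrics       # the "metrics" service port from otel-collector-values.yaml
      interval: 30s
```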
6. Deploy Grafana
```bash
helm install grafana grafana/grafana \
  --namespace observability \
  --set adminPassword=changeme
```
Configure data sources in Grafana to connect Tempo, Prometheus, and Loki.
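Data sources can also be provisioned at install time through the Grafana chart's datasources values rather than added by hand in the UI. The sketch below assumes the in-cluster service names and ports produced by the charts above; verify them before relying on it, since ports differ between chart versions (Tempo in particular uses 3200 for HTTP in newer releases).

```yaml
# grafana-values.yaml (sketch): provision the three data sources.
# Service names and ports are assumptions; confirm with `kubectl -n observability get svc`.
datasources:
  datasources.yaml:
    apiVersion: 1
    datasources:
      - name: Tempo
        type: tempo
        url: http://tempo.observability.svc.cluster.local:3100
      - name: Prometheus
        type: prometheus
        isDefault: true
        url: http://prometheus-operated.observability.svc.cluster.local:9090
      - name: Loki
        type: loki
        url: http://loki.observability.svc.cluster.local:3100
```

Pass the file with --values grafana-values.yaml on the helm install grafana command above.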
Troubleshooting
Server logs show “Failed to initialize OpenTelemetry”
Check that the OTel Collector is reachable at the configured endpoint. The server will automatically fall back to basic logging.
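A quick first check is whether the collector Service exists, has ready endpoints, and is resolvable under the hostname used in OTEL_COLLECTOR. The commands below assume the collector was installed into the observability namespace with the release name otel-collector, as in the deployment steps above.

```bash
# Is the collector Service present and backed by ready endpoints?
kubectl -n observability get svc,endpoints | grep otel-collector

# Does the hostname used in OTEL_COLLECTOR resolve from inside the cluster?
# Substitute the actual service name shown by the previous command.
kubectl run dns-check --rm -it --restart=Never --image=busybox:1.36 -- \
  nslookup otel-collector.observability.svc.cluster.local
```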
Missing traces in Grafana
Verify the data pipeline: Helium → OTel Collector → Tempo. Check logs at each stage.
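If traces are missing, the first question is whether spans reach the collector at all. One way to check, sketched below, is to temporarily add the collector's debug exporter to the traces pipeline (in recent collector releases debug supersedes the older logging exporter) and watch the collector logs for span counts after a helm upgrade.

```yaml
# Temporary troubleshooting addition to otel-collector-values.yaml:
# log a summary of the spans the collector receives, so you can tell whether
# traces from Helium arrive before they are forwarded to Tempo.
config:
  exporters:
    debug:
      verbosity: basic
  service:
    pipelines:
      traces:
        receivers: [otlp]
        processors: [batch]
        exporters: [otlp/tempo, debug]
```

Remove the debug exporter again once the pipeline is confirmed, since it adds log noise.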
Performance impact
OpenTelemetry adds minimal overhead: < 2% CPU, ~10-20MB memory, < 1ms latency per request.
Disabling OpenTelemetry
Simply unset the OTEL_COLLECTOR variable — the server automatically falls back to basic logging.
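For example, mirroring the start-up command from the configuration section:

```bash
unset OTEL_COLLECTOR
./helium-server   # starts with basic structured logging only
```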
Summary
OpenTelemetry in Helium is completely optional:
- Set OTEL_COLLECTOR to enable it; leave it unset to use basic logging
- Automatic fallback if initialization fails
- Recommended for production with multiple instances
- Grafana stack provides open-source, Kubernetes-native observability
For detailed Helm deployment configurations, refer to the official Grafana Helm charts documentation.