Observability with OpenTelemetry
The Helium server includes optional OpenTelemetry (OTel) integration for comprehensive observability. The integration is entirely opt-in: without it, the server works fine with basic structured logging.
What is OpenTelemetry?
OpenTelemetry provides distributed tracing, metrics collection, and contextual logging for production systems. Use it when:
- You run multiple worker instances and need distributed tracing
- You need detailed performance analysis and troubleshooting
- You want centralized observability dashboards
Skip it for simple deployments, development environments, or when basic logging is sufficient.
Configuration
Enable OpenTelemetry by setting the OTEL_COLLECTOR environment variable:
```bash
export OTEL_COLLECTOR="http://otel-collector:4317"
./helium-server
```
If the variable is not set, or initialization fails, the server automatically falls back to basic logging.
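In a Kubernetes deployment the variable is usually set on each worker's container. The manifest below is only a minimal sketch: the Deployment name, labels, image, and collector address are placeholders, not values defined by Helium.

```yaml
# Hypothetical worker Deployment excerpt; adjust names, image, and the
# collector address to match your cluster.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: helium-grpc
spec:
  replicas: 1
  selector:
    matchLabels:
      app: helium-grpc
  template:
    metadata:
      labels:
        app: helium-grpc
    spec:
      containers:
        - name: helium-server
          image: helium-server:latest
          env:
            - name: OTEL_COLLECTOR
              value: "http://otel-collector.observability.svc.cluster.local:4317"
```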
Service Names
Each worker mode reports with a distinct service name:
| Worker Mode | Service Name |
|---|---|
| grpc | Helium.grpc |
| subscribe_api | Helium.subscribe-api |
| webhook_api | Helium.webhook-api |
| consumer | Helium.consumer |
| mailer | Helium.mailer |
| cron_executor | Helium.cron-executor |
Recommended Stack: Grafana Observability
For production deployments, we recommend the Grafana observability stack — an open-source, Kubernetes-native solution with unified dashboards for traces, metrics, and logs.
Components
- OpenTelemetry Collector: Receives and routes telemetry
- Grafana Tempo: Distributed tracing storage
- Prometheus: Metrics collection
- Grafana Loki: Log aggregation
- Grafana: Unified visualization
Deployment
Deploy the Grafana stack alongside your Kubernetes cluster:
1. Add Helm Repositories
```bash
helm repo add grafana https://grafana.github.io/helm-charts
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm repo update
```
2. Create Namespace
```bash
kubectl create namespace observability
```
3. Deploy OpenTelemetry Collector
Create otel-collector-values.yaml:
```yaml
mode: deployment

config:
  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: 0.0.0.0:4317
        http:
          endpoint: 0.0.0.0:4318

  processors:
    batch:
      timeout: 10s
      send_batch_size: 1024

  exporters:
    # Traces to Tempo
    otlp/tempo:
      endpoint: tempo.observability.svc.cluster.local:4317
      tls:
        insecure: true

    # Metrics to Prometheus
    prometheus:
      endpoint: 0.0.0.0:8889
      namespace: helium

    # Logs to Loki
    loki:
      endpoint: http://loki.observability.svc.cluster.local:3100/loki/api/v1/push

  service:
    pipelines:
      traces:
        receivers: [otlp]
        processors: [batch]
        exporters: [otlp/tempo]
      metrics:
        receivers: [otlp]
        processors: [batch]
        exporters: [prometheus]
      logs:
        receivers: [otlp]
        processors: [batch]
        exporters: [loki]

ports:
  otlp-grpc:
    enabled: true
    containerPort: 4317
    servicePort: 4317
    protocol: TCP
  otlp-http:
    enabled: true
    containerPort: 4318
    servicePort: 4318
    protocol: TCP
  metrics:
    enabled: true
    containerPort: 8889
    servicePort: 8889
    protocol: TCP
```
```bash
helm install otel-collector open-telemetry/opentelemetry-collector \
  --namespace observability \
  --values otel-collector-values.yaml
```
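Before pointing Helium at the collector, it is worth confirming the release came up and the OTLP ports are exposed. The exact resource names depend on the chart's release naming; the commands below assume a release named otel-collector.

```bash
# Pods and services created by the release (exact names depend on the chart's naming).
kubectl -n observability get pods,svc -l app.kubernetes.io/instance=otel-collector

# Tail the collector logs and confirm the traces/metrics/logs pipelines started cleanly.
# Substitute the deployment name shown by the previous command.
kubectl -n observability logs deploy/otel-collector-opentelemetry-collector --tail=50
```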
4. Deploy Tempo, Loki, and Prometheus
```bash
# Tempo for traces
helm install tempo grafana/tempo \
  --namespace observability \
  --set tempo.receivers.otlp.protocols.grpc.endpoint=0.0.0.0:4317

# Loki for logs
helm install loki grafana/loki-stack \
  --namespace observability \
  --set loki.enabled=true \
  --set promtail.enabled=false
```
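The collector values above export to tempo.observability.svc.cluster.local:4317 and loki.observability.svc.cluster.local:3100, so once both charts are installed it is worth checking that services with those names actually exist; names can vary with chart versions and release names.

```bash
# The exporter endpoints in otel-collector-values.yaml reference these services by name.
kubectl -n observability get svc tempo loki

# If the names or ports differ, update the exporter endpoints and upgrade the collector release.
```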
5. Deploy Prometheus
```bash
helm install prometheus prometheus-community/kube-prometheus-stack \
  --namespace observability \
  --set grafana.enabled=false
```
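kube-prometheus-stack discovers scrape targets through ServiceMonitor resources, so the collector's metrics endpoint on port 8889 is not scraped automatically. The sketch below shows one way to wire it up; the selector labels, port name, and the release: prometheus label are assumptions that must match your collector Service and your kube-prometheus-stack release.

```yaml
# Hypothetical ServiceMonitor for the collector's Prometheus exporter (:8889).
# Check the collector Service's labels and port names with
# `kubectl -n observability get svc --show-labels` and adjust the selector.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: otel-collector
  namespace: observability
  labels:
    release: prometheus   # kube-prometheus-stack only selects labeled ServiceMonitors by default
spec:
  selector:
    matchLabels:
      app.kubernetes.io/instance: otel-collector
  endpoints:
    - port: metrics       # the "metrics" service port from otel-collector-values.yaml
      interval: 30s
```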
6. Deploy Grafana
```bash
helm install grafana grafana/grafana \
  --namespace observability \
  --set adminPassword=changeme
```
Configure data sources in Grafana to connect Tempo, Prometheus, and Loki.
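Data sources can also be provisioned at install time through the Grafana chart's datasources values rather than added by hand in the UI. The sketch below assumes the in-cluster service names and ports produced by the charts above; verify them before relying on it, since ports differ between chart versions (Tempo in particular uses 3200 for HTTP in newer releases).

```yaml
# grafana-values.yaml (sketch): provision the three data sources.
# Service names and ports are assumptions; confirm with `kubectl -n observability get svc`.
datasources:
  datasources.yaml:
    apiVersion: 1
    datasources:
      - name: Tempo
        type: tempo
        url: http://tempo.observability.svc.cluster.local:3100
      - name: Prometheus
        type: prometheus
        isDefault: true
        url: http://prometheus-operated.observability.svc.cluster.local:9090
      - name: Loki
        type: loki
        url: http://loki.observability.svc.cluster.local:3100
```

Pass the file with --values grafana-values.yaml on the helm install grafana command above.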
Troubleshooting
Server logs show “Failed to initialize OpenTelemetry”
Check that the OTel Collector is reachable at the configured endpoint. The server will automatically fall back to basic logging.
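A quick first check is whether the collector Service exists, has ready endpoints, and is resolvable under the hostname used in OTEL_COLLECTOR. The commands below assume the collector was installed into the observability namespace with the release name otel-collector, as in the deployment steps above.

```bash
# Is the collector Service present and backed by ready endpoints?
kubectl -n observability get svc,endpoints | grep otel-collector

# Does the hostname used in OTEL_COLLECTOR resolve from inside the cluster?
# Substitute the actual service name shown by the previous command.
kubectl run dns-check --rm -it --restart=Never --image=busybox:1.36 -- \
  nslookup otel-collector.observability.svc.cluster.local
```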
Missing traces in Grafana
Verify the data pipeline: Helium → OTel Collector → Tempo. Check logs at each stage.
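If traces are missing, the first question is whether spans reach the collector at all. One way to check, sketched below, is to temporarily add the collector's debug exporter to the traces pipeline (in recent collector releases debug supersedes the older logging exporter) and watch the collector logs for span counts after a helm upgrade.

```yaml
# Temporary troubleshooting addition to otel-collector-values.yaml:
# log a summary of the spans the collector receives, so you can tell whether
# traces from Helium arrive before they are forwarded to Tempo.
config:
  exporters:
    debug:
      verbosity: basic
  service:
    pipelines:
      traces:
        receivers: [otlp]
        processors: [batch]
        exporters: [otlp/tempo, debug]
```

Remove the debug exporter again once the pipeline is confirmed, since it adds log noise.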
Performance impact
OpenTelemetry adds minimal overhead: < 2% CPU, ~10-20MB memory, < 1ms latency per request.
Disabling OpenTelemetry
Simply unset the OTEL_COLLECTOR variable — the server automatically falls back to basic logging.
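For example, mirroring the start-up command from the configuration section:

```bash
unset OTEL_COLLECTOR
./helium-server   # starts with basic structured logging only
```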
Summary
OpenTelemetry in Helium is completely optional:
- Set OTEL_COLLECTOR to enable it; leave it unset to use basic logging
- Automatic fallback if initialization fails
- Recommended for production with multiple instances
- Grafana stack provides open-source, Kubernetes-native observability
For detailed Helm deployment configurations, refer to the official Grafana Helm charts documentation.