# Health Checks for Kubernetes
The Helium server provides HTTP health check endpoints designed for Kubernetes liveness and readiness probes. These endpoints run on a separate internal port (default: 9090) and are enabled in all worker modes.
## Overview
Health checks help Kubernetes determine:
- Liveness: Is the container alive? If this probe fails, Kubernetes restarts the container.
- Readiness: Is the container ready to handle requests? If this probe fails, Kubernetes stops routing traffic to the pod.
Helium implements both probe types on a dedicated HTTP server that runs alongside each worker mode.
## Endpoints
### Liveness Probe: `/healthz`
Returns 200 OK with a JSON response if the server is running:
```json
{
  "status": "ok"
}
```
This endpoint always returns success as long as the health check server itself is responding; it performs no dependency checks. Kubernetes uses it to determine whether the container should be restarted.
### Readiness Probe: `/readyz`
Checks connectivity to all dependencies before returning a status:
Success Response (200 OK):
```json
{
  "status": "ok",
  "database": "ok",
  "redis": "ok",
  "rabbitmq": "ok"
}
```
Failure Response (503 Service Unavailable):
```json
{
  "status": "error",
  "database": "ok",
  "redis": "error",
  "rabbitmq": "ok",
  "error": "Redis error: Connection refused"
}
```
The readiness probe checks:
- PostgreSQL: Executes a simple query (`SELECT 1`)
- Redis: Sends a `PING` command
- RabbitMQ: Validates connection pool status
All worker modes check the same three dependencies.
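You can exercise both endpoints by hand before wiring them into probes. A quick check from inside a running pod, assuming the image ships with a shell and `curl` (the deployment name `helium-grpc` matches the example later in this page):

```bash
# Hit both endpoints from inside the pod
kubectl exec deploy/helium-grpc -- curl -s http://localhost:9090/healthz
kubectl exec deploy/helium-grpc -- curl -s http://localhost:9090/readyz
```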
## Configuration
### Health Check Port
Set the `HEALTH_CHECK_PORT` environment variable to customize the port (default: 9090):
```bash
export HEALTH_CHECK_PORT=9090
```
This port should be:
- Internal only: Not exposed to external traffic
- Accessible by Kubernetes: For probe requests
- Different from main service ports: To avoid conflicts
### Worker Modes
Health checks are available in all worker modes:
| Worker Mode | Main Port | Health Check Port | Dependencies Checked |
|---|---|---|---|
| `grpc` | 50051 | 9090 | Database, Redis, RabbitMQ |
| `subscribe_api` | 8080 | 9090 | Database, Redis, RabbitMQ |
| `webhook_api` | 8081 | 9090 | Database, Redis, RabbitMQ |
| `consumer` | N/A | 9090 | Database, Redis, RabbitMQ |
| `mailer` | N/A | 9090 | Database, Redis, RabbitMQ |
| `cron_executor` | N/A | 9090 | Database, Redis, RabbitMQ |
## Kubernetes Deployment
### Example Pod Configuration
Here’s how to configure health checks in your Kubernetes deployment:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: helium-grpc
spec:
  replicas: 3
  selector:
    matchLabels:
      app: helium-grpc
  template:
    metadata:
      labels:
        app: helium-grpc
    spec:
      containers:
        - name: helium-grpc
          image: helium-server:latest
          env:
            - name: WORK_MODE
              value: "grpc"
            - name: LISTEN_ADDR
              value: "0.0.0.0:50051"
            - name: HEALTH_CHECK_PORT
              value: "9090"
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: helium-secrets
                  key: database-url
            - name: REDIS_URL
              valueFrom:
                secretKeyRef:
                  name: helium-secrets
                  key: redis-url
            - name: MQ_URL
              valueFrom:
                secretKeyRef:
                  name: helium-secrets
                  key: mq-url
          ports:
            - name: grpc
              containerPort: 50051
              protocol: TCP
            - name: health
              containerPort: 9090
              protocol: TCP
          livenessProbe:
            httpGet:
              path: /healthz
              port: health
            initialDelaySeconds: 10
            periodSeconds: 10
            timeoutSeconds: 5
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /readyz
              port: health
            initialDelaySeconds: 5
            periodSeconds: 5
            timeoutSeconds: 5
            failureThreshold: 3
```
### Probe Configuration Guidelines
Liveness Probe:

- `initialDelaySeconds`: 10-30 seconds (allow time for startup)
- `periodSeconds`: 10-30 seconds (check periodically)
- `timeoutSeconds`: 5 seconds
- `failureThreshold`: 3 (restart after 3 consecutive failures)

Readiness Probe:

- `initialDelaySeconds`: 5-10 seconds (faster than liveness)
- `periodSeconds`: 5-10 seconds (check more frequently)
- `timeoutSeconds`: 5 seconds
- `failureThreshold`: 3 (mark unready after 3 consecutive failures)

With the liveness values in the example above (`periodSeconds: 10`, `failureThreshold: 3`), an unresponsive container is restarted roughly 30 seconds after it stops answering probes.
### Service Configuration
For API worker modes (`grpc`, `subscribe_api`, `webhook_api`), configure a Service:
```yaml
apiVersion: v1
kind: Service
metadata:
  name: helium-grpc
spec:
  type: ClusterIP
  ports:
    - name: grpc
      port: 50051
      targetPort: grpc
      protocol: TCP
  selector:
    app: helium-grpc
```
Note: The health check port (9090) is not exposed in the Service. It’s only for Kubernetes probes.
## Worker Mode Behavior
### API Modes (`grpc`, `subscribe_api`, `webhook_api`)
For API modes, the health check server runs alongside the main API server:
- When the main server exits, the health check server is immediately terminated
- Process exits when either server fails
- Ensures no “zombie” containers serving health checks without handling requests
### Background Worker Modes (`consumer`, `mailer`, `cron_executor`)
For background workers, the health check server runs continuously:
- Liveness probe confirms the worker process is alive
- Readiness probe ensures dependencies are accessible
- Worker loops indefinitely alongside health check server
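As a sketch, the probe wiring for a background worker looks the same as for an API mode, minus the main service port. The snippet below shows only the `containers` section of a hypothetical consumer Deployment; the image name and the database/Redis/MQ environment variables are assumed to match the full example above.

```yaml
containers:
  - name: helium-consumer
    image: helium-server:latest
    env:
      - name: WORK_MODE
        value: "consumer"
      - name: HEALTH_CHECK_PORT
        value: "9090"
      # DATABASE_URL, REDIS_URL, and MQ_URL omitted for brevity
    ports:
      - name: health          # only the health check port is declared
        containerPort: 9090
        protocol: TCP
    livenessProbe:
      httpGet:
        path: /healthz
        port: health
      initialDelaySeconds: 10
      periodSeconds: 10
    readinessProbe:
      httpGet:
        path: /readyz
        port: health
      initialDelaySeconds: 5
      periodSeconds: 5
```

No Service is created for background workers; the health port exists solely for kubelet probes.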
## Troubleshooting
### Health Check Server Not Starting
Symptom: Probes fail immediately with connection errors
Solutions:
- Check logs for health check server errors (see the commands below)
- Verify `HEALTH_CHECK_PORT` is not already in use
- Ensure the port is accessible within the pod
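For the first item, the container logs and pod events usually show why the server could not bind to the port (the pod name is a placeholder, as elsewhere on this page):

```bash
# Look for bind/startup errors in the container logs
kubectl logs helium-grpc-xyz

# Probe failures also show up as events on the pod
kubectl get events --field-selector involvedObject.name=helium-grpc-xyz
```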
### Readiness Probe Failing
Symptom: Pod remains in “Not Ready” state
Solutions:
- Check which dependency is failing in the `/readyz` response
- Verify connection strings (`DATABASE_URL`, `REDIS_URL`, `MQ_URL`)
- Ensure network policies allow pod access to dependencies
- Check if dependencies are healthy
Example debugging:
```bash
# Forward health check port to local machine
kubectl port-forward pod/helium-grpc-xyz 9090:9090

# Check readiness endpoint
curl http://localhost:9090/readyz
```
### Liveness Probe Causing Restart Loop
Symptom: Pod repeatedly restarts with liveness probe failures
Solutions:
- Increase `initialDelaySeconds` (the worker may need more startup time)
- Increase `failureThreshold` (allow more failures before a restart)
- Check whether the worker is deadlocked or stuck (examine the logs from before the restart, as shown below)
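Because the container keeps getting restarted, the interesting logs are in the previous instance:

```bash
# Logs from the container instance that was killed by the liveness probe
kubectl logs helium-grpc-xyz --previous

# Restart count and the probe failure messages
kubectl describe pod helium-grpc-xyz
```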
### Worker Exits But Pod Stays Running
Symptom: Container appears healthy but doesn’t process requests
This should not happen with the current implementation:
- API workers: The health check server is aborted when the main server exits
- Background workers: Returning from `execute_worker()` causes the process to exit
If this occurs, file a bug report.
## Security Considerations
### Port Exposure
The health check port (9090) should never be exposed externally:
- Don’t create Ingress rules for health check endpoints
- Don’t expose the health check port in the Service definition
- Use network policies to restrict access to Kubernetes probes only (see the sketch below)
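A minimal sketch of such a policy, assuming the `app: helium-grpc` labels from the Deployment above (the policy name is illustrative): it allows in-cluster ingress only to the main gRPC port and defines no rule for port 9090. Kubelet probes originate from the node itself and are not blocked by NetworkPolicy in most CNI implementations, so the probes continue to work.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: helium-grpc-ingress
spec:
  podSelector:
    matchLabels:
      app: helium-grpc
  policyTypes:
    - Ingress
  ingress:
    - ports:
        - protocol: TCP
          port: 50051   # main gRPC port only; no rule for 9090
```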
### Sensitive Information
Health check responses contain minimal information:
- No version numbers
- No internal IPs or hostnames
- No authentication tokens
- Only dependency status (ok/error)
Error messages may contain connection details. Ensure logs are secured appropriately.
## Best Practices
- Use separate ports: Never combine health checks with main service endpoints
- Set appropriate timeouts: Balance between quick detection and false positives
- Monitor probe metrics: Track probe success rates in your observability stack
- Test locally: Use port-forwarding to verify health checks before deployment
- Align with dependencies: If using a sidecar proxy (Istio, Linkerd), configure startup probes (see the sketch after this list)
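As a sketch, a startup probe added to the container spec above holds off liveness and readiness checks until the process first reports healthy, which also covers slow startup behind a sidecar proxy:

```yaml
startupProbe:
  httpGet:
    path: /healthz
    port: health
  periodSeconds: 5
  failureThreshold: 30   # up to 30 × 5 = 150 seconds to start
```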
## Summary
Helium’s health check endpoints provide robust Kubernetes integration:
- Liveness probe (`/healthz`): Detects unresponsive containers
- Readiness probe (`/readyz`): Ensures dependencies are healthy
- Separate port (default 9090): Isolated from main services
- All worker modes: Consistent behavior across deployment types
- Process lifecycle: Ensures clean exits, no zombie containers
Configure these probes in your Kubernetes deployments to enable automatic recovery and load balancing.