
Helium

Helium is a modern commercial VPN SaaS system built with Rust, focusing on scalability, security, and user-friendliness.

Features

  • Kubernetes/Docker native: stateless, horizontally scalable, and easy to deploy.
  • High security: no shell execution, no deserialization vulnerabilities, and no SQL injection.
  • Pluggable frontend: a fully featured gRPC API makes it easy to build your own frontend.
  • Lightweight: as little as 40 MB of memory per service; handles 1000+ requests per second on a single 1-core CPU server.
  • Advanced selling system: handles complex business strategies and scales to very large user bases.

Tech Stack

  • Rust: Memory-safe systems programming with C-level performance
  • gRPC + Tonic: High-performance API with type-safe contracts
  • PostgreSQL + SQLx: Reliable database with compile-time query validation
  • Redis: Fast in-memory caching and session storage
  • AMQP: Reliable message queuing for microservices
  • Tokio: Async runtime for handling thousands of concurrent connections

Key Advantages:

  • Microservices architecture with independent scaling
  • Container-native design for Kubernetes deployment
  • Memory safety eliminates entire classes of security vulnerabilities
  • Exceptional performance: 1000+ RPS on single-core CPU with 40MB memory usage

Enterprise User Guide

This guide is for enterprises that run customer-facing VPN services on Helium. It is written for technical operators and support teams who understand coding, networking, and business operations, but do not need to read Helium source code.

Who should read this

  • Product and operations owners
  • Customer support and finance teams
  • Platform administrators
  • Technical onboarding staff

What this guide covers

The guide is organized by business flow, not by backend module:

  1. Onboard users and secure accounts
  2. Sell plans and process payments
  3. Deliver VPN access and manage package lifecycle
  4. Handle wallet, gift card, and referral operations
  5. Run customer support and announcements
  6. Operate admin workflows and infrastructure

How to use this guide

  • Start with user flows if you are launching customer operations
  • Use admin flows for internal team enablement
  • Share specific pages with the teams that own each workflow

What you can do with Helium

  • Run a full VPN SaaS business with user accounts, plans, payments, and support
  • Offer multiple payment methods and internal wallet usage
  • Control package access, node visibility, and policy by customer segment
  • Give different permissions to operations, support, and admin teams

What you cannot do with this guide

  • It does not replace security policy or compliance review for your organization
  • It does not document source-level implementation details
  • It does not include custom integration code for your internal systems

User Onboarding and Account Security

This flow explains how end users create accounts, sign in, recover access, and keep accounts secure.

What customers can do

  • Register with email and password
  • Register or sign in with supported OAuth providers
  • Enable multi-factor authentication (MFA)
  • Reset password by email link
  • Keep sessions active across devices, then log out when needed

Registration flow

flowchart TD
    start[UserStartsRegistration] --> sendEmail[RequestVerificationEmail]
    sendEmail --> emailLink[UserClicksMagicLink]
    emailLink --> setPassword[SetPasswordAndCreateAccount]
    setPassword --> autoLogin{AutoLoginEnabled}
    autoLogin -->|Yes| activeSession[SignedInSession]
    autoLogin -->|No| loginPage[GoToLogin]
    activeSession --> mfaSetup[OptionalMfaSetup]
    loginPage --> manualLogin[EmailOrOAuthLogin]

Registration

  1. User submits email (and optional referral code).
  2. System sends a time-limited link to that email.
  3. User opens the link, sets password, and completes account creation.
  4. If auto-login is enabled, the user is signed in immediately with a live session.

Sign-in options

  • Email + password
  • OAuth provider login
  • MFA challenge when enabled

For enterprise UX, provide clear fallbacks: “Try another login method” and “Reset password.”

Password reset

  1. User requests reset email.
  2. User opens time-limited link.
  3. User sets new password.
  4. Existing sessions are invalidated for security.

This is expected behavior and should be explained on the reset success screen.
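The invalidate-on-reset rule can be sketched with a simple in-memory session store. This is illustrative only: Helium's real session storage is Redis-backed, and these type and method names are hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory session store, keyed by user id.
/// Illustrates the rule: a successful password reset revokes every live session.
#[derive(Default)]
struct SessionStore {
    sessions: HashMap<u64, Vec<String>>, // user id -> session tokens
}

impl SessionStore {
    fn add(&mut self, user: u64, token: &str) {
        self.sessions.entry(user).or_default().push(token.to_string());
    }

    fn active_count(&self, user: u64) -> usize {
        self.sessions.get(&user).map_or(0, |s| s.len())
    }

    /// Called after a successful password reset.
    fn invalidate_all(&mut self, user: u64) {
        self.sessions.remove(&user);
    }
}

fn main() {
    let mut store = SessionStore::default();
    store.add(42, "phone-session");
    store.add(42, "laptop-session");
    assert_eq!(store.active_count(42), 2);

    // Password reset completes: all existing sessions are revoked.
    store.invalidate_all(42);
    assert_eq!(store.active_count(42), 0);
}
```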

Session behavior

  • Access remains active through access and refresh token lifecycle.
  • Users can refresh sessions and continue without full re-login until refresh lifetime ends.
  • Security-sensitive changes can require re-verification.

What customers cannot do

  • Use expired or already-used email links
  • Keep old sessions alive after a successful password reset
  • Remove their last remaining login method (at least one method must remain)
  • Bypass MFA once it is required for their account actions

Purchasing a Plan

This flow describes how customers browse plans, create orders, pay, and receive service.

What customers can do

  • View plans available to their account tier
  • Apply valid coupons before checkout
  • Pay by supported gateway methods or account wallet
  • Cancel unpaid orders
  • Track order status until service delivery

End-to-end purchase flow

flowchart TD
    browse[BrowsePlans] --> coupon[OptionalApplyCoupon]
    coupon --> create[CreateOrder]
    create --> payMethod{ChoosePaymentMethod}
    payMethod -->|Gateway| gatewayPay[ExternalGatewayPayment]
    payMethod -->|Wallet| walletPay[WalletPayment]
    gatewayPay --> paid[OrderMarkedPaid]
    walletPay --> paid
    paid --> delivery[PackageDelivery]
    delivery --> active[ServiceBecomesActive]

Plan visibility

Customers only see plans that match their current eligibility (group access and sale status).
This allows the same platform to run multiple customer segments.

Coupon usage

  • Coupon is checked when user previews it.
  • Coupon is checked again at order creation.
  • Final payable amount is locked at order creation.

If a coupon becomes invalid before order creation completes, checkout should fail with a clear message.
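The same validation runs at preview and again at order creation, which the following sketch makes concrete. The field names and error variants are assumptions for illustration, not Helium's actual schema.

```rust
/// Hypothetical coupon model; fields mirror the constraints described above.
struct Coupon {
    valid_from: u64,       // unix seconds
    valid_until: u64,
    uses_left: u32,
    min_amount_cents: u64,
    percent_off: u8,
}

#[derive(Debug, PartialEq)]
enum CouponError { Expired, Exhausted, BelowMinimum }

/// The same check runs twice: once at preview, once at order creation.
/// On success it returns the payable amount that gets locked into the order.
fn check_coupon(c: &Coupon, now: u64, amount_cents: u64) -> Result<u64, CouponError> {
    if now < c.valid_from || now > c.valid_until {
        return Err(CouponError::Expired);
    }
    if c.uses_left == 0 {
        return Err(CouponError::Exhausted);
    }
    if amount_cents < c.min_amount_cents {
        return Err(CouponError::BelowMinimum);
    }
    Ok(amount_cents - amount_cents * c.percent_off as u64 / 100)
}

fn main() {
    let c = Coupon { valid_from: 0, valid_until: 100, uses_left: 1,
                     min_amount_cents: 500, percent_off: 20 };
    assert_eq!(check_coupon(&c, 50, 1000), Ok(800));          // 20% off 10.00
    assert_eq!(check_coupon(&c, 200, 1000), Err(CouponError::Expired));
    assert_eq!(check_coupon(&c, 50, 100), Err(CouponError::BelowMinimum));
}
```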

Payment paths

Gateway payment

  • Customer is redirected to a payment page.
  • System confirms callback from provider.
  • Order becomes paid after verification.

Wallet payment

  • Uses available wallet balance only.
  • Payment and order update happen atomically.
  • Best UX for repeat customers with balance.

Order lifecycle

  • Unpaid: waiting for payment
  • Paid: payment confirmed, waiting for service delivery
  • Delivered: service package assigned
  • Cancelled: unpaid order cancelled by user, admin, or timeout

Operational notes

  • Unpaid orders are auto-cancelled after configured timeout.
  • There is a limit on how many unpaid orders one user can hold.
  • After successful payment, frontend should poll order status until delivered.
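The post-payment polling note above can be sketched as a small retry loop. The function and backoff values are illustrative; a real frontend would call the order-status API and wait seconds between attempts.

```rust
use std::{thread, time::Duration};

#[derive(Clone, Copy, PartialEq, Debug)]
enum OrderStatus { Unpaid, Paid, Delivered, Cancelled }

/// Poll an order-status source until it reaches a terminal state, up to
/// `max_tries`. `fetch` stands in for a real API call.
fn poll_until_delivered<F>(mut fetch: F, max_tries: u32) -> Option<OrderStatus>
where F: FnMut() -> OrderStatus {
    for _ in 0..max_tries {
        let status = fetch();
        match status {
            OrderStatus::Delivered | OrderStatus::Cancelled => return Some(status),
            _ => thread::sleep(Duration::from_millis(10)), // real clients wait seconds
        }
    }
    None // still pending: surface a "delivery in progress" message to the customer
}

fn main() {
    // Simulate a backend that finishes delivery on the third poll.
    let mut calls = 0;
    let result = poll_until_delivered(|| {
        calls += 1;
        if calls >= 3 { OrderStatus::Delivered } else { OrderStatus::Paid }
    }, 10);
    assert_eq!(result, Some(OrderStatus::Delivered));
}
```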

What customers cannot do

  • Use coupons outside validity rules (time window, limits, minimum amount)
  • Pay with wallet if available balance is insufficient
  • Recover a cancelled order (must create a new order)
  • Force immediate delivery if backend delivery queue is delayed

Using the VPN Service

This flow explains how customers receive and use VPN access after purchase.

What customers can do

  • Get personal subscription links
  • Import configs into supported proxy clients
  • Use nodes allowed by their active package
  • Queue future packages for automatic activation
  • Monitor usage and package consumption status

Access flow

flowchart TD
    paidOrder[OrderPaid] --> queue[PackageAddedToQueue]
    queue --> activate{HasActivePackage}
    activate -->|No| firstActive[ActivateFirstPackage]
    activate -->|Yes| stayQueued[RemainInQueue]
    firstActive --> nodeAccess[NodeAccessEnabled]
    stayQueued --> laterActivate[AutoActivateWhenCurrentConsumed]
    nodeAccess --> subscribe[GenerateSubscribeConfig]

Each user has a unique subscription token used to fetch client configuration from a subscribe URL.

  • URL can be opened directly in compatible clients
  • Client type can be auto-detected or user-selected
  • Generated config is filtered by user package permissions
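A subscribe URL built from the per-user token might look like the sketch below. The path layout and `client` query parameter are assumptions for illustration, not Helium's actual route.

```rust
/// Hypothetical subscribe-URL builder. When a client kind is given it
/// overrides server-side auto-detection; otherwise the server detects the
/// client from the request's User-Agent.
fn subscribe_url(base: &str, token: &str, client: Option<&str>) -> String {
    match client {
        Some(kind) => format!("{base}/subscribe/{token}?client={kind}"),
        None => format!("{base}/subscribe/{token}"),
    }
}

fn main() {
    assert_eq!(
        subscribe_url("https://vpn.example.com", "tok123", Some("clash")),
        "https://vpn.example.com/subscribe/tok123?client=clash"
    );
    assert_eq!(
        subscribe_url("https://vpn.example.com", "tok123", None),
        "https://vpn.example.com/subscribe/tok123"
    );
}
```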

Supported client families

  • Clash ecosystem
  • V2Ray-compatible clients
  • Sing-box ecosystem
  • Rule-based clients such as QuantumultX, Loon, Surfboard, and Surge

Package lifecycle

  • InQueue: purchased, waiting
  • Active: currently providing service
  • Consumed: completed by time or traffic usage
  • Cancelled: removed due to cancellation/refund action

Only one package is active per user at a time. Additional packages wait and activate automatically in order.
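The one-active-package rule can be expressed as a small state transition. This is a sketch of the rule only; Helium enforces it in the backend, and the names here are illustrative.

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
enum PackageState { InQueue, Active, Consumed, Cancelled }

/// Activate the next queued package, enforcing "one active package per user".
fn activate_next(packages: &mut [PackageState]) {
    if packages.iter().any(|p| *p == PackageState::Active) {
        return; // something is already active; queued packages keep waiting
    }
    if let Some(next) = packages.iter_mut().find(|p| **p == PackageState::InQueue) {
        *next = PackageState::Active; // first queued package becomes active, in order
    }
}

fn main() {
    let mut q = vec![PackageState::Consumed, PackageState::InQueue, PackageState::InQueue];
    activate_next(&mut q); // first queued package becomes active
    assert_eq!(q[1], PackageState::Active);
    activate_next(&mut q); // no-op: one package is already active
    assert_eq!(q[2], PackageState::InQueue);
}
```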

Node access rules

Active package policy decides which nodes are visible to the customer.
If no package is active, node list and effective service access are empty.

What customers cannot do

  • Use nodes outside their active package group permissions
  • Activate multiple packages at the same time on one account
  • Use subscription token from another account
  • Keep access after package is fully consumed without another queued package

Wallet and Payments

This flow explains how customers use internal wallet balance, gift cards, and payment history.

What customers can do

  • View available and frozen balances
  • Redeem valid gift cards into wallet
  • Pay eligible orders from available balance
  • View balance change history for audit and support

Balance model

  • Available balance: spendable for order payment
  • Frozen balance: temporarily locked and not spendable

Both are shown as part of one wallet account per user.
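The two-balance model can be sketched as follows. The struct and methods are hypothetical; in Helium these operations run transactionally against the database.

```rust
/// Hypothetical wallet with the two balances described above (amounts in cents).
struct Wallet { available: u64, frozen: u64 }

impl Wallet {
    /// Spending draws from available balance only; frozen funds never qualify.
    fn pay(&mut self, amount: u64) -> Result<(), &'static str> {
        if amount > self.available { return Err("insufficient available balance"); }
        self.available -= amount;
        Ok(())
    }

    /// Move spendable funds into hold (e.g. pending a dispute).
    fn freeze(&mut self, amount: u64) -> Result<(), &'static str> {
        if amount > self.available { return Err("insufficient available balance"); }
        self.available -= amount;
        self.frozen += amount;
        Ok(())
    }
}

fn main() {
    let mut w = Wallet { available: 1000, frozen: 0 };
    w.freeze(600).unwrap();
    assert_eq!(w.frozen, 600);
    assert!(w.pay(500).is_err()); // only 400 is spendable; frozen funds don't count
    assert!(w.pay(400).is_ok());
}
```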

Gift card redemption

  1. User enters gift card code.
  2. System validates card status (exists, unused, unexpired).
  3. Value is deposited into available balance.
  4. Card is marked redeemed and cannot be reused.
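The four redemption steps above reduce to a small state machine. This is a sketch under assumed names; Helium performs the deposit and state change atomically.

```rust
#[derive(PartialEq, Debug)]
enum CardState { Unused, Redeemed, Expired }

/// Redeem a gift card into an available balance.
fn redeem(state: &mut CardState, value: u64, balance: &mut u64) -> Result<(), &'static str> {
    match state {
        CardState::Redeemed => Err("card already used"),
        CardState::Expired => Err("card expired"),
        CardState::Unused => {
            *balance += value;            // step 3: deposit into available balance
            *state = CardState::Redeemed; // step 4: single-use, cannot be reused
            Ok(())
        }
    }
}

fn main() {
    let mut state = CardState::Unused;
    let mut balance = 0;
    assert!(redeem(&mut state, 500, &mut balance).is_ok());
    assert_eq!(balance, 500);
    assert!(redeem(&mut state, 500, &mut balance).is_err()); // second redemption fails
}
```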

Paying orders with wallet

  1. User selects wallet payment.
  2. System checks available balance.
  3. Amount is deducted and order moves to paid status in one transaction.
  4. Payment record appears in balance history.

Balance history

History is used for:

  • customer transparency
  • support troubleshooting
  • finance reconciliation

Each record contains amount change, change type, reason, and timestamp.

What customers cannot do

  • Spend frozen balance
  • Redeem an expired or already-used gift card
  • Complete a wallet payment when available balance does not cover the full amount (no partial payment)
  • Directly edit or delete wallet history records

Referral Program

This flow explains how customers invite others and earn referral rewards.

What customers can do

  • Create invite codes (within configured limits)
  • Share invite codes with prospective users
  • Earn rewards when referred users complete qualifying purchases
  • Track referral performance and withdraw available rewards to wallet

Referral flow

flowchart TD
    code[CreateInviteCode] --> share[ShareCode]
    share --> signup[NewUserRegistersWithCode]
    signup --> purchase[ReferredUserCompletesQualifyingPayment]
    purchase --> reward[CommissionCalculated]
    reward --> stats[AffiliateStatsUpdated]
    stats --> withdraw[WithdrawToWallet]

Reward principles

  • Reward is based on a configured commission rate.
  • Each referred user can trigger rewards only up to a configured number of times.
  • Rewards are tracked as available amount until withdrawn.
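The reward principles above amount to a rate applied to the payment, gated by a per-referee trigger cap. The configuration names below are illustrative assumptions.

```rust
/// Commission sketch: rate and per-referee cap are configuration values.
struct AffiliateConfig { rate_percent: u64, max_rewards_per_referee: u32 }

/// Compute the commission (in cents) for a qualifying payment, or 0 if this
/// referred user has already triggered the maximum number of rewards.
fn commission(cfg: &AffiliateConfig, payment_cents: u64, prior_rewards: u32) -> u64 {
    if prior_rewards >= cfg.max_rewards_per_referee {
        return 0;
    }
    payment_cents * cfg.rate_percent / 100
}

fn main() {
    let cfg = AffiliateConfig { rate_percent: 10, max_rewards_per_referee: 3 };
    assert_eq!(commission(&cfg, 2000, 0), 200); // 10% of a 20.00 payment
    assert_eq!(commission(&cfg, 2000, 3), 0);   // cap reached: no further reward
}
```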

Withdrawal

When user withdraws referral rewards:

  1. System checks withdrawable amount.
  2. Referral stats are updated.
  3. Wallet available balance is credited.

This should appear in both referral records and wallet history.

What customers cannot do

  • Create unlimited invite codes
  • Earn commission from non-qualifying payments (for example, restricted methods per policy)
  • Withdraw more than currently available reward
  • Change historical referral relation records

Support and Notifications

This flow explains how customers communicate with support teams and receive platform announcements.

What customers can do

  • Open support tickets with priority and description
  • Continue two-way conversations with support staff
  • Track ticket status changes
  • Read targeted announcements
  • Configure notification preferences (such as email categories)

Ticket lifecycle

flowchart TD
    open[UserCreatesTicket] --> waitingAdmin[StatusOpen]
    waitingAdmin --> adminReply[AdminReplies]
    adminReply --> waitingUser[StatusPending]
    waitingUser --> userReply[UserReplies]
    userReply --> waitingAdmin
    waitingAdmin --> resolved[AdminMarksResolved]
    resolved --> closed[AdminOrUserClosesTicket]

Ticket handling rules

  • Ticket belongs to the user who opened it.
  • Both sides can communicate while ticket is open or pending.
  • Closed tickets are archived for workflow completion.

Announcements

Announcements are broadcast messages targeted by user segment and priority.

  • Pinned announcements are shown first.
  • Priority helps users identify urgency.
  • Users only see announcements for groups they belong to.

Notification preferences

Users can control notification settings for available categories (for example: security, marketing, service reminders) based on your enterprise policy.

What customers cannot do

  • Access tickets that belong to another user
  • Continue sending messages on closed tickets
  • See announcements outside their targeting scope
  • Edit admin-authored support messages

Admin Getting Started

This flow explains how enterprises set up administrative access and role boundaries.

What admins can do

  • Onboard administrators through invitation flow
  • Authenticate with admin credentials and access tokens
  • Assign role-based permissions
  • Rotate and manage access credentials
  • Audit administrative actions

Admin onboarding flow

flowchart TD
    invite[CreateAdminInvitation] --> accept[AdminAcceptsInvitation]
    accept --> register[AdminCompletesRegistration]
    register --> issueKey[CredentialIssued]
    issueKey --> login[AdminLogin]
    login --> access[UseRoleBasedAdminFunctions]
Role | Typical responsibility | Write capability
SuperAdmin | Platform ownership and high-risk changes | Full
Moderator | Daily operations and catalog management | Broad
CustomerSupport | Customer issue handling | Limited
SupportBot | Automated read-oriented workflows | Minimal/None

Enterprises should map these roles to internal SOPs before production launch.
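One way to encode these role tiers when building an internal tool is an ordered capability level. The enums and gate below are a sketch using the names from the table, not Helium's actual permission model.

```rust
/// Write capability tiers, ordered weakest to strongest.
/// `PartialOrd` on an enum compares by declaration order.
#[derive(PartialEq, PartialOrd)]
enum WriteLevel { None, Minimal, Limited, Broad, Full }

enum Role { SuperAdmin, Moderator, CustomerSupport, SupportBot }

fn write_level(role: &Role) -> WriteLevel {
    match role {
        Role::SuperAdmin => WriteLevel::Full,
        Role::Moderator => WriteLevel::Broad,
        Role::CustomerSupport => WriteLevel::Limited,
        Role::SupportBot => WriteLevel::Minimal,
    }
}

/// Gate a high-risk operation on the caller's write capability.
fn can_change_platform_settings(role: &Role) -> bool {
    write_level(role) >= WriteLevel::Full
}

fn main() {
    assert!(can_change_platform_settings(&Role::SuperAdmin));
    assert!(!can_change_platform_settings(&Role::CustomerSupport));
}
```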

Good operational practices

  • Use separate admin accounts per human operator.
  • Rotate credentials on a fixed schedule.
  • Keep support and platform ownership roles separate.
  • Review audit records regularly.

What admins cannot do

  • Use invitation links after they are consumed or expired
  • Exceed role permissions assigned by policy
  • Safely share one admin credential across multiple operators
  • Skip audit and governance requirements for sensitive actions

Admin Product Management

This flow explains how enterprise teams manage plans, coupons, and gift-card operations.

What admins can do

  • Build and maintain saleable plan catalog
  • Update package offerings through controlled versioning
  • Manage coupon campaigns and constraints
  • Generate and distribute gift cards
  • Keep existing customer entitlements stable during catalog updates

Package and product principles

  • Products are what customers buy.
  • Packages define service entitlements (traffic, duration, access scope).
  • Package series allow new versions while preserving old purchased terms.

This protects existing subscribers from unexpected entitlement changes.

Product operation flow

flowchart TD
    design[DefineOffer] --> packageVersion[CreateOrUpdatePackageVersion]
    packageVersion --> bindProduct[BindProductToPackageSeries]
    bindProduct --> publish[PublishForSale]
    publish --> monitor[MonitorSalesAndUsage]
    monitor --> iterate[IterateNextVersion]

Coupon campaign management

Admins can configure:

  • discount type (percentage or fixed amount)
  • valid time window
  • per-user and global usage limits
  • active/inactive lifecycle

Use preview messaging in frontend so customers understand why a coupon fails.

Gift card operations

  • Bulk generation for campaigns
  • Special-code generation for branded promotions
  • Expiration control for liability management
  • Support lookup for troubleshooting redemption issues

What admins cannot do

  • Change already-delivered customer package terms retroactively
  • Reuse a currently active coupon code without first deactivating it
  • Re-redeem single-use gift cards
  • Treat product updates as immediate entitlement changes for historical orders

Admin Customer Operations

This flow explains daily customer-facing operations for support and operations teams.

What admins can do

  • Search and inspect customer account state
  • Ban or unban users according to policy
  • Help recover account access (including MFA recovery operations)
  • Review order and payment state for troubleshooting
  • Apply controlled balance adjustments with documented reasons
  • Perform manual order interventions where policy allows

Typical support workflow

flowchart TD
    issue[CustomerIssueReported] --> identify[FindUserAndValidateIdentity]
    identify --> diagnose[CheckAccountOrderWalletState]
    diagnose --> action{NeedAdminAction}
    action -->|No| guidance[ProvideCustomerGuidance]
    action -->|Yes| execute[ExecuteAuthorizedAdminOperation]
    execute --> audit[RecordAuditAndSupportNote]
    guidance --> close[CloseSupportCase]
    audit --> close

Core operations

Account controls

  • Ban/unban user based on abuse, compliance, or security policy
  • Remove blocked authentication factors during verified recovery

Wallet controls

  • Deposit: credit balance
  • Consume: deduct balance
  • Freeze: move spendable funds into hold
  • Unfreeze: release held funds

Every balance adjustment should include clear human-readable reason text.

Order controls

  • Check unpaid/paid/delivered status
  • Assist with payment confirmation disputes
  • Use manual paid marking only under approved internal process
  • Handle partial compensation using approved balance adjustment process

What admins cannot do

  • Perform actions outside assigned role permissions
  • Adjust funds without reason and audit trace
  • Access or modify customer credentials directly
  • Use manual interventions as a substitute for payment controls and reconciliation policy

Admin VPN Infrastructure Operations

This flow explains how infrastructure teams run node capacity, routing quality, and delivery readiness.

What admins can do

  • Register and maintain node servers
  • Configure node clients and protocol offerings
  • Control node availability and maintenance state
  • Monitor node health, usage, and quality trends
  • Observe package queue delivery and activation behavior

Infrastructure operation flow

flowchart TD
    onboard[OnboardNodeServer] --> config[ConfigureNodeClientProfiles]
    config --> publish[ExposeEligibleNodesByGroup]
    publish --> monitor[MonitorHealthAndTraffic]
    monitor --> maintain[MaintenanceOrCapacityAdjustment]
    maintain --> monitor

Node operations

  • Keep node metadata accurate (location, route class, capacity intent).
  • Validate protocol configuration before publishing.
  • Use maintenance states to protect customer experience during changes.

Access and delivery relationship

  • Customer package policy controls which nodes users can access.
  • Package queue health affects when customers become active on service.
  • Infrastructure and commercial operations must coordinate release windows.

Observability checklist

  • node online/offline trends
  • abnormal traffic spikes
  • package activation lag
  • failed delivery or queue backlog
  • regional quality degradation

What admins cannot do

  • Expose nodes to customers without matching package access scope
  • Ignore maintenance signaling during disruptive changes
  • Treat node-client configuration and package policy as independent concerns
  • Assume delivery is healthy without queue and activation monitoring

Microservices Architecture

Helium is built as a collection of focused microservices that cooperate through a shared set of contracts, messaging patterns, and observability tooling. This section introduces the high-level layout of the system, explains how the services interact, and highlights the infrastructure choices that enable the platform to scale for large commercial VPN deployments.

Architectural Goals

  • Independent scaling – Each service can be deployed and scaled based on its workload characteristics (API traffic, background jobs, email throughput, etc.).
  • Clear boundaries – Services expose well-defined APIs (gRPC, REST, AMQP events) and depend on shared libraries for cross-cutting concerns, ensuring that business logic remains isolated inside its module.
  • Operational resiliency – Stateless services, database connection pooling, and message queues allow resilient deployments with graceful failure handling.
  • Security by design – Rust, strict processor patterns, and zero shared mutable state within processes prevent memory safety issues and accidental privilege escalations.

Service Topology

The helium-server crate can run in multiple worker modes. Each mode is packaged into its own container image or deployment unit, providing a natural microservice boundary while reusing the same codebase and shared libraries.

Worker | Entry point | Responsibilities
grpc | GrpcWorker | Exposes gRPC APIs for all business domains (Auth, Manage, Telecom, Market, Shop, Support, etc.). Performs request validation, invokes the corresponding module services, and emits events.
subscribe_api | SubscribeApiWorker | Provides REST endpoints optimized for subscription clients. Primarily a read-heavy facade backed by Redis caching and the service layer.
webhook_api | WebHookApiWorker | Receives payment gateway callbacks and external partner webhooks, normalizes payloads, and dispatches workflow events.
consumer | ConsumerWorker | Listens on AMQP queues for asynchronous jobs (billing, node updates, provisioning) emitted by other services. Orchestrates long-running tasks that should not block API responses.
mailer | MailerWorker | Specialized consumer responsible for templated email delivery, retry management, and transactional messaging.
cron_executor | CronWorker | Periodically scans for scheduled work (subscription renewals, quota resets, health checks) and dispatches jobs via the same service layer used by the API workers.

These workers are deployed independently and scaled according to throughput requirements. For example, a busy billing period can scale the consumer and cron workers without affecting the gRPC API footprint.

Domain-Oriented Modules

Each domain (Auth, Manage, Telecom, Shop, Market, Notification, Support, Shield, Mailer) is implemented as an independent module under modules/. Modules follow a common layout (entities, services, rpc, hooks, events) as described in the Project Structure Guide. Within the microservices architecture:

  • Modules provide service layer processors that encapsulate business logic.
  • RPC layers expose the processors through gRPC servers. The GrpcWorker aggregates these services and mounts them behind a single TLS termination point, while keeping module ownership intact.
  • Hooks and events enable cross-module interactions without tight coupling, allowing, for instance, the Telecom module to emit usage events consumed by the billing logic in the Manage module.

Communication Patterns

Helium combines synchronous APIs with asynchronous messaging to balance latency and resiliency.

gRPC Contract

  • Tonic-generated servers provide strongly typed interfaces for customer-facing and operator APIs.
  • A uniform Processor trait ensures every RPC delegate is testable in isolation and can be reused by background workers.
  • Service discovery is handled at the infrastructure layer (Kubernetes or Docker Compose) because workers are stateless; clients load-balance using standard mechanisms (Envoy, NGINX, etc.).

REST Facades

  • Subscription and webhook workers expose lightweight REST routes via Axum.
  • REST APIs reuse the same service processors, ensuring identical business behavior across protocols and simplifying versioning.

Asynchronous Messaging

  • RabbitMQ (AMQP) is used to propagate domain events and dispatch background jobs.
  • Producers append metadata (correlation IDs, tenant identifiers) to support observability and reliable retries.
  • Consumers acknowledge messages only after successful processing, preventing data loss during failures.

Data Management

  • PostgreSQL is the system of record. SQLx is used through the DatabaseProcessor abstraction to keep SQL isolated inside entities/ modules and to support compile-time query checking.
  • Redis provides ephemeral caches, session storage, and rate limiting. The RedisConnection wrapper from helium-framework manages pooled connections shared by API and worker processes.
  • Consistent migrations live in the top-level migrations/ directory and are applied during deployment. Services run with zero shared mutable state; all coordination happens through the database or message queues.

Observability & Operations

  • Tracing is initialized in every worker with structured logs and span annotations. This enables distributed tracing across API and background workloads when combined with log collectors.
  • Metrics exporters (e.g., Prometheus integration) can be attached at the deployment layer because each worker exposes a predictable Axum/Tonic server endpoint.
  • Health probes: gRPC and REST workers perform dependency checks on startup (database, Redis, AMQP). Container orchestrators can use readiness/liveness probes to restart unhealthy instances.

Deployment Model

  • Workers are packaged as lightweight containers (<50MB RSS) and designed to be horizontally scalable. Scaling policies are set per worker depending on CPU or queue length metrics.
  • Configuration is provided through environment variables (DATABASE_URL, MQ_URL, REDIS_URL, WORK_MODE, etc.), making the platform 12-factor compliant.
  • Infrastructure typically consists of:
    • Kubernetes/Docker orchestrating the worker deployments
    • Managed PostgreSQL and Redis services
    • RabbitMQ cluster for messaging
    • Optional CDN or reverse proxy terminating TLS before forwarding requests to gRPC/REST workers.
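The environment-driven configuration above could be loaded as in the sketch below. The `Config` struct, the lookup indirection, and the default work mode are assumptions for illustration; the variable names come from the text.

```rust
/// Configuration sketch for the 12-factor variables named above.
struct Config { database_url: String, redis_url: String, work_mode: String }

/// Taking a lookup closure keeps the loader testable without touching the
/// real process environment. In production: `load_config(|k| std::env::var(k).ok())`.
fn load_config<F: Fn(&str) -> Option<String>>(get: F) -> Option<Config> {
    Some(Config {
        database_url: get("DATABASE_URL")?,
        redis_url: get("REDIS_URL")?,
        // WORK_MODE selects which worker this container runs as (grpc, consumer, ...).
        work_mode: get("WORK_MODE").unwrap_or_else(|| "grpc".to_string()),
    })
}

fn main() {
    let cfg = load_config(|k| match k {
        "DATABASE_URL" => Some("postgres://localhost/helium".to_string()),
        "REDIS_URL" => Some("redis://localhost".to_string()),
        _ => None,
    }).expect("required variables present");
    assert_eq!(cfg.work_mode, "grpc"); // falls back to the default when WORK_MODE is unset
    assert!(cfg.database_url.starts_with("postgres://"));
}
```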

Extensibility

Adding a new capability follows a repeatable pattern:

  1. Create or extend a module under modules/ with the Processor-based service implementation.
  2. Expose the functionality via RPC/REST by wiring the service into the relevant worker.
  3. Emit domain events or enqueue background jobs when work must be processed asynchronously.
  4. Deploy the updated worker image; other workers continue functioning without redeployment because contracts are versioned explicitly.

This approach keeps Helium maintainable while providing the flexibility to grow with complex VPN SaaS requirements.

Helium Project Structure Guide

This document describes the modular architecture and organization of the Helium VPN SaaS system.

Project Overview

Helium is a modern VPN SaaS system built with Rust, organized as a workspace with multiple modules. The system follows a modular architecture where each module represents a specific business domain.

Module Architecture

Each module follows a consistent internal structure with standardized components:

1. Entity Layer (entities/)

Purpose: Data models and database access patterns

Structure:

entities/
├── mod.rs                          # Module exports
├── db/                             # Database entity processors
│   ├── mod.rs
│   ├── user_account.rs             # User account queries/commands
│   └── ...
└── redis/                          # Redis entity processors
    ├── mod.rs
    ├── session.rs                  # Session cache operations
    └── ...

Key Patterns:

  • Implements Processor<Input, Result<Output, sqlx::Error>> for DatabaseProcessor
  • Contains all SQL queries and database operations
  • Separated by storage backend (db/ for PostgreSQL, redis/ for Redis)
  • No business logic - pure data access

2. Service Layer (services/)

Purpose: Business logic orchestration and workflows

Structure:

services/
├── mod.rs                          # Service exports
├── manage.rs                       # Management operations
├── user_account.rs                 # User profile management
└── ...

Key Patterns:

  • Implements Processor<Input, Result<Output, Error>> for service operations
  • Orchestrates multiple entity operations
  • Handles validation, transformation, and business rules
  • No direct SQL - delegates to entity processors
  • Uses DatabaseProcessor for data access

Example:

use kanau::processor::Processor;

#[derive(Clone)]
pub struct UserManageService {
    pub db: sqlx::PgPool,
}

impl Processor<ListUsersRequest, Result<ListUsersResponse, Error>> for UserManageService {
    async fn process(&self, input: ListUsersRequest) -> Result<ListUsersResponse, Error> {
        // No raw SQL here: the service delegates to an entity processor.
        let db = DatabaseProcessor::from_pool(self.db.clone());
        let users = db.process(ListUsers { ... }).await?;
        Ok(ListUsersResponse { users })
    }
}

3. gRPC Layer (rpc/)

Purpose: gRPC service implementations and external API

Structure:

rpc/
├── mod.rs                          # RPC exports
├── auth_service.rs                 # Authentication gRPC service
├── manage_service.rs               # Management gRPC service
├── middleware.rs                   # gRPC middleware
└── ...

Key Patterns:

  • Implements generated gRPC trait definitions
  • Converts protobuf messages to service DTOs
  • Delegates to service layer via Processor::process
  • Handles authentication and authorization

4. Hook System (hooks/)

Purpose: Event-driven side effects and integrations

Structure:

hooks/
├── mod.rs                          # Hook exports
├── billing.rs                      # Billing event hooks
├── register.rs                     # Registration hooks
└── ...

Key Patterns:

  • Responds to domain events
  • Handles cross-module integrations
  • Implements side effects (notifications, external API calls)
  • Decoupled from main business flows

5. Event System (events/)

Purpose: Domain event definitions and publishing

Structure:

events/
├── mod.rs                          # Event exports
├── user.rs                         # User-related events
├── order.rs                        # Order events
└── ...

Key Patterns:

  • Defines domain events using message queue integration
  • Publishes events for cross-module communication
  • Enables audit trails and analytics
  • Supports eventual consistency patterns

6. API Layer (api/)

Purpose: REST API endpoints and HTTP handlers

Structure:

api/
├── mod.rs                          # API exports
├── subscribe.rs                    # Subscription endpoints
└── xrayr/                          # XrayR integration APIs
    ├── mod.rs
    └── ...

Key Patterns:

  • Implements REST endpoints using Axum
  • Handles HTTP-specific concerns (parsing, serialization)
  • Delegates to service layer
  • Provides alternative to gRPC for specific use cases

7. Cron Jobs (cron.rs)

Purpose: Scheduled tasks and background jobs

Key Patterns:

  • Implements periodic maintenance tasks
  • Handles cleanup operations
  • Manages recurring billing cycles
  • Executes system health checks

8. Testing (tests/)

Purpose: Integration and unit tests

Structure:

tests/
├── common/                         # Test utilities
│   └── mod.rs                      # Common test setup
├── service_name_test.rs            # Service integration tests
└── ...

Key Patterns:

  • Integration tests for complete workflows
  • Uses testcontainers for database testing
  • Isolated test environments
  • Comprehensive service testing

Module Configuration

Dependencies (Cargo.toml)

Each module declares:

  • Workspace dependencies (shared versions)
  • Inter-module dependencies
  • Module-specific dependencies
  • Dev dependencies for testing
  • Build dependencies (typically tonic-prost-build for gRPC)

Build Configuration (build.rs)

Most modules include build scripts for:

  • gRPC code generation from proto files
  • Custom compilation steps
  • Environment-specific builds

Module Entry Point (lib.rs)

Standard module structure:

#![forbid(clippy::unwrap_used)]
#![forbid(unsafe_code)]
#![deny(clippy::expect_used)]
#![deny(clippy::panic)]

pub mod config;
pub mod cron;
pub mod entities;
pub mod events;
pub mod hooks;      // Optional
pub mod api;        // Optional
pub mod rpc;
pub mod services;

Protocol Buffers (proto/)

Organization: Protos are grouped by module with consistent naming:

proto/
├── auth/
│   ├── auth.proto                  # Core auth services
│   ├── account.proto               # Account management
│   └── manage.proto                # Admin operations
├── telecom/
│   ├── telecom.proto               # VPN services
│   └── manage.proto                # Telecom management
└── ...

Patterns:

  • Service definitions mirror module structure
  • Consistent message naming conventions
  • Shared types in common proto files

Key Architectural Principles

1. Processor Pattern

All APIs use the kanau::processor::Processor trait for consistent interfaces and composability.
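kanau's actual trait definition is not reproduced here; as a rough illustration of the idea, a processor maps one request type to one output type behind a uniform interface, so handlers at every layer compose the same way:

```rust
// Rough shape of a request/response processor (the real
// kanau::processor::Processor trait may differ in detail).
pub trait Processor<Req> {
    type Output;
    fn process(&self, req: Req) -> Self::Output;
}

// Any handler implementing the same interface can be composed,
// wrapped, or swapped in tests. EchoService is a toy example.
pub struct EchoService;

impl Processor<String> for EchoService {
    type Output = String;
    fn process(&self, req: String) -> String {
        req
    }
}
```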

2. Separation of Concerns

  • Entities: Data access only
  • Services: Business logic only
  • RPC/API: Protocol handling only
  • Events/Hooks: Side effects only

3. Database Abstraction

Services never contain raw SQL - all database access goes through entity processors.
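A minimal sketch of this layering (all names hypothetical; the entity stubs out the query that a real sqlx call would perform):

```rust
// Entity layer: the only place that would contain SQL.
pub struct FindUserById;

impl FindUserById {
    pub fn execute(&self, id: u64) -> Option<String> {
        // Real code would run something like:
        //   sqlx::query!("SELECT name FROM users WHERE id = $1", id)
        if id == 1 { Some("alice".to_string()) } else { None }
    }
}

// Service layer: business rules only, delegating all data access.
pub struct GreetUser {
    pub finder: FindUserById,
}

impl GreetUser {
    pub fn greet(&self, id: u64) -> String {
        match self.finder.execute(id) {
            Some(name) => format!("Hello, {name}"),
            None => "unknown user".to_string(),
        }
    }
}
```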

4. Event-Driven Architecture

Modules communicate via events to maintain loose coupling.

5. RBAC and Audit

Administrative operations implement consistent role-based access control and audit logging.

Development Guidelines

Adding a New Module

  1. Create module directory under modules/
  2. Add basic Cargo.toml with workspace dependencies
  3. Create src/lib.rs with standard module structure
  4. Add module to workspace Cargo.toml
  5. Create proto definitions if gRPC services needed
  6. Implement entities → services → rpc layers in order

Testing Strategy

  1. Unit tests for complex business logic in services
  2. Integration tests in tests/ directory
  3. Use testcontainer-helium-modules for database tests
  4. Mock external dependencies
  5. Test error handling paths

Documentation Standards

  • Document all public APIs
  • Include examples for complex workflows
  • Maintain this guide as modules evolve
  • Document breaking changes in module changelogs

This modular architecture enables independent development, testing, and deployment of features while maintaining system coherence through standardized patterns and interfaces.

helium-server Crate

The Helium server is designed as a multi-mode worker system that can run different components independently or together, enabling flexible deployment strategies. Each worker mode serves a specific purpose in the overall system architecture.

Architecture

Worker Modes

The server supports six distinct worker modes:

Worker Mode     Port    Description                  Use Case
grpc            50051   gRPC API server              Main API for client applications and admin panels
subscribe_api   8080    RESTful subscription API     Public subscription endpoints
webhook_api     8081    RESTful webhook handler      Payment provider callbacks, third-party integrations
consumer        -       Background message consumer  Processing async tasks from the message queue
mailer          -       Email service worker         Sending emails and notifications
cron_executor   -       Scheduled task executor      Running periodic maintenance tasks

Dependencies

The server requires three core infrastructure components:

  • PostgreSQL: Primary database for persistent data
  • Redis: Caching, session storage, and temporary data
  • RabbitMQ (AMQP): Message queuing for async processing

Module Integration

The server integrates all business logic modules:

  • auth: Authentication and authorization
  • shop: E-commerce and billing
  • telecom: VPN node management and traffic handling
  • market: Affiliate and marketing systems
  • notification: Announcements and messaging
  • support: Customer support tickets
  • manage: Administrative functions
  • shield: Security and anti-abuse measures

Deployment Guide

Prerequisites

  • PostgreSQL, Redis, and RabbitMQ servers accessible
  • SQLx CLI: cargo install sqlx-cli --no-default-features --features postgres
  • Environment variables configured (see below)

Environment Configuration

The server is configured entirely through environment variables:

Required Variables

# Worker mode selection
WORK_MODE="grpc"  # or subscribe_api, webhook_api, consumer, mailer, cron_executor

# Database connection
DATABASE_URL="postgres://user:password@localhost/helium_db"

# Message queue connection
MQ_URL="amqp://user:password@localhost:5672/"

# Redis connection
REDIS_URL="redis://localhost:6379"

Optional Variables

# Server listen address (for API workers)
LISTEN_ADDR="0.0.0.0:50051"  # grpc mode default
LISTEN_ADDR="0.0.0.0:8080"   # subscribe_api mode default
LISTEN_ADDR="0.0.0.0:8081"   # webhook_api mode default

# Cron executor scan interval (seconds)
SCAN_INTERVAL="60"  # cron_executor mode only

# OpenTelemetry Collector endpoint (optional, for observability)
OTEL_COLLECTOR="http://otel-collector:4317"  # See Observability guide

Note: For comprehensive observability with distributed tracing and metrics, see the Observability with OpenTelemetry guide.
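The required-variable handling can be sketched like this (illustrative only; the server's real parsing lives in its worker configuration code and may differ):

```rust
use std::env;

// Check a WORK_MODE value against the six supported modes.
pub fn validate_work_mode(mode: &str) -> Result<(), String> {
    const MODES: [&str; 6] = [
        "grpc", "subscribe_api", "webhook_api",
        "consumer", "mailer", "cron_executor",
    ];
    if MODES.contains(&mode) {
        Ok(())
    } else {
        Err(format!("unknown WORK_MODE: {mode}"))
    }
}

// Fetch a required environment variable with a readable error.
pub fn require(var: &str) -> Result<String, String> {
    env::var(var).map_err(|_| format!("{var} not set"))
}
```

Failing fast on a missing DATABASE_URL, MQ_URL, or REDIS_URL is preferable to a partially started worker.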

Database Migration

⚠️ CRITICAL: Database migrations must be run before starting the application.

# Install SQLx CLI
cargo install sqlx-cli --no-default-features --features postgres

# Apply all pending migrations
sqlx migrate run --database-url $DATABASE_URL

# Verify migration status
sqlx migrate info --database-url $DATABASE_URL

Basic Deployment

Running the Server

# Apply database migrations first
sqlx migrate run --database-url $DATABASE_URL

# Start the server with desired worker mode
WORK_MODE=grpc ./helium-server

Multiple Worker Deployment

For production, run different worker modes as separate processes:

# Terminal 1: Main gRPC API
WORK_MODE=grpc ./helium-server

# Terminal 2: Background consumer
WORK_MODE=consumer ./helium-server

# Terminal 3: Email worker
WORK_MODE=mailer ./helium-server

# Terminal 4: Cron jobs
WORK_MODE=cron_executor ./helium-server

Logging

The server uses structured logging:

# Enable debug logging
RUST_LOG=debug ./helium-server

# Production logging (default)
RUST_LOG=info ./helium-server

Developer Guide

Project Structure

server/
├── Cargo.toml              # Dependencies and metadata
├── src/
│   ├── main.rs             # Entry point and startup logic
│   ├── worker/             # Worker mode implementations
│   │   ├── mod.rs          # Worker configuration and dispatch
│   │   ├── grpc.rs         # gRPC server implementation
│   │   ├── consumer.rs     # Background message consumer
│   │   ├── mailer.rs       # Email service worker
│   │   ├── cron_executor.rs # Scheduled task executor
│   │   ├── subscribe_api.rs # Subscription REST API
│   │   └── webhook_api.rs  # Webhook REST API
│   └── hooks/              # Extension points (currently unused)
│       └── mod.rs

Building from Source

# Development build
cd server
cargo build

# Release build (optimized)
cargo build --release

# Run with specific worker mode
WORK_MODE=grpc cargo run

Adding New Worker Modes

  1. Create worker implementation:
// src/worker/new_worker.rs
pub struct NewWorker {
    // worker fields
}

impl NewWorker {
    pub async fn initialize(args: YourArgs) -> anyhow::Result<Self> {
        // initialization logic
    }

    pub async fn run(&self) -> anyhow::Result<()> {
        // worker main loop
    }
}
  2. Add to worker configuration:
// src/worker/mod.rs
pub enum WorkerArgs {
    // existing variants...
    NewWorker(YourArgs),
}

impl WorkerArgs {
    pub fn load_from_env() -> anyhow::Result<Self> {
        let work_mode = std::env::var("WORK_MODE")?;
        match work_mode.as_str() {
            // existing modes...
            "new_worker" => {
                // parse environment variables
                Ok(WorkerArgs::NewWorker(args))
            }
        }
    }

    pub async fn execute_worker(self) -> anyhow::Result<()> {
        match self {
            // existing modes...
            WorkerArgs::NewWorker(args) => {
                let worker = NewWorker::initialize(args).await?;
                worker.run().await
            }
        }
    }
}

gRPC Service Development

The gRPC worker automatically integrates all modules. To add new services:

  1. Implement your service in the appropriate module (e.g., modules/your_module/)

  2. Add to gRPC worker:

// src/worker/grpc.rs
impl GrpcWorker {
    pub async fn initialize(args: GrpcWorkModeArgs) -> Result<Self, anyhow::Error> {
        // ... existing initialization ...

        let your_service = YourService::new(database_processor.clone());

        Ok(Self {
            // ... existing fields ...
            your_service,
        })
    }

    pub fn server_ready(self) -> Router<...> {
        tonic::transport::server::Server::builder()
            // ... existing services ...
            .add_service(YourServiceServer::new(self.your_service))
    }
}

Database Migrations

Database schema is managed through SQLx migrations in the migrations/ directory. When adding new features:

  1. Create migration files:
# Create new migration
sqlx migrate add your_feature_name

# This creates:
# migrations/TIMESTAMP_your_feature_name.up.sql
# migrations/TIMESTAMP_your_feature_name.down.sql
  2. Run migrations:
# Apply migrations
sqlx migrate run --database-url $DATABASE_URL

# Revert last migration
sqlx migrate revert --database-url $DATABASE_URL

Testing

# Run all tests
cargo test

# Run specific module tests
cargo test --package helium-server

# Integration tests with database
DATABASE_URL=postgres://user:password@localhost/helium_test cargo test

Performance Considerations

  • Memory Usage: Each worker typically uses 40-200MB RAM
  • CPU Efficiency: Single-core performance optimized, can handle 1000+ RPS
  • Connection Pooling: Database connections are shared across services
  • Async Processing: All I/O operations are non-blocking

Troubleshooting

Common Issues

Service won’t start:

# Check environment variables
env | grep -E "(DATABASE_URL|MQ_URL|REDIS_URL|WORK_MODE)"

# Verify database migrations are applied
sqlx migrate info --database-url $DATABASE_URL

Database migration issues:

# Check migration status
sqlx migrate info --database-url $DATABASE_URL

# Force apply migrations (if stuck)
sqlx migrate run --database-url $DATABASE_URL

# Revert last migration if needed
sqlx migrate revert --database-url $DATABASE_URL

# Reset database (CAUTION: destroys all data)
sqlx database reset --database-url $DATABASE_URL

Performance issues:

# Enable request tracing
RUST_LOG=helium_server=trace ./helium-server

# Profile with flamegraph
cargo flamegraph --bin helium-server

Logs and Debugging

# Debug logging
RUST_LOG=debug ./helium-server

# Trace specific modules
RUST_LOG=helium_server::worker::grpc=trace,info ./helium-server

Configuration Validation

Ensure all required environment variables are properly set:

# Validate configuration script
#!/bin/bash
set -e

echo "Validating Helium server configuration..."

# Check required variables
: "${WORK_MODE:?WORK_MODE not set}"
: "${DATABASE_URL:?DATABASE_URL not set}"
: "${MQ_URL:?MQ_URL not set}"
: "${REDIS_URL:?REDIS_URL not set}"

# Validate work mode
case "$WORK_MODE" in
  grpc|subscribe_api|webhook_api|consumer|mailer|cron_executor)
    echo "✓ Valid WORK_MODE: $WORK_MODE"
    ;;
  *)
    echo "✗ Invalid WORK_MODE: $WORK_MODE"
    exit 1
    ;;
esac

# Check if migrations are applied
if command -v sqlx >/dev/null 2>&1; then
  if sqlx migrate info --database-url "$DATABASE_URL" | grep -q "pending"; then
    echo "⚠ Warning: Pending database migrations found"
    echo "Run: sqlx migrate run --database-url $DATABASE_URL"
  else
    echo "✓ Database migrations are up to date"
  fi
else
  echo "⚠ Warning: sqlx CLI not found - cannot verify migrations"
  echo "Install with: cargo install sqlx-cli --no-default-features --features postgres"
fi

echo "Configuration validation complete!"

External Dependencies

The Helium system requires several external services to function properly. The Helium application itself runs in Docker containers, but the core infrastructure dependencies (PostgreSQL, Redis, RabbitMQ) should be provisioned as external managed services for production deployments.

While some dependencies are core infrastructure requirements, others are module-specific and may be optional depending on your deployment configuration.

Core Infrastructure Dependencies

These dependencies are required for all Helium deployments:

1. PostgreSQL Database

Purpose: Primary data store for all application data
Version: PostgreSQL 12+ recommended
Configuration:

  • Environment variable: DATABASE_URL
  • Format: postgres://user:password@host:port/database
  • Example: postgres://helium:password@localhost:5432/helium_db

Database Schema:

  • ⚠️ CRITICAL: SQLx migrations must be run before starting the application
  • All database schema changes are managed through SQLx migrations in the /migrations directory
  • Use sqlx migrate run --database-url $DATABASE_URL to apply migrations

External Service Requirements:

  • NOT containerized - PostgreSQL should run as an external managed service
  • Recommended: Use cloud-managed PostgreSQL (AWS RDS, Google Cloud SQL, Azure Database, etc.)
  • Alternative: Dedicated PostgreSQL server with proper backup and high availability setup

2. Redis

Purpose: Caching, session storage, and configuration store
Version: Redis 6+ recommended
Configuration:

  • Environment variable: REDIS_URL
  • Format: redis://host:port or redis://user:password@host:port
  • Example: redis://localhost:6379

Usage:

  • Session management and authentication tokens
  • Configuration caching across modules
  • Temporary data storage (OAuth challenges, etc.)

External Service Requirements:

  • NOT containerized - Redis should run as an external managed service
  • Recommended: Use cloud-managed Redis (AWS ElastiCache, Google Memorystore, Azure Cache, etc.)
  • Alternative: Dedicated Redis server with persistence and clustering for production

3. RabbitMQ

Purpose: Message queue for asynchronous processing between modules
Version: RabbitMQ 3.8+ recommended
Configuration:

  • Environment variable: MQ_URL
  • Format: amqp://user:password@host:port/
  • Example: amqp://helium:password@localhost:5672/

Usage:

  • Inter-module communication
  • Background job processing
  • Event-driven architecture support

External Service Requirements:

  • NOT containerized - RabbitMQ should run as an external managed service
  • Recommended: Use cloud-managed message queues (AWS MQ, Google Cloud Pub/Sub, Azure Service Bus)
  • Alternative: Dedicated RabbitMQ cluster with proper clustering and high availability

Module-Specific Dependencies

These dependencies are required only when using specific modules:

Auth Module - OAuth Providers (Optional)

Purpose: Social authentication (Google, Microsoft, GitHub, Discord)
Required: Only if OAuth authentication is enabled
Configuration: Stored in database/Redis configuration

Supported Providers:

  • Google OAuth 2.0
  • Microsoft Azure AD
  • GitHub OAuth
  • Discord OAuth

Setup Requirements:

  1. Create OAuth applications with each provider
  2. Configure redirect URIs to your Helium deployment
  3. Store client ID and secret in the system configuration
  4. Configure OAuth provider settings via the management interface

Configuration Structure:

{
  "auth": {
    "oauth_providers": {
      "providers": [
        {
          "name": "google",
          "client_id": "your-client-id",
          "client_secret": "your-client-secret",
          "redirect_uri": "https://your-domain.com/auth/oauth/callback"
        }
      ],
      "challenge_expiration": "5m"
    }
  }
}

Mailer Module - SMTP Server (Required for Email)

Purpose: Email delivery for user notifications, verification, etc.
Required: When email functionality is needed
Configuration: Stored in database/Redis configuration

SMTP Configuration:

{
  "mailer": {
    "host": "smtp.gmail.com",
    "port": 587,
    "username": "your-email@gmail.com",
    "password": "your-app-password",
    "sender": "noreply@your-domain.com",
    "starttls": true
  }
}

Supported SMTP Features:

  • STARTTLS encryption
  • Plain authentication
  • Custom sender addresses
  • HTML email templates

Common SMTP Providers:

  • Gmail: smtp.gmail.com:587 (requires app passwords)
  • Outlook/Hotmail: smtp-mail.outlook.com:587
  • SendGrid: smtp.sendgrid.net:587
  • Mailgun: smtp.mailgun.org:587
  • Amazon SES: email-smtp.region.amazonaws.com:587

Shop Module - Epay Payment Provider (Required for Payments)

Purpose: Payment processing for e-commerce functionality
Required: When payment processing is needed
Configuration: Stored in database as epay provider credentials

Epay Provider Setup:

  1. Register with an Epay-compatible payment provider
  2. Obtain merchant credentials (PID, Key, Merchant URL)
  3. Configure webhook endpoints for payment notifications
  4. Add provider credentials via the management interface

Supported Payment Methods:

  • Alipay (alipay)
  • WeChat Pay (wxpay)
  • USDT cryptocurrency (usdt)

Configuration Requirements:

{
  "shop": {
    "epay_notify_url": "https://your-domain.com/api/shop/epay/callback",
    "epay_return_url": "https://your-domain.com/payment/success",
    "max_unpaid_orders": 5,
    "auto_cancel_after": "30m"
  }
}

Epay Provider Database Entry:

INSERT INTO epay_provider_credentials (
  display_name,
  enabled_channels,
  key,
  pid,
  merchant_url
) VALUES (
  'My Payment Provider',
  ARRAY['alipay', 'wxpay'],
  'your-merchant-key',
  1234,
  'https://pay.provider.com/submit.php'
);

Development Dependencies

These are required for building and developing the project:

Protocol Buffers Compiler

Purpose: Compiling .proto files for gRPC services
Installation:

  • Ubuntu/Debian: apt-get install protobuf-compiler
  • macOS: brew install protobuf
  • Already included in Docker build process

SQLx CLI

Purpose: Database migration management
Installation: cargo install sqlx-cli --no-default-features --features postgres
Usage:

  • Apply migrations: sqlx migrate run
  • Create new migration: sqlx migrate add <name>

Docker/Kubernetes Deployment Considerations

What Should Be Containerized

✅ Containerize:

  • Helium server application (helium-server)
  • Application-specific components and workers

❌ Do NOT Containerize:

  • PostgreSQL - Use external managed database services
  • Redis - Use external managed cache services
  • RabbitMQ - Use external managed message queue services

Infrastructure Handled by Platform

When deploying with Docker and Kubernetes, these infrastructure concerns are handled by the orchestration platform:

  • Load Balancers: Kubernetes ingress controllers handle load balancing
  • TLS Certificates: cert-manager or similar tools handle SSL/TLS
  • Service Discovery: Kubernetes DNS handles service discovery
  • Health Checks: Kubernetes probes handle application health monitoring
  • Logging: Container runtime and logging drivers handle log aggregation

AWS:

  • PostgreSQL: Amazon RDS for PostgreSQL
  • Redis: Amazon ElastiCache for Redis
  • RabbitMQ: Amazon MQ for RabbitMQ

Google Cloud:

  • PostgreSQL: Cloud SQL for PostgreSQL
  • Redis: Memorystore for Redis
  • RabbitMQ: Cloud Pub/Sub (alternative) or third-party RabbitMQ

Azure:

  • PostgreSQL: Azure Database for PostgreSQL
  • Redis: Azure Cache for Redis
  • RabbitMQ: Azure Service Bus (alternative) or third-party RabbitMQ

Environment Variables for Containers

apiVersion: apps/v1
kind: Deployment
metadata:
  name: helium-server
spec:
  template:
    spec:
      containers:
        - name: helium-server
          image: helium-server:latest
          env:
            - name: WORK_MODE
              value: "grpc"
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: helium-secrets
                  key: database-url
            - name: REDIS_URL
              valueFrom:
                secretKeyRef:
                  name: helium-secrets
                  key: redis-url
            - name: MQ_URL
              valueFrom:
                secretKeyRef:
                  name: helium-secrets
                  key: rabbitmq-url

Security Considerations

Credentials Management

  1. Never store credentials in plain text
  2. Use Kubernetes secrets or similar secure storage
  3. Rotate credentials regularly
  4. Use dedicated service accounts with minimal permissions

Network Security

  1. Database: Restrict access to application subnets only
  2. Redis: Enable authentication and restrict network access
  3. RabbitMQ: Use strong passwords and enable TLS
  4. SMTP: Use app passwords or OAuth tokens when available

OAuth Security

  1. Use HTTPS for all OAuth redirect URIs
  2. Validate redirect URI domains strictly
  3. Use state parameter for CSRF protection (handled automatically)

Troubleshooting

Database Connection Issues

# Test database connectivity
psql $DATABASE_URL -c "SELECT version();"

# Check migration status
sqlx migrate info --database-url $DATABASE_URL

Redis Connection Issues

# Test Redis connectivity
redis-cli -u $REDIS_URL ping

# Check Redis memory usage
redis-cli -u $REDIS_URL info memory

RabbitMQ Connection Issues

# Check queue status
rabbitmqctl list_queues

# Check connection status
rabbitmqctl list_connections

SMTP Testing

The mailer module provides test endpoints and logging to help diagnose SMTP issues. Check application logs for detailed SMTP connection and authentication errors.

Epay Integration Issues

  1. Verify webhook URLs are accessible from the internet
  2. Check payment provider’s callback logs
  3. Ensure merchant credentials are correctly configured
  4. Validate signature verification in callback processing

Optional Observability Stack

OpenTelemetry & Grafana Stack (Optional)

Purpose: Comprehensive observability with distributed tracing, metrics, and log aggregation
Required: No - completely optional enhancement
Configuration: OTEL_COLLECTOR environment variable

Components:

  • OpenTelemetry Collector: Telemetry data collection and routing
  • Grafana Tempo: Distributed tracing backend
  • Prometheus: Metrics storage and querying
  • Grafana Loki: Log aggregation
  • Grafana: Unified visualization dashboard

When to Use:

  • Production deployments requiring detailed performance analysis
  • Multi-instance deployments needing distributed tracing
  • Teams requiring centralized observability dashboards
  • Troubleshooting complex performance issues

Deployment:

  • NOT containerized with application - Deploy as separate Kubernetes workloads or use Grafana Cloud
  • Recommended: Deploy Grafana stack in dedicated observability namespace
  • Alternative: Use managed services (Grafana Cloud, Datadog, New Relic)

Note: Helium automatically falls back to basic structured logging if OpenTelemetry is not configured. See the comprehensive Observability with OpenTelemetry guide for full setup instructions.

Summary

Dependency       Required     Purpose             Configuration    Deployment
PostgreSQL       Yes          Primary database    DATABASE_URL     External managed service
Redis            Yes          Caching/sessions    REDIS_URL        External managed service
RabbitMQ         Yes          Message queuing     MQ_URL           External managed service
SMTP Server      Conditional  Email delivery      Database config  External service
OAuth Providers  Optional     Social auth         Database config  External providers
Epay Provider    Conditional  Payment processing  Database config  External service
Observability    Optional     Tracing & metrics   OTEL_COLLECTOR   External stack/cloud

Next Steps: After setting up these dependencies, proceed to the Helium Server Deployment Guide for detailed deployment instructions.

Observability with OpenTelemetry

Helium server includes optional OpenTelemetry (OTel) integration for comprehensive observability. This integration is completely optional — the server will work perfectly fine without it using basic structured logging.

What is OpenTelemetry?

OpenTelemetry provides distributed tracing, metrics collection, and contextual logging for production systems. Use it when:

  • Running multiple worker instances requiring distributed tracing
  • Need detailed performance analysis and troubleshooting
  • Want centralized observability dashboards

Skip it for simple deployments, development environments, or when basic logging is sufficient.

Configuration

Enable OpenTelemetry by setting the OTEL_COLLECTOR environment variable:

export OTEL_COLLECTOR="http://otel-collector:4317"
./helium-server

If not set or initialization fails, the server automatically falls back to basic logging.

Service Names

Each worker mode reports with a distinct service name:

Worker Mode     Service Name
grpc            Helium.grpc
subscribe_api   Helium.subscribe-api
webhook_api     Helium.webhook-api
consumer        Helium.consumer
mailer          Helium.mailer
cron_executor   Helium.cron-executor
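The mapping above can be sketched as a simple function (illustrative; the real implementation inside the server may differ):

```rust
// Map a WORK_MODE value to the OpenTelemetry service name it reports.
pub fn otel_service_name(mode: &str) -> Option<String> {
    let suffix = match mode {
        "grpc" => "grpc",
        "subscribe_api" => "subscribe-api",
        "webhook_api" => "webhook-api",
        "consumer" => "consumer",
        "mailer" => "mailer",
        "cron_executor" => "cron-executor",
        _ => return None, // unknown mode: no telemetry identity
    };
    Some(format!("Helium.{suffix}"))
}
```

Distinct per-mode names let Grafana separate traces from, say, the gRPC API and the mailer worker.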

For production deployments, we recommend the Grafana observability stack — an open-source, Kubernetes-native solution with unified dashboards for traces, metrics, and logs.

Components

  1. OpenTelemetry Collector: Receives and routes telemetry
  2. Grafana Tempo: Distributed tracing storage
  3. Prometheus: Metrics collection
  4. Grafana Loki: Log aggregation
  5. Grafana: Unified visualization

Deployment

Deploy the Grafana stack alongside your Kubernetes cluster:

1. Add Helm Repositories

helm repo add grafana https://grafana.github.io/helm-charts
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm repo update

2. Create Namespace

kubectl create namespace observability

3. Deploy OpenTelemetry Collector

Create otel-collector-values.yaml:

mode: deployment

config:
  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: 0.0.0.0:4317
        http:
          endpoint: 0.0.0.0:4318

  processors:
    batch:
      timeout: 10s
      send_batch_size: 1024

  exporters:
    # Traces to Tempo
    otlp/tempo:
      endpoint: tempo.observability.svc.cluster.local:4317
      tls:
        insecure: true

    # Metrics to Prometheus
    prometheus:
      endpoint: 0.0.0.0:8889
      namespace: helium

    # Logs to Loki
    loki:
      endpoint: http://loki.observability.svc.cluster.local:3100/loki/api/v1/push

  service:
    pipelines:
      traces:
        receivers: [otlp]
        processors: [batch]
        exporters: [otlp/tempo]

      metrics:
        receivers: [otlp]
        processors: [batch]
        exporters: [prometheus]

      logs:
        receivers: [otlp]
        processors: [batch]
        exporters: [loki]

ports:
  otlp-grpc:
    enabled: true
    containerPort: 4317
    servicePort: 4317
    protocol: TCP
  otlp-http:
    enabled: true
    containerPort: 4318
    servicePort: 4318
    protocol: TCP
  metrics:
    enabled: true
    containerPort: 8889
    servicePort: 8889
    protocol: TCP

Then install the collector with these values (the chart lives in the open-telemetry Helm repository):

helm install otel-collector open-telemetry/opentelemetry-collector \
  --namespace observability \
  --values otel-collector-values.yaml

4. Deploy Tempo, Loki, and Prometheus

# Tempo for traces
helm install tempo grafana/tempo \
  --namespace observability \
  --set tempo.receivers.otlp.protocols.grpc.endpoint=0.0.0.0:4317

# Loki for logs
helm install loki grafana/loki-stack \
  --namespace observability \
  --set loki.enabled=true \
  --set promtail.enabled=false

5. Deploy Prometheus

helm install prometheus prometheus-community/kube-prometheus-stack \
  --namespace observability \
  --set grafana.enabled=false

6. Deploy Grafana

helm install grafana grafana/grafana \
  --namespace observability \
  --set adminPassword=changeme

Configure data sources in Grafana to connect Tempo, Prometheus, and Loki.

Troubleshooting

Server logs show “Failed to initialize OpenTelemetry”

Check that the OTel Collector is reachable at the configured endpoint. The server will automatically fall back to basic logging.

Missing traces in Grafana

Verify the data pipeline: Helium → OTel Collector → Tempo. Check logs at each stage.

Performance impact

OpenTelemetry adds minimal overhead: < 2% CPU, ~10-20MB memory, < 1ms latency per request.

Disabling OpenTelemetry

Simply unset the OTEL_COLLECTOR variable — the server automatically falls back to basic logging.

Summary

OpenTelemetry in Helium is completely optional:

  • Set OTEL_COLLECTOR to enable, leave unset to use basic logging
  • Automatic fallback if initialization fails
  • Recommended for production with multiple instances
  • Grafana stack provides open-source, Kubernetes-native observability

For detailed Helm deployment configurations, refer to the official Grafana Helm charts documentation.

Health Checks for Kubernetes

Helium server provides HTTP health check endpoints designed for Kubernetes liveness and readiness probes. These endpoints run on a separate internal port (default: 9090) and are enabled for all worker modes.

Overview

Health checks help Kubernetes determine:

  • Liveness: Is the container alive and should it be restarted if it becomes unresponsive?
  • Readiness: Is the container ready to handle requests?

Helium implements both probe types on a dedicated HTTP server that runs alongside each worker mode.

Endpoints

Liveness Probe: /healthz

Returns 200 OK with a JSON response if the server is running:

{
  "status": "ok"
}

This endpoint always returns success if the health check server is responding. Kubernetes uses this to determine if the container should be restarted.

Readiness Probe: /readyz

Checks connectivity to all dependencies before returning status:

Success Response (200 OK):

{
  "status": "ok",
  "database": "ok",
  "redis": "ok",
  "rabbitmq": "ok"
}

Failure Response (503 Service Unavailable):

{
  "status": "error",
  "database": "ok",
  "redis": "error",
  "rabbitmq": "ok",
  "error": "Redis error: Connection refused"
}

The readiness probe checks:

  • PostgreSQL: Executes a simple query (SELECT 1)
  • Redis: Sends a PING command
  • RabbitMQ: Validates connection pool status

All worker modes check the same three dependencies.
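A simplified sketch of how the three checks aggregate into the status code and body shown above (the real handler also includes an error message field on failure, omitted here):

```rust
// Combine the three dependency checks into an HTTP status code and a
// JSON body matching the documented /readyz responses.
pub fn readiness(db_ok: bool, redis_ok: bool, mq_ok: bool) -> (u16, String) {
    let all_ok = db_ok && redis_ok && mq_ok;
    let s = |ok: bool| if ok { "ok" } else { "error" };
    let body = format!(
        "{{\"status\":\"{}\",\"database\":\"{}\",\"redis\":\"{}\",\"rabbitmq\":\"{}\"}}",
        s(all_ok), s(db_ok), s(redis_ok), s(mq_ok)
    );
    (if all_ok { 200 } else { 503 }, body)
}
```

Any single failing dependency marks the pod unready, so Kubernetes stops routing traffic to it without restarting it.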

Configuration

Health Check Port

Set the HEALTH_CHECK_PORT environment variable to customize the port (default: 9090):

export HEALTH_CHECK_PORT=9090

This port should be:

  • Internal only: Not exposed to external traffic
  • Accessible by Kubernetes: For probe requests
  • Different from main service ports: To avoid conflicts

Worker Modes

Health checks are available in all worker modes:

Worker Mode     Main Port  Health Check Port  Dependencies Checked
grpc            50051      9090               Database, Redis, RabbitMQ
subscribe_api   8080       9090               Database, Redis, RabbitMQ
webhook_api     8081       9090               Database, Redis, RabbitMQ
consumer        N/A        9090               Database, Redis, RabbitMQ
mailer          N/A        9090               Database, Redis, RabbitMQ
cron_executor   N/A        9090               Database, Redis, RabbitMQ

Kubernetes Deployment

Example Pod Configuration

Here’s how to configure health checks in your Kubernetes deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: helium-grpc
spec:
  replicas: 3
  selector:
    matchLabels:
      app: helium-grpc
  template:
    metadata:
      labels:
        app: helium-grpc
    spec:
      containers:
      - name: helium-grpc
        image: helium-server:latest
        env:
        - name: WORK_MODE
          value: "grpc"
        - name: LISTEN_ADDR
          value: "0.0.0.0:50051"
        - name: HEALTH_CHECK_PORT
          value: "9090"
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              name: helium-secrets
              key: database-url
        - name: REDIS_URL
          valueFrom:
            secretKeyRef:
              name: helium-secrets
              key: redis-url
        - name: MQ_URL
          valueFrom:
            secretKeyRef:
              name: helium-secrets
              key: mq-url
        ports:
        - name: grpc
          containerPort: 50051
          protocol: TCP
        - name: health
          containerPort: 9090
          protocol: TCP
        livenessProbe:
          httpGet:
            path: /healthz
            port: health
          initialDelaySeconds: 10
          periodSeconds: 10
          timeoutSeconds: 5
          failureThreshold: 3
        readinessProbe:
          httpGet:
            path: /readyz
            port: health
          initialDelaySeconds: 5
          periodSeconds: 5
          timeoutSeconds: 5
          failureThreshold: 3

Probe Configuration Guidelines

Liveness Probe:

  • initialDelaySeconds: 10-30 seconds (allow time for startup)
  • periodSeconds: 10-30 seconds (check periodically)
  • timeoutSeconds: 5 seconds
  • failureThreshold: 3 (restart after 3 consecutive failures)

Readiness Probe:

  • initialDelaySeconds: 5-10 seconds (faster than liveness)
  • periodSeconds: 5-10 seconds (check more frequently)
  • timeoutSeconds: 5 seconds
  • failureThreshold: 3 (mark unready after 3 failures)
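A quick sanity check when tuning these numbers: the worst-case time before Kubernetes acts on a dead container is roughly periodSeconds * failureThreshold, plus up to one timeoutSeconds. For the liveness values above:

```shell
# Worst-case liveness detection latency for the values used above.
period=10
threshold=3
timeout=5
echo "restart triggered after up to $(( period * threshold + timeout ))s"
```

If startup under load can exceed this window, raise initialDelaySeconds rather than loosening the thresholds.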

Service Configuration

For API worker modes (grpc, subscribe_api, webhook_api), configure a Service:

apiVersion: v1
kind: Service
metadata:
  name: helium-grpc
spec:
  type: ClusterIP
  ports:
  - name: grpc
    port: 50051
    targetPort: grpc
    protocol: TCP
  selector:
    app: helium-grpc

Note: The health check port (9090) is not exposed in the Service. It’s only for Kubernetes probes.

Worker Mode Behavior

API Modes (grpc, subscribe_api, webhook_api)

For API modes, the health check server runs alongside the main API server:

  • When the main server exits, the health check server is immediately terminated
  • Process exits when either server fails
  • Ensures no “zombie” containers serving health checks without handling requests

Background Worker Modes (consumer, mailer, cron_executor)

For background workers, the health check server runs continuously:

  • Liveness probe confirms the worker process is alive
  • Readiness probe ensures dependencies are accessible
  • Worker loops indefinitely alongside health check server

Troubleshooting

Health Check Server Not Starting

Symptom: Probes fail immediately with connection errors

Solutions:

  1. Check logs for health check server errors
  2. Verify HEALTH_CHECK_PORT is not already in use
  3. Ensure the port is accessible within the pod

Readiness Probe Failing

Symptom: Pod remains in “Not Ready” state

Solutions:

  1. Check which dependency is failing in the /readyz response
  2. Verify connection strings (DATABASE_URL, REDIS_URL, MQ_URL)
  3. Ensure network policies allow pod access to dependencies
  4. Check if dependencies are healthy

Example debugging:

# Forward health check port to local machine
kubectl port-forward pod/helium-grpc-xyz 9090:9090

# Check readiness endpoint
curl http://localhost:9090/readyz

Liveness Probe Causing Restart Loop

Symptom: Pod repeatedly restarts with liveness probe failures

Solutions:

  1. Increase initialDelaySeconds (worker may need more startup time)
  2. Increase failureThreshold (allow more failures before restart)
  3. Check if worker is deadlocked or stuck (examine logs before restart)

Worker Exits But Pod Stays Running

Symptom: Container appears healthy but doesn’t process requests

This should not happen with the current implementation:

  • API workers: Health check is aborted when main server exits
  • Background workers: Return from execute_worker() causes process exit

If this occurs, file a bug report.

Security Considerations

Port Exposure

The health check port (9090) should never be exposed externally:

  • Don’t create Ingress rules for health check endpoints
  • Don’t expose the health check port in the Service definition
  • Use network policies to restrict access to Kubernetes control plane only

Sensitive Information

Health check responses contain minimal information:

  • No version numbers
  • No internal IPs or hostnames
  • No authentication tokens
  • Only dependency status (ok/error)

Error messages may contain connection details. Ensure logs are secured appropriately.

Best Practices

  1. Use separate ports: Never combine health checks with main service endpoints
  2. Set appropriate timeouts: Balance between quick detection and false positives
  3. Monitor probe metrics: Track probe success rates in your observability stack
  4. Test locally: Use port-forwarding to verify health checks before deployment
  5. Account for sidecars: If using a service mesh proxy (Istio, Linkerd), add startup probes so pods are not marked unhealthy before the proxy is ready

Summary

Helium’s health check endpoints provide robust Kubernetes integration:

  • Liveness probe (/healthz): Detects unresponsive containers
  • Readiness probe (/readyz): Ensures dependencies are healthy
  • Separate port (default 9090): Isolated from main services
  • All worker modes: Consistent behavior across deployment types
  • Process lifecycle: Ensures clean exits, no zombie containers

Configure these probes in your Kubernetes deployments to enable automatic recovery and load balancing.

Docker-based Deployment

The Helium system is designed with a multi-worker architecture that can be deployed using containers. Each worker type serves a specific purpose and has different scaling requirements. This deployment approach provides:

  • Scalability: Independent scaling of different worker types based on load
  • Reliability: Fault isolation between different services
  • Flexibility: Easy deployment across different environments
  • Maintainability: Simplified updates and rollbacks

Prerequisites

Before proceeding with this guide, ensure you have:

  • External dependencies configured (see External Dependencies)
  • Docker or container runtime installed
  • Kubernetes cluster (for Kubernetes deployment)
  • Basic understanding of containerization concepts

Container Architecture

Worker Types and Scaling Patterns

The Helium server supports six distinct worker modes, each with specific scaling characteristics:

Worker Mode      Port    Scaling               Description
grpc             50051   ✅ Horizontal         Main gRPC API server - can be load balanced
subscribe_api    8080    ✅ Horizontal         RESTful subscription API - can be load balanced
webhook_api      8081    ✅ Horizontal         Webhook handler for payments - can be load balanced
consumer         -       ✅ Horizontal         Background message consumer - multiple instances supported
mailer           -       ⚠️ Single preferred   Email service - more than one instance not recommended
cron_executor    -       🚫 Single only        Scheduled tasks - MUST be exactly 1 instance

Scaling Constraints

⚠️ Critical Scaling Limitations

mailer Worker:

  • Recommendation: Deploy as single instance only
  • Reason: Relies on SMTP server connections and may cause email delivery issues with multiple instances
  • Impact: Multiple mailer instances can lead to duplicate emails or SMTP rate limiting

cron_executor Worker:

  • Requirement: MUST have exactly one instance
  • Reason: Scans the database to check for scheduled tasks in the queue
  • Impact: Multiple instances will cause duplicate task execution and potential data corruption

✅ Scalable Workers

API Workers (grpc, subscribe_api, webhook_api):

  • Can be horizontally scaled based on traffic demands
  • Support standard load balancing techniques
  • Share state through external Redis and PostgreSQL

consumer Worker:

  • Can run multiple instances for processing message queues
  • Automatically distributes work through RabbitMQ

Docker Image

Building the Docker Image

The project includes a multi-stage Dockerfile optimized for production:

# Build the Docker image
docker build -t helium-server:latest .

# Tag for registry
docker tag helium-server:latest your-registry/helium-server:v1.0.0

# Push to registry
docker push your-registry/helium-server:v1.0.0

Image Characteristics

  • Base Image: gcr.io/distroless/cc for minimal attack surface
  • Size: ~50MB final image
  • Architecture: Multi-arch support (amd64, arm64)
  • Security: Non-root user, minimal dependencies

Environment Variables

Configure containers using these environment variables:

# Required - Worker mode selection
WORK_MODE=grpc  # grpc, subscribe_api, webhook_api, consumer, mailer, cron_executor

# Required - Database connections
DATABASE_URL=postgres://user:password@postgres-host:5432/helium_db
REDIS_URL=redis://redis-host:6379
MQ_URL=amqp://user:password@rabbitmq-host:5672/

# Optional - Server configuration
LISTEN_ADDR=0.0.0.0:50051  # For API workers
SCAN_INTERVAL=60           # For cron_executor only
RUST_LOG=info             # Logging level
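Every worker mode fails fast when one of the required variables is missing, which surfaces as a crash loop. A small pre-flight check in a wrapper or entrypoint script (a sketch, not part of the Helium image; assumes a POSIX shell is available in your container) gives a clearer error:

```shell
# Fail early with a readable message when a required variable is unset or empty.
require_env() {
  for var in WORK_MODE DATABASE_URL REDIS_URL MQ_URL; do
    eval "val=\${$var:-}"
    if [ -z "$val" ]; then
      echo "missing required env var: $var" >&2
      return 1
    fi
  done
  echo "all required variables present"
}

WORK_MODE=grpc
DATABASE_URL=postgres://user:password@postgres-host:5432/helium_db
REDIS_URL=redis://redis-host:6379
MQ_URL=amqp://user:password@rabbitmq-host:5672/
require_env
```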

Docker Compose Deployment

For development or simple production setups:

version: "3.8"

services:
  # Main gRPC API (scalable)
  helium-grpc:
    image: helium-server:latest
    ports:
      - "50051:50051"
    environment:
      WORK_MODE: grpc
      DATABASE_URL: postgres://helium:password@postgres:5432/helium_db
      REDIS_URL: redis://redis:6379
      MQ_URL: amqp://helium:password@rabbitmq:5672/
      LISTEN_ADDR: 0.0.0.0:50051
    depends_on:
      - postgres
      - redis
      - rabbitmq
    restart: unless-stopped
    deploy:
      replicas: 2 # Can be scaled horizontally

  # Subscription API (scalable)
  helium-subscribe-api:
    image: helium-server:latest
    ports:
      - "8080:8080"
    environment:
      WORK_MODE: subscribe_api
      DATABASE_URL: postgres://helium:password@postgres:5432/helium_db
      REDIS_URL: redis://redis:6379
      MQ_URL: amqp://helium:password@rabbitmq:5672/
      LISTEN_ADDR: 0.0.0.0:8080
    depends_on:
      - postgres
      - redis
      - rabbitmq
    restart: unless-stopped
    deploy:
      replicas: 2 # Can be scaled horizontally

  # Webhook API (scalable)
  helium-webhook-api:
    image: helium-server:latest
    ports:
      - "8081:8081"
    environment:
      WORK_MODE: webhook_api
      DATABASE_URL: postgres://helium:password@postgres:5432/helium_db
      REDIS_URL: redis://redis:6379
      MQ_URL: amqp://helium:password@rabbitmq:5672/
      LISTEN_ADDR: 0.0.0.0:8081
    depends_on:
      - postgres
      - redis
      - rabbitmq
    restart: unless-stopped
    deploy:
      replicas: 2 # Can be scaled horizontally

  # Background consumer (scalable)
  helium-consumer:
    image: helium-server:latest
    environment:
      WORK_MODE: consumer
      DATABASE_URL: postgres://helium:password@postgres:5432/helium_db
      REDIS_URL: redis://redis:6379
      MQ_URL: amqp://helium:password@rabbitmq:5672/
    depends_on:
      - postgres
      - redis
      - rabbitmq
    restart: unless-stopped
    deploy:
      replicas: 3 # Can run multiple instances

  # Mailer service (single instance recommended)
  helium-mailer:
    image: helium-server:latest
    environment:
      WORK_MODE: mailer
      DATABASE_URL: postgres://helium:password@postgres:5432/helium_db
      REDIS_URL: redis://redis:6379
      MQ_URL: amqp://helium:password@rabbitmq:5672/
    depends_on:
      - postgres
      - redis
      - rabbitmq
    restart: unless-stopped
    deploy:
      replicas: 1 # SINGLE INSTANCE ONLY

  # Cron executor (must be single instance)
  helium-cron:
    image: helium-server:latest
    environment:
      WORK_MODE: cron_executor
      DATABASE_URL: postgres://helium:password@postgres:5432/helium_db
      REDIS_URL: redis://redis:6379
      MQ_URL: amqp://helium:password@rabbitmq:5672/
      SCAN_INTERVAL: 60
    depends_on:
      - postgres
      - redis
      - rabbitmq
    restart: unless-stopped
    deploy:
      replicas: 1 # MUST BE EXACTLY 1

  # External dependencies (for development only)
  postgres:
    image: postgres:15
    environment:
      POSTGRES_USER: helium
      POSTGRES_PASSWORD: password
      POSTGRES_DB: helium_db
    volumes:
      - postgres_data:/var/lib/postgresql/data
    ports:
      - "5432:5432"

  redis:
    image: redis:7
    ports:
      - "6379:6379"
    volumes:
      - redis_data:/data

  rabbitmq:
    image: rabbitmq:3-management
    environment:
      RABBITMQ_DEFAULT_USER: helium
      RABBITMQ_DEFAULT_PASS: password
    ports:
      - "5672:5672"
      - "15672:15672"
    volumes:
      - rabbitmq_data:/var/lib/rabbitmq

volumes:
  postgres_data:
  redis_data:
  rabbitmq_data:

Kubernetes Deployment

For production Kubernetes deployments:

Namespace and ConfigMap

apiVersion: v1
kind: Namespace
metadata:
  name: helium-system
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: helium-config
  namespace: helium-system
data:
  RUST_LOG: "info"
  SCAN_INTERVAL: "60"

Secrets

apiVersion: v1
kind: Secret
metadata:
  name: helium-secrets
  namespace: helium-system
type: Opaque
stringData:
  database-url: "postgres://helium:password@postgres-service:5432/helium_db"
  redis-url: "redis://redis-service:6379"
  rabbitmq-url: "amqp://helium:password@rabbitmq-service:5672/"

gRPC API Deployment (Scalable)

apiVersion: apps/v1
kind: Deployment
metadata:
  name: helium-grpc
  namespace: helium-system
spec:
  replicas: 3 # Can be scaled horizontally
  selector:
    matchLabels:
      app: helium-grpc
  template:
    metadata:
      labels:
        app: helium-grpc
    spec:
      containers:
        - name: helium-server
          image: your-registry/helium-server:v1.0.0
          ports:
            - containerPort: 50051
          env:
            - name: WORK_MODE
              value: "grpc"
            - name: LISTEN_ADDR
              value: "0.0.0.0:50051"
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: helium-secrets
                  key: database-url
            - name: REDIS_URL
              valueFrom:
                secretKeyRef:
                  name: helium-secrets
                  key: redis-url
            - name: MQ_URL
              valueFrom:
                secretKeyRef:
                  name: helium-secrets
                  key: rabbitmq-url
          envFrom:
            - configMapRef:
                name: helium-config
          resources:
            requests:
              memory: "128Mi"
              cpu: "100m"
            limits:
              memory: "512Mi"
              cpu: "500m"
          livenessProbe:
            tcpSocket:
              port: 50051
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            tcpSocket:
              port: 50051
            initialDelaySeconds: 5
            periodSeconds: 5
---
apiVersion: v1
kind: Service
metadata:
  name: helium-grpc-service
  namespace: helium-system
spec:
  selector:
    app: helium-grpc
  ports:
    - port: 50051
      targetPort: 50051
  type: ClusterIP

Consumer Deployment (Scalable)

apiVersion: apps/v1
kind: Deployment
metadata:
  name: helium-consumer
  namespace: helium-system
spec:
  replicas: 3 # Can run multiple instances
  selector:
    matchLabels:
      app: helium-consumer
  template:
    metadata:
      labels:
        app: helium-consumer
    spec:
      containers:
        - name: helium-server
          image: your-registry/helium-server:v1.0.0
          env:
            - name: WORK_MODE
              value: "consumer"
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: helium-secrets
                  key: database-url
            - name: REDIS_URL
              valueFrom:
                secretKeyRef:
                  name: helium-secrets
                  key: redis-url
            - name: MQ_URL
              valueFrom:
                secretKeyRef:
                  name: helium-secrets
                  key: rabbitmq-url
          envFrom:
            - configMapRef:
                name: helium-config
          resources:
            requests:
              memory: "256Mi"
              cpu: "200m"
            limits:
              memory: "1Gi"
              cpu: "1000m"
          livenessProbe:
            httpGet:
              path: /healthz
              port: 9090 # built-in health check server (HEALTH_CHECK_PORT default)
            initialDelaySeconds: 30
            periodSeconds: 30

Mailer Deployment (Single Instance)

apiVersion: apps/v1
kind: Deployment
metadata:
  name: helium-mailer
  namespace: helium-system
spec:
  replicas: 1 # SINGLE INSTANCE ONLY
  strategy:
    type: Recreate # Prevent multiple instances during updates
  selector:
    matchLabels:
      app: helium-mailer
  template:
    metadata:
      labels:
        app: helium-mailer
    spec:
      containers:
        - name: helium-server
          image: your-registry/helium-server:v1.0.0
          env:
            - name: WORK_MODE
              value: "mailer"
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: helium-secrets
                  key: database-url
            - name: REDIS_URL
              valueFrom:
                secretKeyRef:
                  name: helium-secrets
                  key: redis-url
            - name: MQ_URL
              valueFrom:
                secretKeyRef:
                  name: helium-secrets
                  key: rabbitmq-url
          envFrom:
            - configMapRef:
                name: helium-config
          resources:
            requests:
              memory: "128Mi"
              cpu: "100m"
            limits:
              memory: "512Mi"
              cpu: "500m"

Cron Executor Deployment (Singleton)

apiVersion: apps/v1
kind: Deployment
metadata:
  name: helium-cron
  namespace: helium-system
spec:
  replicas: 1 # MUST BE EXACTLY 1
  strategy:
    type: Recreate # Ensure no overlap during updates
  selector:
    matchLabels:
      app: helium-cron
  template:
    metadata:
      labels:
        app: helium-cron
    spec:
      containers:
        - name: helium-server
          image: your-registry/helium-server:v1.0.0
          env:
            - name: WORK_MODE
              value: "cron_executor"
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: helium-secrets
                  key: database-url
            - name: REDIS_URL
              valueFrom:
                secretKeyRef:
                  name: helium-secrets
                  key: redis-url
            - name: MQ_URL
              valueFrom:
                secretKeyRef:
                  name: helium-secrets
                  key: rabbitmq-url
            - name: SCAN_INTERVAL
              value: "60"
          envFrom:
            - configMapRef:
                name: helium-config
          resources:
            requests:
              memory: "128Mi"
              cpu: "50m"
            limits:
              memory: "256Mi"
              cpu: "200m"
          livenessProbe:
            httpGet:
              path: /healthz
              port: 9090 # built-in health check server (HEALTH_CHECK_PORT default)
            initialDelaySeconds: 60
            periodSeconds: 30

Horizontal Pod Autoscaler (HPA)

For scalable workers, configure automatic scaling:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: helium-grpc-hpa
  namespace: helium-system
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: helium-grpc
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80

Load Balancer Configuration

Ingress for API Services

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: helium-ingress
  namespace: helium-system
  annotations:
    nginx.ingress.kubernetes.io/backend-protocol: "GRPC"
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
spec:
  tls:
    - hosts:
        - api.your-domain.com
      secretName: helium-tls
  rules:
    - host: api.your-domain.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: helium-grpc-service
                port:
                  number: 50051

Service Mesh Configuration

For advanced deployments with service mesh (Istio):

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: helium-grpc-vs
  namespace: helium-system
spec:
  hosts:
    - api.your-domain.com
  gateways:
    - helium-gateway
  http:
    - match:
        - uri:
            prefix: /
      route:
        - destination:
            host: helium-grpc-service
            port:
              number: 50051
          weight: 100
      fault:
        delay:
          percentage:
            value: 0.1
          fixedDelay: 5s

Database Migration

Database migrations must be run before starting any workers:

Migration Job

apiVersion: batch/v1
kind: Job
metadata:
  name: helium-migration
  namespace: helium-system
spec:
  template:
    spec:
      containers:
        - name: migration
          image: your-registry/helium-server:v1.0.0
          command: ["sqlx", "migrate", "run"]
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: helium-secrets
                  key: database-url
      restartPolicy: Never
  backoffLimit: 3

Init Container for Workers

Add to all worker deployments:

spec:
  template:
    spec:
      initContainers:
        - name: wait-for-database
          image: postgres:15
          command:
            [
              "sh",
              "-c",
              "until pg_isready -h postgres-service -p 5432; do echo waiting for database; sleep 2; done;",
            ]
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: helium-secrets
                  key: database-url

Monitoring and Observability

Health Checks

Configure appropriate health checks for each worker type:

# For API workers (gRPC, REST)
livenessProbe:
  tcpSocket:
    port: 50051
  initialDelaySeconds: 30
  periodSeconds: 10

# For background workers (consumer, mailer, cron): use the built-in health
# check server (default port 9090). Exec-based probes will not work in the
# distroless image, which ships no shell.
livenessProbe:
  httpGet:
    path: /healthz
    port: 9090
  initialDelaySeconds: 30
  periodSeconds: 30

Logging Configuration

env:
  - name: RUST_LOG
    value: "info,helium_server=debug" # Adjust as needed

Metrics Collection

Use Prometheus for metrics collection:

apiVersion: v1
kind: Service
metadata:
  name: helium-metrics
  namespace: helium-system
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "8080"
    prometheus.io/path: "/metrics"
spec:
  selector:
    app: helium-grpc
  ports:
    - port: 8080
      name: metrics

Troubleshooting

Common Issues

Pod Crash Loop:

# Check logs
kubectl logs -n helium-system deployment/helium-grpc

# Check events
kubectl get events -n helium-system --sort-by='.metadata.creationTimestamp'

# Verify environment variables
kubectl exec -n helium-system deployment/helium-grpc -- env | grep -E "(DATABASE_URL|REDIS_URL|MQ_URL)"

Multiple Cron Executors:

# Check for multiple cron instances (should show only 1)
kubectl get pods -n helium-system -l app=helium-cron

# Check cron logs for conflicts
kubectl logs -n helium-system -l app=helium-cron --tail=100

Database Connection Issues:

# Test database connectivity
kubectl run -i --tty --rm debug --image=postgres:15 --restart=Never -- \
  psql postgresql://user:password@postgres-service:5432/helium_db -c "SELECT version();"

# Check migration status (sh -c makes $DATABASE_URL expand inside the pod,
# not on your local machine)
kubectl exec -n helium-system deployment/helium-grpc -- \
  sh -c 'sqlx migrate info --database-url "$DATABASE_URL"'

Performance Tuning

Resource Limits:

  • API workers: 200m-500m CPU, 256Mi-1Gi RAM per pod
  • Consumer workers: 500m-1000m CPU, 512Mi-2Gi RAM per pod
  • Mailer/Cron: 100m-200m CPU, 128Mi-512Mi RAM per pod

Scaling Guidelines:

  • Start with 2-3 replicas for API workers
  • Scale consumers based on message queue depth
  • Monitor CPU/memory usage and adjust limits accordingly

External Dependencies

Refer to the External Dependencies Guide for detailed information about:

  • PostgreSQL setup and configuration
  • Redis configuration and clustering
  • RabbitMQ setup and management
  • SMTP server configuration
  • OAuth provider setup
  • Payment provider integration

Configuration Management

Refer to the Configuration Guide for:

  • Environment variable reference
  • Configuration file formats
  • Runtime configuration updates
  • Security best practices

Next Steps

After successful deployment:

  1. Configure monitoring and alerting
  2. Set up backup procedures for stateful data
  3. Implement CI/CD pipelines for automated deployments
  4. Configure log aggregation and analysis
  5. Plan disaster recovery procedures

For specific configuration details, see the Helium Server Configuration guide.

Configuration Guide

This document provides comprehensive configuration information for operators deploying the Helium project. The system uses a combination of environment variables for server configuration and JSON configurations stored in the database for module-specific settings.

Environment Variables

Server behavior and connectivity to external services are controlled entirely through environment variables; module-specific settings are stored as JSON in the database, as described later in this guide.

Required Environment Variables

All worker modes require these variables:

# Worker mode selection (REQUIRED)
WORK_MODE="grpc"  # Options: grpc, subscribe_api, webhook_api, consumer, mailer, cron_executor

# Database connection (REQUIRED)
DATABASE_URL="postgres://user:password@localhost:5432/helium_db"

# Redis connection (REQUIRED)
REDIS_URL="redis://localhost:6379"

# RabbitMQ connection (REQUIRED)
MQ_URL="amqp://user:password@localhost:5672/"

Worker Mode Options

Worker Mode      Port    Description                   Use Case
grpc             50051   gRPC API server               Main API for client applications and admin panels
subscribe_api    8080    RESTful subscription API      Public subscription endpoints
webhook_api      8081    RESTful webhook handler       Payment provider callbacks, third-party integrations
consumer         -       Background message consumer   Processing async tasks from message queue
mailer           -       Email service worker          Sending emails and notifications
cron_executor    -       Scheduled task executor       Running periodic maintenance tasks

Optional Environment Variables

# Server listen addresses (for API workers)
LISTEN_ADDR="0.0.0.0:50051"  # Default for grpc mode
LISTEN_ADDR="0.0.0.0:8080"   # Default for subscribe_api mode
LISTEN_ADDR="0.0.0.0:8081"   # Default for webhook_api mode

# Cron executor configuration
SCAN_INTERVAL="60"  # Scan interval in seconds (cron_executor mode only)

# Logging configuration
RUST_LOG="info"  # Options: error, warn, info, debug, trace

Module Configurations

Module configurations are stored as JSON in the PostgreSQL database in the application__config table. Each module has its own configuration key and JSON structure.

Note: All duration values are represented as strings containing the number of seconds (e.g., "300" for 5 minutes, "1800" for 30 minutes).
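For example, the "1800" used by several defaults works out as:

```shell
# Duration values are strings of whole seconds; a quick conversion.
secs="1800"
echo "${secs}s = $(( secs / 60 )) minutes"
```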

Auth Module (auth)

Key: "auth"

The authentication module handles user registration, login, JWT tokens, and OAuth providers.

{
  "email_provider": {
    "register_domain": {
      "enable_white_list": false,
      "white_list": [],
      "enable_black_list": false,
      "black_list": []
    },
    "otp_expire_after": "300",
    "delete_otp_before": "7200",
    "magic_link_expire_after": "1800",
    "magic_link_delete_before": "14400",
    "resend_interval": "30"
  },
  "jwt": {
    "secret": "your-jwt-secret-key-32-characters-long",
    "refresh_token_expiration": "2592000",
    "access_token_expiration": "900",
    "issuer": "https://your-domain.com",
    "access_audience": "helium_cloud",
    "refresh_audience": "helium_cloud_auth"
  },
  "oauth_providers": {
    "providers": [
      {
        "name": "Google",
        "client_id": "your-google-client-id",
        "client_secret": "your-google-client-secret",
        "redirect_uri": "https://your-domain.com/auth/oauth/google/callback"
      },
      {
        "name": "GitHub",
        "client_id": "your-github-client-id",
        "client_secret": "your-github-client-secret",
        "redirect_uri": "https://your-domain.com/auth/oauth/github/callback"
      }
    ],
    "challenge_expiration": "300"
  }
}

Configuration Details:

  • email_provider.register_domain: Controls which email domains are allowed for registration
  • email_provider.otp_expire_after: How long OTP codes remain valid (in seconds, default: “300” = 5 minutes)
  • email_provider.resend_interval: Minimum time between resend attempts (in seconds, default: “30” = 30 seconds)
  • jwt.secret: CRITICAL: Must be a secure random string for production
  • jwt.*_expiration: Token lifetime settings (in seconds, default: “2592000” = 30 days for refresh, “900” = 15 minutes for access)
  • oauth_providers.providers: List of OAuth providers with their credentials
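Both jwt.secret here and the admin-jwt secret later should come from a CSPRNG rather than being typed by hand. One common way to generate one, assuming the openssl CLI is available:

```shell
# 32 random bytes, base64-encoded: a 44-character string, comfortably
# above the 32-character length shown in the examples.
openssl rand -base64 32
```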

Telecom Module (telecom)

Key: "telecom"

The telecom module manages VPN nodes, subscription links, and proxy synchronization.

{
  "node_health_check": {
    "offline_timeout": "600"
  },
  "subscribe_link": {
    "endpoints": [
      {
        "url_template": "https://subscribe.your-domain.com/subscribe/{SUBSCRIBE_TOKEN}",
        "endpoint_name": "primary"
      },
      {
        "url_template": "https://backup.your-domain.com/subscribe/{SUBSCRIBE_TOKEN}",
        "endpoint_name": "backup"
      }
    ]
  },
  "uni_proxy_sync": {
    "push_interval": "30",
    "pull_interval": "60"
  },
  "vpn_server_token": "secure-random-token-for-vpn-servers"
}

Configuration Details:

  • node_health_check.offline_timeout: Time before marking nodes as offline (in seconds, default: “600” = 10 minutes)
  • subscribe_link.endpoints: List of subscription endpoints for client configuration
  • uni_proxy_sync.push_interval: How often to push traffic data (in seconds, default: “30” = 30 seconds)
  • uni_proxy_sync.pull_interval: How often to pull user info (in seconds, default: “60” = 1 minute)
  • vpn_server_token: CRITICAL: Secure token for VPN server authentication
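When Helium generates a subscription link, the {SUBSCRIBE_TOKEN} placeholder in url_template is replaced with the user's token. The expansion is equivalent to the following (a sketch; the substitution happens server-side, and the token value is made up):

```shell
# Expand a subscribe_link url_template for one user.
template="https://subscribe.your-domain.com/subscribe/{SUBSCRIBE_TOKEN}"
token="abc123"
echo "$template" | sed "s/{SUBSCRIBE_TOKEN}/$token/"
```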

Shop Module (shop)

Key: "shop"

The shop module handles e-commerce functionality, orders, and payment processing.

{
  "max_unpaid_orders": 5,
  "auto_cancel_after": "1800",
  "epay_notify_url": "https://your-domain.com/api/webhook/epay/notify",
  "epay_return_url": "https://your-domain.com/payment/success"
}

Configuration Details:

  • max_unpaid_orders: Maximum unpaid orders per user (default: 5)
  • auto_cancel_after: Time before auto-canceling unpaid orders (in seconds, default: “1800” = 30 minutes)
  • epay_notify_url: REQUIRED: Server-to-server notification endpoint for payment providers
  • epay_return_url: REQUIRED: User return URL after payment completion

Mailer Module (mailer)

Key: "mailer"

The mailer module handles email delivery through SMTP.

{
  "host": "smtp.gmail.com",
  "port": 587,
  "username": "your-smtp-username",
  "password": "your-smtp-password",
  "sender": "noreply@your-domain.com",
  "starttls": true
}

Configuration Details:

  • host: SMTP server hostname
  • port: SMTP server port (typically 587 for STARTTLS, 465 for SSL)
  • username/password: SMTP authentication credentials
  • sender: Email address used as sender
  • starttls: Enable STARTTLS encryption (recommended: true)

Admin Management Module (admin-jwt)

Key: "admin-jwt"

Controls JWT tokens for administrative access.

{
  "secret": "admin-jwt-secret-key-32-characters-long",
  "token_expiration": "864000",
  "issuer": "https://admin.your-domain.com",
  "audience": "HeliumAdmin"
}

Configuration Details:

  • secret: CRITICAL: Secure secret for admin JWT signing
  • token_expiration: Admin token lifetime (in seconds, default: “864000” = 10 days)
  • issuer: JWT issuer for admin tokens
  • audience: JWT audience for admin tokens

Market Module (affiliate)

Key: "affiliate"

Controls the affiliate marketing system.

{
  "max_invite_code_per_user": 10,
  "default_reward_rate": "0.1",
  "default_trigger_time_per_user": 3
}

Configuration Details:

  • max_invite_code_per_user: Maximum invite codes per user (default: 10)
  • default_reward_rate: Default affiliate commission rate (default: “0.1” = 10%)
  • default_trigger_time_per_user: Required referrals before earning (default: 3)
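As a worked example of default_reward_rate, a referral purchase of 49.90 at the default rate of “0.1” yields a 4.99 commission. A minimal shell sketch of that arithmetic (the amounts are illustrative only; Helium computes payouts server-side):

```shell
# Illustrative payout arithmetic for the default affiliate rate.
order_amount="49.90"
reward_rate="0.1"   # matches default_reward_rate above

# awk keeps the math in floating point and rounds to two decimals
payout=$(awk -v amt="$order_amount" -v rate="$reward_rate" \
  'BEGIN { printf "%.2f", amt * rate }')
echo "commission: $payout"   # commission: 4.99
```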

Infrastructure Dependencies

PostgreSQL Database

Required Version: PostgreSQL 12+

Configuration:

  • Environment variable: DATABASE_URL
  • Format: postgres://user:password@host:port/database

Important Notes:

  • ⚠️ CRITICAL: Run migrations before starting: sqlx migrate run --database-url $DATABASE_URL
  • Use external managed PostgreSQL service for production (AWS RDS, Google Cloud SQL, etc.)
  • Ensure proper backup and high availability configuration

Redis

Required Version: Redis 6+

Configuration:

  • Environment variable: REDIS_URL
  • Format: redis://host:port or redis://user:password@host:port

Usage:

  • Session storage and authentication tokens
  • Module configuration caching
  • Temporary data (OAuth challenges, OTP codes)

RabbitMQ (AMQP)

Configuration:

  • Environment variable: MQ_URL
  • Format: amqp://user:password@host:port/

Usage:

  • Asynchronous task processing
  • Email sending queue
  • Inter-module communication

Configuration Templates

Development Environment

# .env file for development
WORK_MODE=grpc
DATABASE_URL=postgres://helium:password@localhost:5432/helium_dev
REDIS_URL=redis://localhost:6379
MQ_URL=amqp://guest:guest@localhost:5672/
LISTEN_ADDR=0.0.0.0:50051
RUST_LOG=debug

Production Environment

# Production environment variables
WORK_MODE=grpc
DATABASE_URL=postgres://helium_user:secure_password@db.example.com:5432/helium_prod
REDIS_URL=redis://redis.example.com:6379
MQ_URL=amqp://helium_user:secure_password@mq.example.com:5672/
LISTEN_ADDR=0.0.0.0:50051
RUST_LOG=info

Multi-Worker Deployment

For production, run multiple worker processes:

# API Server (can be scaled horizontally)
WORK_MODE=grpc ./helium-server &

# Background Tasks (can be scaled)
WORK_MODE=consumer ./helium-server &

# Email Processing (single instance recommended)
WORK_MODE=mailer ./helium-server &

# Scheduled Tasks (MUST be single instance)
WORK_MODE=cron_executor ./helium-server &
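The four worker commands above can be wrapped in a small supervisor script for hosts without an init system. This is a hedged sketch, not shipped tooling: HELIUM_BIN is an assumed variable, and in production you would normally let Kubernetes or systemd manage one process per mode instead.

```shell
#!/bin/sh
# Sketch: start one helium-server per worker mode and stop them as a group.
# HELIUM_BIN is an assumed variable; adjust the path for your deployment.
HELIUM_BIN="${HELIUM_BIN:-./helium-server}"

for mode in grpc consumer mailer cron_executor; do
  WORK_MODE="$mode" "$HELIUM_BIN" &
done

# Forward Ctrl-C / SIGTERM to every background worker, then wait for them.
trap 'kill 0' INT TERM
wait
```

Remember the constraint from the list above: cron_executor must never run as more than one instance, so do not run this script on multiple hosts at once.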

Configuration Management

To update module configurations:

  1. Via Database: Insert/update records in the application__config table
  2. Via Admin API: Use the management gRPC API to update configurations
  3. Configuration Sync: The system automatically syncs configurations from PostgreSQL to Redis cache

Example SQL for updating auth configuration:

INSERT INTO application__config (key, content)
VALUES ('auth', '{"jwt": {"secret": "new-secret"}, ...}')
ON CONFLICT (key) DO UPDATE SET
  content = EXCLUDED.content,
  updated_at = NOW();
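The same upsert can be driven from a shell pipeline. A hedged sketch: it assumes psql is installed and DATABASE_URL is set, and the config_json value here is a placeholder, not a complete auth document — validate the real document with helium-cli validate-config first.

```shell
# Apply an auth config upsert via psql (sketch; placeholder JSON only).
config_json='{"jwt": {"secret": "new-secret"}}'

# Guard so the snippet is a no-op where psql or the URL is unavailable.
if command -v psql >/dev/null 2>&1 && [ -n "$DATABASE_URL" ]; then
  psql "$DATABASE_URL" -v cfg="$config_json" <<'SQL'
INSERT INTO application__config (key, content)
VALUES ('auth', :'cfg')
ON CONFLICT (key) DO UPDATE SET
  content = EXCLUDED.content,
  updated_at = NOW();
SQL
fi
```

Passing the JSON through psql's -v variable and the :'cfg' quoting form avoids hand-escaping quotes inside the document.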

Security Considerations

⚠️ Critical Configuration Security:

  1. JWT Secrets: Use cryptographically secure random strings (32+ characters)
  2. VPN Server Token: Generate secure random tokens for server authentication
  3. Database Credentials: Use strong passwords and restrict database access
  4. SMTP Credentials: Use application-specific passwords, not primary account passwords
  5. OAuth Secrets: Keep OAuth client secrets secure and rotate them regularly

Troubleshooting

Common Configuration Issues

  1. Database Connection: Verify PostgreSQL accessibility and credentials
  2. Redis Connection: Check Redis server status and network connectivity
  3. RabbitMQ Connection: Ensure RabbitMQ server is running and accessible
  4. Email Delivery: Test SMTP configuration with your email provider
  5. OAuth Issues: Verify client IDs, secrets, and redirect URIs match provider settings

Validation Commands

# Test database connection
sqlx migrate info --database-url $DATABASE_URL

# Test Redis connection
redis-cli -u $REDIS_URL ping

# Test RabbitMQ connection
rabbitmqctl status  # on RabbitMQ server

Helium CLI

The Helium CLI (helium-cli) is a comprehensive administrative tool that allows operators to:

  • Initialize system configurations with default values
  • Manage admin accounts (create, list, view, delete)
  • Validate configuration files before deployment
  • Interact with both PostgreSQL database and Redis cache

Installation

The CLI is built as part of the main Helium project. After building the project:

cargo build --release --bin helium-cli

The binary will be available at target/release/helium-cli.

Global Configuration

The CLI requires database and Redis connections to function. These can be configured via:

Environment Variables

export DATABASE_URL="postgresql://user:password@localhost/helium"
export REDIS_URL="redis://localhost:6379"

Command Line Arguments

helium-cli --database-url "postgresql://user:password@localhost/helium" \
           --redis-url "redis://localhost:6379" \
           <command>

Verbose Logging

Enable detailed logging for troubleshooting:

helium-cli --verbose <command>

Skip Database Migrations

Skip automatic database migrations when connecting:

helium-cli --skip-migration <command>

This is useful when:

  • Migrations have already been run by another process
  • Running in a container where migrations are handled separately
  • Debugging database connection issues

Commands

Configuration Management

Initialize All Configurations

helium-cli init-config

This command initializes all system configurations with their default values. It:

  • Creates default configurations for all modules in the database
  • Updates Redis cache with the configurations
  • Handles the following configuration types:
    • Auth: Authentication and authorization settings
    • Admin JWT: JWT configuration for admin authentication
    • Telecom: Telecom service configurations
    • Shop: E-commerce and shop settings
    • Market: Affiliate and marketing configurations
    • Mailer: SMTP and email service settings

Example Output:

Initializing 6 configuration types...

Initializing Auth config... ✓ Success
Initializing Admin JWT config... ✓ Success
Initializing Telecom config... ✓ Success
Initializing Shop config... ✓ Success
Initializing Market/Affiliate config... ✓ Success
Initializing Mailer config... ✓ Success

Configuration initialization completed:
  ✓ Successful: 6

Use Cases:

  • Initial deployment setup
  • Resetting configurations to defaults
  • Disaster recovery scenarios

Validate Configuration Files

helium-cli validate-config --config-type <TYPE> <config-file.json>

Validates a JSON configuration file against the specified configuration schema.

Supported Configuration Types:

  • auth - Authentication configuration
  • admin-jwt / admin_jwt - Admin JWT configuration
  • telecom - Telecom service configuration
  • shop - Shop/e-commerce configuration
  • market / affiliate - Marketing/affiliate configuration
  • mailer - Email service configuration

Examples:

# Validate auth configuration
helium-cli validate-config --config-type auth auth-config.json

# Validate mailer configuration
helium-cli validate-config --config-type mailer smtp-config.json

Example Output:

✓ Configuration file is valid!
  File: auth-config.json
  Type: Auth
  Key: auth
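A typical pre-deployment loop writes the file and validates it in one step. A sketch, assuming helium-cli is on PATH (the guard skips validation where it is not installed); the SMTP values are placeholders:

```shell
# Write a mailer config, then validate it before uploading anywhere.
cat > smtp-config.json <<'EOF'
{
  "host": "smtp.example.com",
  "port": 587,
  "username": "mailer",
  "password": "change-me",
  "sender": "noreply@example.com",
  "starttls": true
}
EOF

# Only run the validator when the CLI is actually available here.
if command -v helium-cli >/dev/null 2>&1; then
  helium-cli validate-config --config-type mailer smtp-config.json
fi
```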

Admin Account Management

List Admin Accounts

helium-cli admin list [--limit <N>] [--offset <N>]

Lists all admin accounts with pagination support.

Options:

  • --limit <N> - Number of results to return (default: 50)
  • --offset <N> - Number of results to skip (default: 0)

Example:

# List first 10 admin accounts
helium-cli admin list --limit 10

# List admin accounts with pagination
helium-cli admin list --limit 25 --offset 50

Example Output:

Found 3 admin account(s):

ID                                   Role                 Name                           Email                          Created At
------------------------------------ -------------------- ------------------------------ ------------------------------ --------------------
123e4567-e89b-12d3-a456-426614174000 Super Admin          System Administrator           admin@example.com              2024-01-15T10:30:00Z
234e5678-e89b-12d3-a456-426614174001 Customer Support     Support Team Lead              support@example.com            2024-01-16T14:20:00Z
345e6789-e89b-12d3-a456-426614174002 Moderator            Content Moderator              moderator@example.com          2024-01-17T09:45:00Z

Show Admin Account Details

helium-cli admin show <ADMIN_ID>

Displays detailed information about a specific admin account.

Example:

helium-cli admin show 123e4567-e89b-12d3-a456-426614174000

Example Output:

Admin Account Details:
  ID: 123e4567-e89b-12d3-a456-426614174000
  Name: System Administrator
  Role: Super Admin
  Email: admin@example.com
  Avatar: https://example.com/avatar.jpg
  Created At: 2024-01-15T10:30:00Z

Create Admin Account

helium-cli admin create --name <NAME> --role <ROLE> [--email <EMAIL>] [--avatar <AVATAR_URL>]

Creates a new admin account with the specified details.

Required Options:

  • --name <NAME> - Display name for the admin
  • --role <ROLE> - Admin role (see roles below)

Optional Arguments:

  • --email <EMAIL> - Admin email address
  • --avatar <AVATAR_URL> - URL to admin avatar image

Available Roles:

  • super_admin / superadmin / super-admin - Full system access
  • moderator - Content moderation privileges
  • customer_support / customersupport / customer-support - Customer service access
  • support_bot / supportbot / support-bot - Automated support system access

Examples:

# Create super admin
helium-cli admin create \
  --name "System Administrator" \
  --role super_admin \
  --email "admin@example.com"

# Create customer support account
helium-cli admin create \
  --name "Support Agent" \
  --role customer_support \
  --email "support@example.com" \
  --avatar "https://example.com/avatars/support.jpg"

# Create moderator (minimal info)
helium-cli admin create \
  --name "Content Moderator" \
  --role moderator

Example Output:

Successfully created admin account:
  ID: 456e7890-e89b-12d3-a456-426614174003
  Name: System Administrator
  Role: Super Admin
  Email: admin@example.com
  Avatar: N/A

Delete Admin Account

helium-cli admin delete <ADMIN_ID> [--yes]

Deletes an admin account after confirmation.

Options:

  • --yes - Skip confirmation prompt (use with caution)

Examples:

# Delete with confirmation prompt
helium-cli admin delete 123e4567-e89b-12d3-a456-426614174000

# Delete without confirmation (automated scripts)
helium-cli admin delete 123e4567-e89b-12d3-a456-426614174000 --yes

Example Interactive Flow:

Admin account to delete:
  ID: 123e4567-e89b-12d3-a456-426614174000
  Name: Old Administrator
  Role: Super Admin
  Email: old-admin@example.com

Are you sure you want to delete this admin account? [y/N]: y
Successfully deleted admin account: 123e4567-e89b-12d3-a456-426614174000

Common Use Cases

Initial Deployment

  1. Set up environment variables:

    export DATABASE_URL="postgresql://helium:password@localhost/helium"
    export REDIS_URL="redis://localhost:6379"
    
  2. Initialize system configurations:

    helium-cli init-config
    
  3. Create initial super admin:

    helium-cli admin create \
      --name "System Administrator" \
      --role super_admin \
      --email "admin@yourcompany.com"
    

Configuration Management Workflow

  1. Prepare configuration file: Create a JSON file with your custom configuration.

  2. Validate before deployment:

    helium-cli validate-config --config-type auth ./configs/auth-config.json
    
  3. Deploy configuration: Use the web interface or API to upload the validated configuration.

Admin Account Maintenance

  1. Regular audit of admin accounts:

    helium-cli admin list --limit 100
    
  2. Create specialized support accounts:

    # Customer support team
    helium-cli admin create --name "Support Team A" --role customer_support
    
    # Content moderation team
    helium-cli admin create --name "Moderator Team B" --role moderator
    
  3. Remove inactive accounts:

    helium-cli admin delete <inactive-admin-id>
    

Error Handling

The CLI provides comprehensive error messages and logging:

  • Database Connection Issues: Check DATABASE_URL and database availability
  • Redis Connection Issues: Verify REDIS_URL and Redis service status
  • Configuration Validation Errors: Review JSON syntax and required fields
  • Admin Role Errors: Ensure role names match supported values exactly

Security Considerations

  1. Environment Variables: Use secure methods to set database credentials
  2. Admin Creation: Be selective with super_admin role assignments
  3. Account Deletion: Always verify admin identity before deletion
  4. Logging: Be aware that verbose mode may log sensitive information

Troubleshooting

Common Issues

“DATABASE_URL must be provided”

  • Set the DATABASE_URL environment variable or use --database-url flag

“Failed to connect to database”

  • Verify PostgreSQL is running and accessible
  • Check connection string format and credentials
  • Ensure the database exists

“Invalid admin role”

  • Use exact role names: super_admin, moderator, customer_support, support_bot
  • Role names are case-insensitive but must match supported variants

“Configuration validation failed”

  • Check JSON syntax with a JSON validator
  • Ensure all required fields are present
  • Verify field types match expected schema

Getting Help

Use the built-in help system:

# General help
helium-cli --help

# Command-specific help
helium-cli admin --help
helium-cli admin create --help

Integration with Deployment Scripts

The CLI is designed to work well in automated deployment scenarios:

#!/bin/bash
set -e

# Set environment
export DATABASE_URL="$HELIUM_DB_URL"
export REDIS_URL="$HELIUM_REDIS_URL"

# Initialize configurations
echo "Initializing Helium configurations..."
helium-cli init-config

# Create admin account (|| true tolerates the error if it already exists)
echo "Creating admin account..."
helium-cli admin create \
  --name "$ADMIN_NAME" \
  --role super_admin \
  --email "$ADMIN_EMAIL" || true

echo "Deployment initialization complete!"

This CLI tool is essential for proper Helium deployment and ongoing operational management. Use it as part of your deployment automation and regular maintenance procedures.

Migrate From SS-Panel UIM

This guide walks Helium operators through migrating an existing SS-Panel UIM deployment. The migration intentionally happens in two isolated passes so you can export data from the legacy MariaDB instance without touching the new Helium PostgreSQL database until you are ready.

At a high level:

  1. mariadb-pass reads all data from the SS-Panel MariaDB schema and saves it to a local rkyv archive.
  2. postgre-pass consumes that rkyv archive and writes normalized data into Helium’s PostgreSQL schema.

Because Helium normally targets PostgreSQL, the first pass uses a dedicated crate that bundles the MySQL client driver and builds separately from the rest of the project.

What Gets Migrated

The migration transfers the following SS-Panel data into Helium’s schema:

User Accounts

  • Email and password hashes (preserved as-is for seamless login)
  • User names and registration timestamps
  • Last active timestamps
  • Account balances (available balance for purchasing)
  • Referral relationships (affiliate ref_by links)
  • Traffic usage (upload/download totals)
  • VMess UUIDs (for node authentication)
  • Subscribe tokens (subscription links)
  • Invite codes (user-specific invite codes)

Helium creates corresponding entries in:

  • auth.user_account (login credentials and profile metadata)
  • shop.user_balance (financial data)
  • market.affiliate_user_policy (referral relationships)
  • telecom.user_nodes_token (node authentication tokens)

Products → Packages

SS-Panel products are converted to Helium packages with:

  • Package name
  • Price
  • Duration (time allowance in days)
  • Bandwidth quota

These populate the telecom.package table.

Orders → Package Queues

Historical purchase orders are replayed into Helium’s package queue system:

  • Order status (activated vs. pending)
  • Creation and update timestamps
  • Associated product/package

Orders are inserted into telecom.package_queue to preserve user entitlements and purchase history.

Nodes → Node Servers & Clients

SS-Panel nodes are split into two Helium entities:

  • Node servers (telecom.node_server): server address, rate, class
  • Node clients (telecom.node_client): protocol configurations (VMess, WebSocket, gRPC)

Each node’s custom configuration (ports, security, network transport) is normalized to Helium’s node client schema.

Data Not Migrated

The following SS-Panel data is not migrated:

  • Invoices (read but not written to Helium)
  • Payback records (read but not written)
  • Admin accounts (must be created manually via helium-cli)
  • System configurations (initialize via helium-cli init-config)
  • Announcements and tickets (start fresh in Helium)

Prerequisites

  • SS-Panel UIM running on MariaDB (or MySQL-compatible) that you can access in read-only mode during export.
  • A ready Helium PostgreSQL database with migrations applied and no production users yet. Run sqlx migrate run before importing.
  • Adequate disk space wherever you write the rkyv archive. Expect several hundred megabytes for large installs.
  • Rust toolchain (same as Helium) and network access to both databases from the machine performing the migration.
  • Optional: a safe location (e.g., object storage) to back up the generated rkyv file.

Pass 1 – Export From SS-Panel (MariaDB)

The exporter lives in ssp-migrator/mariadb-pass and is compiled with SQLx’s MySQL feature set. Build and run it separately from the main server binaries.

Build the exporter

mariadb-pass uses SQLx’s compile-time query checking. The workspace ships with .sqlx caches for PostgreSQL only, so generic commands such as cargo build --release -p mariadb-pass will fail. You must compile from the crate directory with access to a live SS-Panel database (or export SQLx metadata for MariaDB manually).

cd ssp-migrator/mariadb-pass
SQLX_OFFLINE=false DATABASE_URL="$SSP_DATABASE_URL" cargo build --release

The DATABASE_URL environment variable is required during compilation so SQLx can introspect the MariaDB schema. If you cannot open a direct connection from the build host, generate SQLx data offline with sqlx prepare against MariaDB and commit it alongside the crate before building.

Prepare connection settings

You can pass the database URL directly on the command line or export it as an environment variable. A typical MariaDB connection string looks like:

export SSP_DATABASE_URL="mysql://user:password@legacy-host:3306/sspanel"

Run the exporter

cd ssp-migrator/mariadb-pass
SQLX_OFFLINE=false DATABASE_URL="$SSP_DATABASE_URL" cargo run --release -- \
  --database-url "$SSP_DATABASE_URL" \
  --output-file /tmp/helium-migration.rkyv

The command performs several steps internally:

  • Streams each SS-Panel entity (users, products, orders, nodes, etc.) in batches.
  • Normalizes relationships to Helium’s intermediate structs.
  • Serializes the result to an rkyv archive (default name migration_data.rkyv).

Monitor the logs for warnings about rows that cannot be converted. The exporter skips invalid records but continues processing.

When the run finishes you should have an archive file similar to /tmp/helium-migration.rkyv. Back it up before moving on.
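Recording a checksum alongside the backup lets you catch a corrupted copy before pass 2 consumes it. A small sketch (the path matches the example above; the guard makes it a no-op if the export has not run yet):

```shell
# Checksum the export archive so the backup can be verified later
# with `sha256sum -c`.
archive="/tmp/helium-migration.rkyv"

if [ -f "$archive" ]; then
  sha256sum "$archive" | tee "$archive.sha256"
else
  echo "archive not found: $archive" >&2
fi
```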

Pass 2 – Import Into Helium (PostgreSQL)

The importer lives in ssp-migrator/postgre-pass and understands Helium’s canonical schema. Ensure the target PostgreSQL database is empty or freshly provisioned to avoid collisions.

Build the importer

cargo build --release -p postgre-pass

This binary only links the PostgreSQL driver, so it compiles with the same workspace settings as other Helium components.

Prepare connection settings

export HELIUM_DATABASE_URL="postgres://helium:password@new-host:5432/helium_db"

Run the importer

cargo run --release -p postgre-pass -- \
  --rkyv-file /tmp/helium-migration.rkyv \
  --database-url "$HELIUM_DATABASE_URL"

The importer performs conversions aligned with Helium’s modules:

  • Inserts node servers and clients in the correct dependency order.
  • Creates packages, affiliate policies, balances, and user accounts.
  • Replays historical purchases into the package queue so users retain entitlements.

If anything fails, no partial state is left behind: each insert group is committed atomically in dependency order. Fix the reported data issue, rebuild the rkyv archive if necessary, and rerun the importer.

Post-migration Checklist

  • Confirm the importer logs Migration completed successfully.
  • Inspect a handful of migrated users in Helium’s admin tools (profiles, balances, active packages).
  • Verify node configurations in telecom match the expected SS-Panel node inventory.
  • Rotate user credentials if required by your migration policy (password hashes are imported as-is).
  • Schedule DNS cutover and client config updates after validating the new deployment.

Troubleshooting

  • MariaDB TLS or authentication errors: confirm the MariaDB driver accepts your certificates or append parameters (e.g., ?ssl-mode=REQUIRED).
  • Missing subscribe links or invite codes: the exporter requires these tables to be populated for each user. Reconcile data in SS-Panel before exporting.
  • Importer stops on unique constraint violations: verify the PostgreSQL database is clean. Drop and recreate the schema, then rerun the importer.
  • Large datasets: run the exporter on a machine close to the database to reduce latency. You can copy the resulting rkyv file to the environment where the importer runs.

With both passes complete, Helium now has a faithful copy of the SS-Panel data and you can proceed with normal deployment and cutover activities.