Helium
Helium is a modern commercial VPN SaaS system built with Rust, focusing on scalability, security, and user-friendliness.
Features
- Kubernetes/Docker native: stateless, horizontally scalable, and easy to deploy.
- High security: no shell execution, no deserialization vulnerabilities, and no SQL injection.
- Pluggable frontend: a fully featured gRPC API makes it easy to build your own frontend.
- Lightweight: as little as 40 MB of memory per service, handling 1000+ requests per second on a single 1-core CPU server.
- Advanced selling system: handles complex business strategies and is designed to scale to user bases in the billions.
Tech Stack
- Rust: Memory-safe systems programming with C-level performance
- gRPC + Tonic: High-performance API with type-safe contracts
- PostgreSQL + SQLx: Reliable database with compile-time query validation
- Redis: Fast in-memory caching and session storage
- AMQP: Reliable message queuing for microservices
- Tokio: Async runtime for handling thousands of concurrent connections
Key Advantages:
- Microservices architecture with independent scaling
- Container-native design for Kubernetes deployment
- Memory safety eliminates entire classes of security vulnerabilities
- Exceptional performance: 1000+ RPS on single-core CPU with 40MB memory usage
Microservices Architecture
Helium is built as a collection of focused microservices that cooperate through a shared set of contracts, messaging patterns, and observability tooling. This section introduces the high-level layout of the system, explains how the services interact, and highlights the infrastructure choices that enable the platform to scale for large commercial VPN deployments.
Architectural Goals
- Independent scaling – Each service can be deployed and scaled based on its workload characteristics (API traffic, background jobs, email throughput, etc.).
- Clear boundaries – Services expose well-defined APIs (gRPC, REST, AMQP events) and depend on shared libraries for cross-cutting concerns, ensuring that business logic remains isolated inside its module.
- Operational resiliency – Stateless services, database connection pooling, and message queues allow resilient deployments with graceful failure handling.
- Security by design – Rust, strict processor patterns, and zero shared mutable state within processes prevent memory safety issues and accidental privilege escalations.
Service Topology
The helium-server crate can run in multiple worker modes. Each mode is
packaged into its own container image or deployment unit, providing a natural
microservice boundary while reusing the same codebase and shared libraries.
| Worker | Entry Point | Responsibilities |
|---|---|---|
| grpc | GrpcWorker | Exposes gRPC APIs for all business domains (Auth, Manage, Telecom, Market, Shop, Support, etc.). Performs request validation, invokes the corresponding module services, and emits events. |
| subscribe_api | SubscribeApiWorker | Provides REST endpoints optimized for subscription clients. Primarily a read-heavy facade backed by Redis caching and the service layer. |
| webhook_api | WebHookApiWorker | Receives payment gateway callbacks and external partner webhooks, normalizes payloads, and dispatches workflow events. |
| consumer | ConsumerWorker | Listens on AMQP queues for asynchronous jobs (billing, node updates, provisioning) emitted by other services. Orchestrates long-running tasks that should not block API responses. |
| mailer | MailerWorker | Specialized consumer responsible for templated email delivery, retry management, and transactional messaging. |
| cron_executor | CronWorker | Periodically scans for scheduled work (subscription renewals, quota resets, health checks) and dispatches jobs via the same service layer used by the API workers. |
These workers are deployed independently and scaled according to throughput requirements. For example, a busy billing period can scale the consumer and cron workers without affecting the gRPC API footprint.
Domain-Oriented Modules
Each domain (Auth, Manage, Telecom, Shop, Market, Notification, Support, Shield,
Mailer) is implemented as an independent module under modules/. Modules follow
a common layout (entities, services, rpc, hooks, events) as described in the
Project Structure Guide. Within the microservices
architecture:
- Modules provide service layer processors that encapsulate business logic.
- RPC layers expose the processors through gRPC servers. The `GrpcWorker` aggregates these services and mounts them behind a single TLS termination point, while keeping module ownership intact.
- Hooks and events enable cross-module interactions without tight coupling, allowing, for instance, the Telecom module to emit usage events consumed by the billing logic in the Manage module.
Communication Patterns
Helium combines synchronous APIs with asynchronous messaging to balance latency and resiliency.
gRPC Contract
- Tonic-generated servers provide strongly typed interfaces for customer-facing and operator APIs.
- A uniform Processor trait ensures every RPC delegate is testable in isolation and can be reused by background workers.
- Service discovery is handled at the infrastructure layer (Kubernetes or Docker Compose) because workers are stateless; clients load-balance using standard mechanisms (Envoy, NGINX, etc.).
REST Facades
- Subscription and webhook workers expose lightweight REST routes via Axum.
- REST APIs reuse the same service processors, ensuring identical business behavior across protocols and simplifying versioning.
Asynchronous Messaging
- RabbitMQ (AMQP) is used to propagate domain events and dispatch background jobs.
- Producers append metadata (correlation IDs, tenant identifiers) to support observability and reliable retries.
- Consumers acknowledge messages only after successful processing, preventing data loss during failures.
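To make the acknowledge-after-processing rule concrete, here is a minimal consumer sketch. It assumes the `lapin` and `futures-lite` crates and a hypothetical `billing.jobs` queue and `handle_job` function; the real consumer workers route jobs through the shared service layer instead.

```rust
// Minimal AMQP consumer sketch (assumes the `lapin` and `futures-lite` crates).
// The queue name, consumer tag, and `handle_job` are hypothetical.
use futures_lite::StreamExt;
use lapin::{options::*, types::FieldTable, Connection, ConnectionProperties};

async fn run_consumer(amqp_url: &str) -> Result<(), Box<dyn std::error::Error>> {
    let conn = Connection::connect(amqp_url, ConnectionProperties::default()).await?;
    let channel = conn.create_channel().await?;
    let mut consumer = channel
        .basic_consume(
            "billing.jobs",
            "helium-consumer",
            BasicConsumeOptions::default(),
            FieldTable::default(),
        )
        .await?;

    while let Some(delivery) = consumer.next().await {
        let delivery = delivery?;
        // Process first, acknowledge only on success: a crash mid-processing
        // leaves the message unacknowledged so the broker can redeliver it.
        match handle_job(&delivery.data).await {
            Ok(()) => delivery.ack(BasicAckOptions::default()).await?,
            Err(_) => {
                delivery
                    .nack(BasicNackOptions { requeue: true, ..Default::default() })
                    .await?
            }
        }
    }
    Ok(())
}

async fn handle_job(_payload: &[u8]) -> Result<(), Box<dyn std::error::Error>> {
    Ok(()) // placeholder for the real job handler
}
```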
Data Management
- PostgreSQL is the system of record. SQLx is used through the `DatabaseProcessor` abstraction to keep SQL isolated inside `entities/` modules and to support compile-time query checking.
- Redis provides ephemeral caches, session storage, and rate limiting. The `RedisConnection` wrapper from `helium-framework` manages pooled connections shared by API and worker processes.
- Consistent migrations live in the top-level `migrations/` directory and are applied during deployment. Services run with zero shared mutable state; all coordination happens through the database or message queues.
Observability & Operations
- Tracing is initialized in every worker with structured logs and span annotations. This enables distributed tracing across API and background workloads when combined with log collectors.
- Metrics exporters (e.g., Prometheus integration) can be attached at the deployment layer because each worker exposes a predictable Axum/Tonic server endpoint.
- Health probes: gRPC and REST workers perform dependency checks on startup (database, Redis, AMQP). Container orchestrators can use readiness/liveness probes to restart unhealthy instances.
Deployment Model
- Workers are packaged as lightweight containers (<50MB RSS) and designed to be horizontally scalable. Scaling policies are set per worker depending on CPU or queue length metrics.
- Configuration is provided through environment variables (`DATABASE_URL`, `MQ_URL`, `REDIS_URL`, `WORK_MODE`, etc.), making the platform 12-factor compliant; a configuration-loading sketch follows this list.
- Infrastructure typically consists of:
- Kubernetes/Docker orchestrating the worker deployments
- Managed PostgreSQL and Redis services
- RabbitMQ cluster for messaging
- Optional CDN or reverse proxy terminating TLS before forwarding requests to gRPC/REST workers.
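As a reference for the environment-driven configuration mentioned above, a minimal loading sketch; the `WorkerConfig` struct is hypothetical, only the variable names come from this document.

```rust
// Minimal 12-factor style configuration loading; `WorkerConfig` is illustrative.
use std::env;

struct WorkerConfig {
    database_url: String,
    mq_url: String,
    redis_url: String,
    work_mode: String,
}

fn load_config() -> Result<WorkerConfig, env::VarError> {
    Ok(WorkerConfig {
        database_url: env::var("DATABASE_URL")?,
        mq_url: env::var("MQ_URL")?,
        redis_url: env::var("REDIS_URL")?,
        work_mode: env::var("WORK_MODE")?, // e.g. "grpc", "consumer", "mailer"
    })
}
```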
Extensibility
Adding a new capability follows a repeatable pattern:
- Create or extend a module under `modules/` with the Processor-based service implementation.
- Expose the functionality via RPC/REST by wiring the service into the relevant worker.
- Emit domain events or enqueue background jobs when work must be processed asynchronously.
- Deploy the updated worker image; other workers continue functioning without redeployment because contracts are versioned explicitly.
This approach keeps Helium maintainable while providing the flexibility to grow with complex VPN SaaS requirements.
Helium Project Structure Guide
This document describes the modular architecture and organization of the Helium VPN SaaS system.
Project Overview
Helium is a modern VPN SaaS system built with Rust, organized as a workspace with multiple modules. The system follows a modular architecture where each module represents a specific business domain.
Module Architecture
Each module follows a consistent internal structure with standardized components:
1. Entity Layer (entities/)
Purpose: Data models and database access patterns
Structure:
entities/
├── mod.rs # Module exports
├── db/ # Database entity processors
│ ├── mod.rs
│ ├── user_account.rs # User account queries/commands
│ └── ...
└── redis/ # Redis entity processors
├── mod.rs
├── session.rs # Session cache operations
└── ...
Key Patterns:
- Implements `Processor<Input, Result<Output, sqlx::Error>>` for `DatabaseProcessor` (see the sketch after this list)
- Contains all SQL queries and database operations
- Separated by storage backend (db/ for PostgreSQL, redis/ for Redis)
- No business logic - pure data access
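The sketch below illustrates the entity-processor shape described above. The `Processor`/`DatabaseProcessor` signatures, the `pool()` accessor, and the table layout are assumptions for illustration; the real kanau trait and Helium entities may differ.

```rust
// Entity-processor sketch; types, signatures, and schema are illustrative only.
use uuid::Uuid;

/// Query input: look up one user account by ID.
pub struct FindUserAccountById {
    pub id: Uuid,
}

#[derive(Debug, sqlx::FromRow)]
pub struct UserAccountRow {
    pub id: Uuid,
    pub is_banned: bool,
}

impl Processor<FindUserAccountById, Result<Option<UserAccountRow>, sqlx::Error>> for DatabaseProcessor {
    async fn process(&self, input: FindUserAccountById) -> Result<Option<UserAccountRow>, sqlx::Error> {
        // Pure data access: a single query, no business rules.
        sqlx::query_as::<_, UserAccountRow>(
            r#"SELECT id, is_banned FROM auth."user" WHERE id = $1"#,
        )
        .bind(input.id)
        .fetch_optional(self.pool()) // `pool()` accessor is hypothetical
        .await
    }
}
```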
2. Service Layer (services/)
Purpose: Business logic orchestration and workflows
Structure:
services/
├── mod.rs # Service exports
├── manage.rs # Management operations
├── user_profile.rs # User profile management
└── ...
Key Patterns:
- Implements `Processor<Input, Result<Output, Error>>` for service operations
- Orchestrates multiple entity operations
- Handles validation, transformation, and business rules
- No direct SQL - delegates to entity processors
- Uses `DatabaseProcessor` for data access
Example:
#[derive(Clone)]
pub struct UserManageService {
pub db: sqlx::PgPool,
}
impl Processor<ListUsersRequest, Result<ListUsersResponse, Error>> for UserManageService {
async fn process(&self, input: ListUsersRequest) -> Result<ListUsersResponse, Error> {
let db = DatabaseProcessor::from_pool(self.db.clone());
let users = db.process(ListUsers { ... }).await?;
Ok(ListUsersResponse { users })
}
}
3. gRPC Layer (rpc/)
Purpose: gRPC service implementations and external API
Structure:
rpc/
├── mod.rs # RPC exports
├── auth_service.rs # Authentication gRPC service
├── manage_service.rs # Management gRPC service
├── middleware.rs # gRPC middleware
└── ...
Key Patterns:
- Implements generated gRPC trait definitions
- Converts protobuf messages to service DTOs
- Delegates to the service layer via `Processor::process`
- Handles authentication and authorization
4. Hook System (hooks/)
Purpose: Event-driven side effects and integrations
Structure:
hooks/
├── mod.rs # Hook exports
├── billing.rs # Billing event hooks
├── register.rs # Registration hooks
└── ...
Key Patterns:
- Responds to domain events
- Handles cross-module integrations
- Implements side effects (notifications, external API calls)
- Decoupled from main business flows
5. Event System (events/)
Purpose: Domain event definitions and publishing
Structure:
events/
├── mod.rs # Event exports
├── user.rs # User-related events
├── order.rs # Order events
└── ...
Key Patterns:
- Defines domain events using message queue integration
- Publishes events for cross-module communication
- Enables audit trails and analytics
- Supports eventual consistency patterns
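For illustration, a hedged sketch of defining and publishing a domain event over AMQP. The event type, exchange, and routing key are hypothetical, and the real modules publish through the shared message-queue helpers rather than a raw channel.

```rust
// Hypothetical domain event published over AMQP (assumes `lapin`, `serde`,
// `serde_json`, and `uuid`); the exchange and routing key are illustrative.
use lapin::{options::BasicPublishOptions, BasicProperties, Channel};
use serde::Serialize;
use uuid::Uuid;

#[derive(Debug, Serialize)]
pub struct UserRegisteredEvent {
    pub user_id: Uuid,
    pub registered_at: i64, // unix timestamp
}

pub async fn publish_user_registered(
    channel: &Channel,
    event: &UserRegisteredEvent,
) -> Result<(), Box<dyn std::error::Error>> {
    let payload = serde_json::to_vec(event)?;
    channel
        .basic_publish(
            "helium.events",        // hypothetical exchange
            "auth.user.registered", // hypothetical routing key
            BasicPublishOptions::default(),
            &payload,
            BasicProperties::default(),
        )
        .await? // publish accepted by the client
        .await?; // wait for the broker's publisher confirm (if enabled)
    Ok(())
}
```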
6. API Layer (api/)
Purpose: REST API endpoints and HTTP handlers
Structure:
api/
├── mod.rs # API exports
├── subscribe.rs # Subscription endpoints
└── xrayr/ # XrayR integration APIs
├── mod.rs
└── ...
Key Patterns:
- Implements REST endpoints using Axum
- Handles HTTP-specific concerns (parsing, serialization)
- Delegates to service layer
- Provides alternative to gRPC for specific use cases
7. Cron Jobs (cron.rs)
Purpose: Scheduled tasks and background jobs
Key Patterns:
- Implements periodic maintenance tasks
- Handles cleanup operations
- Manages recurring billing cycles
- Executes system health checks
8. Testing (tests/)
Purpose: Integration and unit tests
Structure:
tests/
├── common/ # Test utilities
│ └── mod.rs # Common test setup
├── service_name_test.rs # Service integration tests
└── ...
Key Patterns:
- Integration tests for complete workflows
- Uses testcontainers for database testing
- Isolated test environments
- Comprehensive service testing
Module Configuration
Dependencies (Cargo.toml)
Each module declares:
- Workspace dependencies (shared versions)
- Inter-module dependencies
- Module-specific dependencies
- Dev dependencies for testing
- Build dependencies (typically `tonic-prost-build` for gRPC)
Build Configuration (build.rs)
Most modules include build scripts for:
- gRPC code generation from proto files
- Custom compilation steps
- Environment-specific builds
Module Entry Point (lib.rs)
Standard module structure:
#![forbid(clippy::unwrap_used)]
#![forbid(unsafe_code)]
#![deny(clippy::expect_used)]
#![deny(clippy::panic)]
pub mod config;
pub mod cron;
pub mod entities;
pub mod events;
pub mod hooks; // Optional
pub mod api; // Optional
pub mod rpc;
pub mod services;
Protocol Buffers (proto/)
Organization: Organized by module with consistent naming:
proto/
├── auth/
│ ├── auth.proto # Core auth services
│ ├── account.proto # Account management
│ └── manage.proto # Admin operations
├── telecom/
│ ├── telecom.proto # VPN services
│ └── manage.proto # Telecom management
└── ...
Patterns:
- Service definitions mirror module structure
- Consistent message naming conventions
- Shared types in common proto files
Key Architectural Principles
1. Processor Pattern
All APIs use the kanau::processor::Processor trait for consistent interfaces and composability.
2. Separation of Concerns
- Entities: Data access only
- Services: Business logic only
- RPC/API: Protocol handling only
- Events/Hooks: Side effects only
3. Database Abstraction
Services never contain raw SQL - all database access goes through entity processors.
4. Event-Driven Architecture
Modules communicate via events to maintain loose coupling.
5. RBAC and Audit
Administrative operations implement consistent role-based access control and audit logging.
Development Guidelines
Adding a New Module
- Create module directory under `modules/`
- Add basic `Cargo.toml` with workspace dependencies
- Create `src/lib.rs` with standard module structure
- Add module to workspace `Cargo.toml`
- Create proto definitions if gRPC services needed
- Implement entities → services → rpc layers in order
Testing Strategy
- Unit tests for complex business logic in services
- Integration tests in the `tests/` directory
- Use `testcontainer-helium-modules` for database tests
- Mock external dependencies
- Test error handling paths
Documentation Standards
- Document all public APIs
- Include examples for complex workflows
- Maintain this guide as modules evolve
- Document breaking changes in module changelogs
This modular architecture enables independent development, testing, and deployment of features while maintaining system coherence through standardized patterns and interfaces.
Admin Manage Module
The Admin Manage module provides the operational control plane for Helium. It brings together the tooling that internal staff need to configure commercial VPN offerings, supervise subscriber lifecycle tasks, and monitor the reliability of the distributed network. While customer-facing applications interact with the public APIs, the Admin Manage module focuses on privileged workflows such as policy curation, partner management, and sensitive account intervention.
At a glance, the module enables administrators to:
- Onboard new business units, partners, and reseller organizations.
- Provision and maintain privileged operator accounts with fine-grained access scopes.
- Configure catalog data (plans, bundles, promotions) that the market and shop domains surface to end customers.
- Oversee subscriber management, including suspension, KYC verification, and support escalations.
- Inspect operational telemetry generated by other Helium modules to triage issues quickly.
The Manage module is intentionally integrated with the platform’s observability, billing, and identity services. By housing these capabilities in one place, Helium ensures that administrative actions respect the same audit and security guarantees enforced across the rest of the microservices architecture.
Admin Account System
The Manage service implements a purpose-built administrator directory that is separate from customer identities. It stores control-plane operators, delegated tenant managers, and read-only auditors together with the access credentials required to call privileged APIs.
Account Personas and Records
Every administrator entry stores an immutable id, display name, granted
role, and optional contact metadata. Platform operators run the Helium cloud
and may assume any tenant context. Tenant administrators belong to a single
customer tenant and can invite peers. Auditors observe configuration and
compliance state without mutation rights. Each record retains lifecycle
timestamps (creation, last login, invitation usage) so governance reports can be
produced without touching runtime logs.
Registration Workflow
Administrator onboarding is handled by invitation and implemented inside
AdminAuthService:
- Invitation lookup – The service receives a `RegisterAdmin` command and uses `FindAdminInvitationByToken` to retrieve the invitation that seeded the registration. Missing invitations resolve to `RegisterAdminResult::InvalidInvitation`.
- Usage guard – If the invitation was already consumed (`invitation.used`), the service short-circuits with `RegisterAdminResult::InvitationUsed` to stop replayed activation links.
- Account creation – A new UUID is minted and passed to `CreateAdminAccount` together with the invite’s role and the operator-supplied profile fields. This persists the administrator row and binds it to the permission set determined during invitation issuance.
- API key provisioning – `generate_admin_token` produces an opaque API key (passkey). The token plus key label (`key_name`) are stored through `CreateAdminToken`, ensuring future logins can authenticate against the database record.
- Invitation finalization – `UseAdminInvitation` marks the invitation as used so the one-time link cannot be replayed.
- Response – The service returns `RegisterAdminResult::Success` with the freshly created `admin_id` and the plaintext API token. The caller is responsible for presenting that key securely to the new administrator; it is never persisted in plaintext elsewhere.
This flow guarantees that registration can only occur with a valid invitation and that each administrator starts with at least one API credential for future logins.
Login Workflow
Subsequent logins exchange the stored API token for a short-lived JWT access credential:
- API key lookup – `AdminAuthService` receives an `AdminLogin` command containing the submitted API token that was minted during registration (or from a later `CreateAdminToken` issuance). It invokes `FindAdminPasskeyByToken` to resolve the underlying admin key row stored in `admin.admin_key`. Absent or revoked tokens yield `AdminLoginResult::KeyNotFound`.
- Account hydration – With the key resolved, the service loads the administrator profile through `FindAdminById`. Missing accounts are treated as a failed login to avoid leaking information about deleted users.
- JWT configuration – The service clones its Redis connection and calls `find_config_from_redis::<AdminJwtConfig>` to load the current signing material, issuer, audience, and expiration settings.
- Claim assembly – Using `AdminJwtClaims`, the service sets `sub` to the admin ID, `name` and `role` to the profile details, and timestamps (`iat`, `exp`) based on `OffsetDateTime::now_utc()` plus the configured TTL.
- Token issuance – The encoded claims are signed using the JWT encoder from `AdminJwtConfig::encode()`. On success the service returns `AdminLoginResult::Success(AdminAccessToken)`; failures bubble up as framework errors.
The caller receives only the access token string. Subsequent Manage API calls
must include it (for example, as a Bearer token) so workers can authorize the
request.
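A condensed sketch of the claim assembly and signing step, assuming the `jsonwebtoken` crate. The real `AdminJwtClaims`/`AdminJwtConfig` types live in the Manage module and may differ in field names and algorithm.

```rust
// Illustrative claim assembly and signing (assumes `jsonwebtoken`, `time`, `uuid`).
use jsonwebtoken::{encode, EncodingKey, Header};
use serde::Serialize;
use time::{Duration, OffsetDateTime};
use uuid::Uuid;

#[derive(Serialize)]
struct AdminJwtClaims {
    sub: Uuid,    // admin ID
    name: String, // display name
    role: String, // lower-cased role, e.g. "super_admin"
    iss: String,
    aud: String,
    iat: i64,
    exp: i64,
}

fn issue_admin_token(
    admin_id: Uuid,
    name: &str,
    role: &str,
    issuer: &str,
    audience: &str,
    ttl: Duration,
    secret: &[u8],
) -> Result<String, jsonwebtoken::errors::Error> {
    let now = OffsetDateTime::now_utc();
    let claims = AdminJwtClaims {
        sub: admin_id,
        name: name.to_owned(),
        role: role.to_lowercase(),
        iss: issuer.to_owned(),
        aud: audience.to_owned(),
        iat: now.unix_timestamp(),
        exp: (now + ttl).unix_timestamp(),
    };
    // Default header means HS256; the real config may select a different algorithm.
    encode(&Header::default(), &claims, &EncodingKey::from_secret(secret))
}
```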
Access Token Semantics
The issued JWT embeds the administrator’s role in lower case (via
role.to_jwt_string().to_lowercase()), the issuer/audience pair, and the expiry
chosen by configuration. Manage workers validate tokens on every request using
the same Redis-backed configuration, checking signature validity and time-based
claims before resolving RBAC permissions. Because API tokens act as a second
factor, operators are encouraged to rotate them regularly; new keys can be
created through the same CreateAdminToken processor while old keys are revoked
out-of-band.
Account Lifecycle
Beyond registration and login, Manage provides tools for invitation management, account suspension, and archival. Suspended accounts retain their historical record but cannot authenticate until reinstated. Archival removes active credentials yet preserves audit context, ensuring the administrator directory remains authoritative for compliance reporting.
Role-based Access Control
The Manage module implements role-based access control (RBAC) with a compact, code-first model. Every administrative API call flows through an authorization check that compares the caller’s stored role with a whitelist embedded in the operation being executed. This section documents the concrete design so other modules can plug into the same pattern.
Admin roles
Administrator accounts persist a strongly-typed role in the
admin.admin_account table. The role enum is defined in Rust as
AdminRole with four variants: SuperAdmin, Moderator,
CustomerSupport, and SupportBot. Each variant describes the maximum
authority an operator can have, and the enum provides helpers for serialising
the value into JWT claims:
#[derive(Debug, Clone, Copy, PartialEq, Eq, sqlx::Type, Serialize)]
#[serde(rename_all = "snake_case")]
#[sqlx(type_name = "admin.admin_role", rename_all = "snake_case")]
pub enum AdminRole {
/// The super admin is the highest level of admin. Super admin can use all manage APIs.
SuperAdmin,
/// The moderator is a lower level of admin. Moderator can use all non-sensitive APIs.
Moderator,
/// The customer support is a lower level of admin. Customer support can only access user management APIs.
CustomerSupport,
/// The support bot can access most of non-sensitive read APIs, but cannot access any write APIs.
SupportBot,
}
impl AdminRole {
pub fn to_jwt_string(&self) -> String {
match self {
AdminRole::SuperAdmin => "super_admin".to_string(),
AdminRole::Moderator => "moderator".to_string(),
AdminRole::CustomerSupport => "customer_support".to_string(),
AdminRole::SupportBot => "support_bot".to_string(),
}
}
}
The current Manage APIs are conservative: every write path and most read paths
are restricted to SuperAdmin. That choice is encoded directly in the service
layer and can be relaxed by expanding the allowed-role lists as more granular
policies are introduced. For example, here are some typical operation implementations:
#[derive(Debug, Clone, PartialEq, Eq, Serialize)]
pub struct CreateInvite {
pub inviter_id: Uuid,
pub role: AdminRole,
}
impl AdminOperation for CreateInvite {
const ALLOWED_ROLES: &'static [AdminRole] = &[AdminRole::SuperAdmin];
const OPERATION_NAME: &'static str = "create_invite";
const OPERATION_TARGET: &'static str = "admin";
fn to_audit_log(&self) -> Result<String, serde_json::Error> {
// ...
}
}
#[derive(Debug, Clone, PartialEq, Eq, Serialize)]
pub struct ChangeRole {
pub admin_id: Uuid,
pub role: AdminRole,
}
impl AdminOperation for ChangeRole {
const ALLOWED_ROLES: &'static [AdminRole] = &[AdminRole::SuperAdmin];
const OPERATION_NAME: &'static str = "change_role";
const OPERATION_TARGET: &'static str = "admin";
fn to_audit_log(&self) -> Result<String, serde_json::Error> {
// ...
}
}
#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize)]
pub struct ListAdmins {
pub limit: i64,
pub offset: i64,
}
impl AdminOperation for ListAdmins {
const ALLOWED_ROLES: &'static [AdminRole] = &[AdminRole::SuperAdmin];
const OPERATION_NAME: &'static str = "list_admins";
const OPERATION_TARGET: &'static str = "admin";
fn to_audit_log(&self) -> Result<String, serde_json::Error> {
// ...
}
}
Operation gating
Each RPC-facing action is modelled as a struct (for example CreateInvite or
ListAdmins) that implements the AdminOperation trait. The trait requires the
operation to publish three constants—ALLOWED_ROLES, OPERATION_NAME, and
OPERATION_TARGET—and a method for serialising audit metadata. The
ALLOWED_ROLES array is the crucial RBAC rule: it declares which AdminRole
values may invoke the action:
pub trait AdminOperation {
const ALLOWED_ROLES: &'static [AdminRole];
const OPERATION_NAME: &'static str;
const OPERATION_TARGET: &'static str;
fn check_permission(rule: AdminRole) -> bool {
// implemented function
}
fn with_admin_id(self, admin_id: Uuid) -> RecordedAdminOperation<Self>
where
Self: Sized,
{
// implemented function
}
// required function
fn to_audit_log(&self) -> Result<String, serde_json::Error>;
}
Requests are wrapped in an AuditLayer before they reach the underlying
service. The layer looks up the caller’s account, verifies that the stored role
is present in ALLOWED_ROLES, records an audit entry when appropriate, and only
then dispatches the call to the service implementation. Any mismatch results in
an immediate PermissionsDenied error without touching the business logic.
This keeps enforcement centralised and guarantees that logging and RBAC stay in
sync:
#[derive(Debug, Clone)]
pub struct AuditLayer {
// private fields
}
impl AuditLayer {
pub fn new(database_processor: DatabaseProcessor) -> Self {
// implemented function
}
// Main audited wrapper
async fn wrap<Oper, Output, Proc>(
&self,
processor: &Proc,
input: RecordedAdminOperation<Oper>,
) -> Result<Output, Error>
where
Oper: AdminOperation + Send,
Proc: Processor<RecordedAdminOperation<Oper>, Result<Output, Error>> + Send + Sync,
{
// implemented function
}
}
Reusing the RBAC pattern
Other Manage submodules—or even external crates—should follow the same contract when adding administrative features:
- Define a command object for the new action and implement `AdminOperation` on it. Select the minimal role set necessary for the task and provide a concise audit payload via `to_audit_log`.
- Process the command through `AuditLayer::wrap` (or `wrap_without_record` if the action should skip audit logging) so that permission checks and audit persistence always run.
- When exposing the command through gRPC or HTTP, ensure the request handler obtains the caller’s ID from the authentication middleware and forwards it by calling `.with_admin_id(...)` on the operation before handing it to the service.
By embedding role checks inside the operation type instead of scattering them through handler logic, the Manage module keeps RBAC auditable, testable, and easy to extend. Future modules can adopt finer-grained policies simply by expanding the enum or splitting operations with different allowed-role sets without having to rework middleware.
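To make the checklist concrete, a sketch of a hypothetical new operation wired through the audit layer. The operation, target, and `TelecomManageService` are illustrative; only the `AdminOperation`/`AuditLayer` shapes come from the code shown above.

```rust
// Hypothetical operation following the RBAC pattern; names are illustrative.
#[derive(Debug, Clone, PartialEq, Eq, Serialize)]
pub struct RotateNodeKey {
    pub node_id: i64,
}

impl AdminOperation for RotateNodeKey {
    const ALLOWED_ROLES: &'static [AdminRole] = &[AdminRole::SuperAdmin];
    const OPERATION_NAME: &'static str = "rotate_node_key";
    const OPERATION_TARGET: &'static str = "telecom";

    fn to_audit_log(&self) -> Result<String, serde_json::Error> {
        serde_json::to_string(self)
    }
}

// In the RPC handler: attach the caller, then run through the audit layer so the
// permission check and the audit record cannot be skipped.
async fn rotate_node_key(
    audit: &AuditLayer,
    service: &TelecomManageService, // hypothetical processor for this operation
    admin_id: Uuid,
    node_id: i64,
) -> Result<(), Error> {
    let op = RotateNodeKey { node_id }.with_admin_id(admin_id);
    audit.wrap(service, op).await
}
```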
Authentication & Audit
The Manage service authenticates every administrative RPC call and records the privileged work that survives RBAC checks. This document captures the moving pieces so existing contributors remember how the plumbing fits together and new contributors can reuse it correctly.
Access token issuance
AdminAuthService is responsible for exchanging an API key for a signed access
JWT. The AdminLogin processor performs the following steps:
- Look up the submitted API key with `FindAdminPasskeyByToken`. Unknown keys immediately return `AdminLoginResult::KeyNotFound`.
- Resolve the owning administrator via `FindAdminById`. Keys referencing a removed account are treated as unknown.
- Pull `AdminJwtConfig` from Redis using `find_config_from_redis`. The config bundle provides the HMAC/EdDSA encoder plus default issuer, audience, and expiry settings.
- Build `AdminJwtClaims` from the admin profile (ID, display name, role) and the timestamps computed from `OffsetDateTime::now_utc()`.
- Sign the JWT with the encoder returned by the config and wrap the token in `AdminAccessToken` for the RPC response.
Every Manage client must attach the returned JWT to subsequent requests using
x-admin-authorization. No refresh token logic exists—clients repeat the login
flow when the token expires.
Authentication middleware
The RPC server mounts AdminAuthLayer (see rpc::middleware) on every handler
that requires an authenticated admin. The middleware:
- Loads `AdminJwtConfig` from Redis (identical to the login flow) so it can reuse the configured decoder.
- Extracts `x-admin-authorization` from the incoming headers and validates the JWT. Invalid or missing tokens result in `Status::unauthenticated` once the gRPC method executes.
- Stores the authenticated administrator ID in the request extensions as `AdminId` so downstream handlers can fetch it with `AdminId::from_request(&Request)`.
When you add a new RPC surface, ensure the router is wrapped with
AdminAuthLayer::new(redis.clone()) and read the admin identifier from the
extensions instead of parsing the header manually. This keeps every handler in
sync with the central decoding logic and JWT configuration source.
Auditing and RBAC enforcement
AuditLayer wires authentication into authorization and durable audit trails.
Operations that mutate state are modelled as RecordedAdminOperation<T> where
T: AdminOperation describes the action being performed. Wrapping a processor
with the layer performs the following steps:
- Reload the administrator from the database (`FindAdminById`). Requests referencing a deleted admin fail with `Error::PermissionsDenied`.
- Emit tracing fields (`admin_name`, `admin_role`) for observability.
- Check whether the admin role satisfies the static allow-list declared on the operation type via `AdminOperation::check_permission`. Permission failures short-circuit the call with `Error::PermissionsDenied`.
- If permitted, transform the operation into an audit payload with `to_audit_log()` and persist it using `AddAuditLog`.
- Execute the wrapped processor and log success or failure.
Use AuditLayer::wrap_without_record when you only need the permission check
(for example, read-only operations). Otherwise prefer wrap so the audit table
stays authoritative.
Checklist for new handlers
- Accept the admin context by calling `AdminId::from_request` in the RPC entry point.
- Build the corresponding operation type that implements `AdminOperation`.
- Wrap the service processor with `AuditLayer::wrap` (or `wrap_without_record` for read endpoints).
- Propagate `Error::PermissionsDenied` back to the client untouched so callers see a clear 403-style error.
Following these conventions ensures every manage RPC reuses the same JWT validation, role checks, and audit recording logic.
Auth Module
The auth module owns every touch point around user identity: registration, login, and session lifecycle. When you need to change how accounts are created, how tokens are minted or validated, or how third‑party logins work, this is the place to start.
Overview
- Config (`config.rs`) – Centralizes runtime configuration for email rules, JWT token parameters, and OAuth providers. Look here when wiring new environment variables or tuning token TTLs.
- Entities (`entities/`) – Typed models for Redis and database records used during authentication, such as session IDs and magic links. Extending persistence schemas happens here.
- Services (`services/`) – Stateless services that implement registration flows, session issuance, and password utilities. Application code should depend on these instead of hand-rolling auth logic.
- RPC (`rpc/`) – Public gRPC/HTTP endpoints that expose authentication capabilities to other services. If you are adding a new feature, start by defining or updating the RPCs.
- OAuth (`oauth/`) – Provider integrations, OpenID helpers, and challenge storage. Any new provider or OpenID tweak belongs here.
- Cron (`cron.rs`) – Housekeeping jobs that prune expired challenges, OTPs, and sessions. Whenever you add a new time-bound artifact, ensure a cleanup job exists.
- Hooks & Events (`hooks/`, `events/`) – Event emission and subscriber glue for cross-module reactions (e.g., notifying other modules about new signups).
- Password (`password.rs`) – Hashing strategy and password policy helpers. Adjust this when requirements change.
Typical Extension Workflow
- Start with configuration – Introduce config structs or fields in `config.rs`, then surface them through `ConfigProvider` so deployments can set them.
- Update domain logic – Modify the relevant service in `services/` to implement the new behaviour. Use existing entity types or create new ones inside `entities/`.
- Expose interfaces – Adjust RPC handlers under `rpc/` (and optionally `events/` or `hooks/`) so callers can access the new functionality.
- Keep maintenance in mind – Schedule cleanups in `cron.rs` and emit events where downstream consumers expect them.
Usage Notes
- Always reuse the JWT helpers in `config::JwtConfig` when issuing tokens so validation rules stay consistent.
- When introducing a new OAuth provider, add its configuration to `config.rs`, implement the client under `oauth/`, and register it through the provider registry.
- Prefer high-level service APIs for authentication operations inside other modules; they encapsulate hashing, validation, and side effects.
- Tests live under `modules/auth/tests/`. Mirror the high-level flows there to keep regressions visible.
Keep this document in sync with structural changes so future maintainers know where to find the pieces they need.
User Account System
Core concepts
- User authentication record – `UserAuthAccount` is the canonical row in `auth.user`. It tracks the account UUID, ban flag, registration timestamp, and whether two-factor is enabled, while letting the same identity be accessed through multiple login surfaces:
#[derive(Debug, Clone, Copy, PartialEq, Eq, sqlx::FromRow)]
/// The core entity of user authentication.
///
/// This is the top-level entity that represents a user's authentication account.
/// It contains the user's ID, whether they are banned, and the date they registered.
///
/// The user can have multiple way to login, such as email, OAuth.
pub struct UserAuthAccount {
pub id: Uuid,
pub is_banned: bool,
pub registered_at: time::PrimitiveDateTime,
pub two_factor_enabled: bool,
}
- Profile vs. login surface – `UserProfile` keeps mutable presentation data (name, picture, marketing email, group membership, MFA flag) separate from credentials; the email stored here is not automatically a login method:
#[derive(Debug, Clone, PartialEq, Eq, sqlx::FromRow)]
/// The profile of a user.
pub struct UserProfile {
pub id: Uuid,
pub name: Option<String>,
pub picture: Option<String>,
/// The email address will be used for notification and marketing.
///
/// For the address used for authentication or security, refer to the EmailAccount entity.
pub email: Option<String>,
pub created_at: time::PrimitiveDateTime,
pub updated_at: time::PrimitiveDateTime,
/// User's group determines what production can be shown to the user.
pub user_group: i32,
/// User's extra groups are used to determine what production can be shown to the user.
/// Extra group is for private production.
pub user_extra_groups: Vec<i32>,
/// Whether MFA is enabled for the user
pub mfa_enabled: bool,
}
Dedicated tables hold actual login credentials: password-backed EmailAccount rows link an address and password hash to the user:
#[derive(Clone, PartialEq, Eq, sqlx::FromRow, Zeroize, ZeroizeOnDrop)]
pub struct EmailAccount {
pub id: i64,
pub email: String,
pub password_hash: CompactString,
pub user_id: Uuid,
}
Each OAuth connection lives in OAuthAccount with provider metadata and timestamps:
#[derive(Debug, Clone, PartialEq, Eq, sqlx::FromRow)]
/// The OAuth account of a user.
///
/// One user can have multiple OAuth accounts.
pub struct OAuthAccount {
pub id: i64,
pub user_id: Uuid,
pub provider_name: OAuthProviderName,
pub provider_user_id: String,
pub registered_at: PrimitiveDateTime,
pub token_updated_at: PrimitiveDateTime,
}
- Events and audits – Account binding/unbinding and security changes emit AMQP events (see `AccountBindEvent`, `AccountUnbindEvent`, `PasswordResetEvent`, etc.) so downstream systems can react or log activity. When you add new surfaces, remember to publish the appropriate events.
Account data layout
When a user registers through email, RegisterEmailAccount creates the auth.user row, seeds a profile, and stores the hashed password in auth.email_account:
#[derive(Clone, PartialEq, Eq, Zeroize, ZeroizeOnDrop)]
pub struct RegisterEmailAccount {
pub email: String,
pub password_hash: CompactString,
pub user_group: i32,
}
pub enum RegisterEmailAccountResult {
Success {
user_id: Uuid,
email_account_id: i64,
},
EmailAlreadyExists,
}
OAuth registrations follow the same pattern via RegisterOAuthAccount, inserting the profile with provider metadata before creating the first OAuth credential row:
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RegisterOAuthAccount {
pub provider_name: OAuthProviderName,
pub provider_user_id: String,
pub email: Option<String>,
pub name: Option<String>,
pub picture: Option<String>,
pub user_group: i32,
}
The account service layers aggregate these shards on demand. UserManageService::process(ShowUserDetail) joins profile, auth flags, email login (if any), OAuth logins, and whether TOTP exists so dashboards can render a full state snapshot.
Whenever you extend the schema, double-check that:
- `CountUserLoginMethods` continues to report the real number of usable login paths (it currently sums email + OAuth rows).
- Removal flows (user-facing and admin) still guard against deleting the last login method.
- Admin-facing DTOs (`UserDetailResponse`, `UserSummary`) expose whatever additional surface you add for operations tooling.
Login flows and user self-service
Email/password login
EmailProviderService::process(EmailLogin) performs credential lookup, constant-time password verification (dummy hash fallback), and MFA evaluation before minting access/refresh JWTs through SessionService::CreateSession and emitting a UserLoginEvent for analytics:
#[derive(Clone, PartialEq, Eq)]
pub struct EmailLogin {
pub email: String,
pub password: String,
pub mfa: Option<MfaMethod>,
pub ip: Option<IpAddr>,
pub user_agent: Option<String>,
}
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum EmailLoginResult {
Success(AccessToken, RefreshToken),
WrongCredential,
RequireMfa,
MfaFailed,
NotFound,
}
The session creation process stores refresh tokens in Redis and generates JWT access tokens:
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct CreateSession {
pub user_id: Uuid,
pub login_method: LoginMethod,
pub ip: Option<std::net::IpAddr>,
pub user_agent: Option<String>,
}
Expect a RequireMfa or MfaFailed result when MFA is toggled on and the client omits or fails verification.
OAuth login
OAuthProviderService::process(OAuthLogin) validates the state challenge stored in Redis, exchanges the provider code for tokens, fetches user info, and either looks up or registers an OAuthAccount. New registrations populate profile defaults and raise UserRegisterEvent. Successful logins create a session tagged with the provider for downstream attribution:
#[derive(Debug, Clone)]
pub struct OAuthLogin {
pub provider_name: OAuthProviderName,
pub code: String,
pub state: Uuid,
pub ip: Option<std::net::IpAddr>,
pub user_agent: Option<String>,
}
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum OAuthLoginResult {
LoggedIn(AccessToken, RefreshToken),
InvalidState,
ProviderMismatch,
ProviderError(String),
UserRegistered(AccessToken, RefreshToken),
}
Managing login methods
Users can bind extra surfaces only after entering sudo mode. Email-based sudo tokens are issued via MfaService and verified by both EmailProviderService and OAuthProviderService before binding, changing passwords, or unlinking methods.
For removal, EmailProviderService::RemoveEmailAccount and OAuthProviderService::RemoveOAuthAccount ensure at least one login method remains, delete the credential row, and fire an AccountUnbindEvent so audit logs stay complete.
Those flows back the gRPC UserAccountService endpoints that power user settings; when adding a new method wire it through the same guardrails.
Security hardening
- MFA & sudo mode – `MfaService` supports TOTP and email OTP verification, toggles MFA on the profile, and issues short-lived sudo tokens cached in Redis. Any destructive credential change validates a sudo token first:
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum MfaMethod {
Totp { code: u32 },
Email { email: String, code: String },
}
- Password lifecycle – Password resets validate email links, hash the new password with the configured algorithm, terminate every active session, and broadcast a `PasswordResetEvent`. See the Password Reset Flow guide for detailed information about the reset process and APIs. Explicit password changes follow the same hashing path and sudo check.
- Session management – The session service stores refresh tokens in Redis keyed by `SessionId`, issues JWTs from config, writes login events, and can terminate an entire user’s session set on demand (used after password reset or by security tooling).
- Constant-time credential checks – Email login performs dummy verifications when the address is missing to minimize timing leaks (a sketch of the technique follows this list), and password hashes are never logged thanks to explicit redaction in debug output (as shown in the `EmailAccount` debug implementation above).
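A minimal sketch of the dummy-verification technique, assuming the `argon2` crate; the real password module may use a different algorithm, parameters, or structure.

```rust
// Dummy-hash fallback sketch (assumes the `argon2` crate); illustrative only.
use argon2::{
    password_hash::{rand_core::OsRng, PasswordHash, PasswordHasher, PasswordVerifier, SaltString},
    Argon2,
};
use std::sync::OnceLock;

/// Hash of a throwaway password, computed once, so the "account not found" path
/// still performs a realistic verification.
fn dummy_hash() -> &'static str {
    static DUMMY: OnceLock<String> = OnceLock::new();
    DUMMY
        .get_or_init(|| {
            let salt = SaltString::generate(&mut OsRng);
            Argon2::default()
                .hash_password(b"dummy-password-for-timing", &salt)
                .map(|h| h.to_string())
                .unwrap_or_default()
        })
        .as_str()
}

/// Verify a candidate password while doing comparable work whether or not the
/// account exists, so response timing does not reveal registered addresses.
fn verify_with_dummy(stored_hash: Option<&str>, candidate: &str) -> bool {
    let account_exists = stored_hash.is_some();
    let hash_str = stored_hash.unwrap_or_else(dummy_hash);
    let verified = PasswordHash::new(hash_str)
        .map(|parsed| {
            Argon2::default()
                .verify_password(candidate.as_bytes(), &parsed)
                .is_ok()
        })
        .unwrap_or(false);
    verified && account_exists
}
```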
Administrative operations
UserManageService exposes RBAC-protected processors for customer support and moderation:
- Counting users in configurable time windows, optionally excluding banned accounts
- Listing users with filters on email, group, ban status, and registration time, plus aggregate OAuth provider names for quick scanning
- Showing detailed account state (profile, auth flags, login methods, MFA) for a specific user (as shown in the `ShowUserDetail` implementation above)
- Removing login methods while keeping at least one usable path, editing profile basics, banning/unbanning, and forcibly removing TOTP when users are locked out
All administrative operations follow the RBAC pattern established in the Manage module, where each operation implements AdminOperation with role restrictions and audit logging:
impl AdminOperation for RemoveLoginMethodRequest {
const ALLOWED_ROLES: &'static [AdminRole] = &[AdminRole::SuperAdmin, AdminRole::Moderator];
const OPERATION_NAME: &'static str = "remove_login_method";
const OPERATION_TARGET: &'static str = "user_account";
fn to_audit_log(&self) -> Result<String, serde_json::Error> {
// ...
}
}
Whenever you add a new credential type or security lever, update these processors plus the protobufs exposed by UserAccountService so both admins and end users can inspect and manage the new surface.
Session Management
What counts as a session?
A session is the Redis record keyed by a SessionId UUID that maps a logged-in user to the metadata needed to mint tokens and audit activity:
pub struct Session {
pub id: SessionId,
pub user_id: Uuid,
pub terminated: bool,
pub last_refreshed: u64,
}
The record stores the owning user, whether it has been terminated, and the last time it was refreshed so we can expire idle sessions without touching the database. Bulk operations rely on the companion UserSessions index, which keeps the list of session IDs per user so ListUserSessions and TerminateAllSessions can enumerate them without scanning SQL tables.
Dual-token model
Each session issues two JWTs with different audiences and lifetimes:
pub struct JwtConfig {
pub secret: CompactString,
pub refresh_token_expiration: time::Duration,
pub access_token_expiration: time::Duration,
pub issuer: CompactString,
pub access_audience: CompactString,
pub refresh_audience: CompactString,
}
- Access token – Short-lived bearer token for regular APIs guarded by `UserAuthLayer`. It embeds the user ID (`sub`) and session ID (`sid`) and expires per `access_token_expiration` in the configuration.
- Refresh token – Long-lived token scoped to refreshing or terminating the session. Its audience differs from the access token and it carries a longer `exp`, typically thirty days by default.
Both tokens are minted when SessionService::process(CreateSession) is called. The refresh expiration is also used as the TTL for the session key in Redis, ensuring Redis evicts the record once the refresh token is no longer valid:
pub async fn verify_refresh_token(
&self,
refresh_token: &str,
) -> Result<Option<SessionId>, Error> {
let config = self.load_config().await?;
let decode = config.jwt.refresh_token_decoder();
let session_id = decode(refresh_token).ok().map(|c| c.claims.sid);
Ok(session_id.map(SessionId))
}
The refresh token ID is verified via SessionService::verify_refresh_token before any privileged session action proceeds, keeping refresh operations isolated from the access token path.
Lifecycle
Creation
Login and registration flows call SessionService::CreateSession after authentication succeeds. The service allocates a fresh session UUID, stores the Redis record with a TTL derived from the refresh lifetime, emits a UserLoginEvent for downstream consumers, and returns both JWTs:
async fn process(&self, input: CreateSession) -> Result<(AccessToken, RefreshToken), Error> {
let session_id = Uuid::new_v4();
let now = time::OffsetDateTime::now_utc().unix_timestamp() as u64;
let config = self.load_config().await?;
let session = Session {
id: SessionId(session_id),
user_id: input.user_id,
terminated: false,
last_refreshed: now,
};
// Store with TTL equal to refresh token expiration
Session::write_kv_with_ttl(&mut redis, SessionId(session_id), session, refresh_token_expiration).await?;
let access = config.jwt.generate_access_token(input.user_id, SessionId(session_id))?;
let refresh = config.jwt.generate_refresh_token(input.user_id, SessionId(session_id))?;
// Emit login event for downstream consumers
UserLoginEvent { ... }.send(&self.mq).await?;
Ok((access, refresh))
}
OAuth and email providers share this path so every login surface behaves consistently.
Refresh
Clients invoke the RefreshSession RPC with the refresh token in the x-refresh-token header. The server decodes the session ID from the token, reloads the Redis record, rejects terminated or missing sessions, enforces the inactivity timeout, and then rewrites the record with an updated timestamp before minting a new access/refresh pair:
pub const REFRESH_TOKEN_HEADER: &str = "x-refresh-token";
async fn process(&self, input: RefreshSession) -> Result<SessionRefreshResult, Error> {
let Some(mut session) = Session::read(&mut redis, SessionId(input.session_id)).await? else {
return Ok(SessionRefreshResult::NotFound);
};
if session.terminated {
return Ok(SessionRefreshResult::Terminated);
}
let last_refreshed = time::OffsetDateTime::from_unix_timestamp(session.last_refreshed as i64);
let now = time::OffsetDateTime::now_utc();
if now - last_refreshed > refresh_expiration {
return Ok(SessionRefreshResult::Expired);
}
session.last_refreshed = now.unix_timestamp() as u64;
Session::write_kv(&mut redis, SessionId(input.session_id), session).await?;
let access = config.jwt.generate_access_token(user_id, SessionId(input.session_id))?;
let refresh = config.jwt.generate_refresh_token(user_id, SessionId(input.session_id))?;
Ok(SessionRefreshResult::Refreshed(access, refresh))
}
Refreshes never accept the access token, and the refresh token is not honored by UserAuthLayer, so each token stays in its intended lane.
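A sketch of how the audience split keeps each token in its lane, assuming the `jsonwebtoken` crate; the claim struct and config values are illustrative, not the module's actual types.

```rust
// Audience separation sketch (assumes `jsonwebtoken`, `serde`, `uuid`).
use jsonwebtoken::{decode, Algorithm, DecodingKey, Validation};
use serde::Deserialize;
use uuid::Uuid;

#[derive(Deserialize)]
struct UserJwtClaims {
    sub: Uuid,
    sid: Uuid,
    aud: String,
    exp: i64,
}

/// A refresh token presented to the access-token decoder fails validation
/// because its `aud` claim does not match the access audience.
fn decode_access_token(token: &str, secret: &[u8], access_audience: &str) -> Option<UserJwtClaims> {
    let mut validation = Validation::new(Algorithm::HS256);
    validation.set_audience(&[access_audience]); // rejects refresh-audience tokens
    decode::<UserJwtClaims>(token, &DecodingKey::from_secret(secret), &validation)
        .ok()
        .map(|data| data.claims)
}
```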
Expiration and revocation
Sessions can disappear through several channels:
- Access token expiration – The JWT validator in `UserAuthLayer` simply rejects expired access tokens, forcing the client to refresh with a valid refresh token:
pub const ACCESS_TOKEN_HEADER: &str = "x-user-authorization";
async fn user_auth(metadata: &HeaderMap, mut redis: RedisConnection) -> Result<UserId, Status> {
let header = metadata
.get(ACCESS_TOKEN_HEADER)
.and_then(|h| h.to_str().ok())
.ok_or(Status::unauthenticated("Missing authorization header"))?;
let config = find_config_from_redis::<AuthConfig>(&mut redis).await?;
let decode = config.jwt.decoder();
let jwt_claims = decode(header)
.map_err(|_| Status::unauthenticated("Invalid authorization header"))?
.claims;
Ok(UserId(jwt_claims.sub))
}
- Refresh inactivity timeout – `SessionService::process(RefreshSession)` returns `Expired` once the elapsed time since `last_refreshed` exceeds `refresh_token_expiration`, preventing resurrection of idle sessions.
- Redis TTL – Because the session key is stored with a TTL equal to the refresh lifetime, Redis will evict it automatically even if the refresh path never runs again.
- Scheduled cleanup – The `SessionCleanupJob` cron scans remaining session keys, deleting any whose `last_refreshed` predates the configured expiration window to catch edge cases where TTLs were extended or missing.
- Manual termination – Users can call `UserAccount::TerminateSession` with a refresh token to flag a session as terminated; future refresh attempts see the `Terminated` status and refuse to mint tokens. Password resets also invoke `TerminateAllSessions` so compromised credentials cannot keep a foothold.
Administrative levers
Operations tooling can directly reuse SessionService::TerminateAllSessions to purge a user’s active logins, and the password-reset flow demonstrates how to hook that processor after security-sensitive events. Beyond the cron cleanup job, there is currently no dedicated admin RPC that lists or manages sessions; support dashboards should wire into the Redis-backed processors (ListUserSessions, TerminateSession, TerminateAllSessions) when that capability is required.
Token-free user APIs
Only the gRPC services mounted without UserAuthLayer skip access-token checks. These “entry” endpoints live on UserAuth and cover registration, login, password resets, OAuth challenges, and session refresh; everything else, including the UserAccount service, is wrapped by UserAuthLayer and requires the access token in the x-user-authorization header. The refresh token is still required via metadata when calling RefreshSession or TerminateSession, but it never unlocks general-purpose APIs.
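For reference, a small client-side sketch that attaches the access token header to an outgoing request using `tonic`; the header name comes from this document, everything else is illustrative.

```rust
// Attach the user access token to an outgoing gRPC request (assumes `tonic`).
use tonic::metadata::{errors::InvalidMetadataValue, MetadataValue};
use tonic::Request;

fn with_user_token<T>(
    mut request: Request<T>,
    access_token: &str,
) -> Result<Request<T>, InvalidMetadataValue> {
    let value: MetadataValue<_> = access_token.parse()?;
    // Same header name that UserAuthLayer reads on the server side.
    request.metadata_mut().insert("x-user-authorization", value);
    Ok(request)
}
```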
Register Flow
This document explains how the email-based registration pipeline in the auth module is implemented and what a frontend client must do to integrate with it. The flow is deliberately split into two RPCs: one to issue a magic link by email and another to finalize the account creation. The sections below describe the data contracts, validation rules, and expected UX behavior at each step so that UI engineers can wire the screens without re-reading the Rust implementation.
Step 1. Request a registration email
Use the UserAuth.SendRegisterEmail RPC with the prospective user’s email address and optional referral code.
| Field | Notes |
|---|---|
| email | Raw email string entered by the user. |
| referral_code | Optional referral/invitation code. Will be passed to Step 2 via URL query. |
Backend behavior
- The service first validates the domain against the configurable whitelist/blacklist. Requests to disallowed domains return `INVALID_EMAIL` immediately:
#[derive(Debug, Clone, Serialize, Deserialize, Default)]
pub struct EmailDomainConfig {
pub enable_white_list: bool,
pub white_list: Box<[CompactString]>,
pub enable_black_list: bool,
pub black_list: Box<[CompactString]>,
}
impl EmailDomainConfig {
pub fn check_addr(&self, addr: impl AsRef<str>) -> bool {
// ...
}
}
- If the email already maps to an existing login, the call returns `EMAIL_EXISTS` so the UI can direct the user to the login or password reset flow.
- When rate limiting is triggered (same address requested again before `resend_interval` elapses), the server still returns `SENT` but quietly suppresses a duplicate email. Surface a neutral “email sent” toast to avoid leaking account existence. The default `resend_interval` is 30 seconds.
- For a fresh request, the service creates a magic-link record containing a 32-character `auth_key`, queues an email via RabbitMQ, and responds with `SENT`:
#[derive(Debug, Clone, PartialEq, Eq, sqlx::FromRow)]
pub struct EmailVerifyLink {
pub id: i64,
pub email: String,
pub auth_key: String,
pub send_at: PrimitiveDateTime,
pub reason: EmailVerifyReason,
pub user_id: Option<Uuid>,
pub is_unused: bool,
}
const AUTH_KEY_LENGTH: usize = 32;
pub fn generate_auth_key() -> String {
// ...
}
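One possible implementation of the elided `generate_auth_key` helper, assuming the `rand` crate; the real generator may choose a different alphabet or RNG, so treat this as illustrative only.

```rust
// Illustrative only: a 32-character alphanumeric key using the `rand` crate.
use rand::{distributions::Alphanumeric, Rng};

const AUTH_KEY_LENGTH: usize = 32;

pub fn generate_auth_key() -> String {
    rand::thread_rng()
        .sample_iter(&Alphanumeric)
        .take(AUTH_KEY_LENGTH)
        .map(char::from)
        .collect()
}
```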
Frontend expectations
- Present an email field, an optional referral code field, and an action button. On submission, call `SendRegisterEmail` and branch on the enum:
  - `SENT`: Show success UI (“Check your inbox for a magic link”) and inform the user to click the link in their email. The magic link will include the `auth_key` and the `referral_code` (if provided) as URL query parameters, automatically directing them to Step 2.
  - `INVALID_EMAIL`: Highlight the field with a validation error.
  - `EMAIL_EXISTS`: Offer links to sign in or reset password.
- Because the backend may skip resending, add a visible countdown using the configured `resend_interval` (default 30 seconds) before enabling a “Resend” button.
- The email template contains a clickable magic link that embeds the `auth_key` in the URL. When clicked, it should route the user directly to the registration completion page (Step 2) with the token and referral code automatically populated from URL query parameters.
Step 2. Complete registration via magic link
When the user clicks the magic link from the email, they are directed to the registration completion page. The URL will contain:
- `auth_key` – extracted from the URL query parameter; automatically validates the user’s email.
- `referral_code` – passed through from Step 1 via URL query parameter (if provided).
The magic link is time-limited. Backend enforcement uses `link_expire_after` (default 5 minutes). After that, registration attempts fail with `INVALID_LINK`.
On the frontend, the registration completion page should:
- Extract the `auth_key` from the URL query parameters (this proves the user has access to the email).
- Extract the `referral_code` from the URL query parameters (if present, this was provided in Step 1).
- Display a form to collect:
- A password that satisfies the platform’s policy (policy enforcement happens upstream in the password module).
- Display the referral code if present (read-only or hidden field).
- A checkbox “Keep me signed in” that toggles auto-login.
Step 3. Finalize registration
Call UserAuth.RegisterUser with the collected data:
pub struct RegisterUser {
pub auth_key: String,
pub password: String,
pub referral_code: Option<String>,
pub auto_login: bool,
pub ip: Option<IpAddr>,
pub user_agent: Option<String>,
}
| Field | Required | Notes |
|---|---|---|
| auth_key | ✓ | Magic link token from URL query parameter. Each token is single-use and tied to the email. |
| password | ✓ | Plain password; hashing happens server-side. |
| referral_code | | Optional marketing code from URL query parameter (passed through from Step 1), forwarded unchanged. |
| auto_login | ✓ | When true, the backend issues access & refresh tokens on success. |
| ip | | Optional. If the frontend can detect the client’s public IP (e.g., via an API gateway), pass it for session metadata. |
| user_agent | | Optional string captured from the browser; used for session/device history. |
Response handling
RegisterUserReply can return four shapes based on the service result:
pub enum RegisterUserResult {
Registered(Uuid),
RegisteredWithSession(Uuid, AccessToken, RefreshToken),
EmailAlreadyExists,
InvalidLink,
}
- `REGISTERED_WITH_SESSION`: Registration succeeded and a new session was created. The reply contains the `user_id` plus `access_token` and `refresh_token`. Store them immediately using the same rules as a login response, then route to the signed-in area.
- `REGISTERED`: Registration succeeded but no session was created (because `auto_login` was false). Route to the login screen and preload the email for convenience.
- gRPC error `ALREADY_EXISTS`: The email was registered after the magic link was issued. Show an “already registered” message and link to sign-in.
- `INVALID_LINK`: The magic link was unknown, already used, or expired. Inform the user that the link is invalid/expired and offer to send a new registration email.
When auto-login is enabled, the backend records the session with the supplied IP and user agent before returning tokens, so capturing accurate metadata is important for the session list and security analytics.
Retrying after failures
If the magic link expires or has already been used, require the user to restart from Step 1 and request a new registration email. Once a link has been consumed, it cannot be retried.
UI checklist
- Provide separate screens for Step 1 (email + referral code submission) and Step 2 (password entry after clicking magic link).
- In Step 1, include both an email field and an optional referral code field.
- Show a visible countdown for magic link expiration and resend availability (based on config defaults; make them configurable via environment/UI constants).
- In Step 2, extract both `auth_key` and `referral_code` from URL query parameters automatically; no manual token entry is needed.
- Display the referral code (if present) on the Step 2 page for user confirmation, either as a read-only field or hidden input.
- Ensure all success paths funnel to analytics hooks alongside the `user_id` returned from the RPC for downstream tracking.
- When auto-login is disabled, clearly direct the user to the login screen after successful registration.
This flow mirrors the backend implementation and should keep the frontend in sync with the server-side invariants without re-reading the Rust code every time changes are made.
Password Reset Flow
This document explains how the password reset pipeline in the auth module is implemented and what a frontend client must do to integrate with it. The flow is split into three RPCs: one to request a password reset email with a magic link, one to validate the reset token from the magic link, and one to finalize the password reset. The sections below describe the data contracts, validation rules, and expected UX behavior at each step so that UI engineers can wire the screens without re-reading the Rust implementation.
Step 1. Request a password reset email
Use the UserAuth.SendPasswordResetEmail RPC with the user’s email address.
| Field | Notes |
|---|---|
| email | Raw email string entered by the user. |
Backend behavior
- The service first validates the email format. Requests with invalid email addresses return `INVALID_EMAIL` immediately.
- If the email doesn't map to any existing account with email login, the call returns `NOT_FOUND` so the UI can inform the user appropriately.
- When rate limiting is triggered (same address requested again before `resend_interval` elapses, default 30 seconds), the server returns `TOO_FREQUENT`. The frontend should display a message asking the user to wait before requesting another reset email.
- For a valid request, the service creates a password reset link record containing a 32-character `auth_key`, queues an email via RabbitMQ, and responds with `SENT`:
#[derive(Debug, Clone, PartialEq, Eq, sqlx::FromRow)]
pub struct EmailVerifyLink {
pub id: i64,
pub email: String,
pub auth_key: String,
pub send_at: PrimitiveDateTime,
pub reason: EmailVerifyReason,
pub user_id: Option<Uuid>,
pub is_unused: bool,
}
const AUTH_KEY_LENGTH: usize = 32;
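The key generation itself is internal to the auth module; as an illustrative sketch only, a 32-character alphanumeric key could be produced with the `rand` crate like this (not necessarily the module's exact implementation):

```rust
use rand::{distributions::Alphanumeric, Rng};

/// Illustrative only: produce a 32-character alphanumeric key, matching the
/// documented AUTH_KEY_LENGTH. The auth module's real generator may differ.
fn generate_auth_key() -> String {
    rand::thread_rng()
        .sample_iter(&Alphanumeric)
        .take(32) // AUTH_KEY_LENGTH
        .map(char::from)
        .collect()
}
```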
Frontend expectations
- Present a single email field and an action button. On submission, call `SendPasswordResetEmail` and branch on the enum:
  - `PWD_RESET_EMAIL_RESULT_SENT`: Show success UI ("Check your inbox for a password reset link") and inform the user to click the magic link in their email.
  - `PWD_RESET_EMAIL_RESULT_INVALID_EMAIL`: Highlight the field with a validation error.
  - `PWD_RESET_EMAIL_RESULT_NOT_FOUND`: Inform the user that no account exists with this email address.
  - `PWD_RESET_EMAIL_RESULT_TOO_FREQUENT`: Display a message asking the user to wait before requesting another reset email (default 30 seconds).
- Add a visible countdown using the configured `resend_interval` (default 30 seconds) before enabling a "Resend" button.
- The email template contains a clickable magic link that embeds the `auth_key` in the URL. When clicked, it should route the user directly to the password reset page (Step 2/3) with the token automatically populated from URL query parameters.
Step 2. Validate the reset token (optional but recommended)
Use the UserAuth.CheckPasswordResetToken RPC to validate the token before allowing the user to enter a new password. This step is optional but provides better user experience by catching expired or invalid tokens early.
| Field | Notes |
|---|---|
| token | The 32-character auth_key extracted from the URL query parameter |
Backend behavior
- The service looks up the token in the database and validates:
  - The token exists
  - The token is for password reset (not registration or other purposes)
  - The token hasn't been used yet
  - The token hasn't expired (default expiry is 5 minutes, controlled by `link_expire_after`)
- If all validations pass, it returns `valid: true` and the `expire_at` timestamp (Unix timestamp in seconds).
- If any validation fails, it returns `valid: false` with no expiration time.
pub struct CheckPasswordResetToken {
pub token: String,
}
pub enum CheckPasswordResetTokenResult {
Valid(PrimitiveDateTime),
Invalid,
}
Frontend expectations
- Extract the `auth_key` from the URL query parameters when the user lands on the password reset page via the magic link.
- Call this RPC with the extracted token to validate it before showing the password reset form.
- If `valid` is `true`:
  - Display the password reset form.
  - Optionally show a countdown timer based on the `expire_at` timestamp to inform the user how much time they have left.
- If `valid` is `false`:
  - Display an error message that the reset link is invalid or has expired.
  - Offer a link to request a new password reset email.
- This validation step helps prevent the user from filling out a new password only to discover the token is invalid when they submit.
Step 3. Collect the new password
The magic link is time-limited (default 5 minutes). After that, reset attempts fail with INVALID_LINK.
On the frontend, the password reset page should:
- Use the token (`auth_key`) extracted from the URL query parameters (Step 2).
- Collect a new password that satisfies the platform's password policy.
Step 4. Finalize password reset
Call UserAuth.ResetPassword with the collected data:
pub struct ResetPassword {
pub auth_key: String,
pub new_password: String,
}
| Field | Required | Notes |
|---|---|---|
| auth_key | ✓ | Token from Step 1. Each token is single-use. |
| new_password | ✓ | Plain password; hashing happens server-side. |
Response handling
ResetPasswordReply can return three results based on the service outcome:
pub enum ResetPasswordResult {
Success,
InvalidLink,
AccountNotFound,
}
- `RESET_PASSWORD_RESULT_SUCCESS`: Password was successfully reset. All active sessions for this user have been terminated for security. Direct the user to the login page with a success message.
- `RESET_PASSWORD_RESULT_INVALID_LINK`: Token was unknown, already used, expired, or not for password reset. Show an error and offer to resend a reset email.
- `RESET_PASSWORD_RESULT_ACCOUNT_NOT_FOUND`: The account associated with this reset token no longer exists. This is rare but can happen if an account was deleted between steps. Display an appropriate error message.
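A minimal client-side sketch of branching on these results; the enum is redeclared locally with the same shape as above, and the `println!` bodies stand in for the frontend's own routing and messaging.

```rust
// Same shape as the ResetPasswordResult enum above.
enum ResetPasswordResult {
    Success,
    InvalidLink,
    AccountNotFound,
}

fn handle_reset_reply(result: ResetPasswordResult) {
    match result {
        // All sessions were terminated server-side; send the user to login.
        ResetPasswordResult::Success => println!("password reset, redirect to login"),
        // Offer to resend the reset email.
        ResetPasswordResult::InvalidLink => println!("link invalid or expired"),
        // Rare: the account was deleted between steps.
        ResetPasswordResult::AccountNotFound => println!("account no longer exists"),
    }
}
```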
Security considerations
When the password reset succeeds, the backend automatically:
- Hashes the new password using the configured password hashing algorithm
- Updates the password in the database
- Terminates all active sessions for this user account to prevent any potentially compromised sessions from remaining active
- Emits a `PasswordResetEvent` for audit logging and downstream processing
Users will need to log in again with their new password after a successful reset.
Retrying after failures
If the user encounters an INVALID_LINK error, require them to restart from Step 1. Once a token has been consumed or expired, it cannot be retried. The single-use nature of tokens prevents replay attacks.
UI checklist
- Provide separate screens for email submission and password reset completion.
- When the user lands on the password reset page via magic link, automatically extract the `auth_key` from the URL query parameters; no manual token entry is needed.
- Show a visible countdown for link expiration (5 minutes by default, based on the `link_expire_after` config).
- Use `CheckPasswordResetToken` before showing the password reset form to provide early feedback about token validity and show an expiration countdown.
- After successful password reset, clearly direct the user to the login screen and inform them that all their sessions have been terminated for security.
- Consider implementing the token validation (Step 2) to improve user experience by catching invalid links before the user enters a new password.
Typical UX flow
A recommended user experience flow:
- Forgot Password Page: User enters email → call `SendPasswordResetEmail`
- Check Email Page: Show success message and instructions to click the magic link in their email
- Reset Password Page (accessed by clicking the magic link in the email):
  - Extract `auth_key` from URL query parameters
  - On page load: call `CheckPasswordResetToken` with the extracted token
  - If valid: show password form with expiration timer
  - If invalid: show error and link back to Step 1
- Submit New Password: call `ResetPassword` with the token from URL and new password
- Success Page: Inform user their password was reset and all sessions were terminated → redirect to login
This flow mirrors the backend implementation and should keep the frontend in sync with the server-side invariants without re-reading the Rust code every time changes are made.
Authentication for Other Modules
This guide explains how gRPC services outside of the auth module reuse the authentication middleware so they can trust the UserId that reaches their handlers. It focuses on the shared layer, required request metadata, and the pattern every user-facing RPC follows when extracting identity.
Middleware overview
The UserAuthLayer type wraps gRPC routers with a Tower middleware that runs before your service logic. When the layer sees an incoming request it:
- Looks for the `x-user-authorization` header (a raw JWT access token).
- Loads the current `AuthConfig` from Redis via `find_config_from_redis` so it can decode the token with the same secret and issuer values used during minting.
- Decodes and validates the JWT using the access-token audience/issuer rules, producing a `UserId` newtype when everything checks out.
- Stores that `UserId` in the request extensions so downstream handlers can pull it without re-validating the token (see `modules/auth/src/rpc/middleware.rs`, lines 12–110).
`GrpcWorker::server_ready` installs this layer globally before registering user-scoped services. Any service added after `.layer(self.user_auth_middleware)` automatically receives authenticated requests and does not need to declare the middleware explicitly (see `server/src/worker/grpc.rs`, lines 302–349).
Required request metadata
Clients must send the access token in the `x-user-authorization` header on every RPC guarded by the middleware. Tokens are the opaque strings returned from the login/registration flows; do not prefix them with `Bearer`. If the header is missing or the token fails validation, the internal `user_auth` check produces a `Status::unauthenticated` error; the layer logs the failure and still forwards the request, and when your handler subsequently calls `UserId::from_request` it sees that error and propagates the unauthenticated status back to the caller (see `modules/auth/src/rpc/middleware.rs`, lines 74–110).
The middleware never accepts refresh tokens; those belong in the `x-refresh-token` header and are handled exclusively by the session RPC (`RefreshSession`), as implemented in `modules/auth/src/rpc/middleware.rs` (lines 82–92) and `modules/auth/src/rpc/auth_service.rs` (lines 288–331). Keep access and refresh tokens in their respective lanes to avoid confusing downstream services.
Reading the authenticated user
Inside a gRPC handler, retrieve the caller identity by importing `UserId` from `auth::rpc::middleware` and calling `UserId::from_request(&request)` at the top of the method. That helper pulls the `UserId` extension set by the middleware and converts it back into a plain `Uuid`. Every user-facing module (market, telecom, shop, notification, support, etc.) follows this pattern, so new services should mirror it (see `modules/auth/src/rpc/middleware.rs` L94–L110, `modules/market/src/rpc/market.rs` L70–L132, `modules/telecom/src/rpc/telecom.rs` L69–L265, and `modules/shop/src/rpc/order.rs` L149–L321).
Avoid re-parsing JWTs or threading user IDs through request payloads; the middleware ensures a single source of truth. If UserId::from_request returns Status::unauthenticated, simply propagate that error to the caller so clients know to refresh their session.
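A minimal handler sketch following this pattern is shown below. `PingRequest`/`PingReply` are placeholder message types (real services use their prost-generated types), and the sketch assumes `UserId::from_request` returns a `Result<Uuid, Status>` as described above.

```rust
use auth::rpc::middleware::UserId;
use tonic::{Request, Response, Status};

// Placeholder message types for the sketch only.
pub struct PingRequest {}
pub struct PingReply {
    pub user_id: String,
}

pub async fn ping(request: Request<PingRequest>) -> Result<Response<PingReply>, Status> {
    // Pull the authenticated principal installed by UserAuthLayer; if it is
    // missing or invalid, propagate the unauthenticated status untouched.
    let user_id = UserId::from_request(&request)?;

    // From here on, treat `user_id` as the trusted caller identity.
    Ok(Response::new(PingReply {
        user_id: user_id.to_string(),
    }))
}
```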
Adding a new authenticated service
When you introduce a new gRPC server that should require user authentication:
- Register it after the `.layer(self.user_auth_middleware)` call in `GrpcWorker::server_ready`.
- In every handler, call `UserId::from_request` before executing business logic.
- Treat the returned `Uuid` as the authenticated principal and authorize against your domain resources accordingly.
If you need unauthenticated entry points (e.g., public lookups), mount that service before the middleware layer or expose the RPC through the UserAuth service instead. Mixing authenticated and unauthenticated handlers in the same service leads to confusing guarantees, so prefer splitting them.
Testing tips
Integration tests that hit authenticated RPCs should obtain a valid access token through the login helpers and attach it to the request metadata under `x-user-authorization`. When writing unit tests that call handlers directly, construct a `tonic::Request` and insert a `UserId` into its extensions to simulate the middleware path. This mirrors what production infrastructure does and keeps your tests aligned with the runtime behavior (see `modules/auth/src/rpc/middleware.rs`, lines 30–110).
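For illustration, a direct-handler test might inject the principal like this. It assumes `UserId` can be built from a `Uuid` (e.g., via a `From` impl); adjust to the newtype's actual constructor.

```rust
#[cfg(test)]
mod tests {
    use auth::rpc::middleware::UserId;
    use tonic::Request;
    use uuid::Uuid;

    #[tokio::test]
    async fn handler_sees_injected_user_id() {
        // Simulate what UserAuthLayer does in production: place the
        // authenticated principal into the request extensions.
        let mut request = Request::new(());
        request.extensions_mut().insert(UserId::from(Uuid::new_v4()));

        // ... call the handler under test with `request` and assert on the reply ...
    }
}
```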
Telecom Module
Overview
The Telecom Module is the core networking and proxy management component of the Helium system. It provides comprehensive functionality for managing VPN/proxy networks, user subscriptions, traffic monitoring, and billing operations.
This module implements a sophisticated proxy network infrastructure that supports multiple protocols and backends, including XRayR and SSP compatibility, making it suitable for various deployment scenarios.
Key Features
Node Management
- Node Servers: Physical proxy servers that handle user connections
- Node Clients: Proxy endpoints that users connect to
- Multi-protocol Support: Compatible with various proxy protocols
- Geographic Distribution: Node location and route classification
- Status Monitoring: Real-time node health and availability tracking
Package System
- User Packages: Subscription-based service packages with traffic limits
- Package Queue: Automated package activation system
- Flexible Billing: Traffic-based and time-based billing models
- Traffic Factor: Custom billing multipliers per node
Traffic Analysis
- Real-time Monitoring: Track upload/download usage per user
- Historical Data: Traffic usage history and trends
- Billing Analytics: Calculate actual billed traffic vs raw usage
- Node Usage Statistics: Identify popular nodes and usage patterns
Subscription Management
- Dynamic Links: Generate subscription URLs for client applications
- Multiple Formats: Support various client configurations
- Token Management: Secure subscription tokens per user
- Auto-configuration: Client-side configuration generation
Architecture
Core Services
| Service | Purpose |
|---|---|
| NodeClientService | Manages proxy client endpoints |
| NodeServerService | Handles proxy server infrastructure |
| PackageQueueService | Processes user subscription packages |
| SubscribeLinkService | Generates subscription links |
| AnalysisService | Analyzes traffic and usage patterns |
| ManageService | Administrative operations |
gRPC API Structure
The module exposes two main gRPC services:
- Telecom Service (`helium.telecom`) – user-facing operations:
  - Node listing and information
  - Package management
  - Subscription links
  - Traffic usage queries
- Management Service (`helium.telecom_manage`) – admin operations:
  - Node server/client CRUD
  - Package queue management
  - Configuration validation
Data Models
Node Hierarchy
NodeServer (Physical Server)
├── NodeClient (Proxy Endpoint 1)
├── NodeClient (Proxy Endpoint 2)
└── NodeClient (Proxy Endpoint N)
Package Lifecycle
Created → In Queue → Active → Consumed/Cancelled
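As a hedged illustration only, this lifecycle could be modeled as a state enum; the names below are assumptions based on the diagram, not the telecom module's actual schema.

```rust
/// Illustrative lifecycle states for a queued package.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PackageState {
    Created,   // purchased but not yet queued
    InQueue,   // waiting for activation
    Active,    // currently providing service
    Consumed,  // traffic or duration exhausted
    Cancelled, // terminated before consumption
}
```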
Integration Points
Dependencies
- Database: PostgreSQL for persistent data
- Redis: Caching and session management
- Message Queue: AMQP for event processing
- Authentication: Integration with auth module
- Management: Admin interface integration
External Libraries
- libsubconv: Subscription format conversion
- xrayr_feeder: XRayR backend integration
- subscribe_client_config: Client configuration generation
Usage Examples
Creating a Node Client
use telecom::services::manage::ManageService;
let manage_service = ManageService::new(db_pool, redis_conn);
// Create node client via processor pattern
let request = CreateNodeClientRequest {
server_id: 1,
name: "US-West-1".to_string(),
traffic_factor: "1.0".to_string(),
display_order: 100,
// ... other fields
};
let result = manage_service.process(request).await?;
Checking User Package
use telecom::services::package_queue::PackageQueueService;
let package_service = PackageQueueService::new(db_pool, redis_conn);
// Get user's current active package
let request = GetCurrentPackage { user_id };
let package_info = package_service.process(request).await?;
Generating Subscription Links
use telecom::services::subscribe_link::SubscribeLinkService;
let subscribe_service = SubscribeLinkService::new(db_pool, redis_conn);
// Generate subscription links for user
let request = GetSubscribeLinks { user_id };
let links = subscribe_service.process(request).await?;
Database Schema
Key database entities:
- `node_servers`: Physical proxy servers
- `node_clients`: Proxy client endpoints
- `packages`: Available service packages
- `package_queue`: User package subscriptions
- `user_package_usage`: Traffic usage tracking
- `node_status_history`: Node availability history
Configuration
Environment Variables
The module uses configuration from telecom::config which includes:
- Database connection settings
- Redis connection parameters
- External service endpoints
- Billing calculation parameters
Node Configuration
Both node servers and clients support flexible JSON configuration for different proxy protocols and backends.
Event System
The module implements an event-driven architecture with:
Events
- Package Activation: When packages become active
- Usage Recording: Traffic usage events
- Node Status Changes: Server/client status updates
Hooks
- Billing Hook: Calculate traffic charges
- Registration Hook: New user package setup
- Package Queue Hook: Automated package processing
Automated Tasks (Cron)
The module includes scheduled tasks for:
- Package queue processing
- Traffic usage aggregation
- Node status monitoring
- Billing calculations
Testing
Comprehensive test coverage includes:
- Unit tests for each service
- Integration tests with database
- Management service tests
- Package queue processing tests
- Subscribe link generation tests
Test infrastructure uses testcontainers for isolated testing environments.
Development Guidelines
Service Pattern
All business logic is implemented using the Processor pattern (not object-oriented patterns). Each service exposes its functionality through the Processor trait.
Error Handling
- Use `anyhow::Result` for error propagation
- Add proper error context, with logging via `tracing`
- No `unwrap()` or `expect()` calls (forbidden by clippy); see the sketch below
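A short sketch of these conventions together; the query, table name, and function name are illustrative, not the module's actual SQL.

```rust
use anyhow::{Context, Result};

/// Illustrative only: propagate errors with context instead of unwrap/expect,
/// and emit a structured tracing event on success.
async fn load_node_client_name(pool: &sqlx::PgPool, id: i32) -> Result<String> {
    let name: String = sqlx::query_scalar("SELECT name FROM node_clients WHERE id = $1")
        .bind(id)
        .fetch_one(pool)
        .await
        .with_context(|| format!("loading node client {id}"))?;

    tracing::debug!(node_client_id = id, "loaded node client name");
    Ok(name)
}
```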
Database Access
- Use the owned `RedisConnection` type (not static lifetimes)
- Proper connection pooling
- Transaction management for consistency
API Documentation
For detailed API documentation, refer to the generated protobuf documentation from:
- `proto/telecom/telecom.proto`
- `proto/telecom/manage.proto`
- `proto/telecom/common.proto`
Troubleshooting
Common Issues
- Node Offline: Check server connectivity and configuration
- Package Not Activating: Verify queue processing and billing status
- Subscription Links Invalid: Check token generation and node availability
- Traffic Not Recording: Verify event processing and database connections
Logging
The module uses structured logging with tracing. Key log points:
- Package activation/deactivation
- Node status changes
- Traffic usage recording
- API request/response patterns
Migration Notes
When migrating from other proxy management systems:
- Import existing node configurations
- Migrate user packages and quotas
- Set up traffic factor mappings
- Configure billing parameters
- Test subscription link generation
For specific migration procedures, refer to the migration documentation in doc/src/migration/.
Node Server
Concept
Node Server represents the physical proxy server infrastructure in the Helium telecom system. It is the actual backend server that handles proxy connections and processes user traffic.
Key Concepts
- Physical Infrastructure: Node Server is the actual server hardware/software that performs proxy operations
- Backend Component: Operates behind the scenes, not directly visible to end users
- Traffic Handler: Processes all user proxy connections and traffic routing
- Configuration Target: Holds server-side configuration that determines how the proxy server operates
Architecture Position
User Client → Node Client (User-facing) → Node Server (Infrastructure) → Internet
Node Server sits at the infrastructure layer, receiving connections from multiple Node Clients and handling the actual proxy work.
Relationship with Node Client
| Aspect | Node Server | Node Client |
|---|---|---|
| Purpose | Physical proxy infrastructure | User-facing proxy endpoint |
| Visibility | Backend/Admin only | Visible to end users |
| Configuration | Server-side proxy config | Client-side connection config |
| Relationship | 1 server : N clients | N clients : 1 server |
| Responsibility | Traffic processing | User interface |
Configuration
Node Server supports two main configuration types through the NodeServerConfig enum:
Configuration Types
1. NewV2b (UniProxy)
Modern proxy configuration using the UniProxy protocol:
NodeServerConfig::NewV2b(Box<UniProxyProtocolConfig>)
Features:
- High-performance proxy protocol
- Built-in traffic reporting
- Advanced user management
- Speed limiting per user
- Device limiting
2. SSP (SSPanel Compatible)
Legacy SSPanel-compatible configuration:
NodeServerConfig::Ssp(Box<CustomConfig>)
Features:
- SSPanel API compatibility
- Traditional proxy methods
- Custom host configuration
- Legacy traffic reporting
Core Configuration Fields
pub struct NodeServer {
pub id: i32,
pub server_side_config: Json<NodeServerConfig>, // Protocol configuration
pub speed_limit: i64, // Per-user speed limit in Byte/s
pub status: NodeServerStatus, // Online/Offline/Maintenance
pub last_online_time: PrimitiveDateTime, // Last heartbeat timestamp
}
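As a hedged illustration of telling the two config variants apart in code (the enum and its payload types live in the telecom module; the helper name here is hypothetical):

```rust
// Hypothetical helper; NodeServerConfig, UniProxyProtocolConfig, and
// CustomConfig are the telecom module types referenced above.
fn describe_backend(config: &NodeServerConfig) -> &'static str {
    match config {
        // Modern UniProxy ("newv2b") backend
        NodeServerConfig::NewV2b(_uni_proxy_config) => "UniProxy (newv2b) backend",
        // Legacy SSPanel-compatible ("ssp") backend
        NodeServerConfig::Ssp(_custom_config) => "SSPanel-compatible (ssp) backend",
    }
}
```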
Configuration Examples
These JSON examples show the API request format for creating Node Servers with different backend configurations.
Creating NewV2b Node Server
{
"server_side_config": {
"compatibility": "newv2b",
"api_host": "127.0.0.1",
"api_port": 8080,
"node_id": 1,
"cert_mode": "none",
"cert_domain": "example.com",
"cert_file": "",
"key_file": "",
"ca_file": "",
"timeout": 30,
"listen_ip": "0.0.0.0",
"send_ip": "0.0.0.0",
"device_limit": 0,
"speed_limit": 0,
"rule_list_path": "",
"dns_type": "AsIs",
"enable_dns": false,
"disable_upload_traffic": false,
"disable_get_rule": false,
"disable_ivpn_check": false,
"disable_memory_optimizations": false,
"enable_reality_show": false,
"enable_brutal": false,
"brutal_debug": false,
"enable_ip_sync": false,
"ip_sync_interval": 60
},
"speed_limit": 1000000000
}
Key Configuration Points:
- `compatibility`: "newv2b" specifies the UniProxy protocol backend
- `speed_limit`: 1,000,000,000 Byte/s (1 GB/s) per-user speed limit on this server
- `node_id`: Must match the Node Server ID in the database
- `api_host` / `api_port`: Backend API connection settings
- `cert_mode`: Certificate handling ("none", "file", "http", "dns")
Creating SSP Node Server
{
"server_side_config": {
"compatibility": "ssp",
"host": "proxy.example.com",
"port": 80,
"node_id": 1,
"key": "your-api-key",
"speed_limit": 0,
"device_limit": 0,
"rule_list_path": "",
"custom_config": {
"offset_port_user": 0,
"offset_port_node": 0,
"server_key": "",
"host": "proxy.example.com",
"server_port": 443
}
},
"speed_limit": 500000000
}
Key Configuration Points:
- `compatibility`: "ssp" specifies the SSPanel-compatible backend
- `speed_limit`: 500,000,000 Byte/s (500 MB/s) per-user speed limit on this server
- `host`: SSPanel API host
- `key`: Authentication key for the SSPanel API
- `custom_config`: SSPanel-specific configuration options
Configuration Validation
The system provides configuration validation through the gRPC management API:
gRPC Service: helium.telecom_manage.NodeServerManage
Method: VerifyNodeServerConfig
Request:
message VerifyNodeServerConfigRequest {
string config = 1;
}
Response:
message VerifyReply {
bool valid = 1;
}
Example gRPC call:
grpcurl -plaintext \
-d '{"config": "{\"compatibility\":\"newv2b\",\"api_host\":\"127.0.0.1\",\"api_port\":8080}"}' \
localhost:50051 \
helium.telecom_manage.NodeServerManage/VerifyNodeServerConfig
Frontend applications should validate server configurations before creating Node Servers to ensure proper backend protocol settings and prevent deployment failures.
JSON Schema Reference
For frontend developers, here are the key JSON structures:
Node Server Creation Request
{
"server_side_config": {
// Required: Backend protocol configuration
"compatibility": "<string>" // Required: "newv2b" or "ssp"
// Configuration fields vary by compatibility type
},
"speed_limit": "<integer>" // Required: Per-user speed limit in Byte/s
}
Compatibility Types
- “newv2b”: Modern UniProxy protocol backend
- “ssp”: Legacy SSPanel-compatible backend
NewV2b Configuration Schema
{
"compatibility": "newv2b",
"api_host": "<string>", // Backend API host
"api_port": "<integer>", // Backend API port
"node_id": "<integer>", // Must match database Node Server ID
"cert_mode": "<string>", // Certificate mode: "none", "file", "http", "dns"
"cert_domain": "<string>", // Domain for certificate
"cert_file": "<string>", // Certificate file path
"key_file": "<string>", // Private key file path
"ca_file": "<string>", // CA certificate file path
"timeout": "<integer>", // Request timeout in seconds
"listen_ip": "<string>", // IP to bind for incoming connections
"send_ip": "<string>", // IP to use for outgoing connections
"device_limit": "<integer>", // Device limit per user (0 = no limit)
"speed_limit": "<integer>", // Speed limit per user (0 = no limit)
"rule_list_path": "<string>", // Path to routing rules file
"dns_type": "<string>", // DNS resolution type
"enable_dns": "<boolean>", // Enable DNS server
"disable_upload_traffic": "<boolean>",
"disable_get_rule": "<boolean>",
"disable_ivpn_check": "<boolean>",
"disable_memory_optimizations": "<boolean>",
"enable_reality_show": "<boolean>",
"enable_brutal": "<boolean>",
"brutal_debug": "<boolean>",
"enable_ip_sync": "<boolean>",
"ip_sync_interval": "<integer>" // IP sync interval in seconds
}
SSP Configuration Schema
{
"compatibility": "ssp",
"host": "<string>", // SSPanel API host
"port": "<integer>", // SSPanel API port
"node_id": "<integer>", // Must match database Node Server ID
"key": "<string>", // API authentication key
"speed_limit": "<integer>", // Speed limit per user (0 = no limit)
"device_limit": "<integer>", // Device limit per user (0 = no limit)
"rule_list_path": "<string>", // Path to routing rules file
"custom_config": {
// SSPanel-specific settings
"offset_port_user": "<integer>",
"offset_port_node": "<integer>",
"server_key": "<string>",
"host": "<string>",
"server_port": "<integer>"
}
}
Management and Observability
Server Status Management
Node Servers have three possible states:
Status Values:
["online", "offline", "maintenance"]
Status Descriptions:
- “online”: Server is healthy and processing requests
- “offline”: Server missed heartbeat threshold
- “maintenance”: Server manually marked for maintenance
Status Transitions
- Online → Offline: Automatic when the time since `last_online_time` exceeds `offline_timeout`
- Offline → Online: Automatic when the server sends a heartbeat
- Any → Maintenance: Manual admin action
- Maintenance → Online: Manual admin action + heartbeat
Heartbeat System
Node Servers maintain connectivity through heartbeat reporting via traffic upload APIs. The specific API endpoint depends on the server compatibility type:
- NewV2b servers: Use `/api/v1/server/UniProxy/push` (see the NewV2b Traffic Reporting section)
- SSP servers: Use `/mod_mu/users/traffic` (see the SSP Traffic Reporting section)
Heartbeat Behavior:
- Heartbeat is automatically updated when servers report traffic data
- `last_online_time` is set to the current timestamp on each successful API call
- Failed API calls do not update the heartbeat timestamp
Health Monitoring Configuration
Configure health check timeouts through system configuration:
{
"node_health_check_config": {
"offline_timeout": 600
}
}
Configuration Fields:
- `offline_timeout`: Seconds after which a server is marked offline (default: 600 = 10 minutes)
Health Check Logic:
- If `current_time - last_online_time > offline_timeout`, the server status becomes "offline"
- If `current_time - last_online_time <= offline_timeout`, the server status becomes "online" (see the sketch below)
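A minimal sketch of that comparison with the `time` crate (the function name is illustrative; the real check runs inside the telecom module's background task):

```rust
use time::{Duration, PrimitiveDateTime};

/// Illustrative check: a server is considered offline when its last heartbeat
/// is older than `offline_timeout_secs` seconds.
fn is_offline(
    now: PrimitiveDateTime,
    last_online_time: PrimitiveDateTime,
    offline_timeout_secs: i64,
) -> bool {
    now - last_online_time > Duration::seconds(offline_timeout_secs)
}
```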
Automated Status Updates
The system runs automated background tasks for status management:
Automated Operations:
- Periodic Status Refresh: Runs every 5 minutes to update server statuses
- Offline Detection: Marks servers offline when the time since `last_online_time` exceeds the threshold
- Online Recovery: Automatically marks servers online when they resume heartbeats
- Status History: Records status change events for monitoring and analytics
Status Update API for Monitoring:
Status updates are handled automatically by the system background tasks. Administrative monitoring uses the gRPC management API:
gRPC Service: helium.telecom_manage.NodeServerManage
Method: ListNodeServers
Request:
message ListNodeServersRequest {
int64 limit = 1;
int64 offset = 2;
optional NodeServerStatus filter_status = 3;
}
Response:
message ListNodeServersReply {
repeated NodeServerSummary servers = 1;
}
message NodeServerSummary {
int32 id = 1;
NodeServerCompatibility compatibility = 2;
NodeServerStatus status = 3;
int64 last_online_time = 4;
int64 client_number = 5;
}
enum NodeServerCompatibility {
NODE_SERVER_COMPATIBILITY_NEW_V2B = 0;
NODE_SERVER_COMPATIBILITY_SSP = 1;
}
Example gRPC call:
grpcurl -plaintext \
-d '{"limit": 100, "offset": 0}' \
localhost:50051 \
helium.telecom_manage.NodeServerManage/ListNodeServers
Administrative Operations
Node Server management is restricted to system administrators through the management gRPC service. The system implements role-based access control (RBAC) with four distinct admin levels, each with specific capabilities.
Admin Levels and Permissions
| Admin Level | Node Server Capabilities | Access Level |
|---|---|---|
| SuperAdmin | • List all servers • View detailed server information • Create new servers • Delete servers • Modify server configurations • Validate server configurations • Manual status overrides | Full access to all operations |
| Moderator | • List all servers • View detailed server information • Create new servers • Delete servers • Modify server configurations • Validate server configurations • Manual status overrides | Full access except super-admin exclusive operations |
| CustomerSupport | • List all servers (read-only) • View basic server status • Monitor server health | Read-only access for support purposes |
| SupportBot | • No direct server access | Automated systems only, no server management |
Detailed Permission Matrix
Server Management Operations:
- List Servers (`list_servers`): ✅ SuperAdmin, ✅ Moderator, ✅ CustomerSupport
- Show Server Details (`show_server`): ✅ SuperAdmin, ✅ Moderator
- Create Server (`create_server`): ✅ SuperAdmin, ✅ Moderator
- Delete Server (`delete_server`): ✅ SuperAdmin, ✅ Moderator
- Verify Configuration (`verify_server_config`): ✅ SuperAdmin, ✅ Moderator
Access Control Features:
- All operations require valid admin authentication tokens
- Each administrative action is logged and auditable
- Server deletion is protected by dependency checking (cannot delete servers with active node clients)
- Configuration changes are validated before application
- Role-based restrictions prevent unauthorized access
Safety Mechanisms:
- Dependency validation prevents accidental service disruption
- Configuration validation ensures server stability
- Comprehensive audit logging tracks all administrative changes
- Graceful handling of server state transitions
- Permission checks occur before any operation execution
Traffic Monitoring
Node Servers automatically collect traffic statistics through API endpoints:
NewV2b Traffic Reporting
Servers push traffic data periodically using the UniProxy traffic upload API:
POST /api/v1/server/UniProxy/push?node_id=1&node_type=V2ray&token=your-server-token
Content-Type: application/json
{
"123": [5000000, 1000000],
"456": [8000000, 2000000]
}
Request Details:
- Path: `/api/v1/server/UniProxy/push`
- Authentication: Query parameters (`node_id`, `node_type`, `token`)
- Body Format: `{"user_id": [download_bytes, upload_bytes]}`
- Content-Type: `application/json`
Response:
HTTP 200 OK
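For reference, a small sketch of assembling that body shape in Rust with `serde_json`; sending the request, the query parameters, and token handling are out of scope here, and the helper name is illustrative.

```rust
use std::collections::HashMap;

/// Build the documented `{"user_id": [download_bytes, upload_bytes]}` payload
/// from (user_id, download, upload) triples.
fn build_push_body(usage: &[(u64, u64, u64)]) -> serde_json::Value {
    let map: HashMap<String, [u64; 2]> = usage
        .iter()
        .map(|&(user_id, download, upload)| (user_id.to_string(), [download, upload]))
        .collect();
    serde_json::json!(map)
}
```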
SSP Traffic Reporting
SSPanel-compatible servers use the legacy traffic reporting API:
POST /mod_mu/users/traffic?node_id=1&key=your-api-key
Content-Type: application/json
[
{
"u": 123,
"d": 1000000,
"upload": 5000000
},
{
"u": 456,
"d": 2000000,
"upload": 8000000
}
]
Request Details:
- Path: `/mod_mu/users/traffic`
- Authentication: Query parameters (`node_id`, `key`)
- Body Format: Array of `{"u": user_id, "d": download_bytes, "upload": upload_bytes}`
- Content-Type: `application/json`
Response:
HTTP 200 OK
Observability Features
Status History Tracking
The system automatically records server status changes for monitoring and analytics:
Tracked Events:
- Status transitions (online ↔ offline ↔ maintenance)
- Uptime statistics and availability metrics
- Heartbeat intervals and response times
- Configuration change events
Status History API:
Status history is tracked automatically by the system. For administrative queries, use the gRPC management API:
gRPC Service: helium.telecom_manage.NodeServerManage
Method: ShowNodeServer
Request:
message ShowNodeServerRequest {
int32 id = 1;
}
Response:
message NodeServerReply {
int32 id = 1;
int64 speed_limit = 2;
string config = 3;
NodeServerStatus status = 4;
int64 last_online_time = 5;
}
enum NodeServerStatus {
NODE_SERVER_STATUS_ONLINE = 0;
NODE_SERVER_STATUS_OFFLINE = 1;
NODE_SERVER_STATUS_MAINTENANCE = 2;
}
Example gRPC call:
grpcurl -plaintext \
-d '{"id": 1}' \
localhost:50051 \
helium.telecom_manage.NodeServerManage/ShowNodeServer
Metrics and Logging
- Structured Logging: All operations use `tracing` for detailed logs
- Performance Metrics: Traffic throughput and response times
- Health Metrics: Heartbeat intervals and status transitions
- Error Tracking: Failed authentications and connection issues
When to Use Node Server vs Node Client
Use Node Server When:
- Setting up new physical infrastructure

  gRPC Service: `helium.telecom_manage.NodeServerManage`
  Method: `CreateNodeServer`

  Request:

  message CreateNodeServerRequest {
    string config = 1;
    int64 speed_limit = 2;
  }

  Response:

  message AdminEditReply {
    AdminEditResult result = 1;
  }

  Example gRPC call:

  grpcurl -plaintext \
    -d '{"config": "{\"compatibility\":\"newv2b\",\"api_host\":\"127.0.0.1\",\"api_port\":8080}", "speed_limit": 10000000000}' \
    localhost:50051 \
    helium.telecom_manage.NodeServerManage/CreateNodeServer

- Configuring proxy backend behavior
  - Protocol selection (UniProxy vs SSPanel)
  - Server-side performance settings
  - Traffic processing configuration
- Managing physical resources
  - Server capacity planning
  - Geographic deployment
  - Infrastructure monitoring
- Backend administration
  - Server health monitoring
  - Traffic aggregation
  - System maintenance
Use Node Client When:
- Creating user-facing proxy endpoints

  gRPC Service: `helium.telecom_manage.NodeClientManage`
  Method: `CreateNodeClient`

  Request:

  message CreateNodeClientRequest {
    int32 server_id = 1;
    string name = 2;
    string traffic_factor = 3;
    int32 display_order = 4;
    string client_side_config = 5;
    repeated int32 available_groups = 6;
    optional helium.telecom.NodeMetadata metadata = 7;
  }

  Response:

  message AdminEditReply {
    AdminEditResult result = 1;
  }

  Example gRPC call:

  grpcurl -plaintext \
    -d '{"server_id": 1, "name": "US West Coast", "traffic_factor": "1.0", "display_order": 100, "available_groups": [2, 3, 4], "client_side_config": "{\"protocol\":\"Vmess\"}"}' \
    localhost:50051 \
    helium.telecom_manage.NodeClientManage/CreateNodeClient

- Organizing user access
  - Different service tiers (Premium, Basic)
  - Geographic regions for users
  - Access control by user groups
- Billing and traffic management
  - Different traffic factors per endpoint
  - User-specific speed limits
  - Package-based access control
- User experience customization
  - Display names and ordering
  - Regional preferences
  - Service level differentiation
Typical Workflow
- Infrastructure Setup: Create Node Servers for physical infrastructure
- Service Configuration: Create multiple Node Clients pointing to each server
- User Management: Assign users to appropriate Node Clients based on packages
- Monitoring: Monitor Node Server health while tracking Node Client usage
Example Architecture
Physical Infrastructure Layer (Node Servers):
├── US-West-Server (NewV2b, 10GB/s capacity)
├── EU-Central-Server (NewV2b, 5GB/s capacity)
└── Asia-Pacific-Server (SSP, 3GB/s capacity)
User-Facing Layer (Node Clients):
├── US-West-Premium (→ US-West-Server, 2x traffic factor)
├── US-West-Standard (→ US-West-Server, 1x traffic factor)
├── EU-Premium (→ EU-Central-Server, 2x traffic factor)
├── EU-Standard (→ EU-Central-Server, 1x traffic factor)
├── Asia-Premium (→ Asia-Pacific-Server, 2x traffic factor)
└── Asia-Budget (→ Asia-Pacific-Server, 0.5x traffic factor)
This separation allows for:
- Flexible service offerings without infrastructure changes
- Independent scaling of physical and logical resources
- Simplified user management through logical groupings
- Cost-effective resource utilization across multiple service tiers
Security Considerations
Authentication
Node Servers authenticate using different methods depending on compatibility type:
NewV2b Authentication (Query Parameters)
POST /api/v1/server/UniProxy/push?node_id=1&node_type=V2ray&token=your-server-token
Authentication Process:
- The server includes `node_id`, `node_type`, and `token` as query parameters
- The backend validates the token and node_id combination
- If valid, request is processed and heartbeat updated
- If invalid, request is rejected with 401 Unauthorized
SSP Authentication (Query Parameters)
POST /mod_mu/users/traffic?node_id=1&key=your-api-key
Authentication Process:
- The server includes `node_id` and `key` as query parameters
- The backend validates the key and node_id combination
- If valid, request is processed and heartbeat updated
- If invalid, request is rejected with 401 Unauthorized
Authentication Configuration:
{
"telecom_config": {
"vpn_server_token": "your-secure-server-token-here"
}
}
Note: The authentication tokens are configured per node server and validated against the vpn_server_token configuration. Both NewV2b and SSP servers use the same token validation mechanism, but with different request formats.
Access Control
- Server-side configuration is admin-only
- API endpoints require proper authentication
- Traffic data is validated before processing
- Heartbeat verification prevents spoofing
Data Protection
- All traffic statistics are aggregated and anonymized
- Configuration data is encrypted at rest
- API communications use secure channels
- User identification uses secure tokens
Troubleshooting
Common Issues
-
Server Shows Offline
- Check heartbeat timing configuration
- Verify server can reach the API endpoints
- Confirm authentication tokens are correct
- Review network connectivity
-
Traffic Not Reporting
- Verify server configuration type matches API calls
- Check traffic threshold filtering (>10KB)
- Confirm database connectivity
- Review authentication tokens
-
Configuration Validation Failures
- Validate JSON syntax in server config
- Check protocol-specific requirements
- Verify all required fields are present
- Test with minimal configuration first
-
Performance Issues
- Monitor server resource utilization
- Check speed_limit configuration
- Review traffic patterns and peaks
- Consider load balancing across servers
Debugging Tools
- Health Check API: Monitor server status programmatically
- Traffic Reports: Analyze throughput and usage patterns
- Status History: Review historical availability data
- Configuration Validation: Test configs before deployment
- Structured Logging: Detailed operation traces with tracing
Best Practices
- Monitor heartbeat intervals regularly
- Use automation for status management
- Implement proper alerting for offline servers
- Regular configuration backups
- Capacity planning based on traffic trends
- Geographic distribution for reliability
Node Client
Purpose of Node Client
Node Client serves as the user-facing proxy endpoint in the Helium telecom system. It provides a crucial abstraction layer that enables flexible service delivery while maintaining efficient resource utilization.
Service Tier Differentiation
The primary purpose of Node Client is to enable different service tiers to utilize the same physical infrastructure while providing distinct user experiences. Consider this scenario:
Physical Infrastructure (Node Server):
├── High-performance server in US-West datacenter
│ ├── 10Gbps bandwidth capacity
│ └── Premium network routing
User-Facing Services (Node Clients):
├── "US Premium" (traffic_factor: 2.0, high-priority routing)
├── "US Standard" (traffic_factor: 1.0, standard routing)
└── "US Budget" (traffic_factor: 0.5, economy routing)
All three service tiers connect to the same physical density server, but users access them through different entry servers that provide different characteristics:
- Budget Plan: Uses cheaper entry server with higher latency to user, but same backend density server
- Premium Plan: Uses premium entry server with optimized routing, same backend density server
- Standard Plan: Balanced entry server performance, same backend density server
Business Value
This architecture enables:
- Cost-Effective Infrastructure: One physical server supports multiple service tiers
- Flexible Pricing Models: Different billing rates (`traffic_factor`) for the same infrastructure
- Access Control: User package groups control which nodes are accessible
- Geographic Organization: Logical grouping by region while optimizing server placement
- Service Quality Differentiation: Different route classes and metadata per service tier
Concept of Node Client
Architecture Overview
Node Client operates as the entry point of the proxy line - it’s what users see and configure in their proxy client applications.
User's Proxy Client → Node Client (User-facing) → Node Server (Infrastructure) → Internet
Key Relationships
| Component | Role | Visibility | Configuration Focus |
|---|---|---|---|
| Node Server | Physical infrastructure | Admin-only | Server-side proxy protocols, capacity |
| Node Client | User-facing endpoint | User-visible | Client-side connection settings, billing |
Core Concepts
1. Server Relationship
pub struct NodeClient {
pub server_id: i32, // Points to the physical Node Server
// ... other fields
}
Each Node Client must reference an existing Node Server. This creates a 1:N relationship where one physical server can support multiple user-facing endpoints.
2. Traffic Factor System
// Billing calculation:
// Billed Traffic = Actually Used Traffic × Traffic Factor
pub traffic_factor: Decimal,
Traffic factor enables flexible billing models:
- `0.5`: Budget tier (user is billed for half of actual usage)
- `1.0`: Standard tier (user is billed for actual usage)
- `2.0`: Premium tier (user is billed double, typically in exchange for better service)
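A minimal sketch of that calculation with the `rust_decimal` crate, matching the `Decimal` field shown above; the function name is illustrative.

```rust
use rust_decimal::Decimal;

/// Billed Traffic = Actually Used Traffic × Traffic Factor
fn billed_traffic(used_bytes: u64, traffic_factor: Decimal) -> Decimal {
    Decimal::from(used_bytes) * traffic_factor
}
```

For example, 10 GB of raw usage on a 0.5× budget node is billed as 5 GB, while the same usage on a 2.0× premium node is billed as 20 GB.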
3. Access Control System
pub available_groups: Vec<i32>, // Package groups that can access this node
Users can only access Node Clients if their active package belongs to one of the available_groups. This enables:
- Package-based access control: Different subscription tiers access different nodes
- Geographic restrictions: Certain packages only access specific regions
- Service level enforcement: Premium packages get access to premium nodes
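The rule boils down to a simple membership check; a hedged sketch follows (real enforcement happens server-side in the telecom module, and the function name is illustrative):

```rust
/// A user may access a Node Client only if their active package's group
/// appears in the client's `available_groups`.
fn can_access(available_groups: &[i32], user_package_group: i32) -> bool {
    available_groups.contains(&user_package_group)
}
```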
4. Protocol Configuration
pub client_side_config: Json<NodeClientConfig>,
Node Client stores the client-side protocol configuration that determines how users connect. This includes protocol-specific settings for:
- VMess, VLess, Trojan, Shadowsocks, Hysteria2, WireGuard, etc.
- Connection parameters (hostname, port, encryption methods)
- Transport settings (WebSocket, gRPC, TCP, etc.)
5. Metadata System
pub struct NodeClientMetadata {
pub country: Option<CountryCode>, // Geographic identification
pub location: Option<Locations>, // Regional classification
pub route_class: Option<RouteClass>, // Service quality indicator
}
Route Classes define service quality expectations:
- `SpecialCustom`: Enterprise-grade infrastructure
- `Premium`: High-end infrastructure (IPLC, dedicated lines)
- `Backbone`: Standard backbone infrastructure (most common)
- `GlobalAccess`: International access nodes
- `Budget`: Cost-optimized infrastructure
- `Experimental`: Testing and development nodes
Configuration of Node Client
Core Configuration Fields
pub struct NodeClient {
pub id: i32, // Unique identifier
pub server_id: i32, // Physical server reference
pub name: String, // Display name for users
pub traffic_factor: Decimal, // Billing multiplier
pub display_order: i32, // Sort order in client apps
pub client_side_config: Json<NodeClientConfig>, // Protocol configuration
pub available_groups: Vec<i32>, // Access control groups
pub node_metadata: Json<NodeClientMetadata>, // Geographic/quality metadata
pub created_at: PrimitiveDateTime, // Creation timestamp
pub updated_at: PrimitiveDateTime, // Last modification
}
Configuration Examples
These JSON examples show the API request format that frontend applications should use when creating Node Clients.
Creating a Premium VMess Node Client
{
"server_id": 1,
"name": "🇺🇸 US Premium West",
"traffic_factor": "2.0",
"display_order": 100,
"available_groups": [1, 2],
"client_side_config": {
"protocol": "Vmess",
"v": 2,
"hostname": "premium-us.example.com",
"port": 443,
"alter_id": 0,
"encrypt_method": "auto",
"network": "ws",
"fake_type": "none",
"host": "premium-us.example.com",
"path": "/premium-path",
"tls": "tls",
"sni": "premium-us.example.com",
"alpn": null,
"fingerprint": null
},
"metadata": {
"country": "US",
"location": "north_america",
"route_class": "premium"
}
}
Key Configuration Points:
- `server_id`: References the physical Node Server (ID: 1)
- `traffic_factor`: "2.0" means users pay 2x actual usage (premium pricing)
- `available_groups`: [1, 2] restricts access to Premium and Enterprise packages
- `protocol`: "Vmess" specifies the VMess protocol with WebSocket transport
- `route_class`: "premium" indicates high-end infrastructure
Creating a Budget Shadowsocks Node Client
{
"server_id": 3,
"name": "🇸🇬 Singapore Budget",
"traffic_factor": "0.5",
"display_order": 900,
"available_groups": [3, 4],
"client_side_config": {
"protocol": "Ss",
"server": "budget-asia.example.com",
"port": 8080,
"cipher": "aes-256-gcm",
"server_key": null,
"obfs": null,
"plugin": null
},
"metadata": {
"country": "SG",
"location": "southeast_asia",
"route_class": "budget"
}
}
Key Configuration Points:
- `server_id`: References a different physical server (ID: 3)
- `traffic_factor`: "0.5" means users pay half of actual usage (budget pricing)
- `available_groups`: [3, 4] restricts access to Basic and Student packages
- `protocol`: "Ss" specifies Shadowsocks with AES-256-GCM encryption
- `display_order`: 900 (lower priority in client app display)
Creating a Hysteria2 High-Performance Node
{
"server_id": 2,
"name": "🇩🇪 Germany Hysteria2",
"traffic_factor": "1.5",
"display_order": 200,
"available_groups": [1, 2, 5],
"client_side_config": {
"protocol": "Hy2",
"server": "hy2-eu.example.com",
"port": 443,
"ports": "20000-55000",
"obfs": "salamander",
"obfs_password": "secret123",
"alpn": ["h3"],
"up": "100 Mbps",
"down": "500 Mbps",
"sni": "hy2-eu.example.com",
"skip_cert_verify": false,
"ca": null,
"ca_str": null,
"fingerprint": null,
"cwnd": 32
},
"metadata": {
"country": "DE",
"location": "europe",
"route_class": "backbone"
}
}
Key Configuration Points:
- `traffic_factor`: "1.5" for premium protocol pricing
- `available_groups`: [1, 2, 5] for Premium, Enterprise, and Gaming packages
- `protocol`: "Hy2" specifies Hysteria2 with QUIC transport
- `ports`: Port range for UDP multiplexing
- `obfs`: "salamander" obfuscation method
Protocol Support
Node Client supports a comprehensive range of proxy protocols through the NodeClientConfig enum:
| Protocol | Use Case | Authentication Method |
|---|---|---|
| VMess | General purpose, good compatibility | UUID-based |
| VLess | Modern, lower overhead than VMess | UUID-based |
| Trojan | Designed to bypass DPI | Password-based |
| Shadowsocks | Lightweight, good performance | Password-based |
| ShadowsocksR | Enhanced Shadowsocks with obfuscation | Password-based |
| Hysteria2 | High-performance UDP-based protocol | Password-based |
| Tuic | QUIC-based, low latency | UUID + Password |
| WireGuard | VPN protocol, excellent performance | Private key-based |
| Trojan-Go | Enhanced Trojan implementation | Password-based |
| HTTP/HTTPS/SOCKS | Basic proxy protocols | Username + Password |
Configuration Validation
The system provides configuration validation through the gRPC management API:
gRPC Service: helium.telecom_manage.NodeClientManage
Method: VerifyNodeClientConfig
Request:
message VerifyNodeClientConfigRequest {
string config = 1;
}
Response:
message VerifyReply {
bool valid = 1;
}
Example gRPC call:
grpcurl -plaintext \
-d '{"config": "{\"protocol\":\"Vmess\",\"hostname\":\"test.example.com\",\"port\":443}"}' \
localhost:50051 \
helium.telecom_manage.NodeClientManage/VerifyNodeClientConfig
Frontend applications should validate configurations before creating Node Clients to ensure proper protocol settings and prevent runtime errors.
JSON Schema Reference
For frontend developers, here are the key JSON structures:
Node Client Creation Request
{
"server_id": "<integer>", // Required: Physical server ID
"name": "<string>", // Required: Display name
"traffic_factor": "<decimal_string>", // Required: Billing multiplier (e.g., "1.0", "2.5")
"display_order": "<integer>", // Required: Sort order (higher = lower priority)
"available_groups": ["<integer>"], // Required: Package group IDs
"client_side_config": {
// Required: Protocol configuration
"protocol": "<protocol_name>" // Required: See Protocol Support table
// Protocol-specific fields vary
},
"metadata": {
// Optional: Geographic/quality metadata
"country": "<iso_country_code>", // Optional: Two-letter country code
"location": "<location_enum>", // Optional: Geographic region
"route_class": "<route_class_enum>" // Optional: Service quality tier
}
}
Location Enum Values
[
"north_america",
"south_america",
"europe",
"east_asia",
"southeast_asia",
"south_asia",
"middle_east",
"africa",
"oceania",
"arctic",
"antarctic"
]
Route Class Enum Values
[
"special_custom",
"premium",
"backbone",
"global_access",
"budget",
"experimental"
]
Common Protocol Configurations
VMess Protocol:
{
"protocol": "Vmess",
"v": 2,
"hostname": "<hostname>",
"port": "<integer>",
"alter_id": "<integer>",
"encrypt_method": "<string|null>",
"network": "<string|null>",
"fake_type": "<string|null>",
"host": "<string|null>",
"path": "<string|null>",
"tls": "<string|null>",
"sni": "<string|null>",
"alpn": "<string[]|null>",
"fingerprint": "<string|null>"
}
Shadowsocks Protocol:
{
"protocol": "Ss",
"server": "<hostname>",
"port": "<integer>",
"cipher": "<string>",
"server_key": "<string|null>",
"obfs": "<object|null>",
"plugin": "<object|null>"
}
Hysteria2 Protocol:
{
"protocol": "Hy2",
"server": "<hostname>",
"port": "<integer>",
"ports": "<string|null>",
"obfs": "<string|null>",
"obfs_password": "<string|null>",
"alpn": "<string[]|null>",
"up": "<string|null>",
"down": "<string|null>",
"sni": "<string|null>",
"skip_cert_verify": "<boolean>",
"ca": "<string|null>",
"ca_str": "<string|null>",
"fingerprint": "<string|null>",
"cwnd": "<integer|null>"
}
Administrative Management
Node Client management requires administrator privileges with role-based access control:
| Admin Level | Permissions |
|---|---|
| SuperAdmin | Full CRUD operations, configuration validation |
| Moderator | Full CRUD operations, configuration validation |
| CustomerSupport | Read-only access for support purposes |
| SupportBot | No direct access to node management |
When to Use Node Client vs Node Server
For a comprehensive comparison of when to use Node Client versus Node Server, including decision matrices, workflow examples, and use case scenarios, see the Node Server vs Node Client Usage Guide.
Quick Decision Guide for Node Client
Use Node Client when you need:
- User-facing service tiers (Premium, Standard, Budget)
- Service differentiation with same physical infrastructure
- Flexible billing models via traffic factors
- Package-based access control
- Geographic service organization
- Protocol-specific client configurations
Node Client Specific Examples
The following examples demonstrate Node Client’s unique capabilities:
1. Creating User-Facing Service Tiers
Scenario: Same physical server (ID: 1), multiple service levels
Premium Tier:
{
"server_id": 1,
"name": "🏆 US Premium Plus",
"traffic_factor": "2.0",
"available_groups": [1],
"client_side_config": {
"protocol": "Vmess",
"hostname": "premium.example.com",
"port": 443,
"network": "ws",
"tls": "tls"
},
"metadata": {
"route_class": "premium"
}
}
Standard Tier:
{
"server_id": 1,
"name": "⚡ US Standard",
"traffic_factor": "1.0",
"available_groups": [2, 3],
"client_side_config": {
"protocol": "Vmess",
"hostname": "standard.example.com",
"port": 443,
"network": "tcp"
},
"metadata": {
"route_class": "backbone"
}
}
Budget Tier:
{
"server_id": 1,
"name": "💰 US Economy",
"traffic_factor": "0.5",
"available_groups": [4],
"client_side_config": {
"protocol": "Ss",
"server": "budget.example.com",
"port": 8080,
"cipher": "aes-256-gcm"
},
"metadata": {
"route_class": "budget"
}
}
2. Implementing Geographic Service Organization
Physical servers in strategic locations:
- EU Server (ID: 2): Frankfurt datacenter
- Asia Server (ID: 3): Singapore datacenter
UK Node Client (uses Frankfurt server):
{
"server_id": 2,
"name": "🇬🇧 United Kingdom",
"traffic_factor": "1.0",
"available_groups": [1, 2, 3],
"client_side_config": {
"protocol": "Vless",
"hostname": "uk.example.com",
"port": 443,
"encrypt_method": "none",
"network": "ws",
"tls": "tls"
},
"metadata": {
"country": "GB",
"location": "europe",
"route_class": "premium"
}
}
Singapore Node Client (uses Singapore server):
{
"server_id": 3,
"name": "🇸🇬 Singapore",
"traffic_factor": "1.0",
"available_groups": [1, 2, 3],
"client_side_config": {
"protocol": "Vless",
"hostname": "sg.example.com",
"port": 443,
"encrypt_method": "none",
"network": "tcp"
},
"metadata": {
"country": "SG",
"location": "southeast_asia",
"route_class": "backbone"
}
}
3. Protocol Differentiation for Same Server
Multi-protocol capable server (ID: 4):
VMess Endpoint:
{
"server_id": 4,
"name": "VMess - US West",
"traffic_factor": "1.0",
"display_order": 100,
"available_groups": [2, 3, 4],
"client_side_config": {
"protocol": "Vmess",
"v": 2,
"hostname": "vmess.example.com",
"port": 443,
"alter_id": 0,
"network": "ws",
"tls": "tls"
}
}
Hysteria2 Endpoint (same server, faster protocol):
{
"server_id": 4,
"name": "Hysteria2 - US West",
"traffic_factor": "1.2",
"display_order": 150,
"available_groups": [1, 2],
"client_side_config": {
"protocol": "Hy2",
"server": "hy2.example.com",
"port": 443,
"obfs": "salamander",
"up": "50 Mbps",
"down": "200 Mbps"
}
}
4. Access Control and Package Management
Enterprise Dedicated Node (restricted access):
{
"server_id": 1,
"name": "🏢 Enterprise Dedicated",
"traffic_factor": "3.0",
"available_groups": [1],
"client_side_config": {
"protocol": "Vless",
"hostname": "enterprise.example.com",
"port": 443,
"encrypt_method": "none",
"network": "grpc",
"tls": "tls"
},
"metadata": {
"route_class": "special_custom"
}
}
Consumer Standard Node (broader access):
{
"server_id": 1,
"name": "👤 Consumer Standard",
"traffic_factor": "1.0",
"available_groups": [2, 3, 4],
"client_side_config": {
"protocol": "Vmess",
"v": 2,
"hostname": "consumer.example.com",
"port": 443,
"alter_id": 0,
"network": "ws"
},
"metadata": {
"route_class": "backbone"
}
}
Node Server Use Cases
For complete Node Server use cases and detailed infrastructure examples, see the Node Server documentation.
1. Setting Up Physical Infrastructure
Node Server setup focuses on backend infrastructure configuration:
{
"server_side_config": {
"compatibility": "newv2b"
// UniProxy protocol configuration for actual proxy server
// Backend-specific settings not visible to end users
},
"speed_limit": 10000000000
}
Key Focus:
- Server-side proxy protocol configuration
- Physical infrastructure capacity (10GB/s total)
- Backend performance tuning
- Not visible to end users
2. Configuring Backend Proxy Behavior
- Protocol selection (UniProxy vs SSPanel compatibility)
- Server-side performance settings
- Traffic processing configuration
- Capacity and resource limits
3. Physical Resource Management
- Server capacity planning
- Geographic server deployment
- Infrastructure health monitoring
- Hardware resource allocation
4. Backend System Administration
- Server heartbeat monitoring
- Traffic aggregation processing
- System maintenance operations
- Infrastructure-level configuration
Decision Matrix
| Requirement | Use Node Client | Use Node Server |
|---|---|---|
| User-facing configuration | ✅ Yes | ❌ No |
| Service tier differentiation | ✅ Yes | ❌ No |
| Billing rate control | ✅ Yes | ❌ No |
| Access control by package | ✅ Yes | ❌ No |
| Protocol-specific client config | ✅ Yes | ❌ No |
| Physical infrastructure setup | ❌ No | ✅ Yes |
| Server capacity management | ❌ No | ✅ Yes |
| Backend protocol configuration | ❌ No | ✅ Yes |
| System resource monitoring | ❌ No | ✅ Yes |
Development Workflow
For the complete development workflow including infrastructure setup, service configuration, and monitoring phases, see the Development Workflow Guide in the Node Server documentation.
Node Client Management APIs
For Node Client specific operations, use these gRPC endpoints:
Create Node Client:
// helium.telecom_manage.NodeClientManage/CreateNodeClient
message CreateNodeClientRequest {
int32 server_id = 1;
string name = 2;
string traffic_factor = 3;
int32 display_order = 4;
string client_side_config = 5;
repeated int32 available_groups = 6;
optional helium.telecom.NodeMetadata metadata = 7;
}
Edit Access Groups:
// helium.telecom_manage.NodeClientManage/EditNodeClientGroups
message EditNodeClientGroupsRequest {
int32 id = 1;
repeated int32 available_groups = 2;
}
List Node Clients:
// helium.telecom_manage.NodeClientManage/ListNodeClients
message ListNodeClientsRequest {}
message ListNodeClientsReply {
repeated NodeClientAdminGlance nodes = 1;
}
See Also: Node Server gRPC APIs for server management operations.
Node Client Best Practices
- Service Tier Strategy: Use Node Client to create multiple service tiers (Premium, Standard, Budget) on the same physical infrastructure
- Traffic Factor Planning: Set appropriate billing multipliers: 0.5 for budget tiers, 1.0 for standard service, 2.0+ for premium tiers
- Access Control: Always configure available_groups to enforce package-based restrictions
- User Experience: Use meaningful names, emojis, and proper display_order for client applications
- Geographic Metadata: Include country codes and location metadata to help users choose optimal endpoints
- Protocol Selection: Choose appropriate protocols based on target user base and technical requirements
For comprehensive best practices covering both Node Client and Node Server management, see the Best Practices Guide in the Node Server documentation.
Version Control of Packages
This document explains how the package version control system works in Helium, ensuring purchased packages remain consistent while allowing marketing teams to update offerings. This is essential for maintaining service reliability and customer trust.
The Concept of Packages
A package is the fundamental service offering unit that defines what a user receives when they purchase a production. Each package contains:
- Traffic Limit: Maximum data transfer allowed (in bytes)
- Max Client Number: Maximum simultaneous client connections
- Expire Duration: How long the package remains valid after activation
- Available Group: Access control group determining which proxy nodes are accessible
- Version: Version number for tracking changes
Packages are organized into Package Series - logical groupings identified by UUID that contain multiple versions of the same offering. When a user purchases a production, they receive a package from the associated series.
pub struct Package {
pub id: i64,
pub series: Uuid, // Groups related packages
pub version: i32, // Version within the series
pub is_master: bool, // Currently delivered version
pub available_group: i32, // Access control group
pub max_client_number: i32, // Connection limit
pub expire_duration: PgInterval, // Validity period
pub traffic_limit: i64, // Data transfer limit
}
Version Control System
The version control system ensures package stability for users while enabling content updates for marketing. The core principle is:
Production always delivers the master version of a package series
Master Package Concept
Within each package series, exactly one package is marked as is_master = true. This is the version that:
- Gets delivered to users when they purchase a production
- Appears in production listings and user interfaces
- Represents the current “live” offering
-- Finding the master package for a series
SELECT * FROM "telecom"."packages"
WHERE series = $1 AND is_master = TRUE
LIMIT 1
Version Control Benefits
- User Protection: Once a user purchases a package, their service parameters never change unexpectedly
- Marketing Flexibility: Marketing can update package contents by creating new versions
- Rollback Capability: Previous versions remain available for troubleshooting or rollbacks
- Audit Trail: Complete history of package changes through version tracking
Two Types of Editing Operations
The system supports two distinct editing patterns depending on the impact and intent:
Create New Version (Version-Changing Edit)
When to use: When making changes that affect the core service offering or user experience.
Use cases:
- Changing traffic limits or client connection limits
- Modifying expire duration
- Updating available proxy groups
- Any change that affects what users receive
Process:
- Create a new package in the same series with incremented version
- Set the new package as is_master = true
- Set the previous master as is_master = false
- New purchases will receive the new version
- Existing users keep their original package version
Example:
-- Old master: series=uuid-123, version=1, is_master=true, traffic_limit=100GB
-- New master: series=uuid-123, version=2, is_master=true, traffic_limit=200GB
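As a hedged illustration of how such a version-changing edit could be performed at the database level (the admin APIs are not yet implemented), the following sqlx-based helper demotes the current master and inserts the incremented version in one transaction. The promote_new_version helper and the copy-from-latest strategy are assumptions; table and column names follow the Package struct above.
// Illustrative only: promote a new package version with a larger traffic limit.
// The real workflow may differ from this sketch.
use uuid::Uuid;

async fn promote_new_version(
    pool: &sqlx::PgPool,
    series: Uuid,
    new_traffic_limit: i64,
) -> Result<(), sqlx::Error> {
    let mut tx = pool.begin().await?;

    // Demote the current master of the series.
    sqlx::query(
        r#"UPDATE "telecom"."packages" SET is_master = FALSE WHERE series = $1 AND is_master = TRUE"#,
    )
    .bind(series)
    .execute(&mut *tx)
    .await?;

    // Insert the next version as the new master, copying the remaining
    // service parameters from the latest existing version.
    sqlx::query(
        r#"
        INSERT INTO "telecom"."packages"
            (series, version, is_master, available_group, max_client_number, expire_duration, traffic_limit)
        SELECT series, version + 1, TRUE, available_group, max_client_number, expire_duration, $2
        FROM "telecom"."packages"
        WHERE series = $1
        ORDER BY version DESC
        LIMIT 1
        "#,
    )
    .bind(series)
    .bind(new_traffic_limit)
    .execute(&mut *tx)
    .await?;

    tx.commit().await
}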
Update Without Version Change (Non-Version Edit)
When to use: When making changes that don’t affect core functionality or user experience.
Use cases:
- Internal metadata updates
- Performance optimizations that don’t change user-visible behavior
- Bug fixes that don’t alter service parameters
- Administrative flags or internal tracking data
Process:
- Directly update the existing package record
- Version number remains unchanged
- is_master status remains unchanged
- Changes may affect both new and existing users (use carefully)
Example:
-- Update internal flags without affecting service delivery
UPDATE "telecom"."packages"
SET internal_metadata = $1
WHERE id = $2
Decision Matrix
| Change Type | Version Impact | User Impact | Edit Type |
|---|---|---|---|
| Traffic limit increase | High | Positive | New Version |
| Client limit change | High | Variable | New Version |
| Proxy group modification | High | Variable | New Version |
| Performance optimization | Low | None | No Version Change |
| Internal metadata | None | None | No Version Change |
| Bug fix (no behavior change) | Low | Positive | No Version Change |
Admin Capabilities
Administrators have several tools for managing packages and the version control system:
Package Queue Management
Admins can directly manage user package assignments:
- Add Queued Package: Assign specific packages to users
- Cancel Queued Package: Remove packages from user queues
- List/Count Queued Packages: Monitor package distribution
pub struct AdminAddQueuedPackage {
pub user_id: Uuid,
pub package_id: i64, // Direct package ID, not series
pub by_order: Option<Uuid>, // Optional order reference
}
Production Management
Admins control the production catalog that references package series:
- Create Production: Define new offerings linked to package series
- Delete Production: Remove offerings from the market
- View Production Details: See master package information
pub struct AdminCreateProduction {
pub package_series: Uuid, // References the series
pub package_amount: i32, // Number of packages to deliver
// ... other production fields
}
Version Control Operations
Current Limitations: Direct package creation/editing APIs are not yet implemented in the admin interface; package management currently happens at the database level.
Typical Admin Workflow:
- Create new package versions via database operations
- Update is_master flags to promote new versions
- Create/update productions to reference package series
- Monitor package queues and user assignments
Admin Roles and Permissions
Different admin roles have different package management capabilities:
- SuperAdmin: Full access to all package operations
- Moderator: Can manage package queues and productions
- CustomerSupport: Can add/cancel user packages for support purposes
Best Practices
- Always use version changes for user-facing modifications
- Test new package versions before setting as master
- Maintain clear version history with meaningful version increments
- Monitor user impact when promoting new package versions
- Keep rollback capability by preserving previous versions
- Document version changes for team coordination
Technical Integration
Database Schema
-- Package series grouping
CREATE TABLE "telecom"."package_series" (
id UUID PRIMARY KEY
);
-- Individual packages with version control
CREATE TABLE "telecom"."packages" (
id BIGSERIAL PRIMARY KEY,
series UUID REFERENCES "telecom"."package_series"(id),
version INTEGER NOT NULL,
is_master BOOLEAN NOT NULL DEFAULT FALSE,
-- ... service parameters
UNIQUE(series, version)
);
Key Queries
-- Get current master package for a series
SELECT * FROM packages WHERE series = ? AND is_master = TRUE;
-- Promote a package to master (transaction required)
BEGIN;
UPDATE packages SET is_master = FALSE WHERE series = ?;
UPDATE packages SET is_master = TRUE WHERE id = ?;
COMMIT;
This version control system ensures that the platform can evolve its offerings while maintaining service consistency for existing users, providing both stability and flexibility for business operations.
Package Queue
Overview
The Package Queue is a core component of the Telecom module that manages user packages in a queue-based system. It ensures that users can have multiple packages but only one active package at a time, with automatic activation of the next package when the current one expires.
Concept
What is Package Queue?
The Package Queue is a system that manages the lifecycle of telecom packages for users. Think of it as a “playlist” for packages - users can have multiple packages in their queue, but only one plays (is active) at a time. When the current package expires or is consumed, the system automatically activates the next package in line.
Key Characteristics
- Single Active Package Rule: Each user can only have one active package at any given time
- FIFO Queue: Packages are activated in First-In-First-Out order based on created_at timestamp
- Automatic Activation: When an active package expires, the next queued package is automatically activated
- Traffic Tracking: Each package tracks upload/download usage with quota adjustments
- Event-Driven: Package lifecycle changes trigger events for other system components
Package States
pub enum LivePackageStatus {
/// The package is in the queue, but not active
InQueue,
/// The package that the user is using
Active,
/// The package that has expired due to time or traffic limits
Consumed,
/// The package that was cancelled (e.g., refunded)
Cancelled,
}
How it Works
Data Structure
The core data structure is PackageQueueItem:
pub struct PackageQueueItem {
pub id: i64,
pub user_id: Uuid,
pub package_id: i64, // Reference to package definition
pub by_order: Option<Uuid>, // Optional order ID that created this item
pub status: LivePackageStatus,
pub created_at: PrimitiveDateTime,
pub activated_at: Option<PrimitiveDateTime>,
// Traffic usage tracking
pub upload: i64, // Billed upload traffic in bytes
pub download: i64, // Billed download traffic in bytes
pub adjust_quota: i64, // Quota adjustment (can be negative or positive)
}
Queue Processing Workflow
1. Package Creation
When packages are purchased, they are added to the queue with InQueue status:
// Single package
CreateQueueItem { user_id, package_id, by_order }
// Multiple identical packages
CreateQueueItems { user_id, package_id, by_order, amount }
2. Automatic Activation
The system automatically activates packages through the process_package_queue_push function:
pub async fn process_package_queue_push(
transaction: &mut sqlx::Transaction<'_, sqlx::Postgres>,
user_id: Uuid,
) -> Result<PackageQueuePushResult, sqlx::Error>
Activation Logic:
- Check if user has an active package
- If active package exists, do nothing
- If no active package, find the oldest queued package (ORDER BY created_at)
- Activate the found package by setting status = 'active' and activated_at = NOW()
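A minimal sketch of this activation step with sqlx, assuming the telecom.package_queue table and status labels shown elsewhere in this document (the actual query may differ):
// Illustrative activation query: promote the oldest queued item, if any.
// The 'in_queue' label is an assumption about the Postgres enum spelling.
async fn activate_next_package(
    tx: &mut sqlx::Transaction<'_, sqlx::Postgres>,
    user_id: uuid::Uuid,
) -> Result<Option<i64>, sqlx::Error> {
    let activated: Option<(i64,)> = sqlx::query_as(
        r#"
        UPDATE "telecom"."package_queue"
        SET status = 'active'::telecom.live_package_status, activated_at = NOW()
        WHERE id = (
            SELECT id FROM "telecom"."package_queue"
            WHERE user_id = $1 AND status = 'in_queue'::telecom.live_package_status
            ORDER BY created_at
            LIMIT 1
        )
        RETURNING id
        "#,
    )
    .bind(user_id)
    .fetch_optional(&mut **tx)
    .await?;
    Ok(activated.map(|(id,)| id))
}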
3. Traffic Usage Recording
Traffic usage is recorded through the billing system:
pub struct RecordPackageUsage {
pub user_id: Uuid,
pub upload: i64, // Additional upload traffic to bill
pub download: i64, // Additional download traffic to bill
}
Billing Logic:
- Find the user’s active package
- Add the new traffic to existing usage counters
- Check if total usage exceeds limit: (upload + download) >= (traffic_limit + adjust_quota)
- If limit exceeded, automatically set status to Consumed
4. Package Expiration
Packages can expire due to two reasons:
- Time expiration: Handled by cron jobs that check expire_at timestamps
- Usage expiration: Triggered automatically when traffic limits are exceeded during billing
When a package expires, a PackageExpiringEvent is published.
5. Queue Advancement
When a package expires, the system automatically activates the next package:
- PackageExpiringEvent is consumed by TelecomPackageQueueHook
- Expired package status is updated to Consumed
- System looks for the next InQueue package for the user
- If found, activates it and publishes PackageActivateEvent
- If no more packages, publishes AllPackageExpiredEvent
Concurrency Control
The system uses Redis-based distributed locks to prevent race conditions:
pub struct PackageQueueLock;
Lock Usage:
- Lock ID: User ID (LockId(user_id))
- TTL: 30 seconds default
- Retry Logic: Up to 10 retries for lock acquisition
- Operations Protected:
- Package queue push processing
- Package expiration handling
- Package activation
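A rough sketch of the acquire-with-retry pattern behind this lock, using a plain Redis SET NX EX command; the key name, value, and helper are hypothetical, not the actual PackageQueueLock implementation:
// Illustrative lock acquisition: SET key NX EX 30, retried up to 10 times.
// Cleanup relies on TTL expiry (or an explicit DEL once the work is done).
async fn try_acquire_package_queue_lock(
    conn: &mut redis::aio::MultiplexedConnection,
    user_id: uuid::Uuid,
) -> redis::RedisResult<bool> {
    let key = format!("lock:package_queue:{user_id}");
    for _ in 0..10 {
        let acquired: bool = redis::cmd("SET")
            .arg(&key)
            .arg("1")
            .arg("NX")
            .arg("EX")
            .arg(30)
            .query_async(conn)
            .await?;
        if acquired {
            return Ok(true);
        }
        tokio::time::sleep(std::time::Duration::from_millis(200)).await;
    }
    Ok(false)
}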
Event System
The Package Queue publishes several events for system integration:
PackageQueuePushEvent
pub struct PackageQueuePushEvent {
pub item_ids: Vec<i64>,
pub user_id: Uuid,
pub package_id: i64,
pub pushed_at: u64,
}
- Purpose: Internal event when packages are added to queue
- Route: telecom.package_queuing
PackageActivateEvent
pub struct PackageActivateEvent {
pub item_id: i64,
pub user_id: Uuid,
pub package_id: i64,
pub activated_at: u64,
}
- Purpose: Internal event when a package becomes active
- Route: telecom.package_activate
PackageExpiringEvent
pub struct PackageExpiringEvent {
pub item_id: i64,
pub user_id: Uuid,
pub package_id: i64,
pub expired_at: u64,
pub reason: PackageExpiredReason, // Time or Usage
}
- Purpose: Internal event when a package expires
- Route: telecom.package_expiring
Service Layer
The PackageQueueService provides high-level operations:
pub struct PackageQueueService {
pub db: DatabaseProcessor,
pub redis: RedisConnection,
}
Available Operations:
- GetCurrentPackage: Get user's active package info
- GetAllMyPackages: List all packages for a user
Database Schema
The package queue is stored in the telecom.package_queue table with the following key indexes:
- User-based queries: (user_id, status)
- Queue ordering: (user_id, status, created_at)
- Package lookup: (package_id)
Usage Examples
For Developers
Adding Packages to Queue
// Add single package
let item = db.process(CreateQueueItem {
user_id: user.id,
package_id: package.id,
by_order: Some(order.id),
}).await?;
// Add multiple identical packages
let items = db.process(CreateQueueItems {
user_id: user.id,
package_id: package.id,
by_order: Some(order.id),
amount: 5,
}).await?;
Getting Active Package
let service = PackageQueueService { db, redis };
let active_package = service.process(GetCurrentPackage {
user_id: user.id,
}).await?;
Recording Traffic Usage
// This is typically done by the billing system
let record = db.process(RecordPackageUsage {
user_id: user.id,
upload: 1024 * 1024, // 1MB upload
download: 10 * 1024 * 1024, // 10MB download
}).await?;
if record.map(|r| r.expired).unwrap_or(false) {
// Package expired due to usage, will trigger queue advancement
}
Integration Points
- Shop Module: Creates queue items when users purchase packages
- Billing System: Records traffic usage and triggers expiration
- Node Management: Checks active packages for user access control
- Admin Interface: Views and manages user package queues
Best Practices
- Always use transactions when modifying package queue state
- Handle lock acquisition failures gracefully with retries
- Listen to package events for system integration
- Consider quota adjustments when calculating available traffic
- Test concurrent scenarios due to multi-user nature of the system
Common Pitfalls
- Race Conditions: Always use Redis locks when modifying queue state
- Transaction Boundaries: Ensure event publishing happens after database commits
- Zero-Duration Packages: Handle edge cases where packages expire immediately
- Quota Calculations: Remember that adjust_quota can be negative (see the sketch below)
- Event Ordering: Package events may arrive out of order in distributed systems
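For the quota calculation pitfall, a tiny sketch of the effective-limit arithmetic (the function name is illustrative):
// adjust_quota can be negative, so compute the effective limit first
// and clamp the remaining allowance at zero.
fn remaining_traffic(traffic_limit: i64, adjust_quota: i64, upload: i64, download: i64) -> i64 {
    let effective_limit = traffic_limit + adjust_quota;
    (effective_limit - (upload + download)).max(0)
}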
Usage Recording Flow
Overview
The Usage Recording Flow is a comprehensive traffic monitoring and billing system in the Telecom Module that tracks user bandwidth consumption, applies billing multipliers, and processes package usage. This system ensures accurate billing while providing detailed analytics for both users and administrators.
The flow consists of three main phases:
- Data Collection: Node servers report raw traffic data
- Aggregation: Raw usage is collected, multiplied by traffic factors, and prepared for billing
- Billing: Aggregated usage is applied to user packages, potentially expiring them when limits are exceeded
System Architecture
┌─────────────────┐ ┌──────────────────────┐ ┌─────────────────────┐
│ Node Servers │───▶│ Traffic Report APIs │───▶│ Raw Usage Storage │
└─────────────────┘ └──────────────────────┘ └─────────────────────┘
│
▼
┌─────────────────────┐ ┌──────────────────┐ ┌──────────────────────┐
│ Package Expiry │◀───│ Billing Hook │◀───│ Cron Jobs - Traffic │
│ Events │ │ │ │ Billing │
└─────────────────────┘ └──────────────────┘ └──────────────────────┘
▲ │ │
│ ▼ ▼
┌─────────────────────┐ ┌──────────────────┐ ┌──────────────────────┐
│ Package Expiration │ │ Package Usage │ │ Usage Aggregation │
│ Check │ │ Update │ │ │
└─────────────────────┘ └──────────────────┘ └──────────────────────┘
│
▼
┌──────────────────────┐
│ Message Queue │
│ (UserUsageBilling │
│ Event) │
└──────────────────────┘
Data Flow:
- Node Servers → Report traffic via APIs (ReportTraffic, UploadUniProxyTraffic)
- Traffic Report APIs → Store raw usage in user_traffic_usage table
- Cron Jobs → Periodically gather unbilled usage and aggregate with traffic factors
- Usage Aggregation → Create UserUsageBillingEvent messages
- Message Queue → Distribute billing events to processing hooks
- Billing Hook → Apply usage to user packages and check limits
- Package Usage Update → Update package usage counters
- Package Expiration Check → Determine if package limits exceeded
- Package Expiry Events → Trigger package expiration workflows
Core Components
1. Data Collection Layer
Node Traffic Reporting APIs
- ReportTraffic: Legacy SSP-compatible traffic reporting
- UploadUniProxyTraffic: Modern UniProxy traffic reporting
Both APIs collect traffic reports from node servers and store them as raw usage records.
2. Data Storage
user_traffic_usage Table
Stores individual traffic records with the following key fields:
- user_id: User UUID
- upload/download: Raw traffic in bytes
- node_client_id: Source node information
- timestamp: When traffic occurred
- has_been_billed: Billing status flag
3. Aggregation & Billing System
Components:
- TelecomCronExecutor: Scheduled jobs for billing
- GatherUnbilledUsage: Aggregates and marks unbilled traffic
- UserUsageBillingEvent: Message queue events for billing
- TelecomBillingHook: Processes billing events and updates packages
Detailed Flow Walkthrough
Phase 1: Data Collection
Node servers periodically report traffic usage to the system:
// Example: UniProxy traffic reporting
let report = UploadUniProxyTraffic {
node_id: 1,
records: vec![
TrafficReportRecord {
user_id: 12345, // number_id from token system
upload: 1048576, // 1MB uploaded
download: 5242880, // 5MB downloaded
}
]
};
// Process the report
let result = node_server_service.process(report).await?;
Key Processing Steps:
- Validation: Only records with total traffic > 10KB are processed
- User Resolution: The number_id is resolved to the actual user_id via tokens
- Node Matching: System finds the appropriate node_client based on user's active package
- Storage: Raw usage is stored in user_traffic_usage with has_been_billed = FALSE
// Internal process: InsertTrafficReportBatch
impl Processor<InsertTrafficReportBatch, Result<(), sqlx::Error>> for DatabaseProcessor {
async fn process(&self, input: InsertTrafficReportBatch) -> Result<(), sqlx::Error> {
// Complex SQL that resolves number_id -> user_id -> node_client
// and inserts traffic records with proper node association
}
}
Phase 2: Aggregation & Billing Preparation
The system runs periodic cron jobs (typically every few minutes) to process unbilled usage:
// Cron job execution
impl Processor<PrimitiveDateTime, Result<Box<[BillTrafficJob]>, Error>> for TelecomCronExecutor {
async fn process(&self, _input: PrimitiveDateTime) -> Result<Box<[BillTrafficJob]>, Error> {
// Gather all unbilled usage and create billing jobs
let find_result = self
.db
.process(GatherUnbilledUsage) // Critical aggregation step
.await?
.into_iter()
.map(BillTrafficJob)
.collect::<Box<[_]>>();
Ok(find_result)
}
}
Critical Aggregation Process (GatherUnbilledUsage):
-- This query does several important things:
WITH updated AS (
UPDATE "telecom"."user_traffic_usage" AS u
SET has_been_billed = TRUE -- Mark as billed to prevent double-billing
FROM "telecom"."node_client" AS nc
WHERE u.node_client_id = nc.id AND u.has_been_billed = FALSE
RETURNING
nc.server_id,
u.user_id,
CEIL(u.download::numeric * nc.traffic_factor) AS billed_download, -- Apply traffic multiplier
CEIL(u.upload::numeric * nc.traffic_factor) AS billed_upload,
u.timestamp
)
SELECT
server_id,
user_id,
SUM(billed_download)::BIGINT AS billed_download,
SUM(billed_upload)::BIGINT AS billed_upload,
MAX(timestamp) AS time
FROM updated
GROUP BY server_id, user_id -- Aggregate by server and user
Key Operations:
- Atomic Marking: Marks records as billed to prevent race conditions
- Traffic Factor Application: Applies node-specific billing multipliers (nc.traffic_factor)
- Aggregation: Groups usage by server and user for efficient processing
- Ceiling Function: Ensures no fractional billing (always rounds up)
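The factor-and-ceiling arithmetic can be sketched in Rust with rust_decimal, as the rest of the codebase uses; the helper itself is illustrative, not the production code path:
use rust_decimal::prelude::*;

// Apply the node's traffic_factor and round up, so billing never undercounts.
fn billed_bytes(raw: i64, traffic_factor: Decimal) -> i64 {
    (Decimal::from(raw) * traffic_factor)
        .ceil()
        .to_i64()
        .unwrap_or(i64::MAX)
}

// Example: billed_bytes(1_000_000, Decimal::new(15, 1)) == 1_500_000 (factor 1.5).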
Phase 3: Package Billing & Expiration
For each aggregated usage record, the system publishes a billing event:
// Create and send billing event
let event = UserUsageBillingEvent {
server_id: item.server_id,
user: item.user_id,
billed_download: item.billed_download,
billed_upload: item.billed_upload,
time: start_time.assume_utc().unix_timestamp() as u64,
};
event.send(&mq).await?; // Send to message queue
The TelecomBillingHook consumes these events and applies usage to user packages:
impl Processor<UserUsageBillingEvent, Result<(), Error>> for TelecomBillingHook {
async fn process(&self, event: UserUsageBillingEvent) -> Result<(), Error> {
// Apply usage to user's active package
let Some(record) = self
.db
.process(RecordPackageUsage {
user_id: event.user,
upload: event.billed_upload,
download: event.billed_download,
})
.await?
else {
error!(user_id = %event.user, "Cannot find package for user");
return Err(Error::NotFound);
};
// Check if package exceeded limits
if record.expired {
// Send package expiration event
let ev = PackageExpiringEvent {
item_id: record.item_id,
user_id: record.user_id,
package_id: record.package_id,
expired_at: event.time,
reason: PackageExpiredReason::Usage, // Expired due to traffic usage
};
ev.send(&self.mq).await?;
}
Ok(())
}
}
Package Usage Recording (RecordPackageUsage):
This is the most critical operation that:
- Finds user’s currently active package
- Adds billed traffic to package usage counters
- Checks if total usage exceeds package limits
- Automatically transitions package status to ‘consumed’ if limits exceeded
-- Simplified version of the package usage update query
UPDATE "telecom"."package_queue" AS pq
SET upload = pq.upload + $2,
download = pq.download + $3,
status = CASE
WHEN pq.upload + $2 + pq.download + $3 >= p.traffic_limit + pq.adjust_quota
THEN 'consumed'::telecom.live_package_status
ELSE 'active'::telecom.live_package_status
END
FROM "telecom"."packages" AS p
WHERE pq.user_id = $1 AND pq.package_id = p.id
Developer Usage Guide
Adding New Traffic Sources
To add support for new node types or reporting formats:
- Create Traffic Report Structure:
#[derive(Debug, Clone, PartialEq)]
pub struct MyCustomTrafficReport {
pub node_id: i32,
pub usage_records: Vec<MyCustomRecord>,
}
- Implement Processor:
impl Processor<MyCustomTrafficReport, Result<ReportResult, Error>> for NodeServerService {
async fn process(&self, input: MyCustomTrafficReport) -> Result<ReportResult, Error> {
// Convert to standard TrafficReportRecord format
let records: Vec<TrafficReportRecord> = input
.usage_records
.into_iter()
.map(|r| TrafficReportRecord {
number_id: r.user_number,
upload: r.sent_bytes,
download: r.received_bytes,
})
.filter(|r| r.download + r.upload > 10_000) // Filter minimum threshold
.collect();
// Use existing batch insertion
self.db.process(InsertTrafficReportBatch {
server_id: input.node_id,
timestamp: now(),
records,
}).await?;
Ok(ReportResult::Ok)
}
}
Querying Usage Data
Get User’s Recent Usage
// Get hourly usage for the last 24 hours
let usage_data = db.process(GetUserHourlyUsage {
user: user_id,
begin: now - Duration::hours(24),
end: now,
}).await?;
for record in usage_data {
println!("Hour: {}, Raw: {}MB, Billed: {}MB",
record.time,
(record.upload + record.download) / 1_048_576,
(record.billed_upload + record.billed_download) / 1_048_576
);
}
Monitor Unbilled Usage
// Check for pending billing (useful for monitoring)
let unbilled = db.process(GatherUnbilledUsage).await?;
println!("Found {} users with unbilled usage", unbilled.len());
Analytics & Reporting
Traffic Factor Impact Analysis
// Compare raw vs billed traffic to understand traffic factor impact
let usage = db.process(GetUserDailyUsage {
user_id,
begin: start_date,
end: end_date,
}).await?;
for day in usage {
let raw_total = day.upload + day.download;
let billed_total = day.billed_upload + day.billed_download;
let factor = billed_total as f64 / raw_total as f64;
println!("Date: {}, Factor: {:.2}x", day.time, factor);
}
Important Developer Considerations
1. Race Conditions & Data Integrity
- Billing Flag: The has_been_billed flag prevents double-billing during concurrent cron jobs
- Atomic Updates: Use database transactions for critical operations
- Event Ordering: Message queue ensures billing events are processed in order
2. Traffic Factor System
- Node-Specific Multipliers: Each node_client has a traffic_factor (e.g., 1.0, 1.5, 2.0)
- Ceiling Rounding: Always rounds up to prevent under-billing
- Factor Changes: Changing factors only affects new traffic, not historical data
3. Performance Optimization
- Minimum Threshold: Only records with >10KB total traffic are processed
- Batch Processing: Traffic reports are processed in batches for efficiency
- Index Usage: Ensure proper indexing on user_id, has_been_billed, and timestamp
4. Error Handling
// Always handle missing package scenarios
if let Some(record) = db.process(RecordPackageUsage { ... }).await? {
// Process successful usage recording
} else {
// User has no active package - this is expected for expired/inactive users
warn!("User {} has no active package for billing", user_id);
}
5. Monitoring & Alerting
Key metrics to monitor:
- Unbilled Records: Should trend toward zero between cron runs
- Failed Billing Events: Errors in TelecomBillingHook processing
- Package Expiration Rate: Monitor PackageExpiringEvent frequency
- Traffic Factor Distribution: Ensure factors are applied correctly
6. Testing Considerations
When writing tests:
// Always test with realistic traffic factors
let test_factor = 1.5;
let raw_usage = 1_000_000; // 1MB
let expected_billed = (raw_usage as f64 * test_factor).ceil() as i64; // 1,500,000
// Test edge cases around package limits
let package_limit = 1_000_000_000; // 1GB
let usage_just_under = package_limit - 1;
let usage_just_over = package_limit + 1;
This usage recording system provides robust traffic monitoring and accurate billing that scale to high-traffic proxy networks while maintaining data integrity and providing detailed analytics.
Observability
This document describes the observability features implemented in the telecom module. These features enable users and administrators to monitor system performance, track usage, and maintain visibility into the health of the telecom infrastructure.
User Observability Features
The telecom module provides several APIs that allow users to monitor their usage and the status of their assigned nodes.
Traffic Usage Tracking
The AnalysisService provides comprehensive traffic monitoring capabilities for users through the GetRecentTrafficUsage API.
API Endpoint
- gRPC Service: Telecom.GetRecentTrafficUsage
- Request: GetRecentTrafficUsageRequest
- Response: RecentTrafficUsageResponse
Implementation Details
The service provides traffic data in different time ranges:
- Day: Hourly bucketing for the last 24 hours
- Week: Daily bucketing for the last 7 days
- Month: Daily bucketing for the last 30 days
Key Components:
- Raw Traffic: Actual traffic consumed by the user
- Billed Traffic: Traffic that was actually charged to the user’s quota (may differ due to traffic multipliers)
// Usage example in service layer
use crate::services::analysis::{AnalysisService, GetRecentTrafficUsage, RecentRange};
let usage_response = analysis_service.process(GetRecentTrafficUsage {
user_id: user_id,
range: RecentRange::Day, // or Week, Month
}).await?;
// Response contains two data sets:
// - usage_response.raw: actual traffic consumed
// - usage_response.actually_billed: traffic charged to quota
Database Queries Used:
- GetUserHourlyUsage: For day-range queries
- GetUserDailyUsage: For week/month-range queries
Node Status History
Users can monitor the historical status of their assigned proxy nodes through the ListNodeStatusHistory API.
API Endpoint
- gRPC Service: Telecom.ListNodeStatusHistory
- Request: ListNodeStatusHistoryRequest
- Response: ListNodeStatusHistoryReply
Implementation Details
The API provides hourly aggregated node status information:
- Online Nodes: Count of nodes that were online in each hour
- Offline Nodes: Count of nodes that were offline in each hour
- Maintenance Nodes: Count of nodes under maintenance in each hour
// Usage example
use crate::services::analysis::{AnalysisService, ListUserNodeStatusHistory};
let history = analysis_service.process(ListUserNodeStatusHistory {
start: start_time,
end: end_time,
user_id: user_id,
}).await?;
// Each history entry contains:
// - bucket_start: timestamp of the hour
// - online_nodes, offline_nodes, maintenance_nodes: counts for that hour
Data Source: Uses materialized view node_status_hourly_mv for efficient querying.
Node Usage Analytics
The system tracks which nodes users utilize most frequently through the ListUsuallyUsedNodes API.
API Endpoint
- gRPC Service: Telecom.ListUsuallyUsedNodes
- Request: ListUsuallyUsedNodesRequest
- Response: ListUsuallyUsedNodesResponse
Implementation Details
Provides analytics on user’s node usage patterns:
- Node Information: ID, name of frequently used nodes
- Traffic Statistics: Upload, download, and billed traffic per node
// Usage example
use crate::services::analysis::{AnalysisService, ListUserUsuallyUsedNodes};
let nodes = analysis_service.process(ListUserUsuallyUsedNodes {
user_id: user_id,
}).await?;
// Each node entry contains:
// - node_client_id, node_name: identification
// - upload, download, billed_traffic: usage statistics
Node List and Status
Users can view their available nodes and their current status through the ListNodes API.
API Endpoint
- gRPC Service: Telecom.ListNodes
- Request: ListNodesRequest
- Response: ListNodesReply
Implementation Details
Provides real-time information about user’s assigned nodes:
- Node Details: ID, name, traffic factor, display order
- Performance Info: Speed limits, current status
- Metadata: Country, location, route class
Admin Observability Features
Administrators have access to comprehensive monitoring and management capabilities for the entire telecom infrastructure.
Server Monitoring
List Node Servers
Admins can monitor all proxy servers in the system.
- gRPC Service: NodeServerManage.ListNodeServers
- Features:
- Filter by server status (Online/Offline/Maintenance)
- Pagination support (limit/offset)
- Shows server compatibility, status, last online time, and client count
// Usage example in manage service
use crate::services::manage::{AdminListServers, ManageService};
let servers = manage_service.process(AdminListServers {
limit: 50,
offset: 0,
filter_status: Some(NodeServerStatus::Offline), // Optional filter
}).await?;
Show Individual Server Details
- gRPC Service: NodeServerManage.ShowNodeServer
- Features:
- Complete server configuration
- Current status and performance metrics
- Last online timestamp
Node Client Management
List All Node Clients
Comprehensive view of all proxy node clients.
- gRPC Service: NodeClientManage.ListNodeClients
- Features:
- Complete client information including server relationships
- Traffic factors and routing configurations
- Status monitoring and metadata
Individual Client Details
- gRPC Service: NodeClientManage.ShowNodeClient
- Features:
- Detailed client configuration
- Associated server information
- Performance and status metrics
Package Queue Monitoring
Queue Statistics
Monitor package queue health and performance.
- gRPC Service: PackageQueueManage.CountQueuedPackages
- Features:
- Count of packages by series
- Queue status overview
Package List Management
- gRPC Service: PackageQueueManage.ListQueuedPackages
- Features:
- Filter by user, order, package, or status
- Pagination support
- Complete package lifecycle visibility
Background Job Monitoring
The telecom module runs several scheduled jobs for system maintenance and monitoring:
Node Health Monitoring (RefreshServerStatus)
- Purpose: Automatically mark servers as online/offline based on heartbeat
- Frequency: Configurable via TelecomConfig.node_health_check.offline_timeout
- Implementation: TelecomCronExecutor in cron.rs
Package Expiration Management (PackageExpiringJob)
- Purpose: Automatically expire packages based on time limits
- Frequency: Regular scanning for expired packages
- Events: Publishes PackageExpiringEvent to message queue
Traffic Billing Processing (BillTrafficJob)
- Purpose: Process unbilled traffic usage and publish billing events
- Frequency: Regular processing of accumulated traffic data
- Events: Publishes UserUsageBillingEvent for each user
Node Status History Recording (RecordNodeStatusHistoryJob)
- Purpose: Record current status of all node servers for historical analysis
- Frequency: Hourly status snapshots
- Storage: Populates the node_status_history table
Status View Refresh (RefreshNodeStatusViewJob)
- Purpose: Refresh the materialized view for efficient status queries
- Frequency: Regular refresh of node_status_hourly_mv
Event-Driven Observability
Usage Billing Events
The system processes usage data through asynchronous events:
UserUsageBillingEvent
- Publisher: External systems (cron jobs, usage collectors)
- Consumer: TelecomBillingHook
- Route: telecom.user_usage_billing
// Event structure
pub struct UserUsageBillingEvent {
pub server_id: i32,
pub user: Uuid,
pub billed_download: i64,
pub billed_upload: i64,
pub time: u64,
}
PackageExpiringEvent
- Publisher: Telecom billing system
- Route: Package expiration processing
- Purpose: Handle package lifecycle events
Tracing and Instrumentation
All RPC endpoints and critical services include comprehensive tracing:
- Instrumentation: Uses tracing::instrument for observability
- Performance Tracking: Request/response times and error rates
Database Schema for Observability
Core Tables
node_status_history
Stores historical node status data:
- id: Primary key
- node_server_id: Reference to node server
- status: Online/Offline/Maintenance
- created_at: Timestamp
user_package_usage
Tracks user traffic consumption:
- Hourly and daily aggregations
- Raw and billed traffic separation
- User and server associations
Materialized Views
node_status_hourly_mv
Optimized view for status history queries:
- Hourly aggregations of node status
- Efficient querying for analytics
- Automatic refresh via cron jobs
Usage Guidelines
For Users
- Use GetRecentTrafficUsage to monitor bandwidth consumption
- Check ListNodeStatusHistory for node availability patterns
- Analyze ListUsuallyUsedNodes to optimize node selection
- Monitor ListNodes for real-time node status
For Administrators
- Use server management APIs to monitor infrastructure health
- Monitor package queues for system performance
- Review cron job logs for automated maintenance status
- Analyze event streams for system-wide observability
Development Considerations
- All APIs follow the Processor pattern
- Database connections use owned types, not static lifetimes
- Comprehensive error handling with structured logging
- Event-driven architecture for scalable monitoring
- Materialized views for performance-critical queries
This observability framework provides complete visibility into the telecom system’s operation, enabling both users and administrators to monitor, analyze, and optimize the service effectively.
Fetching Config
Overview
The Fetching Config system provides a subscription-based mechanism for users to dynamically retrieve proxy client configurations. This system allows users to get up-to-date proxy server configurations compatible with their preferred proxy clients without manual intervention.
The Concept of Subscribe Link
What is a Subscribe Link?
A subscribe link is a unique URL that allows users to fetch their personalized proxy configuration from the telecom service. Each user has a unique subscription token that grants access to their allowed proxy nodes based on their active package and permissions.
How Subscribe Links Work
- Token Generation: Each user gets a unique subscription_link_key (UUID) stored in the nodes_token table
- Template URLs: The system supports multiple subscribe endpoints configured via SubscribeLinkConfig
- Dynamic Generation: The final subscribe URL is generated by replacing the {SUBSCRIBE_TOKEN} placeholder with the user's actual token
- Access Control: Users can only access nodes their active package group allows
Subscribe Link Architecture
pub struct SubscribeLinkTemplate {
pub url_template: String, // e.g., "https://subscribe.congyu.moe/subscribe/{SUBSCRIBE_TOKEN}"
pub endpoint_name: String, // e.g., "default"
}
The default configuration provides a template like:
https://subscribe.congyu.moe/subscribe/{SUBSCRIBE_TOKEN}
Which becomes a working URL like:
https://subscribe.congyu.moe/subscribe/550e8400-e29b-41d4-a716-446655440000
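Rendering the concrete link is a straightforward placeholder substitution; the helper below is a hypothetical sketch of that step, not the actual implementation:
// Substitute the user's token into a configured template.
fn render_subscribe_link(url_template: &str, subscribe_token: uuid::Uuid) -> String {
    url_template.replace("{SUBSCRIBE_TOKEN}", &subscribe_token.to_string())
}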
Supported Proxy Clients
The system supports the following proxy clients through the libsubconv library:
Supported Client Types
| Client Name | Description | Detection Keywords |
|---|---|---|
| Clash | Popular cross-platform proxy client | clash, stash, shadowrocket, meta |
| V2Ray | Core V2Ray client | v2ray |
| SingBox | Next-generation universal proxy platform | singbox |
| QuantumultX | Advanced proxy client for iOS | quantumult |
| Loon | Network proxy client for iOS | loon |
| Surfboard | Network proxy client for iOS | surfboard |
| Surge4 | Advanced network toolbox | surge |
| Trojan | Uses V2Ray format | - |
Client Detection
The system automatically detects the client type using two methods:
- Query Parameter: ?client=clash (explicit specification)
- User-Agent Header: Automatic detection based on HTTP User-Agent string
The detection logic prioritizes explicit query parameters over User-Agent detection.
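A self-contained sketch of that priority order; the enum and keyword lists below are illustrative stand-ins for the real ClientName type and libsubconv detection logic:
// Illustrative detection only: the real system uses ClientName and libsubconv.
#[derive(Debug, PartialEq)]
enum DetectedClient {
    Clash,
    SingBox,
    QuantumultX,
    V2Ray,
}

fn detect_client(query_client: Option<&str>, user_agent: &str) -> DetectedClient {
    // 1. An explicit ?client= parameter takes priority.
    if let Some(c) = query_client {
        match c.to_ascii_lowercase().as_str() {
            "clash" => return DetectedClient::Clash,
            "singbox" => return DetectedClient::SingBox,
            "quantumult" | "quantumultx" => return DetectedClient::QuantumultX,
            "v2ray" => return DetectedClient::V2Ray,
            _ => {} // unknown value: fall back to User-Agent detection
        }
    }
    // 2. Otherwise match keywords in the User-Agent header.
    let ua = user_agent.to_ascii_lowercase();
    if ["clash", "stash", "shadowrocket", "meta"].iter().any(|k| ua.contains(*k)) {
        DetectedClient::Clash
    } else if ua.contains("singbox") {
        DetectedClient::SingBox
    } else if ua.contains("quantumult") {
        DetectedClient::QuantumultX
    } else {
        DetectedClient::V2Ray // assumed default when nothing matches
    }
}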
How Proxy Client Config is Generated
Generation Process Flow
- Authentication: Validate the subscription token and retrieve user information
- Node Filtering: Apply user’s package permissions and filter options
- Node Retrieval: Fetch available nodes based on user’s active package group
- Format Conversion: Convert node configurations to client-specific format
- Response Generation: Generate final configuration with proper headers
Configuration Generation Steps
// 1. Validate subscription token
let token = FindNodesTokenBySubscribeId { subscribe_id };
// 2. Get user's active package and available nodes
let nodes = ListUserNodeClientConfigs {
user_id: token.user_id,
filter_option: NodeFilterOption { ... }
};
// 3. Convert to client-specific format
match client_name {
ClientName::Clash => {
let nodes = cores.into_iter()
.filter_map(|c| c.to_clash_node())
.collect();
Clash::generate(nodes).stringify()
},
ClientName::SingBox => {
let nodes = cores.into_iter()
.filter_map(|c| c.to_singbox_node())
.collect();
SingBox::generate(nodes).stringify()
},
// ... other clients
}
Node Filtering Options
The system supports sophisticated filtering based on:
- Country Exclusion: exclude_country - List of country codes to exclude
- Location Filtering: only_locations - Only include specific locations; exclude_locations - Exclude specific locations
- Route Class Filtering: only_route_classes - Only include specific route classes; exclude_route_classes - Exclude specific route classes
Available Locations
pub enum Locations {
NorthAmerica,
SouthAmerica,
Europe,
EastAsia,
SoutheastAsia,
MiddleEast,
Africa,
Oceania,
Antarctica,
}
Available Route Classes
pub enum RouteClass {
/// Highest class, this kind of node is for enterprise customers.
SpecialCustom,
/// This class means the node is using high-end infrastructure like IPLC.
Premium,
/// This class means the node is using backbone infrastructure.
Backbone,
/// This class means the node is out of major countries and regions, provided for global access.
GlobalAccess,
/// This class means the node is a budget node, provided for budget sensitive customers.
Budget,
}
RESTful API Reference
Get Subscribe Links
gRPC Endpoint: TelecomService.GetSubscribeLinks
Purpose: Retrieve available subscribe link templates for a user
Request:
message GetSubscribeLinksRequest {}
Response:
message GetSubscribeLinksReply {
repeated SubscribeLink links = 1;
string subscribe_token = 2;
}
message SubscribeLink {
string url_template = 1;
string endpoint_name = 2;
}
Fetch Subscribe Content
HTTP Endpoint: GET /subscribe/{token}
Purpose: Retrieve proxy configuration for a specific subscription token
Path Parameters
| Parameter | Type | Description |
|---|---|---|
| token | UUID | User’s subscription token |
Query Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| client | ClientName | No | Force specific client type (overrides User-Agent detection) |
| exclude_country | CountryCode[] | No | List of country codes to exclude |
| only_locations | Locations[] | No | Only include nodes from these locations |
| exclude_locations | Locations[] | No | Exclude nodes from these locations |
| only_route_classes | RouteClass[] | No | Only include nodes of these route classes |
| exclude_route_classes | RouteClass[] | No | Exclude nodes of these route classes |
Example Requests
Basic Request:
GET /subscribe/550e8400-e29b-41d4-a716-446655440000
User-Agent: Clash/1.0
With Filtering:
GET /subscribe/550e8400-e29b-41d4-a716-446655440000?client=clash&exclude_country=CN,RU&only_route_classes=premium,backbone
Force Client Type:
GET /subscribe/550e8400-e29b-41d4-a716-446655440000?client=singbox
Response Format
The response format varies by client type:
Headers:
- Content-Type: Varies by client (e.g., application/yaml for Clash, application/json for SingBox)
- Custom headers with subscription information (traffic limits, expiration, etc.)
Response Body: Client-specific configuration format
Error Responses
| HTTP Status | Description |
|---|---|
| 404 Not Found | Invalid subscription token or no available nodes |
| 500 Internal Server Error | Server processing error |
Configuration Management
Subscribe link endpoints are configured via the TelecomConfig:
pub struct SubscribeLinkConfig {
pub endpoints: Vec<SubscribeLinkEndpoint>,
}
pub struct SubscribeLinkEndpoint {
pub url_template: String, // Template with {SUBSCRIBE_TOKEN} placeholder
pub endpoint_name: String, // Human-readable endpoint name
}
Default Configuration:
{
"subscribe_link": {
"endpoints": [
{
"url_template": "https://subscribe.congyu.moe/subscribe/{SUBSCRIBE_TOKEN}",
"endpoint_name": "default"
}
]
}
}
Implementation Notes
Security Considerations
- Token Validation: All requests must include a valid subscription token
- Package Verification: Users can only access nodes their package allows
- Rate Limiting: Consider implementing rate limiting for subscribe endpoints
- Token Rotation: Subscription tokens should be rotatable for security
Performance Considerations
- Caching: Node configurations can be cached to reduce database load
- Filtering: Client-side filtering is applied efficiently using database indexes
- Conversion: Node format conversion is optimized per client type
Maintenance Tasks
- Monitor Usage: Track subscribe link usage patterns
- Update Templates: Manage subscribe link templates through configuration
- Clean Up: Remove expired or unused subscription tokens
- Client Support: Add support for new proxy clients as needed
Market Module
The Market Module is responsible for managing affiliate marketing systems within the Helium platform. It provides comprehensive affiliate functionality including invite codes, referral tracking, reward calculation, and revenue distribution.
Overview
The market module implements a complete affiliate marketing system that allows users to invite new customers and earn commissions from their purchases. The system is built with the following core components:
- Affiliate Policy Management: Configurable commission rates and invitation rules per user
- Invite Code System: Generation and management of unique invitation codes
- Referral Tracking: Automatic tracking of user registration through invite codes
- Reward Calculation: Dynamic commission calculation based on order amounts and rates
- Revenue Management: Accumulated reward tracking and withdrawal functionality
Core Features
User Invitation System
- Generate unique 8-character alphanumeric invite codes
- Maximum configurable number of active codes per user
- Automatic code deactivation and cleanup
- Referral relationship establishment during user registration
Commission & Rewards
- Configurable commission rates per user
- Trigger limits per referred user to prevent abuse
- Automatic reward calculation on order payments
- Real-time affiliate statistics tracking
- Secure withdrawal system with balance verification
Administrative Controls
- Centralized affiliate policy management
- Global configuration for default rates and limits
- Comprehensive audit trail for all affiliate activities
- Integration with user balance and order systems
Module Structure
The market module follows the standard Helium module structure:
- entities/: Database models and operations for affiliate data
- services/: Business logic processors implementing affiliate functionality
- rpc/: gRPC service implementations for client communication
- hooks/: Event listeners for user registration and order processing
- events/: Internal event definitions for reward processing
- config.rs: Module configuration structure
Integration Points
The market module integrates with several other Helium modules:
- Auth Module: Listens to user registration events to initialize affiliate policies
- Shop Module: Processes order payment events to trigger reward calculations
- Manage Module: Uses configuration system for affiliate settings
- Redis: Stores module configuration for fast access
- RabbitMQ: Event-driven communication with other modules
API Endpoints
The module exposes the following gRPC APIs:
- GetAffiliateStats: Retrieve user’s affiliate statistics and invite codes
- ListInviteCodes: List active invite codes for a user
- CreateInviteCode: Generate new invite codes
- DeleteInviteCode: Deactivate invite codes
- WithdrawAffiliateReward: Transfer earned rewards to user balance
All APIs follow the Processor pattern for consistent business logic processing.
Database Schema
The module uses a dedicated market schema with the following tables:
- affiliate_policy: Per-user commission rates and invitation settings
- affiliate_stats: Aggregated revenue and referral statistics
- invite_code: User-generated invitation codes
- affiliate_relation: Historical record of reward transactions
Event Flow
The affiliate system operates through an event-driven architecture:
- User Registration: When users register with an invite code, the system creates affiliate policies and establishes referral relationships
- Order Payment: When referred users make purchases, the system calculates and awards commissions to inviters
- Reward Processing: Affiliate rewards are processed asynchronously through internal events
- Balance Updates: Successful withdrawals update user account balances atomically
This documentation provides developers with the necessary understanding to maintain, extend, and integrate with the market module’s affiliate functionality.
Affiliate System
The Affiliate System is the core feature of the Market Module, providing a comprehensive referral and commission management system. This document details the implementation, data flow, and usage patterns for developers working with the affiliate functionality.
System Architecture
The affiliate system is built on an event-driven architecture that processes user interactions across multiple modules:
User Registration → Affiliate Policy Creation
Order Payment → Reward Calculation → Balance Update
Invite Code Generation → Code Validation → Referral Tracking
Core Components
1. Affiliate Policy (affiliate_policy)
Each user has an affiliate policy that defines their participation in the referral system:
- Reward Rate: Commission percentage (e.g., 0.1 = 10%)
- Trigger Time Per User: Maximum times a single referred user can trigger rewards
- Invitation Rights: Whether the user can create invite codes
- Referral Chain: Who invited this user (if anyone)
2. Invite Codes (invite_code)
Users generate unique codes to invite new customers:
- 8-character alphanumeric codes: Generated using secure randomization
- Active status: Codes can be deactivated without deletion
- User limits: Configurable maximum codes per user
- Collision handling: Automatic retry on code conflicts
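A sketch of the generate-and-retry approach described above, assuming the rand crate; the code_exists closure stands in for the actual database uniqueness check:
use rand::{distributions::Alphanumeric, Rng};

// Generate one candidate 8-character alphanumeric code.
fn generate_code() -> String {
    rand::thread_rng()
        .sample_iter(&Alphanumeric)
        .take(8)
        .map(char::from)
        .collect()
}

// Retry a bounded number of times if the candidate collides with an existing code.
fn create_unique_code(code_exists: impl Fn(&str) -> bool) -> Option<String> {
    for _ in 0..5 {
        let code = generate_code();
        if !code_exists(&code) {
            return Some(code);
        }
    }
    None // caller maps this to a "too many collisions" error
}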
3. Affiliate Statistics (affiliate_stats)
Real-time tracking of affiliate performance:
- Total Revenue: Cumulative commission earned
- Withdrawn Revenue: Amount already transferred to user balance
- Referral Count: Number of users successfully referred
- Available Balance: total_revenue - withdrawn_revenue
4. Affiliate Relations (affiliate_relation)
Historical record of all reward transactions for audit and analytics.
Data Flow
1. User Registration Flow
When a new user registers with an invite code:
// Hook: RegisterHook in hooks/register.rs
impl Processor<UserRegisterEvent, Result<(), Error>> for RegisterHook {
async fn process(&self, ev: UserRegisterEvent) -> Result<(), Error> {
// 1. Load affiliate configuration
let cfg = self.load_config().await?;
// 2. Validate and find invite code
let mut invited_by = None;
if let Some(code) = ev.referral_code.clone() {
if let Some(inv) = self.db.process(FindInviteCodeByCode { code }).await? {
invited_by = Some(inv.user_id);
}
}
// 3. Create affiliate policy for new user
self.db.process(CreateAffiliatePolicy {
user_id: ev.user_id,
trigger_time_per_user: cfg.default_trigger_time_per_user,
cannot_invite: false,
rate: cfg.default_reward_rate,
invited_by,
}).await?;
Ok(())
}
}
2. Order Payment Flow
When a referred user makes a purchase:
// Hook: OrderHook in hooks/orders.rs
impl Processor<OrderPaidEvent, Result<(), Error>> for OrderHook {
async fn process(&self, event: OrderPaidEvent) -> Result<(), Error> {
// 1. Skip account balance payments (no commission)
if matches!(order.payment_method, Some(PaymentMethod::AccountBalance)) {
return Ok(());
}
// 2. Find invitee's affiliate policy
let invitee_policy = self.db.process(FindAffiliatePolicyByUser {
user_id: event.user_id,
}).await?;
// 3. Check if user was referred
if let Some(inviter) = invitee_policy.invited_by {
// 4. Verify trigger limits
let count = self.db.process(CountPaidOrders {
user_id: event.user_id,
}).await?;
if count <= inviter_policy.trigger_time_per_user as i64 {
// 5. Publish reward event
let reward = AffiliateReward {
inviter,
invitee: event.user_id,
order_id: event.order_id,
};
reward.send(&self.mq).await?;
}
}
Ok(())
}
}
3. Reward Processing Flow
The system processes affiliate rewards asynchronously:
// Hook: RewardHook in hooks/reward.rs
impl Processor<AffiliateReward, Result<(), Error>> for RewardHook {
async fn process(&self, ev: AffiliateReward) -> Result<(), Error> {
// 1. Load inviter's policy for commission rate
let policy = self.db.process(FindAffiliatePolicyByUser {
user_id: ev.inviter,
}).await?;
// 2. Calculate reward amount
let order = self.db.process(FindOrderByIdOnly {
order_id: ev.order_id,
}).await?;
let reward = order.total_amount * policy.rate;
// 3. Create affiliate relation and update stats
self.db.process(CreateAffiliateRelationAndReward {
from: ev.invitee,
to: ev.inviter,
order_id: ev.order_id,
rate: policy.rate,
reward,
}).await?;
Ok(())
}
}
Service Implementation
The AffiliateService provides the main business logic using the Processor pattern:
Core Operations
Get Affiliate Statistics
pub struct GetMyAffiliateStats {
pub user_id: Uuid,
}
// Returns: MyAffiliateStats with policy, stats, and invite codes
Invite Code Management
pub struct CreateMyInviteCode { pub user_id: Uuid }
pub struct DeleteMyInviteCode { pub user_id: Uuid, pub code_id: i64 }
pub struct ListMyInviteCodes { pub user_id: Uuid }
Reward Withdrawal
pub struct WithdrawAffiliateReward {
pub user_id: Uuid,
pub amount: Decimal,
}
The withdrawal operation is atomic and includes:
- Balance verification (sufficient available rewards)
- Affiliate stats update (increase withdrawn amount)
- User balance credit
- Transaction logging
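A hedged sketch of that atomic flow as a single sqlx transaction; the market.affiliate_stats and shop.user_balance table and column names are assumptions based on the schema overview, the error value is a stand-in, and the transaction-logging step is elided:
use rust_decimal::Decimal;
use uuid::Uuid;

async fn withdraw_reward(
    pool: &sqlx::PgPool,
    user_id: Uuid,
    amount: Decimal,
) -> Result<(), sqlx::Error> {
    let mut tx = pool.begin().await?;

    // 1. Increase withdrawn_revenue only if enough reward balance is available.
    let updated = sqlx::query(
        r#"
        UPDATE "market"."affiliate_stats"
        SET withdrawn_revenue = withdrawn_revenue + $2
        WHERE user_id = $1
          AND total_revenue - withdrawn_revenue >= $2
        "#,
    )
    .bind(user_id)
    .bind(amount)
    .execute(&mut *tx)
    .await?;
    if updated.rows_affected() == 0 {
        tx.rollback().await?;
        // Stand-in error: insufficient available balance.
        return Err(sqlx::Error::RowNotFound);
    }

    // 2. Credit the user's account balance (transaction logging elided).
    sqlx::query(r#"UPDATE "shop"."user_balance" SET balance = balance + $2 WHERE user_id = $1"#)
        .bind(user_id)
        .bind(amount)
        .execute(&mut *tx)
        .await?;

    tx.commit().await
}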
Configuration
The affiliate system is configured through AffConfig:
pub struct AffConfig {
pub max_invite_code_per_user: i32, // Default: 10
pub default_reward_rate: Decimal, // Default: 0.1 (10%)
pub default_trigger_time_per_user: i32, // Default: 3
}
Configuration is stored in Redis under the key affiliate and loaded dynamically by services.
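A sketch of loading that configuration, assuming the value is serialized as JSON under the affiliate key and that AffConfig derives serde::Deserialize (both assumptions):
use redis::AsyncCommands;

// Illustrative loader: fall back to the documented defaults if the key is absent.
async fn load_aff_config(
    conn: &mut redis::aio::MultiplexedConnection,
) -> anyhow::Result<AffConfig> {
    let raw: Option<String> = conn.get("affiliate").await?;
    Ok(match raw {
        Some(json) => serde_json::from_str(&json)?,
        None => AffConfig {
            max_invite_code_per_user: 10,
            default_reward_rate: rust_decimal::Decimal::new(1, 1), // 0.1 = 10%
            default_trigger_time_per_user: 3,
        },
    })
}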
Database Operations
All database operations use the Processor pattern with strongly-typed inputs and outputs:
Affiliate Policy Operations
- FindAffiliatePolicyByUser: Retrieve user’s policy
- CreateAffiliatePolicy: Initialize policy for new users
Invite Code Operations
- CreateInviteCode: Generate new code with collision handling
- ListInviteCodesByUser: Get user’s active codes
- CountActiveInviteCodesByUser: Check against limits
- SoftDeleteInviteCode: Deactivate code
- FindInviteCodeByCode: Validate codes during registration
Statistics Operations
- FindAffiliateStatsByUser: Get performance metrics
- AddAffiliateReward: Credit new rewards
- WithdrawAffiliateRewardAtomic: Transfer to balance
gRPC API
The affiliate system exposes user-facing APIs through the Market service:
service Market {
rpc GetAffiliateStats (GetAffiliateStatsRequest) returns (GetAffiliateStatsReply);
rpc ListInviteCodes (ListInviteCodesRequest) returns (ListInviteCodesReply);
rpc CreateInviteCode (CreateInviteCodeRequest) returns (CreateInviteCodeReply);
rpc DeleteInviteCode (DeleteInviteCodeRequest) returns (DeleteInviteCodeReply);
rpc WithdrawAffiliateReward (WithdrawAffiliateRewardRequest) returns (WithdrawAffiliateRewardReply);
}
All APIs authenticate users and operate on their data only.
Event Integration
The system integrates with other modules through events:
Consumed Events
- UserRegisterEvent (from auth module): Initialize affiliate policies
- OrderPaidEvent (from shop module): Trigger reward calculations
Published Events
- AffiliateReward (internal): Process commission calculations
Message Queues
- helium_auth_user_register_market: User registration processing
- helium_shop_order_paid: Order payment processing
- helium_market_affiliate_reward: Internal reward processing
Business Rules
Reward Eligibility
- Payment Method: Only real payments trigger rewards (no account balance)
- Trigger Limits: Each referred user can only trigger rewards N times
- Active Codes: Only active invite codes establish referral relationships
- Valid Orders: Rewards only process for successfully paid orders
Security Considerations
- Atomic Withdrawals: Balance checks and updates are transactional
- Code Uniqueness: Invite codes are globally unique with retry logic
- Rate Validation: Commission rates are validated and stored as decimals
- Audit Trail: All reward transactions are recorded permanently
Error Handling
The system uses comprehensive error handling:
- Invalid Input: Invalid amounts, missing data
- Business Logic: Insufficient balance, code limits exceeded
- System Errors: Database failures, message queue issues
- Not Found: Missing policies, orders, or codes
Monitoring & Observability
Key metrics and logs for monitoring:
- Successful referral registrations
- Reward calculation events
- Withdrawal success/failure rates
- Invite code generation patterns
- Commission distribution analytics
Development Guidelines
When working with the affiliate system:
- Use Processors: All business logic must use the Processor pattern
- Avoid Static Lifetimes: Use owned connection types like RedisConnection
- Handle Decimals Carefully: Use rust_decimal::Decimal for all monetary calculations
- Test Event Flows: Verify end-to-end event processing in integration tests
- Monitor Performance: Track database query performance for statistics operations
Common Integration Patterns
Adding New Reward Triggers
- Create event structure with routing information
- Implement event processor with business logic
- Register message queue and routing
- Add appropriate error handling and logging
Extending Statistics
- Update AffiliateStats entity
- Modify aggregation queries
- Update API response structures
- Implement migration for existing data
This comprehensive documentation provides developers with the knowledge needed to maintain, extend, and troubleshoot the affiliate system effectively.
Shop Module
The shop module handles all e-commerce functionality: product listings, order management, coupon systems, user balance accounting, and gift card redemption. When you need to add payment integrations, modify pricing logic, or extend the checkout flow, this is where you start.
Overview
- Config (config.rs) – Runtime configuration for order limits, auto-cancellation timing, and ePay integration URLs. Adjust these when tuning business rules or payment gateway endpoints.
- Entities (entities/) – Database models for orders, productions (products), coupons, user balances, gift cards, and ePay providers. Extend these when adding new persistent data structures.
- Services (services/) – Business logic services implementing order processing, coupon validation, balance management, and production queries. All APIs must be exposed through the Processor trait pattern, not object-oriented methods.
- RPC (rpc/) – gRPC endpoints exposing shop capabilities to clients and other modules. Organized into user-facing services (order, production, account) and admin management services.
- API (api/) – REST/HTTP API endpoints for payment gateway callbacks and integrations that require non-gRPC interfaces.
- Events (events/) – Event definitions for the order lifecycle (created, paid, cancelled, delivered) using RabbitMQ for inter-module communication.
- Hooks (hooks/) – Event consumers that react to order events, handle ePay callbacks, log balance changes, and initialize user balances on registration.
- Cron (cron.rs) – Background jobs for automatic order cancellation when unpaid orders exceed the configured timeout period.
Core Services
User-Facing Services
OrderService (services/order.rs)
Handles the complete order lifecycle from creation to payment. Key operations:
- List user orders with production details and status
- Create orders with optional coupon application
- Generate ePay payment URLs for third-party payment gateways
- Process payments via account balance or external providers
- Cancel unpaid orders
- List available ePay providers and channels
ProductionService (services/production.rs)
Manages product catalog visibility and access control:
- List productions filtered by user group permissions
- Retrieve individual production details with access validation
- Products can be restricted to specific user groups or extra permission groups
CouponService (services/coupon.rs)
Validates promotional codes and discount rules:
- Verify coupon validity considering time windows, usage limits, and user eligibility
- Support for both percentage-based and amount-based discounts
- Per-account and global usage tracking
BalanceService (services/user_balance.rs)
User account balance operations:
- Query available and frozen balance
- List balance change history with pagination
- Redeem gift cards for balance credits
GiftCardService (services/gift_card.rs)
Gift card redemption and validation
Management Services
ManageService (services/manage.rs)
Admin operations for all shop entities with RBAC enforcement:
- Orders: Mark paid, change amounts, list with filters, view details
- Coupons: Create, update, delete, list with full metadata
- Productions: Create, delete, list with package relationships
- Balances: Adjust user balances, view change logs
- Gift Cards: Generate in batches, create special codes, delete, list with filters
All management operations are integrated with the admin audit log system and require appropriate AdminRole permissions.
Payment Integration
The module integrates with the ePay (易支付) third-party payment gateway:
- Payment Flow:
  - User creates order → receives order ID
  - User requests payment URL with provider and channel (AliPay, WeChat, USDT)
  - Module generates signed URL using provider credentials
  - User completes payment on gateway
  - Gateway sends callback to the api/epay.rs webhook
  - Hook verifies signature and updates order status
  - OrderPaidEvent emitted for downstream processing
- Provider Management:
  - Multiple ePay providers supported with different channels
  - Providers configured with merchant URLs, PIDs, and keys
  - Each provider can be enabled/disabled via database switch
Events & Hooks
Published Events (via RabbitMQ):
- OrderCreatedEvent (exchange: shop, routing: order_created) – Internal tracking
- OrderPaidEvent (exchange: shop, routing: order_paid) – Consumed by market module for affiliate rewards
- OrderCancelledEvent (exchange: shop, routing: order_cancelled) – Internal tracking
- OrderDeliveredEvent (exchange: shop, routing: order_delivered) – Currently unused, candidate for removal
Event Consumers:
- ePay Hook (hooks/epay.rs) – Listens to payment callbacks and updates order status
- Log Hook (hooks/log.rs) – Tracks balance changes in audit logs
- Register Hook (hooks/register.rs) – Initializes zero balance for new user accounts
Typical Extension Workflow
- Add Configuration – Define new config fields in config.rs and surface through the Redis configuration provider
- Extend Entities – Add or modify database models in entities/ for new persistent data
- Implement Service Logic – Create Processor implementations in services/ following the pattern:
  impl Processor<MyInput, Result<MyOutput, Error>> for MyService {
      async fn process(&self, input: MyInput) -> Result<MyOutput, Error> {
          // Business logic here
      }
  }
- Expose via RPC – Add gRPC methods in rpc/ that delegate to service processors
- Define Proto Messages – Update .proto files in proto/shop/ and rebuild
- Emit Events – Publish events through AmqpPool when state changes need cross-module notification
- Add Hooks – Create event consumers in hooks/ if other modules need to react
- Schedule Maintenance – Add cron jobs in cron.rs for cleanup or periodic tasks
Architecture Notes
- Processor Pattern: All service APIs use the Processor<Input, Result<Output, Error>> trait from the kanau crate. Never expose business logic through object methods.
- Owned Resources: Services hold owned RedisConnection, DatabaseProcessor, and AmqpPool instead of static lifetimes or references.
- Coupon Discount Types: Two variants – RateDiscount (percentage) and AmountDiscount (fixed amount with minimum threshold)
- Order Status Flow: Unpaid → Paid → Delivered (or Cancelled/Refunding/Refunded)
- Balance Types: Available balance (spendable) and frozen balance (temporarily locked during transactions)
- Decimal Handling: All monetary values use rust_decimal::Decimal internally and serialize as strings in proto/JSON
Database Schema
Key tables (see migrations/20250815133800_create_shop_entites.sql):
- productions – Product catalog with pricing and package references
- orders – Order records with status tracking and payment details
- coupons – Promotional codes with discount rules and usage tracking
- user_balance – Account balance per user
- user_balance_change_log – Audit trail for all balance modifications
- gift_cards – Redeemable codes with amounts and expiration
- epay_providers – Configured payment gateway providers
Usage Examples
Creating an Order:
let result = order_service.process(CreateOrder {
user_id: user.id,
production_id: prod_uuid,
coupon_id: Some(123),
}).await?;
Payment with Balance:
let result = order_service.process(PayOrderWithBalance {
user_id: user.id,
order_id: order_uuid,
}).await?;
Redeeming Gift Card:
let result = balance_service.process(RedeemGiftCard {
user_id: user.id,
secret: "GIFT-CODE-123",
}).await?;
Configuration
Shop module configuration is stored in Redis under key shop:
{
"max_unpaid_orders": 5,
"auto_cancel_after": "30m",
"epay_notify_url": "https://api.example.com/shop/epay/callback",
"epay_return_url": "https://example.com/order/complete"
}
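For reference, the JSON above corresponds roughly to a config struct along these lines. This is a sketch only: field names mirror the JSON, while the duration format for auto_cancel_after is shown as a plain string, and the real config.rs may parse it into a typed duration.
// Hedged sketch of the shop config shape implied by the JSON above.
use serde::Deserialize;

#[derive(Deserialize)]
pub struct ShopConfig {
    pub max_unpaid_orders: i32,
    pub auto_cancel_after: String, // e.g. "30m"; real code likely parses this into a duration
    pub epay_notify_url: String,
    pub epay_return_url: String,
}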
Keep this document synchronized with structural changes so future developers can quickly navigate the codebase.
Production
Production (also referred to as “product” in the codebase) is the user-facing catalog item that customers can purchase in the Helium shop system. It represents a complete service offering that, when purchased, grants users access to telecom packages.
Core Concept
A Production defines:
- What users see: Display information (title, description, price)
- What users receive: Reference to a package series and the quantity of packages
- Who can access it: Access control through user groups and extra permission groups
pub struct ServiceProduction {
pub id: Uuid, // Unique identifier
pub title: String, // Display name
pub description: String, // Marketing description
pub price: Decimal, // Purchase price
pub package_series: Uuid, // References a package series
pub package_amount: i32, // Number of packages to deliver
pub visible_to: i32, // User group visibility
pub is_private: bool, // Private production flag
pub limit_to_extra_group: i32, // Extra group requirement if private
pub on_sale: bool, // Whether currently available for purchase
}
Key Characteristics
- Package Series Linkage: Production always references a package series, not individual packages. This ensures version control flexibility - the actual package delivered is the master package of the series at purchase time.
- Quantity Control: package_amount specifies how many packages from the series the user receives. For example:
  - package_amount = 1: Single subscription period
  - package_amount = 3: Three subscription periods (stacked in package queue)
  - package_amount = 12: Annual subscription with 12 monthly packages
- Access Control Layers:
  - Basic visibility: visible_to determines which user group can see the production
  - Private productions: When is_private = true, only users with limit_to_extra_group in their user_extra_groups can access it
  - On-sale status: on_sale controls whether the production is currently purchasable
- Price Stability: The production price is independent of package contents. Even if the underlying master package changes (through version control), the production price remains stable unless explicitly updated.
Production Views
The system provides two different views of productions for different audiences:
User View (ProductionUserView)
pub struct ProductionUserView {
pub id: Uuid,
pub title: String,
pub description: String,
pub price: Decimal,
pub package_amount: i32,
pub traffic_limit: i64, // From master package
pub max_client_number: i32, // From master package
pub expire_duration: PgInterval, // From master package
}
Purpose: Shows end users what they’re buying with key package details (traffic, connections, duration) pulled from the current master package.
Admin View (ProductionAdminView)
pub struct ProductionAdminView {
pub id: Uuid,
pub title: String,
pub description: String,
pub price: Decimal,
pub package_series: Uuid,
pub package_amount: i32,
pub visible_to: i32,
pub is_private: bool,
pub limit_to_extra_group: i32,
pub package_id: i64, // Master package ID
pub package_version: i32, // Master package version
pub package_available_group: i32, // Access control from package
// ... additional package details
}
Purpose: Provides administrators with complete production metadata including internal IDs, version information, and access control settings.
Admin Management of Production
Administrators manage productions through the ManageService in the shop module. All operations require appropriate AdminRole permissions and are logged in the audit system.
Available Operations
1. Create Production
Operation: CreateProduction
pub struct CreateProduction {
pub title: String,
pub description: String,
pub price: Decimal,
pub package_series: Uuid, // Must reference existing series
pub package_amount: i32, // Quantity to deliver
pub visible_to: i32, // User group visibility
pub is_private: bool, // Private production flag
pub limit_to_extra_group: i32, // Extra group requirement
}
Requirements:
- Package series must exist in the telecom module
- Package series must have a master package (is_master = true)
- Admin must have Moderator or SuperAdmin role
- All fields are mandatory, except that the private/extra_group fields may be zero when the production is not private
Workflow:
- Validate package series exists and has master package
- Create production record with UUID
- Production defaults to on_sale = true
- Logged in the admin audit system
2. Delete Production
Operation: DeleteProduction
pub struct DeleteProduction {
pub id: Uuid, // Production ID to delete
}
Requirements:
- Production must exist
- Admin must have Moderator or SuperAdmin role
Important Notes:
- Soft delete behavior: Production is removed from catalog but existing orders referencing it remain valid
- No cascade deletion: Deleting a production does NOT delete its associated package series
- Impact: Users can no longer purchase this production, but already-purchased orders are unaffected
3. List Productions
Operation: ListProductions
// No input parameters - returns all productions
Returns: Vec<ProductionAdminView> with complete production details including master package information
Use Cases:
- Catalog management dashboards
- Production inventory audits
- Package version tracking
- Access control verification
Reference to Version Control of Packages
Productions are tightly integrated with the Version Control of Packages system. Understanding this relationship is crucial for managing service offerings.
How Productions Use Package Versions
When a user purchases a production:
- At Purchase Time: The system looks up the master package of the referenced package series
- Version Snapshot: The specific package version (master at that moment) is recorded in the order
- Queue Insertion: Package queue items reference the specific package ID, not the series
- Version Isolation: If the master package changes later, existing purchases are unaffected
Example: Package Version Evolution
Timeline:
Day 1: Create Production "Monthly Premium"
└─> References package_series: abc-123
└─> Master package: version 1 (100GB traffic, $10)
Day 15: User A purchases "Monthly Premium"
└─> Receives: package version 1 (100GB)
Day 30: Marketing updates package series
└─> New master package: version 2 (200GB traffic, same $10 price)
└─> Old version 1: is_master = false (preserved for existing users)
Day 45: User B purchases "Monthly Premium"
└─> Receives: package version 2 (200GB)
Day 60: Both users' services:
└─> User A: Still has 100GB (version 1) - not affected by update
└─> User B: Has 200GB (version 2) - received new version
Version Control Best Practices for Productions
- Price Adjustments: Update production price separately from package content
- Content Updates: Create new package versions to change service parameters
- Catalog Refresh: Existing productions automatically deliver new master versions
- Rollback Strategy: Keep old package versions available; delete and recreate production if needed
- Testing: Verify master package is correct before major version changes
See Version Control of Packages for detailed information about:
- Creating new package versions
- Promoting packages to master
- Version change vs. non-version change edits
- Admin package management operations
Relationship Flow: Production → Package → Package Queue → Order → Node Client
Understanding how these components interconnect is essential for system comprehension and troubleshooting.
Component Relationships
┌──────────────┐ references ┌──────────────┐ has master ┌──────────────┐
│ │─────────────────────>│ Package │─────────────────────>│ Package │
│ Production │ │ Series │ │ (Master) │
│ │ │ │ │ │
└──────────────┘ └──────────────┘ └──────────────┘
│ │
│ │ defines
│ purchase creates │ service
│ │ parameters
▼ ▼
┌──────────────┐ delivers ┌──────────────┐ references ┌──────────────┐
│ │─────────────────────>│ Package │─────────────────────>│ Package │
│ Order │ │ Queue Item │ │ │
│ │ │ │ │ │
└──────────────┘ └──────────────┘ └──────────────┘
│ │ │
│ │ │ controls
│ │ activates │ access
│ │ for user │
│ ▼ ▼
┌──────────────┐ ┌──────────────┐ filtered by ┌──────────────┐
│ Payment │ │ Active │─────────────────────>│ Node Client │
│ Status │ │ Package │ │ Access │
│ │ │ │ │ │
└──────────────┘ └──────────────┘ └──────────────┘
Detailed Relationship Flow
1. Production → Order
When: User initiates purchase
Process:
// User browses available productions
ListVisibleProductions { user_group, extra_groups }
└─> Returns productions matching access control
// User creates order
CreateOrder { user_id, production_id, coupon_id }
└─> Creates UserOrder with status: Unpaid
└─> Records production reference
└─> Applies coupon discount if provided
└─> Emits OrderCreatedEvent
Data Flow:
- Production ID stored in order.production
- Production price becomes order.total_amount (after coupon)
- Order status starts as Unpaid
2. Order → Payment → Package Queue
When: User completes payment
Process:
// Payment via ePay gateway
PayOrderWithEpay { callback }
└─> Validates payment signature
└─> Updates order: status = Paid, paid_at = NOW()
└─> Emits OrderPaidEvent
// OR payment via account balance
PayOrderWithBalance { user_id, order_id }
└─> Deducts user balance
└─> Updates order: status = Paid, paid_at = NOW()
└─> Emits OrderPaidEvent
Data Flow:
- OrderPaidEvent published to RabbitMQ exchange shop with routing key order_paid
- Market module consumes this event for affiliate rewards
- Note: Current codebase shows OrderPaidEvent is published but the actual delivery hook that creates package queue items is not yet visible in the telecom module hooks. This delivery mechanism needs to be implemented or is handled through a separate service/cron job.
Expected Delivery Flow (to be implemented):
// Expected hook (not found in current codebase)
consume OrderPaidEvent:
1. Query order details (production_id, user_id)
2. Query production (package_series, package_amount)
3. Find master package in package_series
4. Create package queue items:
CreateQueueItems {
user_id: order.user,
package_id: master_package.id, // Specific version
by_order: Some(order.id),
amount: production.package_amount
}
5. Trigger package queue push event
6. Update order: status = Delivered
3. Package Queue → Active Package
When: Package queue items are created
Process:
// Package queue push event triggers activation
PackageQueuePushEvent emitted
└─> TelecomPackageQueueHook consumes event
└─> process_package_queue_push(transaction, user_id)
└─> Check if user has active package
└─> If no active package:
- Find oldest queued package (ORDER BY created_at)
- Activate: status = Active, activated_at = NOW()
- Emit PackageActivateEvent
Data Flow:
- Package queue item status transitions: InQueue → Active
- Only ONE package per user can be Active at a time
- Remaining packages wait in queue (FIFO order)
Queue States:
- InQueue: Waiting to be activated
- Active: Currently providing service to the user
- Consumed: Expired due to time or traffic limit
- Cancelled: Refunded or cancelled by admin
4. Active Package → Node Client Access
When: User requests node list or subscription link
Process:
// User lists available nodes
ListMyNodes { user_id }
└─> FindActiveAvailableGroup { user_id }
└─> Query active package
└─> Extract package.available_group
└─> ListUserNodeClients { group }
└─> SELECT * FROM node_client
WHERE available_groups @> [group]
└─> Returns accessible node clients
Access Control Logic:
-- Node client access check
SELECT * FROM "telecom"."node_client"
WHERE available_groups && ARRAY[user_active_package.available_group]
Data Flow:
- Active package defines the user’s available_group (e.g., group 1 = Premium)
- Node clients have an available_groups array (e.g., [1, 2] = Premium and Standard users)
- User can access a node client if their package group is in the node’s group array
- No active package = empty node list (user cannot connect)
5. Package Expiration → Queue Advancement
When: Package expires (time or traffic limit)
Process:
// Time-based expiration (cron job)
FindExpirePackageByTimeBefore { time: NOW() }
└─> For each expired package:
└─> Emit PackageExpiringEvent { reason: Time }
// Usage-based expiration (billing)
RecordPackageUsage { user_id, upload, download }
└─> If usage >= traffic_limit + adjust_quota:
└─> Update status: Consumed
└─> Emit PackageExpiringEvent { reason: Usage }
// Queue advancement (hook)
PackageExpiringEvent consumed
└─> TelecomPackageQueueHook processes:
└─> Find next queued package
└─> If found: Activate next package
└─> If none: Emit AllPackageExpiredEvent
Data Flow:
- Current package: Active → Consumed
- Next package: InQueue → Active
- User’s node access switches to the new package’s available_group
- If no packages remain: User loses node access
Complete Purchase-to-Access Example
Scenario: User purchases “Monthly Premium” production
Step 1: Purchase
User: "Buy Monthly Premium ($30, 3 months)"
└─> Production {
title: "Monthly Premium",
price: $30,
package_series: series-uuid-123,
package_amount: 3
}
└─> Order created (Unpaid)
Step 2: Payment
User: "Pay with AliPay"
└─> Payment gateway callback
└─> Order status: Unpaid → Paid
└─> OrderPaidEvent emitted
Step 3: Delivery (Expected Implementation)
OrderPaidEvent consumed
└─> Find master package of series-uuid-123
└─> Package {
id: 12345,
version: 2,
available_group: 1 (Premium),
traffic_limit: 100GB,
expire_duration: 30 days
}
└─> Create 3 package queue items:
└─> Item 1: status = Active (immediately activated)
└─> Item 2: status = InQueue
└─> Item 3: status = InQueue
└─> Order status: Paid → Delivered
Step 4: Service Access
User: "Show my nodes"
└─> Query active package: Item 1 (package 12345)
└─> Extract available_group: 1
└─> Query node clients: WHERE available_groups @> [1]
└─> Returns: Premium tier nodes
User: "Generate subscription link"
└─> Generates config for premium nodes
└─> User can now connect
Step 5: Package Lifecycle (Day 30)
System: "Package 1 expired (30 days)"
└─> Item 1: Active → Consumed
└─> Item 2: InQueue → Active (auto-activated)
└─> User continues service with Item 2
└─> Still has Item 3 waiting in queue
Step 6: All Packages Consumed (Day 90)
System: "Package 3 expired"
└─> Item 3: Active → Consumed
└─> No more items in queue
└─> AllPackageExpiredEvent emitted
└─> User loses access to nodes (need to repurchase)
Node Client Relationship
Node clients use the active package’s available_group for access control. This creates a seamless flow from purchase to service access:
Access Control Chain:
- User purchases Production → receives Packages
- Package activation → defines available_group
- Node Client filtering → matches available_groups
- User connection → accesses filtered nodes
Example Group Mapping:
Package Groups:
- Group 1: Premium ($30/month, 200GB, premium nodes)
- Group 2: Standard ($15/month, 100GB, standard nodes)
- Group 3: Budget ($8/month, 50GB, budget nodes)
Node Client Configuration:
- "🇺🇸 US Premium": available_groups = [1]
- "🇺🇸 US Standard": available_groups = [1, 2]
- "🇸🇬 SG Budget": available_groups = [1, 2, 3]
User with Group 1 (Premium) package:
✅ Can access: US Premium, US Standard, SG Budget
User with Group 2 (Standard) package:
❌ Cannot access: US Premium
✅ Can access: US Standard, SG Budget
User with Group 3 (Budget) package:
❌ Cannot access: US Premium, US Standard
✅ Can access: SG Budget only
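In code terms, the check behind this mapping is just array membership; a minimal sketch, assuming groups are plain i32 values as in the SQL shown earlier:
// Hedged sketch: a user may connect to a node when the active package's group
// appears in the node's available_groups array.
pub fn can_access_node(node_available_groups: &[i32], package_group: i32) -> bool {
    node_available_groups.contains(&package_group)
}

// Example from the mapping above: a Standard (group 2) package against
// "US Premium" ([1]) and "US Standard" ([1, 2]).
// can_access_node(&[1], 2) == false; can_access_node(&[1, 2], 2) == true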
See Node Client for detailed information about node access control and configuration.
Implementation Notes for Developers
Current Implementation Status
Implemented:
- ✅ Production CRUD operations (Create, Delete, List)
- ✅ Production access control (user groups, private productions)
- ✅ Order creation with production reference
- ✅ Payment processing with OrderPaidEvent emission
- ✅ Package queue system with activation logic
- ✅ Node client access control based on active package
Needs Implementation/Verification:
- ⚠️ Order delivery hook: The mechanism that consumes OrderPaidEvent and creates package queue items is not visible in the current codebase. This critical component needs to be implemented or located.
- ⚠️ Order status update: Automatic transition from Paid to Delivered after package delivery
- ⚠️ Error handling: What happens if package delivery fails after payment?
- ⚠️ Refund workflow: How to handle package queue items when orders are refunded?
Expected Delivery Hook Implementation
Location: Should be in either:
- modules/shop/src/hooks/delivery.rs (new file)
- modules/telecom/src/hooks/order.rs (new file)
- Integration service in server/src/
Pseudocode:
// File: modules/shop/src/hooks/delivery.rs (suggested)
pub struct ShopDeliveryHook {
pub telecom_db: DatabaseProcessor, // Access to telecom DB
pub shop_db: DatabaseProcessor, // Access to shop DB
pub mq: AmqpPool,
}
impl AmqpMessageProcessor<OrderPaidEvent> for ShopDeliveryHook {
const QUEUE: &'static str = "helium_shop_order_delivery";
}
impl Processor<OrderPaidEvent, Result<(), Error>> for ShopDeliveryHook {
async fn process(&self, event: OrderPaidEvent) -> Result<(), Error> {
// 1. Fetch order and production details
let order = self.shop_db.process(FindOrderByIdOnly {
order_id: event.order_id
}).await?.ok_or(Error::NotFound)?;
let production = self.shop_db.process(FindProductionById {
id: order.production
}).await?.ok_or(Error::NotFound)?;
// 2. Find master package of the series
let package = self.telecom_db.process(FindMasterPackageBySeries {
series: production.package_series
}).await?.ok_or(Error::NotFound)?;
// 3. Create package queue items
let items = self.telecom_db.process(CreateQueueItems {
user_id: event.user_id,
package_id: package.id,
by_order: Some(event.order_id),
amount: production.package_amount,
}).await?;
// 4. Emit package queue push event
PackageQueuePushEvent {
item_ids: items.iter().map(|i| i.id).collect(),
user_id: event.user_id,
package_id: package.id,
pushed_at: OffsetDateTime::now_utc().unix_timestamp() as u64,
}.send(&self.mq).await?;
// 5. Update order status to delivered
self.shop_db.process(UpdateOrderDelivered {
order_id: event.order_id
}).await?;
Ok(())
}
}
Testing Checklist
When implementing or verifying the delivery mechanism:
- Happy Path:
  - User purchases production → order created
  - User pays order → OrderPaidEvent emitted
  - Delivery hook triggered → package queue items created
  - First package auto-activated → user can access nodes
  - Order status updated to Delivered
- Edge Cases:
  - Package series has no master package → error handling
  - Production deleted after order created but before payment
  - Duplicate payment callbacks (idempotency)
  - User already has active package → new packages queue correctly
- Error Recovery:
  - Delivery fails → order remains Paid (manual intervention needed)
  - Partial delivery → some items created, transaction rollback
  - Event replay → idempotent delivery (check the by_order reference; see the sketch below)
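As a rough illustration of that by_order idempotency guard (the table and column names are assumptions based on this document, and the real implementation would go through a telecom Processor rather than raw SQL):
// Hedged sketch: skip delivery when queue items already exist for this order.
use sqlx::PgPool;
use uuid::Uuid;

pub async fn already_delivered(pool: &PgPool, order_id: Uuid) -> sqlx::Result<bool> {
    sqlx::query_scalar("SELECT EXISTS(SELECT 1 FROM package_queue WHERE by_order = $1)")
        .bind(order_id)
        .fetch_one(pool)
        .await
}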
Best Practices
- Production Naming: Use clear, descriptive titles that indicate:
  - Service tier (Premium, Standard, Budget)
  - Duration/quantity (Monthly, Quarterly, Annual)
  - Special features (High-speed, Unlimited, etc.)
- Price Management:
  - Keep production prices stable
  - Use coupons for temporary discounts
  - Create new productions for permanent price changes
- Access Control:
  - Use visible_to for tier-based catalog (free users, paid users, VIP)
  - Use is_private + limit_to_extra_group for special promotions or corporate accounts
  - Set on_sale = false to temporarily hide productions without deletion
- Package Amount Strategy:
  - package_amount = 1: Pay-per-period (most common)
  - package_amount = 3: Quarterly bundles (slight discount)
  - package_amount = 12: Annual subscriptions (significant discount)
  - Higher amounts = fewer transactions, better user retention
- Version Control Integration:
  - Review current master package before creating production
  - Coordinate with telecom team when updating master packages
  - Document package version changes that affect existing productions
  - Consider creating new production for major service upgrades
- Audit Trail:
  - All production operations are logged via admin audit system
  - Track which admin created/deleted productions
  - Monitor order counts per production for popularity metrics
  - Review production-to-package-version history for customer support
See Also:
- Version Control of Packages - Package versioning system
- Package Queue - Package lifecycle management
- Node Client - Access control and node configuration
- Order System - Complete order lifecycle
- Ordering Flow - Purchase workflow diagrams
Order System
The Order System manages the complete e-commerce transaction lifecycle in Helium, from order creation through payment processing to product delivery.
Overview
An order represents a user’s purchase of a production. Key capabilities:
- Create orders with optional coupon discounts
- Process payments via ePay gateways or account balance
- Track order status through lifecycle states
- Automatically cancel unpaid orders after timeout
- Publish events to notify other modules
- Admin order management and intervention
Order Data Model
Each order tracks:
- User and production reference
- Final amount after discounts
- Applied coupon (if any)
- Current status and timestamps
- Payment method and provider
- Soft deletion flag
Order Status Lifecycle
Orders transition through the following states:
┌─────────┐ payment ┌──────┐ delivery ┌───────────┐
│ Unpaid │──────────────────>│ Paid │─────────────────>│ Delivered │
└─────────┘ └──────┘ └───────────┘
│ │
│ timeout/ │ admin/
│ manual │ user
│ │
v v
┌───────────┐ ┌───────────┐
│ Cancelled │ │ Refunding │
└───────────┘ └───────────┘
│
│ processed
v
┌──────────┐
│ Refunded │
└──────────┘
Status Definitions
- Unpaid: Order created, payment pending. Auto-cancelled after timeout (default: 30 minutes)
- Paid: Payment confirmed. Awaiting product delivery
- Delivered: Packages delivered to user’s account
- Cancelled: Order cancelled before payment
- Refunding: Refund in progress (not fully implemented)
- Refunded: Refund completed (not fully implemented)
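These states map naturally onto a Rust enum; the sketch below only shows the shape, and the actual enum name, derives, and database representation in the codebase may differ.
// Hedged sketch of the order status lifecycle as an enum.
pub enum OrderStatus {
    Unpaid,
    Paid,
    Delivered,
    Cancelled,
    Refunding, // not fully implemented
    Refunded,  // not fully implemented
}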
Payment Methods
- Alipay: AliPay via ePay gateway
- WeChatPay: WeChat Pay via ePay gateway
- Usdt: USDT cryptocurrency via ePay gateway
- AccountBalance: Direct payment from user’s account balance
- AdminChange: Admin manually marked as paid
Order Creation
User Purchase Flow
- Browse available productions
- Select production and apply coupon (optional)
- Create order → receives order ID
- Choose payment method (ePay or balance)
- Complete payment → order marked as paid
- System delivers packages automatically
Validation Rules
When creating an order, the system validates:
- Unpaid Order Limit: Users cannot have more than max_unpaid_orders unpaid orders (default: 5)
- Production Existence: Production must exist and be available for purchase
- Coupon Validation (if a coupon is applied; see the sketch after this list):
  - Coupon code must be valid
  - Current time within coupon’s validity window
  - Global usage limit not exceeded
  - Per-user usage limit not exceeded
  - Production price meets coupon’s minimum amount requirement
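A minimal sketch of those coupon checks as a pure function. Field names such as global_limit, per_account_limit, and min_amount are assumptions for illustration, not the real entity fields.
// Hedged sketch: returns true only when every documented coupon condition holds.
use rust_decimal::Decimal;
use time::OffsetDateTime;

pub struct CouponSketch {
    pub start_time: OffsetDateTime,
    pub end_time: OffsetDateTime,
    pub global_limit: i64,      // maximum total uses across all users
    pub per_account_limit: i64, // maximum uses per individual user
    pub min_amount: Decimal,    // minimum production price for amount discounts
}

pub fn coupon_applicable(
    c: &CouponSketch,
    now: OffsetDateTime,
    price: Decimal,
    global_uses: i64,
    user_uses: i64,
) -> bool {
    now >= c.start_time
        && now <= c.end_time
        && global_uses < c.global_limit
        && user_uses < c.per_account_limit
        && price >= c.min_amount
}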
Possible Results
- Created: Order created successfully, returns order ID
- ProductionNotFound: Selected production doesn’t exist
- CouponInvalid: Coupon is invalid or not applicable
- TooManyUnpaid: User has reached the unpaid order limit
Payment Processing
Payment Methods
1. ePay Gateway Payment
ePay (易支付) is a third-party payment aggregator supporting:
- AliPay
- WeChat Pay
- USDT cryptocurrency
Payment Flow:
User → Request Payment URL → Get Signed URL → Redirect to ePay
↓
User ← Return to App ← OrderPaidEvent ← Callback ← Payment Complete
Process:
- User requests payment URL with order ID, provider, and channel
- System generates signed payment URL
- User redirected to ePay gateway
- User completes payment on external site
- ePay sends callback to server with payment result
- System verifies signature and updates order status
- OrderPaidEvent published to trigger delivery
- User redirected back to the app
Security: All callbacks are signature-verified using provider’s secret key to prevent fraud.
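For illustration, ePay-style gateways typically sign callbacks with an MD5 digest over the sorted parameters plus the merchant key. The exact scheme used by Helium's providers is an assumption here, so treat this as a shape sketch rather than the project's actual verification code.
// Hedged sketch of an ePay-style signature check: sorted non-empty params
// (excluding sign/sign_type) joined as key=value&..., merchant key appended, MD5.
use std::collections::BTreeMap;

pub fn verify_epay_sign(params: &BTreeMap<String, String>, merchant_key: &str) -> bool {
    let provided = match params.get("sign") {
        Some(s) => s.to_lowercase(),
        None => return false,
    };
    let joined: Vec<String> = params
        .iter()
        .filter(|(k, v)| k.as_str() != "sign" && k.as_str() != "sign_type" && !v.is_empty())
        .map(|(k, v)| format!("{k}={v}"))
        .collect();
    let base = format!("{}{}", joined.join("&"), merchant_key);
    // The md5 crate's Digest implements LowerHex, so format it as a hex string.
    format!("{:x}", md5::compute(base.as_bytes())) == provided
}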
2. Account Balance Payment
Users can pay directly from their account balance.
Process:
- User requests to pay with balance
- System checks sufficient balance available
- Balance deducted atomically with order update
- Balance change logged for audit trail
- OrderPaidEvent published to trigger delivery
Transaction Safety: Balance deduction and order update occur in a single atomic transaction with pessimistic locking to prevent race conditions.
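A simplified sketch of that pattern using sqlx directly. The real code goes through the Processor pattern and also writes the balance change log; table and column names here are assumptions based on the schema section above.
// Hedged sketch: pessimistic lock on the balance row, then deduct and mark paid
// inside one transaction. Returns false when the balance is insufficient.
use rust_decimal::Decimal;
use sqlx::PgPool;
use uuid::Uuid;

pub async fn pay_with_balance_sketch(
    pool: &PgPool,
    user_id: Uuid,
    order_id: Uuid,
    amount: Decimal,
) -> sqlx::Result<bool> {
    let mut tx = pool.begin().await?;

    // Lock the user's balance row for the duration of the transaction.
    let available: Decimal = sqlx::query_scalar(
        "SELECT available_balance FROM user_balance WHERE user_id = $1 FOR UPDATE",
    )
    .bind(user_id)
    .fetch_one(&mut *tx)
    .await?;

    if available < amount {
        tx.rollback().await?;
        return Ok(false);
    }

    // Deduct the balance and mark the order as paid in the same transaction.
    sqlx::query("UPDATE user_balance SET available_balance = available_balance - $2 WHERE user_id = $1")
        .bind(user_id)
        .bind(amount)
        .execute(&mut *tx)
        .await?;
    sqlx::query("UPDATE orders SET status = 'Paid', paid_at = NOW() WHERE id = $1 AND status = 'Unpaid'")
        .bind(order_id)
        .execute(&mut *tx)
        .await?;

    tx.commit().await?;
    Ok(true)
}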
Order Cancellation
Cancellation Methods
1. User Cancellation
Users can cancel their own unpaid orders at any time. Once cancelled, the order cannot be restored.
2. Automatic Cancellation
A cron job automatically cancels unpaid orders after a timeout period (default: 30 minutes). This prevents abandoned orders from cluttering the system.
Configuration: auto_cancel_after in shop config (default: 30 minutes)
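A rough sketch of such a periodic job. The real implementation lives in cron.rs and uses the module's own processors; the direct SQL, the table name, and the fixed one-minute tick here are assumptions for illustration.
// Hedged sketch: periodically cancel unpaid orders older than the timeout.
use std::time::Duration;
use sqlx::PgPool;

pub async fn auto_cancel_loop(pool: PgPool, timeout: Duration) {
    let mut ticker = tokio::time::interval(Duration::from_secs(60));
    loop {
        ticker.tick().await;
        // Cancel any order that has stayed Unpaid longer than the configured timeout.
        let _ = sqlx::query(
            "UPDATE orders SET status = 'Cancelled' WHERE status = 'Unpaid' AND created_at < NOW() - $1::interval",
        )
        .bind(format!("{} seconds", timeout.as_secs()))
        .execute(&pool)
        .await;
    }
}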
3. Admin Cancellation
Administrators can manually cancel orders through the management interface.
Order Events
The order system publishes events via RabbitMQ for inter-module communication.
Event Types
| Event | When Published | Consumers |
|---|---|---|
OrderCreatedEvent | Order created | Internal tracking |
OrderPaidEvent | Payment confirmed | Market module (affiliate rewards), Delivery system |
OrderCancelledEvent | Order cancelled | Internal tracking |
OrderDeliveredEvent | Products delivered | ⚠️ Not yet implemented |
OrderPaidEvent → Product Delivery
When OrderPaidEvent is published, the system should:
- Retrieve order and production details
- Find current master package of the package series
- Create package queue items for the user
- Trigger package activation
- Update order status to Delivered
Status: ⚠️ The delivery hook that consumes OrderPaidEvent needs to be implemented or located in the codebase.
Product Delivery
When an order is paid, the system delivers products by creating package queue items.
Delivery Flow
OrderPaidEvent → Delivery Hook → Create Package Queue Items → Activate First Package
What Happens
- Delivery hook receives OrderPaidEvent
- Looks up order and production details
- Finds current master package of the package series
- Creates package_amount queue items (e.g., 3 items for a quarterly plan)
- Links items to order for refund tracking
- First package automatically activates for the user
- Order status updates to Delivered
Important: The delivery hook snapshots the master package version at payment time, ensuring users receive the package version that was advertised when they purchased.
⚠️ Implementation Status: The delivery hook needs to be implemented or located in the codebase.
Admin Management
Administrators can manage orders through the management interface.
Admin Operations
| Operation | Permissions | Description |
|---|---|---|
| List Orders | Moderator, SuperAdmin, CustomerSupport | View all orders with filters (user, production, status) |
| Show Order Detail | Moderator, SuperAdmin, CustomerSupport | View complete order information |
| Mark as Paid | Moderator, SuperAdmin, CustomerSupport | Manually mark order as paid (triggers delivery) |
| Change Amount | Moderator, SuperAdmin, CustomerSupport | Adjust order amount for corrections or refunds |
Common Use Cases
- Manual Payment Processing: Mark orders as paid after offline payments
- Customer Support: View order details and history for troubleshooting
- Price Adjustments: Correct pricing errors or apply custom discounts
- Partial Refunds: Adjust order amount for partial refunds
Audit Trail
All admin operations are automatically logged with:
- Admin user ID and role
- Operation performed
- Target order ID
- Parameters and changes
- Timestamp and result
API Overview
The order system exposes a gRPC service: ShopOrderService (see proto/shop/order.proto)
Main Operations
| Operation | Purpose |
|---|---|
VerifyCoupon | Check if a coupon code is valid |
CreateOrder | Create new order with optional coupon |
ListOrders | Get user’s order history |
GetOrderDetail | View specific order details |
GetEpayUrl | Generate payment gateway URL |
PayOrderWithBalance | Pay with account balance |
CancelOrder | Cancel unpaid order |
DeleteOrder | Hide order from user’s list |
ListEpayProviders | Get available payment providers |
Configuration
Stored in Redis under key shop:
| Setting | Default | Description |
|---|---|---|
max_unpaid_orders | 5 | Maximum unpaid orders per user |
auto_cancel_after | 30 minutes | Timeout before auto-cancellation |
epay_notify_url | - | Server callback URL for payment notifications |
epay_return_url | - | User redirect URL after payment |
Usage Examples
User Purchase Flow
// 1. Verify coupon (optional)
const couponResponse = await orderService.verifyCoupon({ code: "DISCOUNT10" });
const couponId = couponResponse.isValid ? couponResponse.coupon.id : null;
// 2. Create order
const createResponse = await orderService.createOrder({
productionId: selectedProductionId,
couponId: couponId,
});
if (createResponse.result === "TOO_MANY_UNPAID") {
showError("Please pay or cancel existing orders first");
return;
}
const orderId = createResponse.orderId;
// 3. Choose payment method
if (useAccountBalance) {
// Pay with balance
const payResponse = await orderService.payOrderWithBalance({ orderId });
if (payResponse.result === "NOT_ENOUGH_BALANCE") {
showError("Insufficient balance");
return;
}
showSuccess("Payment successful!");
} else {
// Pay with ePay
const providers = await orderService.listEpayProviders({});
const urlResponse = await orderService.getEpayUrl({
orderId: orderId,
providerId: providers.providers[0].id,
channel: "EPAY_CHANNEL_ALI_PAY",
});
// Redirect to payment gateway
window.location.href = urlResponse.url;
}
// 4. Check order status
const detailResponse = await orderService.getOrderDetail({ orderId });
if (detailResponse.detail) {
console.log("Order status:", detailResponse.detail.order.orderStatus);
console.log("Production:", detailResponse.detail.production.title);
} else {
console.error("Order not found or not accessible");
}
Implementation Status
Completed Features
- ✅ Order creation with coupon validation
- ✅ ePay payment gateway integration
- ✅ Account balance payment
- ✅ Order cancellation (user, automatic, admin)
- ✅ Order tracking and status management
- ✅ Event publishing for inter-module communication
- ✅ Admin management operations
- ✅ Complete gRPC API
Pending Implementation
- ⚠️ Product delivery hook: Needs to consume OrderPaidEvent and create package queue items
- ⚠️ OrderDeliveredEvent: Not currently published (implement or remove)
- ⚠️ Refund workflow: Status states exist but full refund process not implemented
Important Notes
For Backend Developers
- Transaction Safety: Balance payments use atomic transactions with pessimistic locking
- Idempotency: Payment callbacks can be replayed safely without duplicate charges
- Event-Driven Architecture: Use RabbitMQ events for cross-module communication
- Signature Verification: Always verify ePay callback signatures to prevent fraud
- Processor Pattern: All APIs exposed via the Processor trait, not object-oriented methods
For Frontend Developers
- Order Status Polling: After payment, poll order status until Delivered
- ePay Redirect: Handle user redirect to the external payment gateway
- Error Handling: Handle all result enums (TooManyUnpaid, CouponInvalid, etc.)
- Coupon Verification: Always verify coupon before order creation to show discount preview
- Balance Check: Check user balance before offering balance payment option
See Also:
- Production - Product catalog and version control
- Shop Module Introduction - Shop module overview
- Package Queue - Package delivery mechanism
- Balance System - User balance management (if documented)
Ordering Flow
This document explains the complete purchase flow in the Helium shop system - how a user goes from browsing products to receiving service access. It covers the conceptual flow, cross-module interactions, and important implementation notes that developers should remember.
Flow Overview
User Journey:
Browse Products → Apply Coupon → Create Order → Pay → Receive Packages → Access Nodes
System Flow:
┌─────────────┐ ┌──────────────┐ ┌─────────────┐ ┌──────────────┐
│ Production │ ───> │ Order │ ───> │ Payment │ ───> │ Package │
│ Service │ │ Service │ │ Process │ │ Queue │
└─────────────┘ └──────────────┘ └─────────────┘ └──────────────┘
│ │ │ │
▼ ▼ ▼ ▼
Catalog with Order Creation Payment Methods Service
Access Control + Pricing Logic (ePay / Balance) Activation
Key Concepts
1. Production Visibility
Productions (products) are filtered by access control before users can see them:
- User Group: Basic tier matching (visible_to must equal the user’s user_group)
- Private Productions: If is_private = true, the user must have limit_to_extra_group in their extra_groups
- On-Sale Status: Only on_sale = true productions are visible
This means the same codebase can show different catalogs to different user tiers.
2. Coupon Validation
Coupons are validated TWICE in the flow:
- Pre-order verification: VerifyCoupon RPC for UI preview
- Order creation validation: Re-validated when creating the order (security)
Why twice? The coupon state can change between preview and order creation (e.g., usage limit reached). Always validate at order creation to prevent abuse.
Discount Types:
- Rate Discount: Percentage off (e.g., 10% → 0.1 rate)
- Amount Discount: Fixed amount off with minimum threshold
Validation Rules:
- Time window (start_time ≤ now ≤ end_time)
- Global usage limit (total uses across all users)
- Per-account usage limit (uses per individual user)
- Minimum amount requirement (for amount-based discounts)
3. Order Creation
When creating an order, the system:
- Checks unpaid order limit (default: 5 per user)
- Validates production exists and is available
- Applies coupon discount if provided
- Creates order with final calculated price
- Publishes
OrderCreatedEvent(for tracking)
Important: The final price is locked at order creation time. Even if production price changes later, the order amount remains unchanged.
4. Payment Methods
Two payment paths with different characteristics:
ePay Gateway (AliPay, WeChat, USDT)
- User Flow: Redirect to external gateway → Complete payment → Redirect back
- Callback Flow: Gateway sends async callback to server → Signature verification → Update order
- Security: All callbacks MUST verify signature using provider’s key
- Idempotency: Callbacks can be replayed; check order status before processing
Architecture:
User Payment on Gateway
↓
Gateway POST /api/shop/epay/callback
↓
Publish EpayCallback event to RabbitMQ (immediate return)
↓
EpayHook consumer processes callback
↓
Verify signature → Update order → Publish OrderPaidEvent
Why async? The HTTP callback must return immediately to the gateway (within 2-3 seconds). Processing happens async via message queue.
Account Balance
- Transaction Safety: Atomic operation with pessimistic locking (FOR UPDATE)
- Balance Types: Only available_balance can be used (not frozen_balance)
- Audit Trail: Every balance change logged to user_balance_change_log
Why pessimistic lock? Prevents race conditions if user attempts multiple simultaneous payments.
5. Package Delivery
Trigger: OrderPaidEvent published after payment confirmation
Expected Flow (delivery hook needs implementation):
OrderPaidEvent → DeliveryHook → Create Package Queue Items → Update Order Status
What happens:
- Look up order and production details
- Find master package of the package series (current version at purchase time)
- Create N package queue items (N = production’s package_amount)
- Link items to order via the by_order field (for refund tracking)
- Publish PackageQueuePushEvent
- Update order status to Delivered
Critical: The package version is snapshot at purchase time. If the master package changes later, existing orders deliver the old version (version isolation).
Implementation Status: ⚠️ The delivery hook that consumes OrderPaidEvent is not yet visible in the codebase. This is the missing link between payment and package delivery.
6. Service Activation
Trigger: PackageQueuePushEvent published after package queue creation
Activation Logic:
- Only ONE package per user can be Active at a time
- If active package exists: New packages remain in queue
- When active package expires: Next queued package auto-activates
Package States:
- InQueue: Waiting for activation
- Active: Currently providing service
- Consumed: Expired (time or traffic limit)
- Cancelled: Refunded or cancelled by admin
7. Node Access
With an active package, users gain access to nodes filtered by available_group:
Active Package (available_group = 1)
↓
Query: WHERE node.available_groups && ARRAY[1]
↓
Returns: All nodes that include group 1 in their available_groups array
Access Control Chain:
Purchase Production → Receive Package → Package Activates → Defines available_group → Filters Nodes
No active package = no node access (empty list).
Cross-Module Interactions
Shop → Telecom (Package Delivery)
Event: OrderPaidEvent
- Exchange: shop
- Routing Key: order_paid
- Consumer: Delivery hook (needs implementation in shop or telecom module)
- Purpose: Trigger package queue creation after payment
Telecom → Telecom (Package Activation)
Event: PackageQueuePushEvent
- Exchange: telecom
- Routing Key: package_queue_push
- Consumer: TelecomPackageQueueHook
- Purpose: Auto-activate the first package if the user has none active
Shop → Market (Affiliate Rewards)
Event: OrderPaidEvent
- Consumer: Market module
- Purpose: Calculate and distribute affiliate commissions
Important Implementation Notes
For Backend Developers
-
Transaction Boundaries
- Balance payment: Single transaction includes balance deduction + order update
- Use FOR UPDATE to lock the balance row during payment
- Order delivery: May need a transaction spanning shop and telecom databases
-
Event-Driven Architecture
- Always publish events AFTER database commit (not before)
- Events must be idempotent (can be replayed safely)
- Use separate consumers for cross-module communication
-
Processor Pattern
- All service APIs exposed via Processor<Input, Result<Output, Error>>
- No object-oriented methods for business logic
- See existing services for reference
-
Security Considerations
- ePay callbacks: MUST verify signature before processing
- Balance operations: Use pessimistic locks to prevent race conditions
- Coupon validation: Re-validate at order creation (not just preview)
-
Missing Implementation
- Order delivery hook (consumes OrderPaidEvent) is not yet implemented
- This is why orders remain stuck in Paid status instead of moving to Delivered
- Implementation location: Either modules/shop/src/hooks/delivery.rs or modules/telecom/src/hooks/order.rs
For Frontend Developers
-
Order Status Polling
- After payment, poll GetOrderDetail until status becomes Delivered
- Recommended interval: 2-3 seconds
- Timeout: ~60 seconds (suggest manual refresh after)
- After payment, poll
-
ePay Redirect Handling
- Save order ID before redirecting to gateway
- User redirects back via the epay_return_url configured in shop config
- On return: Check order status (payment may take a few seconds to process)
-
Error Handling
- All operations return result enums (not exceptions)
- Check result type before accessing response data
- Common errors: TooManyUnpaid, CouponInvalid, NotEnoughBalance, OrderNotFound
-
Balance vs ePay Decision
- Check user balance before showing payment options
- ePay: Redirect flow, user leaves app temporarily
- Balance: Instant payment, better UX if sufficient balance
-
Coupon UI/UX
- Verify coupon before order creation (show discount preview)
- Display applicable conditions (time window, usage limit, min amount)
- Show final price after discount in order summary
Configuration
Shop module config stored in Redis under key shop:
| Field | Default | Purpose |
|---|---|---|
max_unpaid_orders | 5 | Maximum unpaid orders per user |
auto_cancel_after | 30m | Timeout for automatic order cancellation |
epay_notify_url | - | Server callback URL for payment notifications |
epay_return_url | - | User redirect URL after payment |
Common Issues and Solutions
Order Creation Fails with TooManyUnpaid
Cause: User has >= max_unpaid_orders unpaid orders
Solution: Cancel old unpaid orders or complete payment
Coupon Shows as Invalid
Causes:
- Outside time window (check start_time and end_time)
- Usage limit exceeded (global or per-account)
- Production price below minimum amount (for amount-based discounts)
Solution: Check coupon conditions and inform user why it’s invalid
Balance Payment Fails
Causes:
- Insufficient available_balance (frozen balance cannot be used)
- Order already paid or cancelled
- Concurrent payment attempt (transaction conflict)
Solution: Refresh balance, check order status, retry if conflict
ePay Callback Not Received
Causes:
- epay_notify_url not publicly accessible
- Firewall blocking gateway IPs
- Signature verification failed
Solution: Check server logs, verify network config, confirm provider credentials
Order Stuck in Paid Status
Cause: Delivery hook not running or not implemented
Solution: Check RabbitMQ consumer status, verify OrderPaidEvent is being consumed
Packages Not Activating
Cause: PackageQueuePushEvent not triggering or hook not running
Solution: Check telecom module hooks, verify event publishing
Workflow Diagram
┌────────────────────────────────────────────────────────────────┐
│ User Purchase Flow │
└────────────────────────────────────────────────────────────────┘
1. Browse Productions (filtered by user group + extra groups)
└─> ProductionService.ListUserProduction
2. [Optional] Verify Coupon
└─> CouponService.VerifyCoupon
3. Create Order (with optional coupon)
└─> OrderService.CreateOrder
└─> Validates: unpaid limit, production exists, coupon valid
└─> Calculates final price with discount
└─> Publishes: OrderCreatedEvent
4a. Pay with ePay
└─> OrderService.GetEpayUrl (generate payment URL)
└─> User redirects to gateway
└─> Gateway calls back: POST /api/shop/epay/callback
└─> EpayHook consumes EpayCallback event
└─> OrderService.PayOrderWithEpay
└─> Verifies signature, updates order
└─> Publishes: OrderPaidEvent
4b. Pay with Balance
└─> OrderService.PayOrderWithBalance
└─> Atomic transaction: lock balance + deduct + update order
└─> Publishes: OrderPaidEvent
5. Deliver Packages [⚠️ Needs Implementation]
└─> DeliveryHook consumes OrderPaidEvent
└─> Finds master package of production's package series
└─> Creates N package queue items (N = package_amount)
└─> Updates order status to Delivered
└─> Publishes: PackageQueuePushEvent
6. Activate Service
└─> TelecomPackageQueueHook consumes PackageQueuePushEvent
└─> If no active package: activates oldest queued package
└─> Publishes: PackageActivateEvent
7. User Accesses Nodes
└─> Active package defines available_group
└─> Nodes filtered by: available_groups && ARRAY[user_group]
└─> User can generate subscription links and connect
Testing Considerations
Happy Path Testing
- User with valid permissions can see productions
- Coupon applies correct discount
- Order creation succeeds with valid inputs
- Payment updates order status to Paid
- First package activates immediately
- User can access nodes matching package group
Edge Cases to Test
- User at unpaid order limit (should reject new orders)
- Coupon usage limit reached between verification and order creation
- Production deleted after order created but before payment
- Duplicate payment callbacks (idempotency check)
- Concurrent balance payments (transaction locking)
- User already has active package (new packages should queue)
- Package series has no master package (should fail gracefully)
Error Recovery
- Payment succeeds but delivery fails (manual intervention needed)
- Partial delivery (transaction rollback)
- Event replay (idempotent processing)
- Network timeout during ePay redirect (order remains unpaid, can retry)
See Also
- Production - Product catalog, version control, and package series
- Order System - Order entity, status lifecycle, and admin operations
- Shop Module Introduction - Module overview and architecture
- Package Queue - Package delivery and activation mechanics
- Coupon System - Coupon types, validation rules, and management (if documented)
- Account Balance - Balance operations and gift cards (if documented)
Account Balance
Account Balance is the user’s internal wallet system in Helium. It holds monetary value that users can use to pay for orders without external payment gateways. The system tracks two types of balance (available and frozen) and maintains a complete audit trail of all balance changes.
Core Concept
Each user has a single balance record with two components:
pub struct UserBalance {
pub id: i64,
pub user_id: Uuid,
pub available_balance: Decimal, // Spendable balance
pub frozen_balance: Decimal, // Temporarily locked
}
Available Balance: The amount user can spend on orders or withdraw. This is what users see as their “wallet balance”.
Frozen Balance: Temporarily locked funds that cannot be spent. Used for scenarios where balance needs to be reserved but not immediately consumed (e.g., pending transactions, dispute holds).
Balance Change Types
All balance modifications are categorized into four types:
- Deposit: Adds to available balance (gift card redemption, admin top-up, refunds)
- Consume: Deducts from available balance (order payment, admin deduction)
- Freeze: Moves available balance to frozen balance (hold funds)
- Unfreeze: Moves frozen balance back to available balance (release hold)
pub enum UserBalanceChangeType {
Deposit, // available_balance + amount
Consume, // available_balance - amount
Freeze, // available_balance - amount, frozen_balance + amount
Unfreeze, // frozen_balance - amount, available_balance + amount
}
Every balance change is automatically logged in user_balance_change_log with timestamp, amount, reason, and change type.
User Operations
Get Balance
Users query their current balance status:
Service: UserBalanceService::GetMyBalance
Returns the user’s UserBalance with both available and frozen amounts, or None if balance has not been initialized (should never happen post-registration).
List Balance Changes
Users can view their transaction history with pagination:
Service: UserBalanceService::ListMyBalanceChanges
Parameters:
- limit, offset: Pagination controls
- asc: Sort order (ascending/descending by created_at)
Returns a list of UserBalanceChangeLog entries showing:
- Change amount (positive for deposits/unfreezes, negative for consumes/freezes)
- Reason string (human-readable explanation)
- Change type
- Timestamp
Redeem Gift Card
Users can redeem gift cards to add balance:
Service: GiftCardService::RedeemGiftCardRequest
Flow:
- Validate gift card exists and is not used/expired
- Verify user exists
- Add card amount to user’s available balance (transaction)
- Log balance change with reason “Redeem Gift Card”
- Mark gift card as redeemed with user ID and timestamp
Result Types:
- Success: Balance credited, card redeemed
- CardNotFound: Invalid secret code
- AlreadyUsed: Card already redeemed by someone
- Expired: Card past valid_until date
- UserNotFound: User account doesn’t exist
Important: Gift card redemption is transactional. If any step fails, the entire operation rolls back.
Payment with Balance
Users can pay for orders using their available balance:
Service: OrderService::PayOrderWithBalance
Flow:
- Verify order exists, belongs to user, and is unpaid
- Check user has sufficient available balance
- Transaction begins:
- Deduct order amount from available balance
- Log balance change with order reference
- Update order status to Paid
- Record paid_at timestamp
- Emit OrderPaidEvent for downstream processing
Result Types:
- Success: Order paid, balance deducted
- OrderNotFound: Invalid order or already paid
- NotEnoughBalance: Insufficient funds
Transaction Safety: The entire payment operation (balance deduction, log creation, order update) happens in a single database transaction. If any step fails, no changes are persisted.
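To make the guarantee concrete, here is a hedged sketch of the flow using SQLx. The balance tables follow the schema section below; the order table layout, the change-type enum label, and the helper signature are assumptions made for illustration (the real code goes through the entity layer rather than raw queries):
use rust_decimal::Decimal;
use sqlx::PgPool;
use uuid::Uuid;

async fn pay_order_with_balance(
    pool: &PgPool,
    user_id: Uuid,
    order_id: i64, // assumed order key type
    amount: Decimal,
) -> Result<(), sqlx::Error> {
    let mut tx = pool.begin().await?;

    // 1. Deduct the order amount from the available balance.
    sqlx::query(
        "UPDATE user_balance SET available_balance = available_balance - $1 WHERE user_id = $2",
    )
    .bind(amount)
    .bind(user_id)
    .execute(&mut *tx)
    .await?;

    // 2. Record the change log entry (normally handled by UpdateUserBalance).
    //    'consume' is an assumed label for the database enum.
    sqlx::query(
        "INSERT INTO user_balance_change_log (user_id, amount, reason, change_type) \
         VALUES ($1, $2, $3, 'consume')",
    )
    .bind(user_id)
    .bind(-amount)
    .bind(format!("Pay order #{order_id}"))
    .execute(&mut *tx)
    .await?;

    // 3. Mark the order as paid (table and column names assumed).
    sqlx::query("UPDATE \"order\" SET order_status = 'paid', paid_at = NOW() WHERE id = $1")
        .bind(order_id)
        .execute(&mut *tx)
        .await?;

    // If any step above returned Err, `tx` is dropped here and everything
    // rolls back; no partial state is persisted.
    tx.commit().await
}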
Balance Initialization
User balances are automatically initialized when a new user registers:
Hook: RegisterHook consumes UserRegisterEvent from the auth module
Process:
- Creates balance record with available_balance = 0 and frozen_balance = 0
- Uses UpdateUserBalance with zero diffs (upsert behavior)
- No change log entry created (zero change, reason is empty string)
Note: The UpdateUserBalance entity operation has built-in upsert logic:
INSERT INTO user_balance (user_id) VALUES ($1)
ON CONFLICT (user_id) DO NOTHING
RETURNING *
This means calling UpdateUserBalance on a non-existent user will initialize their balance first, then apply the change.
Admin Operations
Administrators can manage user balances through ManageService. All operations require appropriate AdminRole permissions and are logged in the audit system.
Change User Balance
Operation: AdminChangeUserBalance
Permissions: Moderator, SuperAdmin, CustomerSupport
Parameters:
- user_id: Target user
- amount: Change amount (always positive, type determines operation)
- reason: Human-readable explanation (required for audit)
- change_type: One of Deposit/Consume/Freeze/Unfreeze
Examples:
- Manual top-up: { amount: 100, change_type: Deposit, reason: "Promotional credit" }
- Correction: { amount: 50, change_type: Consume, reason: "Duplicate refund correction" }
- Hold funds: { amount: 200, change_type: Freeze, reason: "Dispute investigation" }
- Release hold: { amount: 200, change_type: Unfreeze, reason: "Dispute resolved" }
Important: The amount parameter is always a positive number. The operation type determines whether it’s added or subtracted:
- Deposit: available += amount
- Consume: available -= amount
- Freeze: available -= amount, frozen += amount
- Unfreeze: frozen -= amount, available += amount
List Balance Change Logs
Operation: AdminListUserBalanceLogs
Permissions: All admin roles
Returns paginated balance change history for a specific user, useful for customer support investigations.
Integration Points
Gift Cards
Gift cards are a primary source of balance deposits. When redeemed:
- Card’s amount field is added to available balance
- Card marked as used with used_by = user_id, redeem_at = NOW()
- Balance change log created with change_type = Deposit
See Gift Card System for card management and generation.
Orders
Balance is consumed when users pay for orders via PayOrderWithBalance:
- Order’s total_amount is deducted from available balance
- Order transitions: Unpaid → Paid
- OrderPaidEvent emitted for package delivery
- Balance change log references the order
See Order System for complete payment flows.
Refunds
When orders are refunded (status: Refunding → Refunded), the original payment amount should be returned to the user’s balance. This is handled by admin operations manually or through automated refund processing.
Current Status: Manual refund workflow requires admin to use AdminChangeUserBalance with change_type: Deposit.
Database Schema
Key tables (see migrations/20250815133800_create_shop_entites.sql):
user_balance:
user_id UUID PRIMARY KEY
available_balance DECIMAL NOT NULL DEFAULT 0
frozen_balance DECIMAL NOT NULL DEFAULT 0
user_balance_change_log:
id BIGSERIAL PRIMARY KEY
user_id UUID NOT NULL
amount DECIMAL NOT NULL
reason TEXT NOT NULL
change_type user_balance_change_type NOT NULL
created_at TIMESTAMP NOT NULL DEFAULT NOW()
Indexes:
- user_balance_change_log(user_id, created_at DESC): Efficient pagination of user transaction history
- user_balance(user_id): Fast balance lookups (primary key)
Architecture Notes
Transaction Safety
All balance-modifying operations use database transactions:
- Payment with balance: Locks order and balance rows with SELECT ... FOR UPDATE
- Gift card redemption: Transaction ensures card can’t be double-redeemed
- Admin changes: Atomic balance update + log insertion
Change Log Automation
The UpdateUserBalance entity operation automatically:
- Creates or updates the balance record
- Determines change type from diff signs
- Inserts change log entry with correct amount/type
- All in a single transaction
Developer Note: You should never manually insert into user_balance_change_log. Always use UpdateUserBalance to modify balance, which handles logging automatically.
Decimal Precision
All monetary values use rust_decimal::Decimal for precise arithmetic. This avoids floating-point errors in financial calculations. Decimal serializes as string in protobuf/JSON to preserve precision.
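A small illustration of why Decimal is preferred over floating point for money (the dec! macro comes from the rust_decimal_macros crate):
use rust_decimal_macros::dec;

fn main() {
    // Exact decimal arithmetic: no 0.30000000000000004-style drift.
    let total = dec!(19.99) + dec!(0.01);
    assert_eq!(total, dec!(20.00));

    // With rust_decimal's serde feature enabled, this value serializes as the
    // string "20.00", which is why API clients receive money as strings.
    println!("{total}"); // prints 20.00
}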
Frozen Balance Use Cases
Currently, frozen balance is supported in the data model but not actively used in the order flow. Potential future use cases:
- Escrow for dispute resolution
- Pre-authorization holds
- Subscription renewals
- Withdrawal processing delays
Best Practices
- Always Provide Reason: When modifying balance via admin operations, provide clear, descriptive reasons. These appear in user transaction history and audit logs.
- Check Balance Before Deduction: Always verify sufficient available balance before attempting payment operations to avoid transaction rollbacks.
- Use Transactions: Any operation involving balance changes and other state updates (orders, gift cards) must be wrapped in a database transaction.
- Don’t Bypass Change Logs: Never directly update the user_balance table. Always use UpdateUserBalance to ensure change logs are created.
- Validate Amounts: All balance operations should validate that amounts are positive and reasonable (not excessively large).
Frontend Integration
Balance Display:
- Show available_balance as the user’s wallet balance
- Optionally show frozen_balance if non-zero (with explanation)
- Format decimals appropriately for currency display
Transaction History:
- Display ListMyBalanceChanges with infinite scroll or pagination
- Color-code change types: green for Deposit/Unfreeze, red for Consume/Freeze
- Show reason string as transaction description
- Format timestamps in user’s local timezone
Payment Method Selection:
- When available_balance ≥ order total, enable “Pay with Balance” option
- Show remaining balance after payment preview
- Handle NotEnoughBalance error gracefully with top-up prompt
Gift Card Redemption:
- Provide input field for gift card secret
- Handle all RedeemGiftCardResult variants with appropriate messages
- Refresh balance display after successful redemption
See Also:
- Gift Card System - Card generation and management
- Order System - Payment flows and order lifecycle
- Shop Module Introduction - Overall shop architecture
Coupon
Coupon is the discount system that allows users to reduce their order total when purchasing productions. The system supports flexible discount strategies with comprehensive validation rules and usage limits.
Core Concept
A Coupon defines:
- Discount strategy: How the discount is calculated (rate or amount)
- Validity period: When the coupon can be used
- Usage limits: How many times the coupon can be used globally and per user
- Activation status: Whether the coupon is currently active
Data Model
pub struct Coupon {
pub id: i32,
pub code: String, // Unique code users enter
pub is_active: bool, // Administrative on/off switch
pub discount: Json<Discount>, // Discount strategy
pub start_time: Option<PrimitiveDateTime>, // When coupon becomes valid
pub end_time: Option<PrimitiveDateTime>, // When coupon expires
pub time_limit_per_account: Option<i32>, // Max uses per user
pub time_limit_global: Option<i32>, // Max total uses
pub used_count: i32, // Current usage count
}
Discount Types
The system supports two discount strategies through the Discount enum:
1. Rate Discount
Applies a percentage discount to the order total, regardless of the order amount.
Discount::Rate(RateDiscount {
rate: Decimal // e.g., 0.20 for 20% off
})
- Calculation: final_price = original_price × (1 - rate)
- Use case: General promotions (e.g., “20% off any purchase”)
- No minimum order requirement
2. Amount Discount
Subtracts a fixed amount from the order total, with a minimum order requirement.
Discount::Amount(AmountDiscount {
min_amount: Decimal, // Minimum order required
discount: Decimal // Amount to subtract
})
- Calculation: final_price = original_price - discount (if original_price >= min_amount)
- Use case: Threshold promotions (e.g., “$10 off orders over $50”)
- Validation: Coupon is invalid if order doesn’t meet min_amount
The discount data is stored as JSON in the database, allowing flexible extension of discount strategies in the future.
Validation Rules
Coupon validation occurs in two places:
1. Pre-Purchase Verification (VerifyCoupon)
Allows users to check if a coupon is valid before creating an order. This provides immediate feedback in the UI.
Validation checks (in order):
- Coupon exists (by code lookup)
- Current time is after start_time (if set)
- Current time is before end_time (if set)
- Global usage hasn’t exceeded time_limit_global (if set)
- User’s usage count hasn’t exceeded time_limit_per_account (if set)
Returns Option<Coupon> - None if any validation fails.
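A condensed sketch of those checks is shown below. The actual processor loads the per-user usage count from the database and returns Option<Coupon> rather than a bool; the function name and signature here are illustrative only:
use time::PrimitiveDateTime;

// `user_usage` is assumed to come from the per-user usage count query;
// `now` is the current UTC time.
fn coupon_usable(coupon: &Coupon, user_usage: i32, now: PrimitiveDateTime) -> bool {
    coupon.is_active
        && coupon.start_time.map_or(true, |t| now >= t)
        && coupon.end_time.map_or(true, |t| now <= t)
        && coupon
            .time_limit_global
            .map_or(true, |limit| coupon.used_count < limit)
        && coupon
            .time_limit_per_account
            .map_or(true, |limit| user_usage < limit)
}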
2. Order Creation Validation
When a user creates an order with a coupon, the system performs additional validation through coupon_applicable():
Additional checks:
- All time-based validations (as above)
- For Amount discounts: Production price must meet min_amount
- Per-account usage limit is re-checked at the database level (race condition protection)
If validation fails during order creation, returns CreateOrderResult::CouponInvalid.
Integration with Orders
Order Creation Flow
When a user creates an order with a coupon:
- Coupon lookup: Fetch coupon by ID
- Applicability check: Validate using coupon_applicable()
- Per-account limit check: Query database for user’s usage count
- Price calculation: Apply discount to production price
- Order creation: Store order with the coupon_used field
- Usage tracking: The order record links to the coupon for usage counting
Price Calculation
let mut amount = prod.price;
if let Some(coupon) = coupon {
amount = match *coupon.discount {
Discount::Rate(r) => amount * (Decimal::ONE - r.rate),
Discount::Amount(a) => {
if amount >= a.min_amount {
amount - a.discount
} else {
amount // Discount not applied if below minimum
}
}
};
}
Usage Counting
The system tracks coupon usage through the orders table:
- Orders store the coupon_used field (coupon ID)
- used_count on the coupon is derived by counting orders that reference it
- Per-user usage is counted via the CountCouponUsageByUser query
Important: Usage counting is based on order creation, not order payment status. An unpaid order still counts toward usage limits.
Active Status and Code Uniqueness
Active Status (is_active)
The is_active flag allows administrators to enable/disable coupons without deletion:
- Active: Coupon can be found and used
- Inactive: Coupon is invisible to users but preserved in database
This is useful for:
- Temporarily pausing a promotion
- Testing coupons before public release
- Historical record keeping
Code Uniqueness
The database enforces unique active codes through a partial index:
CREATE UNIQUE INDEX "idx_coupon_code"
ON "shop"."coupon" ("code")
WHERE is_active = TRUE;
Implications:
- Multiple inactive coupons can share the same code
- Only one active coupon can have a specific code at any time
- This allows code reuse across different promotion periods
Management Operations
The system provides admin APIs for coupon lifecycle management:
CRUD Operations
- Create: Generate new coupons with all configuration options
- Update: Modify existing coupons (code, discount, limits, times)
- List: Retrieve all coupons (no pagination - suitable for admin dashboard)
- Get: Fetch individual coupon by ID or code
- Delete: Permanently remove coupon from database
Note: Deleting a coupon does not cascade to orders. Orders retain the coupon_used ID even if the coupon is deleted.
Time Management
All timestamps use Unix epoch format in the API but are stored as TIMESTAMP WITHOUT TIME ZONE in the database:
- API layer converts between Unix timestamps and PrimitiveDateTime
- All time comparisons use UTC
- start_time and end_time are optional - omitting them means no time restriction
Design Decisions
Why JSON for Discount?
Storing discount as JSON enables:
- Easy addition of new discount strategies without schema changes
- Type-safe handling through Rust’s serde deserialization
- Database-level storage of complex discount rules
Why Count Orders, Not Payments?
Usage limits count order creation, not successful payments, because:
- Prevents abuse through repeated unpaid orders
- Simplifies usage tracking (no need to track order status changes)
- Protects limited-use coupons from reservation attacks
Why Separate Verification API?
The VerifyCoupon endpoint exists separately from order creation to:
- Provide immediate UI feedback without creating an order
- Allow frontend to show applicable discounts before purchase
- Reduce unnecessary order creation for invalid coupons
Frontend Integration Points
When implementing the coupon UI:
- Code Entry: Call VerifyCoupon as user types/submits coupon code
- Visual Feedback: Display discount type and amount from returned coupon
- Price Preview: Calculate and show discounted price before order creation
- Order Creation: Pass coupon_id (not code) in CreateOrderRequest
- Error Handling: Handle COUPON_INVALID result with user-friendly message
Key point: The verification step returns a Coupon object with an id field. Use this ID when creating the order, not the code string.
EPay Support
EPay (易支付) is the payment gateway integration that enables third-party payment processing through aggregator services. The system supports multiple payment providers with different channels, handles async payment callbacks, and ensures payment security through signature verification.
Core Concept
EPay acts as an abstraction layer over payment aggregators that support:
- AliPay (alipay)
- WeChat Pay (wxpay)
- USDT cryptocurrency (usdt)
The system is designed to support multiple providers simultaneously, each with their own credentials and enabled payment channels. This allows failover capability and regional/method-specific provider selection.
Data Model
pub struct EpayProviderCredential {
pub id: i32,
pub display_name: String, // User-facing provider name
pub enabled_channels: Vec<EpaySupportedChannels>,
pub enabled: bool, // Admin on/off switch
pub key: String, // Merchant secret key
pub pid: i32, // Merchant ID
pub merchant_url: String, // Gateway endpoint
}
Provider Library
The libs/epay crate provides:
- Signature generation: MD5-based signing for payment requests
- Signature verification: Validate callbacks to prevent fraud
- Request/Response types: Type-safe payment gateway communication
- Channel enumeration: Standardized payment method identifiers
Payment Flow Architecture
The EPay payment flow involves multiple stages with async processing:
1. Payment URL Generation
User Journey: User creates order → selects provider and channel → receives payment URL
Process:
- User calls GetPaymentUrl RPC with order ID, provider ID, and channel
- System loads provider credentials from database
- System generates signed payment request using provider’s key
- Returns redirect URL: {merchant_url}?{signed_parameters}
The signed parameters include order details, callback URLs, and an MD5 signature. The signature ensures the gateway can verify the request came from an authorized merchant.
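The exact signing rules live in the libs/epay crate. As a rough, non-authoritative illustration, EPay-style gateways conventionally sort the parameters by key, join them as k=v pairs (skipping empty values and the signature fields), append the merchant key, and take the lowercase hex MD5 of the result:
use std::collections::BTreeMap;

// `params` holds the request fields (merchant id, order number, amount,
// callback URLs, ...); exact field names are gateway-specific.
fn sign(params: &BTreeMap<String, String>, merchant_key: &str) -> String {
    let query = params
        .iter()
        .filter(|(k, v)| !v.is_empty() && k.as_str() != "sign" && k.as_str() != "sign_type")
        .map(|(k, v)| format!("{k}={v}"))
        .collect::<Vec<_>>()
        .join("&");
    // MD5 over "k1=v1&k2=v2...{merchant_key}", hex-encoded lowercase.
    format!("{:x}", md5::compute(format!("{query}{merchant_key}")))
}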
2. External Payment
User is redirected to the EPay gateway (external site) where they complete payment through their chosen method (AliPay, WeChat, USDT). This happens entirely outside the system.
3. Async Callback Processing
Critical Architecture: The callback must return immediately to the gateway (within 2-3 seconds), so processing happens asynchronously through RabbitMQ.
Flow:
EPay Gateway → POST /api/shop/epay/callback → Publish to RabbitMQ → Return 200 OK
↓
EpayHook Consumer
↓
Verify Signature
↓
Update Order Status
↓
Publish OrderPaidEvent
Components:
- api/epay.rs: HTTP endpoint that receives the gateway callback
- events/epay.rs: EpayCallback event definition
- hooks/epay.rs: EpayHook consumer that processes the callback
Why Async?: Payment gateways expect immediate HTTP responses. If the server takes too long, the gateway may retry the callback multiple times, potentially causing duplicate processing.
4. Callback Verification
Security Model: All callbacks MUST verify the signature before processing.
Verification Process:
- Extract callback parameters (order ID, amount, status, etc.)
- Load provider credentials from database using the pid from the callback
- Reconstruct signature using provider’s secret key
- Compare computed signature with received signature
- Reject if signatures don’t match
Protection Against:
- Forged callbacks from malicious actors
- Man-in-the-middle attacks
- Replay attacks with modified amounts
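Verification is the mirror image of signing: recompute the signature from the callback parameters with the stored provider key and compare it to the received one. A sketch reusing the sign helper from the payment-URL section above (field names follow the data model; everything else is illustrative):
use std::collections::BTreeMap;

fn verify_callback(params: &BTreeMap<String, String>, provider: &EpayProviderCredential) -> bool {
    match params.get("sign") {
        // Case-insensitive comparison of the recomputed and received hex digests.
        Some(received) => sign(params, &provider.key).eq_ignore_ascii_case(received),
        None => false,
    }
}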
5. Idempotency Handling
Callbacks can be received multiple times due to network retries. The system handles this by:
- Checking order status before processing
- Only updating unpaid orders
- Returning success for already-paid orders
Multi-Provider System
Provider Discovery
Frontend clients discover available providers via ListEpayProviders RPC:
pub struct EpayProviderSummary {
pub id: i32,
pub display_name: String,
pub enabled_channels: Vec<EpaySupportedChannels>,
}
Query Filters:
- Only returns providers where enabled = TRUE
- Excludes providers with empty enabled_channels
- Filters out channels not in the enabled list
This allows dynamic provider selection in the UI based on current availability.
Provider Selection
When requesting a payment URL, the user specifies:
- Provider ID: Which payment aggregator to use
- Channel: Which payment method (alipay, wxpay, usdt)
The system validates:
- Provider exists and is enabled
- Requested channel is in provider’s enabled_channels
- Order is unpaid and belongs to the requesting user
Provider Management
The enabled flag (added via migration 20250929232831) allows administrators to:
- Temporarily disable problematic providers without deletion
- Switch between providers during incidents
- A/B test different payment gateways
- Phase in new providers gradually
Database Operations:
- Providers are managed via admin interface or direct database access
- No gRPC APIs exist for provider CRUD (admin-only operation)
- Credentials are redacted in logs for security
Configuration
EPay requires configuration in the shop module config:
{
"shop": {
"epay_notify_url": "https://your-domain.com/api/shop/epay/callback",
"epay_return_url": "https://your-domain.com/payment/success",
...
}
}
Configuration Fields
- epay_notify_url: Server-to-server callback endpoint (async notification)
- epay_return_url: User redirect URL after payment (browser redirect)
Important Distinctions:
- notify_url: Backend webhook for payment processing (reliable)
- return_url: Frontend redirect for user experience (unreliable)
Never rely on return_url for order processing. Users may close the browser before redirecting. Always use the notify_url callback for payment confirmation.
Provider Credentials
Providers are stored in the shop.epay_provider_credential table:
INSERT INTO shop.epay_provider_credential (
display_name,
enabled_channels,
enabled,
key,
pid,
merchant_url
) VALUES (
'My Payment Provider',
ARRAY['alipay', 'wxpay']::text[],
true,
'your-merchant-secret-key',
1234,
'https://pay.provider.com/submit.php'
);
Obtaining Credentials: Register with an EPay-compatible payment aggregator to receive merchant credentials (PID, Key, Gateway URL).
Integration with Order System
Order Fields
Orders track EPay payment through:
- paid_with_epay_provider: Stores provider ID when payment URL is generated
- payment_method: Set to channel (AliPay, WeChat, USDT) after payment
- order_status: Updated from Unpaid to Paid on successful callback
Payment Method Mapping
The system maps EPay channels to internal payment methods:
EpaySupportedChannels::AliPay => PaymentMethod::AliPay
EpaySupportedChannels::WeChatPay => PaymentMethod::WeChat
EpaySupportedChannels::Usdt => PaymentMethod::Usdt
Event Publishing
When a callback successfully processes:
- Order status updated to Paid
- OrderPaidEvent published to RabbitMQ (shop.order_paid)
- Downstream consumers (e.g., market module) react to the event
Error Handling
Callback Validation Failures
If signature verification fails:
- Log warning (may indicate exposed webhook or malicious request)
- Return error to gateway (gateway may retry with correct signature)
- Do NOT update order status
Order State Errors
If order is not found or already paid:
- Return success to gateway (prevent infinite retries)
- Log the incident for monitoring
Provider Not Found
If the callback references an unknown provider:
- Cannot verify signature (no key available)
- Log error and return failure
Frontend Integration Points
When implementing EPay payment UI:
- List Providers: Call ListEpayProviders to get available providers and channels
- Display Options: Show provider names and channel icons (AliPay, WeChat, USDT)
- Request Payment: Call GetPaymentUrl with selected provider ID and channel
- Redirect User: Open payment URL in browser or webview
- Handle Return: When user returns via return_url, poll order status to confirm payment
- Status Polling: Use GetOrderById to check if payment completed
Key Points:
- Payment confirmation happens via backend callback, not frontend redirect
- Frontend should poll order status after user returns
- Don’t assume payment succeeded just because user returned to app
- Handle timeout scenarios (user abandons payment gateway)
Design Decisions
Why Multi-Provider Support?
Supporting multiple providers enables:
- Failover: Switch to backup provider if primary has issues
- Regional Optimization: Use different providers for different regions
- Rate Shopping: Select providers with better fees for specific channels
- Risk Distribution: Avoid single point of failure
Why Async Callback Processing?
Payment gateways expect fast responses (< 3 seconds). Database queries, signature verification, and event publishing can exceed this threshold. Async processing via RabbitMQ ensures:
- Immediate HTTP response to gateway
- Reliable processing with automatic retries
- Decoupled webhook handling from business logic
Why Store Provider ID on Order?
When generating a payment URL, the system stores the provider ID in paid_with_epay_provider. This enables:
- Signature verification (need provider’s key)
- Callback validation (ensure callback matches expected provider)
- Analytics and reporting (which provider processed the payment)
Why MD5 Signatures?
MD5 is cryptographically weak but widely used by Chinese payment aggregators. The EPay library uses MD5 for compatibility with existing gateway implementations. The signature prevents tampering but should not be considered cryptographically secure.
Gift Card
Gift Card is a prepaid credit system that allows users to redeem balance into their account. Gift cards have a secret code, a fixed monetary value, an expiration date, and can only be used once. Administrators can generate gift cards in bulk or create special cards with custom secrets.
Core Concept
A Gift Card represents:
- Secret code: Unique 64-character alphanumeric string for redemption
- Amount: Fixed value credited to user’s balance upon redemption
- Expiration: Time limit after which the card becomes invalid
- Single-use: Once redeemed, the card is permanently marked as used
Data Model
pub struct GiftCard {
pub id: i32,
pub secret: String, // 64-char alphanumeric code
pub amount: Decimal, // Value to credit
pub used_by: Option<Uuid>, // User who redeemed (if any)
pub created_at: PrimitiveDateTime, // Creation timestamp
pub redeem_at: Option<PrimitiveDateTime>, // When it was redeemed
pub valid_until: PrimitiveDateTime, // Expiration date
}
Gift Card Lifecycle
1. Generation
Gift cards are created by administrators through bulk generation or special creation:
Bulk Generation (AdminGenerateGiftCard):
- Generates N cards with identical amount and expiration
- Secrets are randomly generated (64-character alphanumeric)
- Returns list of secret codes for distribution
- Uses hash set to ensure uniqueness before database insertion
Special Creation (AdminCreateSpecialGiftCard):
- Creates a single card with custom secret (e.g., promotional codes like “WELCOME2025”)
- Useful for marketing campaigns or personalized gifts
- Secret must be unique (database constraint)
Permissions: Moderator, SuperAdmin, CustomerSupport
2. Distribution
Gift cards exist as secret codes. Administrators must distribute these codes to users through external channels (email, physical cards, promotional materials). The system does not handle distribution automatically.
3. Redemption
Users redeem gift cards through GiftCardService::RedeemGiftCard:
Flow:
- User submits secret code
- System looks up valid gift card (not used, not expired)
- Validates user exists
- Transaction begins:
- Add card amount to user’s available balance
- Log balance change (type: Deposit, reason: “Redeem Gift Card”)
- Mark card as used with used_by and redeem_at
- Transaction commits
Validation Order:
- Card exists by secret → otherwise CardNotFound
- Card not already used → otherwise AlreadyUsed
- Card not expired (valid_until > NOW()) → otherwise Expired
- User exists → otherwise UserNotFound
- Transaction succeeds → Success
Important: The validation provides specific error reasons. If a card is found but invalid, the system distinguishes between “already used” and “expired” to provide clear feedback.
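The sketch below shows how that validation order could map onto result variants; it is illustrative only, since the real processor performs these checks inside a transaction with the gift-card row locked:
use time::PrimitiveDateTime;

enum RedeemOutcome {
    Success,
    CardNotFound,
    AlreadyUsed,
    Expired,
    UserNotFound,
}

fn classify(card: Option<&GiftCard>, user_exists: bool, now: PrimitiveDateTime) -> RedeemOutcome {
    let Some(card) = card else {
        return RedeemOutcome::CardNotFound;
    };
    if card.used_by.is_some() {
        return RedeemOutcome::AlreadyUsed;
    }
    if card.valid_until <= now {
        return RedeemOutcome::Expired;
    }
    if !user_exists {
        return RedeemOutcome::UserNotFound;
    }
    RedeemOutcome::Success
}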
4. Expiration
Expired cards remain in the database but cannot be redeemed:
- Query filter: valid_until > NOW() AND used_by IS NULL
- Redemption attempt returns the Expired result
- No automatic cleanup of expired cards (historical record)
User Operations
Redeem Gift Card
Service: GiftCardService::RedeemGiftCard
Users provide a secret code to add balance to their account.
Result Types:
- Success: Balance credited successfully
- CardNotFound: Secret doesn’t exist in database
- AlreadyUsed: Card was previously redeemed by any user
- Expired: Card’s valid_until date has passed
- UserNotFound: Requesting user account doesn’t exist (edge case)
Transaction Safety: The redemption process is fully transactional. If the balance update fails, the card remains unused. This prevents double-redemption and ensures balance consistency.
Admin Operations
All gift card admin operations are exposed through ManageService and require appropriate admin roles.
Generate Gift Cards
Operation: AdminGenerateGiftCard
Bulk-creates gift cards with identical configuration.
Parameters:
- number: How many cards to generate (batch size)
- amount: Value of each card
- valid_until: Expiration timestamp for all cards
Returns: List of secret codes (strings)
Use Cases:
- Promotional campaigns (e.g., 1000 cards worth $10 each)
- Customer rewards programs
- Event giveaways
Note: The operation returns the secret codes in the response. These should be securely stored or distributed immediately, as they cannot be retrieved later (secrets are logged as [REDACTED] in debug output for security).
Create Special Gift Card
Operation: AdminCreateSpecialGiftCard
Creates a single card with a custom secret.
Parameters:
- secret: Custom code (e.g., “NEWYEAR2025”)
- amount: Card value
- valid_until: Expiration timestamp
Use Cases:
- Marketing promotions with memorable codes
- Influencer partnerships with branded codes
- VIP customer gifts
Database Constraint: The secret must be unique across all gift cards (used or unused). Attempting to create a duplicate secret will fail.
List Gift Cards
Operation: AdminListGiftCards
Retrieves gift cards with optional filtering and pagination.
Filters:
- filter_id: Specific card ID
- filter_secret: Partial or exact secret match
- filter_is_used: Show only used or unused cards
- filter_used_by: Cards redeemed by specific user
- limit, page: Pagination controls
Use Cases:
- Finding cards redeemed by a user (customer support)
- Checking if a secret exists before creation
- Monitoring unused expired cards
- Audit trail of card usage
Delete Gift Cards
Operation: AdminDeleteGiftCards
Permanently removes gift cards from the database.
Parameters: List of card IDs to delete
Permissions: Moderator, SuperAdmin (more restricted than other operations)
Use Cases:
- Removing expired promotional campaigns
- Cleaning up unused cards after campaign ends
Warning: Deleting a redeemed card does not affect user balance. The balance change log remains intact with the reason “Redeem Gift Card”, but the reference to the card is lost.
Integration with Balance System
Gift card redemption is the primary way users add funds to their account (other than admin top-ups).
Balance Credit Flow
When a gift card is redeemed:
- User’s available_balance increases by the card amount
- A UserBalanceChangeLog entry is created:
  - change_type: Deposit
  - amount: Card value (positive)
  - reason: “Redeem Gift Card”
- User can immediately use the balance to pay for orders
See Account Balance for complete balance system documentation.
Transaction Integrity
The redemption uses database transactions with row-level locking:
- SELECT ... FOR UPDATE on the gift card prevents concurrent redemption
- Balance update and card marking happen atomically
- If balance update fails (user not found), card remains unused
Security Considerations
Secret Generation
Secrets are 64-character random alphanumeric strings (A-Z, a-z, 0-9):
- Entropy: ~380 bits (62^64 combinations)
- Collision probability: Negligible even for millions of cards
- Not cryptographically signed or verifiable offline
Uniqueness: The system generates secrets in memory using a HashSet before database insertion, ensuring no duplicates in a batch. Database constraint provides final uniqueness guarantee.
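A sketch of that batch generation step, assuming a rand 0.8-style API (the Alphanumeric distribution); the real generator may differ in detail:
use std::collections::HashSet;

use rand::distributions::Alphanumeric;
use rand::Rng;

fn generate_secrets(count: usize) -> HashSet<String> {
    let mut secrets = HashSet::with_capacity(count);
    while secrets.len() < count {
        // 64 random characters drawn from A-Z, a-z, 0-9.
        let secret: String = rand::thread_rng()
            .sample_iter(&Alphanumeric)
            .take(64)
            .map(char::from)
            .collect();
        secrets.insert(secret); // duplicates are silently dropped and regenerated
    }
    secrets
}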
Debug Output Redaction
The GiftCard struct implements custom Debug to redact secrets:
.field("secret", &"[REDACTED]")
This prevents accidental secret leakage in logs, error messages, or debug traces.
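A minimal version of such a redacting Debug implementation might look roughly like this (the exact field selection is assumed):
use std::fmt;

impl fmt::Debug for GiftCard {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.debug_struct("GiftCard")
            .field("id", &self.id)
            .field("secret", &"[REDACTED]") // never print the real code
            .field("amount", &self.amount)
            .field("used_by", &self.used_by)
            .field("valid_until", &self.valid_until)
            .finish()
    }
}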
No Secret Recovery
Once generated, secrets cannot be retrieved by ID. Admins must save the secret codes from the generation response. This is intentional—secrets are meant to be distributed, not stored centrally.
Database Schema
Key schema details (from migrations/20250922063531_gift_card_system.sql):
Table: shop.gift_card
id SERIAL PRIMARY KEY
secret VARCHAR(255) NOT NULL UNIQUE
amount NUMERIC NOT NULL
used_by UUID REFERENCES auth.user_profile(id)
created_at TIMESTAMP NOT NULL DEFAULT NOW()
redeem_at TIMESTAMP
valid_until TIMESTAMP NOT NULL
Indexes:
- (secret): Fast lookup by secret (primary redemption path)
- (secret) WHERE used_by IS NULL: Fast lookup for valid cards
- (valid_until) WHERE used_by IS NULL: Expiration queries
- (used_by): Find cards redeemed by a user
Foreign Key: used_by references user profile with ON DELETE RESTRICT, preventing user deletion if they’ve redeemed cards.
Frontend Integration Points
User Redemption Flow
- Input Field: Provide text input for secret code (64 characters)
- Validation: Optional client-side format check (alphanumeric, length)
- Submission: Call RedeemGiftCard with the secret
- Result Handling:
  - Success: Show success message, update balance display
  - CardNotFound: “Invalid gift card code”
  - AlreadyUsed: “This gift card has already been redeemed”
  - Expired: “This gift card has expired”
  - UserNotFound: Generic error (should never happen)
- Balance Refresh: Fetch updated balance after successful redemption
Admin Management UI
Generate Cards:
- Form with number, amount, expiration date
- Display generated secrets in a list (with copy buttons)
- Warn user to save secrets before leaving page
List Cards:
- Table with columns: ID, Secret (masked/copyable), Amount, Status (Used/Unused), Used By, Created, Redeemed, Expires
- Filters for used/unused, user, date ranges
- Search by secret or ID
Create Special Card:
- Form with custom secret input, amount, expiration
- Validate secret format before submission
- Handle duplicate secret error clearly
Design Decisions
Why Single-Use Only?
Gift cards are one-time redeemable by design:
- Simplifies balance tracking (single deposit event)
- Prevents confusion about remaining card balance
- Matches traditional physical gift card behavior
- Users can check their balance history for redemptions
For recurring credits or subscriptions, use coupon system or scheduled balance deposits instead.
Why No Secret Retrieval?
Secrets are treated as bearer tokens:
- Admin generates and distributes them
- System validates but doesn’t need to recall them
- Reduces risk of centralized secret exposure
- Aligns with physical gift card model (code is on the card)
Admins can search by secret if a user provides it (customer support), but cannot list all secrets.
Why Soft Delete Not Supported?
Unlike coupons (which have is_active), gift cards are either used or deleted:
- Once distributed, cards shouldn’t be “disabled” (already in user’s hands)
- Unused cards can be deleted if campaign is cancelled
- Used cards should be kept for audit trail (don’t delete redeemed cards)
Why Expiration is Required?
All gift cards must have a valid_until date:
- Prevents indefinite liability on the system
- Aligns with legal/financial regulations for prepaid instruments
- Encourages timely redemption
- Allows cleanup of old campaigns
Set far-future dates (e.g., 10 years) for effectively non-expiring cards if needed.
See Also:
- Account Balance - Balance system and transaction history
- Order System - Using balance to pay for orders
- Shop Module Introduction - Overall shop architecture
Notification Module
The Notification module provides system-wide announcements and user notification preference management. It enables administrators to broadcast messages to users with different priorities and targeting options, while allowing users to control what types of notifications they wish to receive.
Core Features
Announcements
Announcements are system-wide messages that can be displayed to users. Key characteristics:
- Priority Levels: Four priority levels (Journal, Info, Warning, Urgent) to indicate message importance
- User Targeting: Announcements can be targeted to specific user groups and extra groups
- Pinning: Important announcements can be pinned to stay at the top
- Persistence: All announcements are stored in the database for historical reference
Notification Settings
Each user has personalized notification preferences that control:
- Login Notifications: Whether to receive email notifications on login
- Marketing Communications: Opt-in/out for promotional emails
- Service Alerts: Notifications for package expiration and other service events
These settings are stored per-user and can be modified through the user-facing API.
Architecture
Services
The module exposes two gRPC services:
- NotificationService: User-facing API for viewing announcements and managing personal notification settings
- NotificationManageService: Admin-facing API for CRUD operations on announcements
Events & Hooks
The module uses an internal event system for announcement lifecycle management:
- AnnouncementCreatedEvent: Published when a new announcement is created, consumed internally for potential side effects (e.g., cache invalidation, push notifications)
- Event routing uses RabbitMQ with the notification exchange
Database Schema
The module uses its own notification PostgreSQL schema with two main tables:
- announcement: Stores announcement data with GIN indexes on user group arrays for efficient targeting
- settings: Stores per-user notification preferences, linked to user profiles via foreign key
Integration Points
With Auth Module
- Notification settings are tied to user profiles through foreign key relationships
- User group information is used for announcement targeting
With Mailer Module
While not directly coupled, the notification settings (especially send_login_email and receive_marketing_email) are designed to be consumed by the mailer module when sending emails.
Development Notes
- All APIs follow the Processor pattern (not OOP)
- The module uses RabbitMQ for internal event handling
- Announcement targeting uses PostgreSQL array types with GIN indexes for performance
- User groups are stored as integer arrays for flexible targeting
Announcement
Announcement is a system-wide messaging feature that allows administrators to broadcast important information to targeted user groups. Announcements support priority levels, user targeting, and pinning capabilities to ensure critical messages reach the right users.
Core Concept
An Announcement represents a broadcast message with:
- Title and Content: Message headline and body text
- Priority Level: Visual importance indicator (Journal, Info, Warning, Urgent)
- User Targeting: Specify which user groups should see the announcement
- Pinning: Pin important announcements to the top of the list
- Persistence: All announcements are stored permanently for historical reference
Data Model
pub struct Announcement {
pub id: i64,
pub title: String,
pub content: String,
pub user_group: Vec<i32>, // Target user groups
pub user_extra_groups: Vec<i32>, // Target extra user groups
pub is_pinned: bool, // Whether pinned to top
pub priority: AnnouncementPriority, // Visual importance
pub created_at: PrimitiveDateTime,
pub updated_at: PrimitiveDateTime,
}
Priority Levels
Announcements have four priority levels that indicate message importance:
- Journal: Routine informational messages (default priority)
- Info: General information that users should be aware of
- Warning: Important messages requiring user attention
- Urgent: Critical messages that need immediate attention
Priority levels are purely visual indicators—they don’t affect targeting or delivery. Frontend implementations should style these differently to draw appropriate attention.
User Targeting
Announcements use a flexible targeting system based on user groups:
Targeting Logic
An announcement is visible to a user if either condition is true:
- User’s primary user_group matches any group in the announcement’s user_group array
- User’s user_extra_groups has any overlap with the announcement’s user_extra_groups array
This OR-based logic allows administrators to target announcements broadly or narrowly:
- Broadcast to all: Include all possible user groups
- Target specific roles: Specify only relevant user groups
- Mixed targeting: Combine primary and extra groups for fine-grained control
Empty Targeting
If both user_group and user_extra_groups are empty arrays, the announcement won’t be visible to any users. This can be useful for drafting announcements before activating them.
Announcement Lifecycle
1. Creation
Administrators create announcements through AdminCreateAnnouncement:
Process:
- Admin specifies title, content, targeting, priority, and pin status
- Announcement is inserted into database
- AnnouncementCreatedEvent is published to RabbitMQ
- Event hook can trigger side effects (cache invalidation, push notifications)
Permissions: SuperAdmin, Moderator
Event System: Creation triggers an internal event for extensibility. Currently, the event hook is a placeholder for future features like real-time push notifications or cache updates.
2. Display
Users retrieve announcements through NotificationService::ListAnnouncements:
Behavior:
- Returns announcements targeted to the requesting user
- Sorted by pinned status first (pinned on top), then by creation date (newest first)
- Limited to 20 announcements per request
- No pagination—shows the 20 most relevant announcements
Targeting Query: Uses PostgreSQL array operators (= ANY() for primary group, && for array overlap) with GIN indexes for performance.
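As a hedged sketch, the query could look roughly like the following (assuming Announcement derives sqlx::FromRow and AnnouncementPriority maps to the database enum; the concrete query in the codebase may differ):
use sqlx::PgPool;

async fn list_for_user(
    pool: &PgPool,
    user_group: i32,
    user_extra_groups: &[i32],
) -> Result<Vec<Announcement>, sqlx::Error> {
    sqlx::query_as::<_, Announcement>(
        r#"
        SELECT * FROM notification.announcement
        WHERE $1 = ANY(user_group)        -- primary group match
           OR user_extra_groups && $2     -- extra-group overlap
        ORDER BY is_pinned DESC, created_at DESC
        LIMIT 20
        "#,
    )
    .bind(user_group)
    .bind(user_extra_groups)
    .fetch_all(pool)
    .await
}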
Single Announcement: Users can also fetch a specific announcement by ID via GetAnnouncement, useful for detail pages or deep links.
3. Updates
Administrators can edit existing announcements through AdminEditAnnouncement:
Editable Fields: All fields (title, content, targeting, priority, pinning) can be modified
Effect: Changes are immediate—users will see updated content on next fetch
No History: The system doesn’t track edit history. If audit trail is needed, it’s handled through the admin operation logging system.
4. Deletion
Administrators can permanently remove announcements through AdminDeleteAnnouncement:
Permissions: SuperAdmin, Moderator
Effect: Hard delete from database—no soft delete or archiving
Use Cases:
- Removing outdated announcements
- Cleaning up test announcements
- Deleting announcements posted in error
User Operations
List Announcements
Service: NotificationService::ListAnnouncements
Retrieves all announcements targeted to the current user, ordered by relevance (pinned first, then newest).
Response: Returns up to 20 announcements. No pagination—users see the most important/recent messages.
Targeting: Automatically filtered based on user’s group membership—no need to specify groups in request.
Get Single Announcement
Service: NotificationService::GetAnnouncement
Fetches a specific announcement by ID.
Use Case: Detail pages, direct links, or refreshing a single announcement without fetching the entire list.
Access Control: No additional access control beyond targeting—if an announcement exists and targets the user, they can retrieve it by ID.
Admin Operations
All admin operations require SuperAdmin or Moderator roles and are exposed through NotificationManageService.
Create Announcement
Operation: AdminCreateAnnouncement
Broadcasts a new message to users.
Parameters:
- title: Message headline
- content: Full message body (supports long text, formatting handled by frontend)
- user_group: Array of primary user group IDs to target
- user_extra_groups: Array of extra user group IDs to target
- is_pinned: Whether to pin to top of list
- priority: Visual importance indicator
Returns: New announcement ID
Event: Triggers AnnouncementCreatedEvent for extensibility (future push notifications, cache invalidation)
List Announcements
Operation: AdminListAnnouncements
Retrieves all announcements (not filtered by user targeting) for management purposes.
Pagination: Uses limit and offset for pagination
Sorting: Returns announcements in creation order (newest first)
Use Case: Admin dashboard showing all system announcements regardless of targeting
Edit Announcement
Operation: AdminEditAnnouncement
Updates an existing announcement.
Parameters: All fields can be modified (same as creation)
Returns: Updated announcement object
Effect: Changes are immediate—users see updated content on next fetch
Delete Announcement
Operation: AdminDeleteAnnouncement
Permanently removes an announcement from the database.
Parameters: Announcement ID
Returns: Empty response on success
Warning: Hard delete with no recovery mechanism. Ensure announcement should be permanently removed.
Pinning Behavior
Pinned announcements always appear at the top of the list, regardless of creation date:
Sorting Order:
- Pinned announcements (ordered by creation date, newest first)
- Unpinned announcements (ordered by creation date, newest first)
Use Cases:
- Pin urgent system maintenance notices
- Keep important policy changes visible
- Highlight time-sensitive information
Frontend Consideration: Pinned announcements should be visually distinct (e.g., pin icon, different background) to indicate their importance.
Database Schema
Key schema details (from 20250814171450_create_entities_for_notification_module.sql):
Table: notification.announcement
id BIGSERIAL PRIMARY KEY
title TEXT NOT NULL
content TEXT NOT NULL
user_group INTEGER[] NOT NULL
user_extra_groups INTEGER[] NOT NULL
is_pinned BOOLEAN NOT NULL DEFAULT FALSE
priority announcement_priority NOT NULL DEFAULT 'journal'
created_at TIMESTAMP NOT NULL DEFAULT NOW()
updated_at TIMESTAMP NOT NULL DEFAULT NOW()
Indexes:
- GIN index on user_group: Fast targeting queries
- GIN index on user_extra_groups: Fast targeting queries
- B-tree index on is_pinned: Efficient pinned-first sorting
- B-tree index on priority: Potential future priority-based queries
No Foreign Key: User groups are stored as integer arrays with no foreign key constraint, providing flexibility for group management.
Event System
The announcement system uses internal events for lifecycle hooks:
Event: AnnouncementCreatedEvent
- Published by: ManageService::AdminCreateAnnouncement
- Consumed by: Internal AnnouncementEventHook (extensibility placeholder)
- Route: notification.announcement_created on the notification exchange
- Payload: announcement_id, created_at timestamp
Current Implementation: The event hook is a placeholder. Future implementations might:
- Invalidate frontend caches
- Send push notifications to targeted users
- Trigger webhooks for external integrations
- Log analytics events
Frontend Integration Points
User Announcement Display
List View:
- Call ListAnnouncements on page load
- Display announcements with priority-based styling:
- Urgent: Red/high-contrast styling
- Warning: Yellow/amber styling
- Info: Blue/neutral styling
- Journal: Default/subtle styling
- Show pinned announcements with visual indicator (pin icon)
- Limit display to 20 announcements (no pagination needed)
Detail View:
- Provide “Read More” links using announcement ID
- Call GetAnnouncement to fetch full content
- Display full message with timestamp and priority indicator
Real-time Updates: Consider polling ListAnnouncements periodically (e.g., every 5 minutes) or implementing WebSocket for real-time announcement delivery.
Admin Management UI
Create Form:
- Title input (required)
- Content textarea (supports long text)
- User group multi-select (checkboxes or dropdown)
- Extra user groups multi-select
- Priority dropdown (Journal, Info, Warning, Urgent)
- Pin checkbox
List View:
- Table showing all announcements
- Columns: ID, Title, Priority, Pinned, Target Groups, Created, Updated
- Edit and delete actions
- Pagination controls (limit/offset)
Edit Form:
- Pre-populate with existing values
- Allow modification of all fields
- Show last updated timestamp
Design Decisions
Why No Read Tracking?
Announcements don’t track which users have read them:
- Simplicity: Avoids per-user state management
- Broadcast Nature: Announcements are informational, not actionable
- Scalability: No need to store read state for thousands of users
- Use Case: For messages requiring acknowledgment, use notification systems with explicit confirmation flows
Why Hard Delete Instead of Soft Delete?
Announcements are permanently deleted rather than archived:
- Clean Data: Old announcements clutter management UI
- No Recovery Need: If an announcement needs to persist, don’t delete it
- Admin Control: Admins have full control over lifecycle
- Audit Trail: Admin operation logging provides deletion records if needed
Why Limit to 20 Announcements?
User-facing list is capped at 20 without pagination:
- Relevance: Users don’t need to see dozens of old announcements
- Performance: Keeps queries fast with simple sorting
- UX: Encourages admins to keep announcements current
- Workaround: For historical access, provide search/filter in admin panel
Why OR-Based Targeting?
Users see announcements if they match either primary group or extra groups:
- Flexibility: Allows broad or narrow targeting
- Ease of Use: Simpler than complex boolean logic
- Common Use Case: “Show to all premium users OR all beta testers”
- Empty Arrays: Provide draft mode (not visible to anyone)
See Also:
- Notification Module Introduction - Module overview and architecture
- Auth Module - User group management (user_group and user_extra_groups)
Notification Settings
Notification Settings provide per-user preferences that control which types of email notifications and alerts a user receives. This enables users to opt-in or opt-out of different notification categories according to their preferences.
Core Concept
Each user has a personalized notification settings record that controls three types of notifications:
- Login Notifications (send_login_email): Email alerts when the user logs in
- Marketing Communications (receive_marketing_email): Promotional and marketing emails
- Service Alerts (notify_package_expired): Notifications about package/subscription expiration
These settings are stored per-user and are designed to be consumed by other modules (primarily the Mailer module) when deciding whether to send notifications.
Data Model
pub struct NotificationSettings {
pub id: Uuid, // User ID (foreign key to auth.user_profile)
pub send_login_email: bool,
pub receive_marketing_email: bool,
pub notify_package_expired: bool,
pub created_at: PrimitiveDateTime,
pub updated_at: PrimitiveDateTime,
}
Default Behavior
Notification settings use a lazy creation approach:
- Settings are not automatically created when a user signs up
- When fetching settings for a user without a record, the API returns default values
- Settings are only created in the database when the user first modifies them
Default Values:
- send_login_email: false (login notifications disabled by default)
- receive_marketing_email: false (marketing emails disabled by default)
- notify_package_expired: true (service alerts enabled by default)
This design reduces database writes for users who never change their preferences while still providing predictable defaults.
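Illustrative only: resolving a possibly missing settings row into the three effective flags, using the documented defaults when no record exists (NotificationSettings is the struct shown above):
fn effective_flags(stored: Option<&NotificationSettings>) -> (bool, bool, bool) {
    match stored {
        Some(s) => (
            s.send_login_email,
            s.receive_marketing_email,
            s.notify_package_expired,
        ),
        // Defaults: login emails off, marketing emails off, expiration alerts on.
        None => (false, false, true),
    }
}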
Notification Types
Login Email Notifications
Purpose: Alert users when someone logs into their account
Use Case: Security monitoring—users can detect unauthorized access attempts
Default: Disabled (false)
Consumer: Mailer module checks this setting when the Auth module publishes login events
Frontend Consideration: Present as “Email me when I log in” checkbox in user settings
Marketing Email Notifications
Purpose: Promotional emails, feature announcements, and marketing campaigns
Use Case: Allow users to opt-out of non-essential marketing communications while still receiving critical service updates
Default: Disabled (false)
Consumer: Mailer module or external marketing systems check this before sending promotional content
Compliance: Important for GDPR and email marketing regulations—users must explicitly opt-in
Package Expiration Notifications
Purpose: Alert users when their service package/subscription is about to expire or has expired
Use Case: Ensure users are aware of expiring services and can take action (renew, upgrade, etc.)
Default: Enabled (true)
Consumer: Telecom module or background jobs check this before sending expiration alerts
Why Enabled by Default: Service continuity—users generally want to know when their service is expiring
User Operations
All operations are exposed through the NotificationService gRPC service and automatically use the authenticated user’s ID from the request middleware.
Get Notification Settings
Operation: GetMyNotificationSettings
Retrieves the current user’s notification preferences.
Authentication: User ID extracted from request middleware
Behavior:
- If settings exist: Returns stored preferences
- If settings don’t exist: Returns default values (without creating a database record)
Use Case: Display current preferences in user settings page
Set Notification Settings
Operation: SetNotificationSettings
Updates the user’s notification preferences.
Authentication: User ID extracted from request middleware
Parameters: All three boolean flags (must provide all values, not partial updates)
Behavior: Uses UPSERT operation—creates record if none exists, updates if it does
Effect: Changes are immediate—other modules will see updated preferences on their next check
Design Note: Requires all three flags to prevent partial updates. Frontend should fetch current settings first, then send all values with modifications.
Database Schema
Table: notification.settings
id UUID PRIMARY KEY
REFERENCES auth.user_profile(id) ON DELETE CASCADE
send_login_email BOOLEAN NOT NULL DEFAULT FALSE
receive_marketing_email BOOLEAN NOT NULL DEFAULT FALSE
notify_package_expired BOOLEAN NOT NULL DEFAULT TRUE
created_at TIMESTAMP NOT NULL DEFAULT NOW()
updated_at TIMESTAMP NOT NULL DEFAULT NOW()
Key Characteristics:
- Primary key is user ID (foreign key to auth.user_profile)
- Cascade delete: Settings are removed when user account is deleted
- No additional indexes needed (lookups by primary key are already fast)
- Automatic timestamps via database triggers
Integration Points
With Auth Module
- Settings table references auth.user_profile via foreign key
- User ID is the primary key—one settings record per user
- Cascade delete ensures data consistency when users are removed
With Mailer Module
While not directly coupled (no foreign keys or direct queries), the notification settings are designed to be consumed by the mailer module:
Integration Pattern:
- Mailer receives event (e.g., “user logged in”, “package expiring”)
- Mailer queries notification settings for target user(s)
- Mailer respects user preferences before sending email
Batch Operations: The FindNotificationSettingsByIds processor supports fetching settings for multiple users efficiently, useful for bulk email campaigns.
With Telecom Module
Service expiration notifications are checked against notify_package_expired before sending alerts about package/subscription expirations.
Frontend Integration Points
User Settings Page
Display Current Settings:
- Call GetMyNotificationSettings on page load
- Pre-populate checkboxes with returned values
- Handle both existing settings and default values transparently
Save Changes:
- Gather all three checkbox states (even unchanged ones)
- Call SetNotificationSettings with all three values
- Show confirmation message on success
- No need to re-fetch—changes are immediate
UI Recommendations:
- Group notifications by category (Security, Marketing, Service)
- Provide clear descriptions of what each setting controls
- Consider warning users before disabling critical notifications (package expiration)
- Show last updated timestamp if available
Example Layout:
Notification Preferences
━━━━━━━━━━━━━━━━━━━━━━━
Security Notifications
☐ Email me when I log in
Marketing & Promotions
☐ Receive promotional emails and feature announcements
Service Alerts
☑ Notify me when my package is about to expire
Account Registration Flow
No Action Needed: Settings don’t need to be created during registration. Users will see default values until they modify preferences.
Design Decisions
Why Lazy Creation?
Settings are only created when users first modify them:
Pros:
- Reduces database writes for users who never change settings
- Simpler registration flow (one less insert operation)
- No storage cost for default preferences
Cons:
- API must handle missing records gracefully
- Queries joining against settings must use a LEFT JOIN to handle missing rows
Decision: The storage savings and reduced write load outweigh the minor complexity of handling missing records.
Why Return Defaults Instead of 404?
When settings don’t exist, the API returns default values rather than an error:
Reasoning:
- Better UX—frontend doesn’t need special handling for new users
- Predictable behavior—defaults are consistent
- Simpler frontend code—no need to distinguish “not set” from “default”
Why Require All Three Flags on Update?
The SetNotificationSettings API requires all three boolean values (no partial updates):
Reasoning:
- Prevents accidental resets—frontend must be intentional about all values
- Simpler implementation—no need to handle partial updates
- Clear semantics—“set these preferences” not “update some preferences”
- Atomic updates—all settings change together
Frontend Impact: Must fetch current settings first, then send all values. This is the standard pattern for settings pages anyway.
Why No Granular Notification Types?
Currently only three notification types exist:
Trade-off:
- Fewer types = simpler UX, less overwhelming for users
- Limited flexibility—can’t opt-in to some marketing emails but not others
Extensibility: If more granular control is needed in the future, additional boolean fields can be added to the schema without breaking existing APIs (protobuf supports field addition).
Why No Notification Channels?
Settings don’t distinguish between email, SMS, push notifications:
Current Design: All settings assume email notifications
Future Expansion: If other channels (SMS, push, in-app) are added:
- Could add separate fields (e.g., send_login_email_sms, send_login_email_push)
- Or restructure to nested channel preferences
- Current design doesn’t block future expansion
See Also:
- Notification Module Introduction - Module overview and architecture
- Announcement - System-wide messaging feature
- Mailer Module - Email sending and template management
Support Module
The Support module provides a ticket-based customer support system enabling two-way communication between users and administrators. It handles ticket creation, message threading, status management, and access control for both user-facing and admin-facing operations.
Core Features
Ticket Management
Tickets are conversation threads with lifecycle management:
- Priority Levels: Three priority levels (Low, Medium, High) to help admins triage support requests
- Status Transitions: Automatic status updates based on who sends messages (Open → Pending → Resolved → Closed)
- Ownership Validation: Each ticket is tied to a specific user; access control ensures users can only see their own tickets
- Title-based Organization: Each ticket has a title for quick identification in lists
Messaging System
Real-time threaded messaging within tickets:
- Dual-author Model: Messages can be authored by either users or admins (never both on same message)
- Message Editing: Both users and admins can edit their own messages (with edit timestamp tracking)
- Admin-only Deletion: Admins can delete their own messages only (not for general moderation)
- Status-aware Sending: Messages cannot be sent to Resolved or Closed tickets
Message Lists
Messages are returned with a simple limit-based approach:
- User-facing API: Returns messages with enriched author metadata (name, avatar, email) via the FrontendTicketMessage type
- Admin-facing API: Returns plain TicketMessage objects without enriched metadata
- Default Limit: Both APIs use a 100-message limit per request
Architecture
Services
The module exposes two gRPC services:
- SupportService: User-facing API for creating tickets, viewing own tickets, sending messages, and closing own tickets
- SupportAdminService: Admin-facing API with capabilities including viewing all tickets, sending responses, editing/deleting own messages, and manually resolving/closing any ticket
All service methods follow the Processor pattern for consistency with the rest of the codebase.
RBAC Integration
Admin operations integrate with the manage module’s role-based access control system:
- Allowed Roles: SuperAdmin, Moderator, CustomerSupport, and SupportBot can access all admin operations
- Audit Logging: All admin operations are automatically logged via the RecordedAdminOperation wrapper
- Operation Validation: Each operation validates the admin's role before execution
Status Transition Logic
Ticket statuses automatically transition based on message activity:
- User sends message: Pending/Resolved tickets become Open (indicating new user activity requiring attention)
- Admin replies: Open tickets become Pending (indicating admin has responded)
- Admin closes ticket: Any non-Closed ticket can be closed manually
- Closed/Resolved tickets: Cannot receive new messages (enforced by validation)
This state machine ensures tickets naturally reflect their support workflow status without manual updates.
Input Validation
The module includes comprehensive input validation:
- Content validation: Message length limits, empty content checks (enforced at gRPC layer)
- Ownership checks: Users can only access their own tickets and messages
- Status checks: Operations respect ticket status (e.g., no messages on closed tickets)
Integration Points
With Auth and Manage Modules
- Ticket ownership is tied to user UUIDs from the auth system
- User authentication context is passed through auth::rpc::middleware::UserId to determine ticket access
- Admin authentication is passed through manage::rpc::middleware::AdminId for admin operations
- Admin authorization is verified through the manage module's RBAC system
Database Schema
The module uses its own support PostgreSQL schema with:
- ticket table: Core ticket data with status enum and priority enum
- ticket_message table: Threaded messages with dual-author columns (user_author_id OR admin_author_id)
- Foreign key relationships to auth module's user tables
Development Notes
- All APIs follow the Processor pattern (not OOP)
- The module does NOT use static lifetimes for database connections
- Ticket IDs are UUIDs, but message IDs are auto-incrementing integers for simpler database management
- Message ownership validation happens at the service layer, not database layer
- The FrontendTicketMessage type includes enriched author metadata (name, avatar, email) joined from auth module tables
- Admin operations use the impl_recorded_admin_processor! macro to automatically wrap operations with audit logging
Ticket System
Ticket System provides structured two-way communication between users and administrators for customer support. Each ticket is a conversation thread with lifecycle management, priority levels, and automatic status transitions based on message activity.
Core Concept
A Ticket represents:
- Conversation thread: Series of messages between a user and admin team
- Priority: Low, Medium, or High (set by user at creation, helps admins triage)
- Status: Open, Pending, Resolved, or Closed (automatically transitions based on activity)
- Ownership: Tied to a specific user; only that user and admins can access it
- Title: Brief description for quick identification in ticket lists
Ticket Lifecycle
Status States
- Open: User has sent a message, awaiting admin response
- Pending: Admin has replied, awaiting user feedback
- Resolved: Admin has marked the issue as resolved
- Closed: Ticket is archived, no further messages allowed
Automatic Transitions
Status changes automatically based on who sends messages:
- User sends message → Open (even if previously Pending or Resolved)
- Admin replies to Open ticket → Pending
- Admin manually resolves → Resolved
- Admin manually closes → Closed
This state machine ensures tickets naturally reflect their support workflow without manual status management for most interactions.
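A minimal sketch of this state machine is shown below, using the on_user_message/on_admin_reply method names mentioned later in the Development Notes; the actual TicketStatus implementation may differ in detail.

```rust
// Sketch of the automatic transitions described above; illustrative only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TicketStatus {
    Open,
    Pending,
    Resolved,
    Closed,
}

impl TicketStatus {
    /// A new user message reopens the ticket (Closed tickets cannot receive messages).
    fn on_user_message(self) -> Self {
        match self {
            TicketStatus::Closed => TicketStatus::Closed,
            _ => TicketStatus::Open,
        }
    }

    /// An admin reply moves an Open ticket to Pending; other states are left unchanged.
    fn on_admin_reply(self) -> Self {
        match self {
            TicketStatus::Open => TicketStatus::Pending,
            other => other,
        }
    }
}
```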
Message Restrictions
- Resolved/Closed tickets: Cannot receive new messages
- Open/Pending tickets: Both users and admins can send messages
- Users can only edit their own messages; admins can only edit their own messages
- Admins can only delete their own messages (not for general moderation)
Architecture
Dual Service Design
The module exposes two separate gRPC services:
SupportService (User-facing):
- Create tickets
- List own tickets
- View ticket details (ownership validated)
- Send and edit own messages
- Close own tickets
SupportAdminService (Admin-facing):
- List all tickets (with pagination and optional unreplied filter)
- View any ticket detail
- Send admin responses
- Edit own messages
- Delete own messages
- Manually close or resolve tickets
This separation ensures clear permission boundaries and prevents accidental exposure of admin capabilities.
RBAC and Audit Logging
Admin operations integrate with the manage module’s RBAC system:
- Allowed Roles: SuperAdmin, Moderator, CustomerSupport, and SupportBot
- Audit Trail: All admin operations are logged with operation name, target, and input parameters
- Authorization: Role verification happens before operation execution via
AdminOperationtrait
Message Threading
Messages within a ticket follow these rules:
- Dual-author model: Each message has either user_author_id OR admin_author_id (never both)
- Edit tracking: The edited_at timestamp records when a message was last modified
- Author metadata:
  - User-facing API returns FrontendTicketMessage with enriched author data (name, avatar, email) joined from auth tables
  - Admin-facing API returns plain TicketMessage objects without enriched metadata
- Sequential IDs: Messages use auto-incrementing integers for efficient database operations
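For illustration, a plausible shape of a message record under this model (types and optional columns are assumptions consistent with the description above):

```rust
// Illustrative shape of a ticket message under the dual-author model.
use chrono::{DateTime, Utc};
use uuid::Uuid;

struct TicketMessage {
    id: i64,                          // auto-incrementing integer ID
    ticket_id: Uuid,                  // tickets use UUIDs
    user_author_id: Option<Uuid>,     // set when a user wrote the message
    admin_author_id: Option<Uuid>,    // set when an admin wrote the message (never both)
    content: String,
    sent_at: DateTime<Utc>,
    edited_at: Option<DateTime<Utc>>, // populated when the message is edited
}

impl TicketMessage {
    /// Exactly one author column must be set.
    fn has_valid_author(&self) -> bool {
        self.user_author_id.is_some() ^ self.admin_author_id.is_some()
    }
}
```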
Message Retrieval
Messages are returned with a simple limit-based approach:
- Default Limit: 100 messages per request
- Ordering: Messages are ordered by the sent_at timestamp in descending order (newest first)
- No Pagination: Currently there is no pagination support; all messages up to the limit are returned at once
Access Control
User Permissions
- Users can only view tickets they created (validated by user_id match)
- Users can only edit messages they authored
- Ticket ownership is checked on every operation (list, view, send message)
- Failed ownership checks return empty results or permission denied errors
Admin Permissions
- Admins can access all tickets through SupportAdminService
- Admin permissions are verified through the manage module's RBAC system
- Admins can perform operations like viewing all tickets, sending responses, and manually changing ticket status
- Admins can only edit or delete their own messages (ownership is checked)
Validation Layer
Input validation includes:
- Title validation: Non-empty, length limits (enforced at gRPC layer)
- Content validation: Non-empty, length limits for messages (enforced at gRPC layer)
- Ownership checks: Service-level validation verifies user owns the ticket/message before operations
Integration Points
Auth Module
- Ticket user_id references users from the auth module
- User authentication context flows through auth::rpc::middleware::UserId for user operations
- Admin authentication flows through manage::rpc::middleware::AdminId for admin operations
Database Schema
Lives in support PostgreSQL schema:
- ticket table: Core ticket data with custom enums for priority and status
- ticket_message table: Messages with dual-author columns
- Foreign keys to auth module's user tables for referential integrity
Development Notes
- Ticket IDs are UUIDs for global uniqueness
- Message IDs are integers for simpler database management
- All service operations use the Processor pattern
- Status transition logic is implemented in TicketStatus enum methods (on_user_message(), on_admin_reply())
- Status transitions happen automatically within the CreateTicketMessage database transaction
- The user-facing API gets enriched message data (FrontendTicketMessage) with author metadata joined from user/admin tables
- Admin operations use the impl_recorded_admin_processor! macro for automatic audit logging
- Admin message edit/delete operations enforce ownership validation at the service layer
Email Sender
The Email Sender (mailer) module provides email delivery capabilities for the entire system. It operates as an asynchronous event-driven service that consumes email send requests from other modules via RabbitMQ and delivers them through a configured SMTP server.
Core Features
SMTP Integration
The module uses standard SMTP protocol for email delivery:
- STARTTLS Support: Configurable TLS encryption for secure email transmission
- Authentication: Username/password-based SMTP authentication
- Plain Connection Option: Support for unencrypted connections (for testing or trusted networks)
- Dynamic Configuration: SMTP settings can be reloaded without service restart
Event-Driven Architecture
All email sending is triggered through RabbitMQ events:
- Asynchronous Processing: Email sending doesn’t block the calling service
- Reliable Delivery: RabbitMQ ensures events are not lost if the mailer is temporarily unavailable
- Decoupled Design: Other modules don’t need direct dependencies on mailer code
Template System
Pre-defined email templates with embedded assets:
- HTML Templates: Rich HTML email templates with inline styling
- Embedded Assets: Logo and other assets are compiled into the binary (no external files needed)
- Template Types: Register verification, OTP codes, password reset, package expiration notifications
Architecture
Configuration Storage
SMTP configuration is stored in Redis using the Manage module’s config provider system:
- Config Key: mailer
- Hot Reload: The module periodically checks Redis for config changes and rebuilds the SMTP connection if needed
- Sensitive Data Protection: Passwords are redacted in debug logs
Configuration fields:
- host: SMTP server hostname
- port: SMTP server port (typically 587 for STARTTLS, 25 for plain)
- username: SMTP authentication username
- password: SMTP authentication password
- sender: Default sender email address (used when from is not specified)
- starttls: Whether to use STARTTLS encryption
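For illustration, the sketch below turns these fields into an SMTP transport using the lettre crate; whether the mailer module actually uses lettre, and how it handles reconnects and errors, is not specified here.

```rust
// Sketch only: build an SMTP transport from the configuration fields above.
use lettre::transport::smtp::authentication::Credentials;
use lettre::SmtpTransport;

struct MailerConfig {
    host: String,
    port: u16,
    username: String,
    password: String,
    sender: String, // default From address; unused in this sketch
    starttls: bool,
}

fn build_transport(cfg: &MailerConfig) -> Result<SmtpTransport, lettre::transport::smtp::Error> {
    let creds = Credentials::new(cfg.username.clone(), cfg.password.clone());
    let builder = if cfg.starttls {
        // STARTTLS upgrade on the configured port (typically 587)
        SmtpTransport::starttls_relay(&cfg.host)?
    } else {
        // Plain, unencrypted connection (testing or trusted networks only)
        SmtpTransport::builder_dangerous(&cfg.host)
    };
    Ok(builder.port(cfg.port).credentials(creds).build())
}
```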
Event Consumption
The module consumes two types of events:
- Generic Email Events (MailSendCall): Direct email sending with full control over content
  - Queue: helium_mailer_send
  - Exchange: mailer (Direct)
  - Routing Key: send
- Template-based Email Events: Events from other modules (primarily Auth and Telecom) that trigger template rendering
- Register verification emails
- OTP email codes
- Password reset emails
- Package expiration notifications
Hooks (Event Processors)
Each hook implements the Processor pattern to handle specific event types:
- MailerHook: Core processor for MailSendCall events, handles actual SMTP transmission
- RegisterEmailHook: Processes user registration verification emails
- OtpEmailHook: Processes one-time password emails for 2FA
- ForgotPasswordEmailHook: Processes password reset emails
- AllPackageExpiredHook: Processes telecom package expiration notifications
All hooks follow the standard pattern: receive event → render template (if needed) → publish MailSendCall → SMTP delivery.
Integration Points
With Auth Module
The mailer consumes three types of email events from Auth:
- RegisterEmailSendCall: When a new user registers
- OtpEmailSendCall: When OTP authentication is required
- ForgotPasswordEmailSendCall: When a user requests a password reset
With Telecom Module
- AllPackageExpiredEvent: Sent when all of a user's telecom packages expire
With Manage Module
- Uses Manage's config_provider to fetch SMTP configuration from Redis
- SMTP settings can be updated via Manage's admin configuration APIs
Email Sending Flow
When another module needs to send an email:
- The calling module publishes an event (either a specific template event or a generic MailSendCall)
- If it's a template event, the corresponding hook receives it and:
  - Fetches necessary data from the database (e.g., user email)
  - Renders the HTML template
  - Publishes a MailSendCall event
- MailerHook receives the MailSendCall event
- The SMTP connection is used to transmit the email
- Errors are logged but don’t crash the worker (events can be retried by RabbitMQ)
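For illustration, publishing a generic email event with the lapin AMQP client could look roughly like this, using the exchange and routing key listed under Event Consumption; the real MailSendCall type and its wire format are assumptions.

```rust
// Sketch only: the MailSendCall fields and JSON serialization are assumptions.
use lapin::{options::BasicPublishOptions, BasicProperties, Channel};
use serde::Serialize;

#[derive(Serialize)]
struct MailSendCall {
    to: String,           // assumed field
    subject: String,      // assumed field
    html_body: String,    // assumed field
    from: Option<String>, // optional sender override (see Development Notes)
}

async fn publish_mail_send(channel: &Channel, call: &MailSendCall) -> anyhow::Result<()> {
    let payload = serde_json::to_vec(call)?; // the actual wire format may differ
    channel
        .basic_publish(
            "mailer",                        // exchange (Direct)
            "send",                          // routing key
            BasicPublishOptions::default(),
            &payload,
            BasicProperties::default(),
        )
        .await?; // publisher confirm ignored in this sketch
    Ok(())
}
```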
Development Notes
- All email content is sent as HTML with Content-Type: text/html
- The SMTP sender can be overridden per-email via the from field in MailSendCall
- Email templates use the askama templating engine
- The module doesn't expose any gRPC services (it's worker-only)
- All processors follow the Processor pattern (not OOP)
- Email addresses are validated before sending; invalid addresses result in Error::InvalidInput
- The module doesn't track email delivery status or bounces (delegated to the SMTP server)
Worker Setup
The mailer runs as a dedicated worker process that needs:
- PostgreSQL connection (for template data like user emails)
- Redis connection (for SMTP configuration)
- RabbitMQ connection (for event consumption)
Refer to the helium-server deployment documentation for worker configuration details.
Built-in WAF
Shield implements three main security mechanisms:
- Hashcash-based CAPTCHA - Proof-of-work challenges to prevent automated attacks
- Rate Limiting - Token bucket algorithm to control API request rates
- Anti-XSS Utilities - Input sanitization functions to prevent cross-site scripting
Components
Hashcash CAPTCHA
A proof-of-work based CAPTCHA system that requires clients to solve computational puzzles before performing sensitive operations. The frontend requests a challenge with specified difficulty and TTL, solves it locally, and submits the solution for verification.
Use cases:
- Protecting registration/login endpoints
- Preventing automated form submissions
- Rate limiting expensive operations
The challenge-response cycle is stateful and stored in Redis with configurable expiration.
Rate Limiting Middleware
Per-user rate limiting using the token bucket algorithm. The middleware can be applied to any gRPC endpoint to automatically enforce rate limits based on user identity.
Configuration parameters:
- api_name: Identifier for the rate limit bucket
- capacity: Maximum tokens available
- refill: Token refill rate (tokens per second)
- cost: Tokens consumed per request
- ttl: Bucket expiration time
Rate limit state is maintained in Redis using atomic Lua scripts. The middleware extracts user identity from request extensions (requires authentication middleware to run first).
Anti-XSS Utilities
Three sanitization functions for different content types:
- anti_xss_text(): Basic HTML entity encoding for plain text
- anti_xss_markdown(): Sanitizes markdown while preserving safe formatting, with allowlist-based filtering for images and links
- anti_xss_enhanced(): Advanced protection that detects and removes script injections, event handlers, and dangerous schemes
When to use:
- Sanitize user-generated content before storage or display
- Clean markdown in announcements, tickets, or comments
- Validate URLs and embedded content
Architecture Notes
Redis Dependency
Both hashcash and rate limiting rely on Redis for state management. The hashcash service stores active challenges, while rate limiting maintains token bucket state with atomic updates.
Middleware Integration
RateLimitLayer is a Tower middleware that integrates with the gRPC server stack. It must be placed after authentication middleware to access user identity from request extensions.
Performance Considerations
- Hashcash difficulty should be tuned based on client capabilities (mobile vs desktop)
- Rate limit refill rates should balance user experience with system protection
- Anti-XSS functions are synchronous and relatively lightweight, safe to use in request paths
Rate Limiter
The Rate Limiter provides per-user request throttling using the token bucket algorithm. It’s designed to protect API endpoints from abuse while maintaining good user experience through gradual token refill.
Purpose
Rate limiting prevents users from overwhelming the system with too many requests. Unlike simple fixed-window counters, the token bucket algorithm allows for burst traffic while maintaining average rate limits over time.
How It Works
Token Bucket Algorithm
Each user gets a virtual “bucket” of tokens stored in Redis:
- Capacity: Maximum tokens the bucket can hold
- Refill Rate: Tokens added per second
- Cost: Tokens consumed per request
- TTL: Bucket expiration time (for cleanup of inactive users)
When a request arrives, the system checks if enough tokens are available. If yes, tokens are consumed and the request proceeds. If no, the request is rejected with RESOURCE_EXHAUSTED status.
Tokens gradually refill over time based on the refill rate, allowing users to make requests again after waiting.
Per-User Enforcement
Rate limits are enforced per-user, identified by their user ID from the authentication middleware. Each API endpoint can have its own rate limit configuration with a unique identifier.
Bucket keys in Redis follow the pattern: rate_limit:API[{api_name}]-{user_id}
Integration
As Middleware
RateLimitLayer is a Tower middleware that wraps individual gRPC service methods. Apply it to specific endpoints that need rate limiting:
use shield::rpc::middleware::{RateLimitLayer, RateLimitConfig};
use std::time::Duration;
let rate_limit = RateLimitLayer::new(
redis_connection,
RateLimitConfig {
api_name: "CreateOrder",
capacity: 10.0,
refill: 1.0, // 1 token per second
cost: 1.0,
ttl: Duration::from_secs(3600),
}
);
let service = OrderServiceServer::new(order_service)
.layer(rate_limit);
Configuration Parameters
- api_name: Unique identifier for the rate limit bucket (used in Redis key)
- capacity: Maximum burst size (how many requests can be made immediately)
- refill: Token recovery rate in tokens per second
- cost: Tokens consumed per request (usually 1.0, but can be higher for expensive operations)
- ttl: How long to keep inactive buckets in Redis
Tuning Guidelines
High-frequency, lightweight endpoints:
- Capacity: 30-100
- Refill: 10-30 tokens/second
- Cost: 1.0
Moderate-frequency endpoints:
- Capacity: 10-20
- Refill: 1-5 tokens/second
- Cost: 1.0
Expensive operations (e.g., report generation):
- Capacity: 3-5
- Refill: 0.1-0.5 tokens/second
- Cost: 1.0
Architecture
Redis-Backed State
Rate limit state is stored in Redis hash structures with two fields:
- tokens: Current token count (float)
- last_refill: Last refill timestamp (Unix timestamp)
A Lua script executes atomically on Redis to:
- Calculate elapsed time since last refill
- Add refilled tokens (capped at capacity)
- Check if enough tokens available
- Consume tokens if allowed
- Update state and TTL
This ensures thread-safe, distributed rate limiting across multiple server instances.
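For illustration, the refill-and-consume step could be implemented as a Lua script invoked through the redis crate, roughly as below; the script that actually ships with shield may differ in key layout and argument order.

```rust
// Sketch of an atomic token-bucket check; illustrative, not the shipped script.
use redis::{aio::MultiplexedConnection, Script};

async fn try_consume(
    conn: &mut MultiplexedConnection,
    bucket_key: &str, // e.g. "rate_limit:API[CreateOrder]-<user_id>"
    capacity: f64,
    refill_per_sec: f64,
    cost: f64,
    now_secs: f64,
    ttl_secs: i64,
) -> redis::RedisResult<bool> {
    let script = Script::new(
        r#"
        local tokens = tonumber(redis.call('HGET', KEYS[1], 'tokens'))
        local last = tonumber(redis.call('HGET', KEYS[1], 'last_refill'))
        local capacity = tonumber(ARGV[1])
        local refill = tonumber(ARGV[2])
        local cost = tonumber(ARGV[3])
        local now = tonumber(ARGV[4])
        if tokens == nil then tokens = capacity last = now end
        tokens = math.min(capacity, tokens + (now - last) * refill)
        local allowed = 0
        if tokens >= cost then tokens = tokens - cost allowed = 1 end
        redis.call('HSET', KEYS[1], 'tokens', tokens, 'last_refill', now)
        redis.call('EXPIRE', KEYS[1], ARGV[5])
        return allowed
        "#,
    );
    let allowed: i64 = script
        .key(bucket_key)
        .arg(capacity)
        .arg(refill_per_sec)
        .arg(cost)
        .arg(now_secs)
        .arg(ttl_secs)
        .invoke_async(conn)
        .await?;
    Ok(allowed == 1)
}
```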
Authentication Dependency
The middleware requires UserId to be present in the request extensions, which is set by the authentication middleware. Therefore:
- Rate limiting middleware must be applied after authentication middleware in the layer stack
- Unauthenticated requests will be rejected before rate limiting is evaluated
- Anonymous/public endpoints cannot use this middleware (use hashcash CAPTCHA instead)
Error Handling
When rate limit is exceeded, the middleware returns:
- Status: RESOURCE_EXHAUSTED
- Message: "Rate limit exceeded"
Frontend should handle this gracefully by:
- Displaying user-friendly messages
- Implementing exponential backoff for retries
- Showing remaining quota if applicable (requires separate API)
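A small client-side sketch of that backoff pattern (the retry count and delays are illustrative, not prescribed by the API):

```rust
// Retry a call with exponential backoff when the server answers RESOURCE_EXHAUSTED.
use std::time::Duration;
use tonic::{Code, Status};

async fn with_backoff<T, F, Fut>(mut call: F) -> Result<T, Status>
where
    F: FnMut() -> Fut,
    Fut: std::future::Future<Output = Result<T, Status>>,
{
    let mut delay = Duration::from_millis(500);
    for _ in 0..4 {
        match call().await {
            Err(status) if status.code() == Code::ResourceExhausted => {
                tokio::time::sleep(delay).await;
                delay *= 2; // exponential backoff
            }
            other => return other,
        }
    }
    call().await // final attempt
}
```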
When to Use Rate Limiting
Good use cases:
- Order creation and payment endpoints
- Data export and report generation
- Account modification operations
- Support ticket creation
When NOT to use:
- Read-only queries (unless very expensive)
- Authentication endpoints (use hashcash instead)
- Static content delivery
- Public announcement viewing
Implementation Notes
- Uses the Processor pattern for the core rate limiting logic
- Middleware integrates with Tower/tonic service stack
- All Redis operations are atomic via Lua scripts
- Float precision is used for tokens to support fractional costs and refill rates
- TTL prevents memory leaks from abandoned user sessions
Hashcash Captcha
The Hashcash Captcha provides proof-of-work based challenge-response authentication to prevent automated attacks. Unlike traditional image-based CAPTCHAs, it requires clients to perform computational work, making it accessible and bot-resistant.
Purpose
Hashcash CAPTCHA protects sensitive operations from automated abuse by requiring clients to solve a cryptographic puzzle. The difficulty can be tuned to balance security and user experience - higher difficulty means more computation time required.
How It Works
Proof-of-Work Challenge
The system uses a challenge-response protocol:
- Client requests a challenge - Specifies desired difficulty (19-35) and time-to-live
- Server generates a random 32-byte challenge - Stores it in Redis with the specified TTL
- Client solves the puzzle - Finds a nonce value that when hashed with the challenge produces a hash with the required number of leading zero bits
- Client submits the solution - Sends back the challenge ID and nonce
- Server verifies - Checks the solution and deletes the challenge if valid (one-time use)
The difficulty parameter controls how many leading zero bits are required in the SHA256 hash output. Each increment doubles the expected computation time.
Challenge Lifecycle
Challenges are stateful and stored in Redis:
- Each challenge has a unique 16-byte ID
- TTL ensures challenges expire and don’t accumulate
- Successfully verified challenges are deleted immediately (prevents replay attacks)
- Expired or non-existent challenges return a NotFound result
Frontend Integration
Basic Workflow
1. Before submitting sensitive operation (login, registration, etc.)
2. Call RequestCaptcha with difficulty and TTL
3. Receive challenge_id, challenge bytes, and difficulty
4. Solve: Find nonce where SHA256(challenge + nonce) has required leading zero bits
5. Call VerifyCaptcha with challenge_id and nonce
6. Check result: Pass/Fail/NotFound
7. If Pass, proceed with the protected operation
Solving the Challenge
Frontend must implement the proof-of-work algorithm:
- Hash function: SHA256
- Input: challenge_bytes + nonce_bytes (nonce as an 8-byte big-endian u64)
- Goal: Find a nonce where the hash has difficulty leading zero bits
- Method: Brute force, incrementing the nonce from 0 until a valid hash is found
JavaScript/TypeScript developers should use the Web Crypto API or a library like crypto-js for SHA256 hashing. For performance-critical cases, consider using WebAssembly.
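A minimal sketch of the solving loop, written in Rust for illustration (a web client would implement the same logic in JavaScript or WebAssembly), assuming the input layout described above:

```rust
// Brute-force proof-of-work solver for SHA256(challenge || nonce_be_u64).
use sha2::{Digest, Sha256};

fn leading_zero_bits(hash: &[u8]) -> u32 {
    let mut bits = 0;
    for byte in hash {
        if *byte == 0 {
            bits += 8;
        } else {
            bits += byte.leading_zeros(); // zeros within this byte
            break;
        }
    }
    bits
}

fn solve(challenge: &[u8], difficulty: u32) -> u64 {
    let mut nonce: u64 = 0;
    loop {
        let mut hasher = Sha256::new();
        hasher.update(challenge);
        hasher.update(nonce.to_be_bytes());
        if leading_zero_bits(&hasher.finalize()) >= difficulty {
            return nonce;
        }
        nonce += 1;
    }
}
```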
Difficulty Guidelines
Difficulty 19-22 (Very Easy):
- Solves in <100ms on modern devices
- Use for non-critical operations or mobile clients
Difficulty 23-26 (Easy to Medium):
- Solves in 100ms - 1 second
- Good balance for most use cases (login, registration)
Difficulty 27-30 (Medium to Hard):
- Solves in 1-10 seconds
- Use for high-value operations (password reset, account deletion)
Difficulty 31-35 (Very Hard):
- Solves in 10+ seconds
- Use sparingly, mainly for administrative or highly sensitive operations
Note: Difficulty is exponential - each +1 doubles the average solve time.
TTL Recommendations
Set TTL based on expected solve time plus user think time:
- Easy challenges: 30-60 seconds
- Medium challenges: 60-120 seconds
- Hard challenges: 120-300 seconds
- Too short: Users may time out while solving
- Too long: Increases Redis memory usage and the replay attack window
Verification Results
Three possible outcomes from VerifyCaptcha:
- Pass: Solution is correct, challenge consumed and deleted
- Fail: Solution is incorrect, challenge remains valid (can retry)
- NotFound: Challenge expired, already used, or never existed
Frontend should handle NotFound by requesting a new challenge, and Fail by continuing to solve or requesting a new one.
Architecture
Redis-Backed Storage
Challenges are stored in Redis with keys: hashcash:{hex(challenge_id)}
Each challenge contains:
- 16-byte unique ID (random)
- 32-byte random challenge data
- Difficulty value (19-35)
- TTL expiration
Redis automatically cleans up expired challenges, and successful verifications delete them immediately to prevent replay attacks.
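For illustration, generating and storing a challenge could look roughly like the sketch below; the key prefix follows the pattern above, while the value encoding and RNG choices are assumptions.

```rust
// Sketch only: store a one-time challenge under hashcash:{hex(id)} with a TTL.
use rand::RngCore;
use redis::aio::MultiplexedConnection;

async fn store_challenge(
    conn: &mut MultiplexedConnection,
    difficulty: u8,
    ttl_secs: u64,
) -> redis::RedisResult<([u8; 16], [u8; 32])> {
    let mut id = [0u8; 16];
    let mut challenge = [0u8; 32];
    rand::thread_rng().fill_bytes(&mut id);
    rand::thread_rng().fill_bytes(&mut challenge);

    // Value encoding is illustrative: difficulty byte followed by challenge bytes.
    let mut value = Vec::with_capacity(33);
    value.push(difficulty);
    value.extend_from_slice(&challenge);

    let _: () = redis::cmd("SET")
        .arg(format!("hashcash:{}", hex::encode(id)))
        .arg(value)
        .arg("EX")
        .arg(ttl_secs)
        .query_async(conn)
        .await?;
    Ok((id, challenge))
}
```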
Stateless and Distributed
The system is stateless from the application perspective - all state lives in Redis. This allows multiple server instances to handle challenge requests and verifications without coordination.
When to Use Hashcash
Good use cases:
- Registration and login endpoints
- Password reset requests
- Anonymous or pre-authentication operations
- Expensive public endpoints (email sending, report generation)
- Rate limiting supplement for unauthenticated users
When NOT to use:
- Authenticated user operations (use rate limiting instead)
- High-frequency read operations
- Mobile-first applications (difficulty must be tuned lower)
- Accessibility-critical features (consider alternatives)
Implementation Notes
- Uses the Processor pattern for challenge creation and verification
- Challenge IDs must be exactly 16 bytes
- Difficulty validation: must be in range [19, 35] inclusive
- TTL validation: must be at least 1 second
- One challenge = one verification (single-use)
- Verification is atomic: check and delete happen together
- All cryptographic operations use SHA256
Anti-XSS
The Anti-XSS utilities provide input sanitization functions to protect against cross-site scripting (XSS) attacks. These are synchronous, lightweight functions designed to sanitize user-generated content before storage or display.
Purpose
XSS attacks occur when malicious scripts are injected into web pages through user input. The anti-XSS utilities help prevent these attacks by removing or encoding potentially dangerous content while preserving legitimate formatting and functionality.
Sanitization Functions
anti_xss_text
Basic HTML entity encoding for plain text content. Converts special characters to their HTML entity equivalents to prevent HTML/script injection.
Encoded characters:
<, >, &, ", ', /
When to use:
- Plain text user inputs (usernames, comments, descriptions)
- Content that should contain no HTML or markdown
- Simple text fields where formatting is not needed
anti_xss_markdown
Sanitizes markdown while preserving safe formatting. Uses allowlist-based filtering to keep legitimate markdown features while blocking dangerous content.
Features:
- Strips all HTML tags completely
- Validates image URLs against domain allowlist (configurable in code)
- Validates link schemes (allows http, https, and mailto only)
- Preserves safe markdown formatting (headers, lists, emphasis, etc.)
- Removes images from disallowed domains
- Strips URLs from links with dangerous schemes (but keeps link text)
When to use:
- Markdown content in announcements
- Support ticket descriptions
- User comments that support formatting
- Any user-generated markdown that will be rendered
Domain Allowlist:
The function checks image URLs against ALLOWED_IMAGE_DOMAINS constant. Modify this list in the source code to add trusted image hosting services.
anti_xss_enhanced
Advanced protection that actively detects and removes script injections before encoding. Combines pattern matching with entity encoding for multi-layered defense.
Protected against:
- <script> tags (including multiline)
- Dangerous URL schemes (javascript:, vbscript:, data:)
- Event handler attributes (onclick, onload, etc.)
- Embedded objects (<iframe>, <object>, <embed>)
Process:
- Detects and removes script patterns using regex
- Applies HTML entity encoding to remaining content
When to use:
- Rich text content from untrusted sources
- Content that might contain complex HTML
- Extra protection layer for high-risk inputs
- Any content where XSS risk is elevated
Integration
Backend Usage
These are utility functions exported from the shield::utils::anti_xss module. Call them directly in your service logic before storing or processing user input:
use shield::utils::anti_xss::{anti_xss_text, anti_xss_markdown, anti_xss_enhanced};
// Sanitize before storage
let clean_username = anti_xss_text(&user_input);
let clean_announcement = anti_xss_markdown(&announcement_text);
let clean_content = anti_xss_enhanced(&rich_text);
Frontend Considerations
Backend sanitization is the primary defense, but frontend should also:
- Validate input before submission
- Use appropriate HTML escaping when rendering
- Leverage framework-level XSS protection (React’s JSX escaping, Vue’s template binding, etc.)
- Never use innerHTML or equivalents with user content
- Use markdown renderers with built-in sanitization
When to Apply Sanitization
Always sanitize:
- User-submitted text, markdown, or HTML
- Content from external APIs or third-party sources
- Any data that will be displayed to other users
- Content stored in databases that renders on frontend
Timing:
- Before storage (recommended): Sanitize once when data enters the system
- Before display: Sanitize when rendering if storage is raw
Defense in depth: For high-risk scenarios, apply multiple layers:
- Frontend validation
- Backend sanitization (these functions)
- Frontend rendering protection (framework escaping)
Architecture Notes
Synchronous and Lightweight
All three functions are synchronous and suitable for use in request handlers. They don’t perform I/O or heavy computation, making them safe to call inline during request processing.
Regex-Based Detection
The sanitization uses regex patterns for detection. The patterns are compiled once using lazy_static for performance. Complex patterns (multiline, case-insensitive) ensure robust detection of script injection attempts.
No External Dependencies
The utilities rely only on standard Rust regex and URL parsing libraries. No external sanitization services or heavyweight parsers are required.
Limitations
These utilities provide solid protection for common XSS vectors, but they are not a complete security solution:
- Not a WAF: These functions handle input sanitization, not request-level filtering
- Not HTML parsing: They use regex-based pattern matching, not full HTML/markdown parsers
- Allowlist maintenance: Image domain allowlist must be manually maintained in code
- Markdown edge cases: Complex or malformed markdown might not be handled perfectly
Additional security measures:
- Use Content Security Policy (CSP) headers in frontend
- Apply proper output encoding in templates
- Enable framework-level XSS protection
- Regular security audits of user-facing features
- Keep dependencies updated
Implementation Notes
- Functions are pure and stateless (no side effects)
- All functions return new strings (input is not modified)
- Empty strings are handled safely
- Regex compilation errors have fallback patterns
- Domain checking for images uses full URL parsing
- Link text is preserved even when URL is removed
Basic Design
At a high level, two frontends, the Admin Dashboard and the User Web Application, consume the APIs exposed by the helium-server crate described below.
helium-server Crate
The Helium server is designed as a multi-mode worker system that can run different components independently or together, enabling flexible deployment strategies. Each worker mode serves a specific purpose in the overall system architecture.
Architecture
Worker Modes
The server supports six distinct worker modes:
| Worker Mode | Port | Description | Use Case |
|---|---|---|---|
grpc | 50051 | gRPC API server | Main API for client applications and admin panels |
subscribe_api | 8080 | RESTful subscription API | Public subscription endpoints |
webhook_api | 8081 | RESTful webhook handler | Payment provider callbacks, third-party integrations |
consumer | - | Background message consumer | Processing async tasks from message queue |
mailer | - | Email service worker | Sending emails and notifications |
cron_executor | - | Scheduled task executor | Running periodic maintenance tasks |
Dependencies
The server requires three core infrastructure components:
- PostgreSQL: Primary database for persistent data
- Redis: Caching, session storage, and temporary data
- RabbitMQ (AMQP): Message queuing for async processing
Module Integration
The server integrates all business logic modules:
- auth: Authentication and authorization
- shop: E-commerce and billing
- telecom: VPN node management and traffic handling
- market: Affiliate and marketing systems
- notification: Announcements and messaging
- support: Customer support tickets
- manage: Administrative functions
- shield: Security and anti-abuse measures
Deployment Guide
Prerequisites
- PostgreSQL, Redis, and RabbitMQ servers accessible
- SQLx CLI: cargo install sqlx-cli --no-default-features --features postgres
- Environment variables configured (see below)
Environment Configuration
The server is configured entirely through environment variables:
Required Variables
# Worker mode selection
WORK_MODE="grpc" # or subscribe_api, webhook_api, consumer, mailer, cron_executor
# Database connection
DATABASE_URL="postgres://user:password@localhost/helium_db"
# Message queue connection
MQ_URL="amqp://user:password@localhost:5672/"
# Redis connection
REDIS_URL="redis://localhost:6379"
Optional Variables
# Server listen address (for API workers)
LISTEN_ADDR="0.0.0.0:50051" # grpc mode default
LISTEN_ADDR="0.0.0.0:8080" # subscribe_api mode default
LISTEN_ADDR="0.0.0.0:8081" # webhook_api mode default
# Cron executor scan interval (seconds)
SCAN_INTERVAL="60" # cron_executor mode only
# OpenTelemetry Collector endpoint (optional, for observability)
OTEL_COLLECTOR="http://otel-collector:4317" # See Observability guide
Note: For comprehensive observability with distributed tracing and metrics, see the Observability with OpenTelemetry guide.
Database Migration
⚠️ CRITICAL: Database migrations must be run before starting the application.
# Install SQLx CLI
cargo install sqlx-cli --no-default-features --features postgres
# Apply all pending migrations
sqlx migrate run --database-url $DATABASE_URL
# Verify migration status
sqlx migrate info --database-url $DATABASE_URL
Basic Deployment
Running the Server
# Apply database migrations first
sqlx migrate run --database-url $DATABASE_URL
# Start the server with desired worker mode
WORK_MODE=grpc ./helium-server
Multiple Worker Deployment
For production, run different worker modes as separate processes:
# Terminal 1: Main gRPC API
WORK_MODE=grpc ./helium-server
# Terminal 2: Background consumer
WORK_MODE=consumer ./helium-server
# Terminal 3: Email worker
WORK_MODE=mailer ./helium-server
# Terminal 4: Cron jobs
WORK_MODE=cron_executor ./helium-server
Logging
The server uses structured logging:
# Enable debug logging
RUST_LOG=debug ./helium-server
# Production logging (default)
RUST_LOG=info ./helium-server
Developer Guide
Project Structure
server/
├── Cargo.toml # Dependencies and metadata
├── src/
│ ├── main.rs # Entry point and startup logic
│ ├── worker/ # Worker mode implementations
│ │ ├── mod.rs # Worker configuration and dispatch
│ │ ├── grpc.rs # gRPC server implementation
│ │ ├── consumer.rs # Background message consumer
│ │ ├── mailer.rs # Email service worker
│ │ ├── cron_executor.rs # Scheduled task executor
│ │ ├── subscribe_api.rs # Subscription REST API
│ │ └── webhook_api.rs # Webhook REST API
│ └── hooks/ # Extension points (currently unused)
│ └── mod.rs
Building from Source
# Development build
cd server
cargo build
# Release build (optimized)
cargo build --release
# Run with specific worker mode
WORK_MODE=grpc cargo run
Adding New Worker Modes
- Create worker implementation:
// src/worker/new_worker.rs
pub struct NewWorker {
// worker fields
}
impl NewWorker {
pub async fn initialize(args: YourArgs) -> anyhow::Result<Self> {
// initialization logic
}
pub async fn run(&self) -> anyhow::Result<()> {
// worker main loop
}
}
- Add to worker configuration:
// src/worker/mod.rs
pub enum WorkerArgs {
// existing variants...
NewWorker(YourArgs),
}
impl WorkerArgs {
pub fn load_from_env() -> anyhow::Result<Self> {
match work_mode.as_str() {
// existing modes...
"new_worker" => {
// parse environment variables
Ok(WorkerArgs::NewWorker(args))
}
}
}
pub async fn execute_worker(self) -> anyhow::Result<()> {
match self {
// existing modes...
WorkerArgs::NewWorker(args) => {
let worker = NewWorker::initialize(args).await?;
worker.run().await
}
}
}
}
gRPC Service Development
The gRPC worker automatically integrates all modules. To add new services:
-
Implement your service in the appropriate module (e.g.,
modules/your_module/) -
Add to gRPC worker:
// src/worker/grpc.rs
impl GrpcWorker {
pub async fn initialize(args: GrpcWorkModeArgs) -> Result<Self, anyhow::Error> {
// ... existing initialization ...
let your_service = YourService::new(database_processor.clone());
Ok(Self {
// ... existing fields ...
your_service,
})
}
pub fn server_ready(self) -> Router<...> {
tonic::transport::server::Server::builder()
// ... existing services ...
.add_service(YourServiceServer::new(self.your_service))
}
}
Database Migrations
Database schema is managed through SQLx migrations in the migrations/ directory. When adding new features:
- Create migration files:
# Create new migration
sqlx migrate add your_feature_name
# This creates:
# migrations/TIMESTAMP_your_feature_name.up.sql
# migrations/TIMESTAMP_your_feature_name.down.sql
- Run migrations:
# Apply migrations
sqlx migrate run --database-url $DATABASE_URL
# Revert last migration
sqlx migrate revert --database-url $DATABASE_URL
Testing
# Run all tests
cargo test
# Run specific module tests
cargo test --package helium-server
# Integration tests with database
DATABASE_URL=postgres://test_db cargo test
Performance Considerations
- Memory Usage: Each worker typically uses 40-200MB RAM
- CPU Efficiency: Single-core performance optimized, can handle 1000+ RPS
- Connection Pooling: Database connections are shared across services
- Async Processing: All I/O operations are non-blocking
Troubleshooting
Common Issues
Service won’t start:
# Check environment variables
env | grep -E "(DATABASE_URL|MQ_URL|REDIS_URL|WORK_MODE)"
# Verify database migrations are applied
sqlx migrate info --database-url $DATABASE_URL
Database migration issues:
# Check migration status
sqlx migrate info --database-url $DATABASE_URL
# Force apply migrations (if stuck)
sqlx migrate run --database-url $DATABASE_URL
# Revert last migration if needed
sqlx migrate revert --database-url $DATABASE_URL
# Reset database (CAUTION: destroys all data)
sqlx database reset --database-url $DATABASE_URL
Performance issues:
# Enable request tracing
RUST_LOG=helium_server=trace ./helium-server
# Profile with flamegraph
cargo flamegraph --bin helium-server
Logs and Debugging
# Debug logging
RUST_LOG=debug ./helium-server
# Trace specific modules
RUST_LOG=helium_server::worker::grpc=trace,info ./helium-server
Configuration Validation
Ensure all required environment variables are properly set:
# Validate configuration script
#!/bin/bash
set -e
echo "Validating Helium server configuration..."
# Check required variables
: "${WORK_MODE:?WORK_MODE not set}"
: "${DATABASE_URL:?DATABASE_URL not set}"
: "${MQ_URL:?MQ_URL not set}"
: "${REDIS_URL:?REDIS_URL not set}"
# Validate work mode
case "$WORK_MODE" in
grpc|subscribe_api|webhook_api|consumer|mailer|cron_executor)
echo "✓ Valid WORK_MODE: $WORK_MODE"
;;
*)
echo "✗ Invalid WORK_MODE: $WORK_MODE"
exit 1
;;
esac
# Check if migrations are applied
if command -v sqlx >/dev/null 2>&1; then
if sqlx migrate info --database-url "$DATABASE_URL" | grep -q "pending"; then
echo "⚠ Warning: Pending database migrations found"
echo "Run: sqlx migrate run --database-url $DATABASE_URL"
else
echo "✓ Database migrations are up to date"
fi
else
echo "⚠ Warning: sqlx CLI not found - cannot verify migrations"
echo "Install with: cargo install sqlx-cli --no-default-features --features postgres"
fi
echo "Configuration validation complete!"
External Dependencies
The Helium system requires several external services to function properly. The Helium application itself runs in Docker containers, but the core infrastructure dependencies (PostgreSQL, Redis, RabbitMQ) should be provisioned as external managed services for production deployments.
While some dependencies are core infrastructure requirements, others are module-specific and may be optional depending on your deployment configuration.
Core Infrastructure Dependencies
These dependencies are required for all Helium deployments:
1. PostgreSQL Database
Purpose: Primary data store for all application data
Version: PostgreSQL 12+ recommended
Configuration:
- Environment variable: DATABASE_URL
- Format: postgres://user:password@host:port/database
- Example: postgres://helium:password@localhost:5432/helium_db
Database Schema:
- ⚠️ CRITICAL: SQLx migrations must be run before starting the application
- All database schema changes are managed through SQLx migrations in the /migrations directory
- Use sqlx migrate run --database-url $DATABASE_URL to apply migrations
External Service Requirements:
- NOT containerized - PostgreSQL should run as an external managed service
- Recommended: Use cloud-managed PostgreSQL (AWS RDS, Google Cloud SQL, Azure Database, etc.)
- Alternative: Dedicated PostgreSQL server with proper backup and high availability setup
2. Redis
Purpose: Caching, session storage, and configuration store
Version: Redis 6+ recommended
Configuration:
- Environment variable: REDIS_URL
- Format: redis://host:port or redis://user:password@host:port
- Example: redis://localhost:6379
Usage:
- Session management and authentication tokens
- Configuration caching across modules
- Temporary data storage (OAuth challenges, etc.)
External Service Requirements:
- NOT containerized - Redis should run as an external managed service
- Recommended: Use cloud-managed Redis (AWS ElastiCache, Google Memorystore, Azure Cache, etc.)
- Alternative: Dedicated Redis server with persistence and clustering for production
3. RabbitMQ
Purpose: Message queue for asynchronous processing between modules
Version: RabbitMQ 3.8+ recommended
Configuration:
- Environment variable: MQ_URL
- Format: amqp://user:password@host:port/
- Example: amqp://helium:password@localhost:5672/
Usage:
- Inter-module communication
- Background job processing
- Event-driven architecture support
External Service Requirements:
- NOT containerized - RabbitMQ should run as an external managed service
- Recommended: Use cloud-managed message queues (AWS MQ, Google Cloud Pub/Sub, Azure Service Bus)
- Alternative: Dedicated RabbitMQ cluster with proper clustering and high availability
Module-Specific Dependencies
These dependencies are required only when using specific modules:
Auth Module - OAuth Providers (Optional)
Purpose: Social authentication (Google, Microsoft, GitHub, Discord)
Required: Only if OAuth authentication is enabled
Configuration: Stored in database/Redis configuration
Supported Providers:
- Google OAuth 2.0
- Microsoft Azure AD
- GitHub OAuth
- Discord OAuth
Setup Requirements:
- Create OAuth applications with each provider
- Configure redirect URIs to your Helium deployment
- Store client ID and secret in the system configuration
- Configure OAuth provider settings via the management interface
Configuration Structure:
{
"auth": {
"oauth_providers": {
"providers": [
{
"name": "google",
"client_id": "your-client-id",
"client_secret": "your-client-secret",
"redirect_uri": "https://your-domain.com/auth/oauth/callback"
}
],
"challenge_expiration": "5m"
}
}
}
Mailer Module - SMTP Server (Required for Email)
Purpose: Email delivery for user notifications, verification, etc.
Required: When email functionality is needed
Configuration: Stored in database/Redis configuration
SMTP Configuration:
{
"mailer": {
"host": "smtp.gmail.com",
"port": 587,
"username": "your-email@gmail.com",
"password": "your-app-password",
"sender": "noreply@your-domain.com",
"starttls": true
}
}
Supported SMTP Features:
- STARTTLS encryption
- Plain authentication
- Custom sender addresses
- HTML email templates
Common SMTP Providers:
- Gmail: smtp.gmail.com:587 (requires app passwords)
- Outlook/Hotmail: smtp-mail.outlook.com:587
- SendGrid: smtp.sendgrid.net:587
- Mailgun: smtp.mailgun.org:587
- Amazon SES: email-smtp.region.amazonaws.com:587
Shop Module - Epay Payment Provider (Required for Payments)
Purpose: Payment processing for e-commerce functionality
Required: When payment processing is needed
Configuration: Stored in database as epay provider credentials
Epay Provider Setup:
- Register with an Epay-compatible payment provider
- Obtain merchant credentials (PID, Key, Merchant URL)
- Configure webhook endpoints for payment notifications
- Add provider credentials via the management interface
Supported Payment Methods:
- Alipay (alipay)
- WeChat Pay (wxpay)
- USDT cryptocurrency (usdt)
Configuration Requirements:
{
"shop": {
"epay_notify_url": "https://your-domain.com/api/shop/epay/callback",
"epay_return_url": "https://your-domain.com/payment/success",
"max_unpaid_orders": 5,
"auto_cancel_after": "30m"
}
}
Epay Provider Database Entry:
INSERT INTO epay_provider_credentials (
display_name,
enabled_channels,
key,
pid,
merchant_url
) VALUES (
'My Payment Provider',
ARRAY['alipay', 'wxpay'],
'your-merchant-key',
1234,
'https://pay.provider.com/submit.php'
);
Development Dependencies
These are required for building and developing the project:
Protocol Buffers Compiler
Purpose: Compiling .proto files for gRPC services
Installation:
- Ubuntu/Debian: apt-get install protobuf-compiler
- macOS: brew install protobuf
- Already included in the Docker build process
SQLx CLI
Purpose: Database migration management
Installation: cargo install sqlx-cli --no-default-features --features postgres
Usage:
- Apply migrations: sqlx migrate run
- Create new migration: sqlx migrate add <name>
Docker/Kubernetes Deployment Considerations
What Should Be Containerized
✅ Containerize:
- Helium server application (helium-server)
- Application-specific components and workers
❌ Do NOT Containerize:
- PostgreSQL - Use external managed database services
- Redis - Use external managed cache services
- RabbitMQ - Use external managed message queue services
Infrastructure Handled by Platform
When deploying with Docker and Kubernetes, these infrastructure concerns are handled by the orchestration platform:
- Load Balancers: Kubernetes ingress controllers handle load balancing
- TLS Certificates: cert-manager or similar tools handle SSL/TLS
- Service Discovery: Kubernetes DNS handles service discovery
- Health Checks: Kubernetes probes handle application health monitoring
- Logging: Container runtime and logging drivers handle log aggregation
Recommended Managed Services by Cloud Provider
AWS:
- PostgreSQL: Amazon RDS for PostgreSQL
- Redis: Amazon ElastiCache for Redis
- RabbitMQ: Amazon MQ for RabbitMQ
Google Cloud:
- PostgreSQL: Cloud SQL for PostgreSQL
- Redis: Memorystore for Redis
- RabbitMQ: Cloud Pub/Sub (alternative) or third-party RabbitMQ
Azure:
- PostgreSQL: Azure Database for PostgreSQL
- Redis: Azure Cache for Redis
- RabbitMQ: Azure Service Bus (alternative) or third-party RabbitMQ
Environment Variables for Containers
apiVersion: apps/v1
kind: Deployment
metadata:
name: helium-server
spec:
template:
spec:
containers:
- name: helium-server
image: helium-server:latest
env:
- name: WORK_MODE
value: "grpc"
- name: DATABASE_URL
valueFrom:
secretKeyRef:
name: helium-secrets
key: database-url
- name: REDIS_URL
valueFrom:
secretKeyRef:
name: helium-secrets
key: redis-url
- name: MQ_URL
valueFrom:
secretKeyRef:
name: helium-secrets
key: rabbitmq-url
Security Considerations
Credentials Management
- Never store credentials in plain text
- Use Kubernetes secrets or similar secure storage
- Rotate credentials regularly
- Use dedicated service accounts with minimal permissions
Network Security
- Database: Restrict access to application subnets only
- Redis: Enable authentication and restrict network access
- RabbitMQ: Use strong passwords and enable TLS
- SMTP: Use app passwords or OAuth tokens when available
OAuth Security
- Use HTTPS for all OAuth redirect URIs
- Validate redirect URI domains strictly
- Use state parameter for CSRF protection (handled automatically)
Troubleshooting
Database Connection Issues
# Test database connectivity
psql $DATABASE_URL -c "SELECT version();"
# Check migration status
sqlx migrate info --database-url $DATABASE_URL
Redis Connection Issues
# Test Redis connectivity
redis-cli -u $REDIS_URL ping
# Check Redis memory usage
redis-cli -u $REDIS_URL info memory
RabbitMQ Connection Issues
# Check queue status
rabbitmqctl list_queues
# Check connection status
rabbitmqctl list_connections
SMTP Testing
The mailer module provides test endpoints and logging to help diagnose SMTP issues. Check application logs for detailed SMTP connection and authentication errors.
Epay Integration Issues
- Verify webhook URLs are accessible from the internet
- Check payment provider’s callback logs
- Ensure merchant credentials are correctly configured
- Validate signature verification in callback processing
Optional Observability Stack
OpenTelemetry & Grafana Stack (Optional)
Purpose: Comprehensive observability with distributed tracing, metrics, and log aggregation
Required: No - completely optional enhancement
Configuration: OTEL_COLLECTOR environment variable
Components:
- OpenTelemetry Collector: Telemetry data collection and routing
- Grafana Tempo: Distributed tracing backend
- Prometheus: Metrics storage and querying
- Grafana Loki: Log aggregation
- Grafana: Unified visualization dashboard
When to Use:
- Production deployments requiring detailed performance analysis
- Multi-instance deployments needing distributed tracing
- Teams requiring centralized observability dashboards
- Troubleshooting complex performance issues
Deployment:
- NOT containerized with application - Deploy as separate Kubernetes workloads or use Grafana Cloud
- Recommended: Deploy Grafana stack in dedicated observability namespace
- Alternative: Use managed services (Grafana Cloud, Datadog, New Relic)
Note: Helium automatically falls back to basic structured logging if OpenTelemetry is not configured. See the comprehensive Observability with OpenTelemetry guide for full setup instructions.
Summary
| Dependency | Required | Purpose | Configuration | Deployment |
|---|---|---|---|---|
| PostgreSQL | Yes | Primary database | DATABASE_URL | External managed service |
| Redis | Yes | Caching/sessions | REDIS_URL | External managed service |
| RabbitMQ | Yes | Message queuing | MQ_URL | External managed service |
| SMTP Server | Conditional | Email delivery | Database config | External service |
| OAuth Providers | Optional | Social auth | Database config | External providers |
| Epay Provider | Conditional | Payment processing | Database config | External service |
| Observability | Optional | Tracing & metrics | OTEL_COLLECTOR | External stack/cloud |
Next Steps: After setting up these dependencies, proceed to the Helium Server Deployment Guide for detailed deployment instructions.
Observability with OpenTelemetry
Helium server includes optional OpenTelemetry (OTel) integration for comprehensive observability. This integration is completely optional — the server will work perfectly fine without it using basic structured logging.
What is OpenTelemetry?
OpenTelemetry provides distributed tracing, metrics collection, and contextual logging for production systems. Use it when:
- Running multiple worker instances requiring distributed tracing
- Need detailed performance analysis and troubleshooting
- Want centralized observability dashboards
Skip it for simple deployments, development environments, or when basic logging is sufficient.
Configuration
Enable OpenTelemetry by setting the OTEL_COLLECTOR environment variable:
export OTEL_COLLECTOR="http://otel-collector:4317"
./helium-server
If not set or initialization fails, the server automatically falls back to basic logging.
Service Names
Each worker mode reports with a distinct service name:
| Worker Mode | Service Name |
|---|---|
grpc | Helium.grpc |
subscribe_api | Helium.subscribe-api |
webhook_api | Helium.webhook-api |
consumer | Helium.consumer |
mailer | Helium.mailer |
cron_executor | Helium.cron-executor |
Recommended Stack: Grafana Observability
For production deployments, we recommend the Grafana observability stack — an open-source, Kubernetes-native solution with unified dashboards for traces, metrics, and logs.
Components
- OpenTelemetry Collector: Receives and routes telemetry
- Grafana Tempo: Distributed tracing storage
- Prometheus: Metrics collection
- Grafana Loki: Log aggregation
- Grafana: Unified visualization
Deployment
Deploy the Grafana stack alongside your Kubernetes cluster:
1. Add Helm Repositories
helm repo add grafana https://grafana.github.io/helm-charts
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm repo update
2. Create Namespace
kubectl create namespace observability
3. Deploy OpenTelemetry Collector
Create otel-collector-values.yaml:
mode: deployment
config:
receivers:
otlp:
protocols:
grpc:
endpoint: 0.0.0.0:4317
http:
endpoint: 0.0.0.0:4318
processors:
batch:
timeout: 10s
send_batch_size: 1024
exporters:
# Traces to Tempo
otlp/tempo:
endpoint: tempo.observability.svc.cluster.local:4317
tls:
insecure: true
# Metrics to Prometheus
prometheus:
endpoint: 0.0.0.0:8889
namespace: helium
# Logs to Loki
loki:
endpoint: http://loki.observability.svc.cluster.local:3100/loki/api/v1/push
service:
pipelines:
traces:
receivers: [otlp]
processors: [batch]
exporters: [otlp/tempo]
metrics:
receivers: [otlp]
processors: [batch]
exporters: [prometheus]
logs:
receivers: [otlp]
processors: [batch]
exporters: [loki]
ports:
otlp-grpc:
enabled: true
containerPort: 4317
servicePort: 4317
protocol: TCP
otlp-http:
enabled: true
containerPort: 4318
servicePort: 4318
protocol: TCP
metrics:
enabled: true
containerPort: 8889
servicePort: 8889
protocol: TCP
helm install otel-collector open-telemetry/opentelemetry-collector \
--namespace observability \
--values otel-collector-values.yaml
4. Deploy Tempo, Loki, and Prometheus
# Tempo for traces
helm install tempo grafana/tempo \
--namespace observability \
--set tempo.receivers.otlp.protocols.grpc.endpoint=0.0.0.0:4317
# Loki for logs
helm install loki grafana/loki-stack \
--namespace observability \
--set loki.enabled=true \
--set promtail.enabled=false
5. Deploy Prometheus
helm install prometheus prometheus-community/kube-prometheus-stack \
--namespace observability \
--set grafana.enabled=false
6. Deploy Grafana
helm install grafana grafana/grafana \
--namespace observability \
--set adminPassword=changeme
Configure data sources in Grafana to connect Tempo, Prometheus, and Loki.
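To reach the Grafana UI, a port-forward is usually enough; a sketch assuming the chart's default service name (grafana) and service port (80):
# Forward the Grafana service to localhost
kubectl port-forward -n observability svc/grafana 3000:80
# Log in at http://localhost:3000 as "admin" with the adminPassword set above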
Troubleshooting
Server logs show “Failed to initialize OpenTelemetry”
Check that the OTel Collector is reachable at the configured endpoint. The server will automatically fall back to basic logging.
Missing traces in Grafana
Verify the data pipeline: Helium → OTel Collector → Tempo. Check logs at each stage.
Performance impact
OpenTelemetry adds minimal overhead: < 2% CPU, ~10-20MB memory, < 1ms latency per request.
Disabling OpenTelemetry
Simply unset the OTEL_COLLECTOR variable — the server automatically falls back to basic logging.
Summary
OpenTelemetry in Helium is completely optional:
- Set OTEL_COLLECTOR to enable, leave unset to use basic logging
- Automatic fallback if initialization fails
- Recommended for production with multiple instances
- Grafana stack provides open-source, Kubernetes-native observability
For detailed Helm deployment configurations, refer to the official Grafana Helm charts documentation.
Health Checks for Kubernetes
Helium server provides HTTP health check endpoints designed for Kubernetes liveness and readiness probes. These endpoints run on a separate internal port (default: 9090) and are enabled for all worker modes.
Overview
Health checks help Kubernetes determine:
- Liveness: Is the container alive and should it be restarted if it becomes unresponsive?
- Readiness: Is the container ready to handle requests?
Helium implements both probe types on a dedicated HTTP server that runs alongside each worker mode.
Endpoints
Liveness Probe: /healthz
Returns 200 OK with a JSON response if the server is running:
{
"status": "ok"
}
This endpoint always returns success if the health check server is responding. Kubernetes uses this to determine if the container should be restarted.
Readiness Probe: /readyz
Checks connectivity to all dependencies before returning status:
Success Response (200 OK):
{
"status": "ok",
"database": "ok",
"redis": "ok",
"rabbitmq": "ok"
}
Failure Response (503 Service Unavailable):
{
"status": "error",
"database": "ok",
"redis": "error",
"rabbitmq": "ok",
"error": "Redis error: Connection refused"
}
The readiness probe checks:
- PostgreSQL: Executes a simple query (SELECT 1)
- Redis: Sends a PING command
- RabbitMQ: Validates connection pool status
All worker modes check the same three dependencies.
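For a quick manual check (for example after a kubectl port-forward to the health port), the response can be inspected with curl; a sketch assuming jq is available:
# Show the per-dependency status
curl -s http://localhost:9090/readyz | jq .
# HTTP 200 means ready; HTTP 503 is returned when any dependency check fails
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:9090/readyz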
Configuration
Health Check Port
Set the HEALTH_CHECK_PORT environment variable to customize the port (default: 9090):
export HEALTH_CHECK_PORT=9090
This port should be:
- Internal only: Not exposed to external traffic
- Accessible by Kubernetes: For probe requests
- Different from main service ports: To avoid conflicts
Worker Modes
Health checks are available in all worker modes:
| Worker Mode | Main Port | Health Check Port | Dependencies Checked |
|---|---|---|---|
grpc | 50051 | 9090 | Database, Redis, RabbitMQ |
subscribe_api | 8080 | 9090 | Database, Redis, RabbitMQ |
webhook_api | 8081 | 9090 | Database, Redis, RabbitMQ |
consumer | N/A | 9090 | Database, Redis, RabbitMQ |
mailer | N/A | 9090 | Database, Redis, RabbitMQ |
cron_executor | N/A | 9090 | Database, Redis, RabbitMQ |
Kubernetes Deployment
Example Pod Configuration
Here’s how to configure health checks in your Kubernetes deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
name: helium-grpc
spec:
replicas: 3
selector:
matchLabels:
app: helium-grpc
template:
metadata:
labels:
app: helium-grpc
spec:
containers:
- name: helium-grpc
image: helium-server:latest
env:
- name: WORK_MODE
value: "grpc"
- name: LISTEN_ADDR
value: "0.0.0.0:50051"
- name: HEALTH_CHECK_PORT
value: "9090"
- name: DATABASE_URL
valueFrom:
secretKeyRef:
name: helium-secrets
key: database-url
- name: REDIS_URL
valueFrom:
secretKeyRef:
name: helium-secrets
key: redis-url
- name: MQ_URL
valueFrom:
secretKeyRef:
name: helium-secrets
key: mq-url
ports:
- name: grpc
containerPort: 50051
protocol: TCP
- name: health
containerPort: 9090
protocol: TCP
livenessProbe:
httpGet:
path: /healthz
port: health
initialDelaySeconds: 10
periodSeconds: 10
timeoutSeconds: 5
failureThreshold: 3
readinessProbe:
httpGet:
path: /readyz
port: health
initialDelaySeconds: 5
periodSeconds: 5
timeoutSeconds: 5
failureThreshold: 3
Probe Configuration Guidelines
Liveness Probe:
- initialDelaySeconds: 10-30 seconds (allow time for startup)
- periodSeconds: 10-30 seconds (check periodically)
- timeoutSeconds: 5 seconds
- failureThreshold: 3 (restart after 3 consecutive failures)
Readiness Probe:
- initialDelaySeconds: 5-10 seconds (faster than liveness)
- periodSeconds: 5-10 seconds (check more frequently)
- timeoutSeconds: 5 seconds
- failureThreshold: 3 (mark unready after 3 failures)
Service Configuration
For API worker modes (grpc, subscribe_api, webhook_api), configure a Service:
apiVersion: v1
kind: Service
metadata:
name: helium-grpc
spec:
type: ClusterIP
ports:
- name: grpc
port: 50051
targetPort: grpc
protocol: TCP
selector:
app: helium-grpc
Note: The health check port (9090) is not exposed in the Service. It’s only for Kubernetes probes.
Worker Mode Behavior
API Modes (grpc, subscribe_api, webhook_api)
For API modes, the health check server runs alongside the main API server:
- When the main server exits, the health check server is immediately terminated
- Process exits when either server fails
- Ensures no “zombie” containers serving health checks without handling requests
Background Worker Modes (consumer, mailer, cron_executor)
For background workers, the health check server runs continuously:
- Liveness probe confirms the worker process is alive
- Readiness probe ensures dependencies are accessible
- Worker loops indefinitely alongside health check server
Troubleshooting
Health Check Server Not Starting
Symptom: Probes fail immediately with connection errors
Solutions:
- Check logs for health check server errors
- Verify HEALTH_CHECK_PORT is not already in use
- Ensure the port is accessible within the pod
Readiness Probe Failing
Symptom: Pod remains in “Not Ready” state
Solutions:
- Check which dependency is failing in the /readyz response
- Verify connection strings (DATABASE_URL, REDIS_URL, MQ_URL)
- Ensure network policies allow pod access to dependencies
- Check if dependencies are healthy
Example debugging:
# Forward health check port to local machine
kubectl port-forward pod/helium-grpc-xyz 9090:9090
# Check readiness endpoint
curl http://localhost:9090/readyz
Liveness Probe Causing Restart Loop
Symptom: Pod repeatedly restarts with liveness probe failures
Solutions:
- Increase initialDelaySeconds (worker may need more startup time)
- Increase failureThreshold (allow more failures before restart)
- Check if the worker is deadlocked or stuck (examine logs before restart)
Worker Exits But Pod Stays Running
Symptom: Container appears healthy but doesn’t process requests
This should not happen with the current implementation:
- API workers: Health check is aborted when main server exits
- Background workers: Return from execute_worker() causes process exit
If this occurs, file a bug report.
Security Considerations
Port Exposure
The health check port (9090) should never be exposed externally:
- Don’t create Ingress rules for health check endpoints
- Don’t expose the health check port in the Service definition
- Use network policies so that only in-cluster probe traffic (the node kubelet) can reach it
Sensitive Information
Health check responses contain minimal information:
- No version numbers
- No internal IPs or hostnames
- No authentication tokens
- Only dependency status (ok/error)
Error messages may contain connection details. Ensure logs are secured appropriately.
Best Practices
- Use separate ports: Never combine health checks with main service endpoints
- Set appropriate timeouts: Balance between quick detection and false positives
- Monitor probe metrics: Track probe success rates in your observability stack
- Test locally: Use port-forwarding to verify health checks before deployment
- Align with dependencies: If using a sidecar proxy (Istio, Linkerd), configure startup probes
Summary
Helium’s health check endpoints provide robust Kubernetes integration:
- Liveness probe (/healthz): Detects unresponsive containers
- Readiness probe (/readyz): Ensures dependencies are healthy
- Separate port (default 9090): Isolated from main services
- All worker modes: Consistent behavior across deployment types
- Process lifecycle: Ensures clean exits, no zombie containers
Configure these probes in your Kubernetes deployments to enable automatic recovery and load balancing.
Docker-based Deployment
The Helium system is designed with a multi-worker architecture that can be deployed using containers. Each worker type serves a specific purpose and has different scaling requirements. This deployment approach provides:
- Scalability: Independent scaling of different worker types based on load
- Reliability: Fault isolation between different services
- Flexibility: Easy deployment across different environments
- Maintainability: Simplified updates and rollbacks
Prerequisites
Before proceeding with this guide, ensure you have:
- External dependencies configured (see External Dependencies)
- Docker or container runtime installed
- Kubernetes cluster (for Kubernetes deployment)
- Basic understanding of containerization concepts
Container Architecture
Worker Types and Scaling Patterns
The Helium server supports six distinct worker modes, each with specific scaling characteristics:
| Worker Mode | Port | Scaling | Description |
|---|---|---|---|
grpc | 50051 | ✅ Horizontal | Main gRPC API server - can be load balanced |
subscribe_api | 8080 | ✅ Horizontal | RESTful subscription API - can be load balanced |
webhook_api | 8081 | ✅ Horizontal | Webhook handler for payments - can be load balanced |
consumer | - | ✅ Horizontal | Background message consumer - multiple instances supported |
mailer | - | ⚠️ Single preferred | Email service - not recommended >1 instance |
cron_executor | - | 🚫 Single only | Scheduled tasks - MUST be exactly 1 instance |
Scaling Constraints
⚠️ Critical Scaling Limitations
mailer Worker:
- Recommendation: Deploy as single instance only
- Reason: Relies on SMTP server connections and may cause email delivery issues with multiple instances
- Impact: Multiple mailer instances can lead to duplicate emails or SMTP rate limiting
cron_executor Worker:
- Requirement: MUST have exactly one instance
- Reason: Scans the database to check for scheduled tasks in the queue
- Impact: Multiple instances will cause duplicate task execution and potential data corruption
✅ Scalable Workers
API Workers (grpc, subscribe_api, webhook_api):
- Can be horizontally scaled based on traffic demands
- Support standard load balancing techniques
- Share state through external Redis and PostgreSQL
consumer Worker:
- Can run multiple instances for processing message queues
- Automatically distributes work through RabbitMQ
Docker Image
Building the Docker Image
The project includes a multi-stage Dockerfile optimized for production:
# Build the Docker image
docker build -t helium-server:latest .
# Tag for registry
docker tag helium-server:latest your-registry/helium-server:v1.0.0
# Push to registry
docker push your-registry/helium-server:v1.0.0
Image Characteristics
- Base Image: gcr.io/distroless/cc for minimal attack surface
- Size: ~50MB final image
- Architecture: Multi-arch support (amd64, arm64); see the buildx sketch below
- Security: Non-root user, minimal dependencies
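For the multi-arch support noted above, one way to build both architectures in a single step is docker buildx. A sketch, assuming a buildx builder is already configured and your-registry is a placeholder:
# Build and push amd64 + arm64 images in one invocation (requires docker buildx)
docker buildx build \
  --platform linux/amd64,linux/arm64 \
  -t your-registry/helium-server:v1.0.0 \
  --push .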
Environment Variables
Configure containers using these environment variables:
# Required - Worker mode selection
WORK_MODE=grpc # grpc, subscribe_api, webhook_api, consumer, mailer, cron_executor
# Required - Database connections
DATABASE_URL=postgres://user:password@postgres-host:5432/helium_db
REDIS_URL=redis://redis-host:6379
MQ_URL=amqp://user:password@rabbitmq-host:5672/
# Optional - Server configuration
LISTEN_ADDR=0.0.0.0:50051 # For API workers
SCAN_INTERVAL=60 # For cron_executor only
RUST_LOG=info # Logging level
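For a quick smoke test of a single worker outside an orchestrator, the same variables can be passed directly to docker run. A minimal sketch, assuming the dependency hosts are reachable from the container network:
# Run a single gRPC worker with explicit configuration
docker run --rm -p 50051:50051 \
  -e WORK_MODE=grpc \
  -e DATABASE_URL=postgres://user:password@postgres-host:5432/helium_db \
  -e REDIS_URL=redis://redis-host:6379 \
  -e MQ_URL=amqp://user:password@rabbitmq-host:5672/ \
  -e LISTEN_ADDR=0.0.0.0:50051 \
  helium-server:latest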
Docker Compose Deployment
For development or simple production setups:
version: "3.8"
services:
# Main gRPC API (scalable)
helium-grpc:
image: helium-server:latest
ports:
- "50051:50051"
environment:
WORK_MODE: grpc
DATABASE_URL: postgres://helium:password@postgres:5432/helium_db
REDIS_URL: redis://redis:6379
MQ_URL: amqp://helium:password@rabbitmq:5672/
LISTEN_ADDR: 0.0.0.0:50051
depends_on:
- postgres
- redis
- rabbitmq
restart: unless-stopped
deploy:
replicas: 2 # Can be scaled horizontally
# Subscription API (scalable)
helium-subscribe-api:
image: helium-server:latest
ports:
- "8080:8080"
environment:
WORK_MODE: subscribe_api
DATABASE_URL: postgres://helium:password@postgres:5432/helium_db
REDIS_URL: redis://redis:6379
MQ_URL: amqp://helium:password@rabbitmq:5672/
LISTEN_ADDR: 0.0.0.0:8080
depends_on:
- postgres
- redis
- rabbitmq
restart: unless-stopped
deploy:
replicas: 2 # Can be scaled horizontally
# Webhook API (scalable)
helium-webhook-api:
image: helium-server:latest
ports:
- "8081:8081"
environment:
WORK_MODE: webhook_api
DATABASE_URL: postgres://helium:password@postgres:5432/helium_db
REDIS_URL: redis://redis:6379
MQ_URL: amqp://helium:password@rabbitmq:5672/
LISTEN_ADDR: 0.0.0.0:8081
depends_on:
- postgres
- redis
- rabbitmq
restart: unless-stopped
deploy:
replicas: 2 # Can be scaled horizontally
# Background consumer (scalable)
helium-consumer:
image: helium-server:latest
environment:
WORK_MODE: consumer
DATABASE_URL: postgres://helium:password@postgres:5432/helium_db
REDIS_URL: redis://redis:6379
MQ_URL: amqp://helium:password@rabbitmq:5672/
depends_on:
- postgres
- redis
- rabbitmq
restart: unless-stopped
deploy:
replicas: 3 # Can run multiple instances
# Mailer service (single instance recommended)
helium-mailer:
image: helium-server:latest
environment:
WORK_MODE: mailer
DATABASE_URL: postgres://helium:password@postgres:5432/helium_db
REDIS_URL: redis://redis:6379
MQ_URL: amqp://helium:password@rabbitmq:5672/
depends_on:
- postgres
- redis
- rabbitmq
restart: unless-stopped
deploy:
replicas: 1 # SINGLE INSTANCE ONLY
# Cron executor (must be single instance)
helium-cron:
image: helium-server:latest
environment:
WORK_MODE: cron_executor
DATABASE_URL: postgres://helium:password@postgres:5432/helium_db
REDIS_URL: redis://redis:6379
MQ_URL: amqp://helium:password@rabbitmq:5672/
SCAN_INTERVAL: 60
depends_on:
- postgres
- redis
- rabbitmq
restart: unless-stopped
deploy:
replicas: 1 # MUST BE EXACTLY 1
# External dependencies (for development only)
postgres:
image: postgres:15
environment:
POSTGRES_USER: helium
POSTGRES_PASSWORD: password
POSTGRES_DB: helium_db
volumes:
- postgres_data:/var/lib/postgresql/data
ports:
- "5432:5432"
redis:
image: redis:7
ports:
- "6379:6379"
volumes:
- redis_data:/data
rabbitmq:
image: rabbitmq:3-management
environment:
RABBITMQ_DEFAULT_USER: helium
RABBITMQ_DEFAULT_PASS: password
ports:
- "5672:5672"
- "15672:15672"
volumes:
- rabbitmq_data:/var/lib/rabbitmq
volumes:
postgres_data:
redis_data:
rabbitmq_data:
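To start the stack, bring everything up in the background and scale the stateless workers as needed. Depending on your Docker Compose version, the deploy.replicas values above may only take effect in Swarm mode; the --scale flag is a portable alternative for services without published host ports (replicas cannot share a fixed host port mapping):
# Start everything in the background
docker compose up -d
# Scale the message consumer to three instances
docker compose up -d --scale helium-consumer=3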
Kubernetes Deployment
For production Kubernetes deployments:
Namespace and ConfigMap
apiVersion: v1
kind: Namespace
metadata:
name: helium-system
---
apiVersion: v1
kind: ConfigMap
metadata:
name: helium-config
namespace: helium-system
data:
RUST_LOG: "info"
SCAN_INTERVAL: "60"
Secrets
apiVersion: v1
kind: Secret
metadata:
name: helium-secrets
namespace: helium-system
type: Opaque
stringData:
database-url: "postgres://helium:password@postgres-service:5432/helium_db"
redis-url: "redis://redis-service:6379"
rabbitmq-url: "amqp://helium:password@rabbitmq-service:5672/"
gRPC API Deployment (Scalable)
apiVersion: apps/v1
kind: Deployment
metadata:
name: helium-grpc
namespace: helium-system
spec:
replicas: 3 # Can be scaled horizontally
selector:
matchLabels:
app: helium-grpc
template:
metadata:
labels:
app: helium-grpc
spec:
containers:
- name: helium-server
image: your-registry/helium-server:v1.0.0
ports:
- containerPort: 50051
env:
- name: WORK_MODE
value: "grpc"
- name: LISTEN_ADDR
value: "0.0.0.0:50051"
- name: DATABASE_URL
valueFrom:
secretKeyRef:
name: helium-secrets
key: database-url
- name: REDIS_URL
valueFrom:
secretKeyRef:
name: helium-secrets
key: redis-url
- name: MQ_URL
valueFrom:
secretKeyRef:
name: helium-secrets
key: rabbitmq-url
envFrom:
- configMapRef:
name: helium-config
resources:
requests:
memory: "128Mi"
cpu: "100m"
limits:
memory: "512Mi"
cpu: "500m"
livenessProbe:
tcpSocket:
port: 50051
initialDelaySeconds: 30
periodSeconds: 10
readinessProbe:
tcpSocket:
port: 50051
initialDelaySeconds: 5
periodSeconds: 5
---
apiVersion: v1
kind: Service
metadata:
name: helium-grpc-service
namespace: helium-system
spec:
selector:
app: helium-grpc
ports:
- port: 50051
targetPort: 50051
type: ClusterIP
Consumer Deployment (Scalable)
apiVersion: apps/v1
kind: Deployment
metadata:
name: helium-consumer
namespace: helium-system
spec:
replicas: 3 # Can run multiple instances
selector:
matchLabels:
app: helium-consumer
template:
metadata:
labels:
app: helium-consumer
spec:
containers:
- name: helium-server
image: your-registry/helium-server:v1.0.0
env:
- name: WORK_MODE
value: "consumer"
- name: DATABASE_URL
valueFrom:
secretKeyRef:
name: helium-secrets
key: database-url
- name: REDIS_URL
valueFrom:
secretKeyRef:
name: helium-secrets
key: redis-url
- name: MQ_URL
valueFrom:
secretKeyRef:
name: helium-secrets
key: rabbitmq-url
envFrom:
- configMapRef:
name: helium-config
resources:
requests:
memory: "256Mi"
cpu: "200m"
limits:
memory: "1Gi"
cpu: "1000m"
livenessProbe:
exec:
command:
- /bin/sh
- -c
- "ps aux | grep helium-server | grep -v grep"
initialDelaySeconds: 30
periodSeconds: 30
Mailer Deployment (Single Instance)
apiVersion: apps/v1
kind: Deployment
metadata:
name: helium-mailer
namespace: helium-system
spec:
replicas: 1 # SINGLE INSTANCE ONLY
strategy:
type: Recreate # Prevent multiple instances during updates
selector:
matchLabels:
app: helium-mailer
template:
metadata:
labels:
app: helium-mailer
spec:
containers:
- name: helium-server
image: your-registry/helium-server:v1.0.0
env:
- name: WORK_MODE
value: "mailer"
- name: DATABASE_URL
valueFrom:
secretKeyRef:
name: helium-secrets
key: database-url
- name: REDIS_URL
valueFrom:
secretKeyRef:
name: helium-secrets
key: redis-url
- name: MQ_URL
valueFrom:
secretKeyRef:
name: helium-secrets
key: rabbitmq-url
envFrom:
- configMapRef:
name: helium-config
resources:
requests:
memory: "128Mi"
cpu: "100m"
limits:
memory: "512Mi"
cpu: "500m"
Cron Executor Deployment (Singleton)
apiVersion: apps/v1
kind: Deployment
metadata:
name: helium-cron
namespace: helium-system
spec:
replicas: 1 # MUST BE EXACTLY 1
strategy:
type: Recreate # Ensure no overlap during updates
selector:
matchLabels:
app: helium-cron
template:
metadata:
labels:
app: helium-cron
spec:
containers:
- name: helium-server
image: your-registry/helium-server:v1.0.0
env:
- name: WORK_MODE
value: "cron_executor"
- name: DATABASE_URL
valueFrom:
secretKeyRef:
name: helium-secrets
key: database-url
- name: REDIS_URL
valueFrom:
secretKeyRef:
name: helium-secrets
key: redis-url
- name: MQ_URL
valueFrom:
secretKeyRef:
name: helium-secrets
key: rabbitmq-url
- name: SCAN_INTERVAL
value: "60"
envFrom:
- configMapRef:
name: helium-config
resources:
requests:
memory: "128Mi"
cpu: "50m"
limits:
memory: "256Mi"
cpu: "200m"
livenessProbe:
exec:
command:
- /bin/sh
- -c
- "ps aux | grep helium-server | grep -v grep"
initialDelaySeconds: 60
periodSeconds: 30
Horizontal Pod Autoscaler (HPA)
For scalable workers, configure automatic scaling:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: helium-grpc-hpa
namespace: helium-system
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: helium-grpc
minReplicas: 2
maxReplicas: 10
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70
- type: Resource
resource:
name: memory
target:
type: Utilization
averageUtilization: 80
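After applying the autoscaler you can watch its observed utilization and replica count; a sketch, assuming the manifest above is saved as helium-grpc-hpa.yaml:
# Apply the HPA and watch scaling decisions
kubectl apply -f helium-grpc-hpa.yaml
kubectl get hpa helium-grpc-hpa -n helium-system --watch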
Load Balancer Configuration
Ingress for API Services
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: helium-ingress
namespace: helium-system
annotations:
nginx.ingress.kubernetes.io/backend-protocol: "GRPC"
cert-manager.io/cluster-issuer: "letsencrypt-prod"
spec:
tls:
- hosts:
- api.your-domain.com
secretName: helium-tls
rules:
- host: api.your-domain.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: helium-grpc-service
port:
number: 50051
Service Mesh Configuration
For advanced deployments with service mesh (Istio):
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
name: helium-grpc-vs
namespace: helium-system
spec:
hosts:
- api.your-domain.com
gateways:
- helium-gateway
http:
- match:
- uri:
prefix: /
route:
- destination:
host: helium-grpc-service
port:
number: 50051
weight: 100
fault:
delay:
percentage:
value: 0.1
fixedDelay: 5s
Database Migration
Database migrations must be run before starting any workers:
Migration Job
apiVersion: batch/v1
kind: Job
metadata:
name: helium-migration
namespace: helium-system
spec:
template:
spec:
containers:
- name: migration
image: your-registry/helium-server:v1.0.0
command: ["sqlx", "migrate", "run"]
env:
- name: DATABASE_URL
valueFrom:
secretKeyRef:
name: helium-secrets
key: database-url
restartPolicy: Never
backoffLimit: 3
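A typical rollout applies the migration Job first and blocks until it completes before rolling out the workers. A sketch, assuming the manifest above is saved as migration-job.yaml:
# Run migrations and wait for the Job to finish before deploying workers
kubectl apply -f migration-job.yaml
kubectl wait --for=condition=complete job/helium-migration \
  -n helium-system --timeout=300s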
Init Container for Workers
Add to all worker deployments:
spec:
template:
spec:
initContainers:
- name: wait-for-migration
image: postgres:15
command:
[
"sh",
"-c",
"until pg_isready -h postgres-service -p 5432; do echo waiting for database; sleep 2; done;",
]
env:
- name: DATABASE_URL
valueFrom:
secretKeyRef:
name: helium-secrets
key: database-url
Monitoring and Observability
Health Checks
Configure appropriate health checks for each worker type. The HTTP health endpoints on port 9090 (see Health Checks for Kubernetes) are the preferred probes for every worker mode; the simpler examples below use TCP and exec probes, and the exec variants require a shell in the container image, which the distroless base image does not include:
# For API workers (gRPC, REST)
livenessProbe:
tcpSocket:
port: 50051
initialDelaySeconds: 30
periodSeconds: 10
# For background workers (consumer, mailer, cron)
livenessProbe:
exec:
command:
- /bin/sh
- -c
- "ps aux | grep helium-server | grep -v grep"
initialDelaySeconds: 30
periodSeconds: 30
Logging Configuration
env:
- name: RUST_LOG
value: "info,helium_server=debug" # Adjust as needed
Metrics Collection
Use Prometheus for metrics collection:
apiVersion: v1
kind: Service
metadata:
name: helium-metrics
namespace: helium-system
annotations:
prometheus.io/scrape: "true"
prometheus.io/port: "8080"
prometheus.io/path: "/metrics"
spec:
selector:
app: helium-grpc
ports:
- port: 8080
name: metrics
Troubleshooting
Common Issues
Pod Crash Loop:
# Check logs
kubectl logs -n helium-system deployment/helium-grpc
# Check events
kubectl get events -n helium-system --sort-by='.metadata.creationTimestamp'
# Verify environment variables
kubectl exec -n helium-system deployment/helium-grpc -- env | grep -E "(DATABASE_URL|REDIS_URL|MQ_URL)"
Multiple Cron Executors:
# Check for multiple cron instances (should show only 1)
kubectl get pods -n helium-system -l app=helium-cron
# Check cron logs for conflicts
kubectl logs -n helium-system -l app=helium-cron --tail=100
Database Connection Issues:
# Test database connectivity
kubectl run -i --tty --rm debug --image=postgres:15 --restart=Never -- \
psql postgresql://user:password@postgres-service:5432/helium_db -c "SELECT version();"
# Check migration status
kubectl exec -n helium-system deployment/helium-grpc -- \
sqlx migrate info --database-url $DATABASE_URL
Performance Tuning
Resource Limits:
- API workers: 200-500m CPU, 256Mi-1Gi RAM per pod
- Consumer workers: 500m-1000m CPU, 512Mi-2Gi RAM per pod
- Mailer/Cron: 100-200m CPU, 128Mi-512Mi RAM per pod
Scaling Guidelines:
- Start with 2-3 replicas for API workers
- Scale consumers based on message queue depth (see the scaling example below)
- Monitor CPU/memory usage and adjust limits accordingly
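Workers without an HPA can also be scaled manually, for example while a message backlog is being drained. A sketch:
# Scale consumers up while a backlog is being drained
kubectl scale deployment helium-consumer -n helium-system --replicas=5
# Check current replica counts
kubectl get deployments -n helium-system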
External Dependencies
Refer to the External Dependencies Guide for detailed information about:
- PostgreSQL setup and configuration
- Redis configuration and clustering
- RabbitMQ setup and management
- SMTP server configuration
- OAuth provider setup
- Payment provider integration
Configuration Management
Refer to the Configuration Guide for:
- Environment variable reference
- Configuration file formats
- Runtime configuration updates
- Security best practices
Next Steps
After successful deployment:
- Configure monitoring and alerting
- Set up backup procedures for stateful data
- Implement CI/CD pipelines for automated deployments
- Configure log aggregation and analysis
- Plan disaster recovery procedures
For specific configuration details, see the Helium Server Configuration guide.
Configuration Guide
This document provides comprehensive configuration information for operators deploying the Helium project. The system uses a combination of environment variables for server configuration and JSON configurations stored in the database for module-specific settings.
Environment Variables
The Helium server is configured entirely through environment variables. These control the server behavior and connectivity to external services.
Required Environment Variables
All worker modes require these variables:
# Worker mode selection (REQUIRED)
WORK_MODE="grpc" # Options: grpc, subscribe_api, webhook_api, consumer, mailer, cron_executor
# Database connection (REQUIRED)
DATABASE_URL="postgres://user:password@localhost:5432/helium_db"
# Redis connection (REQUIRED)
REDIS_URL="redis://localhost:6379"
# RabbitMQ connection (REQUIRED)
MQ_URL="amqp://user:password@localhost:5672/"
Worker Mode Options
| Worker Mode | Port | Description | Use Case |
|---|---|---|---|
grpc | 50051 | gRPC API server | Main API for client applications and admin panels |
subscribe_api | 8080 | RESTful subscription API | Public subscription endpoints |
webhook_api | 8081 | RESTful webhook handler | Payment provider callbacks, third-party integrations |
consumer | - | Background message consumer | Processing async tasks from message queue |
mailer | - | Email service worker | Sending emails and notifications |
cron_executor | - | Scheduled task executor | Running periodic maintenance tasks |
Optional Environment Variables
# Server listen addresses (for API workers)
LISTEN_ADDR="0.0.0.0:50051" # Default for grpc mode
LISTEN_ADDR="0.0.0.0:8080" # Default for subscribe_api mode
LISTEN_ADDR="0.0.0.0:8081" # Default for webhook_api mode
# Cron executor configuration
SCAN_INTERVAL="60" # Scan interval in seconds (cron_executor mode only)
# Logging configuration
RUST_LOG="info" # Options: error, warn, info, debug, trace
Module Configurations
Module configurations are stored as JSON in the PostgreSQL database in the application__config table. Each module has its own configuration key and JSON structure.
Note: All duration values are represented as strings containing the number of seconds (e.g., "300" for 5 minutes, "1800" for 30 minutes).
Auth Module (auth)
Key: "auth"
The authentication module handles user registration, login, JWT tokens, and OAuth providers.
{
"email_provider": {
"register_domain": {
"enable_white_list": false,
"white_list": [],
"enable_black_list": false,
"black_list": []
},
"otp_expire_after": "300",
"delete_otp_before": "7200",
"magic_link_expire_after": "1800",
"magic_link_delete_before": "14400",
"resend_interval": "30"
},
"jwt": {
"secret": "your-jwt-secret-key-32-characters-long",
"refresh_token_expiration": "2592000",
"access_token_expiration": "900",
"issuer": "https://your-domain.com",
"access_audience": "helium_cloud",
"refresh_audience": "helium_cloud_auth"
},
"oauth_providers": {
"providers": [
{
"name": "Google",
"client_id": "your-google-client-id",
"client_secret": "your-google-client-secret",
"redirect_uri": "https://your-domain.com/auth/oauth/google/callback"
},
{
"name": "GitHub",
"client_id": "your-github-client-id",
"client_secret": "your-github-client-secret",
"redirect_uri": "https://your-domain.com/auth/oauth/github/callback"
}
],
"challenge_expiration": "300"
},
"default_user_group": 1
}
Configuration Details:
- email_provider.register_domain: Controls which email domains are allowed for registration
- email_provider.otp_expire_after: How long OTP codes remain valid (in seconds, default: "300" = 5 minutes)
- email_provider.resend_interval: Minimum time between resend attempts (in seconds, default: "30" = 30 seconds)
- jwt.secret: CRITICAL: Must be a secure random string for production
- jwt.*_expiration: Token lifetime settings (in seconds, default: "2592000" = 30 days for refresh, "900" = 15 minutes for access)
- oauth_providers.providers: List of OAuth providers with their credentials
- default_user_group: Default group ID assigned to new users
Telecom Module (telecom)
Key: "telecom"
The telecom module manages VPN nodes, subscription links, and proxy synchronization.
{
"node_health_check": {
"offline_timeout": "600"
},
"subscribe_link": {
"endpoints": [
{
"url_template": "https://subscribe.your-domain.com/subscribe/{SUBSCRIBE_TOKEN}",
"endpoint_name": "primary"
},
{
"url_template": "https://backup.your-domain.com/subscribe/{SUBSCRIBE_TOKEN}",
"endpoint_name": "backup"
}
]
},
"uni_proxy_sync": {
"push_interval": "30",
"pull_interval": "60"
},
"vpn_server_token": "secure-random-token-for-vpn-servers"
}
Configuration Details:
- node_health_check.offline_timeout: Time before marking nodes as offline (in seconds, default: "600" = 10 minutes)
- subscribe_link.endpoints: List of subscription endpoints for client configuration
- uni_proxy_sync.push_interval: How often to push traffic data (in seconds, default: "30" = 30 seconds)
- uni_proxy_sync.pull_interval: How often to pull user info (in seconds, default: "60" = 1 minute)
- vpn_server_token: CRITICAL: Secure token for VPN server authentication
Shop Module (shop)
Key: "shop"
The shop module handles e-commerce functionality, orders, and payment processing.
{
"max_unpaid_orders": 5,
"auto_cancel_after": "1800",
"epay_notify_url": "https://your-domain.com/api/webhook/epay/notify",
"epay_return_url": "https://your-domain.com/payment/success"
}
Configuration Details:
- max_unpaid_orders: Maximum unpaid orders per user (default: 5)
- auto_cancel_after: Time before auto-canceling unpaid orders (in seconds, default: "1800" = 30 minutes)
- epay_notify_url: REQUIRED: Server-to-server notification endpoint for payment providers
- epay_return_url: REQUIRED: User return URL after payment completion
Mailer Module (mailer)
Key: "mailer"
The mailer module handles email delivery through SMTP.
{
"host": "smtp.gmail.com",
"port": 587,
"username": "your-smtp-username",
"password": "your-smtp-password",
"sender": "noreply@your-domain.com",
"starttls": true
}
Configuration Details:
- host: SMTP server hostname
- port: SMTP server port (typically 587 for STARTTLS, 465 for SSL)
- username / password: SMTP authentication credentials
- sender: Email address used as sender
- starttls: Enable STARTTLS encryption (recommended: true)
Admin Management Module (admin-jwt)
Key: "admin-jwt"
Controls JWT tokens for administrative access.
{
"secret": "admin-jwt-secret-key-32-characters-long",
"token_expiration": "864000",
"issuer": "https://admin.your-domain.com",
"audience": "HeliumAdmin"
}
Configuration Details:
- secret: CRITICAL: Secure secret for admin JWT signing
- token_expiration: Admin token lifetime (in seconds, default: "864000" = 10 days)
- issuer: JWT issuer for admin tokens
- audience: JWT audience for admin tokens
Market Module (affiliate)
Key: "affiliate"
Controls the affiliate marketing system.
{
"max_invite_code_per_user": 10,
"default_reward_rate": "0.1",
"default_trigger_time_per_user": 3
}
Configuration Details:
- max_invite_code_per_user: Maximum invite codes per user (default: 10)
- default_reward_rate: Default affiliate commission rate (default: "0.1" = 10%)
- default_trigger_time_per_user: Required referrals before earning (default: 3)
Infrastructure Dependencies
PostgreSQL Database
Required Version: PostgreSQL 12+
Configuration:
- Environment variable: DATABASE_URL
- Format: postgres://user:password@host:port/database
Important Notes:
- ⚠️ CRITICAL: Run migrations before starting: sqlx migrate run --database-url $DATABASE_URL
- Use an external managed PostgreSQL service for production (AWS RDS, Google Cloud SQL, etc.)
- Ensure proper backup and high availability configuration
Redis
Required Version: Redis 6+
Configuration:
- Environment variable: REDIS_URL
- Format: redis://host:port or redis://user:password@host:port
Usage:
- Session storage and authentication tokens
- Module configuration caching
- Temporary data (OAuth challenges, OTP codes)
RabbitMQ (AMQP)
Configuration:
- Environment variable: MQ_URL
- Format: amqp://user:password@host:port/
Usage:
- Asynchronous task processing
- Email sending queue
- Inter-module communication
Configuration Templates
Development Environment
# .env file for development
WORK_MODE=grpc
DATABASE_URL=postgres://helium:password@localhost:5432/helium_dev
REDIS_URL=redis://localhost:6379
MQ_URL=amqp://guest:guest@localhost:5672/
LISTEN_ADDR=0.0.0.0:50051
RUST_LOG=debug
Production Environment
# Production environment variables
WORK_MODE=grpc
DATABASE_URL=postgres://helium_user:secure_password@db.example.com:5432/helium_prod
REDIS_URL=redis://redis.example.com:6379
MQ_URL=amqp://helium_user:secure_password@mq.example.com:5672/
LISTEN_ADDR=0.0.0.0:50051
RUST_LOG=info
Multi-Worker Deployment
For production, run multiple worker processes:
# API Server (can be scaled horizontally)
WORK_MODE=grpc ./helium-server &
# Background Tasks (can be scaled)
WORK_MODE=consumer ./helium-server &
# Email Processing (single instance recommended)
WORK_MODE=mailer ./helium-server &
# Scheduled Tasks (MUST be single instance)
WORK_MODE=cron_executor ./helium-server &
Configuration Management
To update module configurations:
- Via Database: Insert/update records in the application__config table
- Via Admin API: Use the management gRPC API to update configurations
- Configuration Sync: The system automatically syncs configurations from PostgreSQL to Redis cache
Example SQL for updating auth configuration:
INSERT INTO application__config (key, content)
VALUES ('auth', '{"jwt": {"secret": "new-secret"}, ...}')
ON CONFLICT (key) DO UPDATE SET
content = EXCLUDED.content,
updated_at = NOW();
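After updating a configuration row, you can read it back to confirm the stored JSON; a minimal sketch, assuming psql is installed and DATABASE_URL points at the Helium database:
# Show the stored auth configuration
psql "$DATABASE_URL" -c "SELECT key, content FROM application__config WHERE key = 'auth';"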
Security Considerations
⚠️ Critical Configuration Security:
- JWT Secrets: Use cryptographically secure random strings (32+ characters)
- VPN Server Token: Generate secure random tokens for server authentication
- Database Credentials: Use strong passwords and restrict database access
- SMTP Credentials: Use application-specific passwords, not primary account passwords
- OAuth Secrets: Keep OAuth client secrets secure and rotate them regularly
Troubleshooting
Common Configuration Issues
- Database Connection: Verify PostgreSQL accessibility and credentials
- Redis Connection: Check Redis server status and network connectivity
- RabbitMQ Connection: Ensure RabbitMQ server is running and accessible
- Email Delivery: Test SMTP configuration with your email provider
- OAuth Issues: Verify client IDs, secrets, and redirect URIs match provider settings
Validation Commands
# Test database connection
sqlx migrate info --database-url $DATABASE_URL
# Test Redis connection
redis-cli -u $REDIS_URL ping
# Test RabbitMQ connection
rabbitmqctl status # on RabbitMQ server
Helium CLI
The Helium CLI (helium-cli) is a comprehensive administrative tool that allows operators to:
- Initialize system configurations with default values
- Manage admin accounts (create, list, view, delete)
- Validate configuration files before deployment
- Interact with both PostgreSQL database and Redis cache
Installation
The CLI is built as part of the main Helium project. After building the project:
cargo build --release --bin helium-cli
The binary will be available at target/release/helium-cli.
Global Configuration
The CLI requires database and Redis connections to function. These can be configured via:
Environment Variables
export DATABASE_URL="postgresql://user:password@localhost/helium"
export REDIS_URL="redis://localhost:6379"
Command Line Arguments
helium-cli --database-url "postgresql://user:password@localhost/helium" \
--redis-url "redis://localhost:6379" \
<command>
Verbose Logging
Enable detailed logging for troubleshooting:
helium-cli --verbose <command>
Commands
Configuration Management
Initialize All Configurations
helium-cli init-config
This command initializes all system configurations with their default values. It:
- Creates default configurations for all modules in the database
- Updates Redis cache with the configurations
- Handles the following configuration types:
- Auth: Authentication and authorization settings
- Admin JWT: JWT configuration for admin authentication
- Telecom: Telecom service configurations
- Shop: E-commerce and shop settings
- Market: Affiliate and marketing configurations
- Mailer: SMTP and email service settings
Example Output:
Initializing 6 configuration types...
Initializing Auth config... ✓ Success
Initializing Admin JWT config... ✓ Success
Initializing Telecom config... ✓ Success
Initializing Shop config... ✓ Success
Initializing Market/Affiliate config... ✓ Success
Initializing Mailer config... ✓ Success
Configuration initialization completed:
✓ Successful: 6
Use Cases:
- Initial deployment setup
- Resetting configurations to defaults
- Disaster recovery scenarios
Validate Configuration Files
helium-cli validate-config --config-type <TYPE> <config-file.json>
Validates a JSON configuration file against the specified configuration schema.
Supported Configuration Types:
- auth - Authentication configuration
- admin-jwt / admin_jwt - Admin JWT configuration
- telecom - Telecom service configuration
- shop - Shop/e-commerce configuration
- market / affiliate - Marketing/affiliate configuration
- mailer - Email service configuration
Examples:
# Validate auth configuration
helium-cli validate-config --config-type auth auth-config.json
# Validate mailer configuration
helium-cli validate-config --config-type mailer smtp-config.json
Example Output:
✓ Configuration file is valid!
File: auth-config.json
Type: Auth
Key: auth
Admin Account Management
List Admin Accounts
helium-cli admin list [--limit <N>] [--offset <N>]
Lists all admin accounts with pagination support.
Options:
- --limit <N> - Number of results to return (default: 50)
- --offset <N> - Number of results to skip (default: 0)
Example:
# List first 10 admin accounts
helium-cli admin list --limit 10
# List admin accounts with pagination
helium-cli admin list --limit 25 --offset 50
Example Output:
Found 3 admin account(s):
ID Role Name Email Created At
------------------------------------ -------------------- ------------------------------ ------------------------------ --------------------
123e4567-e89b-12d3-a456-426614174000 Super Admin System Administrator admin@example.com 2024-01-15T10:30:00Z
234e5678-e89b-12d3-a456-426614174001 Customer Support Support Team Lead support@example.com 2024-01-16T14:20:00Z
345e6789-e89b-12d3-a456-426614174002 Moderator Content Moderator moderator@example.com 2024-01-17T09:45:00Z
Show Admin Account Details
helium-cli admin show <ADMIN_ID>
Displays detailed information about a specific admin account.
Example:
helium-cli admin show 123e4567-e89b-12d3-a456-426614174000
Example Output:
Admin Account Details:
ID: 123e4567-e89b-12d3-a456-426614174000
Name: System Administrator
Role: Super Admin
Email: admin@example.com
Avatar: https://example.com/avatar.jpg
Created At: 2024-01-15T10:30:00Z
Create Admin Account
helium-cli admin create --name <NAME> --role <ROLE> [--email <EMAIL>] [--avatar <AVATAR_URL>]
Creates a new admin account with the specified details.
Required Options:
- --name <NAME> - Display name for the admin
- --role <ROLE> - Admin role (see roles below)
Optional Options:
- --email <EMAIL> - Admin email address
- --avatar <AVATAR_URL> - URL to admin avatar image
Available Roles:
- super_admin / superadmin / super-admin - Full system access
- moderator - Content moderation privileges
- customer_support / customersupport / customer-support - Customer service access
- support_bot / supportbot / support-bot - Automated support system access
Examples:
# Create super admin
helium-cli admin create \
--name "System Administrator" \
--role super_admin \
--email "admin@example.com"
# Create customer support account
helium-cli admin create \
--name "Support Agent" \
--role customer_support \
--email "support@example.com" \
--avatar "https://example.com/avatars/support.jpg"
# Create moderator (minimal info)
helium-cli admin create \
--name "Content Moderator" \
--role moderator
Example Output:
Successfully created admin account:
ID: 456e7890-e89b-12d3-a456-426614174003
Name: System Administrator
Role: Super Admin
Email: admin@example.com
Avatar: N/A
Delete Admin Account
helium-cli admin delete <ADMIN_ID> [--yes]
Deletes an admin account after confirmation.
Options:
- --yes - Skip confirmation prompt (use with caution)
Examples:
# Delete with confirmation prompt
helium-cli admin delete 123e4567-e89b-12d3-a456-426614174000
# Delete without confirmation (automated scripts)
helium-cli admin delete 123e4567-e89b-12d3-a456-426614174000 --yes
Example Interactive Flow:
Admin account to delete:
ID: 123e4567-e89b-12d3-a456-426614174000
Name: Old Administrator
Role: Super Admin
Email: old-admin@example.com
Are you sure you want to delete this admin account? [y/N]: y
Successfully deleted admin account: 123e4567-e89b-12d3-a456-426614174000
Common Use Cases
Initial Deployment
1. Set up environment variables:
export DATABASE_URL="postgresql://helium:password@localhost/helium"
export REDIS_URL="redis://localhost:6379"
2. Initialize system configurations:
helium-cli init-config
3. Create initial super admin:
helium-cli admin create \
  --name "System Administrator" \
  --role super_admin \
  --email "admin@yourcompany.com"
Configuration Management Workflow
1. Prepare configuration file: Create a JSON file with your custom configuration.
2. Validate before deployment:
helium-cli validate-config --config-type auth ./configs/auth-config.json
3. Deploy configuration: Use the web interface or API to upload the validated configuration.
Admin Account Maintenance
1. Regular audit of admin accounts:
helium-cli admin list --limit 100
2. Create specialized support accounts:
# Customer support team
helium-cli admin create --name "Support Team A" --role customer_support
# Content moderation team
helium-cli admin create --name "Moderator Team B" --role moderator
3. Remove inactive accounts:
helium-cli admin delete <inactive-admin-id>
Error Handling
The CLI provides comprehensive error messages and logging:
- Database Connection Issues: Check DATABASE_URL and database availability
- Redis Connection Issues: Verify REDIS_URL and Redis service status
- Configuration Validation Errors: Review JSON syntax and required fields
- Admin Role Errors: Ensure role names match supported values exactly
Security Considerations
- Environment Variables: Use secure methods to set database credentials
- Admin Creation: Be selective with super_admin role assignments
- Account Deletion: Always verify admin identity before deletion
- Logging: Be aware that verbose mode may log sensitive information
Troubleshooting
Common Issues
“DATABASE_URL must be provided”
- Set the DATABASE_URL environment variable or use the --database-url flag
“Failed to connect to database”
- Verify PostgreSQL is running and accessible
- Check connection string format and credentials
- Ensure the database exists
“Invalid admin role”
- Use exact role names: super_admin, moderator, customer_support, support_bot
- Role names are case-insensitive but must match supported variants
“Configuration validation failed”
- Check JSON syntax with a JSON validator
- Ensure all required fields are present
- Verify field types match expected schema
Getting Help
Use the built-in help system:
# General help
helium-cli --help
# Command-specific help
helium-cli admin --help
helium-cli admin create --help
Integration with Deployment Scripts
The CLI is designed to work well in automated deployment scenarios:
#!/bin/bash
set -e
# Set environment
export DATABASE_URL="$HELIUM_DB_URL"
export REDIS_URL="$HELIUM_REDIS_URL"
# Initialize configurations
echo "Initializing Helium configurations..."
helium-cli init-config
# Create admin account if it doesn't exist
echo "Creating admin account..."
helium-cli admin create \
--name "$ADMIN_NAME" \
--role super_admin \
--email "$ADMIN_EMAIL" || true
echo "Deployment initialization complete!"
This CLI tool is essential for proper Helium deployment and ongoing operational management. Use it as part of your deployment automation and regular maintenance procedures.
Migrate From SS-Panel UIM
This guide walks Helium operators through migrating an existing SS-Panel UIM deployment. The migration intentionally happens in two isolated passes so you can export data from the legacy MariaDB instance without touching the new Helium PostgreSQL database until you are ready.
At a high level:
1. mariadb-pass reads all data from the SS-Panel MariaDB schema and saves it to a local rkyv archive.
2. postgre-pass consumes that rkyv archive and writes normalized data into Helium's PostgreSQL schema.
Because Helium normally targets PostgreSQL, the first pass uses a dedicated crate that bundles the MySQL client driver and builds separately from the rest of the project.
What Gets Migrated
The migration transfers the following SS-Panel data into Helium’s schema:
User Accounts
- Email and password hashes (preserved as-is for seamless login)
- User names and registration timestamps
- Last active timestamps
- Account balances (available balance for purchasing)
- Referral relationships (affiliate ref_by links)
- Traffic usage (upload/download totals)
- VMess UUIDs (for node authentication)
- Subscribe tokens (subscription links)
- Invite codes (user-specific invite codes)
Helium creates corresponding entries in:
- auth.user_account (login credentials)
- auth.user_profile (profile metadata)
- shop.user_balance (financial data)
- market.affiliate_user_policy (referral relationships)
- telecom.user_nodes_token (node authentication tokens)
Products → Packages
SS-Panel products are converted to Helium packages with:
- Package name
- Price
- Duration (time allowance in days)
- Bandwidth quota
These populate the telecom.package table.
Orders → Package Queues
Historical purchase orders are replayed into Helium’s package queue system:
- Order status (activated vs. pending)
- Creation and update timestamps
- Associated product/package
Orders are inserted into telecom.package_queue to preserve user entitlements and purchase history.
Nodes → Node Servers & Clients
SS-Panel nodes are split into two Helium entities:
- Node servers (telecom.node_server): server address, rate, class
- Node clients (telecom.node_client): protocol configurations (VMess, WebSocket, gRPC)
Each node’s custom configuration (ports, security, network transport) is normalized to Helium’s node client schema.
Data Not Migrated
The following SS-Panel data is not migrated:
- Invoices (read but not written to Helium)
- Payback records (read but not written)
- Admin accounts (must be created manually via helium-cli)
- System configurations (initialize via helium-cli init-config)
- Announcements and tickets (start fresh in Helium)
Prerequisites
- SS-Panel UIM running on MariaDB (or MySQL-compatible) that you can access in read-only mode during export.
- A ready Helium PostgreSQL database with migrations applied and no production users yet. Run sqlx migrate run before importing.
- Adequate disk space wherever you write the rkyv archive. Expect several hundred megabytes for large installs.
- Rust toolchain (same as Helium) and network access to both databases from the machine performing the migration.
- Optional: a safe location (e.g., object storage) to back up the generated rkyv file.
Pass 1 – Export From SS-Panel (MariaDB)
The exporter lives in ssp-migrator/mariadb-pass and is compiled with SQLx’s MySQL feature set. Build and run it separately from the main server binaries.
Build the exporter
mariadb-pass uses SQLx’s compile-time query checking. The workspace ships with .sqlx caches for PostgreSQL only, so generic commands such as cargo build --release -p mariadb-pass will fail. You must compile from the crate directory with access to a live SS-Panel database (or export SQLx metadata for MariaDB manually).
cd ssp-migrator/mariadb-pass
SQLX_OFFLINE=false DATABASE_URL="$SSP_DATABASE_URL" cargo build --release
The DATABASE_URL environment variable is required during compilation so SQLx can introspect the MariaDB schema. If you cannot open a direct connection from the build host, generate SQLx data offline with sqlx prepare against MariaDB and commit it alongside the crate before building.
Prepare connection settings
You can pass the database URL directly on the command line or export it as an environment variable. A typical MariaDB connection string looks like:
export SSP_DATABASE_URL="mysql://user:password@legacy-host:3306/sspanel"
Run the exporter
cd ssp-migrator/mariadb-pass
SQLX_OFFLINE=false DATABASE_URL="$SSP_DATABASE_URL" cargo run --release -- \
--database-url "$SSP_DATABASE_URL" \
--output-file /tmp/helium-migration.rkyv
The command performs several steps internally:
- Streams each SS-Panel entity (users, products, orders, nodes, etc.) in batches.
- Normalizes relationships to Helium’s intermediate structs.
- Serializes the result to an rkyv archive (default name migration_data.rkyv).
Monitor the logs for warnings about rows that cannot be converted. The exporter skips invalid records but continues processing.
When the run finishes you should have an archive file similar to /tmp/helium-migration.rkyv. Back it up before moving on.
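One way to keep a safe copy is to upload the archive to object storage before running the import. A sketch, assuming the AWS CLI is installed; the bucket name is purely illustrative:
# Copy the export archive to a backup bucket (bucket name is a placeholder)
aws s3 cp /tmp/helium-migration.rkyv s3://your-backup-bucket/helium-migration.rkyv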
Pass 2 – Import Into Helium (PostgreSQL)
The importer lives in ssp-migrator/postgre-pass and understands Helium’s canonical schema. Ensure the target PostgreSQL database is empty or freshly provisioned to avoid collisions.
Build the importer
cargo build --release -p postgre-pass
This binary only links the PostgreSQL driver, so it compiles with the same workspace settings as other Helium components.
Prepare connection settings
export HELIUM_DATABASE_URL="postgres://helium:password@new-host:5432/helium_db"
Run the importer
cargo run --release -p postgre-pass -- \
--rkyv-file /tmp/helium-migration.rkyv \
--database-url "$HELIUM_DATABASE_URL"
The importer performs conversions aligned with Helium’s modules:
- Inserts node servers and clients in the correct dependency order.
- Creates packages, affiliate policies, balances, and user accounts.
- Replays historical purchases into the package queue so users retain entitlements.
If anything fails, no partial state is left behind—each insert group is committed in dependency order. Fix the reported data issue, rebuild the rkyv archive if necessary, and rerun the importer.
Post-migration Checklist
- Confirm the importer logs Migration completed successfully.
- Inspect a handful of migrated users in Helium's admin tools (profiles, balances, active packages); see the spot-check queries after this list.
- Verify node configurations in telecom match the expected SS-Panel node inventory.
- Schedule DNS cutover and client config updates after validating the new deployment.
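For a quick sanity check, row counts in a few of the target tables can be compared against the SS-Panel source. A sketch using psql and the table names listed earlier in this guide:
# Spot-check migrated row counts in the Helium database
psql "$HELIUM_DATABASE_URL" -c "SELECT count(*) FROM auth.user_account;"
psql "$HELIUM_DATABASE_URL" -c "SELECT count(*) FROM telecom.package;"
psql "$HELIUM_DATABASE_URL" -c "SELECT count(*) FROM telecom.package_queue;"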
Troubleshooting
- MariaDB TLS or authentication errors: confirm the MariaDB driver accepts your certificates or append parameters (e.g., ?ssl-mode=REQUIRED).
- Missing subscribe links or invite codes: the exporter requires these tables to be populated for each user. Reconcile data in SS-Panel before exporting.
- Importer stops on unique constraint violations: verify the PostgreSQL database is clean. Drop and recreate the schema, then rerun the importer.
- Large datasets: run the exporter on a machine close to the database to reduce latency. You can copy the resulting rkyv file to the environment where the importer runs.
With both passes complete, Helium now has a faithful copy of the SS-Panel data and you can proceed with normal deployment and cutover activities.