Observability

This document describes the observability features implemented in the telecom module. These features enable users and administrators to monitor system performance, track usage, and maintain visibility into the health of the telecom infrastructure.

User Observability Features

The telecom module provides several APIs that allow users to monitor their usage and the status of their assigned nodes.

Traffic Usage Tracking

The AnalysisService exposes per-user traffic monitoring through the GetRecentTrafficUsage API.

API Endpoint

gRPC Service: Telecom.GetRecentTrafficUsage
Request: GetRecentTrafficUsageRequest
Response: RecentTrafficUsageResponse

Implementation Details

The service returns traffic data for one of three time ranges, each with a fixed bucket size:

Day: Hourly bucketing for the last 24 hours
Week: Daily bucketing for the last 7 days
Month: Daily bucketing for the last 30 days
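The range-to-bucket mapping above can be sketched as follows. This is a minimal illustration, not the service's actual code: the `RecentRange` enum here is a local stand-in for the one in `crate::services::analysis`, and `bucket_plan`, `bucket_starts`, and the alignment of buckets to hour/day boundaries are assumptions.

```rust
/// Local stand-in for the service's `RecentRange` (assumed to have these variants).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RecentRange {
    Day,
    Week,
    Month,
}

const HOUR_SECS: u64 = 3_600;
const DAY_SECS: u64 = 24 * HOUR_SECS;

/// Returns (bucket width in seconds, number of buckets) for a range.
pub fn bucket_plan(range: RecentRange) -> (u64, u64) {
    match range {
        RecentRange::Day => (HOUR_SECS, 24),
        RecentRange::Week => (DAY_SECS, 7),
        RecentRange::Month => (DAY_SECS, 30),
    }
}

/// Bucket start timestamps (Unix seconds, oldest first), ending with the
/// bucket that contains `now`. Boundary alignment is an assumption.
pub fn bucket_starts(range: RecentRange, now: u64) -> Vec<u64> {
    let (width, count) = bucket_plan(range);
    let current = now - now % width; // align down to the bucket boundary
    (0..count)
        .rev()
        .map(|i| current.saturating_sub(i * width))
        .collect()
}
```

A client that renders a chart could use the same plan to label its axis, so that empty buckets still appear in order.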

Key Components:

Raw Traffic: Actual traffic consumed by the user
Billed Traffic: Traffic charged against the user’s quota (may differ from raw traffic because of traffic multipliers)
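The relationship between the two figures can be illustrated with a small sketch. It assumes the multiplier is a per-node floating-point traffic factor and that billed bytes are rounded to the nearest byte; neither detail is confirmed here, and `billed_bytes` and `effective_multiplier` are hypothetical helper names.

```rust
/// Billed bytes for `raw_bytes` of traffic under a given traffic factor.
/// Rounding to the nearest byte is an assumption of this sketch.
pub fn billed_bytes(raw_bytes: u64, traffic_factor: f64) -> u64 {
    (raw_bytes as f64 * traffic_factor).round() as u64
}

/// Effective multiplier observed over a period, or `None` when there was
/// no raw traffic to compare against.
pub fn effective_multiplier(raw: u64, billed: u64) -> Option<f64> {
    if raw == 0 {
        None
    } else {
        Some(billed as f64 / raw as f64)
    }
}
```

Comparing the raw and billed series this way shows how heavily a user relies on nodes with a factor above or below 1.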

// Usage example in service layer
use crate::services::analysis::{AnalysisService, GetRecentTrafficUsage, RecentRange};

let usage_response = analysis_service.process(GetRecentTrafficUsage {
    user_id,
    range: RecentRange::Day,  // or Week, Month
}).await?;

// Response contains two data sets:
// - usage_response.raw: actual traffic consumed
// - usage_response.actually_billed: traffic charged to quota

Database Queries Used:

GetUserHourlyUsage: For day-range queries
GetUserDailyUsage: For week/month-range queries

Node Status History

Users can monitor the historical status of their assigned proxy nodes through the ListNodeStatusHistory API.

API Endpoint

gRPC Service: Telecom.ListNodeStatusHistory
Request: ListNodeStatusHistoryRequest
Response: ListNodeStatusHistoryReply

Implementation Details

The API provides hourly aggregated node status information:

Online Nodes: Count of nodes that were online in each hour
Offline Nodes: Count of nodes that were offline in each hour
Maintenance Nodes: Count of nodes under maintenance in each hour

// Usage example
use crate::services::analysis::{AnalysisService, ListUserNodeStatusHistory};

let history = analysis_service.process(ListUserNodeStatusHistory {
    start: start_time,
    end: end_time,
    user_id,
}).await?;

// Each history entry contains:
// - bucket_start: timestamp of the hour
// - online_nodes, offline_nodes, maintenance_nodes: counts for that hour

Data Source: Uses materialized view node_status_hourly_mv for efficient querying.
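As an illustration of how a client might consume these hourly counts, the sketch below derives an availability ratio. The `HourlyStatus` struct mirrors the fields listed above but is a local stand-in (the real reply type comes from the gRPC schema), and excluding maintenance hours from the ratio is a design assumption.

```rust
/// Local mirror of one history entry; field types are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct HourlyStatus {
    pub bucket_start: u64, // Unix seconds at the start of the hour
    pub online_nodes: u32,
    pub offline_nodes: u32,
    pub maintenance_nodes: u32,
}

/// Fraction of node-hours spent online, ignoring maintenance.
/// Returns `None` when there are no online or offline node-hours to count.
pub fn availability(history: &[HourlyStatus]) -> Option<f64> {
    let online: u64 = history.iter().map(|h| u64::from(h.online_nodes)).sum();
    let offline: u64 = history.iter().map(|h| u64::from(h.offline_nodes)).sum();
    let total = online + offline;
    if total == 0 {
        None
    } else {
        Some(online as f64 / total as f64)
    }
}
```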

Node Usage Analytics

The system tracks which nodes users utilize most frequently through the ListUsuallyUsedNodes API.

API Endpoint

gRPC Service: Telecom.ListUsuallyUsedNodes
Request: ListUsuallyUsedNodesRequest
Response: ListUsuallyUsedNodesResponse

Implementation Details

The API reports a user’s node usage patterns:

Node Information: ID, name of frequently used nodes
Traffic Statistics: Upload, download, and billed traffic per node

// Usage example
use crate::services::analysis::{AnalysisService, ListUserUsuallyUsedNodes};

let nodes = analysis_service.process(ListUserUsuallyUsedNodes {
    user_id,
}).await?;

// Each node entry contains:
// - node_client_id, node_name: identification
// - upload, download, billed_traffic: usage statistics
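One way to use these entries is to rank nodes by billed traffic. The sketch below assumes a local `NodeUsage` struct with the fields named above (the field types are guesses) and a hypothetical helper `top_by_billed`; ties are broken by node ID so the ordering is deterministic.

```rust
/// Local mirror of one usage entry; field types are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct NodeUsage {
    pub node_client_id: i32,
    pub node_name: String,
    pub upload: u64,
    pub download: u64,
    pub billed_traffic: u64,
}

/// Returns the `n` entries with the highest billed traffic, highest first.
pub fn top_by_billed(mut entries: Vec<NodeUsage>, n: usize) -> Vec<NodeUsage> {
    entries.sort_by(|a, b| {
        b.billed_traffic
            .cmp(&a.billed_traffic) // descending by billed traffic
            .then(a.node_client_id.cmp(&b.node_client_id)) // stable tie-break
    });
    entries.truncate(n);
    entries
}
```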

Node List and Status

Users can view their available nodes and their current status through the ListNodes API.

API Endpoint

gRPC Service: Telecom.ListNodes
Request: ListNodesRequest
Response: ListNodesReply

Implementation Details

The API returns real-time information about a user’s assigned nodes:

Node Details: ID, name, traffic factor, display order
Performance Info: Speed limits, current status
Metadata: Country, location, route class

Administrator Observability Features

Node Server Management

List All Node Servers

gRPC Service: NodeServerManage.ListNodeServers
Features:
- Filter by server status (Online/Offline/Maintenance)
- Pagination support (limit/offset)
- Shows server compatibility, status, last online time, and client count

// Usage example in manage service
use crate::services::manage::{AdminListServers, ManageService};

let servers = manage_service.process(AdminListServers {
    limit: 50,
    offset: 0,
    filter_status: Some(NodeServerStatus::Offline), // Optional filter
}).await?;
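Because the listing is paginated with limit/offset, a caller that needs every server has to walk the pages. The sketch below shows one way to do that. It is synchronous and generic so it can stand alone; `collect_all` and the `fetch_page` callback are hypothetical, with each callback invocation standing in for one `AdminListServers` call.

```rust
/// Collects all items by requesting pages until a short page comes back.
/// `fetch_page(limit, offset)` stands in for one paginated request.
pub fn collect_all<T>(
    limit: usize,
    mut fetch_page: impl FnMut(usize, usize) -> Vec<T>,
) -> Vec<T> {
    assert!(limit > 0, "limit must be positive");
    let mut all = Vec::new();
    let mut offset = 0;
    loop {
        let page = fetch_page(limit, offset);
        let received = page.len();
        all.extend(page);
        if received < limit {
            return all; // a short or empty page means the listing is exhausted
        }
        offset += received;
    }
}
```

Offset pagination can skip or repeat rows if servers are added or removed between requests, so a result gathered this way is best treated as a snapshot.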

Show Individual Server Details

gRPC Service: NodeServerManage.ShowNodeServer
Features:
- Complete server configuration
- Current status and performance metrics
- Last online timestamp

Node Client Management

List All Node Clients

Comprehensive view of all proxy node clients.

gRPC Service: NodeClientManage.ListNodeClients
Features:
- Complete client information including server relationships
- Traffic factors and routing configurations
- Status monitoring and metadata

Individual Client Details

gRPC Service: NodeClientManage.ShowNodeClient
Features:
- Detailed client configuration
- Associated server information
- Performance and status metrics

Package Queue Monitoring

Queue Statistics

Monitor package queue health and performance.

gRPC Service: PackageQueueManage.CountQueuedPackages
Features:
- Count of packages by series
- Queue status overview
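The per-series count can be pictured as a simple grouping. The sketch below is an in-memory illustration only: `QueuedPackage` and its fields are hypothetical, and the real service presumably computes the counts in the database.

```rust
use std::collections::BTreeMap;

/// Hypothetical minimal view of a queued package.
#[derive(Debug, Clone)]
pub struct QueuedPackage {
    pub series: String,
    pub user_id: u64,
}

/// Counts queued packages per series, ordered by series name.
pub fn count_by_series(packages: &[QueuedPackage]) -> BTreeMap<String, usize> {
    let mut counts = BTreeMap::new();
    for package in packages {
        *counts.entry(package.series.clone()).or_insert(0) += 1;
    }
    counts
}
```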

Package List Management

gRPC Service: PackageQueueManage.ListQueuedPackages
Features:
- Filter by user, order, package, or status
- Pagination support
- Complete package lifecycle visibility

Background Job Monitoring

The telecom module runs several scheduled jobs for system maintenance and monitoring:

Node Health Monitoring (`RefreshServerStatus`)

Purpose: Automatically mark servers as online/offline based on heartbeat
Frequency: Configurable via TelecomConfig.node_health_check.offline_timeout
Implementation: TelecomCronExecutor in cron.rs
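The heartbeat rule can be sketched as a pure function. This is an illustration under assumptions, not the code in cron.rs: the `NodeServerStatus` enum is a local stand-in, `refreshed_status` is a hypothetical name, and leaving servers in maintenance untouched is a guess about the intended behavior.

```rust
use std::time::Duration;

/// Local stand-in for the server status values named in this document.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NodeServerStatus {
    Online,
    Offline,
    Maintenance,
}

/// Derives the new status from the time elapsed since the last heartbeat.
/// `None` means the server has never reported a heartbeat.
pub fn refreshed_status(
    current: NodeServerStatus,
    since_last_heartbeat: Option<Duration>,
    offline_timeout: Duration,
) -> NodeServerStatus {
    match (current, since_last_heartbeat) {
        // Assumption: maintenance is set by operators and is never overridden.
        (NodeServerStatus::Maintenance, _) => NodeServerStatus::Maintenance,
        (_, Some(age)) if age <= offline_timeout => NodeServerStatus::Online,
        _ => NodeServerStatus::Offline,
    }
}
```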

Package Expiration Management (`PackageExpiringJob`)

Purpose: Automatically expire packages based on time limits
Frequency: Regular scanning for expired packages
Events: Publishes PackageExpiringEvent to message queue
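A scan of this kind might look like the sketch below. Everything in it is hypothetical: `ActivePackage`, the `expires_at` field, and the `PackageExpiring` payload are invented for illustration, since the fields of the real `PackageExpiringEvent` are not described here.

```rust
/// Hypothetical view of a package that is currently in use.
#[derive(Debug, Clone, PartialEq)]
pub struct ActivePackage {
    pub id: u64,
    pub user_id: u64,
    pub expires_at: u64, // Unix seconds
}

/// Hypothetical payload; the real `PackageExpiringEvent` may differ.
#[derive(Debug, Clone, PartialEq)]
pub struct PackageExpiring {
    pub package_id: u64,
    pub user_id: u64,
}

/// Returns one event per package whose time limit has passed at `now`.
pub fn scan_expired(packages: &[ActivePackage], now: u64) -> Vec<PackageExpiring> {
    packages
        .iter()
        .filter(|p| p.expires_at <= now)
        .map(|p| PackageExpiring { package_id: p.id, user_id: p.user_id })
        .collect()
}
```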

Traffic Billing Processing (`BillTrafficJob`)

Purpose: Process unbilled traffic usage and publish billing events
Frequency: Regular processing of accumulated traffic data
Events: Publishes UserUsageBillingEvent for each user
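The accumulation step can be sketched as a grouping over unbilled records. The record shape and the choice to group by user and server are assumptions made for illustration; `UnbilledUsage` and `billing_totals` are hypothetical names.

```rust
use std::collections::BTreeMap;

/// Hypothetical shape of one unbilled usage record.
#[derive(Debug, Clone)]
pub struct UnbilledUsage {
    pub user_id: u64,
    pub server_id: i32,
    pub billed_download: i64,
    pub billed_upload: i64,
}

/// Sums (download, upload) per (user, server). Each map entry would
/// correspond to one published billing event.
pub fn billing_totals(records: &[UnbilledUsage]) -> BTreeMap<(u64, i32), (i64, i64)> {
    let mut totals = BTreeMap::new();
    for record in records {
        let entry = totals
            .entry((record.user_id, record.server_id))
            .or_insert((0i64, 0i64));
        entry.0 += record.billed_download;
        entry.1 += record.billed_upload;
    }
    totals
}
```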

Node Status History Recording (`RecordNodeStatusHistoryJob`)

Purpose: Record current status of all node servers for historical analysis
Frequency: Hourly status snapshots
Storage: Populates node_status_history table

Status View Refresh (`RefreshNodeStatusViewJob`)

Purpose: Refresh the materialized view for efficient status queries
Frequency: Regular refresh of node_status_hourly_mv
Optimization: Includes data cleanup and analysis for performance

`UserUsageBillingEvent`

Publisher: External systems (cron jobs, usage collectors)
Consumer: TelecomBillingHook
Route: telecom.user_usage_billing
Purpose: Record user traffic consumption and trigger package expiration

// Event structure
pub struct UserUsageBillingEvent {
    pub server_id: i32,
    pub user: Uuid,
    pub billed_download: i64,
    pub billed_upload: i64,
    pub time: u64,
}
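To illustrate the stated purpose of the consumer (recording consumption and triggering expiration), the sketch below applies one event to a quota. It is not the `TelecomBillingHook` implementation: the `Quota` and `Outcome` types and `apply_billing` are hypothetical, and the event struct is mirrored locally with `u128` standing in for `Uuid` so the example needs only the standard library.

```rust
/// Local mirror of the event, with `u128` standing in for `Uuid`.
#[derive(Debug, Clone, PartialEq)]
pub struct UserUsageBillingEvent {
    pub server_id: i32,
    pub user: u128,
    pub billed_download: i64,
    pub billed_upload: i64,
    pub time: u64,
}

/// Hypothetical per-user quota state.
#[derive(Debug, PartialEq)]
pub struct Quota {
    pub remaining_bytes: i64,
}

/// Hypothetical result of applying an event.
#[derive(Debug, PartialEq)]
pub enum Outcome {
    Active,
    Exhausted, // the caller would trigger package expiration here
}

/// Deducts billed traffic from the quota, clamping at zero.
pub fn apply_billing(quota: &mut Quota, event: &UserUsageBillingEvent) -> Outcome {
    let billed = event.billed_download.saturating_add(event.billed_upload);
    quota.remaining_bytes = quota.remaining_bytes.saturating_sub(billed).max(0);
    if quota.remaining_bytes == 0 {
        Outcome::Exhausted
    } else {
        Outcome::Active
    }
}
```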

`PackageExpiringEvent`

Publisher: Telecom billing system
Route: Package expiration processing
Purpose: Handle package lifecycle events

Tracing and Instrumentation

All RPC endpoints and critical services are instrumented for tracing:

Instrumentation: Uses tracing::instrument for observability
Error Logging: Structured error reporting with context
Performance Tracking: Request/response times and error rates

`node_status_history`

Stores status snapshots of node servers:

- id: Primary key
- node_server_id: Reference to node server
- status: Online/Offline/Maintenance
- created_at: Timestamp

`user_package_usage`

Tracks user traffic consumption:

- Hourly and daily aggregations
- Raw and billed traffic separation
- User and server associations

Materialized Views

`node_status_hourly_mv`

Optimized view for status history queries:

Hourly aggregations of node status
Efficient querying for analytics
Automatic refresh via cron jobs
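The hourly aggregation can be pictured with an in-memory equivalent. This sketch is not the view's SQL definition, which is not shown here; it assumes one status row per server per snapshot, uses Unix seconds for `created_at`, and defines `Status`, `StatusRow`, and `HourlyCounts` as local stand-ins.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Online,
    Offline,
    Maintenance,
}

/// Local stand-in for one row of the status history table.
#[derive(Debug, Clone)]
pub struct StatusRow {
    pub node_server_id: i32,
    pub status: Status,
    pub created_at: u64, // Unix seconds
}

#[derive(Debug, Default, Clone, PartialEq, Eq)]
pub struct HourlyCounts {
    pub online_nodes: u32,
    pub offline_nodes: u32,
    pub maintenance_nodes: u32,
}

/// Groups rows by the hour they fall in and counts each status.
/// Keys are hour start timestamps in ascending order.
pub fn aggregate_hourly(rows: &[StatusRow]) -> BTreeMap<u64, HourlyCounts> {
    let mut buckets: BTreeMap<u64, HourlyCounts> = BTreeMap::new();
    for row in rows {
        let bucket_start = row.created_at - row.created_at % 3_600;
        let counts = buckets.entry(bucket_start).or_default();
        match row.status {
            Status::Online => counts.online_nodes += 1,
            Status::Offline => counts.offline_nodes += 1,
            Status::Maintenance => counts.maintenance_nodes += 1,
        }
    }
    buckets
}
```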

Usage Guidelines

For Users

Use GetRecentTrafficUsage to monitor bandwidth consumption
Check ListNodeStatusHistory for node availability patterns
Analyze ListUsuallyUsedNodes to optimize node selection
Monitor ListNodes for real-time node status

For Administrators

Use server management APIs to monitor infrastructure health
Monitor package queues for system performance
Review cron job logs for automated maintenance status
Analyze event streams for system-wide observability

Development Considerations

All APIs follow the Processor pattern
Database connections use owned types, not static lifetimes
Comprehensive error handling with structured logging
Event-driven architecture for scalable monitoring
Materialized views for performance-critical queries

Together, these features give users and administrators visibility into the telecom system’s operation, so both can monitor, analyze, and optimize the service.

