Observability

This document describes the observability features implemented in the telecom module. These features enable users and administrators to monitor system performance, track usage, and maintain visibility into the health of the telecom infrastructure.

User Observability Features

The telecom module provides several APIs that allow users to monitor their usage and the status of their assigned nodes.

Traffic Usage Tracking

The AnalysisService exposes per-user traffic monitoring through the GetRecentTrafficUsage API.

API Endpoint

  • gRPC Service: Telecom.GetRecentTrafficUsage
  • Request: GetRecentTrafficUsageRequest
  • Response: RecentTrafficUsageResponse

Implementation Details

The service returns traffic data for three time ranges, each with its own bucket size:

  • Day: Hourly bucketing for the last 24 hours
  • Week: Daily bucketing for the last 7 days
  • Month: Daily bucketing for the last 30 days

Key Components:

  • Raw Traffic: Actual traffic consumed by the user
  • Billed Traffic: Traffic that was actually charged to the user’s quota (may differ due to traffic multipliers)

// Usage example in service layer
use crate::services::analysis::{AnalysisService, GetRecentTrafficUsage, RecentRange};

let usage_response = analysis_service.process(GetRecentTrafficUsage {
    user_id,
    range: RecentRange::Day, // or Week, Month
}).await?;

// Response contains two data sets:
// - usage_response.raw: actual traffic consumed
// - usage_response.actually_billed: traffic charged to quota
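
To illustrate why the raw and billed series can diverge, here is a minimal sketch that assumes billed traffic is raw traffic scaled by a per-node traffic factor. The TrafficSample type, the billed_bytes helper, and the percent encoding of the factor are all hypothetical; the actual billing rule is not shown in this document.

```rust
/// Raw traffic counters, in bytes, for one time bucket (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct TrafficSample {
    pub upload: u64,
    pub download: u64,
}

/// Hypothetical helper: scales raw traffic by a node's traffic factor.
/// The factor is given in hundredths (150 means 1.5x) so the math stays in integers.
pub fn billed_bytes(raw: TrafficSample, factor_percent: u64) -> u64 {
    let raw_total = raw.upload.saturating_add(raw.download);
    // Saturate instead of overflowing on very large counters.
    raw_total.saturating_mul(factor_percent) / 100
}
```

With a factor of 100 the two series are identical; any other factor makes the billed series differ from the raw one by that ratio.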

Database Queries Used:

  • GetUserHourlyUsage: For day-range queries
  • GetUserDailyUsage: For week/month-range queries
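
The mapping from time range to bucket size, bucket count, and query can be summarized in code. This is only a sketch: RecentRange here is a local stand-in for the enum in crate::services::analysis, and BucketPlan and bucket_plan are hypothetical names used for illustration.

```rust
use std::time::Duration;

/// Local stand-in for the RecentRange enum used by the service.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RecentRange {
    Day,
    Week,
    Month,
}

/// Hypothetical summary of how one range is bucketed and queried.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct BucketPlan {
    pub bucket: Duration,
    pub buckets: u32,
    pub query: &'static str,
}

/// Returns the bucketing described above for each range.
pub fn bucket_plan(range: RecentRange) -> BucketPlan {
    const HOUR: Duration = Duration::from_secs(60 * 60);
    const DAY: Duration = Duration::from_secs(24 * 60 * 60);
    match range {
        // Last 24 hours, one bucket per hour.
        RecentRange::Day => BucketPlan { bucket: HOUR, buckets: 24, query: "GetUserHourlyUsage" },
        // Last 7 days, one bucket per day.
        RecentRange::Week => BucketPlan { bucket: DAY, buckets: 7, query: "GetUserDailyUsage" },
        // Last 30 days, one bucket per day.
        RecentRange::Month => BucketPlan { bucket: DAY, buckets: 30, query: "GetUserDailyUsage" },
    }
}
```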

Node Status History

Users can monitor the historical status of their assigned proxy nodes through the ListNodeStatusHistory API.

API Endpoint

  • gRPC Service: Telecom.ListNodeStatusHistory
  • Request: ListNodeStatusHistoryRequest
  • Response: ListNodeStatusHistoryReply

Implementation Details

The API provides hourly aggregated node status information:

  • Online Nodes: Count of nodes that were online in each hour
  • Offline Nodes: Count of nodes that were offline in each hour
  • Maintenance Nodes: Count of nodes under maintenance in each hour

// Usage example
use crate::services::analysis::{AnalysisService, ListUserNodeStatusHistory};

let history = analysis_service.process(ListUserNodeStatusHistory {
    start: start_time,
    end: end_time,
    user_id,
}).await?;

// Each history entry contains:
// - bucket_start: timestamp of the hour
// - online_nodes, offline_nodes, maintenance_nodes: counts for that hour

Data Source: Uses materialized view node_status_hourly_mv for efficient querying.
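
The hourly aggregation can be pictured as follows. This sketch works on an in-memory slice rather than the materialized view, and it assumes each sample is one status snapshot of one node, so counting samples per hour equals counting nodes. NodeStatus, HourlyCounts, and aggregate_hourly are hypothetical names.

```rust
use std::collections::BTreeMap;

/// Local stand-in for the node status values.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NodeStatus {
    Online,
    Offline,
    Maintenance,
}

/// Per-hour counts, mirroring the fields of a history entry.
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
pub struct HourlyCounts {
    pub online_nodes: u32,
    pub offline_nodes: u32,
    pub maintenance_nodes: u32,
}

/// Groups (unix_seconds, status) samples into hour buckets keyed by bucket start.
pub fn aggregate_hourly(samples: &[(u64, NodeStatus)]) -> BTreeMap<u64, HourlyCounts> {
    let mut buckets: BTreeMap<u64, HourlyCounts> = BTreeMap::new();
    for &(ts, status) in samples {
        // Round the timestamp down to the start of its hour.
        let bucket_start = ts - ts % 3_600;
        let counts = buckets.entry(bucket_start).or_default();
        match status {
            NodeStatus::Online => counts.online_nodes += 1,
            NodeStatus::Offline => counts.offline_nodes += 1,
            NodeStatus::Maintenance => counts.maintenance_nodes += 1,
        }
    }
    buckets
}
```

A BTreeMap keeps the buckets ordered by bucket_start, which matches how a time series is usually rendered.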

Node Usage Analytics

The system tracks which nodes each user uses most often and exposes the result through the ListUsuallyUsedNodes API.

API Endpoint

  • gRPC Service: Telecom.ListUsuallyUsedNodes
  • Request: ListUsuallyUsedNodesRequest
  • Response: ListUsuallyUsedNodesResponse

Implementation Details

Returns analytics on a user’s node usage patterns:

  • Node Information: ID and name of frequently used nodes
  • Traffic Statistics: Upload, download, and billed traffic per node

// Usage example
use crate::services::analysis::{AnalysisService, ListUserUsuallyUsedNodes};

let nodes = analysis_service.process(ListUserUsuallyUsedNodes {
    user_id,
}).await?;

// Each node entry contains:
// - node_client_id, node_name: identification
// - upload, download, billed_traffic: usage statistics
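
A client might rank the returned entries to pick preferred nodes. The sketch below reuses the field names listed above, but the field types, the NodeUsage struct, and the top_nodes helper are assumptions made for illustration.

```rust
use std::cmp::Reverse;

/// Hypothetical shape of one entry; field names follow the response description.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct NodeUsage {
    pub node_client_id: i32,
    pub node_name: String,
    pub upload: u64,
    pub download: u64,
    pub billed_traffic: u64,
}

/// Returns the `n` entries with the highest billed traffic, largest first.
pub fn top_nodes(mut nodes: Vec<NodeUsage>, n: usize) -> Vec<NodeUsage> {
    // sort_by_key is stable, so ties keep their original order.
    nodes.sort_by_key(|node| Reverse(node.billed_traffic));
    nodes.truncate(n);
    nodes
}
```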

Node List and Status

Users can view their available nodes and each node’s current status through the ListNodes API.

API Endpoint

  • gRPC Service: Telecom.ListNodes
  • Request: ListNodesRequest
  • Response: ListNodesReply

Implementation Details

Returns real-time information about a user’s assigned nodes:

  • Node Details: ID, name, traffic factor, display order
  • Performance Info: Speed limits, current status
  • Metadata: Country, location, route class

Admin Observability Features

Administrators have access to comprehensive monitoring and management capabilities for the entire telecom infrastructure.

Server Monitoring

List Node Servers

Admins can monitor all proxy servers in the system.

  • gRPC Service: NodeServerManage.ListNodeServers
  • Features:
    • Filter by server status (Online/Offline/Maintenance)
    • Pagination support (limit/offset)
    • Shows server compatibility, status, last online time, and client count

// Usage example in manage service
// (NodeServerStatus must also be in scope; its module path is not shown here)
use crate::services::manage::{AdminListServers, ManageService};

let servers = manage_service.process(AdminListServers {
    limit: 50,
    offset: 0,
    filter_status: Some(NodeServerStatus::Offline), // Optional filter
}).await?;
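
Because the listing is paginated with limit/offset, reading every server means looping until a short page comes back. The sketch below shows that loop in synchronous form with a caller-supplied closure standing in for the AdminListServers request; collect_all_pages is a hypothetical helper, and the real call is async.

```rust
/// Fetches every page from a limit/offset style listing.
/// `fetch_page(limit, offset)` is a stand-in for the real list request.
pub fn collect_all_pages<T, E>(
    limit: usize,
    mut fetch_page: impl FnMut(usize, usize) -> Result<Vec<T>, E>,
) -> Result<Vec<T>, E> {
    let mut all = Vec::new();
    let mut offset = 0;
    loop {
        let page = fetch_page(limit, offset)?;
        let received = page.len();
        all.extend(page);
        // An empty or short page means the listing is exhausted.
        if received == 0 || received < limit {
            return Ok(all);
        }
        offset += received;
    }
}
```

When the total is an exact multiple of the limit, the loop makes one extra request that returns an empty page. That is the usual cost of offset pagination without a total count.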

Show Individual Server Details

  • gRPC Service: NodeServerManage.ShowNodeServer
  • Features:
    • Complete server configuration
    • Current status and performance metrics
    • Last online timestamp

Node Client Management

List All Node Clients

Provides a comprehensive view of all proxy node clients.

  • gRPC Service: NodeClientManage.ListNodeClients
  • Features:
    • Complete client information including server relationships
    • Traffic factors and routing configurations
    • Status monitoring and metadata

Individual Client Details

  • gRPC Service: NodeClientManage.ShowNodeClient
  • Features:
    • Detailed client configuration
    • Associated server information
    • Performance and status metrics

Package Queue Monitoring

Queue Statistics

Monitor package queue health and performance.

  • gRPC Service: PackageQueueManage.CountQueuedPackages
  • Features:
    • Count of packages by series
    • Queue status overview
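
Counting packages by series amounts to a group-and-count over the queue. The sketch below does this in memory. The QueuedPackage struct, its fields, and count_by_series are hypothetical, and the real service presumably computes the counts in the database.

```rust
use std::collections::BTreeMap;

/// Hypothetical minimal view of a queued package.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct QueuedPackage {
    pub id: u64,
    pub series: String,
}

/// Counts queued packages per series, ordered by series name.
pub fn count_by_series(packages: &[QueuedPackage]) -> BTreeMap<&str, usize> {
    let mut counts = BTreeMap::new();
    for package in packages {
        *counts.entry(package.series.as_str()).or_insert(0) += 1;
    }
    counts
}
```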

Package List Management

  • gRPC Service: PackageQueueManage.ListQueuedPackages
  • Features:
    • Filter by user, order, package, or status
    • Pagination support
    • Complete package lifecycle visibility

Background Job Monitoring

The telecom module runs several scheduled jobs for system maintenance and monitoring:

Node Health Monitoring (RefreshServerStatus)

  • Purpose: Automatically mark servers as online/offline based on heartbeat
  • Frequency: Configurable via TelecomConfig.node_health_check.offline_timeout
  • Implementation: TelecomCronExecutor in cron.rs
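
The heartbeat rule can be sketched as a pure function. It assumes offline_timeout is the maximum allowed time since the last heartbeat and that servers under maintenance are left untouched; both are assumptions, and refreshed_status and the local NodeServerStatus enum are hypothetical stand-ins for the logic in cron.rs.

```rust
use std::time::Duration;

/// Local stand-in for the server status values.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NodeServerStatus {
    Online,
    Offline,
    Maintenance,
}

/// Decides the status a server should have after a health check.
pub fn refreshed_status(
    current: NodeServerStatus,
    since_last_heartbeat: Duration,
    offline_timeout: Duration,
) -> NodeServerStatus {
    match current {
        // Assumption: maintenance is set manually and never overridden here.
        NodeServerStatus::Maintenance => NodeServerStatus::Maintenance,
        // The heartbeat is older than the timeout: mark the server offline.
        _ if since_last_heartbeat > offline_timeout => NodeServerStatus::Offline,
        // A recent heartbeat brings the server (back) online.
        _ => NodeServerStatus::Online,
    }
}
```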

Package Expiration Management (PackageExpiringJob)

  • Purpose: Automatically expire packages based on time limits
  • Frequency: Regular scanning for expired packages
  • Events: Publishes PackageExpiringEvent to message queue

Traffic Billing Processing (BillTrafficJob)

  • Purpose: Process unbilled traffic usage and publish billing events
  • Frequency: Regular processing of accumulated traffic data
  • Events: Publishes UserUsageBillingEvent for each user

Node Status History Recording (RecordNodeStatusHistoryJob)

  • Purpose: Record current status of all node servers for historical analysis
  • Frequency: Hourly status snapshots
  • Storage: Populates node_status_history table

Status View Refresh (RefreshNodeStatusViewJob)

  • Purpose: Refresh the materialized view for efficient status queries
  • Frequency: Regular refresh of node_status_hourly_mv
  • Optimization: Includes data cleanup and analysis for performance

Event-Driven Observability

Usage Billing Events

The system processes usage data through asynchronous events:

UserUsageBillingEvent

  • Publisher: External systems (cron jobs, usage collectors)
  • Consumer: TelecomBillingHook
  • Route: telecom.user_usage_billing
  • Purpose: Record user traffic consumption and trigger package expiration

// Event structure
pub struct UserUsageBillingEvent {
    pub server_id: i32,
    pub user: Uuid,
    pub billed_download: i64,
    pub billed_upload: i64,
    pub time: u64,
}
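
As an illustration of how a consumer might fold these events, the sketch below sums billed traffic per user. It redeclares the event locally and uses u128 as a stand-in for Uuid so that it compiles with the standard library alone. The billed_total_per_user helper is hypothetical and is not the logic of TelecomBillingHook.

```rust
use std::collections::HashMap;

/// Stand-in for `uuid::Uuid` so the sketch needs no external crate.
pub type Uuid = u128;

/// Local copy of the event shape shown above.
pub struct UserUsageBillingEvent {
    pub server_id: i32,
    pub user: Uuid,
    pub billed_download: i64,
    pub billed_upload: i64,
    pub time: u64,
}

/// Sums billed download and upload per user across a batch of events.
pub fn billed_total_per_user(events: &[UserUsageBillingEvent]) -> HashMap<Uuid, i64> {
    let mut totals = HashMap::new();
    for event in events {
        *totals.entry(event.user).or_insert(0) += event.billed_download + event.billed_upload;
    }
    totals
}
```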

PackageExpiringEvent

  • Publisher: Telecom billing system
  • Route: Package expiration processing
  • Purpose: Handle package lifecycle events

Tracing and Instrumentation

All RPC endpoints and critical services include comprehensive tracing:

  • Instrumentation: Uses the tracing::instrument attribute to open a span for each instrumented call
  • Error Logging: Structured error reporting with context
  • Performance Tracking: Request/response times and error rates

Database Schema for Observability

Core Tables

node_status_history

Stores historical node status data:

- id: Primary key
- node_server_id: Reference to node server
- status: Online/Offline/Maintenance
- created_at: Timestamp

user_package_usage

Tracks user traffic consumption:

- Hourly and daily aggregations
- Raw and billed traffic separation
- User and server associations

Materialized Views

node_status_hourly_mv

Optimized view for status history queries:

  • Hourly aggregations of node status
  • Efficient querying for analytics
  • Automatic refresh via cron jobs

Usage Guidelines

For Users

  1. Use GetRecentTrafficUsage to monitor bandwidth consumption
  2. Check ListNodeStatusHistory for node availability patterns
  3. Analyze ListUsuallyUsedNodes to optimize node selection
  4. Monitor ListNodes for real-time node status

For Administrators

  1. Use server management APIs to monitor infrastructure health
  2. Monitor package queues for system performance
  3. Review cron job logs for automated maintenance status
  4. Analyze event streams for system-wide observability

Development Considerations

  1. All APIs follow the Processor pattern
  2. Database connections use owned types, not static lifetimes
  3. Comprehensive error handling with structured logging
  4. Event-driven architecture for scalable monitoring
  5. Materialized views for performance-critical queries

Together, these APIs, background jobs, events, and views give both users and administrators the data they need to monitor, analyze, and optimize the telecom service.