Architecture Overview

NG Gateway is a high-performance IoT gateway designed for industrial/edge scenarios: it takes the high-frequency data collected by southward device-protocol drivers, moves it through a unified data model and a backpressure-controlled event pipeline to northward channels, and evolves continuously through a pluggable architecture.

Rust High-Performance Core

The core runtime is built on the tokio asynchronous model. To achieve high concurrency and high throughput with controllable resource usage, we follow these architectural principles:

  • Fault Isolation: Tasks are split at device/channel/plugin granularity, so a single failure cannot spread beyond its own task.
  • Backpressure First: All critical event streams use bounded queues, turning "slow" from implicit accumulation into explicit, controllable pressure.
  • Structured Concurrency: Tasks have clear parent-child relationships; stop and reload follow a single clean cancellation path with timeout limits.
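These principles can be sketched with std-only primitives. This is an illustrative stand-in, not NG Gateway's actual API: the real core uses tokio tasks and cancellation tokens, and `run_devices` is a name we made up. The shape is the same, though: one task per device, one unified stop signal, and join-based supervision that observes a faulty device without losing the healthy ones.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Spawn `n` device-poll tasks; the one whose id equals `faulty` panics
/// mid-poll. Returns (clean_stops, failures) as seen by the supervisor.
fn run_devices(n: usize, faulty: usize) -> (usize, usize) {
    let stop = Arc::new(AtomicBool::new(false));
    let handles: Vec<_> = (0..n)
        .map(|id| {
            let stop = Arc::clone(&stop);
            thread::spawn(move || {
                let mut polls = 0u64;
                while !stop.load(Ordering::Relaxed) {
                    if id == faulty && polls == 3 {
                        // Fault isolation: this panic takes down only this
                        // device's task; the supervisor sees it via join().
                        panic!("device {id} poll failed");
                    }
                    polls += 1;
                    thread::sleep(Duration::from_millis(1));
                }
                polls
            })
        })
        .collect();

    thread::sleep(Duration::from_millis(30));
    stop.store(true, Ordering::Relaxed); // one unified cancellation path

    let results: Vec<_> = handles.into_iter().map(|h| h.join()).collect();
    let ok = results.iter().filter(|r| r.is_ok()).count();
    (ok, results.len() - ok)
}

fn main() {
    // One faulty device does not take the other three down.
    assert_eq!(run_devices(4, 2), (3, 1));
}
```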

Understanding NG Gateway in One Diagram

Tip: This is a logical view, not a physical deployment diagram. Actual deployment can be single-machine, containerized, or a Kubernetes cluster.

text
┌───────────────────────────────────────────────────────────────────────┐
│                              NG Gateway                               │
│                                                                       │
│  ┌───────────────┐    ┌───────────────────────┐    ┌────────────────┐ │
│  │   Southward   │    │   Core Pipeline       │    │   Northward    │ │
│  │  Drivers/IO   │───▶│  (async + backpressure│───▶│  Connectors    │ │
│  │  Modbus/S7/...│    │   + transform)        │    │  MQTT/OPC UA/..│ │
│  └───────▲───────┘    └───────────▲───────────┘    └───────▲────────┘ │
│          │                        │                        │          │
│          │                        │                        │          │
│   ┌──────┴──────┐        ┌────────┴────────┐       ┌───────┴───────┐  │
│   │  Device/    │        │  Data Model     │       │  Plugins      │  │
│   │  Driver Conf│        │  & Events       │       │  Enrichment   │  │
│   └─────────────┘        └─────────────────┘       └───────────────┘  │
│                                                                       │
│  Observability (tracing/metrics)  |  Security (TLS/AuthZ)  | Storage  │
└───────────────────────────────────────────────────────────────────────┘

Design Goals & Key Trade-offs

NG Gateway faces the reality of industrial sites, where small, high-frequency packets coexist with many protocols; this determines the architectural priorities:

  • Throughput & Stability First: All critical links must withstand peak loads; the system must protect itself when external dependencies are unstable.
  • Extensible Without Losing Control: Drivers/plugins can be hot-plugged at runtime, enabled on demand, and configured independently, while the core path remains observable, tunable, and reclaimable.
  • Unified Semantics & Data Model: Avoid "one model per protocol", ensuring northward integration and business rules can be reused.
  • Operations Friendly: Locatable, replayable, rollback-able; key metrics are measurable; error information has context.

Two Planes: Data Plane & Control Plane

Architecturally, two planes are explicitly distinguished, which significantly reduces complexity:

  • Data Plane: Southward acquisition → normalization → transformation/routing → northward delivery (high frequency, throughput sensitive)
  • Control Plane: Configuration/command/lifecycle management (low frequency; strong consistency and auditability matter more)

Data Plane

sequenceDiagram
  autonumber
  participant DA as Southward Device Actor (poll/subscribe)
  participant PD as Parser/Decoder
  participant NM as Normalize (Unified Model)
  participant Q1 as Bounded Queue (telemetry/event)
  participant TR as Transform & Rule (optional)
  participant RT as Route & Partition (app-channel)
  participant Q3 as Bounded Queue (publish)
  participant NB as Northward Connector Actor (MQTT v5/Kafka/...)

  loop telemetry/event stream
    DA->>PD: frames
    PD->>NM: points/events
    NM->>Q1: enqueue (bounded)
    Q1->>TR: dequeue
    opt transform/rule enabled
      TR->>RT: transform & route decision
    end
    RT->>Q3: enqueue publish (bounded)
    Q3->>NB: dequeue + publish
    Note over NB: retry + backoff on transient failures
  end

  Note over Q1,Q3: Backpressure propagates via bounded queues (slow publish -> upstream slows)
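The Normalize step in the sequence above is where protocol-specific frames become points in one unified model. A minimal sketch, with names of our own invention (`NormalizedPoint`, `normalize`) rather than NG Gateway's real types: a raw Modbus-style scaled register becomes a point every northward connector can consume identically.

```rust
/// One point in the unified data model (illustrative shape).
#[derive(Debug, PartialEq)]
struct NormalizedPoint {
    device_id: String,
    point_id: String,
    value: f64, // engineering units after scaling
    ts_ms: u64, // acquisition timestamp, epoch milliseconds
}

/// Decode a raw scaled-integer register (common in Modbus maps) into the
/// unified model, so downstream stages never see protocol details.
fn normalize(device_id: &str, point_id: &str, raw: u16, scale: f64, ts_ms: u64) -> NormalizedPoint {
    NormalizedPoint {
        device_id: device_id.to_string(),
        point_id: point_id.to_string(),
        value: raw as f64 * scale,
        ts_ms,
    }
}

fn main() {
    // Register value 215 with scale 0.1 -> 21.5 degrees.
    let p = normalize("plc-1", "temp", 215, 0.1, 1_700_000_000_000);
    assert!((p.value - 21.5).abs() < 1e-9);
    assert_eq!(p.device_id, "plc-1");
}
```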

Control Plane

sequenceDiagram
  autonumber
  participant P as Platform
  participant N as Northward Connector
  participant C as Core Command Router
  participant S as Southward Device Actor (Driver)

  P->>N: down command (command_id, payload)
  N->>C: validate + authorize + decode
  C->>S: route + execute (device_id, point command)
  alt success
    S-->>C: result (ok, value/receipt)
    C-->>N: ack (success)
    N-->>P: ack/result
  else timeout or device offline
    S-->>C: error (timeout/offline)
    C-->>N: ack (failed, reason)
    N-->>P: ack (failed)
  end
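The two branches of the sequence above share one rule: every downlink command terminates in an explicit, reportable outcome. A sketch of that semantics (types and names are illustrative, not the gateway's real command router): timeouts and offline devices surface as retryable failures instead of hanging the router.

```rust
use std::time::Duration;

/// Every command ends in exactly one of these; nothing hangs silently.
#[derive(Debug, PartialEq)]
enum CommandOutcome {
    Ok { receipt: String },
    Failed { reason: String, retryable: bool },
}

/// Simulated execute step: offline and timeout both map to explicit,
/// retryable failures, matching the `alt` branches in the diagram.
fn execute_with_timeout(device_online: bool, exec_time: Duration, timeout: Duration) -> CommandOutcome {
    if !device_online {
        return CommandOutcome::Failed { reason: "device offline".into(), retryable: true };
    }
    if exec_time > timeout {
        return CommandOutcome::Failed { reason: "timeout".into(), retryable: true };
    }
    CommandOutcome::Ok { receipt: "write-ack".into() }
}

fn main() {
    let ok = execute_with_timeout(true, Duration::from_millis(50), Duration::from_millis(200));
    assert_eq!(ok, CommandOutcome::Ok { receipt: "write-ack".into() });

    let slow = execute_with_timeout(true, Duration::from_secs(1), Duration::from_millis(200));
    assert!(matches!(slow, CommandOutcome::Failed { retryable: true, .. }));
}
```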

Queues & Backpressure Design: "Bounded" is Not a Switch, But an Architecture

For backpressure to truly work, bounded queues must sit at the critical boundaries, each with a clear strategy for when it is full (Block/Drop/Degrade):

flowchart TB
  subgraph DEV[Device Actor]
    DA[device actor]
    RX["rx: command queue<br/>(bounded)"]
    TX["tx: telemetry queue<br/>(bounded)"]
  end

  subgraph CORE[Core Pipeline]
    QCORE["core queue<br/>(bounded)"]
    STAGE[transform/enrich stages]
  end

  subgraph NB[Northward]
    QPUB["publish queue<br/>(bounded)"]
    PUB[connector publish]
  end

  RX -->|execute| DA
  DA --> TX
  TX --> QCORE --> STAGE --> QPUB --> PUB
  PUB -. backpressure .-> QPUB
  QPUB -. backpressure .-> STAGE
  STAGE -. backpressure .-> QCORE
  QCORE -. backpressure .-> TX

TIP

Key Point: Backpressure strategies must match business semantics. Not all messages are worth "infinite retries".
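The Block/Drop/Degrade choice can be made concrete as a per-queue policy. This is a hypothetical sketch (`FullPolicy`, `BoundedQueue`, and `offer` are our names, not NG Gateway's): "queue full" becomes an explicit decision tied to message semantics, and drops are counted so metrics can surface them.

```rust
use std::collections::VecDeque;

#[derive(Clone, Copy)]
enum FullPolicy {
    Block,      // producer waits: classic backpressure
    DropNewest, // discard the incoming item, count it in metrics
    DropOldest, // keep only fresh data (fine for periodic telemetry)
}

struct BoundedQueue<T> {
    buf: VecDeque<T>,
    cap: usize,
    policy: FullPolicy,
    dropped: u64, // surfaced as a metric, never silent
}

impl<T> BoundedQueue<T> {
    fn new(cap: usize, policy: FullPolicy) -> Self {
        Self { buf: VecDeque::new(), cap, policy, dropped: 0 }
    }

    /// Returns false only under the Block policy when full: the caller
    /// must wait/retry, which is exactly how backpressure propagates.
    fn offer(&mut self, item: T) -> bool {
        if self.buf.len() < self.cap {
            self.buf.push_back(item);
            return true;
        }
        match self.policy {
            FullPolicy::Block => false,
            FullPolicy::DropNewest => {
                self.dropped += 1;
                true
            }
            FullPolicy::DropOldest => {
                self.buf.pop_front();
                self.dropped += 1;
                self.buf.push_back(item);
                true
            }
        }
    }
}

fn main() {
    let mut q = BoundedQueue::new(2, FullPolicy::DropOldest);
    for v in 0..4 {
        let _ = q.offer(v);
    }
    // Only the freshest telemetry survives; the drops are measurable.
    assert_eq!(q.buf.iter().copied().collect::<Vec<_>>(), vec![2, 3]);
    assert_eq!(q.dropped, 2);
}
```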

Data Flow & Backpressure: From Southward to Northward

From a data perspective, the gateway should be a clear, observable, backpressure-capable pipeline:

text
  → Southward (drivers)
  → Normalize (unified model)
  → Transform/Enrich (optional stages)
  → Route (topic/app/channel)
  → Northward (plugins)

The key point is how backpressure propagates:

  • Northward slows down → publish queue fills → core queue fills → southward acquisition throttles/backs off → the system stays stable overall
  • No link should silently pile data into an unbounded cache (that only converts the risk into an OOM)
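This propagation chain can be reproduced with two bounded std channels standing in for the core and publish queues (the real pipeline uses tokio's bounded mpsc; `run_chain` is an illustrative name). Because every stage is bounded, a deliberately slow publisher paces the producer instead of letting data accumulate:

```rust
use std::sync::mpsc::sync_channel;
use std::thread;
use std::time::Duration;

/// Push `n` points through two bounded stages and return what was published.
fn run_chain(n: u32) -> Vec<u32> {
    let (core_tx, core_rx) = sync_channel::<u32>(2); // core queue (bounded)
    let (pub_tx, pub_rx) = sync_channel::<u32>(2);   // publish queue (bounded)

    // Transform stage: core queue -> publish queue.
    let stage = thread::spawn(move || {
        for v in core_rx {
            pub_tx.send(v * 10).unwrap();
        }
    });

    // Deliberately slow "northward" publisher.
    let publisher = thread::spawn(move || {
        let mut out = Vec::new();
        for v in pub_rx {
            thread::sleep(Duration::from_millis(5));
            out.push(v);
        }
        out
    });

    // Once both queues are full, `send` blocks: the slow publisher paces
    // the producer instead of letting data pile into an unbounded buffer.
    for v in 0..n {
        core_tx.send(v).unwrap();
    }
    drop(core_tx); // close the pipeline so both stages drain and exit

    stage.join().unwrap();
    publisher.join().unwrap()
}

fn main() {
    let out = run_chain(10);
    assert_eq!(out, (0..10).map(|v| v * 10).collect::<Vec<_>>()); // nothing lost
}
```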

Failure Semantics: A Stable System Must First Define "How to Handle Failure"

| Failure Point             | Typical Cause                           | Recommended Default Action                 | Configurable Items to Expose     |
| ------------------------- | --------------------------------------- | ------------------------------------------ | -------------------------------- |
| Southward read timeout    | Device busy / line jitter               | Finite retries + backoff                   | timeout, retries, backoff        |
| Southward parse failure   | Noise / partial packet / CRC            | Drop bad frame + resync                    | max_frame_size, resync           |
| Northward publish failure | Network disconnect / auth / rate limit  | Retry + backoff; degrade if necessary      | retries, backoff, buffer_policy  |
| Queue full                | Downstream slows down                   | Trigger backpressure, or drop by semantics | queue_capacity, drop_policy      |
| Downlink command timeout  | Device unresponsive                     | Return explicit failure (retryable)        | command_timeout, idempotency     |

Best Practice

Make failure semantics part of the configuration schema, and surface them in metrics (retry counts, drop counts, blocking time).
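One way to make failure semantics configurable is a policy struct whose fields mirror the table's configurable items. A hedged sketch: field names follow the table, but the defaults and the `FailurePolicy`/`backoff_delay_ms` names are illustrative, not NG Gateway's actual schema (which would additionally derive serde for config files).

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum DropPolicy {
    Block,
    DropNewest,
    DropOldest,
}

/// Failure semantics as configuration, per the table above.
#[derive(Debug, Clone)]
struct FailurePolicy {
    timeout_ms: u64,         // southward read timeout
    retries: u32,            // finite retries, never infinite
    backoff_ms: u64,         // base delay between retries
    queue_capacity: usize,   // bounded queue size
    drop_policy: DropPolicy, // what "queue full" means here
    command_timeout_ms: u64, // downlink command deadline
}

impl Default for FailurePolicy {
    fn default() -> Self {
        // Illustrative defaults only.
        Self {
            timeout_ms: 1_000,
            retries: 3,
            backoff_ms: 200,
            queue_capacity: 1_024,
            drop_policy: DropPolicy::Block,
            command_timeout_ms: 5_000,
        }
    }
}

/// Exponential backoff with a capped exponent and a hard ceiling:
/// 200ms, 400ms, 800ms, ... for the "finite retries + backoff" action.
fn backoff_delay_ms(policy: &FailurePolicy, attempt: u32) -> u64 {
    (policy.backoff_ms << attempt.min(6)).min(30_000)
}

fn main() {
    let p = FailurePolicy::default();
    assert_eq!(backoff_delay_ms(&p, 0), 200);
    assert_eq!(backoff_delay_ms(&p, 2), 800);
    assert_eq!(backoff_delay_ms(&p, 20), 12_800); // exponent capped at 6
}
```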

Plugin-based Extension

  • Southward Driver: Faces field protocols; responsible for acquisition and parsing.
  • Northward Plugin: Faces platform protocols/SDKs; responsible for reporting and command dispatch.

The value of the plugin-based architecture is not just hot-plugging, but isolating change and containing risk:

  • Adding platform integration does not affect the core acquisition link.
  • Adding a protocol driver only affects the IO tasks and parsers of the corresponding devices.
  • Enable on demand, reducing attack surface and resource consumption.
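The two extension seams can be sketched as traits. These trait shapes are our illustration, not NG Gateway's real plugin API: the point is that a new driver touches only acquisition/parsing and a new plugin only reporting, while the core wires them together through the unified model.

```rust
/// Southward seam: field-protocol acquisition, normalized on the way out.
trait SouthwardDriver {
    /// Poll or subscribe; returns (point_id, value) pairs in the unified model.
    fn acquire(&mut self) -> Vec<(String, f64)>;
}

/// Northward seam: platform reporting; Err means retry per failure policy.
trait NorthwardPlugin {
    fn publish(&mut self, points: &[(String, f64)]) -> Result<(), String>;
}

/// One pipeline tick through the core: the only place both seams meet.
fn run_once(driver: &mut dyn SouthwardDriver, plugin: &mut dyn NorthwardPlugin) -> Result<usize, String> {
    let points = driver.acquire();
    let n = points.len();
    plugin.publish(&points)?;
    Ok(n)
}

// A trivial driver/plugin pair to exercise the seams.
struct DemoDriver {
    counter: u32,
}
impl SouthwardDriver for DemoDriver {
    fn acquire(&mut self) -> Vec<(String, f64)> {
        self.counter += 1;
        vec![("demo.temp".into(), 20.0 + self.counter as f64)]
    }
}

struct MemoryPlugin {
    published: Vec<(String, f64)>,
}
impl NorthwardPlugin for MemoryPlugin {
    fn publish(&mut self, points: &[(String, f64)]) -> Result<(), String> {
        self.published.extend_from_slice(points);
        Ok(())
    }
}

fn main() {
    let mut d = DemoDriver { counter: 0 };
    let mut p = MemoryPlugin { published: Vec::new() };
    assert_eq!(run_once(&mut d, &mut p), Ok(1));
    assert_eq!(p.published[0].1, 21.0);
}
```

Swapping either side (say, an S7 driver for the demo driver, or a Kafka plugin for the in-memory one) leaves the other side and the core untouched, which is the isolation the bullets above describe.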

Released under the Apache License 2.0.