Architecture Overview
NG Gateway is a high-performance IoT gateway designed for industrial/edge scenarios: it moves high-frequency data collected from southward device protocols through a unified data model and a backpressure-controlled event pipeline to northward channels, and it evolves continuously through pluggable extensions.
Rust High-Performance Core
The core runtime is based on the tokio asynchronous model. To achieve "High Concurrency + High Throughput + Controllable Resources" simultaneously, we follow these architectural paradigms:
- Fault Isolation: Tasks are split at device/channel/plugin granularity so that a single-point failure cannot spread.
- Backpressure First: All critical event streams use bounded queues, turning "slowness" from implicit accumulation into an explicit, controllable condition.
- Structured Concurrency: Tasks have clear parent-child relationships, and stop/reload follow clean, unified cancellation paths with timeout limits (see the sketch below).
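As a minimal sketch of these three principles working together (assuming tokio and tokio-util; `Telemetry` and `device_actor` are illustrative names, not the actual NG Gateway API), a per-device task can combine a bounded output channel with an explicit cancellation path:

```rust
use std::time::Duration;
use tokio::sync::mpsc;
use tokio_util::sync::CancellationToken;

/// Illustrative telemetry record; the real unified model is richer.
struct Telemetry {
    device_id: String,
    value: f64,
}

/// One actor per device: bounded output + explicit cancellation.
async fn device_actor(
    device_id: String,
    tx: mpsc::Sender<Telemetry>,   // bounded: a full queue means backpressure
    cancel: CancellationToken,     // parent-driven stop/reload path
) {
    let mut ticker = tokio::time::interval(Duration::from_millis(100));
    loop {
        tokio::select! {
            _ = cancel.cancelled() => break,   // clean, unified shutdown path
            _ = ticker.tick() => {
                let point = Telemetry { device_id: device_id.clone(), value: 42.0 };
                // `send().await` waits when the queue is full, so a slow
                // downstream naturally throttles this poll loop.
                if tx.send(point).await.is_err() {
                    break; // pipeline closed
                }
            }
        }
    }
}
```

Because the bounded `send().await` only completes when the queue has room, a slow consumer throttles acquisition instead of letting memory grow.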
Understanding NG Gateway in One Diagram
Tip: This is a logical view (not a physical deployment diagram). Actual deployment can be single-machine, containerized, or Kubernetes clustered.
```
┌──────────────────────────────────────────────────────────────────────────┐
│                                NG Gateway                                │
│                                                                          │
│  ┌────────────────┐    ┌────────────────────────┐    ┌────────────────┐  │
│  │   Southward    │    │     Core Pipeline      │    │   Northward    │  │
│  │   Drivers/IO   │───▶│ (async + backpressure  │───▶│   Connectors   │  │
│  │  Modbus/S7/... │    │     + transform)       │    │ MQTT/OPC UA/...│  │
│  └────────▲───────┘    └───────────▲────────────┘    └───────▲────────┘  │
│           │                        │                         │           │
│           │                        │                         │           │
│  ┌────────┴───────┐    ┌───────────┴────────────┐    ┌───────┴────────┐  │
│  │    Device/     │    │       Data Model       │    │    Plugins     │  │
│  │  Driver Conf   │    │        & Events        │    │   Enrichment   │  │
│  └────────────────┘    └────────────────────────┘    └────────────────┘  │
│                                                                          │
│     Observability (tracing/metrics) | Security (TLS/AuthZ) | Storage     │
└──────────────────────────────────────────────────────────────────────────┘
```

Design Goals & Key Trade-offs
NG Gateway targets real industrial sites characterized by "small packets, high frequency, and multi-protocol coexistence", which determines its architectural priorities:
- Throughput & Stability First: All critical links must withstand peak loads; the system must protect itself when external dependencies are unstable.
- Extensible Without Losing Control: Supports runtime hot-plugging of drivers/plugins, on-demand enabling, and independent configuration, while the core path remains observable, tunable, and its resources reclaimable.
- Unified Semantics & Data Model: Avoid "one model per protocol", ensuring northward integration and business rules can be reused.
- Operations Friendly: Problems can be located, replayed, and rolled back; key metrics are measurable; error information carries context.
Two Planes: Data Plane & Control Plane
Architecturally, two planes are explicitly distinguished (this significantly reduces complexity):
- Data Plane: Southward Acquisition โ Normalization โ Transformation/Routing โ Northward Delivery (High Frequency, Throughput Sensitive)
- Control Plane: Configuration/Command/Lifecycle Management (Low Frequency, Strong Consistency/Auditability more important)
Data Plane
```mermaid
sequenceDiagram
    autonumber
    participant DA as Southward Device Actor (poll/subscribe)
    participant PD as Parser/Decoder
    participant NM as Normalize (Unified Model)
    participant Q1 as Bounded Queue (telemetry/event)
    participant TR as Transform & Rule (optional)
    participant RT as Route & Partition (app-channel)
    participant Q3 as Bounded Queue (publish)
    participant NB as Northward Connector Actor (MQTT v5/Kafka/...)
    loop telemetry/event stream
        DA->>PD: frames
        PD->>NM: points/events
        NM->>Q1: enqueue (bounded)
        Q1->>TR: dequeue
        opt transform/rule enabled
            TR->>RT: transform & route decision
        end
        RT->>Q3: enqueue publish (bounded)
        Q3->>NB: dequeue + publish
        Note over NB: retry + backoff on transient failures
    end
    Note over Q1,Q3: Backpressure propagates via bounded queues (slow publish -> upstream slows)
```
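The "Normalize (Unified Model)" step above is what keeps northward integrations protocol-agnostic. A hedged sketch of what such a unified point record could look like (field names are illustrative, not the project's actual schema):

```rust
use std::collections::BTreeMap;

/// Illustrative unified value type covering common field-protocol data.
#[derive(Debug, Clone)]
pub enum Value {
    Bool(bool),
    Int(i64),
    Float(f64),
    Text(String),
}

/// Illustrative unified data point: every southward driver normalizes
/// protocol-specific frames into this shape before it enters the pipeline.
#[derive(Debug, Clone)]
pub struct DataPoint {
    pub device_id: String,
    pub point_id: String,
    pub value: Value,
    pub quality: u8,                     // e.g. 0 = good, non-zero = degraded
    pub timestamp_ms: i64,               // acquisition time, epoch millis
    pub tags: BTreeMap<String, String>,  // enrichment added by plugins
}
```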
Control Plane
```mermaid
sequenceDiagram
    autonumber
    participant P as Platform
    participant N as Northward Connector
    participant C as Core Command Router
    participant S as Southward Device Actor (Driver)
    P->>N: down command (command_id, payload)
    N->>C: validate + authorize + decode
    C->>S: route + execute (device_id, point command)
    alt success
        S-->>C: result (ok, value/receipt)
        C-->>N: ack (success)
        N-->>P: ack/result
    else timeout or device offline
        S-->>C: error (timeout/offline)
        C-->>N: ack (failed, reason)
        N-->>P: ack (failed)
    end
```
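A minimal sketch of how the core command router could enforce the timeout/offline branch above, assuming an actor-style device handle over tokio channels (all type and function names are illustrative, not the project's real API):

```rust
use tokio::sync::{mpsc, oneshot};
use tokio::time::{timeout, Duration};

/// Illustrative downlink command and result types.
pub struct Command {
    pub device_id: String,
    pub point_id: String,
    pub payload: Vec<u8>,
}

pub enum CommandResult {
    Ok(String),       // receipt or written value
    Failed(String),   // explicit, retryable failure with a reason
}

/// Route a command to a device actor and wait for the outcome with a deadline.
pub async fn execute_command(
    device_tx: &mpsc::Sender<(Command, oneshot::Sender<CommandResult>)>,
    cmd: Command,
    command_timeout: Duration,
) -> CommandResult {
    let (reply_tx, reply_rx) = oneshot::channel();
    if device_tx.send((cmd, reply_tx)).await.is_err() {
        return CommandResult::Failed("device actor unavailable (offline)".into());
    }
    match timeout(command_timeout, reply_rx).await {
        Ok(Ok(result)) => result,                                            // device answered
        Ok(Err(_)) => CommandResult::Failed("device actor dropped the request".into()),
        Err(_) => CommandResult::Failed("timeout: device unresponsive".into()), // explicit failure, not a hang
    }
}
```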
Queues & Backpressure Design: "Bounded" Is Not a Switch, It Is an Architecture
For backpressure to "truly work", bounded queues must be placed at critical boundaries, with clear strategies for when they are full (Block/Drop/Degrade):
```mermaid
flowchart TB
    subgraph DEV[Device Actor]
        DA[device actor]
        RX["rx: command queue<br/>(bounded)"]
        TX["tx: telemetry queue<br/>(bounded)"]
    end
    subgraph CORE[Core Pipeline]
        QCORE["core queue<br/>(bounded)"]
        STAGE[transform/enrich stages]
    end
    subgraph NB[Northward]
        QPUB["publish queue<br/>(bounded)"]
        PUB[connector publish]
    end
    RX -->|execute| DA
    DA --> TX
    TX --> QCORE --> STAGE --> QPUB --> PUB
    PUB -. backpressure .-> QPUB
    QPUB -. backpressure .-> STAGE
    STAGE -. backpressure .-> QCORE
    QCORE -. backpressure .-> TX
```
Tip: Backpressure strategies must match business semantics. Not all messages are worth "infinite retries".
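The difference between "block" and "drop" is visible directly in the channel API. A sketch assuming tokio bounded channels (the `FullPolicy` enum is illustrative; a real deployment would also increment a drop metric):

```rust
use tokio::sync::mpsc;

/// Illustrative per-queue policy; real configuration would expose this per stage.
pub enum FullPolicy {
    Block, // backpressure: wait until the consumer catches up
    Drop,  // lossy telemetry: discard rather than stall acquisition
}

/// Returns true if the item was enqueued, false if it was dropped or the queue closed.
pub async fn enqueue<T>(tx: &mpsc::Sender<T>, item: T, policy: FullPolicy) -> bool {
    match policy {
        // Blocking send: propagates slowness upstream.
        FullPolicy::Block => tx.send(item).await.is_ok(),
        // Non-blocking send: drop on a full queue (callers should count drops).
        FullPolicy::Drop => match tx.try_send(item) {
            Ok(()) => true,
            Err(mpsc::error::TrySendError::Full(_)) => false,
            Err(mpsc::error::TrySendError::Closed(_)) => false,
        },
    }
}
```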
Data Flow & Backpressure: From Southward to Northward
From a data perspective, the gateway should be a clear, observable, backpressure-capable pipeline:
```
Southward (drivers)
  → Normalize (unified model)
  → Transform/Enrich (optional stages)
  → Route (topic/app/channel)
  → Northward (plugins)
```

The key point is how backpressure propagates:
- Northward slows down → publish queue fills up → core queue fills up → southward acquisition throttles/backs off → the system as a whole stays stable
- No link should silently pile data into an unbounded cache (this shifts risk to OOM)
Failure Semantics: A Stable System Must First Define "How to Handle Failure"
| Failure Point | Typical Cause | Recommended Default Action | Configurable Items to Expose |
|---|---|---|---|
| Southward Read Timeout | Device busy/Line jitter | Finite retries + Backoff | timeout, retries, backoff |
| Southward Parse Failure | Noise/Partial Packet/CRC | Drop bad frame + Resync | max_frame_size, resync |
| Northward Publish Failure | Network disconnect/Auth/Rate limit | Retry + Backoff; Degrade if necessary | retries, backoff, buffer_policy |
| Queue Full | Downstream slows down | Trigger backpressure; or drop by semantics | queue_capacity, drop_policy |
| Downlink Command Timeout | Device unresponsive | Return explicit failure (retryable) | command_timeout, idempotency |
Best Practice: Make "failure semantics" part of the configuration schema, and surface them in metrics (retry counts, drop counts, blocking time).
Plugin-based Extension
- Southward Driver: Oriented to field protocols, responsible for acquisition and parsing.
- Northward Plugin: Oriented to platform protocols/SDKs, responsible for reporting and control dispatch.
The value of the plugin-based architecture is not just "hot-plugging", but isolating change and controlling risk:
- Adding platform integration does not affect the core acquisition link.
- Adding a protocol driver only affects the IO tasks and parsers of the corresponding devices.
- Enable on demand, reducing attack surface and resource consumption (see the trait sketch below).
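A hedged sketch of what such a driver/connector boundary can look like as Rust traits, reusing the illustrative `DataPoint` and `Command` types sketched earlier (assuming the async-trait, serde_json, and anyhow crates; these are not the project's real plugin APIs):

```rust
use async_trait::async_trait;

/// Illustrative southward driver boundary: acquisition and parsing only.
#[async_trait]
pub trait SouthwardDriver: Send + Sync {
    /// Connect to the field device/bus described by `config`.
    async fn start(&mut self, config: serde_json::Value) -> anyhow::Result<()>;
    /// Poll or await the next batch of normalized points.
    async fn read_batch(&mut self) -> anyhow::Result<Vec<DataPoint>>;
    /// Clean shutdown so hot reloads do not leak sockets or tasks.
    async fn stop(&mut self) -> anyhow::Result<()>;
}

/// Illustrative northward connector boundary: reporting and command intake.
#[async_trait]
pub trait NorthwardConnector: Send + Sync {
    /// Publish a batch of normalized points to the platform.
    async fn publish(&mut self, batch: &[DataPoint]) -> anyhow::Result<()>;
    /// Receive the next downlink command, if any.
    async fn next_command(&mut self) -> anyhow::Result<Option<Command>>;
}
```

Keeping the boundary this narrow is what lets a new protocol driver or platform integration be added without touching the core acquisition path.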
