How GPU providers, workers, the gateway and Redis talk to each other — and exactly what changes when we deploy into CAE / CCE Kubernetes, where Redis is reachable only via in-cluster Service DNS (no public LoadBalancer / ELB).
Thick green = direct Redis (the hot path: job queue, token stream, result, cancel). Blue = gateway HTTP (register / heartbeat / logs / metrics). Teal = carried over the Tailscale tailnet. Dashed red = blocked / disabled.
flowchart LR
  CL["Clients / SDK<br/>OpenAI-compatible API"]
  VM["Physical VM · bare-metal GPU<br/>worker-agent + vLLM"]
  subgraph K8S["CAE · CCE Kubernetes (restricted namespace)"]
    direction TB
    GW["API Gateway<br/>:8080"]
    RD[("Redis<br/>ClusterIP · headless<br/>public NLB: OFF")]
    PG[("Postgres")]
  end
  subgraph DIS["Disabled in CAE · external cloud GPU"]
    direction TB
    RP["RunPod provider"]
    PI["Prime Intellect provider"]
  end
  CL -->|"HTTPS · ingress"| GW
  GW <-->|"direct redis · queue / result / pub-sub / state"| RD
  GW ---|"async"| PG
  VM -->|"HTTP · register / heartbeat / logs / metrics"| GW
  VM ==>|"direct redis over Tailscale tailnet:<br/>BRPOP queue · PUBLISH stream · SET result · EXISTS cancel"| RD
  RP -.->|"needs public Redis or reverse-tunnel"| RD
  PI -.->|"no tunnel path, needs public Redis"| RD
  linkStyle 0 stroke:#3b82f6,stroke-width:2.5px;
  linkStyle 1 stroke:#10b981,stroke-width:4px;
  linkStyle 2 stroke:#3a4a5e,stroke-width:1.5px,stroke-dasharray:4 4;
  linkStyle 3 stroke:#3b82f6,stroke-width:2.5px;
  linkStyle 4 stroke:#14b8a6,stroke-width:4px;
  linkStyle 5 stroke:#ef4444,stroke-width:2px,stroke-dasharray:6 5;
  linkStyle 6 stroke:#ef4444,stroke-width:2px,stroke-dasharray:6 5;
  classDef k8s fill:#0c1b16,stroke:#10b981,color:#d1fae5;
  classDef ext fill:#101826,stroke:#3b82f6,color:#dbeafe;
  classDef dis fill:#1c0f12,stroke:#ef4444,color:#fecaca,stroke-dasharray:5 4;
  class GW,RD,PG k8s;
  class CL,VM ext;
  class RP,PI dis;
With redis.publicLoadBalancer.enabled=false and workerRedisUrl="", the gateway hands workers the in-cluster Redis Service DNS name. Only the provider set needs locking down.
Inbound to the cluster vs. outbound from it, over time. The VM → Redis legs ride the tailnet; everything to the gateway is HTTP.
sequenceDiagram
autonumber
participant C as Client
participant G as Gateway (k8s)
participant R as Redis (ClusterIP)
participant W as worker-agent (VM · tailnet)
W->>G: register (HTTP) — request redis_url
G-->>W: redis_url = in-cluster Service DNS
loop every 5s
W->>G: heartbeat / logs / metrics (HTTP)
end
C->>G: POST /v1/chat/completions (HTTPS)
G->>R: LPUSH queue:{app_id}
W->>R: BRPOP queue:{app_id} (over tailnet)
W->>R: PUBLISH stream:{id} · SET result:{id}
R-->>G: stream:{id} messages (via SUBSCRIBE) · result:{id} (via GET)
G-->>C: SSE / JSON response
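The register exchange at the top of the sequence (the worker asks for redis_url, the gateway answers with in-cluster Service DNS) could be sketched roughly as below. This is an illustration only, not the gateway's actual code: the function names, the `redis` Service name and the `inference` namespace are hypothetical, and the only assumptions carried over from the text are that an empty workerRedisUrl means "use the in-cluster address" and that Kubernetes Service DNS follows the standard `<service>.<namespace>.svc.<cluster-domain>` form.

```python
def in_cluster_redis_url(service: str, namespace: str, port: int = 6379,
                         cluster_domain: str = "cluster.local") -> str:
    """Build a redis:// URL from Kubernetes Service DNS."""
    return f"redis://{service}.{namespace}.svc.{cluster_domain}:{port}"


def resolve_worker_redis_url(worker_redis_url: str, service: str, namespace: str) -> str:
    """Return the URL the gateway would hand to a registering worker.

    Hypothetical helper: a non-empty override wins; an empty string
    (workerRedisUrl="") falls back to the in-cluster Service DNS name.
    """
    override = worker_redis_url.strip()
    if override:
        return override
    return in_cluster_redis_url(service, namespace)


if __name__ == "__main__":
    # Service and namespace names here are placeholders, not real chart values.
    print(resolve_worker_redis_url("", "redis", "inference"))
```

Under these assumptions, a worker only gets a usable address if it can resolve and route to that Service name, which is what the tailnet provides for the physical VM.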
Cloud workers run outside the cluster, so they reach Redis either through a per-pod reverse SSH tunnel (RunPod, purple) or through a public AWS NLB (Prime Intellect / non-tunnel, orange). The public NLB is exactly the exposure CAE forbids.
flowchart LR
  CL["Clients / SDK"]
  RPP["RunPod pod<br/>worker-agent + vLLM<br/>(external cloud GPU)"]
  PIP["Prime Intellect pod<br/>(external cloud GPU)"]
  VM["Physical VM<br/>(tailnet)"]
  subgraph K8S["Kubernetes (prod)"]
    direction TB
    GW["API Gateway :8080"]
    RD[("Redis")]
    NLB(["Public AWS NLB<br/>redis.publicLoadBalancer"])
  end
  CL -->|"HTTPS"| GW
  GW <-->|"direct redis"| RD
  RD ---|"exposes :6379"| NLB
  RPP -->|"HTTP register / heartbeat / logs / metrics"| GW
  RPP ==>|"reverse SSH tunnel → pod loopback → Redis"| RD
  PIP -->|"HTTP"| GW
  PIP ==>|"direct redis via public NLB"| NLB
  VM ==>|"direct redis via tailnet"| RD
  linkStyle 0 stroke:#3b82f6,stroke-width:2.5px;
  linkStyle 1 stroke:#10b981,stroke-width:4px;
  linkStyle 2 stroke:#f59e0b,stroke-width:2.5px;
  linkStyle 3 stroke:#3b82f6,stroke-width:2.5px;
  linkStyle 4 stroke:#a855f7,stroke-width:4px;
  linkStyle 5 stroke:#3b82f6,stroke-width:2.5px;
  linkStyle 6 stroke:#f59e0b,stroke-width:4px;
  linkStyle 7 stroke:#14b8a6,stroke-width:4px;
  classDef k8s fill:#0c1b16,stroke:#10b981,color:#d1fae5;
  classDef ext fill:#101826,stroke:#3b82f6,color:#dbeafe;
  classDef pub fill:#241a07,stroke:#f59e0b,color:#fde68a;
  class GW,RD k8s;
  class NLB pub;
  class CL,RPP,PIP,VM ext;
Physical VM: workers reach in-cluster Redis over the Tailscale tailnet. No public exposure. This is the CAE path.
Provider running inside the gateway pod: ClusterIP Redis resolves directly. Good for dev/smoke tests.
RunPod: external pods. They reach a private Redis only via a per-pod reverse SSH tunnel, which is fragile: ephemeral keys are lost on gateway restart, and it works with a single gateway only.
Prime Intellect: external pods with no reverse-tunnel code path at all. They cannot reach a private Redis, so supporting them would force a public NLB.
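Locking down the provider set amounts to a filter on how each provider reaches Redis. The sketch below is hypothetical: the `Provider` type, the provider identifiers and the transport labels are invented for illustration, and it assumes (as the first diagram suggests) that CAE disables both external cloud providers, including the reverse-tunnel one.

```python
from dataclasses import dataclass
from typing import List

# Transports that keep Redis private (assumed labels, not real config values).
PRIVATE_TRANSPORTS = {"tailnet", "in-cluster"}


@dataclass(frozen=True)
class Provider:
    name: str
    redis_transport: str  # "tailnet", "in-cluster", "reverse-ssh-tunnel" or "public-nlb"


# Hypothetical identifiers for the four provider kinds described above.
PROVIDERS = [
    Provider("physical-vm", "tailnet"),
    Provider("in-gateway", "in-cluster"),
    Provider("runpod", "reverse-ssh-tunnel"),
    Provider("prime-intellect", "public-nlb"),
]


def allowed_in_cae(provider: Provider) -> bool:
    """A provider is allowed only if its Redis leg never leaves private networking."""
    return provider.redis_transport in PRIVATE_TRANSPORTS


def enabled_providers(providers: List[Provider]) -> List[str]:
    """Names of the providers that survive the CAE restriction, in input order."""
    return [p.name for p in providers if allowed_in_cae(p)]


if __name__ == "__main__":
    print(enabled_providers(PROVIDERS))
```

Treating the reverse SSH tunnel as disallowed is a policy choice in this sketch; technically it avoids a public Redis, but the fragility noted above is a reasonable argument for excluding it.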
Every distinct path that touches Redis, and whether it survives a cluster-internal-only Redis. The four direct-redis worker legs are the only ones that break — and only when the worker runs outside the cluster without a tunnel/tailnet.
| Path | Origin | Transport | Needs public Redis? | Survives svc-only? |
|---|---|---|---|---|
| Job dispatch — BRPOP queue:{app} | worker (out) | direct-redis | Yes* | tunnel / tailnet |
| Streaming — PUBLISH stream:{id} | worker (out) | direct-redis | Yes* | tunnel / tailnet |
| Result write — SET result:{id} | worker (out) | direct-redis | Yes* | tunnel / tailnet |
| Cancel poll — EXISTS cancel:{id} | worker (out) | direct-redis | Yes* | tunnel / tailnet |
| Worker register / heartbeat / logs / metrics | worker (out) | gateway-HTTP | No | Yes |
| Client inference (REST / SSE) | client (out) | gateway-HTTP | No | Yes |
| Gateway registry / autoscaler / reconciler | gateway (in) | direct-redis | No | Yes |
| Bench & training log streams | gateway (in) | direct-redis | No | Yes |
| Redis public LoadBalancer (AWS NLB) | chart → internet | direct-redis | Yes | No — keep OFF |
| Redis internal headless Service | chart → in-cluster | direct-redis | No | Yes |
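The four direct-redis worker legs in the table (BRPOP, PUBLISH, SET, EXISTS) can be pictured as one small loop body. The following is a minimal sketch, not the worker-agent's real code: the JSON job payload with `id` and `prompt` fields, the `generate` callback and the `FakeRedis` stub are all assumptions made so the example runs without a server. The stub's method names mirror the redis-py client, so a real client could be passed in its place.

```python
import json
from collections import defaultdict, deque


class FakeRedis:
    """In-memory stand-in exposing only the calls used below."""

    def __init__(self):
        self.lists = defaultdict(deque)
        self.kv = {}
        self.published = []

    def lpush(self, key, value):
        self.lists[key].appendleft(value)

    def brpop(self, key, timeout=0):
        # Real BRPOP blocks for up to `timeout` seconds; this stub returns at once.
        if self.lists[key]:
            return key, self.lists[key].pop()
        return None

    def publish(self, channel, message):
        self.published.append((channel, message))

    def set(self, key, value):
        self.kv[key] = value

    def exists(self, key):
        return 1 if key in self.kv else 0


def run_one_job(r, app_id, generate):
    """Pop one job, stream its tokens, store the result; return the job id or None."""
    popped = r.brpop(f"queue:{app_id}", timeout=5)      # job dispatch
    if popped is None:
        return None
    _, raw = popped
    job = json.loads(raw)                               # assumed payload shape
    job_id = job["id"]
    tokens = []
    for token in generate(job["prompt"]):
        if r.exists(f"cancel:{job_id}"):                # cancel poll
            break
        r.publish(f"stream:{job_id}", token)            # streaming
        tokens.append(token)
    r.set(f"result:{job_id}", "".join(tokens))          # result write
    return job_id


if __name__ == "__main__":
    r = FakeRedis()
    r.lpush("queue:demo", json.dumps({"id": "j1", "prompt": "hi"}))
    run_one_job(r, "demo", lambda prompt: ["he", "llo"])
    print(r.kv["result:j1"], r.published)
```

Every call in `run_one_job` goes straight to Redis, which is why these legs are the ones that need a tunnel or the tailnet once the worker sits outside the cluster.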
See docs/CAE_REDIS_EXPOSURE.md in the repo.