Networking Essentials
Most system-design interviews touch networking. You don’t need to recite RFCs, but you do need to choose the right protocol, explain why, and anticipate the failure modes.
This post is my working cheat sheet — structured for re-reading before an interview, with diagrams, Go snippets, and the trade-offs that actually come up. I rewrote my original HelloInterview notes into seven sections that build on each other: the layer model, then each layer from the wire up to application protocols, then load balancing, then resilience.
How to use this: skim the headings and diagrams first. Second pass, read the “why it matters” paragraphs. Third pass, the code. Don’t memorize — understand the trade-off.
1. Networking 101
Every network interaction is a stack of responsibilities. Each layer talks only to the one directly above and below it, so you can swap implementations without breaking the others — the same HTTP request works whether it rides on Ethernet, Wi-Fi, or LTE.
Textbooks teach the 7-layer OSI model. In practice everyone uses the 4-layer TCP/IP model because the three top OSI layers collapse into “the application decides.” Know both names for the interview; use the 4-layer one when reasoning.
The 4-layer TCP/IP stack. Each layer wraps the payload from the layer above.
What each layer actually does
| Layer | Job in one sentence | Unit | Addresses |
|---|---|---|---|
| Application | "What does this message mean?" | message | URLs, hostnames |
| Transport | "Who on that machine should get it, and is it reliable?" | segment (TCP) / datagram (UDP) | port numbers |
| Internet | "Which machine on the internet, and how do we route there?" | packet | IP addresses |
| Link | "How do we put bits on this physical medium?" | frame | MAC addresses |
A packet at the link layer is literally a nested envelope: [ Ethernet [ IP [ TCP [ HTTP ... ] ] ] ]. Each device along the route peels off the link-layer envelope to decide where to forward next, then reseals with a new one.
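A toy illustration of that nesting (the `wrap` helper is hypothetical, just to make the envelope metaphor concrete):

```go
package main

import "fmt"

// wrap is a hypothetical helper: each layer prepends its own header
// around the payload handed down from the layer above.
func wrap(header, payload string) string {
	return header + "[" + payload + "]"
}

func main() {
	msg := "HTTP GET /"                // application
	segment := wrap("TCP", msg)        // transport: ports, seq numbers
	packet := wrap("IP", segment)      // internet: src/dst addresses
	frame := wrap("Ethernet", packet)  // link: MAC addresses
	fmt.Println(frame) // Ethernet[IP[TCP[HTTP GET /]]]
}
```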
What happens when you type https://example.com and press Enter
This is the single most common warm-up question. The full answer touches every layer:
sequenceDiagram
autonumber
participant C as 🖥️ Client
participant D as 🗺️ DNS
participant S as 🌐 Server :443
rect rgb(254, 249, 195)
Note over C,D: Phase 1 — DNS resolution
C->>D: query A example.com
activate D
D-->>C: 93.184.216.34
deactivate D
end
rect rgb(219, 234, 254)
Note over C,S: Phase 2 — TCP 3-way handshake
C->>S: SYN seq=x
activate S
S-->>C: SYN·ACK seq=y, ack=x+1
C->>S: ACK ack=y+1
end
rect rgb(220, 252, 231)
Note over C,S: Phase 3 — TLS 1.3 handshake (1 RTT)
C->>S: ClientHello + key share
S-->>C: ServerHello + cert + Finished
Note over C,S: both sides derive the session key
end
rect rgb(233, 213, 255)
Note over C,S: Phase 4 — HTTP over TLS
C->>S: GET / HTTP/1.1
S-->>C: 200 OK · Content-Type: text/html
deactivate S
end

A single HTTPS request touches DNS, TCP, TLS, and HTTP. HTTP/2 and /3 fold steps 2 + 3 + 4 into fewer round-trips.
Mental shortcut for the interview:
- Resolve — DNS turns `example.com` into an IP (UDP 53, or TCP 53 for large answers).
- Connect — TCP 3-way handshake (SYN, SYN-ACK, ACK) to the IP on port 443.
- Secure — TLS handshake negotiates a session key; with TLS 1.3 this is 1 RTT, sometimes 0-RTT on resumption.
- Request — the client sends an HTTP request over the encrypted stream.
- Respond — the server sends HTML. The browser parses, finds asset URLs, and repeats.
Every one of those steps is a potential failure mode an interviewer can probe. Keep it at the tip of your tongue.
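Go's `net/http/httptrace` package lets you watch these phases fire on a real request. A sketch against a local TLS test server (no real DNS step, but the TCP connect and TLS handshake are genuine):

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/http/httptrace"
)

// phases runs one HTTPS request against a local test server and records
// which connection phases fired, via net/http/httptrace hooks.
func phases() []string {
	srv := httptest.NewTLSServer(http.HandlerFunc(
		func(w http.ResponseWriter, r *http.Request) { w.WriteHeader(200) }))
	defer srv.Close()

	var seen []string
	trace := &httptrace.ClientTrace{
		ConnectStart:         func(network, addr string) { seen = append(seen, "connect") },
		TLSHandshakeDone:     func(tls.ConnectionState, error) { seen = append(seen, "tls") },
		GotFirstResponseByte: func() { seen = append(seen, "response") },
	}
	req, _ := http.NewRequest("GET", srv.URL, nil)
	req = req.WithContext(httptrace.WithClientTrace(req.Context(), trace))

	resp, err := srv.Client().Do(req) // srv.Client trusts the test cert
	if err != nil {
		panic(err)
	}
	resp.Body.Close()
	return seen
}

func main() {
	fmt.Println(phases()) // connect fires first, then tls, then response
}
```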
Ports you are expected to know
| Port | Protocol | What |
|---|---|---|
| 22 | TCP | SSH |
| 53 | UDP / TCP | DNS |
| 80 | TCP | HTTP |
| 443 | TCP / UDP | HTTPS (UDP if HTTP/3) |
| 6379 | TCP | Redis |
| 5432 | TCP | Postgres |
| 9092 | TCP | Kafka |
2. Network Layer
The network (L3) layer’s job is to get a packet from a source IP to a destination IP, possibly across many routers. It doesn’t care about ports, connections, or reliability — those are the transport layer’s problem.
IPv4 vs IPv6 in one table
| | IPv4 | IPv6 |
|---|---|---|
| Address size | 32 bits | 128 bits |
| Total addresses | ~4.3 × 10⁹ (exhausted since 2011) | ~3.4 × 10³⁸ |
| Notation | 192.168.1.1 | 2001:db8::1 |
| Header | variable (20–60 B), checksummed | fixed 40 B, no checksum |
| NAT needed? | yes, universally | designed to not need it |
| Packet fragmentation | routers can fragment | only the sender; routers drop + PMTUD |
| Configuration | DHCP or static | SLAAC (stateless) + DHCPv6 |
In interviews: IPv6 adoption is slow because NAT + CGNAT let IPv4 limp along, and because dual-stack migration is politically painful. Design for both when you can, deploy behind a load balancer that terminates either.
What’s in an IPv4 header
You don’t need to memorize byte offsets, but knowing what fields exist explains a lot of real behavior — MTU issues, traceroute output, and why iptables rules reference TTL.
The two fields that come up the most:
- TTL (Time To Live) — a hop counter. Each router decrements it; when it hits 0 the packet is dropped and an ICMP “time exceeded” is sent back. `traceroute` exploits this by sending probes with `TTL=1, 2, 3…` and listening for the ICMP replies.
- Protocol — tells the receiver how to interpret the payload: 6 for TCP, 17 for UDP, 1 for ICMP.
Routing, in one paragraph
Routers maintain routing tables that map “destination prefix → next hop.” When a packet arrives, the router looks up the longest matching prefix of the destination IP and forwards to the corresponding next hop. On the internet, routing tables are built dynamically by BGP between autonomous systems. On your laptop, the table has two useful entries — your subnet (192.168.1.0/24 → direct) and everything else (0.0.0.0/0 → your gateway).
```
# See your routing table
$ ip route
default via 192.168.1.1 dev wlan0
192.168.1.0/24 dev wlan0 proto kernel scope link src 192.168.1.42

# Trace the hops to a destination
$ traceroute -n example.com
 1  192.168.1.1   1.4 ms
 2  100.64.0.1    8.2 ms   # CGNAT inside the ISP
 3  203.0.113.5   9.1 ms
 ...
```

NAT: why your home IP is a lie
Your laptop’s 192.168.1.42 is a private IP, invalid on the public internet. Your router does Network Address Translation: when you send a packet, it rewrites the source IP to the router’s public IP and remembers the translation in a table. When the reply comes back, it rewrites the destination back to 192.168.1.42 and forwards to your laptop.
flowchart LR
laptop["<b>Your laptop</b><br/><code>192.168.1.42</code><br/>src port 51000"]
router["<b>Router (NAT)</b><br/>priv 192.168.1.1<br/>pub 203.0.113.17<br/><i>keeps translation table</i>"]
target["<b>example.com</b><br/><code>93.184.216.34</code><br/>dst port 443"]
laptop -->|"outbound<br/>src 192.168.1.42:51000"| router
router -->|"rewritten<br/>src 203.0.113.17:60321"| target
target -->|"reply<br/>dst 203.0.113.17:60321"| router
router -->|"rewritten<br/>dst 192.168.1.42:51000"| laptop
classDef neutral fill:#dbeafe,stroke:#3b6fd6,stroke-width:1.5px,color:#0f172a;
classDef highlight fill:#e9d5ff,stroke:#7c3aed,stroke-width:2px,color:#4c1d95;
class laptop,target neutral
class router highlight

One public IP fronts many private hosts by multiplexing on the source port. NAT breaks when two sides both need to initiate — see WebRTC in §4c.
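A toy model of that translation table (real NATs also track protocol, timeouts, and TCP state):

```go
package main

import "fmt"

// natTable is a toy NAT: outbound flows get a fresh public source port;
// replies are mapped back through the table.
type natTable struct {
	publicIP string
	nextPort int
	flows    map[int]string // public port → private "ip:port"
}

// outbound rewrites a private source address to a public one and
// remembers the mapping.
func (n *natTable) outbound(privateAddr string) (publicAddr string) {
	n.nextPort++
	n.flows[n.nextPort] = privateAddr
	return fmt.Sprintf("%s:%d", n.publicIP, n.nextPort)
}

// inbound maps a reply's destination port back to the private host.
func (n *natTable) inbound(publicPort int) (privateAddr string, ok bool) {
	privateAddr, ok = n.flows[publicPort]
	return
}

func main() {
	nat := &natTable{publicIP: "203.0.113.17", nextPort: 60320, flows: map[int]string{}}
	pub := nat.outbound("192.168.1.42:51000")
	fmt.Println(pub) // 203.0.113.17:60321
	priv, _ := nat.inbound(60321)
	fmt.Println(priv) // 192.168.1.42:51000
}
```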
CIDR notation, quick reference
192.168.1.0/24 means the first 24 bits are the network prefix, leaving 8 bits = 256 addresses (minus 2 for network + broadcast). Memorize these edge cases:
| Prefix | Size | Common use |
|---|---|---|
| /32 | 1 address | single host |
| /24 | 256 | small subnet, home LAN |
| /16 | 65,536 | corp subnet |
| /8 | 16 M | legacy class A (10.0.0.0/8 private) |
| /0 | all | default route |
RFC 1918 private ranges (never routable on the public internet): 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16. 169.254.0.0/16 is link-local (what you get if DHCP fails). 127.0.0.0/8 is loopback.
3. Transport Layer
IP gets a packet to a host. The transport layer gets it to the right program on that host, using a 16-bit port number. It also decides whether the stream is reliable, ordered, and flow-controlled (TCP) or fire-and-forget (UDP). Everything above this layer — HTTP, gRPC, DNS, SMTP — is just a convention layered on top of one of these two.
TCP: the 3-way handshake
Before any data flows, TCP opens a connection by exchanging three segments. Each side picks a random initial sequence number (ISN) to defend against blind spoofing, and each side acks the other’s ISN + 1.
sequenceDiagram
autonumber
participant C as 🖥️ Client
participant S as 🌐 Server
Note over C: state: CLOSED
Note over S: state: LISTEN
rect rgb(219, 234, 254)
C->>S: SYN · seq=x
activate S
Note over C: SYN_SENT
Note over S: SYN_RCVD
S-->>C: SYN·ACK · seq=y, ack=x+1
C->>S: ACK · ack=y+1
deactivate S
end
Note over C,S: ✅ ESTABLISHED — 1 full RTT before the first byte of payload

Three segments, one RTT. Motivates keep-alive, connection pooling, and HTTP/2 multiplexing.
Two interview gotchas on the handshake:
- SYN flood. If the server commits memory on every received `SYN`, an attacker can exhaust its connection table with spoofed SYNs. The fix is SYN cookies — encode the connection state in the SYN-ACK’s sequence number and allocate memory only after the client’s final `ACK` echoes it back, proving it came from the real source.
- Half-open connection. If the client crashes after the 3-way handshake, the server has no idea. Keep-alive probes (TCP or application-level) exist to detect this; defaulting to “forever-idle sockets are fine” is wrong.
Reliability, built from primitives
TCP is a reliable, ordered, byte stream. It achieves that on top of an unreliable IP layer with four mechanisms layered on each other:
- Sequence numbers on every byte. The receiver reorders out-of-order segments into a contiguous stream.
- Cumulative ACKs. `ack = N` means “I have received everything up to byte N-1.”
- Retransmission on timeout. The sender keeps a running estimate of RTT. If no ACK arrives within `RTO`, resend.
- Fast retransmit. If the sender gets three duplicate ACKs (`ack = K` four times), it infers a single segment was lost and resends without waiting for the RTO.
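A toy model of the fast-retransmit rule: count duplicate ACKs and fire on the third:

```go
package main

import "fmt"

// dupAckDetector is a toy model of fast retransmit: after three duplicate
// ACKs for the same byte offset, the sender resends that segment instead
// of waiting for the retransmission timeout.
type dupAckDetector struct {
	lastAck int
	dups    int
}

// onAck returns true when the sender should fast-retransmit.
func (d *dupAckDetector) onAck(ack int) bool {
	if ack == d.lastAck {
		d.dups++
		return d.dups == 3 // 3 duplicates = 4 identical ACKs in total
	}
	d.lastAck, d.dups = ack, 0
	return false
}

func main() {
	d := &dupAckDetector{lastAck: -1}
	acks := []int{1000, 2000, 2000, 2000, 2000} // segment at byte 2000 was lost
	for i, a := range acks {
		if d.onAck(a) {
			fmt.Printf("fast retransmit segment at %d (ack #%d)\n", a, i+1)
		}
	}
}
```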
The sender does not send one segment at a time — it keeps a whole window of bytes in-flight:
flowchart LR
B1["1"]:::acked --- B2["2"]:::acked --- B3["3"]:::acked
B3 ==>|"send base"| B4
B4["4"]:::flight --- B5["5"]:::flight --- B6["6"]:::flight --- B7["7"]:::flight
B7 ==>|"next byte to send"| B8
B8["8"]:::avail --- B9["9"]:::avail
B9 ===|"window edge"| B10
B10["10"]:::blocked --- B11["11"]:::blocked
subgraph legend ["Legend"]
direction LR
L1[" acked "]:::acked
L2[" in-flight (unacked) "]:::flight
L3[" can send now "]:::avail
L4[" blocked (beyond window) "]:::blocked
end
classDef acked fill:#e5e7eb,stroke:#6b7b9a,color:#475569;
classDef flight fill:#93c5fd,stroke:#3b6fd6,stroke-width:1.5px,color:#0f172a;
classDef avail fill:#f1f5f9,stroke:#6b7b9a,color:#475569;
classDef blocked fill:#fafafa,stroke:#6b7280,stroke-dasharray:3 3,color:#6b7280;

The sender can have window = min(cwnd, rwnd) bytes in-flight without waiting for ACKs. Every ACK slides the window right; every loss shrinks it.
Flow control vs congestion control
These sound alike but answer different questions, and interviewers will test whether you know the difference.
| | Flow control | Congestion control |
|---|---|---|
| Protects | the receiver | the network |
| Lives on | the receiver advertises rwnd | the sender computes cwnd |
| Signal | receiver’s buffer space, sent back in every ACK | packet loss + RTT trends |
| Classic algorithm | simple: rwnd in header | Reno / CUBIC / BBR |
Slow start + AIMD in one line each.
- Slow start: on a new connection, `cwnd` doubles every RTT until the first loss.
- AIMD (Reno / CUBIC after slow start): Additive Increase, Multiplicative Decrease. On each ACK, `cwnd += 1/cwnd`; on loss, `cwnd /= 2`.
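A toy trace of AIMD's sawtooth, approximating `cwnd += 1/cwnd` per ACK as +1 segment per RTT:

```go
package main

import "fmt"

// aimd replays a toy congestion-window trace: additive increase of one
// segment per RTT, multiplicative decrease on loss.
func aimd(cwnd float64, events []string) float64 {
	for _, e := range events {
		switch e {
		case "rtt": // one full RTT of ACKs ≈ +1 segment
			cwnd += 1
		case "loss":
			cwnd /= 2
		}
	}
	return cwnd
}

func main() {
	// Exit slow start at cwnd=8, grow 4 RTTs, lose once, grow 2 more.
	trace := []string{"rtt", "rtt", "rtt", "rtt", "loss", "rtt", "rtt"}
	fmt.Println(aimd(8, trace)) // 8 → 12 → 6 → 8
}
```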
BBR (Google’s newer algorithm, used on YouTube + many CDNs) breaks from AIMD entirely — it models the path’s bandwidth and RTT directly and treats loss as noise rather than the primary signal. Worth mentioning when the interviewer asks about modern TCP behavior on lossy links or bufferbloated networks.
The states an application actually touches
stateDiagram-v2
direction TB
[*] --> CLOSED
CLOSED --> SYN_SENT: active open<br/>send SYN
CLOSED --> LISTEN: passive open
SYN_SENT --> ESTABLISHED: recv SYN-ACK<br/>send ACK
LISTEN --> SYN_RCVD: recv SYN<br/>send SYN-ACK
SYN_RCVD --> ESTABLISHED: recv ACK
ESTABLISHED --> FIN_WAIT_1: close()<br/>send FIN
ESTABLISHED --> CLOSE_WAIT: recv FIN<br/>send ACK
FIN_WAIT_1 --> FIN_WAIT_2: recv ACK
FIN_WAIT_2 --> TIME_WAIT: recv FIN<br/>send ACK
TIME_WAIT --> CLOSED: 2 × MSL timeout
CLOSE_WAIT --> LAST_ACK: close()<br/>send FIN
LAST_ACK --> CLOSED: recv ACK
classDef good fill:#dcfce7,stroke:#16a34a,stroke-width:2px,color:#14532d;
classDef closing fill:#fef3c7,stroke:#ca8a04,stroke-width:1.5px,color:#713f12;
classDef terminal fill:#f3f4f6,stroke:#6b7280,stroke-width:1.5px,color:#1f2937;
class ESTABLISHED good
class FIN_WAIT_1,FIN_WAIT_2,CLOSE_WAIT,LAST_ACK,TIME_WAIT closing
class CLOSED terminal

Simplified TCP state machine. Left branch = active side (initiator). Right branch = passive side (usually the server). The full spec has 11 states; these are the ones you'll reference in a debugging story.
Why TIME_WAIT matters. After an active close, the initiator stays in TIME_WAIT for 2 × MSL (typically 60 s on Linux) before the quadruple (src-ip, src-port, dst-ip, dst-port) can be reused. On a server that opens many outbound connections (a payment gateway, a scraper) this can exhaust ephemeral ports. Mitigations: enable SO_REUSEADDR / net.ipv4.tcp_tw_reuse, use connection pooling, or reduce churn.
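The back-of-envelope math, assuming Linux's default ephemeral range (32768–60999) and 60 s in TIME_WAIT:

```go
package main

import "fmt"

func main() {
	ports := 60999 - 32768 + 1 // Linux default ip_local_port_range
	const timeWaitSecs = 60    // typical 2 × MSL on Linux
	// Ceiling on new connections/sec to ONE destination ip:port before
	// the ephemeral port pool is exhausted by sockets stuck in TIME_WAIT.
	fmt.Println(ports/timeWaitSecs, "connections/sec") // 470 connections/sec
}
```

Around ~470 opens/sec per destination, a high-churn client starts failing with address-in-use errors, which is why pooling matters.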
UDP: 8 bytes and a shrug
UDP’s header is the minimum viable protocol.
What UDP doesn’t do: no handshake, no ordering, no retransmit, no flow or congestion control. What it does do: stay out of your way. That makes it the substrate for DNS (usually 1 packet round-trip, no need for a connection), video/voice (loss is fine, reordering is worse than drop), game state (“where is the tank now” is more useful than “where was it 200 ms ago”), and, since QUIC, modern HTTP itself.
TCP vs UDP, side by side
| | TCP | UDP |
|---|---|---|
| Connection | handshake before data | none |
| Header | 20 B minimum, up to 60 B | 8 B |
| Ordering | guaranteed | app’s problem |
| Reliability | guaranteed | app’s problem |
| Flow control | yes (rwnd) | no |
| Congestion control | yes (CUBIC / BBR / …) | no (app must be well-behaved) |
| Latency floor | 1 RTT + slow start | 1 one-way trip |
| Typical users | HTTP/1.1, HTTP/2, gRPC, SSH, DB | DNS, QUIC (HTTP/3), games, voice, video |
Head-of-line blocking, the reason HTTP/3 exists
TCP’s ordering guarantee has a dark side. If segment 5 is dropped, segments 6–10 that did arrive must sit in the kernel’s receive buffer until segment 5 is retransmitted and filled in. Everything above TCP — including independent HTTP/2 streams — has to wait. This is head-of-line (HOL) blocking at the transport layer.
HTTP/3 fixes this by building on QUIC (which rides on UDP) and doing its own stream-level ordering: one lost packet stalls only its stream, not all the concurrent streams on the same connection. More on this in §4a.
Decision rubric
Pick TCP when the correctness of the byte stream matters more than the 1-RTT cost and you’re OK waiting on retransmits:
- HTTP/1.1, HTTP/2, gRPC-over-HTTP/2
- SSH, SQL wire protocols, Kafka
- File transfer / replication
Pick UDP when you can tolerate loss or you need to own reordering yourself:
- DNS queries (1 packet, fits in MTU, retry at app level)
- QUIC / HTTP/3
- Real-time media (WebRTC, VoIP, game state)
- Multicast / broadcast (TCP is strictly 1-to-1)
Go snippets
TCP client with sane timeouts. The default net.Dial has no timeout and will happily hang forever.
```go
package main

import (
	"net"
	"time"
)

func openConn() (net.Conn, error) {
	d := net.Dialer{
		Timeout:   3 * time.Second,  // connect timeout
		KeepAlive: 30 * time.Second, // TCP keep-alive probes
	}
	conn, err := d.Dial("tcp", "example.com:443")
	if err != nil {
		return nil, err
	}
	// Deadlines for read/write, refreshed per operation.
	_ = conn.SetDeadline(time.Now().Add(10 * time.Second))
	return conn, nil
}
```

UDP echo server. The read loop is packet-oriented, not stream-oriented — one `ReadFromUDP` returns exactly one datagram. Framing is the app’s job.
```go
package main

import (
	"log"
	"net"
)

func main() {
	addr, _ := net.ResolveUDPAddr("udp", ":9000")
	conn, err := net.ListenUDP("udp", addr)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	buf := make([]byte, 2048) // 1 datagram at a time
	for {
		n, peer, err := conn.ReadFromUDP(buf)
		if err != nil {
			log.Printf("read: %v", err)
			continue
		}
		// No framing guarantees: 'n' bytes are one logical message.
		if _, err := conn.WriteToUDP(buf[:n], peer); err != nil {
			log.Printf("write to %s: %v", peer, err)
		}
	}
}
```

Gotchas interviewers love
- Nagle’s algorithm. TCP coalesces small writes to reduce packet overhead, which interacts badly with delayed-ACK receivers. Latency-sensitive apps (interactive SSH, game clients) set `TCP_NODELAY` to disable it.
- `TIME_WAIT` exhaustion. A high-churn outbound client (think: aggressive HTTP client with no keep-alive) can run out of ephemeral source ports. Reuse connections or bump `ip_local_port_range`.
- MTU / PMTUD blackhole. If a middlebox drops ICMP “fragmentation needed” messages, the sender never learns to shrink its packets and the connection stalls on any segment larger than the path MTU. Common cause of “works on my laptop, times out on corp VPN.”
- UDP amplification DDoS. A spoofed 50-byte DNS query can return a 3,000-byte response. Open resolvers and misconfigured NTP / memcached servers are classic reflectors. If you build a UDP service, cap the reply size and rate-limit per source.
4. Application Layer
Above the transport layer, protocols express what the conversation is about. They’re organized by purpose, not by position in a stack: HTTP is a request/response protocol, SMTP is a mail protocol, gRPC is an RPC system. This section covers the ones an interviewer will actually probe, in three groups — 4a. HTTP family (with TLS), 4b. API styles: REST vs GraphQL vs gRPC, and 4c. Real-time: SSE vs WebSocket vs WebRTC.
4a. HTTP / HTTPS / HTTP/2 / HTTP/3
HTTP is the protocol of the web — a stateless, text-based request/response protocol on TCP port 80 (or 443 with TLS). “Stateless” is the operative word: every request is self-contained from the server’s point of view. State (sessions, auth) lives in cookies, headers, or the database, not the protocol.
Anatomy of a request
```
GET /users/42?include=orders HTTP/1.1
Host: api.example.com
Authorization: Bearer eyJhbGciOi...
Accept: application/json
Accept-Encoding: gzip, br
If-None-Match: "a3f7b9"
Connection: keep-alive
```

Each line is `Header: Value`. A blank line separates headers from body. The server responds with a status line, headers, and body:
```
HTTP/1.1 200 OK
Content-Type: application/json
Content-Length: 847
ETag: "a3f7b9"
Cache-Control: private, max-age=60
Connection: keep-alive

{"id":42,"name":"Ada","orders":[...]}
```

Status codes you must know cold
| Range | Meaning | Must-know examples |
|---|---|---|
| 1xx | informational | 101 Switching Protocols (WebSocket upgrade) |
| 2xx | success | 200 OK · 201 Created · 204 No Content · 206 Partial Content |
| 3xx | redirect / cache | 301 Moved Permanently · 302 Found · 304 Not Modified · 307/308 (preserve method) |
| 4xx | client error | 400 Bad Request · 401 Unauthorized · 403 Forbidden · 404 Not Found · 409 Conflict · 422 Unprocessable · 429 Too Many Requests |
| 5xx | server error | 500 Internal · 502 Bad Gateway · 503 Unavailable · 504 Gateway Timeout |
Easy-to-mix-up pair: 401 means “I don’t know who you are” — re-auth and retry. 403 means “I know who you are, and you can’t.”
Idempotency matters
| Method | Idempotent | Safe (read-only) | Body | Cacheable |
|---|---|---|---|---|
| GET | ✅ | ✅ | no | ✅ |
| HEAD | ✅ | ✅ | no | ✅ |
| OPTIONS | ✅ | ✅ | no | — |
| PUT | ✅ | ❌ | yes | ❌ |
| DELETE | ✅ | ❌ | — | ❌ |
| POST | ❌ | ❌ | yes | rarely |
| PATCH | ❌ | ❌ | yes | ❌ |
The interview gotcha: POST is not idempotent — if the client retries a POST because it didn’t see the response, it can create the same order twice. The canonical fix is an idempotency key: a client-generated unique string the server dedupes on. Stripe, AWS, and every payment-adjacent API does this.
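A minimal sketch of server-side dedup on an idempotency key (in-memory for illustration; production uses a shared store such as Redis or a DB, with a TTL):

```go
package main

import (
	"fmt"
	"sync"
)

// store is a toy in-memory idempotency cache: Idempotency-Key → saved response.
var store sync.Map

// createOrder runs the handler only the first time a key is seen;
// retries with the same key replay the saved response without re-executing.
// (Toy version: the load-then-store pair is not atomic under races.)
func createOrder(idemKey string, handler func() string) string {
	if prev, ok := store.Load(idemKey); ok {
		return prev.(string) // duplicate request: replay, don't re-execute
	}
	resp := handler()
	store.Store(idemKey, resp)
	return resp
}

func main() {
	calls := 0
	place := func() string { calls++; return fmt.Sprintf("order-%d", calls) }

	fmt.Println(createOrder("key-abc", place)) // order-1
	fmt.Println(createOrder("key-abc", place)) // order-1 — retry deduped
	fmt.Println(calls)                         // 1
}
```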
Caching, briefly
HTTP caching is coordinated between server, browser, and intermediaries (CDN, proxies). The knobs:
- `Cache-Control: public, max-age=3600` — how long, where.
- `ETag: "..."` / `Last-Modified: ...` — identity of the current version. The client sends `If-None-Match` / `If-Modified-Since` on revalidation; the server replies `304 Not Modified` with no body.
- `Vary: Accept-Encoding, Authorization` — split cache entries by these request headers.
CDNs use the same primitives — §6 covers the edge story.
HTTPS: HTTP + TLS
HTTPS is HTTP encrypted with TLS. The TLS handshake runs once when the connection is established, producing symmetric keys for the rest of the connection.
sequenceDiagram
autonumber
participant C as 🔐 Client
participant S as 🌐 Server
rect rgb(219, 234, 254)
Note over C,S: TLS 1.3 — fresh handshake (1 RTT)
C->>+S: ClientHello<br/>(supported ciphers · key share · SNI)
S-->>-C: ServerHello · cert · Finished<br/>(picks cipher · sends its key share)
Note over C,S: 🔑 both sides derive the session key
C->>S: Finished (encrypted)
C->>+S: HTTP GET / (encrypted)
S-->>-C: HTTP 200 OK (encrypted)
end
rect rgb(220, 252, 231)
Note over C,S: TLS 1.3 resumption — 0 RTT (early data)
C->>+S: ClientHello + <b>early data</b> (encrypted with PSK)
S-->>-C: ServerHello + response (no handshake wait)
Note right of S: ⚠️ early data is <br/>replayable — don't use for<br/>state-changing requests
end

What TLS gives you — three things interviewers will ask:
- Confidentiality — AEAD ciphers (AES-GCM, ChaCha20-Poly1305) encrypt the payload.
- Integrity — the same AEAD MAC detects tampering.
- Authentication — the server’s cert, signed by a CA the client trusts, proves you’re talking to `example.com`, not an attacker.
Common interview probes:
- Why is TLS 1.3 faster than 1.2? One RTT vs two. TLS 1.2 separated key-exchange from Finished; TLS 1.3 combines them and removes obsolete ciphers.
- What is SNI? Server Name Indication — the hostname the client is asking for, sent unencrypted in ClientHello. Lets one IP host multiple certs. Encrypted Client Hello (ECH) fixes the leak.
- 0-RTT trade-off? On resumed sessions, the client can send data in its first flight. Great for latency; the data is replayable if the attacker captures and retransmits it. Don’t use 0-RTT for state-changing requests.
HTTP/2: binary, multiplexed, one connection
HTTP/1.1 opens one TCP connection per in-flight request (browsers cap ~6 per origin). HTTP/2 keeps a single connection and multiplexes many streams over it.
graph LR
subgraph H11 ["🐢 HTTP/1.1 — many TCP connections, head-of-line serial"]
direction TB
C1["TCP #1<br/>GET /index.html"]
C2["TCP #2<br/>GET /app.js"]
C3["TCP #3<br/>GET /style.css"]
C4["TCP #4<br/>GET /logo.png"]
end
subgraph H2 ["🚀 HTTP/2 — one connection, multiplexed streams"]
direction TB
T(["1 TCP + TLS"])
T --> S1["stream 1<br/>/index.html"]
T --> S2["stream 3<br/>/app.js"]
T --> S3["stream 5<br/>/style.css"]
T --> S4["stream 7<br/>/logo.png"]
end
classDef old fill:#fee2e2,stroke:#dc2626,stroke-width:1.5px,color:#7f1d1d;
classDef new fill:#dcfce7,stroke:#16a34a,stroke-width:1.5px,color:#14532d;
classDef hub fill:#dbeafe,stroke:#3b6fd6,stroke-width:2px,color:#0f172a;
class C1,C2,C3,C4 old
class S1,S2,S3,S4 new
class T hub

Key improvements over 1.1:
- Binary framing. Every message is split into `DATA` / `HEADERS` / `SETTINGS` frames. Cheap to parse, no ambiguity.
- Multiplexing. Many concurrent streams on one connection. No head-of-line blocking at the HTTP layer.
- HPACK header compression. Redundant headers (cookies, UA, `Host`) are table-indexed instead of resent. Huge win for short requests.
- Server push. Server can pre-send assets it knows the client will need. (Deprecated by most browsers — misused more often than helpful.)
- Stream priorities. Clients can weight streams; used to deliver CSS/JS before images.
The catch. HTTP/2 multiplexing lives above TCP. If one packet is lost, TCP stalls delivery of all streams until retransmission arrives — HOL blocking at the transport layer, exactly the problem we flagged in §3.
HTTP/3: HTTP over QUIC over UDP
HTTP/3 replaces TCP with QUIC, a transport protocol built on UDP that combines “TCP semantics + TLS 1.3” into one unified handshake with independent streams:
| | HTTP/1.1 | HTTP/2 | HTTP/3 |
|---|---|---|---|
| Transport | TCP | TCP | QUIC (UDP) |
| Security | optional TLS | TLS mandatory (practice) | TLS 1.3 mandatory, integrated |
| Framing | text | binary frames | binary frames |
| Multiplexing | no (multiple TCP) | yes (1 TCP) | yes (QUIC streams) |
| Connection open | TCP + TLS = 2-3 RTT | TCP + TLS = 2-3 RTT | 1 RTT (0 RTT on resumption) |
| Head-of-line blocking | yes | yes, at TCP layer | no — per-stream loss |
| Connection migration | no (IP change breaks it) | no | yes (connection ID) |
| Deployed by | Everything | ~70% of web | CDNs + big sites, growing |
Connection migration is the underrated killer feature: QUIC identifies a connection by an ID in the header, not by the 4-tuple. Your phone switches from Wi-Fi to 5G, the IP changes, TCP would reset — QUIC just keeps going.
Go http.Client: the timeouts you must set
The zero-value http.Client{} has no timeout. A single slow server can hang your entire service. Always configure:
```go
package main

import (
	"context"
	"net"
	"net/http"
	"time"
)

// prodClient is a reusable, properly-bounded HTTP client.
// One per process — it pools connections internally.
var prodClient = &http.Client{
	Timeout: 10 * time.Second, // total request budget
	Transport: &http.Transport{
		DialContext: (&net.Dialer{
			Timeout:   3 * time.Second, // TCP connect
			KeepAlive: 30 * time.Second,
		}).DialContext,
		TLSHandshakeTimeout:   3 * time.Second,
		ResponseHeaderTimeout: 5 * time.Second,
		ExpectContinueTimeout: 1 * time.Second,
		IdleConnTimeout:       90 * time.Second,
		MaxIdleConns:          100,
		MaxIdleConnsPerHost:   10, // bump for high-throughput clients
		ForceAttemptHTTP2:     true,
	},
}

func fetch(ctx context.Context, url string) (*http.Response, error) {
	// Prefer request-level context over client Timeout when the deadline
	// must propagate across service boundaries.
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Accept", "application/json")
	req.Header.Set("User-Agent", "myservice/1.0")
	return prodClient.Do(req)
}
```

Three timeouts in this snippet and why each exists:
- `Dialer.Timeout` — how long TCP connect can take. Defends against unreachable hosts.
- `TLSHandshakeTimeout` — how long TLS can take after connect.
- `ResponseHeaderTimeout` — how long the server can take to send status + headers. A slow backend blocking here looks like a hung request — this bounds it without cutting off a legitimately large streaming body.
Bonus: request-level deadlines. Prefer ctx, cancel := context.WithTimeout(parent, 800*time.Millisecond) at the call site over mutating the client — the deadline then propagates cleanly through gRPC/HTTP/database layers downstream.
Interview gotchas for §4a
- `Connection: close` vs keep-alive. HTTP/1.0 closed by default; HTTP/1.1 keeps connections alive by default. Servers that emit `Connection: close` on every response will cripple your client’s connection pool.
- Cookie scoping. `Domain=example.com` includes subdomains. `Secure` restricts to HTTPS. `HttpOnly` hides from JS. `SameSite=Lax` is the sane default to block CSRF.
- Redirect traps. `301` is cached aggressively. If you deploy `301 /old → /new` and later change your mind, clients may never retry. Use `302` or `307` during rollouts.
- `Content-Length` vs `Transfer-Encoding: chunked`. A response has exactly one. If a reverse proxy (nginx, HAProxy) buffers a chunked response to add `Content-Length`, latency on streaming endpoints goes up. Turn buffering off at the proxy for SSE / gRPC-Web.
- HTTP/2 with self-signed certs. Go’s `http.Transport` disables HTTP/2 if you set `TLSClientConfig.InsecureSkipVerify = true` without also setting `NextProtos = []string{"h2", "http/1.1"}`. Debugging headache at 2am.
4b. REST vs GraphQL vs gRPC
Three API styles, three different philosophies. The interviewer usually isn’t asking “which is best” — they want to see you reason about the trade-off for the problem at hand.
| | REST | GraphQL | gRPC |
|---|---|---|---|
| Transport | HTTP/1.1 or 2 | HTTP POST (usually /graphql) | HTTP/2 |
| Serialization | JSON (usually) | JSON | Protobuf (binary) |
| Schema | OpenAPI (optional) | SDL (required) | .proto (required) |
| Endpoint shape | many resource URLs | single endpoint | RPC methods per service |
| Who picks the fields? | server | client | server |
| Over-/under-fetching | easy to hit | solved | solved per method |
| Streaming | chunked / SSE | subscriptions (via WS) | native: server / client / bidi |
| Browser-friendly | yes | yes | no (needs gRPC-Web or Connect) |
| Tooling | curl, Postman, every lang | any GraphQL client | protoc + code generation |
| Caching | HTTP cache works out of the box | hard; client-side libs (Relay, Apollo) | none built-in; app layer |
| Best fit | public APIs, CRUD, docs-as-product | many clients, varied views on same data | internal microservices, high-throughput |
The over-fetching / under-fetching problem
The single clearest argument for GraphQL. Imagine a mobile screen that needs user name, last order ID, and unread notification count.
flowchart TB
Need(["📱 Mobile needs: <b>name</b> · <b>lastOrderId</b> · <b>unreadCount</b>"])
subgraph REST ["🔁 REST — 3 round trips"]
direction TB
R1["GET /users/42<br/><i>returns 20 fields, keeps 1</i>"]
R2["GET /users/42/orders?limit=1<br/><i>returns full Order, keeps 1 id</i>"]
R3["GET /users/42/notifications/unread-count"]
R1 --> R2 --> R3
end
subgraph GQL ["⚡ GraphQL — 1 round trip, exact shape"]
direction TB
G1["POST /graphql<br/>user(id: 42) { name, lastOrder { id }, unreadCount }"]
end
subgraph GRPC ["🛰️ gRPC — 1 round trip, server-defined shape"]
direction TB
P1["UserService.GetProfile(id=42)<br/><i>server-defined aggregate method</i>"]
end
Need --> REST
Need --> GQL
Need --> GRPC
classDef need fill:#fef3c7,stroke:#d97706,stroke-width:2px,color:#78350f;
classDef rest fill:#fee2e2,stroke:#dc2626,stroke-width:1.5px,color:#7f1d1d;
classDef gql fill:#dcfce7,stroke:#16a34a,stroke-width:1.5px,color:#14532d;
classDef grpc fill:#e0e7ff,stroke:#4f46e5,stroke-width:1.5px,color:#312e81;
class Need need
class R1,R2,R3 rest
class G1 gql
class P1 grpc

REST gets you the JSON but wastes bandwidth (`orders` contains 20 fields you don’t need) or demands many round-trips. GraphQL lets the client ask for exactly what it wants. gRPC solves it too, but by defining a server-side aggregate method — if the mobile and web teams want different shapes you end up with GetProfileForWeb and GetProfileForMobile, which is fine for a handful of clients but doesn’t scale like GraphQL does.
REST, done well
The mental model is resources (nouns) acted on by HTTP methods (verbs). URLs are hierarchical; methods are the verbs; status codes are the outcome.
```
GET    /v1/users              list
GET    /v1/users/42           read
POST   /v1/users              create
PUT    /v1/users/42           replace (idempotent)
PATCH  /v1/users/42           partial update
DELETE /v1/users/42           delete
GET    /v1/users/42/orders    sub-resource
```

Conventions that save you from bike-shed arguments:
- Cursor pagination — `?cursor=eyJpZCI6Mjc...&limit=50`. Not offset/limit — that’s O(N) on the DB.
- Filter/sort as query params — `?status=active&sort=-createdAt`.
- Versioning — keep it in the path (`/v1/`, `/v2/`). Header-based versioning is clever and will bite you during debugging.
- Errors — RFC 7807 `application/problem+json`: `{"type":"...", "title":"...", "detail":"...", "instance":"..."}`. Stop inventing shapes.
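A sketch of the opaque-cursor convention (base64 of the last-seen id; the `cursor` struct and helpers are hypothetical):

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// cursor is a hypothetical opaque pagination token: the caller sees only
// base64; the server decodes it back into "start after this id".
type cursor struct {
	ID int `json:"id"`
}

func encodeCursor(lastID int) string {
	b, _ := json.Marshal(cursor{ID: lastID})
	return base64.RawURLEncoding.EncodeToString(b)
}

func decodeCursor(s string) (int, error) {
	b, err := base64.RawURLEncoding.DecodeString(s)
	if err != nil {
		return 0, err
	}
	var c cursor
	if err := json.Unmarshal(b, &c); err != nil {
		return 0, err
	}
	// The SQL then becomes: SELECT ... WHERE id > $1 ORDER BY id LIMIT $2
	// (an indexed seek, unlike OFFSET, which scans and discards N rows).
	return c.ID, nil
}

func main() {
	tok := encodeCursor(27)
	fmt.Println(tok) // eyJpZCI6Mjd9
	id, _ := decodeCursor(tok)
	fmt.Println(id) // 27
}
```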
HATEOAS (hypermedia links in responses) is the theoretically pure form but is almost never shipped. If the interviewer asks, explain what it is and note most production APIs don’t bother.
GraphQL: one endpoint, client-shaped responses
The server publishes a typed schema; the client sends a query describing the shape it wants; the runtime walks the query and calls resolvers to fetch each field.
```graphql
type User {
  id: ID!
  name: String!
  email: String!
  orders(limit: Int = 10): [Order!]!
  unreadCount: Int!
}

type Order {
  id: ID!
  total: Money!
  items: [LineItem!]!
}

type Query {
  user(id: ID!): User
}
```

Client sends:

```graphql
query Profile($id: ID!) {
  user(id: $id) {
    name
    orders(limit: 1) {
      id
      total { amount currency }
    }
    unreadCount
  }
}
```

The famous trap — N+1 queries. The orders resolver is called once per User. If you’re listing 50 users and blindly loop, that’s 50 DB round-trips. The fix is a DataLoader — batches + caches per-request:
```go
// pseudo-Go: inside an HTTP request-scoped loader
loader := dataloader.New(func(ctx context.Context, ids []int) []*Order {
	// one SQL: SELECT ... WHERE user_id = ANY($1)
	return db.OrdersByUserIDs(ctx, ids)
})
// Each resolver call just does: loader.Load(userID) → coalesced into 1 query
```

Other GraphQL-isms to have an answer for:
- Mutations are a separate root type; they execute sequentially (not parallel like `Query` fields).
- Subscriptions push server → client; transport is usually WebSocket.
- Persisted queries — client registers queries at build time; at runtime it only sends the query ID. Saves bandwidth, forbids arbitrary queries, defuses the “malicious client writes an expensive query” attack.
- Caching is the hardest part. Apollo Client normalizes objects by `__typename + id` client-side; server-side is usually cache-miss territory unless you’re doing persisted queries + HTTP cache headers.
gRPC: typed RPCs over HTTP/2
You write a .proto, the compiler generates typed client + server stubs in every language you use.
```protobuf
syntax = "proto3";
package user.v1;

service UserService {
  rpc GetProfile(GetProfileRequest) returns (UserProfile);
  rpc WatchProfile(GetProfileRequest) returns (stream UserProfile); // server stream
  rpc ImportUsers(stream UserInput) returns (ImportSummary);        // client stream
  rpc Chat(stream ChatMessage) returns (stream ChatMessage);        // bidirectional
}

message GetProfileRequest { string user_id = 1; }

message UserProfile {
  string user_id = 1;
  string name = 2;
  string email = 3;
  int32 unread_count = 4;
}
```

Go server, unary method:
```go
type userServer struct {
	pb.UnimplementedUserServiceServer
	db *sql.DB
}

func (s *userServer) GetProfile(
	ctx context.Context,
	req *pb.GetProfileRequest,
) (*pb.UserProfile, error) {
	// context carries the client's deadline + cancellation + metadata
	var p pb.UserProfile
	err := s.db.QueryRowContext(ctx,
		`SELECT user_id, name, email, unread_count FROM users WHERE user_id=$1`,
		req.GetUserId(),
	).Scan(&p.UserId, &p.Name, &p.Email, &p.UnreadCount)
	if err == sql.ErrNoRows {
		return nil, status.Errorf(codes.NotFound, "user %s not found", req.GetUserId())
	}
	if err != nil {
		return nil, status.Errorf(codes.Internal, "db: %v", err)
	}
	return &p, nil
}
```

Go client call, with deadline:
```go
conn, err := grpc.NewClient("user-svc:50051",
	grpc.WithTransportCredentials(insecure.NewCredentials()))
if err != nil {
	return err
}
defer conn.Close()

client := pb.NewUserServiceClient(conn)
ctx, cancel := context.WithTimeout(ctx, 300*time.Millisecond)
defer cancel()

profile, err := client.GetProfile(ctx, &pb.GetProfileRequest{UserId: "42"})
```

The four streaming modes — interviewers love this picture:
sequenceDiagram
autonumber
participant C as 🖥️ Client
participant S as 🛰️ Server
rect rgb(219, 234, 254)
Note over C,S: 1️⃣ Unary — classic request / response
C->>+S: GetProfile(id=42)
S-->>-C: UserProfile
end
rect rgb(220, 252, 231)
Note over C,S: 2️⃣ Server streaming — one request, many responses
C->>+S: WatchProfile(id=42)
S-->>C: UserProfile v1
S-->>C: UserProfile v2
S-->>-C: UserProfile v3 ...
end
rect rgb(254, 243, 199)
Note over C,S: 3️⃣ Client streaming — many requests, one summary
C->>+S: ImportUsers (user_1)
C->>S: ImportUsers (user_2)
C->>S: ImportUsers (user_3)
S-->>-C: ImportSummary (count=3)
end
rect rgb(237, 214, 255)
Note over C,S: 4️⃣ Bidirectional — full-duplex, interleaved
C->>+S: Chat (hello)
S-->>C: Chat (hi)
C->>S: Chat (how are you)
S-->>-C: Chat (good)
    end

Why gRPC wins for internal microservices:
- Protobuf is small (1.5-5× smaller than JSON on the wire) and fast to marshal.
- HTTP/2 multiplexing + long-lived connections → low latency + good head-of-line story within a service mesh.
- `context.Context` — deadlines and cancellations propagate across service hops out of the box.
- Status codes are a closed set (`codes.NotFound`, `codes.DeadlineExceeded`, …), not free-form strings.
Where gRPC hurts:
- Browsers can’t speak gRPC natively (it relies on HTTP/2 trailers, which browser fetch APIs don’t expose). Solutions: gRPC-Web (proxy translates) or Connect (gRPC-compatible, works over HTTP/1.1 too).
- Bigger learning curve — protoc toolchain, codegen in every language, backward-compat discipline (`reserved`, field numbers never reused).
- Observability is trickier than REST (no URL pattern in logs; need OTel/tracing from day one).
Decision rubric
Pick REST when:
- The API is public or consumed by many external devs.
- Humans browse it (Postman, curl, docs).
- You want the HTTP cache to do real work (CDN, browser).
- CRUD on resources is most of what you’re doing.
Pick GraphQL when:
- Many clients (web, iOS, Android) need different slices of the same underlying data graph.
- Backends-for-frontends would otherwise proliferate.
- The team can invest in schema review, DataLoader discipline, and persisted-query infra.
Pick gRPC when:
- Internal service-to-service traffic, especially polyglot teams (Go + Python + Java).
- Strong typing across languages is worth the toolchain cost.
- Streaming or low-latency RPC is a first-class need.
- You have a service mesh, tracing, and observability to absorb the operational tax.
Interview gotchas for §4b
- REST isn’t an RFC. There’s no committee defining “correct REST.” Different teams mean different things. Lead with your own definition.
- GraphQL security surface. A single expressive query can be a DDoS primitive — one field can resolve into 10,000 DB reads. Production deployments need query depth limits, query cost analysis, persisted queries, and rate-limiting by user, not by endpoint.
- gRPC deadline inheritance. If a `GetProfile` handler calls three downstream services and just passes along the same context, the slowest of the three sees the full 300 ms. Budget it out: subtract expected work from each leg (or at least be intentional about it).
- Version drift in protobuf. Once a `.proto` is deployed, you cannot reuse a field number. `reserved 7, 9;` prevents someone from re-assigning later. Forgetting this breaks wire compatibility silently.
- “Why not just JSON-RPC?” — a valid interview probe. JSON-RPC is lighter-weight than gRPC but lacks streaming, codegen, and HTTP/2 flow-control. Fine for a small internal tool, not for a service mesh.
4c. SSE vs WebSocket vs WebRTC
Three ways to do “real-time.” The interviewer wants to see you pick the simplest one that solves the problem — not the coolest.
Start with the decision tree
flowchart TD
Q(["Do you need <b>real-time</b> updates?"])
D1{"Direction?"}
D2{"Payload type?"}
D3{"Latency bound?"}
POLL(["Long-polling / plain HTTP is fine ✅"])
SSE(["<b>SSE</b><br/>server → client<br/>text only, auto-reconnect"])
WS(["<b>WebSocket</b><br/>full-duplex<br/>text or binary"])
RTC(["<b>WebRTC</b><br/>peer-to-peer<br/>sub-100ms media + data"])
Q --> |no, updates can wait seconds| POLL
Q --> |yes| D1
D1 --> |server → client only| SSE
D1 --> |both directions| D2
D2 --> |text / structured| WS
D2 --> |audio / video / low-latency data| D3
D3 --> |p2p · every ms matters| RTC
D3 --> |OK with a server relay| WS
classDef question fill:#fef3c7,stroke:#d97706,stroke-width:2px,color:#78350f;
classDef answer fill:#dcfce7,stroke:#16a34a,stroke-width:1.5px,color:#14532d;
classDef fallback fill:#f3f4f6,stroke:#6b7280,stroke-width:1.5px,color:#1f2937;
class Q,D1,D2,D3 question
class SSE,WS,RTC answer
    class POLL fallback

One-line rule of thumb: SSE for feeds, WebSocket for chat, WebRTC for media.
SSE — the boring answer that’s often right
Server-Sent Events is just an HTTP response that never ends. The server holds the connection open and writes data: ...\n\n chunks whenever it has something to say. The browser has a built-in EventSource API that does reconnect + last-event-id for you.
sequenceDiagram
autonumber
participant C as 🖥️ Browser
participant S as 🌐 Server
rect rgb(219, 234, 254)
Note over C,S: ① Open the stream — one HTTP request, never-ending response
C->>+S: GET /stream · Accept: text/event-stream
S-->>C: 200 OK · Content-Type: text/event-stream<br/>Cache-Control: no-store · Connection: keep-alive
end
rect rgb(220, 252, 231)
Note over C,S: ② Server pushes events whenever it wants
S-->>C: event: price · data: {BTC: 67123}
S-->>C: event: price · data: {BTC: 67140}
S-->>C: 💓 : heartbeat (comment line keeps proxies awake)
S-->>-C: event: price · data: {BTC: 67089}
end
rect rgb(254, 226, 226)
Note right of C: ⚠️ connection drops
end
rect rgb(254, 243, 199)
Note over C,S: ③ EventSource auto-reconnects with Last-Event-ID
C->>S: GET /stream · Last-Event-ID: 42
    end

Why it’s underrated:
- It’s just HTTP — CDNs, proxies, auth cookies, browser DevTools, all already work.
- `EventSource` handles reconnect + backoff automatically.
- `id:` + `Last-Event-ID` gives you exactly-once replay for free if the server indexes by ID.
Where it bites:
- Unidirectional — for an upstream “ack,” use a second `POST` endpoint. Awkward if you need true bi-di.
- Text only — send JSON, not binary. Encode binary if you must.
- HTTP/1.1 6-connection limit per origin. If you open SSE on 7 tabs of your app, the 7th hangs. Fix: use HTTP/2 (same-origin streams are multiplexed).
- Proxy buffering. Nginx / CDNs love to buffer responses. Disable per-route (`proxy_buffering off;`, `X-Accel-Buffering: no`) — otherwise clients see nothing until the server flushes 8 KB.
Go handler:
```go
func priceStream(w http.ResponseWriter, r *http.Request) {
	// Headers the SSE spec requires.
	w.Header().Set("Content-Type", "text/event-stream")
	w.Header().Set("Cache-Control", "no-store")
	w.Header().Set("Connection", "keep-alive")
	// Defeat proxy buffering — crucial for nginx, Cloudflare.
	w.Header().Set("X-Accel-Buffering", "no")

	// The flusher lets us force chunks out immediately.
	flusher, ok := w.(http.Flusher)
	if !ok {
		http.Error(w, "streaming unsupported", http.StatusInternalServerError)
		return
	}

	// Heartbeat every 15s so intermediary idle timeouts don't close us.
	heartbeat := time.NewTicker(15 * time.Second)
	defer heartbeat.Stop()

	updates := subscribePrices(r.Context()) // chan PriceTick

	var id int
	for {
		select {
		case <-r.Context().Done():
			return // client disconnected
		case <-heartbeat.C:
			fmt.Fprint(w, ": ping\n\n") // comment line = keep-alive
			flusher.Flush()
		case tick, ok := <-updates:
			if !ok {
				return
			}
			id++
			// 'id:' makes it resumable via Last-Event-ID on reconnect.
			fmt.Fprintf(w, "id: %d\nevent: price\ndata: %s\n\n", id, tick.JSON())
			flusher.Flush()
		}
	}
}
```

WebSocket — when you need full-duplex
WebSocket upgrades an HTTP connection into a long-lived, full-duplex, message-framed TCP stream. After the upgrade, the two sides exchange binary or text frames in either direction with almost no per-message overhead.
sequenceDiagram
autonumber
participant C as 🖥️ Client
participant S as 🌐 Server
rect rgb(219, 234, 254)
Note over C,S: ① HTTP upgrade — the only HTTP round-trip in a WS session
C->>+S: GET /ws HTTP/1.1<br/>Upgrade: websocket<br/>Sec-WebSocket-Key: x3JJ…<br/>Sec-WebSocket-Version: 13
S-->>-C: 🎉 101 Switching Protocols<br/>Upgrade: websocket<br/>Sec-WebSocket-Accept: hash(key + GUID)
end
rect rgb(220, 252, 231)
Note over C,S: ② Full-duplex frames — either side sends any time
C->>S: TEXT · {action: subscribe, room: 42}
S-->>C: TEXT · {msg: welcome}
S-->>C: 🔢 BINARY · 0xCAFE…
end
rect rgb(254, 243, 199)
Note over C,S: ③ Keep-alive control frames
C->>S: PING
S-->>C: PONG
end
rect rgb(254, 226, 226)
Note over C,S: ④ Graceful close
C->>S: CLOSE (1000 normal)
S-->>C: CLOSE (ack)
    end

Highlights:
- `101 Switching Protocols` is the magic status code — the server accepts the upgrade, TCP stays open, the protocol changes under it.
- Client frames are masked (XOR with a per-frame key) to defeat cache-poisoning attacks on legacy proxies. Server frames are not.
- TEXT frames must be valid UTF-8; BINARY frames needn’t be. Use BINARY for protobuf, msgpack, images.
- PING/PONG frames are control-plane only. Use them; without keepalive, NAT timers + proxy idle timers (~60-120 s) will close the socket silently.
Go, with gorilla/ws (still the de-facto library even after its archival; gobwas/ws is the zero-alloc alternative):
```go
var upgrader = websocket.Upgrader{
	ReadBufferSize:  1024,
	WriteBufferSize: 1024,
	CheckOrigin: func(r *http.Request) bool {
		// Enforce same-origin. WebSocket doesn't respect CORS —
		// you enforce origin policy yourself here.
		return r.Header.Get("Origin") == "https://app.example.com"
	},
}

func wsHandler(w http.ResponseWriter, r *http.Request) {
	conn, err := upgrader.Upgrade(w, r, nil)
	if err != nil {
		return // Upgrade already wrote the error response.
	}
	defer conn.Close()

	// Ping/pong: send pings every 30s, expect pongs within 60s.
	conn.SetReadDeadline(time.Now().Add(60 * time.Second))
	conn.SetPongHandler(func(string) error {
		conn.SetReadDeadline(time.Now().Add(60 * time.Second))
		return nil
	})

	go func() {
		t := time.NewTicker(30 * time.Second)
		defer t.Stop()
		for range t.C {
			if err := conn.WriteControl(
				websocket.PingMessage, nil, time.Now().Add(5*time.Second),
			); err != nil {
				return
			}
		}
	}()

	for {
		msgType, data, err := conn.ReadMessage()
		if err != nil {
			return // client gone or deadline exceeded
		}
		// Echo server — replace with real routing.
		if err := conn.WriteMessage(msgType, data); err != nil {
			return
		}
	}
}
```

Gotchas:
- No built-in auth. The upgrade request is HTTP, so do auth there (cookie / `Authorization` header). Some browsers strip the `Authorization` header on WS upgrades — use a cookie or a token-in-URL.
- `CheckOrigin` is opt-in. Forget it and you’ve just built CSRF-over-WebSocket.
- No request/response semantics. You send frames and hope. Implement a correlation ID in your JSON envelope if you need req/resp on top.
- Horizontal scaling. WS sockets are sticky to one pod. When a pod dies, users reconnect — and may land on a pod with no state. Fan out via Redis pub/sub or a broker (NATS, Redis Streams, Kafka).
WebRTC — peer-to-peer, media-grade
WebRTC lets two browsers talk directly (peer-to-peer), bypassing your server for the heavy media streams. You still need a server — the signaling server — to help them find each other and exchange connection info.
sequenceDiagram
autonumber
participant A as 👤 Peer A
participant SIG as 📡 Signaling<br/>(your WS server)
participant ST as 🛰️ STUN
participant TN as 🔁 TURN
participant B as 👤 Peer B
rect rgb(219, 234, 254)
Note over A,B: ① Signal (through your server — usually WebSocket)
A->>+SIG: offer (SDP)
SIG->>+B: offer (SDP)
B->>-SIG: answer (SDP)
SIG->>-A: answer (SDP)
end
rect rgb(254, 243, 199)
Note over A,B: ② ICE — discover reachable addresses
A->>+ST: "what's my public IP?"
ST-->>-A: 203.0.113.1:51000
A->>SIG: ICE candidate (host + reflexive)
SIG->>B: (forwarded)
B->>SIG: ICE candidate (its own)
SIG->>A: (forwarded)
end
rect rgb(220, 252, 231)
Note over A,B: ③ Connectivity check — try direct first
A-->>B: STUN binding
B-->>A: STUN binding reply
Note over A,B: ✅ direct path works → done
end
rect rgb(254, 226, 226)
Note over A,B: ④ Fallback: symmetric NAT blocks direct → TURN relay
A-->>TN: relay allocate
TN-->>B: relayed packet
end
rect rgb(233, 213, 255)
Note over A,B: ⑤ Media / data flow — end-to-end encrypted
A-->>B: 🎥 video · 🎤 audio · 📂 data channel<br/>(DTLS-SRTP or SCTP over DTLS, all over UDP)
B-->>A: 🎥 video · 🎤 audio · 📂 data channel
    end

Four things you must know to pass a WebRTC interview:
- Signaling is your problem. WebRTC doesn’t dictate how the SDP offers/answers get across. People use WebSocket, SSE, long-poll, whatever. Pick one from the earlier part of this section.
- NAT traversal is why this is hard. Most peers are behind NAT (§2). STUN tells a peer its public IP. TURN relays traffic when STUN can’t produce a workable path (symmetric NAT, strict firewalls). Budget for ~10-20 % of calls needing TURN — and TURN bandwidth is your bill.
- ICE (Interactive Connectivity Establishment) is the algorithm that collects and prioritizes candidate addresses (host, server-reflexive, relayed), pings each pair, and picks the best one that works.
- Two flavors of traffic: media streams use DTLS-SRTP (encrypted RTP over UDP). Non-media data uses the DataChannel API, which is SCTP over DTLS over UDP. Both end-to-end encrypted.
When to reach for it (and when not to):
- ✅ Video/audio calls, screen share, live “remote desktop.”
- ✅ Sub-50 ms data — multiplayer games, collaborative tools where every ms shows.
- ❌ Chat. A WebSocket through your server is simpler, cheaper, and easier to moderate.
- ❌ Anything you need to log / record server-side. P2P means you’re not in the path.
Comparison, side by side
| | SSE | WebSocket | WebRTC |
|---|---|---|---|
| Direction | server → client | bi-directional | p2p (both) |
| Transport | HTTP text stream | TCP (post upgrade) | UDP (SCTP / SRTP) |
| Auto-reconnect | ✅ built-in | ❌ DIY | ❌ renegotiate |
| Binary | ❌ (text only) | ✅ | ✅ |
| Auth | cookies / headers | same as HTTP | out of band (signaling) |
| Works through strict proxies | ✅ (it’s HTTP) | mostly | often needs TURN |
| Infra complexity | lowest | medium | highest |
| Sample use cases | stock tickers, log tails, AI token stream | chat, dashboards, collaborative cursors | Zoom, Meet, Discord voice |
Interview gotchas for §4c
- “Why not long-poll?” — a classic warm-up. Long-polling works but each update is a new TCP + TLS handshake. For a dozen updates/sec, SSE/WS are dramatically cheaper.
- Scale the fan-out, not the socket. For 1 M concurrent WebSocket users, the limit isn’t TCP — it’s how fast you can broadcast a message to 1 M sockets. Keep a per-room subscriber index, fan-out via Redis pub/sub or Kafka, pin users to regions.
- AI token streaming. Both SSE and WebSocket work. Most LLM APIs (OpenAI, Anthropic) ship SSE — it’s simpler, and the stream is strictly server → client.
- `wss://` is mandatory in production. Mobile carriers and corporate proxies routinely strip or block plain `ws://`.
- WebRTC without a TURN budget is a demo. Your team-coffee prototype works in the office because everyone’s on the same NAT. Real users need TURN, and TURN bandwidth costs real money.
5. Load Balancing
A load balancer is the thing that lets you say “run N copies of my service” instead of “run my service.” It does three jobs at once:
- Horizontal scaling — spread load across replicas.
- Availability — health-check backends, take dead ones out of rotation.
- Deployment flexibility — gate traffic into new versions for blue/green, canary, rolling.
Every system-design interview touches one of these three.
Client-side vs dedicated load balancing
Two fundamentally different architectures, with very different failure modes.
flowchart LR
subgraph CLIENT ["🧭 Client-side LB"]
direction TB
C1["Client<br/>(with registry cache)"]
REG[("Service<br/>registry<br/>(Consul, etcd,<br/>k8s EndpointSlice)")]
B1["Backend A"]
B2["Backend B"]
B3["Backend C"]
C1 -.refresh.-> REG
C1 -->|picks a peer| B1
C1 --> B2
C1 --> B3
end
subgraph DED ["🏗️ Dedicated LB"]
direction TB
C2[Client] --> VIP["Load balancer<br/>(nginx · Envoy · ALB)"]
VIP --> D1[Backend A]
VIP --> D2[Backend B]
VIP --> D3[Backend C]
HC[["health<br/>checks"]] -.active.-> D1
HC -.-> D2
HC -.-> D3
VIP --- HC
end
classDef client fill:#fef3c7,stroke:#d97706,stroke-width:1.5px,color:#78350f;
classDef ded fill:#dbeafe,stroke:#3b6fd6,stroke-width:1.5px,color:#0f172a;
classDef backend fill:#dcfce7,stroke:#16a34a,stroke-width:1.5px,color:#14532d;
classDef infra fill:#e9d5ff,stroke:#7c3aed,stroke-width:1.5px,color:#4c1d95;
class C1,C2 client
class VIP,HC ded
class B1,B2,B3,D1,D2,D3 backend
class REG infra
| | Client-side | Dedicated |
|---|---|---|
| Who picks the backend? | the client library | a box in the middle |
| Extra network hop? | no | yes |
| Failure blast radius | one client affected | LB down = everything affected |
| Health-check work | every client | centralized at the LB |
| Best fit | internal RPC (gRPC, Finagle, mesh sidecars) | HTTP from unknown clients (browsers, mobile) |
| Typical infra | Consul / etcd + client library | ALB / NLB / nginx / Envoy / k8s Service |
Kubernetes Service objects are quietly a distributed dedicated LB: each node’s kube-proxy programs iptables/IPVS rules to DNAT traffic to a healthy pod, so there is no single chokepoint box. That’s a nice hybrid — client code talks to a single VIP, but the data plane is per-node.
L4 vs L7 — the axis you must know cold
| | L4 (transport) | L7 (application) |
|---|---|---|
| Inspects | IP + port + TCP flags | HTTP method, path, headers, cookies, gRPC metadata |
| Routing rules | "all :443 → pool X" | "/api/v2/* → pool A, cookie beta=1 → pool B" |
| TLS | passthrough (SNI-only) | terminate + re-inspect |
| CPU cost | low | high (parse each request) |
| Connection reuse | transparent | LB owns the pool to backends |
| Typical products | AWS NLB, HAProxy mode tcp, IPVS | ALB, nginx, Envoy, Traefik, Kong |
Rule of thumb: if you need header-based routing, canary by cookie, path rewrites, or per-route rate limits → L7. If you need raw throughput for arbitrary TCP (Redis, Postgres replicas, WebSocket pass-through) → L4.
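To make the rule of thumb concrete, here is roughly what each looks like in nginx — illustrative fragments only (addresses invented; the `server`/`stream` blocks live in different contexts of the real config):

```nginx
# L7: parse HTTP, route by path and cookie (inside the http {} context)
upstream pool_a { server 10.0.0.1:8080; }
upstream pool_b { server 10.0.0.2:8080; }

server {
    listen 443 ssl;
    location /api/v2/ { proxy_pass http://pool_a; }
    location / {
        # canary by cookie
        if ($cookie_beta = "1") { proxy_pass http://pool_b; }
        proxy_pass http://pool_a;
    }
}

# L4: no HTTP parsing, raw TCP passthrough (the stream {} context)
stream {
    upstream pg_replicas { server 10.0.1.1:5432; server 10.0.1.2:5432; }
    server {
        listen 5432;
        proxy_pass pg_replicas;
    }
}
```

Note the L4 block never sees a URL or header — which is exactly why it’s cheap, and exactly why it can’t do canary-by-cookie.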
Balancing algorithms
Assume 4 backends. Each algorithm decides where request N+1 goes.
flowchart LR
R[["Incoming requests<br/>1 · 2 · 3 · 4 · 5 · 6 · 7 · 8 · 9 · 10"]]
subgraph RR ["🔁 Round-robin"]
direction TB
A1["A: 1,5,9"]
A2["B: 2,6,10"]
A3["C: 3,7"]
A4["D: 4,8"]
end
subgraph WRR ["⚖️ Weighted (A=3 · B=1 · C=1 · D=1)"]
direction TB
B1["A: 1,2,3,7,8,9"]
B2["B: 4,10"]
B3["C: 5"]
B4["D: 6"]
end
subgraph LC ["📊 Least-connections"]
direction TB
C1["A: 3 open"]
C2["B: 4 open"]
C3["C: <b>1 open ← next req</b>"]
C4["D: 2 open"]
end
subgraph P2C ["🎲 Power of two choices"]
direction TB
D1["pick 2 random:<br/>C (1 open) vs A (3 open)"]
D2["send to C"]
end
R --> RR
R --> WRR
R --> LC
R --> P2C
classDef hub fill:#fef3c7,stroke:#d97706,stroke-width:2px,color:#78350f;
classDef ok fill:#dcfce7,stroke:#16a34a,stroke-width:1.5px,color:#14532d;
classDef hot fill:#fee2e2,stroke:#dc2626,stroke-width:1.5px,color:#7f1d1d;
classDef best fill:#dbeafe,stroke:#3b6fd6,stroke-width:2px,color:#0f172a;
class R hub
class A1,A2,A3,A4,B1,B2,B3,B4 ok
class C1,C2,C4,D1 ok
    class C3,D2 best

Quick take on each:
- Round-robin — zero state; pathological when backends have different capacity or when request cost varies.
- Weighted round-robin — tell the LB “backend A is 3× the size,” traffic splits 3:1:1:1. Typical during canary ramp.
- Least connections — usually the right default for long-lived HTTP and request-response with long tail. Requires the LB to count in-flight.
- Power of two choices (P2C) — pick two backends at random, send to the one with fewer active requests. Surprisingly close to optimal, no global state needed. Used by Envoy, Finagle, NGINX Plus.
- IP hash / session affinity (sticky sessions) — hashes source IP (or a cookie) to a backend. Use only when the app has in-memory session state; prefer refactoring the state out.
Consistent hashing, for cache locality and stateful services
When each backend holds different data (a sharded cache, a stateful computation, a per-user queue) you can’t send requests to just anyone — the answer is only on one specific node. Naive `hash(key) % N` remaps almost every key when N changes. Consistent hashing shifts only ~1/N.
flowchart LR
subgraph RING ["Hash ring (mod 2³²)"]
direction TB
A["🟩 Backend A<br/>vnodes @ 0, 120, 250"]
B["🟦 Backend B<br/>vnodes @ 60, 180, 310"]
C["🟧 Backend C<br/>vnodes @ 90, 200, 340"]
end
K1["key 'user:42' → hash 85"] -->|"nearest clockwise = 90"| C
K2["key 'session:abc' → hash 155"] -->|"180"| B
K3["key 'order:7' → hash 220"] -->|"250"| A
classDef key fill:#fef3c7,stroke:#d97706,stroke-width:1.5px,color:#78350f;
classDef a fill:#dcfce7,stroke:#16a34a,stroke-width:1.5px,color:#14532d;
classDef b fill:#dbeafe,stroke:#3b6fd6,stroke-width:1.5px,color:#0f172a;
classDef c fill:#fed7aa,stroke:#ea580c,stroke-width:1.5px,color:#7c2d12;
class K1,K2,K3 key
class A a
class B b
    class C c

Vnodes (virtual nodes) are the trick that makes the distribution uniform. Each physical backend claims ~150 random positions on the ring; a new backend pulls roughly the right slice of keys off its neighbors instead of inheriting whichever contiguous arc happened to be next to its single position. Redis Cluster, DynamoDB, Cassandra, memcached clients — all consistent-hash with vnodes under the hood.
Health checks — active and passive, together
Two complementary signals. In production you want both.
- Active — the LB probes `GET /healthz` every 2–5 s. Take a backend out of rotation after `N` consecutive misses. Cheap, predictable, but adds background load.
- Passive — the LB observes live traffic: 5xx responses, connection resets, timeouts. If a backend’s error rate spikes above a threshold, eject it for a cooldown (Envoy’s outlier detection).
The liveness vs readiness distinction matters for Kubernetes:

- `/livez` (liveness) — “am I alive?” If this fails, k8s restarts the container. Keep it trivial; don’t check the DB here (otherwise one slow DB takes out every pod).
- `/readyz` (readiness) — “am I ready to serve traffic?” If this fails, k8s takes you out of the Service’s endpoint list but does not restart you. Check dependencies here (DB, caches, downstream auth service).
Slow-start / warm-up. When a new backend comes up, don’t immediately send it 25 % of traffic — its connection pool is cold, the JVM’s JIT hasn’t C2-compiled the hot paths, Postgres connection handshakes haven’t been amortized yet. Envoy has `slow_start_config`; nginx has `slow_start=30s` in `upstream`. Without it, the first pod in a rolling deploy absorbs a latency spike every time.
TLS termination: where does the crypto happen?
flowchart LR
CL[Client] -->|"HTTPS"| LB
subgraph TERM_LB ["① Terminate at LB"]
LB1["LB<br/>(holds cert + key)"]
LB1 -->|"HTTP<br/><i>or</i> re-encrypted HTTPS"| B1[Backend]
end
subgraph TERM_BACK ["② Passthrough to backend"]
LB2["LB<br/>(L4, SNI-only)"]
LB2 -->|"HTTPS passthrough"| B2["Backend<br/>(holds cert + key)"]
end
subgraph TERM_MESH ["③ Mesh sidecar (mTLS)"]
LB3[LB] -->|"HTTPS"| SC1[Envoy sidecar]
SC1 -->|"localhost plaintext"| B3[Backend]
SC1 -.mTLS to peers.- SC2[other sidecars]
end
classDef lb fill:#dbeafe,stroke:#3b6fd6,stroke-width:1.5px,color:#0f172a;
classDef backend fill:#dcfce7,stroke:#16a34a,stroke-width:1.5px,color:#14532d;
classDef mesh fill:#e9d5ff,stroke:#7c3aed,stroke-width:1.5px,color:#4c1d95;
class LB1,LB2,LB3 lb
class B1,B2,B3 backend
class SC1,SC2 mesh
| | Terminate at LB | Passthrough | Mesh sidecar |
|---|---|---|---|
| Cert lives on | LB | every backend | sidecar + LB |
| L7 routing? | ✅ | ❌ (L4 only) | ✅ |
| End-to-end encrypted? | ❌ unless LB→backend re-encrypts | ✅ | ✅ (mTLS) |
| Crypto CPU cost | centralized on LB | spread to backends | on sidecars |
| Typical use | public web app | pinned certs, compliance | zero-trust service mesh |
Global / geo load balancing
The LB box above is regional. Routing users to the closest healthy region happens one level up:
- DNS-level — the authoritative DNS server returns different `A` records based on the resolver’s location (AWS Route 53 latency-based routing, GeoDNS). TTL is the enemy: a dead region bleeds traffic until TTLs expire everywhere. Keep global-health TTLs low (30-60 s).
- Anycast — the same IP is announced from multiple BGP points; routers pick the topologically nearest. CDNs and DNS root servers use anycast. Failover is sub-second because it’s a routing update, not a DNS refresh.
- App-level — the app decides, possibly overriding DNS. E.g. the web app pins users to their home shard after login.
§6 goes deeper into CDN / regional.
Here’s the minimum viable production stack — one region, the seven boxes an interviewer expects you to name:
flowchart LR
user@{ shape: circle, label: "User" }
cdn@{ shape: cloud, label: "CDN" }
subgraph region ["Region (one of many)"]
direction TB
lb@{ shape: stadium, label: "LB (nginx)" }
app@{ shape: rounded, label: "API (Go)" }
cache@{ shape: cyl, label: "Redis" }
db@{ shape: cyl, label: "Postgres" }
end
kafka@{ shape: stadium, label: "Kafka (cross-region)" }
user -->|"HTTPS"| cdn
cdn -->|"miss → origin"| lb
lb -->|"route"| app
app -->|"read/write cache"| cache
app -->|"sync write"| db
db -->|"CDC events"| kafka
classDef neutral fill:#dbeafe,stroke:#3b6fd6,stroke-width:1.5px,color:#0f172a;
classDef storage fill:#fed7aa,stroke:#ea580c,stroke-width:1.5px,color:#7c2d12;
classDef highlight fill:#e9d5ff,stroke:#7c3aed,stroke-width:2px,color:#4c1d95;
class user,cdn,app neutral
class cache,db,kafka storage
    class lb highlight

Seven boxes, one focal point (the LB), one flow from left to right. Replicate the region group N times for multi-region; the Kafka bus is the only thing that actually crosses the boundary.
Deployment patterns the LB enables
- Blue/green — keep blue running, deploy green alongside, flip all traffic in one LB config change. Roll back = flip back.
- Canary — weighted routing: 1 % → 10 % → 50 % → 100 % to the new version, watching error rate + latency at each step.
- Rolling — replace pods N at a time; LB takes the restarting pod out of rotation via readiness.
All three require the LB to separate “in rotation” from “running,” which is exactly what readiness probes + weighted pools give you.
Go — a minimal round-robin L7 reverse proxy
Illustrative, not production:
```go
package main

import (
	"net/http"
	"net/http/httputil"
	"net/url"
	"sync/atomic"
	"time"
)

type Backend struct {
	URL     *url.URL
	Healthy atomic.Bool
	Proxy   *httputil.ReverseProxy
}

type Pool struct {
	backends []*Backend
	idx      atomic.Uint64 // for round-robin
}

func NewPool(urls []string) *Pool {
	p := &Pool{}
	for _, raw := range urls {
		u, _ := url.Parse(raw)
		b := &Backend{URL: u}
		b.Proxy = httputil.NewSingleHostReverseProxy(u)
		// Mark backend unhealthy on transport errors.
		b.Proxy.ErrorHandler = func(w http.ResponseWriter, r *http.Request, err error) {
			b.Healthy.Store(false)
			http.Error(w, "bad gateway", http.StatusBadGateway)
		}
		b.Healthy.Store(true)
		p.backends = append(p.backends, b)
	}
	return p
}

func (p *Pool) NextHealthy() *Backend {
	n := uint64(len(p.backends))
	for i := uint64(0); i < n; i++ {
		b := p.backends[p.idx.Add(1)%n]
		if b.Healthy.Load() {
			return b
		}
	}
	return nil
}

func (p *Pool) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	b := p.NextHealthy()
	if b == nil {
		http.Error(w, "no healthy backend", http.StatusServiceUnavailable)
		return
	}
	b.Proxy.ServeHTTP(w, r)
}

// Active health check: every 5s, GET /healthz on each backend.
func (p *Pool) HealthLoop() {
	client := &http.Client{Timeout: 1 * time.Second}
	t := time.NewTicker(5 * time.Second)
	for range t.C {
		for _, b := range p.backends {
			resp, err := client.Get(b.URL.String() + "/healthz")
			b.Healthy.Store(err == nil && resp != nil && resp.StatusCode == 200)
			if resp != nil {
				resp.Body.Close()
			}
		}
	}
}
```

Shortcuts this deliberately takes: no retry on 5xx, no P2C, no TLS to backends, no hot-reload of the pool. Production LBs (Envoy, nginx) do all of those plus connection pooling, HTTP/2 multiplexing, circuit breaking, and metrics — the reason “just write your own” is almost always the wrong answer.
Interview gotchas for §5
- Thundering herd after an LB event. When the LB restarts or many backends come up in a rolling deploy, naive round-robin sends one big wave to the newest pod. Slow-start mode (Envoy, NGINX Plus) ramps weight up over `N` seconds. Ask about it.
- Session affinity is a scaling debt. It breaks the symmetry that lets you kill any pod without user impact. If you must, key affinity on `user_id` (cookie), not source IP — phones roam between Wi-Fi and LTE.
- L7 LBs are CPU-bound on TLS, not bandwidth. Plan capacity on handshakes-per-second, not Gbps. Session resumption + OCSP stapling + HTTP/2 keepalive ease the bill.
- `keepalive_timeout` mismatches → 502 storms. If the LB’s idle timeout is longer than the backend’s, the LB will send new requests on sockets the backend is about to close. Always keep LB idle ≤ backend idle − a couple of seconds.
- DNS TTL vs failover. A 300 s TTL means a dead region bleeds traffic for 300 s worldwide. Lower the TTL before you need to fail over (not during), or put anycast in front of the name.
- Don’t put an L7 LB in front of another L7 LB unless you love chasing ghost 502s. One hop that parses HTTP is enough; the second hop just adds surface area for header drift, keepalive mismatch, and H1↔H2 translation bugs.
6. Deep Dives — CDN, regional, resilience
Up to this point every protocol assumed the network cooperates. It doesn’t. Packets drop, regions fail, downstream services slow to a crawl, and your latency budget is shorter than any single hop’s tail. This section is the kit of patterns you reach for to keep a system standing when things go sideways.
The latency / availability math interviewers expect
Two numbers you should be able to derive on the whiteboard:
| SLA | Budget per year | Per month | Per week |
|---|---|---|---|
| 99.0% | 3.65 days | 7.2 h | 1.68 h |
| 99.9% (three 9s) | 8.76 h | 43.8 min | 10.1 min |
| 99.99% (four 9s) | 52.6 min | 4.4 min | 60.5 s |
| 99.999% (five 9s) | 5.26 min | 26.3 s | 6 s |
Composition rule. If you depend on N downstream services each at 99.9%, your availability is 0.999^N. Ten dependencies ≈ 99%. That’s why resilience patterns exist — they recover availability from components that individually aren’t good enough.
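The composition rule is worth verifying once yourself before the whiteboard; a throwaway Go check (`composedAvailability` is a name invented here):

```go
package main

import (
	"fmt"
	"math"
)

// composedAvailability returns the availability of a serial chain of
// n dependencies, each with the given per-service availability.
func composedAvailability(per float64, n int) float64 {
	return math.Pow(per, float64(n))
}

func main() {
	// Ten serial dependencies at three nines each.
	a := composedAvailability(0.999, 10)
	fmt.Printf("10 deps @ 99.9%% each → %.2f%% overall\n", a*100)
	// The downtime budget that implies per year, in hours.
	fmt.Printf("≈ %.0f hours of downtime per year\n", (1-a)*365*24)
}
```

Ten three-nines dependencies land at roughly 99.0% — a budget of days, not hours.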
CDNs: push the edge closer
A CDN caches static (and increasingly dynamic) responses at points of presence (PoPs) near your users. A request hits the nearest PoP; if cached, served from there. If not, the PoP fetches from origin, caches, returns. First byte goes from ~200 ms trans-Pacific to ~20 ms same-city.
flowchart LR
user@{ shape: circle, label: "User" }
pop@{ shape: cloud, label: "Edge PoP" }
origin@{ shape: rounded, label: "Origin" }
bucket@{ shape: disk, label: "Object store" }
user -->|"request"| pop
pop -->|"cache miss"| origin
origin -->|"read"| bucket
bucket -.->|"response"| origin
origin -.->|"fill cache"| pop
pop -.->|"response"| user
classDef neutral fill:#dbeafe,stroke:#3b6fd6,stroke-width:1.5px,color:#0f172a;
classDef highlight fill:#e9d5ff,stroke:#7c3aed,stroke-width:2px,color:#4c1d95;
classDef storage fill:#fed7aa,stroke:#ea580c,stroke-width:1.5px,color:#7c2d12;
class user,origin neutral
class pop highlight
    class bucket storage

Four knobs that actually matter in an interview:

- Cache keys. By default, URL = key. `Vary: Accept-Language` splits entries by language header. Get the key wrong → serve the wrong user’s content.
- TTL vs stale-while-revalidate. `Cache-Control: max-age=60, stale-while-revalidate=3600` = serve cached, asynchronously refetch after 60 s, serve a stale copy for up to 1 h if the origin is down. Trade freshness for resilience.
- Cache-stampede protection. When a popular URL expires, 10,000 clients hit the origin simultaneously. Fix: request coalescing at the edge (one origin fetch, fanned out to all waiters), or `stale-while-revalidate`.
- Purging. Pushing a fix? Tag-based invalidation (`Cache-Tag: article-42`) is far better than URL-based when one piece of content appears in many URLs.
Regional partitioning: blast-radius management
Single-region = single blast radius. If you lose us-east-1, you lose everything. Three typical multi-region postures:
flowchart LR
subgraph AP ["Active / Passive"]
direction TB
APpri["Primary<br/>100% of traffic"]
APstd["Standby<br/>0% · replicated"]
APpri -->|"async replication"| APstd
end
subgraph AA ["Active / Active"]
direction TB
AA1["Region A<br/>50%"]
AA2["Region B<br/>50%"]
AA1 <-->|"bi-directional sync"| AA2
end
subgraph CELL ["Cell-based"]
direction TB
C1["Cell 1<br/>users 0-33%"]
C2["Cell 2<br/>users 33-66%"]
C3["Cell 3<br/>users 66-100%"]
end
classDef neutral fill:#dbeafe,stroke:#3b6fd6,stroke-width:1.5px,color:#0f172a;
classDef ok fill:#dcfce7,stroke:#16a34a,stroke-width:1.5px,color:#14532d;
classDef warn fill:#fef3c7,stroke:#d97706,stroke-width:1.5px,color:#78350f;
class APpri neutral
class APstd warn
class AA1,AA2 ok
class C1,C2,C3 neutral
| Model | Failover time | Data loss risk | Operational cost |
|---|---|---|---|
| Active / Passive | minutes (DNS / BGP flip) | last async-replication window (seconds to minutes) | low — stand-by is cheap |
| Active / Active | seconds (already serving) | merge conflicts if both wrote | high — multi-master sync |
| Cell-based | blast radius = one cell | only that cell’s users | medium — many small cells |
Cell-based is AWS’s favorite pattern: each “cell” is a self-contained stack serving a slice of users. When a cell goes bad, only that slice is affected. Adding capacity = adding cells, not scaling one giant region.
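Cell assignment is usually a stable hash of the user ID; a sketch under that assumption (`cellFor` is a name invented here, FNV chosen for brevity):

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// cellFor maps a user ID to one of n cells. The mapping must be stable:
// the same user always lands in the same cell, so cell-local state works.
func cellFor(userID string, n uint32) uint32 {
	h := fnv.New32a()
	h.Write([]byte(userID))
	return h.Sum32() % n
}

func main() {
	for _, u := range []string{"alice", "bob", "carol"} {
		fmt.Printf("%s → cell %d\n", u, cellFor(u, 3))
	}
}
```

Note the caveat: plain modulo remaps most users when you change the cell count. Real cell routers use consistent hashing or an explicit user→cell directory so cells can be added without mass migration.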
Timeouts: the most underappreciated resilience primitive
If you remember nothing else: every network call needs a timeout. The default behavior of most language HTTP libraries is “wait forever” — which translates to goroutines/threads piling up, connection pools exhausting, the whole service grinding to a halt because of one slow downstream.
Timeout budget — carry a deadline through the call stack, not a timeout per hop. If the top-level request has 800 ms left, an internal call can’t use 500 ms if two more hops come after it.
sequenceDiagram
autonumber
participant U as User
participant A as API (budget: 800ms)
participant B as Service B (budget: 400ms)
participant C as Service C (budget: 150ms)
U->>A: request
Note over A: deadline = now + 800ms
A->>B: call (ctx deadline = 400ms)
Note over B: deadline = now + 400ms
B->>C: call (ctx deadline = 150ms)
Note over C: deadline = now + 150ms
C-->>B: 80ms
B-->>A: 220ms
    A-->>U: 370ms ✓ within budget

In Go, `context.Context` does this for you — `context.WithDeadline(parent, t)` clamps the child’s deadline to whichever is tighter. gRPC and well-behaved HTTP libraries read `ctx.Deadline()` and fail fast if there’s no time left.
Retries with exponential backoff + jitter
When a transient error happens (DNS blip, upstream restart, rate-limit), the first instinct is to retry. Done naively, retries turn a 1-second outage into a 30-second outage as clients synchronize their retry storms.
Three rules to retry well:
- Retry only idempotent operations. `GET`, `PUT`, `DELETE` — yes. `POST` — only if you carry an idempotency key.
- Cap the number of attempts. Usually 3–5. Infinite retries are a DDoS on yourself.
- Exponential backoff + jitter. Double the delay each attempt, then add randomness so N clients don’t all retry at the exact same moment.
sequenceDiagram
autonumber
participant C as Client
participant S as Upstream
C->>S: GET /resource
S--xC: 503 Service Unavailable
Note over C: wait 100-300ms<br/>(base 200ms + jitter)
C->>S: attempt 2
S--xC: 503
Note over C: wait 200-600ms<br/>(base 400ms + jitter)
C->>S: attempt 3
S--xC: 503
Note over C: wait 400-1200ms<br/>(base 800ms + jitter)
C->>S: attempt 4
    S-->>C: 200 OK ✅

The jitter is crucial. Without it, all clients that failed at t=0 retry simultaneously at t=200, crushing the service again. With full jitter (`sleep = rand(0, base * 2^attempt)`), retries smear across the whole interval.
Go — the retry that you copy-paste into every new service:
```go
import (
	"context"
	"errors"
	"math/rand"
	"time"
)

type retryableFn func(ctx context.Context) error

// retry runs fn with full-jitter exponential backoff, capped at maxAttempts.
// Returns the last error if all attempts fail. Respects the context deadline.
func retry(ctx context.Context, maxAttempts int, base time.Duration, fn retryableFn) error {
	var err error
	for attempt := 0; attempt < maxAttempts; attempt++ {
		err = fn(ctx)
		if err == nil {
			return nil
		}
		if !isRetryable(err) || attempt == maxAttempts-1 {
			return err
		}
		// Full jitter: sleep = random in [0, base * 2^attempt]
		maxSleep := base * (1 << attempt)
		sleep := time.Duration(rand.Int63n(int64(maxSleep)))
		select {
		case <-time.After(sleep):
		case <-ctx.Done():
			return errors.Join(err, ctx.Err())
		}
	}
	return err
}

func isRetryable(err error) bool {
	// 5xx, connection reset, deadline exceeded — yes.
	// 4xx (client error), validation errors — no.
	var te interface{ Timeout() bool }
	if errors.As(err, &te) && te.Timeout() {
		return true
	}
	// + protocol-specific heuristics (HTTP status, gRPC codes.Unavailable) …
	return false
}
```

Circuit breaker: stop hitting a dead service
A circuit breaker wraps calls to a downstream and fails fast when failures cross a threshold. Think of it as a fuse that opens to protect the rest of the system from cascading failures.
stateDiagram-v2
direction LR
[*] --> Closed
Closed --> Open : failures > threshold<br/>in rolling window
Open --> HalfOpen : after cool-down<br/>(e.g. 30s)
HalfOpen --> Closed : probe succeeds
HalfOpen --> Open : probe fails
note right of Closed
normal — calls go through
failure count incremented
end note
note right of Open
fail fast — no calls
return cached / fallback / 503
end note
note right of HalfOpen
let one probe through
decide based on result
end note
classDef good fill:#dcfce7,stroke:#16a34a,stroke-width:2px,color:#14532d;
classDef bad fill:#fee2e2,stroke:#dc2626,stroke-width:2px,color:#7f1d1d;
classDef probe fill:#fef3c7,stroke:#d97706,stroke-width:2px,color:#78350f;
class Closed good
class Open bad
    class HalfOpen probe

Why this matters:
- Without a breaker: a dead downstream takes 10 s to time out per call. At 1000 req/s incoming, that’s 10,000 goroutines stuck in flight → OOM within seconds.
- With a breaker in Open state: calls fail in microseconds with a known error. Goroutines complete, connection pools stay healthy, upstream clients can be told “this feature is degraded, retry in 30 s” instead of “your entire page timed out.”
Go with sony/gobreaker:
```go
import "github.com/sony/gobreaker/v2"

var paymentBreaker = gobreaker.NewCircuitBreaker[*PaymentResult](gobreaker.Settings{
	Name:        "payment-gateway",
	MaxRequests: 3,                // allowed through in Half-Open
	Interval:    60 * time.Second, // rolling window
	Timeout:     30 * time.Second, // Open → Half-Open cool-down
	ReadyToTrip: func(counts gobreaker.Counts) bool {
		failureRate := float64(counts.TotalFailures) / float64(counts.Requests)
		return counts.Requests >= 20 && failureRate >= 0.5
	},
	OnStateChange: func(name string, from, to gobreaker.State) {
		log.Printf("breaker %s: %s → %s", name, from, to)
	},
})

func chargeCard(ctx context.Context, req ChargeRequest) (*PaymentResult, error) {
	return paymentBreaker.Execute(func() (*PaymentResult, error) {
		return paymentClient.Charge(ctx, req)
	})
}
```

The knobs that matter (with reasonable defaults):
- Failure threshold — failure rate ≥ 50% over a rolling window of 20+ requests. Lower = more sensitive, more false trips.
- Cool-down — time in Open before trying Half-Open. Usually 15-60 s. Too short = hammer while still sick; too long = slow recovery.
- Half-Open probe count — how many requests before declaring healthy again. 1-5. Too many = expose more traffic to a still-broken service.
Bulkheads: isolate the blast radius
Don’t share pools across unrelated features. If payment and search both draw from one `http.Transport` (`MaxIdleConns: 100`) pool, a payment-gateway slowdown can starve search. Give each downstream its own pool (separate `http.Client`), or per-tenant pools for multi-tenant systems.
Paired with a breaker, this means one dead dependency degrades only its feature — the rest of the app keeps working.
Rate limiting, three places
- At the edge (CDN / API gateway) — by IP / API key. Rejects floods before they touch your code.
- At the service (middleware) — by user / tenant / endpoint. Enforces per-customer contracts.
- At the downstream call site (client-side) — token bucket per upstream. Shields your dependencies from you.
The classic algorithm is token bucket: capacity tokens refilled at rate/sec. Each request costs 1. If no tokens, 429. Bursts up to capacity, steady-state rate. Go’s golang.org/x/time/rate.Limiter is idiomatic.
Putting it together — the resilience stack
flowchart TB
REQ(["incoming request"])
TMO["① timeout budget<br/>ctx.WithDeadline"]
RLT["② rate limiter<br/>token bucket / leaky"]
BLK["③ bulkhead<br/>per-dependency pool"]
CB["④ circuit breaker<br/>closed / open / half-open"]
RET["⑤ retry<br/>exp backoff + jitter"]
CALL(["downstream call"])
FALLBACK(["fallback / 503 / cached"])
REQ --> TMO --> RLT --> BLK --> CB
CB -->|"closed"| RET --> CALL
CB -->|"open"| FALLBACK
CB -->|"half-open"| CALL
classDef step fill:#dbeafe,stroke:#3b6fd6,stroke-width:1.5px,color:#0f172a;
classDef good fill:#dcfce7,stroke:#16a34a,stroke-width:1.5px,color:#14532d;
classDef warn fill:#fef3c7,stroke:#d97706,stroke-width:1.5px,color:#78350f;
class TMO,RLT,BLK,CB,RET step
class CALL good
class FALLBACK warn
    class REQ step

Order matters: timeouts first so nothing can run forever, rate-limit before the expensive work, bulkhead to isolate, breaker to fail fast, retry on retryable errors only, then the actual call. Getting the order wrong (e.g. retrying before the breaker) amplifies bad behavior instead of absorbing it.
Interview gotchas for §6
- Retry storms after a mass timeout. If you set retries = 3 on every layer (client, gateway, service, downstream), a single slow call multiplies into 3⁴ = 81 attempts. Pick one layer to retry; the others pass the error up.
- `DELETE` isn’t always safe to retry. It’s idempotent semantically (`DELETE x` twice = same state) but the second DELETE on a not-found resource may return 404 — your caller needs to treat 404-after-DELETE as success, not failure.
- Breaker + retry interaction. A retry layer inside a breaker means one user retrying 3× accounts for 3 failures in the breaker’s window, tripping it faster than you’d expect. Decide: retry outside the breaker (breaker is the ultimate source of truth) or inside (retries are “part of one operation”).
- Cold-start after Open → Half-Open. If your downstream just came back and you send all your traffic in the first second, you kill it again. Use `MaxRequests` in Half-Open, or add a gradual weight ramp-up (see §5 slow-start).
- Monitor `OnStateChange`. A breaker silently tripping is worse than no breaker — users see fallbacks and you don’t know why. Page / log every state transition.
7. Interview cheat sheet
Three ways to use this section:
- Night-before review — read only this page, open the diagrams you don’t remember.
- During the interview — when the interviewer drops a keyword, the tables below have a starting sentence.
- Mock warm-up — cover the right column and quiz yourself.
Answer template for any networking question
Strong answers have four beats in order. Missing one is the usual reason a good technical answer feels mediocre:
- Frame the trade-off. Name the two or three things we’re choosing between (latency vs throughput, consistency vs availability, correctness vs cost).
- Pick a default. Give a concrete choice with numbers where you can.
- Call out the failure mode. Say out loud when your default breaks and what you’d reach for next.
- Tie to the specific system in the prompt. Generic answers rate generic.
When you hear X → how to open
The right-hand column is the first sentence, not the whole answer. Expand from there.
Design decisions
| Interviewer says | Open with |
|---|---|
| ”What happens when I type a URL?” | DNS → TCP 3-way → TLS 1.3 (1 RTT) → HTTP request → render. First byte floor = one RTT. HTTP/3 folds the handshake into QUIC. |
| ”TCP or UDP for this?” | Default TCP for correctness; UDP when ordering/retransmits are the app’s job (DNS, media, QUIC) or when HOL blocking matters. |
| ”REST, GraphQL, or gRPC?” | REST for public / CRUD / cacheable. GraphQL when one graph × many client shapes. gRPC for internal polyglot services with streaming. |
| ”WebSocket, SSE, or WebRTC?” | SSE for server → client feeds. WebSocket for bi-di text/binary. WebRTC only if you need media or sub-50ms peer-to-peer. |
| ”301 vs 302?“ | 301 = permanent, cached aggressively, pain to roll back. 302 = temporary, not cached. Use 307/308 to preserve the HTTP method. |
| ”How do you encrypt service-to-service traffic?“ | mTLS, usually delegated to a service mesh sidecar (Envoy / Linkerd). The mesh owns cert rotation, identity, and policy. |
Scaling
| Interviewer says | Open with |
|---|---|
| ”Scale this stateless service.” | One LB (L4 for raw throughput, L7 for routing / rewriting) fronts N replicas, with state in a DB or cache. Add health checks + slow-start to prevent thundering herd on rollouts. |
| ”Design a rate limiter.” | Token bucket: capacity C, refill rate R. Bursts up to C, steady-state R. Key by tenant / user / IP, persist counters in Redis for multi-node consistency. |
| ”Design for 1 M concurrent connections.” | The bottleneck is fan-out, not the sockets themselves. Per-room subscriber index, pub/sub broker (Redis / Kafka / NATS) to broadcast, region-pinned pods, connection-count health signal. |
| ”Deploy with zero downtime.” | Readiness gates rotation. Rolling replaces N pods at a time; blue/green keeps both versions hot and flips the LB; canary shifts traffic by weight (1% → 10% → 100%) while watching error rate. |
| ”Multi-region strategy?” | Active/passive if the cost of replication lag is tolerable; active/active if the app can handle conflict resolution; cell-based when blast radius is the primary concern. |
Failure + resilience
| Interviewer says | Open with |
|---|---|
| ”Postgres goes down — what happens?” | Clients carry deadlines; a circuit breaker opens after threshold so we fail fast instead of piling up goroutines. Serve cached / read-replica if the endpoint tolerates it; return a structured degradation (503 with a Retry-After) otherwise. |
| ”Why are your latencies spiking?” | Separate p50 from p99 first. Likely suspects: GC pauses, connection pool saturation, downstream tail latency, TCP retransmits on a flaky path, or a cold cache. Instrument with distributed tracing to find the hop. |
| ”Your service keeps 502-ing.” | Usually an LB ↔ backend keep-alive mismatch: LB reuses a connection the backend just closed. Align keepalive_timeout (LB < backend) and watch upstream_reset logs. |
| ”One user’s bad request is taking down the service.” | You need bulkheads. Separate connection pools per downstream so one slow dependency doesn’t starve the rest; rate-limit per user/tenant, not just globally. |
| ”What’s wrong with retrying every error?” | Retry storms. Each layer retries 3× → stacks multiplicatively (3⁴ = 81 attempts). Retry in one place (client), use exponential backoff with full jitter, and only for idempotent operations or calls with an idempotency key. |
The one-pager
If nothing else sticks, memorize this:
| Decision | Default | Why |
|---|---|---|
| Transport | TCP for correctness, UDP / QUIC for real-time | TCP head-of-line blocking is at the stream layer |
| HTTP version | HTTP/2 within a datacenter, HTTP/3 at the edge for mobile | /3 rides QUIC → no HOL, connection migration across networks |
| API style | REST public, gRPC internal, GraphQL when one schema × many clients | Each pattern matches a distinct constraint |
| Real-time | SSE server→client, WebSocket bi-di, WebRTC media / sub-50ms | Pick the simplest channel that solves the problem |
| Load balancing | L4 for raw throughput, L7 for HTTP-aware routing | L7 is CPU-bound on TLS, not bandwidth |
| LB algorithm | Least-connections as default, P2C when stateless, consistent hashing for shard affinity | Round-robin ignores uneven request cost; hashing preserves shard/cache locality |
| Resilience stack | timeout → rate-limit → bulkhead → breaker → retry → call | Order matters — retries before the breaker amplify failures |
| Retries | Exponential backoff with full jitter, cap 3–5 attempts, idempotent only | Prevents retry storms |
| Multi-region | Cell-based for blast-radius control, active / active for sub-minute RTO | Active/passive cheapest, active/active costliest |
| Caching | Cache-Control: max-age=N, stale-while-revalidate=M | Resilience usually beats freshness |
Pitfalls to volunteer
Interviewers reward candidates who surface failure modes before being asked. The list below is short enough to scan the day of; drop one or two where they fit the scenario:
- POST is not idempotent. Safe retries need an idempotency key propagated through every layer.
- `TIME_WAIT` port exhaustion on high-churn outbound clients. Use connection pools; avoid `tcp_tw_recycle` (deprecated, breaks NAT).
- `Connection: close` emitted on every response cripples the client pool.
- WebSocket without `CheckOrigin` is CSRF-over-WebSocket waiting to happen.
- WebRTC without a TURN budget is a demo, not a product — plan for 10–20 % of calls to need the relay.
- Session affinity is scaling debt. It breaks the symmetry that lets you terminate any pod.
- `keepalive_timeout` asymmetry between LB and backend produces 502 storms.
- DNS TTL during failover. A 300 s TTL means 300 s of bleeding traffic to a dead region.
- Retry inside a breaker double-counts failures. Decide which layer owns retry semantics.
- Reusing a protobuf field number silently breaks wire compatibility. Use `reserved`.
- Unbounded GraphQL queries are a DoS primitive. Enforce depth limits and persisted queries in production.
- 0-RTT TLS early data is replayable. Never use it for state-changing requests.
- HTTP/2 with a self-signed cert in Go needs `NextProtos = ["h2", "http/1.1"]` — otherwise the client silently falls back to HTTP/1.1.
- PMTUD black-holing. A middlebox dropping ICMP “fragmentation needed” packets stalls any segment over the path MTU.
Further reading
In order of depth-per-hour:
- System Design Primer — a curated reading list masquerading as a README. Best starting point.
- Designing Data-Intensive Applications (Kleppmann). Chapters 5, 6, and 7 on replication, partitioning, and transactions. The implicit syllabus of most system-design rounds.
- RFC 9110 (HTTP semantics) and RFC 9114 (HTTP/3) — readable, surprisingly short.
- AWS Builders’ Library — essays on the patterns in §5 and §6, written at the scale they were invented for.
- highscalability.com — post-mortems and architecture profiles from companies in production.
- ByteByteGo’s newsletter — weekly, diagram-heavy, short enough to read on a commute.
Wrapping up
A working mental model of networking is cumulative. You will not acquire it in one sitting — you will notice one week that a problem at the layer above is easier because you understood the one below. Use this post as the spine; fill the gaps with whatever production mystery you’re chasing that week.
Corrections and sharper phrasings are welcome. Open an issue on the blog’s repo and I’ll update with attribution.