
Networking Essentials

Most system-design interviews touch networking. You don’t need to recite RFCs, but you do need to choose the right protocol, explain why, and anticipate the failure modes.

This post is my working cheat sheet — structured for re-reading before an interview, with diagrams, Go snippets, and the trade-offs that actually come up. I rewrote my original HelloInterview notes into seven sections that build on each other: the layer model, then each layer from the wire up to application protocols, then load balancing, then resilience.

How to use this: skim the headings and diagrams first. Second pass, read the “why it matters” paragraphs. Third pass, the code. Don’t memorize — understand the trade-off.


1. Networking 101#

Every network interaction is a stack of responsibilities. Each layer talks only to the one directly above and below it, so you can swap implementations without breaking the others — the same HTTP request works whether it rides on Ethernet, Wi-Fi, or LTE.

Textbooks teach the 7-layer OSI model. In practice everyone uses the 4-layer TCP/IP model because the three top OSI layers collapse into “the application decides.” Know both names for the interview; use the 4-layer one when reasoning.

L7 Application: HTTP · DNS · gRPC · SSH
L4 Transport: TCP · UDP · QUIC
L3 Internet: IP · ICMP · routing
L2 Link: Ethernet · Wi-Fi · ARP · MAC
(each layer encapsulates ↓ the payload handed down to it)

The 4-layer TCP/IP stack. Each layer wraps the payload from the layer above.

What each layer actually does#

| Layer | Job in one sentence | Unit | Addresses |
| --- | --- | --- | --- |
| Application | "What does this message mean?" | message | URLs, hostnames |
| Transport | "Who on that machine should get it, and is it reliable?" | segment (TCP) / datagram (UDP) | port numbers |
| Internet | "Which machine on the internet, and how do we route there?" | packet | IP addresses |
| Link | "How do we put bits on this physical medium?" | frame | MAC addresses |

A packet at the link layer is literally a nested envelope: [ Ethernet [ IP [ TCP [ HTTP ... ] ] ] ]. Each device along the route peels off the link-layer envelope to decide where to forward next, then reseals with a new one.

What happens when you type https://example.com and press Enter#

This is the single most common warm-up question. The full answer touches every layer:

sequenceDiagram
    autonumber
    participant C as 🖥️ Client
    participant D as 🗺️ DNS
    participant S as 🌐 Server :443

    rect rgb(254, 249, 195)
    Note over C,D: Phase 1 — DNS resolution
    C->>D: query A example.com
    activate D
    D-->>C: 93.184.216.34
    deactivate D
    end

    rect rgb(219, 234, 254)
    Note over C,S: Phase 2 — TCP 3-way handshake
    C->>S: SYN  seq=x
    activate S
    S-->>C: SYN·ACK  seq=y, ack=x+1
    C->>S: ACK  ack=y+1
    end

    rect rgb(220, 252, 231)
    Note over C,S: Phase 3 — TLS 1.3 handshake (1 RTT)
    C->>S: ClientHello + key share
    S-->>C: ServerHello + cert + Finished
    Note over C,S: both sides derive the session key
    end

    rect rgb(233, 213, 255)
    Note over C,S: Phase 4 — HTTP over TLS
    C->>S: GET / HTTP/1.1
    S-->>C: 200 OK · Content-Type: text/html
    deactivate S
    end

A single HTTPS request touches DNS, TCP, TLS, and HTTP. HTTP/2 and HTTP/3 fold the connect, secure, and request phases into fewer round-trips.

Mental shortcut for the interview:

  1. Resolve — DNS turns example.com into an IP (UDP 53, or TCP 53 for large answers).
  2. Connect — TCP 3-way handshake (SYN, SYN-ACK, ACK) to the IP on port 443.
  3. Secure — TLS handshake negotiates a session key; with TLS 1.3 this is 1 RTT, sometimes 0-RTT on resumption.
  4. Request — the client sends an HTTP request over the encrypted stream.
  5. Respond — the server sends HTML. The browser parses, finds asset URLs, and repeats.

Every one of those steps is a potential failure mode an interviewer can probe. Keep it at the tip of your tongue.

Ports you are expected to know#

| Port | Protocol | What |
| --- | --- | --- |
| 22 | TCP | SSH |
| 53 | UDP / TCP | DNS |
| 80 | TCP | HTTP |
| 443 | TCP / UDP | HTTPS (UDP if HTTP/3) |
| 6379 | TCP | Redis |
| 5432 | TCP | Postgres |
| 9092 | TCP | Kafka |

2. Network Layer#

The network (L3) layer’s job is to get a packet from a source IP to a destination IP, possibly across many routers. It doesn’t care about ports, connections, or reliability — those are the transport layer’s problem.

IPv4 vs IPv6 in one table#

| | IPv4 | IPv6 |
| --- | --- | --- |
| Address size | 32 bits | 128 bits |
| Total addresses | ~4.3 × 10⁹ (exhausted since 2011) | ~3.4 × 10³⁸ |
| Notation | 192.168.1.1 | 2001:db8::1 |
| Header | variable (20–60 B), checksummed | fixed 40 B, no checksum |
| NAT needed? | yes, universally | designed to not need it |
| Packet fragmentation | routers can fragment | only the sender; routers drop + PMTUD |
| Configuration | DHCP or static | SLAAC (stateless) + DHCPv6 |

In interviews: IPv6 adoption is slow because NAT + CGNAT let IPv4 limp along, and because dual-stack migration is politically painful. Design for both when you can, deploy behind a load balancer that terminates either.

What’s in an IPv4 header#

You don’t need to memorize byte offsets, but knowing what fields exist explains a lot of real behavior — MTU issues, traceroute output, and why iptables rules reference TTL.

| Field | Bits | Notes |
| --- | --- | --- |
| Version | 4 | 4 for IPv4 |
| IHL | 4 | header length in 32-bit words |
| ToS / DSCP | 8 | traffic class |
| Total length | 16 | header + payload, caps at 65,535 bytes |
| Identification | 16 | groups fragments |
| Flags / Fragment offset | 3 / 13 | fragmentation control |
| TTL | 8 | hop counter |
| Protocol | 8 | 6 = TCP, 17 = UDP |
| Header checksum | 16 | |
| Source IP address | 32 | |
| Destination IP address | 32 | |
| Options | variable | rarely used |

The fields you'll actually reference in interviews are TTL, Protocol, and the two addresses.

The two fields that come up the most:

  • TTL (Time To Live) — a hop counter. Each router decrements it; when it hits 0 the packet is dropped and an ICMP “time exceeded” is sent back. traceroute exploits this by sending probes with TTL=1, 2, 3… and listening for the ICMP replies.
  • Protocol — tells the receiver how to interpret the payload: 6 for TCP, 17 for UDP, 1 for ICMP.

Routing, in one paragraph#

Routers maintain routing tables that map “destination prefix → next hop.” When a packet arrives, the router looks up the longest matching prefix of the destination IP and forwards to the corresponding next hop. On the internet, routing tables are built dynamically by BGP between autonomous systems. On your laptop, the table has two useful entries — your subnet (192.168.1.0/24 → direct) and everything else (0.0.0.0/0 → your gateway).

```shell
# See your routing table
$ ip route
default via 192.168.1.1 dev wlan0
192.168.1.0/24 dev wlan0 proto kernel scope link src 192.168.1.42

# Trace the hops to a destination
$ traceroute -n example.com
 1  192.168.1.1   1.4 ms
 2  100.64.0.1    8.2 ms   # CGNAT inside the ISP
 3  203.0.113.5   9.1 ms
...
```

NAT: why your home IP is a lie#

Your laptop’s 192.168.1.42 is a private IP, invalid on the public internet. Your router does Network Address Translation: when you send a packet, it rewrites the source IP to the router’s public IP and remembers the translation in a table. When the reply comes back, it rewrites the destination back to 192.168.1.42 and forwards to your laptop.

flowchart LR
    laptop["<b>Your laptop</b><br/><code>192.168.1.42</code><br/>src port 51000"]
    router["<b>Router (NAT)</b><br/>priv 192.168.1.1<br/>pub 203.0.113.17<br/><i>keeps translation table</i>"]
    target["<b>example.com</b><br/><code>93.184.216.34</code><br/>dst port 443"]

    laptop -->|"outbound<br/>src 192.168.1.42:51000"| router
    router -->|"rewritten<br/>src 203.0.113.17:60321"| target
    target -->|"reply<br/>dst 203.0.113.17:60321"| router
    router -->|"rewritten<br/>dst 192.168.1.42:51000"| laptop

    classDef neutral fill:#dbeafe,stroke:#3b6fd6,stroke-width:1.5px,color:#0f172a;
    classDef highlight fill:#e9d5ff,stroke:#7c3aed,stroke-width:2px,color:#4c1d95;
    class laptop,target neutral
    class router highlight

One public IP fronts many private hosts by multiplexing on the source port. NAT breaks when two sides both need to initiate — see WebRTC in §4c.

CIDR notation, quick reference#

192.168.1.0/24 means the first 24 bits are the network prefix, leaving 8 bits = 256 addresses (minus 2 for network + broadcast). Memorize these edge cases:

| Prefix | Size | Common use |
| --- | --- | --- |
| /32 | 1 address | single host |
| /24 | 256 | small subnet, home LAN |
| /16 | 65,536 | corp subnet |
| /8 | 16 M | legacy class A (10.0.0.0/8 private) |
| /0 | all | default route |

RFC 1918 private ranges (never routable on the public internet): 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16. 169.254.0.0/16 is link-local (what you get if DHCP fails). 127.0.0.0/8 is loopback.


3. Transport Layer#

IP gets a packet to a host. The transport layer gets it to the right program on that host, using a 16-bit port number. It also decides whether the stream is reliable, ordered, and flow-controlled (TCP) or fire-and-forget (UDP). Everything above this layer — HTTP, gRPC, DNS, SMTP — is just a convention layered on top of one of these two.

TCP: the 3-way handshake#

Before any data flows, TCP opens a connection by exchanging three segments. Each side picks a random initial sequence number (ISN) to defend against blind spoofing, and each side acks the other’s ISN + 1.

sequenceDiagram
    autonumber
    participant C as 🖥️ Client
    participant S as 🌐 Server

    Note over C: state: CLOSED
    Note over S: state: LISTEN
    rect rgb(219, 234, 254)
    C->>S: SYN · seq=x
    activate S
    Note over C: SYN_SENT
    Note over S: SYN_RCVD
    S-->>C: SYN·ACK · seq=y, ack=x+1
    C->>S: ACK · ack=y+1
    deactivate S
    end
    Note over C,S: ✅ ESTABLISHED — 1 full RTT before the first byte of payload

Three segments, one RTT. Motivates keep-alive, connection pooling, and HTTP/2 multiplexing.

Two interview gotchas on the handshake:

  • SYN flood. If the server commits memory on every received SYN, an attacker can flood it with spoofed SYNs and exhaust the connection table. The fix is SYN cookies — the server encodes the connection state in its initial sequence number (sent in the SYN-ACK) and allocates memory only after the client’s final ACK proves it came from the real source.
  • Half-open connection. If the client crashes after the 3-way handshake, the server has no idea. Keep-alive probes (TCP or application-level) exist to detect this; defaulting to “forever-idle sockets are fine” is wrong.

Reliability, built from primitives#

TCP is a reliable, ordered, byte stream. It achieves that on top of an unreliable IP layer with four mechanisms layered on each other:

  1. Sequence numbers on every byte. The receiver reorders out-of-order segments into a contiguous stream.
  2. Cumulative ACKs. ack = N means “I have received everything up to byte N-1.”
  3. Retransmission on timeout. The sender keeps a running estimate of RTT. If no ACK arrives within RTO, resend.
  4. Fast retransmit. If the sender gets three duplicate ACKs (ack = K four times), it infers a single segment was lost and resends without waiting for the RTO.

The sender does not send one segment at a time — it keeps a whole window of bytes in-flight:

flowchart LR
    B1["1"]:::acked --- B2["2"]:::acked --- B3["3"]:::acked
    B3 ==>|"send base"| B4
    B4["4"]:::flight --- B5["5"]:::flight --- B6["6"]:::flight --- B7["7"]:::flight
    B7 ==>|"next byte to send"| B8
    B8["8"]:::avail --- B9["9"]:::avail
    B9 ===|"window edge"| B10
    B10["10"]:::blocked --- B11["11"]:::blocked

    subgraph legend ["Legend"]
        direction LR
        L1[" acked "]:::acked
        L2[" in-flight (unacked) "]:::flight
        L3[" can send now "]:::avail
        L4[" blocked (beyond window) "]:::blocked
    end

    classDef acked   fill:#e5e7eb,stroke:#6b7b9a,color:#475569;
    classDef flight  fill:#93c5fd,stroke:#3b6fd6,stroke-width:1.5px,color:#0f172a;
    classDef avail   fill:#f1f5f9,stroke:#6b7b9a,color:#475569;
    classDef blocked fill:#fafafa,stroke:#6b7280,stroke-dasharray:3 3,color:#6b7280;

The sender can have window = min(cwnd, rwnd) bytes in-flight without waiting for ACKs. Every ACK slides the window right; every loss shrinks it.

Flow control vs congestion control#

These sound alike but answer different questions, and interviewers will test whether you know the difference.

| | Flow control | Congestion control |
| --- | --- | --- |
| Protects | the receiver | the network |
| Lives on | the receiver advertises rwnd | the sender computes cwnd |
| Signal | receiver’s buffer space, sent back in every ACK | packet loss + RTT trends |
| Classic algorithm | simple: rwnd in header | Reno / CUBIC / BBR |

Slow start + AIMD in one line each.

  • Slow start: new connection, cwnd doubles every RTT until loss.
  • AIMD (Reno / CUBIC after slow start): Additive Increase, Multiplicative Decrease. On each ACK, cwnd += 1/cwnd. On loss, cwnd /= 2.

BBR (Google’s newer algorithm, used on YouTube and many CDNs) breaks from AIMD entirely — it models the path’s bandwidth and RTT directly instead of treating loss as the primary signal. Worth mentioning when the interviewer asks about modern TCP behavior on networks suffering bufferbloat, where over-buffered middleboxes make loss a late, misleading signal.

The states an application actually touches#

stateDiagram-v2
    direction TB
    [*] --> CLOSED
    CLOSED --> SYN_SENT: active open<br/>send SYN
    CLOSED --> LISTEN: passive open
    SYN_SENT --> ESTABLISHED: recv SYN-ACK<br/>send ACK
    LISTEN --> SYN_RCVD: recv SYN<br/>send SYN-ACK
    SYN_RCVD --> ESTABLISHED: recv ACK
    ESTABLISHED --> FIN_WAIT_1: close()<br/>send FIN
    ESTABLISHED --> CLOSE_WAIT: recv FIN<br/>send ACK
    FIN_WAIT_1 --> FIN_WAIT_2: recv ACK
    FIN_WAIT_2 --> TIME_WAIT: recv FIN<br/>send ACK
    TIME_WAIT --> CLOSED: 2 × MSL timeout
    CLOSE_WAIT --> LAST_ACK: close()<br/>send FIN
    LAST_ACK --> CLOSED: recv ACK

    classDef good fill:#dcfce7,stroke:#16a34a,stroke-width:2px,color:#14532d;
    classDef closing fill:#fef3c7,stroke:#ca8a04,stroke-width:1.5px,color:#713f12;
    classDef terminal fill:#f3f4f6,stroke:#6b7280,stroke-width:1.5px,color:#1f2937;
    class ESTABLISHED good
    class FIN_WAIT_1,FIN_WAIT_2,CLOSE_WAIT,LAST_ACK,TIME_WAIT closing
    class CLOSED terminal

Simplified TCP state machine. Left branch = active side (initiator). Right branch = passive side (usually the server). Full spec has 11 states; these are the ones you'll reference in a debugging story.

Why TIME_WAIT matters. After an active close, the initiator stays in TIME_WAIT for 2 × MSL (typically 60 s on Linux) before the quadruple (src-ip, src-port, dst-ip, dst-port) can be reused. On a server that opens many outbound connections (a payment gateway, a scraper) this can exhaust ephemeral ports. Mitigations: enable SO_REUSEADDR / net.ipv4.tcp_tw_reuse, use connection pooling, or reduce churn.

UDP: 8 bytes and a shrug#

UDP’s header is the minimum viable protocol.

| Field | Bits |
| --- | --- |
| Source port | 16 |
| Destination port | 16 |
| Length (header + data) | 16 |
| Checksum (optional on IPv4) | 16 |
No seq number. No ack. No state. Each datagram is independent — if it matters to you, you handle it in the app.

What UDP doesn’t do: no handshake, no ordering, no retransmit, no flow or congestion control. What it does do: stay out of your way. That makes it the substrate for DNS (usually 1 packet round-trip, no need for a connection), video/voice (loss is fine, reordering is worse than drop), game state (“where is the tank now” is more useful than “where was it 200 ms ago”), and, since QUIC, modern HTTP itself.

TCP vs UDP, side by side#

| | TCP | UDP |
| --- | --- | --- |
| Connection | handshake before data | none |
| Header | 20 B minimum, up to 60 B | 8 B |
| Ordering | guaranteed | app’s problem |
| Reliability | guaranteed | app’s problem |
| Flow control | yes (rwnd) | no |
| Congestion control | yes (CUBIC / BBR / …) | no (app must be well-behaved) |
| Latency floor | 1 RTT + slow start | 1 one-way trip |
| Typical users | HTTP/1.1, HTTP/2, gRPC, SSH, DB | DNS, QUIC (HTTP/3), games, voice, video |

Head-of-line blocking, the reason HTTP/3 exists#

TCP’s ordering guarantee has a dark side. If segment 5 is dropped, segments 6–10 that did arrive must sit in the kernel’s receive buffer until segment 5 is retransmitted and filled in. Everything above TCP — including independent HTTP/2 streams — has to wait. This is head-of-line (HOL) blocking at the transport layer.

HTTP/3 fixes this by building on QUIC (which rides on UDP) and doing its own stream-level ordering: one lost packet stalls only its stream, not all the concurrent streams on the same connection. More on this in §4a.

Decision rubric#

Pick TCP when the correctness of the byte stream matters more than the 1-RTT cost and you’re OK waiting on retransmits:

  • HTTP/1.1, HTTP/2, gRPC-over-HTTP/2
  • SSH, SQL wire protocols, Kafka
  • File transfer / replication

Pick UDP when you can tolerate loss or you need to own reordering yourself:

  • DNS queries (1 packet, fits in MTU, retry at app level)
  • QUIC / HTTP/3
  • Real-time media (WebRTC, VoIP, game state)
  • Multicast / broadcast (TCP is strictly 1-to-1)

Go snippets#

TCP client with sane timeouts. The default net.Dial has no timeout and will happily hang forever.

```go
package main

import (
	"net"
	"time"
)

func openConn() (net.Conn, error) {
	d := net.Dialer{
		Timeout:   3 * time.Second,  // connect timeout
		KeepAlive: 30 * time.Second, // TCP keep-alive probes
	}
	conn, err := d.Dial("tcp", "example.com:443")
	if err != nil {
		return nil, err
	}
	// Deadlines for read/write, refreshed per operation.
	_ = conn.SetDeadline(time.Now().Add(10 * time.Second))
	return conn, nil
}
```

UDP echo server. The read loop is packet-oriented, not stream-oriented — one ReadFromUDP returns exactly one datagram. Framing is the app’s job.

```go
package main

import (
	"log"
	"net"
)

func main() {
	addr, err := net.ResolveUDPAddr("udp", ":9000")
	if err != nil {
		log.Fatal(err)
	}
	conn, err := net.ListenUDP("udp", addr)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	buf := make([]byte, 2048) // 1 datagram at a time
	for {
		n, peer, err := conn.ReadFromUDP(buf)
		if err != nil {
			log.Printf("read: %v", err)
			continue
		}
		// No framing guarantees: 'n' bytes are one logical message.
		if _, err := conn.WriteToUDP(buf[:n], peer); err != nil {
			log.Printf("write to %s: %v", peer, err)
		}
	}
}
```

Gotchas interviewers love#

  • Nagle’s algorithm. TCP coalesces small writes to reduce packet overhead, which interacts badly with delayed-ACK receivers — each side waits on the other and small writes stall. Latency-sensitive apps (interactive SSH, game clients) set TCP_NODELAY to disable it.
  • TIME_WAIT exhaustion. A high-churn outbound client (think: aggressive HTTP client with no keep-alive) can run out of ephemeral source ports. Reuse connections or bump ip_local_port_range.
  • MTU / PMTUD blackhole. If a middlebox drops ICMP “fragmentation needed” messages, the sender never learns to shrink its packets and the connection stalls on any segment larger than the path MTU. Common cause of “works on my laptop, times out on corp VPN.”
  • UDP amplification DDoS. A spoofed 50-byte DNS query can return a 3,000-byte response. Open resolvers and misconfigured NTP / memcached servers are classic reflectors. If you build a UDP service, cap the reply size and rate-limit per source.

4. Application Layer#

Above the transport layer, protocols express what the conversation is about. They’re organized by purpose, not by position in a stack: HTTP is a request/response protocol, SMTP is a mail protocol, gRPC is an RPC system. This section covers the three groups an interviewer will actually probe — 4a. HTTP family (with TLS), 4b. API styles: REST vs GraphQL vs gRPC, and 4c. real-time: SSE vs WebSocket vs WebRTC.

4a. HTTP / HTTPS / HTTP/2 / HTTP/3#

HTTP is the protocol of the web — a stateless, text-based request/response protocol on TCP port 80 (or 443 with TLS). “Stateless” is the operative word: every request is self-contained from the server’s point of view. State (sessions, auth) lives in cookies, headers, or the database, not the protocol.

Anatomy of a request#

```http
GET /users/42?include=orders HTTP/1.1
Host: api.example.com
Authorization: Bearer eyJhbGciOi...
Accept: application/json
Accept-Encoding: gzip, br
If-None-Match: "a3f7b9"
Connection: keep-alive
```

Each line is Header: Value. Blank line separates headers from body. The server responds with a status line, headers, and body:

```http
HTTP/1.1 200 OK
Content-Type: application/json
Content-Length: 847
ETag: "a3f7b9"
Cache-Control: private, max-age=60
Connection: keep-alive

{"id":42,"name":"Ada","orders":[...]}
```

Status codes you must know cold#

| Range | Meaning | Must-know examples |
| --- | --- | --- |
| 1xx | informational | 101 Switching Protocols (WebSocket upgrade) |
| 2xx | success | 200 OK · 201 Created · 204 No Content · 206 Partial Content |
| 3xx | redirect / cache | 301 Moved Permanently · 302 Found · 304 Not Modified · 307/308 (preserve method) |
| 4xx | client error | 400 Bad Request · 401 Unauthorized · 403 Forbidden · 404 Not Found · 409 Conflict · 422 Unprocessable · 429 Too Many Requests |
| 5xx | server error | 500 Internal · 502 Bad Gateway · 503 Unavailable · 504 Gateway Timeout |

Easy-to-mix-up pair: 401 means “I don’t know who you are” — re-auth and retry. 403 means “I know who you are, and you can’t.”

Idempotency matters#

| Method | Idempotent | Safe (read-only) | Body | Cacheable |
| --- | --- | --- | --- | --- |
| GET | ✓ | ✓ | no | ✓ |
| HEAD | ✓ | ✓ | no | ✓ |
| OPTIONS | ✓ | ✓ | no | ✗ |
| PUT | ✓ | ✗ | yes | ✗ |
| DELETE | ✓ | ✗ | optional | ✗ |
| POST | ✗ | ✗ | yes | rarely |
| PATCH | ✗ | ✗ | yes | ✗ |

The interview gotcha: POST is not idempotent — if the client retries a POST because it didn’t see the response, it can create the same order twice. The canonical fix is an idempotency key: a client-generated unique string the server dedupes on. Stripe, AWS, and every payment-adjacent API does this.

Caching, briefly#

HTTP caching is coordinated between server, browser, and intermediaries (CDN, proxies). The knobs:

  • Cache-Control: public, max-age=3600 — how long, where.
  • ETag: "..." / Last-Modified: ... — identity of the current version. Client sends If-None-Match / If-Modified-Since on revalidation; server replies 304 Not Modified and no body.
  • Vary: Accept-Encoding, Authorization — split cache entries by these request headers.

CDNs use the same primitives — §6 covers the edge story.

HTTPS: HTTP + TLS#

HTTPS is HTTP encrypted with TLS. The TLS handshake runs once when the connection is established, producing symmetric keys for the rest of the connection.

sequenceDiagram
    autonumber
    participant C as 🔐 Client
    participant S as 🌐 Server

    rect rgb(219, 234, 254)
    Note over C,S: TLS 1.3 — fresh handshake (1 RTT)
    C->>+S: ClientHello<br/>(supported ciphers · key share · SNI)
    S-->>-C: ServerHello · cert · Finished<br/>(picks cipher · sends its key share)
    Note over C,S: 🔑 both sides derive the session key
    C->>S: Finished (encrypted)
    C->>+S: HTTP GET / (encrypted)
    S-->>-C: HTTP 200 OK (encrypted)
    end

    rect rgb(220, 252, 231)
    Note over C,S: TLS 1.3 resumption — 0 RTT (early data)
    C->>+S: ClientHello + <b>early data</b> (encrypted with PSK)
    S-->>-C: ServerHello + response (no handshake wait)
    Note right of S: ⚠️ early data is <br/>replayable — don't use for<br/>state-changing requests
    end

What TLS gives you — three things interviewers will ask:

  1. Confidentiality — AEAD ciphers (AES-GCM, ChaCha20-Poly1305) encrypt the payload.
  2. Integrity — the same AEAD MAC detects tampering.
  3. Authentication — the server’s cert, signed by a CA the client trusts, proves you’re talking to example.com, not an attacker.

Common interview probes:

  • Why is TLS 1.3 faster than 1.2? One RTT vs two. TLS 1.2 separated key-exchange from Finished; TLS 1.3 combines them and removes obsolete ciphers.
  • What is SNI? Server Name Indication — the hostname the client is asking for, sent unencrypted in ClientHello. Lets one IP host multiple certs. Encrypted Client Hello (ECH) fixes the leak.
  • 0-RTT trade-off? On resumed sessions, the client can send data in its first flight. Great for latency; the data is replayable if the attacker captures and retransmits it. Don’t use 0-RTT for state-changing requests.

HTTP/2: binary, multiplexed, one connection#

HTTP/1.1 opens one TCP connection per in-flight request (browsers cap ~6 per origin). HTTP/2 keeps a single connection and multiplexes many streams over it.

graph LR
    subgraph H11 ["🐢 HTTP/1.1 — many TCP connections, head-of-line serial"]
        direction TB
        C1["TCP #1<br/>GET /index.html"]
        C2["TCP #2<br/>GET /app.js"]
        C3["TCP #3<br/>GET /style.css"]
        C4["TCP #4<br/>GET /logo.png"]
    end

    subgraph H2 ["🚀 HTTP/2 — one connection, multiplexed streams"]
        direction TB
        T(["1 TCP + TLS"])
        T --> S1["stream 1<br/>/index.html"]
        T --> S2["stream 3<br/>/app.js"]
        T --> S3["stream 5<br/>/style.css"]
        T --> S4["stream 7<br/>/logo.png"]
    end

    classDef old fill:#fee2e2,stroke:#dc2626,stroke-width:1.5px,color:#7f1d1d;
    classDef new fill:#dcfce7,stroke:#16a34a,stroke-width:1.5px,color:#14532d;
    classDef hub fill:#dbeafe,stroke:#3b6fd6,stroke-width:2px,color:#0f172a;
    class C1,C2,C3,C4 old
    class S1,S2,S3,S4 new
    class T hub

Key improvements over 1.1:

  • Binary framing. Every message is split into DATA / HEADERS / SETTINGS frames. Cheap to parse, no ambiguity.
  • Multiplexing. Many concurrent streams on one connection. No head-of-line blocking at the HTTP layer.
  • HPACK header compression. Redundant headers (cookies, UA, Host) are table-indexed instead of resent. Huge win for short requests.
  • Server push. Server can pre-send assets it knows the client will need. (Deprecated by most browsers — misused more often than helpful.)
  • Stream priorities. Clients can weight streams; used to deliver CSS/JS before images.

The catch. HTTP/2 multiplexing lives above TCP. If one packet is lost, TCP stalls delivery of all streams until retransmission arrives — HOL blocking at the transport layer, exactly the problem we flagged in §3.

HTTP/3: HTTP over QUIC over UDP#

HTTP/3 replaces TCP with QUIC, a transport protocol built on UDP that combines “TCP semantics + TLS 1.3” into one unified handshake with independent streams:

| | HTTP/1.1 | HTTP/2 | HTTP/3 |
| --- | --- | --- | --- |
| Transport | TCP | TCP | QUIC (UDP) |
| Security | optional TLS | TLS mandatory in practice | TLS 1.3 mandatory, integrated |
| Framing | text | binary frames | binary frames |
| Multiplexing | no (multiple TCP) | yes (1 TCP) | yes (QUIC streams) |
| Connection open | TCP + TLS = 2–3 RTT | TCP + TLS = 2–3 RTT | 1 RTT (0 RTT on resumption) |
| Head-of-line blocking | yes | yes, at TCP layer | no — per-stream loss |
| Connection migration | no (IP change breaks it) | no | yes (connection ID) |
| Deployed by | everything | ~70% of web | CDNs + big sites, growing |

Connection migration is the underrated killer feature: QUIC identifies a connection by an ID in the header, not by the 4-tuple. Your phone switches from Wi-Fi to 5G, the IP changes, TCP would reset — QUIC just keeps going.

Go http.Client: the timeouts you must set#

The zero-value http.Client{} has no timeout. A single slow server can hang your entire service. Always configure:

```go
package main

import (
	"context"
	"net"
	"net/http"
	"time"
)

// prodClient is a reusable, properly-bounded HTTP client.
// One per process — it pools connections internally.
var prodClient = &http.Client{
	Timeout: 10 * time.Second, // total budget: connect + TLS + headers + body
	Transport: &http.Transport{
		DialContext: (&net.Dialer{
			Timeout:   3 * time.Second, // TCP connect
			KeepAlive: 30 * time.Second,
		}).DialContext,
		TLSHandshakeTimeout:   3 * time.Second,
		ResponseHeaderTimeout: 5 * time.Second,
		ExpectContinueTimeout: 1 * time.Second,
		IdleConnTimeout:       90 * time.Second,
		MaxIdleConns:          100,
		MaxIdleConnsPerHost:   10, // bump for high-throughput clients
		ForceAttemptHTTP2:     true,
	},
}

func fetch(ctx context.Context, url string) (*http.Response, error) {
	// Prefer request-level context over client Timeout when the deadline
	// must propagate across service boundaries.
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Accept", "application/json")
	req.Header.Set("User-Agent", "myservice/1.0")
	return prodClient.Do(req)
}
```

Three timeouts in this snippet and why each exists:

  • Dialer.Timeout — how long TCP connect can take. Defends against unreachable hosts.
  • TLSHandshakeTimeout — how long TLS can take after connect.
  • ResponseHeaderTimeout — how long the server can take to send status + headers. A slow backend blocking here looks like a hung request — this bounds it without cutting off a legitimately large streaming body.

Bonus: request-level deadlines. Prefer ctx, cancel := context.WithTimeout(parent, 800*time.Millisecond) at the call site over mutating the client — the deadline then propagates cleanly through gRPC/HTTP/database layers downstream.

Interview gotchas for §4a#

  • Connection: close vs keep-alive. HTTP/1.0 closed by default. HTTP/1.1 keep-alive by default. Servers that emit Connection: close on every response will cripple your client’s connection pool.
  • Cookie scoping. Domain=example.com includes subdomains. Secure restricts to HTTPS. HttpOnly hides from JS. SameSite=Lax is the sane default to block CSRF.
  • Redirect traps. 301 is cached aggressively. If you deploy 301 /old → /new and later change your mind, clients may never retry. Use 302 or 307 during rollouts.
  • Content-Length vs Transfer-Encoding: chunked. A response has exactly one. If a reverse proxy (nginx, HAProxy) buffers a chunked response to add Content-Length, latency on streaming endpoints goes up. Turn buffering off at the proxy for SSE/gRPC-web.
  • HTTP/2 with custom TLS configs. Go’s http.Transport falls back to HTTP/1.1 when you supply a custom TLSClientConfig (e.g. to set InsecureSkipVerify = true) unless you also set ForceAttemptHTTP2 = true or add "h2" to NextProtos. Debugging headache at 2am.

4b. REST vs GraphQL vs gRPC#

Three API styles, three different philosophies. The interviewer usually isn’t asking “which is best” — they want to see you reason about the trade-off for the problem at hand.

| | REST | GraphQL | gRPC |
| --- | --- | --- | --- |
| Transport | HTTP/1.1 or 2 | HTTP POST (usually /graphql) | HTTP/2 |
| Serialization | JSON (usually) | JSON | Protobuf (binary) |
| Schema | OpenAPI (optional) | SDL (required) | .proto (required) |
| Endpoint shape | many resource URLs | single endpoint | RPC methods per service |
| Who picks the fields? | server | client | server |
| Over-/under-fetching | easy to hit | solved | solved per method |
| Streaming | chunked / SSE | subscriptions (via WS) | native: server / client / bidi |
| Browser-friendly | yes | yes | no (needs gRPC-Web or Connect) |
| Tooling | curl, Postman, every lang | any GraphQL client | protoc + code generation |
| Caching | HTTP cache works out of the box | hard; client-side libs (Relay, Apollo) | none built-in; app layer |
| Best fit | public APIs, CRUD, docs-as-product | many clients, varied views on same data | internal microservices, high-throughput |

The over-fetching / under-fetching problem#

The single clearest argument for GraphQL. Imagine a mobile screen that needs user name, last order ID, and unread notification count.

flowchart TB
    Need(["📱 Mobile needs: <b>name</b> · <b>lastOrderId</b> · <b>unreadCount</b>"])

    subgraph REST ["🔁 REST — 3 round trips"]
        direction TB
        R1["GET /users/42<br/><i>returns 20 fields, keeps 1</i>"]
        R2["GET /users/42/orders?limit=1<br/><i>returns full Order, keeps 1 id</i>"]
        R3["GET /users/42/notifications/unread-count"]
        R1 --> R2 --> R3
    end

    subgraph GQL ["⚡ GraphQL — 1 round trip, exact shape"]
        direction TB
        G1["POST /graphql<br/>user(id: 42) &#123; name, lastOrder &#123; id &#125;, unreadCount &#125;"]
    end

    subgraph GRPC ["🛰️ gRPC — 1 round trip, server-defined shape"]
        direction TB
        P1["UserService.GetProfile(id=42)<br/><i>server-defined aggregate method</i>"]
    end

    Need --> REST
    Need --> GQL
    Need --> GRPC

    classDef need fill:#fef3c7,stroke:#d97706,stroke-width:2px,color:#78350f;
    classDef rest fill:#fee2e2,stroke:#dc2626,stroke-width:1.5px,color:#7f1d1d;
    classDef gql fill:#dcfce7,stroke:#16a34a,stroke-width:1.5px,color:#14532d;
    classDef grpc fill:#e0e7ff,stroke:#4f46e5,stroke-width:1.5px,color:#312e81;
    class Need need
    class R1,R2,R3 rest
    class G1 gql
    class P1 grpc

REST gets you the JSON but wastes bandwidth (orders contains 20 fields you don’t need) or demands many round-trips. GraphQL lets the client ask for exactly what it wants. gRPC solves it too, but by defining a server-side aggregate method — if the mobile and web teams want different shapes you end up with GetProfileForWeb and GetProfileForMobile, which is fine for a handful of clients but doesn’t scale like GraphQL does.

REST, done well#

The mental model is resources (nouns) acted on by HTTP methods (verbs). URLs are hierarchical; methods are the verbs; status codes are the outcome.

GET    /v1/users              list
GET    /v1/users/42           read
POST   /v1/users              create
PUT    /v1/users/42           replace (idempotent)
PATCH  /v1/users/42           partial update
DELETE /v1/users/42           delete
GET    /v1/users/42/orders    sub-resource

Conventions that save you from bike-shed arguments:

  • Cursor pagination — ?cursor=eyJpZCI6Mjc...&limit=50. Not offset/limit — that’s O(N) on the DB.
  • Filter/sort as query params — ?status=active&sort=-createdAt.
  • Versioning — keep it in the path (/v1/, /v2/). Header-based versioning is clever and will bite you during debugging.
  • Errors — RFC 7807 application/problem+json: {"type":"...", "title":"...", "detail":"...", "instance":"..."}. Stop inventing shapes.

HATEOAS (hypermedia links in responses) is the theoretically pure form but is almost never shipped. If the interviewer asks, explain what it is and note most production APIs don’t bother.
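The cursor convention is easy to sketch: encode the last-seen position as an opaque base64 JSON token, decode it server-side, and query from there. A minimal sketch; the Cursor shape and helper names are illustrative, not from any framework:

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// Cursor is the opaque token's contents. Clients echo the token back
// verbatim and never parse it; the shape here (LastID) is an assumption.
type Cursor struct {
	LastID int64 `json:"id"`
}

// EncodeCursor serializes the cursor as URL-safe base64 JSON.
func EncodeCursor(c Cursor) string {
	b, _ := json.Marshal(c)
	return base64.RawURLEncoding.EncodeToString(b)
}

// DecodeCursor reverses EncodeCursor; an invalid token is a client error.
func DecodeCursor(s string) (Cursor, error) {
	var c Cursor
	b, err := base64.RawURLEncoding.DecodeString(s)
	if err != nil {
		return c, err
	}
	return c, json.Unmarshal(b, &c)
}

func main() {
	tok := EncodeCursor(Cursor{LastID: 27})
	fmt.Println(tok) // opaque to the client
	c, _ := DecodeCursor(tok)
	// The server then runs: SELECT ... WHERE id > $1 ORDER BY id LIMIT $2
	fmt.Println(c.LastID)
}
```

Because the token is opaque, you can later change its internal shape (add a sort key, a shard hint) without breaking any client.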

GraphQL: one endpoint, client-shaped responses#

The server publishes a typed schema; the client sends a query describing the shape it wants; the runtime walks the query and calls resolvers to fetch each field.

type User {
  id: ID!
  name: String!
  email: String!
  orders(limit: Int = 10): [Order!]!
  unreadCount: Int!
}

type Order {
  id: ID!
  total: Money!
  items: [LineItem!]!
}

type Query {
  user(id: ID!): User
}

Client sends:

query Profile($id: ID!) {
  user(id: $id) {
    name
    orders(limit: 1) { id total { amount currency } }
    unreadCount
  }
}

The famous trap — N+1 queries. The orders resolver is called once per User. If you’re listing 50 users and blindly loop, that’s 50 DB round-trips. The fix is a DataLoader — batches + caches per-request:

// pseudo-Go: inside an HTTP request-scoped loader
loader := dataloader.New(func(ctx context.Context, ids []int) []*Order {
    // one SQL: SELECT ... WHERE user_id = ANY($1)
    return db.OrdersByUserIDs(ctx, ids)
})
// Each resolver call just does: loader.Load(userID) → coalesced into 1 query
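To see what a loader does under the hood, here's a hand-rolled sketch of the same batching idea. This is not any real dataloader library's API; Loader, Load, and Flush are names invented for illustration:

```go
package main

import (
	"fmt"
	"sync"
)

// Loader coalesces individual Load(id) calls into one Fetch(ids) call.
type Loader struct {
	mu      sync.Mutex
	pending map[int][]chan string
	fetch   func(ids []int) map[int]string // one query for the whole batch
}

func NewLoader(fetch func(ids []int) map[int]string) *Loader {
	return &Loader{pending: map[int][]chan string{}, fetch: fetch}
}

// Load registers interest in an id and returns a channel for the result.
func (l *Loader) Load(id int) <-chan string {
	ch := make(chan string, 1)
	l.mu.Lock()
	l.pending[id] = append(l.pending[id], ch)
	l.mu.Unlock()
	return ch
}

// Flush fires one batched fetch for everything collected so far.
// Real loaders flush on a tick or at the end of the resolver pass.
func (l *Loader) Flush() {
	l.mu.Lock()
	batch := l.pending
	l.pending = map[int][]chan string{}
	l.mu.Unlock()

	ids := make([]int, 0, len(batch))
	for id := range batch {
		ids = append(ids, id)
	}
	results := l.fetch(ids) // e.g. SELECT ... WHERE user_id = ANY($1)
	for id, chans := range batch {
		for _, ch := range chans {
			ch <- results[id]
		}
	}
}

func main() {
	calls := 0
	l := NewLoader(func(ids []int) map[int]string {
		calls++ // count round-trips
		out := map[int]string{}
		for _, id := range ids {
			out[id] = fmt.Sprintf("orders-for-%d", id)
		}
		return out
	})
	a, b := l.Load(1), l.Load(2) // two resolver calls...
	l.Flush()
	fmt.Println(<-a, <-b, "fetches:", calls) // ...one fetch
}
```

The point: N resolver calls collapse into one round-trip, which is the entire N+1 fix.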

Other GraphQL-isms to have an answer for:

  • Mutations are a separate root type; they execute sequentially (not parallel like Query fields).
  • Subscriptions push server → client; transport is usually WebSocket.
  • Persisted queries — client registers queries at build time; at runtime it only sends the query ID. Saves bandwidth, forbids arbitrary queries, defuses the “malicious client writes an expensive query” attack.
  • Caching is the hardest part. Apollo Client normalizes objects by __typename + id client-side; server-side is usually cache-miss territory unless you’re doing persisted queries + HTTP cache headers.

gRPC: typed RPCs over HTTP/2#

You write a .proto, the compiler generates typed client + server stubs in every language you use.

syntax = "proto3";

package user.v1;

service UserService {
  rpc GetProfile(GetProfileRequest) returns (UserProfile);
  rpc WatchProfile(GetProfileRequest) returns (stream UserProfile);  // server stream
  rpc ImportUsers(stream UserInput) returns (ImportSummary);         // client stream
  rpc Chat(stream ChatMessage) returns (stream ChatMessage);         // bidirectional
}

message GetProfileRequest { string user_id = 1; }

message UserProfile {
  string user_id = 1;
  string name = 2;
  string email = 3;
  int32 unread_count = 4;
}

Go server, unary method:

type userServer struct {
    pb.UnimplementedUserServiceServer
    db *sql.DB
}

func (s *userServer) GetProfile(
    ctx context.Context, req *pb.GetProfileRequest,
) (*pb.UserProfile, error) {
    // context carries the client's deadline + cancellation + metadata
    var p pb.UserProfile
    err := s.db.QueryRowContext(ctx,
        `SELECT user_id, name, email, unread_count FROM users WHERE user_id=$1`,
        req.GetUserId(),
    ).Scan(&p.UserId, &p.Name, &p.Email, &p.UnreadCount)
    if err == sql.ErrNoRows {
        return nil, status.Errorf(codes.NotFound, "user %s not found", req.GetUserId())
    }
    if err != nil {
        return nil, status.Errorf(codes.Internal, "db: %v", err)
    }
    return &p, nil
}

Go client call, with deadline:

conn, err := grpc.NewClient("user-svc:50051",
    grpc.WithTransportCredentials(insecure.NewCredentials()))
if err != nil {
    return err
}
defer conn.Close()

client := pb.NewUserServiceClient(conn)
ctx, cancel := context.WithTimeout(ctx, 300*time.Millisecond)
defer cancel()
profile, err := client.GetProfile(ctx, &pb.GetProfileRequest{UserId: "42"})

The four streaming modes — interviewers love this picture:

sequenceDiagram
    autonumber
    participant C as 🖥️ Client
    participant S as 🛰️ Server

    rect rgb(219, 234, 254)
    Note over C,S: 1️⃣ Unary — classic request / response
    C->>+S: GetProfile(id=42)
    S-->>-C: UserProfile
    end

    rect rgb(220, 252, 231)
    Note over C,S: 2️⃣ Server streaming — one request, many responses
    C->>+S: WatchProfile(id=42)
    S-->>C: UserProfile v1
    S-->>C: UserProfile v2
    S-->>-C: UserProfile v3 ...
    end

    rect rgb(254, 243, 199)
    Note over C,S: 3️⃣ Client streaming — many requests, one summary
    C->>+S: ImportUsers (user_1)
    C->>S: ImportUsers (user_2)
    C->>S: ImportUsers (user_3)
    S-->>-C: ImportSummary (count=3)
    end

    rect rgb(237, 214, 255)
    Note over C,S: 4️⃣ Bidirectional — full-duplex, interleaved
    C->>+S: Chat (hello)
    S-->>C: Chat (hi)
    C->>S: Chat (how are you)
    S-->>-C: Chat (good)
    end

Why gRPC wins for internal microservices:

  • Protobuf is small (1.5-5× smaller than JSON on the wire) and fast to marshal.
  • HTTP/2 multiplexing + long-lived connections → low latency + good head-of-line story within a service mesh.
  • context.Context — deadlines and cancellations propagate across service hops out of the box.
  • Status codes are a closed set (codes.NotFound, codes.DeadlineExceeded, …), not free-form strings.

Where gRPC hurts:

  • Browsers can’t speak gRPC natively (the protocol relies on HTTP/2 trailers, which browser APIs don’t expose). Solutions: gRPC-Web (a proxy translates) or Connect (gRPC-compatible, works over HTTP/1.1 too).
  • Bigger learning curve — protoc toolchain, codegen in every language, backward-compat discipline (reserved, field numbers never reused).
  • Observability is trickier than REST (no URL pattern in logs; need OTel/tracing from day one).

Decision rubric#

Pick REST when:

  • The API is public or consumed by many external devs.
  • Humans browse it (Postman, curl, docs).
  • You want the HTTP cache to do real work (CDN, browser).
  • CRUD on resources is most of what you’re doing.

Pick GraphQL when:

  • Many clients (web, iOS, Android) need different slices of the same underlying data graph.
  • Backends-for-frontends would otherwise proliferate.
  • The team can invest in schema review, DataLoader discipline, and persisted-query infra.

Pick gRPC when:

  • Internal service-to-service traffic, especially polyglot teams (Go + Python + Java).
  • Strong typing across languages is worth the toolchain cost.
  • Streaming or low-latency RPC is a first-class need.
  • You have a service mesh, tracing, and observability to absorb the operational tax.

Interview gotchas for §4b#

  • REST isn’t an RFC. There’s no committee defining “correct REST.” Different teams mean different things. Lead with your own definition.
  • GraphQL security surface. A single expressive query can be a DDoS primitive — one field can resolve into 10,000 DB reads. Production deployments need query depth limits, query cost analysis, persisted queries, and rate-limiting by user, not by endpoint.
  • gRPC deadline inheritance. If a GetProfile handler calls three downstream services and just passes along the same context, the slowest of the three sees the full 300 ms. Budget it out: subtract each leg’s expected cost from the remaining deadline (or at least be intentional about it).
  • Version drift in protobuf. Once a .proto is deployed, you cannot reuse a field number. A reserved 7, 9; declaration stops anyone from re-assigning those numbers later. Forgetting this breaks wire compatibility silently.
  • “Why not just JSON-RPC?” — a valid interview probe. JSON-RPC is lighter-weight than gRPC but lacks streaming, codegen, and HTTP/2 flow-control. Fine for a small internal tool, not for a service mesh.

4c. SSE vs WebSocket vs WebRTC#

Three ways to do “real-time.” The interviewer wants to see you pick the simplest one that solves the problem — not the coolest.

Start with the decision tree#

flowchart TD
    Q(["Do you need <b>real-time</b> updates?"])
    D1{"Direction?"}
    D2{"Payload type?"}
    D3{"Latency bound?"}

    POLL(["Long-polling / plain HTTP is fine ✅"])
    SSE(["<b>SSE</b><br/>server → client<br/>text only, auto-reconnect"])
    WS(["<b>WebSocket</b><br/>full-duplex<br/>text or binary"])
    RTC(["<b>WebRTC</b><br/>peer-to-peer<br/>sub-100ms media + data"])

    Q --> |no, updates can wait seconds| POLL
    Q --> |yes| D1
    D1 --> |server → client only| SSE
    D1 --> |both directions| D2
    D2 --> |text / structured| WS
    D2 --> |audio / video / low-latency data| D3
    D3 --> |p2p · every ms matters| RTC
    D3 --> |OK with a server relay| WS

    classDef question fill:#fef3c7,stroke:#d97706,stroke-width:2px,color:#78350f;
    classDef answer fill:#dcfce7,stroke:#16a34a,stroke-width:1.5px,color:#14532d;
    classDef fallback fill:#f3f4f6,stroke:#6b7280,stroke-width:1.5px,color:#1f2937;
    class Q,D1,D2,D3 question
    class SSE,WS,RTC answer
    class POLL fallback

One-line rule of thumb: SSE for feeds, WebSocket for chat, WebRTC for media.

SSE — the boring answer that’s often right#

Server-Sent Events is just an HTTP response that never ends. The server holds the connection open and writes data: ...\n\n chunks whenever it has something to say. The browser has a built-in EventSource API that does reconnect + last-event-id for you.

sequenceDiagram
    autonumber
    participant C as 🖥️ Browser
    participant S as 🌐 Server

    rect rgb(219, 234, 254)
    Note over C,S: ① Open the stream — one HTTP request, never-ending response
    C->>+S: GET /stream · Accept: text/event-stream
    S-->>C: 200 OK · Content-Type: text/event-stream<br/>Cache-Control: no-store · Connection: keep-alive
    end

    rect rgb(220, 252, 231)
    Note over C,S: ② Server pushes events whenever it wants
    S-->>C: event: price · data: &#123;BTC: 67123&#125;
    S-->>C: event: price · data: &#123;BTC: 67140&#125;
    S-->>C: 💓 : heartbeat (comment line keeps proxies awake)
    S-->>-C: event: price · data: &#123;BTC: 67089&#125;
    end

    rect rgb(254, 226, 226)
    Note right of C: ⚠️ connection drops
    end

    rect rgb(254, 243, 199)
    Note over C,S: ③ EventSource auto-reconnects with Last-Event-ID
    C->>S: GET /stream · Last-Event-ID: 42
    end

Why it’s underrated:

  • It’s just HTTP — CDNs, proxies, auth cookies, browser DevTools, all already work.
  • EventSource handles reconnect + backoff automatically.
  • id: + Last-Event-ID lets a reconnecting client resume where it left off for free, provided the server can replay events after a given ID.

Where it bites:

  • Unidirectional — for an upstream “ack,” use a second POST endpoint. Awkward if you need true bi-di.
  • Text only — send JSON, not binary. Encode binary if you must.
  • HTTP/1.1 6-connection limit per origin. If you open SSE on 7 tabs of your app, the 7th hangs. Fix: use HTTP/2 (same-origin streams are multiplexed).
  • Proxy buffering. Nginx / CDNs love to buffer responses. Disable per-route (proxy_buffering off;, X-Accel-Buffering: no) — otherwise clients see nothing until the server flushes 8 KB.

Go handler:

func priceStream(w http.ResponseWriter, r *http.Request) {
    // Headers the SSE spec requires.
    w.Header().Set("Content-Type", "text/event-stream")
    w.Header().Set("Cache-Control", "no-store")
    w.Header().Set("Connection", "keep-alive")
    // Defeat proxy buffering — crucial for nginx, Cloudflare.
    w.Header().Set("X-Accel-Buffering", "no")

    // The flusher lets us force chunks out immediately.
    flusher, ok := w.(http.Flusher)
    if !ok {
        http.Error(w, "streaming unsupported", http.StatusInternalServerError)
        return
    }

    // Heartbeat every 15 s so intermediary idle timeouts don't close us.
    heartbeat := time.NewTicker(15 * time.Second)
    defer heartbeat.Stop()

    updates := subscribePrices(r.Context()) // chan PriceTick
    var id int
    for {
        select {
        case <-r.Context().Done():
            return // client disconnected
        case <-heartbeat.C:
            fmt.Fprint(w, ": ping\n\n") // comment line = keep-alive
            flusher.Flush()
        case tick, ok := <-updates:
            if !ok {
                return
            }
            id++
            // 'id:' makes it resumable via Last-Event-ID on reconnect.
            fmt.Fprintf(w, "id: %d\nevent: price\ndata: %s\n\n",
                id, tick.JSON())
            flusher.Flush()
        }
    }
}

WebSocket — when you need full-duplex#

WebSocket upgrades an HTTP connection into a long-lived, full-duplex, message-framed TCP stream. After the upgrade, the two sides exchange binary or text frames in either direction with almost no per-message overhead.

sequenceDiagram
    autonumber
    participant C as 🖥️ Client
    participant S as 🌐 Server

    rect rgb(219, 234, 254)
    Note over C,S: ① HTTP upgrade — the only HTTP round-trip in a WS session
    C->>+S: GET /ws HTTP/1.1<br/>Upgrade: websocket<br/>Sec-WebSocket-Key: x3JJ…<br/>Sec-WebSocket-Version: 13
    S-->>-C: 🎉 101 Switching Protocols<br/>Upgrade: websocket<br/>Sec-WebSocket-Accept: hash(key + GUID)
    end

    rect rgb(220, 252, 231)
    Note over C,S: ② Full-duplex frames — either side sends any time
    C->>S: TEXT · &#123;action: subscribe, room: 42&#125;
    S-->>C: TEXT · &#123;msg: welcome&#125;
    S-->>C: 🔢 BINARY · 0xCAFE…
    end

    rect rgb(254, 243, 199)
    Note over C,S: ③ Keep-alive control frames
    C->>S: PING
    S-->>C: PONG
    end

    rect rgb(254, 226, 226)
    Note over C,S: ④ Graceful close
    C->>S: CLOSE (1000 normal)
    S-->>C: CLOSE (ack)
    end

Highlights:

  • 101 Switching Protocols is the magic status code — the server accepts the upgrade, TCP stays open, the protocol changes under it.
  • Client frames are masked (XOR with a per-frame key) to defeat cache-poisoning attacks on legacy proxies. Server frames are not.
  • TEXT frames must be valid UTF-8; BINARY frames carry arbitrary bytes. Use BINARY for protobuf, msgpack, images.
  • PING/PONG frames are control-plane only. Use them; without keepalive, NAT timers + proxy idle timers (~60-120 s) will close the socket silently.
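The Sec-WebSocket-Accept header from the handshake above is just a hash the server must echo correctly: base64(SHA-1(key + a fixed GUID from RFC 6455)). Libraries handle it, but it's small enough to show:

```go
package main

import (
	"crypto/sha1"
	"encoding/base64"
	"fmt"
)

// wsGUID is the fixed magic string defined by RFC 6455.
const wsGUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

// AcceptKey computes Sec-WebSocket-Accept from Sec-WebSocket-Key.
// Proving knowledge of this transform is how the server shows the
// 101 response came from a real WebSocket endpoint, not a cache.
func AcceptKey(key string) string {
	h := sha1.Sum([]byte(key + wsGUID))
	return base64.StdEncoding.EncodeToString(h[:])
}

func main() {
	// Sample key from RFC 6455 §1.3.
	fmt.Println(AcceptKey("dGhlIHNhbXBsZSBub25jZQ=="))
	// → s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
}
```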

Go, with gorilla/websocket (still the de-facto library despite its brief archival; gobwas/ws is the zero-alloc alternative):

var upgrader = websocket.Upgrader{
    ReadBufferSize:  1024,
    WriteBufferSize: 1024,
    CheckOrigin: func(r *http.Request) bool {
        // Enforce same-origin. WebSocket doesn't respect CORS —
        // you enforce origin policy yourself here.
        return r.Header.Get("Origin") == "https://app.example.com"
    },
}

func wsHandler(w http.ResponseWriter, r *http.Request) {
    conn, err := upgrader.Upgrade(w, r, nil)
    if err != nil {
        return // Upgrade already wrote the error response.
    }
    defer conn.Close()

    // Ping/pong: send pings every 30 s, expect pongs within 60 s.
    conn.SetReadDeadline(time.Now().Add(60 * time.Second))
    conn.SetPongHandler(func(string) error {
        conn.SetReadDeadline(time.Now().Add(60 * time.Second))
        return nil
    })
    go func() {
        t := time.NewTicker(30 * time.Second)
        defer t.Stop()
        for range t.C {
            if err := conn.WriteControl(
                websocket.PingMessage, nil,
                time.Now().Add(5*time.Second),
            ); err != nil {
                return
            }
        }
    }()

    for {
        msgType, data, err := conn.ReadMessage()
        if err != nil {
            return // client gone or deadline exceeded
        }
        // Echo server — replace with real routing.
        if err := conn.WriteMessage(msgType, data); err != nil {
            return
        }
    }
}

Gotchas:

  • No built-in auth. The upgrade request is plain HTTP, so authenticate there. The browser WebSocket API can’t set custom headers (no Authorization), so use a cookie, a token in the URL, or the Sec-WebSocket-Protocol field.
  • CheckOrigin is opt-in. Forget it and you’ve just built CSRF-over-WebSocket.
  • No request / response semantics. You send frames and hope. Implement a correlation-ID in your JSON envelope if you need req/resp on top.
  • Horizontal scaling. WS sockets are sticky to one pod. When pod dies, users reconnect — and may land on a pod with no state. Fan-out via Redis pub/sub or a broker (NATS, Redis Streams, Kafka).
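The correlation-ID idea from the third bullet reduces to a pending-request table: register an ID before sending the frame, and let the single read loop dispatch responses back to waiters. Envelope and Pending are illustrative names, not a library API:

```go
package main

import (
	"fmt"
	"sync"
)

// Envelope is the JSON frame both sides exchange; CorrID ties a
// response back to its request. Field names are made up here.
type Envelope struct {
	CorrID  string `json:"corr_id"`
	Payload string `json:"payload"`
}

// Pending matches inbound frames to waiting callers by correlation ID.
type Pending struct {
	mu sync.Mutex
	m  map[string]chan Envelope
}

func NewPending() *Pending { return &Pending{m: map[string]chan Envelope{}} }

// Expect registers a correlation ID before the request frame is sent.
func (p *Pending) Expect(id string) <-chan Envelope {
	ch := make(chan Envelope, 1)
	p.mu.Lock()
	p.m[id] = ch
	p.mu.Unlock()
	return ch
}

// Dispatch is called from the single read loop for every inbound frame.
func (p *Pending) Dispatch(e Envelope) {
	p.mu.Lock()
	ch, ok := p.m[e.CorrID]
	delete(p.m, e.CorrID)
	p.mu.Unlock()
	if ok {
		ch <- e
	} // else: unsolicited server push, route to subscription handlers
}

func main() {
	p := NewPending()
	wait := p.Expect("req-1") // then: conn.WriteJSON(Envelope{CorrID: "req-1", ...})
	// ...later the read loop receives the matching frame:
	p.Dispatch(Envelope{CorrID: "req-1", Payload: "pong"})
	fmt.Println((<-wait).Payload)
}
```

In production you'd also time out abandoned entries so the map doesn't leak when responses never arrive.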

WebRTC — peer-to-peer, media-grade#

WebRTC lets two browsers talk directly (peer-to-peer), bypassing your server for the heavy media streams. You still need a server — the signaling server — to help them find each other and exchange connection info.

sequenceDiagram
    autonumber
    participant A as 👤 Peer A
    participant SIG as 📡 Signaling<br/>(your WS server)
    participant ST as 🛰️ STUN
    participant TN as 🔁 TURN
    participant B as 👤 Peer B

    rect rgb(219, 234, 254)
    Note over A,B: ① Signal (through your server — usually WebSocket)
    A->>+SIG: offer (SDP)
    SIG->>+B: offer (SDP)
    B->>-SIG: answer (SDP)
    SIG->>-A: answer (SDP)
    end

    rect rgb(254, 243, 199)
    Note over A,B: ② ICE — discover reachable addresses
    A->>+ST: "what's my public IP?"
    ST-->>-A: 203.0.113.1:51000
    A->>SIG: ICE candidate (host + reflexive)
    SIG->>B: (forwarded)
    B->>SIG: ICE candidate (its own)
    SIG->>A: (forwarded)
    end

    rect rgb(220, 252, 231)
    Note over A,B: ③ Connectivity check — try direct first
    A-->>B: STUN binding
    B-->>A: STUN binding reply
    Note over A,B: ✅ direct path works → done
    end

    rect rgb(254, 226, 226)
    Note over A,B: ④ Fallback: symmetric NAT blocks direct → TURN relay
    A-->>TN: relay allocate
    TN-->>B: relayed packet
    end

    rect rgb(233, 213, 255)
    Note over A,B: ⑤ Media / data flow — end-to-end encrypted
    A-->>B: 🎥 video · 🎤 audio · 📂 data channel<br/>(DTLS-SRTP or SCTP over DTLS, all over UDP)
    B-->>A: 🎥 video · 🎤 audio · 📂 data channel
    end

Four things you must know to pass a WebRTC interview:

  1. Signaling is your problem. WebRTC doesn’t dictate how the SDP offers/answers get across. People use WebSocket, SSE, long-poll, whatever. Pick one from the earlier part of this section.
  2. NAT traversal is why this is hard. Most peers are behind NAT (§2). STUN tells a peer its public IP. TURN relays traffic when STUN can’t produce a workable path (symmetric NAT, strict firewalls). Budget for ~10-20 % of calls needing TURN — and TURN bandwidth is your bill.
  3. ICE (Interactive Connectivity Establishment) is the algorithm that collects and prioritizes candidate addresses (host, server-reflexive, relayed), pings each pair, and picks the best one that works.
  4. Two flavors of traffic: media streams use DTLS-SRTP (encrypted RTP over UDP). Non-media data uses the DataChannel API, which is SCTP over DTLS over UDP. Both end-to-end encrypted.

When to reach for it (and when not to):

  • ✅ Video/audio calls, screen share, live “remote desktop.”
  • ✅ Sub-50 ms data — multiplayer games, collaborative tools where every ms shows.
  • ❌ Chat. A WebSocket through your server is simpler, cheaper, and easier to moderate.
  • ❌ Anything you need to log / record server-side. P2P means you’re not in the path.

Comparison, side by side#

| | SSE | WebSocket | WebRTC |
|---|---|---|---|
| Direction | server → client | bi-directional | p2p (both) |
| Transport | HTTP text stream | TCP (post upgrade) | UDP (SCTP / SRTP) |
| Auto-reconnect | ✅ built-in | ❌ DIY | ❌ renegotiate |
| Binary | ❌ (text only) | ✅ | ✅ |
| Auth | cookies / headers | same as HTTP | out of band (signaling) |
| Works through strict proxies | ✅ (it’s HTTP) | mostly | often needs TURN |
| Infra complexity | lowest | medium | highest |
| Sample use cases | stock tickers, log tails, AI token stream | chat, dashboards, collaborative cursors | Zoom, Meet, Discord voice |

Interview gotchas for §4c#

  • “Why not long-poll?” — a classic warm-up. Long-polling works, but each update costs a full HTTP round-trip, often with a fresh TCP + TLS handshake. For a dozen updates/sec, SSE/WS are dramatically cheaper.
  • Scale the fan-out, not the socket. For 1 M concurrent WebSocket users, the limit isn’t TCP — it’s how fast you can broadcast a message to 1 M sockets. Keep a per-room subscriber index, fan-out via Redis pub/sub or Kafka, pin users to regions.
  • AI token streaming. Both SSE and WebSocket work. Most LLM APIs (OpenAI, Anthropic) ship SSE — it’s simpler, and the stream is strictly server → client.
  • wss:// is mandatory in production. Mobile carriers and corporate proxies routinely strip or block plain ws://.
  • WebRTC without a TURN budget is a demo. Your team-coffee prototype works in the office because everyone’s on the same NAT. Real users need TURN, and TURN bandwidth costs real money.

5. Load Balancing#

A load balancer is the thing that lets you say “run N copies of my service” instead of “run my service.” It does three jobs at once:

  1. Horizontal scaling — spread load across replicas.
  2. Availability — health-check backends, take dead ones out of rotation.
  3. Deployment flexibility — gate traffic into new versions for blue/green, canary, rolling.

Every system-design interview touches one of these three.

Client-side vs dedicated load balancing#

Two fundamentally different architectures, with very different failure modes.

flowchart LR
    subgraph CLIENT ["🧭 Client-side LB"]
        direction TB
        C1["Client<br/>(with registry cache)"]
        REG[("Service<br/>registry<br/>(Consul, etcd,<br/>k8s EndpointSlice)")]
        B1["Backend A"]
        B2["Backend B"]
        B3["Backend C"]
        C1 -.refresh.-> REG
        C1 -->|picks a peer| B1
        C1 --> B2
        C1 --> B3
    end

    subgraph DED ["🏗️ Dedicated LB"]
        direction TB
        C2[Client] --> VIP["Load balancer<br/>(nginx · Envoy · ALB)"]
        VIP --> D1[Backend A]
        VIP --> D2[Backend B]
        VIP --> D3[Backend C]
        HC[["health<br/>checks"]] -.active.-> D1
        HC -.-> D2
        HC -.-> D3
        VIP --- HC
    end

    classDef client fill:#fef3c7,stroke:#d97706,stroke-width:1.5px,color:#78350f;
    classDef ded fill:#dbeafe,stroke:#3b6fd6,stroke-width:1.5px,color:#0f172a;
    classDef backend fill:#dcfce7,stroke:#16a34a,stroke-width:1.5px,color:#14532d;
    classDef infra fill:#e9d5ff,stroke:#7c3aed,stroke-width:1.5px,color:#4c1d95;
    class C1,C2 client
    class VIP,HC ded
    class B1,B2,B3,D1,D2,D3 backend
    class REG infra

| | Client-side | Dedicated |
|---|---|---|
| Who picks the backend? | the client library | a box in the middle |
| Extra network hop? | no | yes |
| Failure blast radius | one client affected | LB down = everything affected |
| Health-check work | every client | centralized at the LB |
| Best fit | internal RPC (gRPC, Finagle, mesh sidecars) | HTTP from unknown clients (browsers, mobile) |
| Typical infra | Consul / etcd + client library | ALB / NLB / nginx / Envoy / k8s Service |
Kubernetes Service objects are quietly a distributed dedicated LB: each node’s kube-proxy programs iptables/IPVS rules to DNAT traffic to a healthy pod, so there is no single chokepoint box. That’s a nice hybrid — client code talks to a single VIP, but the data plane is per-node.

L4 vs L7 — the axis you must know cold#

| | L4 (transport) | L7 (application) |
|---|---|---|
| Inspects | IP + port + TCP flags | HTTP method, path, headers, cookies, gRPC metadata |
| Routing rules | “all :443 → pool X” | “/api/v2/* → pool A, cookie beta=1 → pool B” |
| TLS | passthrough (SNI-only) | terminate + re-inspect |
| CPU cost | low | high (parse each request) |
| Connection reuse | transparent | LB owns the pool to backends |
| Typical products | AWS NLB, HAProxy mode tcp, IPVS | ALB, nginx, Envoy, Traefik, Kong |

Rule of thumb: if you need header-based routing, canary by cookie, path rewrites, or per-route rate limits → L7. If you need raw throughput for arbitrary TCP (Redis, Postgres replicas, WebSocket pass-through) → L4.

Balancing algorithms#

Assume 4 backends. Each algorithm decides where request N+1 goes.

flowchart LR
    R[["Incoming requests<br/>1 · 2 · 3 · 4 · 5 · 6 · 7 · 8 · 9 · 10"]]

    subgraph RR ["🔁 Round-robin"]
        direction TB
        A1["A: 1,5,9"]
        A2["B: 2,6,10"]
        A3["C: 3,7"]
        A4["D: 4,8"]
    end
    subgraph WRR ["⚖️ Weighted (A=3 · B=1 · C=1 · D=1)"]
        direction TB
        B1["A: 1,2,3,7,8,9"]
        B2["B: 4,10"]
        B3["C: 5"]
        B4["D: 6"]
    end
    subgraph LC ["📊 Least-connections"]
        direction TB
        C1["A: 3 open"]
        C2["B: 4 open"]
        C3["C: <b>1 open ← next req</b>"]
        C4["D: 2 open"]
    end
    subgraph P2C ["🎲 Power of two choices"]
        direction TB
        D1["pick 2 random:<br/>C (1 open) vs A (3 open)"]
        D2["send to C"]
    end

    R --> RR
    R --> WRR
    R --> LC
    R --> P2C

    classDef hub fill:#fef3c7,stroke:#d97706,stroke-width:2px,color:#78350f;
    classDef ok fill:#dcfce7,stroke:#16a34a,stroke-width:1.5px,color:#14532d;
    classDef hot fill:#fee2e2,stroke:#dc2626,stroke-width:1.5px,color:#7f1d1d;
    classDef best fill:#dbeafe,stroke:#3b6fd6,stroke-width:2px,color:#0f172a;
    class R hub
    class A1,A2,A3,A4,B1,B2,B3,B4 ok
    class C1,C2,C4,D1 ok
    class C3,D2 best

Quick take on each:

  • Round-robin — zero state; pathological when backends have different capacity or when request cost varies.
  • Weighted round-robin — tell the LB “backend A is 3× the size,” traffic splits 3:1:1:1. Typical during canary ramp.
  • Least connections — usually the right default for long-lived HTTP and request-response with long tail. Requires the LB to count in-flight.
  • Power of two choices (P2C) — pick two backends at random, send to the one with fewer active requests. Surprisingly close to optimal, no global state needed. Used by Envoy, Finagle, NGINX Plus.
  • IP hash / session affinity (sticky sessions) — hashes source IP (or a cookie) to a backend. Use only when the app has in-memory session state; prefer refactoring the state out.
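P2C is short enough to write out. A sketch assuming each backend tracks its own in-flight counter, which is all the state the algorithm needs:

```go
package main

import (
	"fmt"
	"math/rand"
	"sync/atomic"
)

// Backend tracks its own in-flight requests; no global coordination.
type Backend struct {
	Name     string
	InFlight atomic.Int64
}

// PickP2C samples two backends at random and returns the one with
// fewer active requests — the "power of two choices" trick.
func PickP2C(pool []*Backend) *Backend {
	a := pool[rand.Intn(len(pool))]
	b := pool[rand.Intn(len(pool))]
	if b.InFlight.Load() < a.InFlight.Load() {
		return b
	}
	return a
}

func main() {
	pool := []*Backend{{Name: "A"}, {Name: "B"}, {Name: "C"}, {Name: "D"}}
	pool[0].InFlight.Store(3) // A is overloaded
	counts := map[string]int{}
	for i := 0; i < 10000; i++ {
		counts[PickP2C(pool).Name]++
	}
	// A loses almost every pairwise comparison it appears in.
	fmt.Println(counts)
}
```

The reason it beats plain random: a random choice hits the overloaded backend 1/N of the time, but P2C only sends there when *both* samples are bad, which is quadratically rarer.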

Consistent hashing, for cache locality and stateful services#

When each backend holds different data (a sharded cache, a stateful computation, a per-user queue) you can’t send requests to just anyone — the answer is only on one specific node. Naive hash(key) % N remaps almost every key whenever N changes. Consistent hashing moves only ~1/N of them.

flowchart LR
    subgraph RING ["Hash ring (mod 2³²)"]
        direction TB
        A["🟩 Backend A<br/>vnodes @ 0, 120, 250"]
        B["🟦 Backend B<br/>vnodes @ 60, 180, 310"]
        C["🟧 Backend C<br/>vnodes @ 90, 200, 340"]
    end

    K1["key 'user:42' → hash 85"] -->|"nearest clockwise = 90"| C
    K2["key 'session:abc' → hash 155"] -->|"180"| B
    K3["key 'order:7' → hash 220"] -->|"250"| A

    classDef key fill:#fef3c7,stroke:#d97706,stroke-width:1.5px,color:#78350f;
    classDef a fill:#dcfce7,stroke:#16a34a,stroke-width:1.5px,color:#14532d;
    classDef b fill:#dbeafe,stroke:#3b6fd6,stroke-width:1.5px,color:#0f172a;
    classDef c fill:#fed7aa,stroke:#ea580c,stroke-width:1.5px,color:#7c2d12;
    class K1,K2,K3 key
    class A a
    class B b
    class C c

Vnodes (virtual nodes) are the trick that makes the distribution uniform. Each physical backend claims ~150 random positions on the ring; a new backend pulls roughly the right slice of keys off its neighbors instead of inheriting whichever contiguous arc happened to be next to its single position. Redis Cluster, DynamoDB, Cassandra, memcached clients — all consistent-hash with vnodes under the hood.

Health checks — active and passive, together#

Two complementary signals. In production you want both.

  • Active — the LB probes GET /healthz every 2–5 s. A failure → take out of rotation after N consecutive misses. Cheap, predictable, but adds background load.
  • Passive — the LB observes live traffic: 5xx responses, connection resets, timeouts. If a backend’s error rate spikes above a threshold, eject it for a cooldown (Envoy’s outlier detection).

The /healthz vs /readyz distinction matters for Kubernetes:

  • /livez (liveness) — “am I alive?” If this fails, k8s restarts the container. Keep it trivial; don’t check DB here (otherwise one slow DB takes out every pod).
  • /readyz (readiness) — “am I ready to serve traffic?” If this fails, k8s takes you out of the Service’s endpoint list but does not restart you. Check dependencies here (DB, caches, downstream auth service).

Slow-start / warm-up. When a new backend comes up, don’t immediately send it a full share of traffic — its connection pool is cold, the JIT isn’t warm (the JVM hasn’t C2-compiled anything yet), and Postgres connection handshakes haven’t amortized. Envoy has slow_start_config; nginx has slow_start=30s in upstream. Without it, the first pod in a rolling deploy absorbs a latency spike every time.

TLS termination: where does the crypto happen?#

flowchart LR
    CL[Client] -->|"HTTPS"| LB

    subgraph TERM_LB ["① Terminate at LB"]
        LB1["LB<br/>(holds cert + key)"]
        LB1 -->|"HTTP<br/><i>or</i> re-encrypted HTTPS"| B1[Backend]
    end

    subgraph TERM_BACK ["② Passthrough to backend"]
        LB2["LB<br/>(L4, SNI-only)"]
        LB2 -->|"HTTPS passthrough"| B2["Backend<br/>(holds cert + key)"]
    end

    subgraph TERM_MESH ["③ Mesh sidecar (mTLS)"]
        LB3[LB] -->|"HTTPS"| SC1[Envoy sidecar]
        SC1 -->|"localhost plaintext"| B3[Backend]
        SC1 -.mTLS to peers.- SC2[other sidecars]
    end

    classDef lb fill:#dbeafe,stroke:#3b6fd6,stroke-width:1.5px,color:#0f172a;
    classDef backend fill:#dcfce7,stroke:#16a34a,stroke-width:1.5px,color:#14532d;
    classDef mesh fill:#e9d5ff,stroke:#7c3aed,stroke-width:1.5px,color:#4c1d95;
    class LB1,LB2,LB3 lb
    class B1,B2,B3 backend
    class SC1,SC2 mesh

| | Terminate at LB | Passthrough | Mesh sidecar |
|---|---|---|---|
| Cert lives on | LB | every backend | sidecar + LB |
| L7 routing? | ✅ | ❌ (L4 only) | ✅ |
| End-to-end encrypted? | ❌ unless LB→backend re-encrypts | ✅ | ✅ (mTLS) |
| Crypto CPU cost | centralized on LB | spread to backends | on sidecars |
| Typical use | public web app | pinned certs, compliance | zero-trust service mesh |

Global / geo load balancing#

The LB box above is regional. Routing users to the closest healthy region happens one level up:

  • DNS-level — the authoritative DNS server returns different A records based on the resolver’s location (AWS Route 53 latency-based routing, GeoDNS). TTL is the enemy: a dead region bleeds traffic until TTLs expire everywhere. Keep global-health TTLs low (30-60 s).
  • Anycast — the same IP is announced from multiple BGP points; routers pick the topologically nearest. CDNs and DNS root servers use anycast. Failover is sub-second because it’s a routing update, not a DNS refresh.
  • App-level — the app decides, possibly overriding DNS. E.g. the web app pins users to their home shard after login.

§6 goes deeper into CDN / regional.

Here’s the minimum viable production stack — one region, the seven boxes an interviewer expects you to name:

flowchart LR
    user@{ shape: circle, label: "User" }
    cdn@{ shape: cloud, label: "CDN" }

    subgraph region ["Region (one of many)"]
        direction TB
        lb@{ shape: stadium, label: "LB (nginx)" }
        app@{ shape: rounded, label: "API (Go)" }
        cache@{ shape: cyl, label: "Redis" }
        db@{ shape: cyl, label: "Postgres" }
    end

    kafka@{ shape: stadium, label: "Kafka (cross-region)" }

    user -->|"HTTPS"| cdn
    cdn -->|"miss → origin"| lb
    lb -->|"route"| app
    app -->|"read/write cache"| cache
    app -->|"sync write"| db
    db -->|"CDC events"| kafka

    classDef neutral fill:#dbeafe,stroke:#3b6fd6,stroke-width:1.5px,color:#0f172a;
    classDef storage fill:#fed7aa,stroke:#ea580c,stroke-width:1.5px,color:#7c2d12;
    classDef highlight fill:#e9d5ff,stroke:#7c3aed,stroke-width:2px,color:#4c1d95;
    class user,cdn,app neutral
    class cache,db,kafka storage
    class lb highlight

Six logos, one focal point (the LB), one flow from left to right. Replicate the region group N times for multi-region; the Kafka bus is the only thing that actually crosses the boundary.

Deployment patterns the LB enables#

  • Blue/green — keep blue running, deploy green alongside, flip all traffic in one LB config change. Roll back = flip back.
  • Canary — weighted routing: 1 % → 10 % → 50 % → 100 % to the new version, watching error rate + latency at each step.
  • Rolling — replace pods N at a time; LB takes the restarting pod out of rotation via readiness.

All three require the LB to separate “in rotation” from “running,” which is exactly what readiness probes + weighted pools give you.
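The canary ramp itself is a weighted coin-flip per request. Real LBs implement this with weighted pools; the sketch below shows just the core decision (pickPool and the pool names are illustrative):

```go
package main

import (
	"fmt"
	"math/rand"
)

// pickPool routes a request to "canary" with the given weight (0..1)
// and to "stable" otherwise — the heart of a weighted canary ramp.
func pickPool(canaryWeight float64) string {
	if rand.Float64() < canaryWeight {
		return "canary"
	}
	return "stable"
}

func main() {
	for _, w := range []float64{0.01, 0.10, 0.50} {
		hits := 0
		for i := 0; i < 100000; i++ {
			if pickPool(w) == "canary" {
				hits++
			}
		}
		fmt.Printf("weight %.2f → %.1f%% of traffic\n", w, float64(hits)/1000)
	}
	// Between ramp steps you'd compare canary vs stable error rate and
	// latency before increasing the weight further.
}
```

Sticky-session caveat: if users must stay on one version for a whole session, hash on a user ID instead of flipping a fresh coin per request.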

Go — a minimal round-robin L7 reverse proxy#

Illustrative, not production:

package main

import (
    "log"
    "net/http"
    "net/http/httputil"
    "net/url"
    "sync/atomic"
    "time"
)

type Backend struct {
    URL     *url.URL
    Healthy atomic.Bool
    Proxy   *httputil.ReverseProxy
}

type Pool struct {
    backends []*Backend
    idx      atomic.Uint64 // round-robin cursor
}

func NewPool(urls []string) *Pool {
    p := &Pool{}
    for _, raw := range urls {
        u, err := url.Parse(raw)
        if err != nil {
            log.Printf("skipping bad backend URL %q: %v", raw, err)
            continue
        }
        b := &Backend{URL: u}
        b.Proxy = httputil.NewSingleHostReverseProxy(u)
        // Mark backend unhealthy on transport errors.
        b.Proxy.ErrorHandler = func(w http.ResponseWriter, r *http.Request, err error) {
            b.Healthy.Store(false)
            http.Error(w, "bad gateway", http.StatusBadGateway)
        }
        b.Healthy.Store(true)
        p.backends = append(p.backends, b)
    }
    return p
}

func (p *Pool) NextHealthy() *Backend {
    n := uint64(len(p.backends))
    for i := uint64(0); i < n; i++ {
        b := p.backends[p.idx.Add(1)%n]
        if b.Healthy.Load() {
            return b
        }
    }
    return nil
}

func (p *Pool) ServeHTTP(w http.ResponseWriter, r *http.Request) {
    b := p.NextHealthy()
    if b == nil {
        http.Error(w, "no healthy backend", http.StatusServiceUnavailable)
        return
    }
    b.Proxy.ServeHTTP(w, r)
}

// Active health check: every 5s, GET /healthz on each backend.
func (p *Pool) HealthLoop() {
    client := &http.Client{Timeout: 1 * time.Second}
    t := time.NewTicker(5 * time.Second)
    for range t.C {
        for _, b := range p.backends {
            resp, err := client.Get(b.URL.String() + "/healthz")
            b.Healthy.Store(err == nil && resp != nil && resp.StatusCode == http.StatusOK)
            if resp != nil {
                resp.Body.Close()
            }
        }
    }
}

func main() {
    pool := NewPool([]string{"http://127.0.0.1:9001", "http://127.0.0.1:9002"})
    go pool.HealthLoop()
    log.Fatal(http.ListenAndServe(":8080", pool))
}

Shortcuts this deliberately takes: no retry on 5xx, no P2C, no TLS backend, no hot-reload of the pool. Production LBs (Envoy, nginx) do all of those plus connection pooling, HTTP/2 multiplexing, circuit breaking, and metrics — the reason “just write your own” is almost always the wrong answer.

Interview gotchas for §5#

  • Thundering herd after an LB event. When the LB restarts or many backends come up in a rolling deploy, naive round-robin sends one big wave to the newest pod. Slow-start mode (Envoy, NGINX Plus) ramps weight up over N seconds. Ask about it.
  • Session affinity is a scaling debt. Breaks the symmetry that lets you kill any pod without user impact. If you must, key affinity on user_id (cookie), not source IP — phones roam between Wi-Fi and LTE.
  • L7 LBs are CPU-bound on TLS, not bandwidth. Plan capacity on handshakes-per-second, not Gbps. Session resumption + OCSP stapling + HTTP/2 keepalive ease the bill.
  • keepalive_timeout mismatches → 502 storms. If the LB’s idle timeout is longer than the backend’s, the LB will send new requests on sockets the backend is about to close. Always keep LB idle ≤ backend idle − a couple of seconds.
  • DNS TTL vs failover. A 300 s TTL means a dead region bleeds traffic for 300 s worldwide. Lower the TTL before you need to fail over (not during), or put anycast in front of the name.
  • Don’t put an L7 LB in front of another L7 LB unless you love chasing ghost 502s. One hop that parses HTTP is enough; the second hop just adds surface area for header drift, keepalive mismatch, and H1↔H2 translation bugs.

6. Deep Dives — CDN, regional, resilience#

Up to this point every protocol assumed the network cooperates. It doesn’t. Packets drop, regions fail, downstream services slow to a crawl, and your latency budget is shorter than any single hop’s tail. This section is the kit of patterns you reach for to keep a system standing when things go sideways.

The latency / availability math interviewers expect#

Two numbers you should be able to derive on the whiteboard:

| SLA | Budget per year | Per month | Per week |
|---|---|---|---|
| 99.0% | 3.65 days | 7.2 h | 1.68 h |
| 99.9% (three 9s) | 8.76 h | 43.8 min | 10.1 min |
| 99.99% (four 9s) | 52.6 min | 4.4 min | 60.5 s |
| 99.999% (five 9s) | 5.26 min | 26.3 s | 6 s |

Composition rule. If you depend on N downstream services each at 99.9%, your availability is 0.999^N. Ten dependencies ≈ 99%. That’s why resilience patterns exist — they recover availability from components that individually aren’t good enough.

CDNs: push the edge closer#

A CDN caches static (and increasingly dynamic) responses at points of presence (PoPs) near your users. A request hits the nearest PoP; if cached, served from there. If not, the PoP fetches from origin, caches, returns. First byte goes from ~200 ms trans-Pacific to ~20 ms same-city.

flowchart LR
    user@{ shape: circle, label: "User" }
    pop@{ shape: cloud, label: "Edge PoP" }
    origin@{ shape: rounded, label: "Origin" }
    bucket@{ shape: disk, label: "Object store" }

    user -->|"request"| pop
    pop -->|"cache miss"| origin
    origin -->|"read"| bucket
    bucket -.->|"response"| origin
    origin -.->|"fill cache"| pop
    pop -.->|"response"| user

    classDef neutral fill:#dbeafe,stroke:#3b6fd6,stroke-width:1.5px,color:#0f172a;
    classDef highlight fill:#e9d5ff,stroke:#7c3aed,stroke-width:2px,color:#4c1d95;
    classDef storage fill:#fed7aa,stroke:#ea580c,stroke-width:1.5px,color:#7c2d12;
    class user,origin neutral
    class pop highlight
    class bucket storage

Four knobs that actually matter in an interview:

  1. Cache keys. By default, URL = key. Vary: Accept-Language splits entries by language header. Get the key wrong → serve the wrong user’s content.
  2. TTL vs stale-while-revalidate. Cache-Control: max-age=60, stale-while-revalidate=3600 = serve cached, asynchronously refetch after 60 s, serve a stale copy for up to 1 h if the origin is down. Trade freshness for resilience.
  3. Cache-stampede protection. When a popular URL expires, 10,000 clients hit the origin simultaneously. Fix: request coalescing at the edge (concurrent misses collapse into one origin fetch, whose response fans out to all waiters), or stale-while-revalidate.
  4. Purging. Pushing a fix? Tag-based invalidation (Cache-Tag: article-42) is far better than URL-based when one piece of content appears in many URLs.

Regional partitioning: blast-radius management#

Single-region = single blast radius. If you lose us-east-1, you lose everything. Three typical multi-region postures:

flowchart LR
    subgraph AP ["Active / Passive"]
        direction TB
        APpri["Primary<br/>100% of traffic"]
        APstd["Standby<br/>0% · replicated"]
        APpri -->|"async replication"| APstd
    end

    subgraph AA ["Active / Active"]
        direction TB
        AA1["Region A<br/>50%"]
        AA2["Region B<br/>50%"]
        AA1 <-->|"bi-directional sync"| AA2
    end

    subgraph CELL ["Cell-based"]
        direction TB
        C1["Cell 1<br/>users 0-33%"]
        C2["Cell 2<br/>users 33-66%"]
        C3["Cell 3<br/>users 66-100%"]
    end

    classDef neutral fill:#dbeafe,stroke:#3b6fd6,stroke-width:1.5px,color:#0f172a;
    classDef ok fill:#dcfce7,stroke:#16a34a,stroke-width:1.5px,color:#14532d;
    classDef warn fill:#fef3c7,stroke:#d97706,stroke-width:1.5px,color:#78350f;
    class APpri neutral
    class APstd warn
    class AA1,AA2 ok
    class C1,C2,C3 neutral

| Model | Failover time | Data loss risk | Operational cost |
|---|---|---|---|
| Active / Passive | minutes (DNS / BGP flip) | last async-replication window (seconds to minutes) | low — standby is cheap |
| Active / Active | seconds (already serving) | merge conflicts if both wrote | high — multi-master sync |
| Cell-based | blast radius = one cell | only that cell’s users | medium — many small cells |

Cell-based is AWS’s favorite pattern: each “cell” is a self-contained stack serving a slice of users. When a cell goes bad, only that slice is affected. Adding capacity = adding cells, not scaling one giant region.

Timeouts: the most underappreciated resilience primitive#

If you remember nothing else: every network call needs a timeout. The default behavior of most language HTTP libraries is “wait forever” — which translates to goroutines/threads piling up, connection pools exhausting, the whole service grinding to a halt because of one slow downstream.

Timeout budget — carry a deadline through the call stack, not a timeout per hop. If the top-level request has 800 ms left, an internal call can’t use 500 ms if two more hops come after it.

sequenceDiagram
    autonumber
    participant U as User
    participant A as API (budget: 800ms)
    participant B as Service B (budget: 400ms)
    participant C as Service C (budget: 150ms)

    U->>A: request
    Note over A: deadline = now + 800ms
    A->>B: call (ctx deadline = 400ms)
    Note over B: deadline = now + 400ms
    B->>C: call (ctx deadline = 150ms)
    Note over C: deadline = now + 150ms
    C-->>B: 80ms
    B-->>A: 220ms
    A-->>U: 370ms ✓ within budget

In Go, context.Context does this for you — context.WithDeadline(parent, t) clamps the child’s deadline to whichever is tighter. gRPC and well-behaved HTTP libraries read ctx.Deadline() and fail fast if there’s no time left.

Retries with exponential backoff + jitter#

When a transient error happens (DNS blip, upstream restart, rate-limit), the first instinct is to retry. Done naively, retries turn a 1-second outage into a 30-second outage as clients synchronize their retry storms.

Three rules to retry well:

  1. Retry only idempotent operations. GET, PUT, DELETE — yes. POST — only if you carry an idempotency key.
  2. Cap the number of attempts. Usually 3–5. Infinite retries is a DDoS on yourself.
  3. Exponential backoff + jitter. Double the delay each attempt, then add randomness so N clients don’t all retry at the exact same moment.
sequenceDiagram
    autonumber
    participant C as Client
    participant S as Upstream

    C->>S: GET /resource
    S--xC: 503 Service Unavailable

    Note over C: wait 100-300ms<br/>(base 200ms + jitter)
    C->>S: attempt 2
    S--xC: 503

    Note over C: wait 200-600ms<br/>(base 400ms + jitter)
    C->>S: attempt 3
    S--xC: 503

    Note over C: wait 400-1200ms<br/>(base 800ms + jitter)
    C->>S: attempt 4
    S-->>C: 200 OK ✅

The jitter is crucial. Without it, all clients that failed at t=0 retry simultaneously at t=200, crushing the service again. With full jitter (sleep = rand(0, base * 2^attempt)), retries smear across the whole interval.

Go — the retry that you copy-paste into every new service:

import (
    "context"
    "errors"
    "math/rand"
    "time"
)

type retryableFn func(ctx context.Context) error

// retry runs fn with full-jitter exponential backoff, capped at maxAttempts.
// Returns the last error if all attempts fail. Respects the context deadline.
func retry(ctx context.Context, maxAttempts int, base time.Duration, fn retryableFn) error {
    var err error
    for attempt := 0; attempt < maxAttempts; attempt++ {
        err = fn(ctx)
        if err == nil {
            return nil
        }
        if !isRetryable(err) || attempt == maxAttempts-1 {
            return err
        }
        // Full jitter: sleep = random in [0, base * 2^attempt)
        maxSleep := base << attempt
        sleep := time.Duration(rand.Int63n(int64(maxSleep)))
        select {
        case <-time.After(sleep):
        case <-ctx.Done():
            return errors.Join(err, ctx.Err())
        }
    }
    return err
}

func isRetryable(err error) bool {
    // 5xx, connection reset, deadline exceeded — yes.
    // 4xx (client error), validation errors — no.
    var te interface{ Timeout() bool }
    if errors.As(err, &te) && te.Timeout() {
        return true
    }
    // + protocol-specific heuristics (HTTP status, gRPC codes.Unavailable) …
    return false
}

Circuit breaker: stop hitting a dead service#

A circuit breaker wraps calls to a downstream and fails fast when failures cross a threshold. Think of it as a fuse that opens to protect the rest of the system from cascading failures.

stateDiagram-v2
    direction LR
    [*] --> Closed

    Closed --> Open : failures > threshold<br/>in rolling window
    Open --> HalfOpen : after cool-down<br/>(e.g. 30s)
    HalfOpen --> Closed : probe succeeds
    HalfOpen --> Open : probe fails

    note right of Closed
        normal — calls go through
        failure count incremented
    end note
    note right of Open
        fail fast — no calls
        return cached / fallback / 503
    end note
    note right of HalfOpen
        let one probe through
        decide based on result
    end note

    classDef good fill:#dcfce7,stroke:#16a34a,stroke-width:2px,color:#14532d;
    classDef bad fill:#fee2e2,stroke:#dc2626,stroke-width:2px,color:#7f1d1d;
    classDef probe fill:#fef3c7,stroke:#d97706,stroke-width:2px,color:#78350f;
    class Closed good
    class Open bad
    class HalfOpen probe

Why this matters:

  • Without a breaker: a dead downstream takes 10 s to time out per call. At 1000 req/s incoming, that’s 10,000 goroutines stuck in flight → OOM within seconds.
  • With a breaker in Open state: calls fail in microseconds with a known error. Goroutines complete, connection pools stay healthy, upstream clients can be told “this feature is degraded, retry in 30 s” instead of “your entire page timed out.”

Go with sony/gobreaker:

import (
    "context"
    "log"
    "time"

    "github.com/sony/gobreaker/v2"
)

var paymentBreaker = gobreaker.NewCircuitBreaker[*PaymentResult](gobreaker.Settings{
    Name:        "payment-gateway",
    MaxRequests: 3,                // allowed through in Half-Open
    Interval:    60 * time.Second, // rolling window
    Timeout:     30 * time.Second, // Open → Half-Open cool-down
    ReadyToTrip: func(counts gobreaker.Counts) bool {
        failureRate := float64(counts.TotalFailures) / float64(counts.Requests)
        return counts.Requests >= 20 && failureRate >= 0.5
    },
    OnStateChange: func(name string, from, to gobreaker.State) {
        log.Printf("breaker %s: %s → %s", name, from, to)
    },
})

func chargeCard(ctx context.Context, req ChargeRequest) (*PaymentResult, error) {
    return paymentBreaker.Execute(func() (*PaymentResult, error) {
        return paymentClient.Charge(ctx, req)
    })
}

The knobs that matter (with reasonable defaults):

  • Failure threshold — failure rate ≥ 50% over a rolling window of 20+ requests. Lower = more sensitive, more false trips.
  • Cool-down — time in Open before trying Half-Open. Usually 15-60 s. Too short = hammer while still sick; too long = slow recovery.
  • Half-Open probe count — how many requests before declaring healthy again. 1-5. Too many = expose more traffic to a still-broken service.

Bulkheads: isolate the blast radius#

Don’t share pools across unrelated features. If payment and search both draw from one http.Transport.MaxIdleConns = 100 pool, a payment-gateway slowdown can starve search. Give each downstream its own pool (separate http.Client), or per-tenant pools for multi-tenant systems.

Paired with a breaker, this means one dead dependency degrades only its feature — the rest of the app keeps working.

Rate limiting, three places#

  1. At the edge (CDN / API gateway) — by IP / API key. Rejects floods before they touch your code.
  2. At the service (middleware) — by user / tenant / endpoint. Enforces per-customer contracts.
  3. At the downstream call site (client-side) — token bucket per upstream. Shields your dependencies from you.

The classic algorithm is token bucket: capacity tokens refilled at rate/sec. Each request costs 1. If no tokens, 429. Bursts up to capacity, steady-state rate. Go’s golang.org/x/time/rate.Limiter is idiomatic.

Putting it together — the resilience stack#

flowchart TB
    REQ(["incoming request"])
    TMO["①  timeout budget<br/>ctx.WithDeadline"]
    RLT["②  rate limiter<br/>token bucket / leaky"]
    BLK["③  bulkhead<br/>per-dependency pool"]
    CB["④  circuit breaker<br/>closed / open / half-open"]
    RET["⑤  retry<br/>exp backoff + jitter"]
    CALL(["downstream call"])
    FALLBACK(["fallback / 503 / cached"])

    REQ --> TMO --> RLT --> BLK --> CB
    CB -->|"closed"| RET --> CALL
    CB -->|"open"| FALLBACK
    CB -->|"half-open"| CALL

    classDef step fill:#dbeafe,stroke:#3b6fd6,stroke-width:1.5px,color:#0f172a;
    classDef good fill:#dcfce7,stroke:#16a34a,stroke-width:1.5px,color:#14532d;
    classDef warn fill:#fef3c7,stroke:#d97706,stroke-width:1.5px,color:#78350f;
    class TMO,RLT,BLK,CB,RET step
    class CALL good
    class FALLBACK warn
    classDef highlight fill:#e9d5ff,stroke:#7c3aed,stroke-width:2px,color:#4c1d95;
    class REQ highlight

Order matters: timeouts first so nothing can run forever, rate-limit before the expensive work, bulkhead to isolate, breaker to fail fast, retry on retryable errors only, then the actual call. Getting the order wrong (e.g. retrying before the breaker) amplifies bad behavior instead of absorbing it.

Interview gotchas for §6#

  • Retry storms after a mass timeout. If you set retries = 3 on every layer (client, gateway, service, downstream), a single slow call multiplies into 3⁴ = 81 attempts. Pick one layer to retry; the others pass the error up.
  • DELETE isn’t always safe to retry. It’s idempotent semantically (DELETE x twice = same state) but the second DELETE on a not-found resource may return 404 — your caller needs to treat 404-after-DELETE as success, not failure.
  • Breaker + retry interaction. The retry layer inside a breaker means one user retrying 3× accounts for 3 failures in the breaker’s window, tripping it faster than you’d expect. Decide: retry outside the breaker (breaker is the ultimate source of truth) or inside (retries are “part of one operation”).
  • Cold-start after Open → HalfOpen. If your downstream just came back and you send all your traffic in the first second, you kill it again. Use MaxRequests in Half-Open, or add a gradual weight ramp-up (see §5 slow-start).
  • Monitor OnStateChange. A breaker silently tripping is worse than no breaker — users see fallbacks and you don’t know why. Page / log every state transition.

7. Interview cheat sheet#

Three ways to use this section:

  • Night-before review — read only this page, open the diagrams you don’t remember.
  • During the interview — when the interviewer drops a keyword, the tables below have a starting sentence.
  • Mock warm-up — cover the right column and quiz yourself.

Answer template for any networking question#

Strong answers have four beats in order. Missing one is the usual reason a good technical answer feels mediocre:

  1. Frame the trade-off. Name the two or three things we’re choosing between (latency vs throughput, consistency vs availability, correctness vs cost).
  2. Pick a default. Give a concrete choice with numbers where you can.
  3. Call out the failure mode. Say out loud when your default breaks and what you’d reach for next.
  4. Tie to the specific system in the prompt. Generic answers rate generic.

When you hear X → how to open#

The right-hand column is the first sentence, not the whole answer. Expand from there.

Design decisions

| Interviewer says | Open with |
|---|---|
| “What happens when I type a URL?” | DNS → TCP 3-way → TLS 1.3 (1 RTT) → HTTP request → render. First byte floor = one RTT. HTTP/3 folds the handshake into QUIC. |
| “TCP or UDP for this?” | Default TCP for correctness; UDP when ordering/retransmits are the app’s job (DNS, media, QUIC) or when HOL blocking matters. |
| “REST, GraphQL, or gRPC?” | REST for public / CRUD / cacheable. GraphQL when one graph × many client shapes. gRPC for internal polyglot services with streaming. |
| “WebSocket, SSE, or WebRTC?” | SSE for server → client feeds. WebSocket for bi-di text/binary. WebRTC only if you need media or sub-50ms peer-to-peer. |
| “301 vs 302?” | 301 = permanent, cached aggressively, pain to roll back. 302 = temporary, not cached. Use 307/308 to preserve the HTTP method. |
| “How do you encrypt service-to-service traffic?” | mTLS, usually delegated to a service mesh sidecar (Envoy / Linkerd). The mesh owns cert rotation, identity, and policy. |

Scaling

| Interviewer says | Open with |
|---|---|
| “Scale this stateless service.” | One LB (L4 for raw throughput, L7 for routing / rewriting) fronts N replicas, with state in a DB or cache. Add health checks + slow-start to prevent thundering herd on rollouts. |
| “Design a rate limiter.” | Token bucket: capacity C, refill rate R. Bursts up to C, steady-state R. Key by tenant / user / IP, persist counters in Redis for multi-node consistency. |
| “Design for 1 M concurrent connections.” | The bottleneck is fan-out, not the sockets themselves. Per-room subscriber index, pub/sub broker (Redis / Kafka / NATS) to broadcast, region-pinned pods, connection-count health signal. |
| “Deploy with zero downtime.” | Readiness gates rotation. Rolling replaces N pods at a time; blue/green keeps both versions hot and flips the LB; canary shifts traffic by weight (1% → 10% → 100%) while watching error rate. |
| “Multi-region strategy?” | Active/passive if the cost of replication lag is tolerable; active/active if the app can handle conflict resolution; cell-based when blast radius is the primary concern. |

Failure + resilience

| Interviewer says | Open with |
|---|---|
| “Postgres goes down — what happens?” | Clients carry deadlines; a circuit breaker opens after threshold so we fail fast instead of piling up goroutines. Serve cached / read-replica if the endpoint tolerates it; return a structured degradation (503 with a Retry-After) otherwise. |
| “Why are your latencies spiking?” | Separate p50 from p99 first. Likely suspects: GC pauses, connection pool saturation, downstream tail latency, TCP retransmits on a flaky path, or a cold cache. Instrument with distributed tracing to find the hop. |
| “Your service keeps 502-ing.” | Usually an LB ↔ backend keep-alive mismatch: LB reuses a connection the backend just closed. Align keepalive_timeout (LB < backend) and watch upstream_reset logs. |
| “One user’s bad request is taking down the service.” | You need bulkheads. Separate connection pools per downstream so one slow dependency doesn’t starve the rest; rate-limit per user/tenant, not just globally. |
| “What’s wrong with retrying every error?” | Retry storms. Each layer retries 3× → stacks multiplicatively (3⁴ = 81 attempts). Retry in one place (client), use exponential backoff with full jitter, and only for idempotent operations or calls with an idempotency key. |

The one-pager#

If nothing else sticks, memorize this:

| Decision | Default | Why |
|---|---|---|
| Transport | TCP for correctness, UDP / QUIC for real-time | TCP head-of-line blocking is at the stream layer |
| HTTP version | HTTP/2 within a datacenter, HTTP/3 at the edge for mobile | HTTP/3 rides QUIC → no HOL, connection migration across networks |
| API style | REST public, gRPC internal, GraphQL when one schema × many clients | Each pattern matches a distinct constraint |
| Real-time | SSE server→client, WebSocket bi-di, WebRTC media / sub-50ms | Pick the simplest channel that solves the problem |
| Load balancing | L4 for raw throughput, L7 for HTTP-aware routing | L7 is CPU-bound on TLS, not bandwidth |
| LB algorithm | Least-connections as default, P2C when stateless, consistent hashing for shard affinity | |
| Resilience stack | timeout → rate-limit → bulkhead → breaker → retry → call | Order matters — retries before the breaker amplify failures |
| Retries | Exponential backoff with full jitter, cap 3–5 attempts, idempotent only | Prevents retry storms |
| Multi-region | Cell-based for blast-radius control, active / active for sub-minute RTO | Active/passive cheapest, active/active costliest |
| Caching | Cache-Control: max-age=N, stale-while-revalidate=M | Resilience usually beats freshness |

Pitfalls to volunteer#

Interviewers reward candidates who surface failure modes before being asked. The list below is short enough to scan the day of; drop one or two where they fit the scenario:

  • POST is not idempotent. Safe retries need an idempotency key propagated through every layer.
  • TIME_WAIT port exhaustion on high-churn outbound clients. Use connection pools; avoid tcp_tw_recycle (deprecated, breaks NAT).
  • Connection: close emitted on every response cripples the client pool.
  • WebSocket without CheckOrigin is CSRF-over-WebSocket waiting to happen.
  • WebRTC without a TURN budget is a demo, not a product — plan for 10–20 % of calls to need the relay.
  • Session affinity is scaling debt. It breaks the symmetry that lets you terminate any pod.
  • keepalive_timeout asymmetry between LB and backend produces 502 storms.
  • DNS TTL during failover. A 300 s TTL means 300 s of bleeding traffic to a dead region.
  • Retry inside a breaker double-counts failures. Decide which layer owns retry semantics.
  • Reusing a protobuf field number silently breaks wire compatibility. Use reserved.
  • Unbounded GraphQL queries are a DoS primitive. Enforce depth limits and persisted queries in production.
  • 0-RTT TLS early data is replayable. Never use it for state-changing requests.
  • HTTP/2 with a self-signed cert in Go needs NextProtos = ["h2", "http/1.1"] — otherwise the client silently falls back to HTTP/1.1.
  • PMTUD black-holing. A middlebox dropping ICMP “fragmentation needed” packets stalls any segment over the path MTU.

Further reading#

In order of depth-per-hour:

  1. System Design Primer — a curated reading list masquerading as a README. Best starting point.
  2. Designing Data-Intensive Applications (Kleppmann). Chapters 5, 6, and 7 on replication, partitioning, and transactions. The implicit syllabus of most system-design rounds.
  3. RFC 9110 (HTTP semantics) and RFC 9114 (HTTP/3) — readable, surprisingly short.
  4. AWS Builders’ Library — essays on the patterns in §5 and §6, written at the scale they were invented for.
  5. highscalability.com — post-mortems and architecture profiles from companies in production.
  6. ByteByteGo’s newsletter — weekly, diagram-heavy, short enough to read on a commute.

Wrapping up#

A working mental model of networking is cumulative. You will not acquire it in one sitting — you will notice one week that a problem at the layer above is easier because you understood the one below. Use this post as the spine; fill the gaps with whatever production mystery you’re chasing that week.

Corrections and sharper phrasings are welcome. Open an issue on the blog’s repo and I’ll update with attribution.

Networking Essentials (https://fuwari.vercel.app/posts/networking-essentials/)
Author: Thomas Engineer · Published: 2026-04-20 · License: CC BY-NC-SA 4.0