Networking Essentials
Most system-design interviews touch networking. You don’t need to recite RFCs, but you do need to choose the right protocol, explain why, and anticipate the failure modes.
This post is my working cheat sheet — structured for re-reading before an interview, with diagrams, Go snippets, and the trade-offs that actually come up. I rewrote my original HelloInterview notes into seven sections that build on each other: the layer model, then each layer from the wire up to application protocols, then load balancing, then resilience.
How to use this: skim the headings and diagrams first. Second pass, read the “why it matters” paragraphs. Third pass, the code. Don’t memorize — understand the trade-off.
1. Networking 101
Every network interaction is a stack of responsibilities. Each layer talks only to the one directly above and below it, so you can swap implementations without breaking the others — the same HTTP request works whether it rides on Ethernet, Wi-Fi, or LTE.
Textbooks teach the 7-layer OSI model. In practice everyone uses the 4-layer TCP/IP model because the three top OSI layers collapse into “the application decides.” Know both names for the interview; use the 4-layer one when reasoning.
The 4-layer TCP/IP stack. Each layer wraps the payload from the layer above.
What each layer actually does
| Layer | Job in one sentence | Unit | Addresses |
|---|---|---|---|
| Application | "What does this message mean?" | message | URLs, hostnames |
| Transport | "Who on that machine should get it, and is it reliable?" | segment (TCP) / datagram (UDP) | port numbers |
| Internet | "Which machine on the internet, and how do we route there?" | packet | IP addresses |
| Link | "How do we put bits on this physical medium?" | frame | MAC addresses |
A packet at the link layer is literally a nested envelope: [ Ethernet [ IP [ TCP [ HTTP ... ] ] ] ]. Each device along the route peels off the link-layer envelope to decide where to forward next, then reseals with a new one.
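A toy illustration of that nesting (the `wrap` helper is hypothetical, just to make the envelope metaphor concrete):

```go
package main

import "fmt"

// wrap is a hypothetical helper: each layer prepends its own header
// around the payload handed down from the layer above.
func wrap(header, payload string) string {
	return header + "[" + payload + "]"
}

func main() {
	msg := "HTTP GET /"                // application
	segment := wrap("TCP", msg)        // transport: ports, seq numbers
	packet := wrap("IP", segment)      // internet: src/dst addresses
	frame := wrap("Ethernet", packet)  // link: MAC addresses
	fmt.Println(frame) // Ethernet[IP[TCP[HTTP GET /]]]
}
```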
What happens when you type https://example.com and press Enter
This is the single most common warm-up question. The full answer touches every layer:
sequenceDiagram
autonumber
participant C as 🖥️ Client
participant D as 🗺️ DNS
participant S as 🌐 Server :443
rect rgb(254, 249, 195)
Note over C,D: Phase 1 — DNS resolution
C->>D: query A example.com
activate D
D-->>C: 93.184.216.34
deactivate D
end
rect rgb(219, 234, 254)
Note over C,S: Phase 2 — TCP 3-way handshake
C->>S: SYN seq=x
activate S
S-->>C: SYN·ACK seq=y, ack=x+1
C->>S: ACK ack=y+1
end
rect rgb(220, 252, 231)
Note over C,S: Phase 3 — TLS 1.3 handshake (1 RTT)
C->>S: ClientHello + key share
S-->>C: ServerHello + cert + Finished
Note over C,S: both sides derive the session key
end
rect rgb(233, 213, 255)
Note over C,S: Phase 4 — HTTP over TLS
C->>S: GET / HTTP/1.1
S-->>C: 200 OK · Content-Type: text/html
deactivate S
end

A single HTTPS request touches DNS, TCP, TLS, and HTTP. HTTP/2 and /3 fold steps 2 + 3 + 4 into fewer round-trips.
Mental shortcut for the interview:
- Resolve — DNS turns `example.com` into an IP (UDP 53, or TCP 53 for large answers).
- Connect — TCP 3-way handshake (SYN, SYN-ACK, ACK) to the IP on port 443.
- Secure — TLS handshake negotiates a session key; with TLS 1.3 this is 1 RTT, sometimes 0-RTT on resumption.
- Request — the client sends an HTTP request over the encrypted stream.
- Respond — the server sends HTML. The browser parses, finds asset URLs, and repeats.
Every one of those steps is a potential failure mode an interviewer can probe. Keep it at the tip of your tongue.
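Go's `net/http/httptrace` package lets you watch these phases fire on a real request. A sketch against a local TLS test server (no real DNS step, but the TCP connect and TLS handshake are genuine):

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/http/httptrace"
)

// phases runs one HTTPS request against a local test server and records
// which connection phases fired, via net/http/httptrace hooks.
func phases() []string {
	srv := httptest.NewTLSServer(http.HandlerFunc(
		func(w http.ResponseWriter, r *http.Request) { w.WriteHeader(200) }))
	defer srv.Close()

	var seen []string
	trace := &httptrace.ClientTrace{
		ConnectStart:         func(network, addr string) { seen = append(seen, "connect") },
		TLSHandshakeDone:     func(tls.ConnectionState, error) { seen = append(seen, "tls") },
		GotFirstResponseByte: func() { seen = append(seen, "response") },
	}
	req, _ := http.NewRequest("GET", srv.URL, nil)
	req = req.WithContext(httptrace.WithClientTrace(req.Context(), trace))

	resp, err := srv.Client().Do(req) // srv.Client trusts the test cert
	if err != nil {
		panic(err)
	}
	resp.Body.Close()
	return seen
}

func main() {
	fmt.Println(phases()) // connect fires first, then tls, then response
}
```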
Ports you are expected to know
| Port | Protocol | What |
|---|---|---|
| 22 | TCP | SSH |
| 53 | UDP / TCP | DNS |
| 80 | TCP | HTTP |
| 443 | TCP / UDP | HTTPS (UDP if HTTP/3) |
| 6379 | TCP | Redis |
| 5432 | TCP | Postgres |
| 9092 | TCP | Kafka |
2. Network Layer
The network (L3) layer’s job is to get a packet from a source IP to a destination IP, possibly across many routers. It doesn’t care about ports, connections, or reliability — those are the transport layer’s problem.
IPv4 vs IPv6 in one table
| | IPv4 | IPv6 |
|---|---|---|
| Address size | 32 bits | 128 bits |
| Total addresses | ~4.3 × 10⁹ (exhausted since 2011) | ~3.4 × 10³⁸ |
| Notation | 192.168.1.1 | 2001:db8::1 |
| Header | variable (20–60 B), checksummed | fixed 40 B, no checksum |
| NAT needed? | yes, universally | designed to not need it |
| Packet fragmentation | routers can fragment | only the sender; routers drop + PMTUD |
| Configuration | DHCP or static | SLAAC (stateless) + DHCPv6 |
In interviews: IPv6 adoption is slow because NAT + CGNAT let IPv4 limp along, and because dual-stack migration is politically painful. Design for both when you can, deploy behind a load balancer that terminates either.
What’s in an IPv4 header
You don’t need to memorize byte offsets, but knowing what fields exist explains a lot of real behavior — MTU issues, traceroute output, and why iptables rules reference TTL.
The two fields that come up the most:
- TTL (Time To Live) — a hop counter. Each router decrements it; when it hits 0 the packet is dropped and an ICMP “time exceeded” is sent back. `traceroute` exploits this by sending probes with `TTL=1, 2, 3…` and listening for the ICMP replies.
- Protocol — tells the receiver how to interpret the payload: 6 for TCP, 17 for UDP, 1 for ICMP.
Routing, in one paragraph
Routers maintain routing tables that map “destination prefix → next hop.” When a packet arrives, the router looks up the longest matching prefix of the destination IP and forwards to the corresponding next hop. On the internet, routing tables are built dynamically by BGP between autonomous systems. On your laptop, the table has two useful entries — your subnet (192.168.1.0/24 → direct) and everything else (0.0.0.0/0 → your gateway).
```
# See your routing table
$ ip route
default via 192.168.1.1 dev wlan0
192.168.1.0/24 dev wlan0 proto kernel scope link src 192.168.1.42

# Trace the hops to a destination
$ traceroute -n example.com
 1  192.168.1.1   1.4 ms
 2  100.64.0.1    8.2 ms   # CGNAT inside the ISP
 3  203.0.113.5   9.1 ms
 ...
```

NAT: why your home IP is a lie
Your laptop’s 192.168.1.42 is a private IP, invalid on the public internet. Your router does Network Address Translation: when you send a packet, it rewrites the source IP to the router’s public IP and remembers the translation in a table. When the reply comes back, it rewrites the destination back to 192.168.1.42 and forwards to your laptop.
flowchart LR
laptop["<b>Your laptop</b><br/><code>192.168.1.42</code><br/>src port 51000"]
router["<b>Router (NAT)</b><br/>priv 192.168.1.1<br/>pub 203.0.113.17<br/><i>keeps translation table</i>"]
target["<b>example.com</b><br/><code>93.184.216.34</code><br/>dst port 443"]
laptop -->|"outbound<br/>src 192.168.1.42:51000"| router
router -->|"rewritten<br/>src 203.0.113.17:60321"| target
target -->|"reply<br/>dst 203.0.113.17:60321"| router
router -->|"rewritten<br/>dst 192.168.1.42:51000"| laptop
classDef neutral fill:#dbeafe,stroke:#3b6fd6,stroke-width:1.5px,color:#0f172a;
classDef highlight fill:#e9d5ff,stroke:#7c3aed,stroke-width:2px,color:#4c1d95;
class laptop,target neutral
class router highlight

One public IP fronts many private hosts by multiplexing on the source port. NAT breaks when two sides both need to initiate — see WebRTC in §4c.
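A toy model of that translation table (real NATs also track protocol, timeouts, and TCP state):

```go
package main

import "fmt"

// natTable is a toy NAT: outbound flows get a fresh public source port;
// replies are mapped back through the table.
type natTable struct {
	publicIP string
	nextPort int
	flows    map[int]string // public port → private "ip:port"
}

// outbound rewrites a private source address to a public one and
// remembers the mapping.
func (n *natTable) outbound(privateAddr string) (publicAddr string) {
	n.nextPort++
	n.flows[n.nextPort] = privateAddr
	return fmt.Sprintf("%s:%d", n.publicIP, n.nextPort)
}

// inbound maps a reply's destination port back to the private host.
func (n *natTable) inbound(publicPort int) (privateAddr string, ok bool) {
	privateAddr, ok = n.flows[publicPort]
	return
}

func main() {
	nat := &natTable{publicIP: "203.0.113.17", nextPort: 60320, flows: map[int]string{}}
	pub := nat.outbound("192.168.1.42:51000")
	fmt.Println(pub) // 203.0.113.17:60321
	priv, _ := nat.inbound(60321)
	fmt.Println(priv) // 192.168.1.42:51000
}
```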
CIDR notation, quick reference
192.168.1.0/24 means the first 24 bits are the network prefix, leaving 8 bits = 256 addresses (minus 2 for network + broadcast). Memorize these edge cases:
| Prefix | Size | Common use |
|---|---|---|
| /32 | 1 address | single host |
| /24 | 256 | small subnet, home LAN |
| /16 | 65,536 | corp subnet |
| /8 | 16 M | legacy class A (10.0.0.0/8 private) |
| /0 | all | default route |
RFC 1918 private ranges (never routable on the public internet): 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16. 169.254.0.0/16 is link-local (what you get if DHCP fails). 127.0.0.0/8 is loopback.
3. Transport Layer
IP gets a packet to a host. The transport layer gets it to the right program on that host, using a 16-bit port number. It also decides whether the stream is reliable, ordered, and flow-controlled (TCP) or fire-and-forget (UDP). Everything above this layer — HTTP, gRPC, DNS, SMTP — is just a convention layered on top of one of these two.
TCP: the 3-way handshake
Before any data flows, TCP opens a connection by exchanging three segments. Each side picks a random initial sequence number (ISN) to defend against blind spoofing, and each side acks the other’s ISN + 1.
sequenceDiagram
autonumber
participant C as 🖥️ Client
participant S as 🌐 Server
Note over C: state: CLOSED
Note over S: state: LISTEN
rect rgb(219, 234, 254)
C->>S: SYN · seq=x
activate S
Note over C: SYN_SENT
Note over S: SYN_RCVD
S-->>C: SYN·ACK · seq=y, ack=x+1
C->>S: ACK · ack=y+1
deactivate S
end
Note over C,S: ✅ ESTABLISHED — 1 full RTT before the first byte of payload

Three segments, one RTT. Motivates keep-alive, connection pooling, and HTTP/2 multiplexing.
Two interview gotchas on the handshake:
- SYN flood. If the server commits memory on every received `SYN`, an attacker can exhaust its connection table with spoofed SYNs. The fix is SYN cookies — encode the connection state in the SYN-ACK’s sequence number and allocate memory only after the client’s final `ACK` echoes it back, proving it came from the real source.
- Half-open connection. If the client crashes after the 3-way handshake, the server has no idea. Keep-alive probes (TCP or application-level) exist to detect this; defaulting to “forever-idle sockets are fine” is wrong.
Reliability, built from primitives
TCP is a reliable, ordered, byte stream. It achieves that on top of an unreliable IP layer with four mechanisms layered on each other:
- Sequence numbers on every byte. The receiver reorders out-of-order segments into a contiguous stream.
- Cumulative ACKs. `ack = N` means “I have received everything up to byte N-1.”
- Retransmission on timeout. The sender keeps a running estimate of RTT. If no ACK arrives within `RTO`, resend.
- Fast retransmit. If the sender gets three duplicate ACKs (`ack = K` four times), it infers a single segment was lost and resends without waiting for the RTO.
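A toy model of the fast-retransmit rule: count duplicate ACKs and fire on the third:

```go
package main

import "fmt"

// dupAckDetector is a toy model of fast retransmit: after three duplicate
// ACKs for the same byte offset, the sender resends that segment instead
// of waiting for the retransmission timeout.
type dupAckDetector struct {
	lastAck int
	dups    int
}

// onAck returns true when the sender should fast-retransmit.
func (d *dupAckDetector) onAck(ack int) bool {
	if ack == d.lastAck {
		d.dups++
		return d.dups == 3 // 3 duplicates = 4 identical ACKs in total
	}
	d.lastAck, d.dups = ack, 0
	return false
}

func main() {
	d := &dupAckDetector{lastAck: -1}
	acks := []int{1000, 2000, 2000, 2000, 2000} // segment at byte 2000 was lost
	for i, a := range acks {
		if d.onAck(a) {
			fmt.Printf("fast retransmit segment at %d (ack #%d)\n", a, i+1)
		}
	}
}
```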
The sender does not send one segment at a time — it keeps a whole window of bytes in-flight:
flowchart LR
B1["1"]:::acked --- B2["2"]:::acked --- B3["3"]:::acked
B3 ==>|"send base"| B4
B4["4"]:::flight --- B5["5"]:::flight --- B6["6"]:::flight --- B7["7"]:::flight
B7 ==>|"next byte to send"| B8
B8["8"]:::avail --- B9["9"]:::avail
B9 ===|"window edge"| B10
B10["10"]:::blocked --- B11["11"]:::blocked
subgraph legend ["Legend"]
direction LR
L1[" acked "]:::acked
L2[" in-flight (unacked) "]:::flight
L3[" can send now "]:::avail
L4[" blocked (beyond window) "]:::blocked
end
classDef acked fill:#e5e7eb,stroke:#6b7b9a,color:#475569;
classDef flight fill:#93c5fd,stroke:#3b6fd6,stroke-width:1.5px,color:#0f172a;
classDef avail fill:#f1f5f9,stroke:#6b7b9a,color:#475569;
classDef blocked fill:#fafafa,stroke:#6b7280,stroke-dasharray:3 3,color:#6b7280;

The sender can have window = min(cwnd, rwnd) bytes in-flight without waiting for ACKs. Every ACK slides the window right; every loss shrinks it.
Flow control vs congestion control
These sound alike but answer different questions, and interviewers will test whether you know the difference.
| | Flow control | Congestion control |
|---|---|---|
| Protects | the receiver | the network |
| Lives on | the receiver advertises rwnd | the sender computes cwnd |
| Signal | receiver’s buffer space, sent back in every ACK | packet loss + RTT trends |
| Classic algorithm | simple: rwnd in header | Reno / CUBIC / BBR |
Slow start + AIMD in one line each.
- Slow start: on a new connection, `cwnd` doubles every RTT until the first loss.
- AIMD (Reno / CUBIC after slow start): Additive Increase, Multiplicative Decrease. On each ACK, `cwnd += 1/cwnd`; on loss, `cwnd /= 2`.
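A toy trace of AIMD's sawtooth, approximating `cwnd += 1/cwnd` per ACK as +1 segment per RTT:

```go
package main

import "fmt"

// aimd replays a toy congestion-window trace: additive increase of one
// segment per RTT, multiplicative decrease on loss.
func aimd(cwnd float64, events []string) float64 {
	for _, e := range events {
		switch e {
		case "rtt": // one full RTT of ACKs ≈ +1 segment
			cwnd += 1
		case "loss":
			cwnd /= 2
		}
	}
	return cwnd
}

func main() {
	// Exit slow start at cwnd=8, grow 4 RTTs, lose once, grow 2 more.
	trace := []string{"rtt", "rtt", "rtt", "rtt", "loss", "rtt", "rtt"}
	fmt.Println(aimd(8, trace)) // 8 → 12 → 6 → 8
}
```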
BBR (Google’s newer algorithm, used on YouTube + many CDNs) breaks from AIMD entirely — it models the path’s bandwidth and RTT directly and treats loss as noise rather than the primary signal. Worth mentioning when the interviewer asks about modern TCP behavior on lossy links or bufferbloated networks.
The states an application actually touches
stateDiagram-v2
direction TB
[*] --> CLOSED
CLOSED --> SYN_SENT: active open<br/>send SYN
CLOSED --> LISTEN: passive open
SYN_SENT --> ESTABLISHED: recv SYN-ACK<br/>send ACK
LISTEN --> SYN_RCVD: recv SYN<br/>send SYN-ACK
SYN_RCVD --> ESTABLISHED: recv ACK
ESTABLISHED --> FIN_WAIT_1: close()<br/>send FIN
ESTABLISHED --> CLOSE_WAIT: recv FIN<br/>send ACK
FIN_WAIT_1 --> FIN_WAIT_2: recv ACK
FIN_WAIT_2 --> TIME_WAIT: recv FIN<br/>send ACK
TIME_WAIT --> CLOSED: 2 × MSL timeout
CLOSE_WAIT --> LAST_ACK: close()<br/>send FIN
LAST_ACK --> CLOSED: recv ACK
classDef good fill:#dcfce7,stroke:#16a34a,stroke-width:2px,color:#14532d;
classDef closing fill:#fef3c7,stroke:#ca8a04,stroke-width:1.5px,color:#713f12;
classDef terminal fill:#f3f4f6,stroke:#6b7280,stroke-width:1.5px,color:#1f2937;
class ESTABLISHED good
class FIN_WAIT_1,FIN_WAIT_2,CLOSE_WAIT,LAST_ACK,TIME_WAIT closing
class CLOSED terminal

Simplified TCP state machine. Left branch = active side (initiator). Right branch = passive side (usually the server). The full spec has 11 states; these are the ones you'll reference in a debugging story.
Why TIME_WAIT matters. After an active close, the initiator stays in TIME_WAIT for 2 × MSL (typically 60 s on Linux) before the quadruple (src-ip, src-port, dst-ip, dst-port) can be reused. On a server that opens many outbound connections (a payment gateway, a scraper) this can exhaust ephemeral ports. Mitigations: enable SO_REUSEADDR / net.ipv4.tcp_tw_reuse, use connection pooling, or reduce churn.
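The back-of-envelope math, assuming Linux's default ephemeral range (32768–60999) and 60 s in TIME_WAIT:

```go
package main

import "fmt"

func main() {
	ports := 60999 - 32768 + 1 // Linux default ip_local_port_range
	const timeWaitSecs = 60    // typical 2 × MSL on Linux
	// Ceiling on new connections/sec to ONE destination ip:port before
	// the ephemeral port pool is exhausted by sockets stuck in TIME_WAIT.
	fmt.Println(ports/timeWaitSecs, "connections/sec") // 470 connections/sec
}
```

Around ~470 opens/sec per destination, a high-churn client starts failing with address-in-use errors, which is why pooling matters.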
UDP: 8 bytes and a shrug
UDP’s header is the minimum viable protocol.
What UDP doesn’t do: no handshake, no ordering, no retransmit, no flow or congestion control. What it does do: stay out of your way. That makes it the substrate for DNS (usually 1 packet round-trip, no need for a connection), video/voice (loss is fine, reordering is worse than drop), game state (“where is the tank now” is more useful than “where was it 200 ms ago”), and, since QUIC, modern HTTP itself.
TCP vs UDP, side by side
| | TCP | UDP |
|---|---|---|
| Connection | handshake before data | none |
| Header | 20 B minimum, up to 60 B | 8 B |
| Ordering | guaranteed | app’s problem |
| Reliability | guaranteed | app’s problem |
| Flow control | yes (rwnd) | no |
| Congestion control | yes (CUBIC / BBR / …) | no (app must be well-behaved) |
| Latency floor | 1 RTT + slow start | 1 one-way trip |
| Typical users | HTTP/1.1, HTTP/2, gRPC, SSH, DB | DNS, QUIC (HTTP/3), games, voice, video |
Head-of-line blocking, the reason HTTP/3 exists
TCP’s ordering guarantee has a dark side. If segment 5 is dropped, segments 6–10 that did arrive must sit in the kernel’s receive buffer until segment 5 is retransmitted and filled in. Everything above TCP — including independent HTTP/2 streams — has to wait. This is head-of-line (HOL) blocking at the transport layer.
HTTP/3 fixes this by building on QUIC (which rides on UDP) and doing its own stream-level ordering: one lost packet stalls only its stream, not all the concurrent streams on the same connection. More on this in §4a.
Decision rubric
Pick TCP when the correctness of the byte stream matters more than the 1-RTT cost and you’re OK waiting on retransmits:
- HTTP/1.1, HTTP/2, gRPC-over-HTTP/2
- SSH, SQL wire protocols, Kafka
- File transfer / replication
Pick UDP when you can tolerate loss or you need to own reordering yourself:
- DNS queries (1 packet, fits in MTU, retry at app level)
- QUIC / HTTP/3
- Real-time media (WebRTC, VoIP, game state)
- Multicast / broadcast (TCP is strictly 1-to-1)
Go snippets
TCP client with sane timeouts. The default net.Dial has no timeout and will happily hang forever.
```go
package main

import (
	"net"
	"time"
)

func openConn() (net.Conn, error) {
	d := net.Dialer{
		Timeout:   3 * time.Second,  // connect timeout
		KeepAlive: 30 * time.Second, // TCP keep-alive probes
	}
	conn, err := d.Dial("tcp", "example.com:443")
	if err != nil {
		return nil, err
	}
	// Deadlines for read/write, refreshed per operation.
	_ = conn.SetDeadline(time.Now().Add(10 * time.Second))
	return conn, nil
}
```

UDP echo server. The read loop is packet-oriented, not stream-oriented — one `ReadFromUDP` returns exactly one datagram. Framing is the app’s job.
```go
package main

import (
	"log"
	"net"
)

func main() {
	addr, _ := net.ResolveUDPAddr("udp", ":9000")
	conn, err := net.ListenUDP("udp", addr)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	buf := make([]byte, 2048) // 1 datagram at a time
	for {
		n, peer, err := conn.ReadFromUDP(buf)
		if err != nil {
			log.Printf("read: %v", err)
			continue
		}
		// No framing guarantees: 'n' bytes are one logical message.
		if _, err := conn.WriteToUDP(buf[:n], peer); err != nil {
			log.Printf("write to %s: %v", peer, err)
		}
	}
}
```

Gotchas interviewers love
- Nagle’s algorithm. TCP coalesces small writes to reduce packet overhead, which interacts badly with delayed-ACK receivers. Latency-sensitive apps (interactive SSH, game clients) set `TCP_NODELAY` to disable it.
- `TIME_WAIT` exhaustion. A high-churn outbound client (think: aggressive HTTP client with no keep-alive) can run out of ephemeral source ports. Reuse connections or bump `ip_local_port_range`.
- MTU / PMTUD blackhole. If a middlebox drops ICMP “fragmentation needed” messages, the sender never learns to shrink its packets and the connection stalls on any segment larger than the path MTU. Common cause of “works on my laptop, times out on corp VPN.”
- UDP amplification DDoS. A spoofed 50-byte DNS query can return a 3,000-byte response. Open resolvers and misconfigured NTP / memcached servers are classic reflectors. If you build a UDP service, cap the reply size and rate-limit per source.
4. Application Layer
Above the transport layer, protocols express what the conversation is about. They’re organized by purpose, not by position in a stack: HTTP is a request/response protocol, SMTP is a mail protocol, gRPC is an RPC system. This section covers the ones an interviewer will actually probe, in three groups — 4a. HTTP family (with TLS), 4b. API styles: REST vs GraphQL vs gRPC, and 4c. Real-time: SSE vs WebSocket vs WebRTC.
4a. HTTP / HTTPS / HTTP/2 / HTTP/3
HTTP is the protocol of the web — a stateless, text-based request/response protocol on TCP port 80 (or 443 with TLS). “Stateless” is the operative word: every request is self-contained from the server’s point of view. State (sessions, auth) lives in cookies, headers, or the database, not the protocol.
Anatomy of a request
```
GET /users/42?include=orders HTTP/1.1
Host: api.example.com
Authorization: Bearer eyJhbGciOi...
Accept: application/json
Accept-Encoding: gzip, br
If-None-Match: "a3f7b9"
Connection: keep-alive
```

Each line is `Header: Value`. A blank line separates headers from body. The server responds with a status line, headers, and body:
```
HTTP/1.1 200 OK
Content-Type: application/json
Content-Length: 847
ETag: "a3f7b9"
Cache-Control: private, max-age=60
Connection: keep-alive

{"id":42,"name":"Ada","orders":[...]}
```

Status codes you must know cold
| Range | Meaning | Must-know examples |
|---|---|---|
| 1xx | informational | 101 Switching Protocols (WebSocket upgrade) |
| 2xx | success | 200 OK · 201 Created · 204 No Content · 206 Partial Content |
| 3xx | redirect / cache | 301 Moved Permanently · 302 Found · 304 Not Modified · 307/308 (preserve method) |
| 4xx | client error | 400 Bad Request · 401 Unauthorized · 403 Forbidden · 404 Not Found · 409 Conflict · 422 Unprocessable · 429 Too Many Requests |
| 5xx | server error | 500 Internal · 502 Bad Gateway · 503 Unavailable · 504 Gateway Timeout |
Easy-to-mix-up pair: 401 means “I don’t know who you are” — re-auth and retry. 403 means “I know who you are, and you can’t.”
Idempotency matters
| Method | Idempotent | Safe (read-only) | Body | Cacheable |
|---|---|---|---|---|
| GET | ✅ | ✅ | no | ✅ |
| HEAD | ✅ | ✅ | no | ✅ |
| OPTIONS | ✅ | ✅ | no | — |
| PUT | ✅ | ❌ | yes | ❌ |
| DELETE | ✅ | ❌ | — | ❌ |
| POST | ❌ | ❌ | yes | rarely |
| PATCH | ❌ | ❌ | yes | ❌ |
The interview gotcha: POST is not idempotent — if the client retries a POST because it didn’t see the response, it can create the same order twice. The canonical fix is an idempotency key: a client-generated unique string the server dedupes on. Stripe, AWS, and every payment-adjacent API does this.
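A minimal sketch of server-side dedup on an idempotency key (in-memory for illustration; production uses a shared store such as Redis or a DB, with a TTL):

```go
package main

import (
	"fmt"
	"sync"
)

// store is a toy in-memory idempotency cache: Idempotency-Key → saved response.
var store sync.Map

// createOrder runs the handler only the first time a key is seen;
// retries with the same key replay the saved response without re-executing.
// (Toy version: the load-then-store pair is not atomic under races.)
func createOrder(idemKey string, handler func() string) string {
	if prev, ok := store.Load(idemKey); ok {
		return prev.(string) // duplicate request: replay, don't re-execute
	}
	resp := handler()
	store.Store(idemKey, resp)
	return resp
}

func main() {
	calls := 0
	place := func() string { calls++; return fmt.Sprintf("order-%d", calls) }

	fmt.Println(createOrder("key-abc", place)) // order-1
	fmt.Println(createOrder("key-abc", place)) // order-1 — retry deduped
	fmt.Println(calls)                         // 1
}
```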
Caching, briefly
HTTP caching is coordinated between server, browser, and intermediaries (CDN, proxies). The knobs:
- `Cache-Control: public, max-age=3600` — how long, where.
- `ETag: "..."` / `Last-Modified: ...` — identity of the current version. The client sends `If-None-Match` / `If-Modified-Since` on revalidation; the server replies `304 Not Modified` with no body.
- `Vary: Accept-Encoding, Authorization` — split cache entries by these request headers.
CDNs use the same primitives — §6 covers the edge story.
HTTPS: HTTP + TLS
HTTPS is HTTP encrypted with TLS. The TLS handshake runs once when the connection is established, producing symmetric keys for the rest of the connection.
sequenceDiagram
autonumber
participant C as 🔐 Client
participant S as 🌐 Server
rect rgb(219, 234, 254)
Note over C,S: TLS 1.3 — fresh handshake (1 RTT)
C->>+S: ClientHello<br/>(supported ciphers · key share · SNI)
S-->>-C: ServerHello · cert · Finished<br/>(picks cipher · sends its key share)
Note over C,S: 🔑 both sides derive the session key
C->>S: Finished (encrypted)
C->>+S: HTTP GET / (encrypted)
S-->>-C: HTTP 200 OK (encrypted)
end
rect rgb(220, 252, 231)
Note over C,S: TLS 1.3 resumption — 0 RTT (early data)
C->>+S: ClientHello + <b>early data</b> (encrypted with PSK)
S-->>-C: ServerHello + response (no handshake wait)
Note right of S: ⚠️ early data is <br/>replayable — don't use for<br/>state-changing requests
end

What TLS gives you — three things interviewers will ask:
- Confidentiality — AEAD ciphers (AES-GCM, ChaCha20-Poly1305) encrypt the payload.
- Integrity — the same AEAD MAC detects tampering.
- Authentication — the server’s cert, signed by a CA the client trusts, proves you’re talking to `example.com`, not an attacker.
Common interview probes:
- Why is TLS 1.3 faster than 1.2? One RTT vs two. TLS 1.2 separated key-exchange from Finished; TLS 1.3 combines them and removes obsolete ciphers.
- What is SNI? Server Name Indication — the hostname the client is asking for, sent unencrypted in ClientHello. Lets one IP host multiple certs. Encrypted Client Hello (ECH) fixes the leak.
- 0-RTT trade-off? On resumed sessions, the client can send data in its first flight. Great for latency; the data is replayable if the attacker captures and retransmits it. Don’t use 0-RTT for state-changing requests.
HTTP/2: binary, multiplexed, one connection
HTTP/1.1 opens one TCP connection per in-flight request (browsers cap ~6 per origin). HTTP/2 keeps a single connection and multiplexes many streams over it.
graph LR
subgraph H11 ["🐢 HTTP/1.1 — many TCP connections, head-of-line serial"]
direction TB
C1["TCP #1<br/>GET /index.html"]
C2["TCP #2<br/>GET /app.js"]
C3["TCP #3<br/>GET /style.css"]
C4["TCP #4<br/>GET /logo.png"]
end
subgraph H2 ["🚀 HTTP/2 — one connection, multiplexed streams"]
direction TB
T(["1 TCP + TLS"])
T --> S1["stream 1<br/>/index.html"]
T --> S2["stream 3<br/>/app.js"]
T --> S3["stream 5<br/>/style.css"]
T --> S4["stream 7<br/>/logo.png"]
end
classDef old fill:#fee2e2,stroke:#dc2626,stroke-width:1.5px,color:#7f1d1d;
classDef new fill:#dcfce7,stroke:#16a34a,stroke-width:1.5px,color:#14532d;
classDef hub fill:#dbeafe,stroke:#3b6fd6,stroke-width:2px,color:#0f172a;
class C1,C2,C3,C4 old
class S1,S2,S3,S4 new
class T hub

Key improvements over 1.1:
- Binary framing. Every message is split into `DATA` / `HEADERS` / `SETTINGS` frames. Cheap to parse, no ambiguity.
- Multiplexing. Many concurrent streams on one connection. No head-of-line blocking at the HTTP layer.
- HPACK header compression. Redundant headers (cookies, UA, `Host`) are table-indexed instead of resent. Huge win for short requests.
- Server push. Server can pre-send assets it knows the client will need. (Deprecated by most browsers — misused more often than helpful.)
- Stream priorities. Clients can weight streams; used to deliver CSS/JS before images.
The catch. HTTP/2 multiplexing lives above TCP. If one packet is lost, TCP stalls delivery of all streams until retransmission arrives — HOL blocking at the transport layer, exactly the problem we flagged in §3.
HTTP/3: HTTP over QUIC over UDP
HTTP/3 replaces TCP with QUIC, a transport protocol built on UDP that combines “TCP semantics + TLS 1.3” into one unified handshake with independent streams:
| | HTTP/1.1 | HTTP/2 | HTTP/3 |
|---|---|---|---|
| Transport | TCP | TCP | QUIC (UDP) |
| Security | optional TLS | TLS mandatory (practice) | TLS 1.3 mandatory, integrated |
| Framing | text | binary frames | binary frames |
| Multiplexing | no (multiple TCP) | yes (1 TCP) | yes (QUIC streams) |
| Connection open | TCP + TLS = 2-3 RTT | TCP + TLS = 2-3 RTT | 1 RTT (0 RTT on resumption) |
| Head-of-line blocking | yes | yes, at TCP layer | no — per-stream loss |
| Connection migration | no (IP change breaks it) | no | yes (connection ID) |
| Deployed by | Everything | ~70% of web | CDNs + big sites, growing |
Connection migration is the underrated killer feature: QUIC identifies a connection by an ID in the header, not by the 4-tuple. Your phone switches from Wi-Fi to 5G, the IP changes, TCP would reset — QUIC just keeps going.
Go http.Client: the timeouts you must set
The zero-value http.Client{} has no timeout. A single slow server can hang your entire service. Always configure:
```go
package main

import (
	"context"
	"net"
	"net/http"
	"time"
)

// prodClient is a reusable, properly-bounded HTTP client.
// One per process — it pools connections internally.
var prodClient = &http.Client{
	Timeout: 10 * time.Second, // total request budget
	Transport: &http.Transport{
		DialContext: (&net.Dialer{
			Timeout:   3 * time.Second, // TCP connect
			KeepAlive: 30 * time.Second,
		}).DialContext,
		TLSHandshakeTimeout:   3 * time.Second,
		ResponseHeaderTimeout: 5 * time.Second,
		ExpectContinueTimeout: 1 * time.Second,
		IdleConnTimeout:       90 * time.Second,
		MaxIdleConns:          100,
		MaxIdleConnsPerHost:   10, // bump for high-throughput clients
		ForceAttemptHTTP2:     true,
	},
}

func fetch(ctx context.Context, url string) (*http.Response, error) {
	// Prefer request-level context over client Timeout when the deadline
	// must propagate across service boundaries.
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Accept", "application/json")
	req.Header.Set("User-Agent", "myservice/1.0")
	return prodClient.Do(req)
}
```

Three timeouts in this snippet and why each exists:
- `Dialer.Timeout` — how long TCP connect can take. Defends against unreachable hosts.
- `TLSHandshakeTimeout` — how long TLS can take after connect.
- `ResponseHeaderTimeout` — how long the server can take to send status + headers. A slow backend blocking here looks like a hung request — this bounds it without cutting off a legitimately large streaming body.
Bonus: request-level deadlines. Prefer ctx, cancel := context.WithTimeout(parent, 800*time.Millisecond) at the call site over mutating the client — the deadline then propagates cleanly through gRPC/HTTP/database layers downstream.
Interview gotchas for §4a
- `Connection: close` vs keep-alive. HTTP/1.0 closed by default; HTTP/1.1 keeps connections alive by default. Servers that emit `Connection: close` on every response will cripple your client’s connection pool.
- Cookie scoping. `Domain=example.com` includes subdomains. `Secure` restricts to HTTPS. `HttpOnly` hides from JS. `SameSite=Lax` is the sane default to block CSRF.
- Redirect traps. `301` is cached aggressively. If you deploy `301 /old → /new` and later change your mind, clients may never retry. Use `302` or `307` during rollouts.
- `Content-Length` vs `Transfer-Encoding: chunked`. A response has exactly one. If a reverse proxy (nginx, HAProxy) buffers a chunked response to add `Content-Length`, latency on streaming endpoints goes up. Turn buffering off at the proxy for SSE / gRPC-Web.
- HTTP/2 with self-signed certs. Go’s `http.Transport` disables HTTP/2 if you set `TLSClientConfig.InsecureSkipVerify = true` without also setting `NextProtos = []string{"h2", "http/1.1"}`. Debugging headache at 2am.
4b. REST vs GraphQL vs gRPC
Three API styles, three different philosophies. The interviewer usually isn’t asking “which is best” — they want to see you reason about the trade-off for the problem at hand.
| | REST | GraphQL | gRPC |
|---|---|---|---|
| Transport | HTTP/1.1 or 2 | HTTP POST (usually /graphql) | HTTP/2 |
| Serialization | JSON (usually) | JSON | Protobuf (binary) |
| Schema | OpenAPI (optional) | SDL (required) | .proto (required) |
| Endpoint shape | many resource URLs | single endpoint | RPC methods per service |
| Who picks the fields? | server | client | server |
| Over-/under-fetching | easy to hit | solved | solved per method |
| Streaming | chunked / SSE | subscriptions (via WS) | native: server / client / bidi |
| Browser-friendly | yes | yes | no (needs gRPC-Web or Connect) |
| Tooling | curl, Postman, every lang | any GraphQL client | protoc + code generation |
| Caching | HTTP cache works out of the box | hard; client-side libs (Relay, Apollo) | none built-in; app layer |
| Best fit | public APIs, CRUD, docs-as-product | many clients, varied views on same data | internal microservices, high-throughput |
The over-fetching / under-fetching problem
The single clearest argument for GraphQL. Imagine a mobile screen that needs user name, last order ID, and unread notification count.
flowchart TB
Need(["📱 Mobile needs: <b>name</b> · <b>lastOrderId</b> · <b>unreadCount</b>"])
subgraph REST ["🔁 REST — 3 round trips"]
direction TB
R1["GET /users/42<br/><i>returns 20 fields, keeps 1</i>"]
R2["GET /users/42/orders?limit=1<br/><i>returns full Order, keeps 1 id</i>"]
R3["GET /users/42/notifications/unread-count"]
R1 --> R2 --> R3
end
subgraph GQL ["⚡ GraphQL — 1 round trip, exact shape"]
direction TB
G1["POST /graphql<br/>user(id: 42) { name, lastOrder { id }, unreadCount }"]
end
subgraph GRPC ["🛰️ gRPC — 1 round trip, server-defined shape"]
direction TB
P1["UserService.GetProfile(id=42)<br/><i>server-defined aggregate method</i>"]
end
Need --> REST
Need --> GQL
Need --> GRPC
classDef need fill:#fef3c7,stroke:#d97706,stroke-width:2px,color:#78350f;
classDef rest fill:#fee2e2,stroke:#dc2626,stroke-width:1.5px,color:#7f1d1d;
classDef gql fill:#dcfce7,stroke:#16a34a,stroke-width:1.5px,color:#14532d;
classDef grpc fill:#e0e7ff,stroke:#4f46e5,stroke-width:1.5px,color:#312e81;
class Need need
class R1,R2,R3 rest
class G1 gql
class P1 grpc

REST gets you the JSON but wastes bandwidth (`orders` contains 20 fields you don’t need) or demands many round-trips. GraphQL lets the client ask for exactly what it wants. gRPC solves it too, but by defining a server-side aggregate method — if the mobile and web teams want different shapes you end up with GetProfileForWeb and GetProfileForMobile, which is fine for a handful of clients but doesn’t scale like GraphQL does.
REST, done well
The mental model is resources (nouns) acted on by HTTP methods (verbs). URLs are hierarchical; methods are the verbs; status codes are the outcome.
```
GET    /v1/users              list
GET    /v1/users/42           read
POST   /v1/users              create
PUT    /v1/users/42           replace (idempotent)
PATCH  /v1/users/42           partial update
DELETE /v1/users/42           delete
GET    /v1/users/42/orders    sub-resource
```

Conventions that save you from bike-shed arguments:
- Cursor pagination — `?cursor=eyJpZCI6Mjc...&limit=50`. Not offset/limit — that’s O(N) on the DB.
- Filter/sort as query params — `?status=active&sort=-createdAt`.
- Versioning — keep it in the path (`/v1/`, `/v2/`). Header-based versioning is clever and will bite you during debugging.
- Errors — RFC 7807 `application/problem+json`: `{"type":"...", "title":"...", "detail":"...", "instance":"..."}`. Stop inventing shapes.
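A sketch of the opaque-cursor convention (base64 of the last-seen id; the `cursor` struct and helpers are hypothetical):

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// cursor is a hypothetical opaque pagination token: the caller sees only
// base64; the server decodes it back into "start after this id".
type cursor struct {
	ID int `json:"id"`
}

func encodeCursor(lastID int) string {
	b, _ := json.Marshal(cursor{ID: lastID})
	return base64.RawURLEncoding.EncodeToString(b)
}

func decodeCursor(s string) (int, error) {
	b, err := base64.RawURLEncoding.DecodeString(s)
	if err != nil {
		return 0, err
	}
	var c cursor
	if err := json.Unmarshal(b, &c); err != nil {
		return 0, err
	}
	// The SQL then becomes: SELECT ... WHERE id > $1 ORDER BY id LIMIT $2
	// (an indexed seek, unlike OFFSET, which scans and discards N rows).
	return c.ID, nil
}

func main() {
	tok := encodeCursor(27)
	fmt.Println(tok) // eyJpZCI6Mjd9
	id, _ := decodeCursor(tok)
	fmt.Println(id) // 27
}
```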
HATEOAS (hypermedia links in responses) is the theoretically pure form but is almost never shipped. If the interviewer asks, explain what it is and note most production APIs don’t bother.
GraphQL: one endpoint, client-shaped responses
The server publishes a typed schema; the client sends a query describing the shape it wants; the runtime walks the query and calls resolvers to fetch each field.
```graphql
type User {
  id: ID!
  name: String!
  email: String!
  orders(limit: Int = 10): [Order!]!
  unreadCount: Int!
}

type Order {
  id: ID!
  total: Money!
  items: [LineItem!]!
}

type Query {
  user(id: ID!): User
}
```

Client sends:

```graphql
query Profile($id: ID!) {
  user(id: $id) {
    name
    orders(limit: 1) {
      id
      total { amount currency }
    }
    unreadCount
  }
}
```

The famous trap — N+1 queries. The orders resolver is called once per User. If you’re listing 50 users and blindly loop, that’s 50 DB round-trips. The fix is a DataLoader — batches + caches per-request:
```go
// pseudo-Go: inside an HTTP request-scoped loader
loader := dataloader.New(func(ctx context.Context, ids []int) []*Order {
	// one SQL: SELECT ... WHERE user_id = ANY($1)
	return db.OrdersByUserIDs(ctx, ids)
})
// Each resolver call just does: loader.Load(userID) → coalesced into 1 query
```

Other GraphQL-isms to have an answer for:
- Mutations are a separate root type; they execute sequentially (not parallel like `Query` fields).
- Subscriptions push server → client; transport is usually WebSocket.
- Persisted queries — client registers queries at build time; at runtime it only sends the query ID. Saves bandwidth, forbids arbitrary queries, defuses the “malicious client writes an expensive query” attack.
- Caching is the hardest part. Apollo Client normalizes objects by `__typename + id` client-side; server-side is usually cache-miss territory unless you’re doing persisted queries + HTTP cache headers.
gRPC: typed RPCs over HTTP/2
You write a .proto, the compiler generates typed client + server stubs in every language you use.
```protobuf
syntax = "proto3";
package user.v1;

service UserService {
  rpc GetProfile(GetProfileRequest) returns (UserProfile);
  rpc WatchProfile(GetProfileRequest) returns (stream UserProfile); // server stream
  rpc ImportUsers(stream UserInput) returns (ImportSummary);        // client stream
  rpc Chat(stream ChatMessage) returns (stream ChatMessage);        // bidirectional
}

message GetProfileRequest { string user_id = 1; }

message UserProfile {
  string user_id = 1;
  string name = 2;
  string email = 3;
  int32 unread_count = 4;
}
```

Go server, unary method:
```go
type userServer struct {
	pb.UnimplementedUserServiceServer
	db *sql.DB
}

func (s *userServer) GetProfile(
	ctx context.Context,
	req *pb.GetProfileRequest,
) (*pb.UserProfile, error) {
	// context carries the client's deadline + cancellation + metadata
	var p pb.UserProfile
	err := s.db.QueryRowContext(ctx,
		`SELECT user_id, name, email, unread_count FROM users WHERE user_id=$1`,
		req.GetUserId(),
	).Scan(&p.UserId, &p.Name, &p.Email, &p.UnreadCount)
	if err == sql.ErrNoRows {
		return nil, status.Errorf(codes.NotFound, "user %s not found", req.GetUserId())
	}
	if err != nil {
		return nil, status.Errorf(codes.Internal, "db: %v", err)
	}
	return &p, nil
}
```

Go client call, with deadline:
```go
conn, err := grpc.NewClient("user-svc:50051",
	grpc.WithTransportCredentials(insecure.NewCredentials()))
if err != nil {
	return err
}
defer conn.Close()

client := pb.NewUserServiceClient(conn)
ctx, cancel := context.WithTimeout(ctx, 300*time.Millisecond)
defer cancel()

profile, err := client.GetProfile(ctx, &pb.GetProfileRequest{UserId: "42"})
```

The four streaming modes — interviewers love this picture:
sequenceDiagram
autonumber
participant C as 🖥️ Client
participant S as 🛰️ Server
rect rgb(219, 234, 254)
Note over C,S: 1️⃣ Unary — classic request / response
C->>+S: GetProfile(id=42)
S-->>-C: UserProfile
end
rect rgb(220, 252, 231)
Note over C,S: 2️⃣ Server streaming — one request, many responses
C->>+S: WatchProfile(id=42)
S-->>C: UserProfile v1
S-->>C: UserProfile v2
S-->>-C: UserProfile v3 ...
end
rect rgb(254, 243, 199)
Note over C,S: 3️⃣ Client streaming — many requests, one summary
C->>+S: ImportUsers (user_1)
C->>S: ImportUsers (user_2)
C->>S: ImportUsers (user_3)
S-->>-C: ImportSummary (count=3)
end
rect rgb(237, 214, 255)
Note over C,S: 4️⃣ Bidirectional — full-duplex, interleaved
C->>+S: Chat (hello)
S-->>C: Chat (hi)
C->>S: Chat (how are you)
S-->>-C: Chat (good)
    end

Why gRPC wins for internal microservices:
- Protobuf is small (1.5-5× smaller than JSON on the wire) and fast to marshal.
- HTTP/2 multiplexing + long-lived connections → low latency + good head-of-line story within a service mesh.
- `context.Context` — deadlines and cancellations propagate across service hops out of the box.
- Status codes are a closed set (`codes.NotFound`, `codes.DeadlineExceeded`, …), not free-form strings.
Where gRPC hurts:
- Browsers can’t speak gRPC natively (it relies on HTTP/2 trailers, which browser fetch APIs don’t expose). Solutions: gRPC-Web (proxy translates) or Connect (gRPC-compatible, works over HTTP/1.1 too).
- Bigger learning curve — protoc toolchain, codegen in every language, backward-compat discipline (`reserved`, field numbers never reused).
- Observability is trickier than REST (no URL pattern in logs; need OTel/tracing from day one).
Decision rubric
Pick REST when:
- The API is public or consumed by many external devs.
- Humans browse it (Postman, curl, docs).
- You want the HTTP cache to do real work (CDN, browser).
- CRUD on resources is most of what you’re doing.
Pick GraphQL when:
- Many clients (web, iOS, Android) need different slices of the same underlying data graph.
- Backends-for-frontends would otherwise proliferate.
- The team can invest in schema review, DataLoader discipline, and persisted-query infra.
Pick gRPC when:
- Internal service-to-service traffic, especially polyglot teams (Go + Python + Java).
- Strong typing across languages is worth the toolchain cost.
- Streaming or low-latency RPC is a first-class need.
- You have a service mesh, tracing, and observability to absorb the operational tax.
Interview gotchas for §4b
- REST isn’t an RFC. There’s no committee defining “correct REST.” Different teams mean different things. Lead with your own definition.
- GraphQL security surface. A single expressive query can be a DDoS primitive — one field can resolve into 10,000 DB reads. Production deployments need query depth limits, query cost analysis, persisted queries, and rate-limiting by user, not by endpoint.
- gRPC deadline inheritance. If a `GetProfile` handler calls three downstream services and just passes along the same context, the slowest of the three sees the full 300 ms. Budget it out: subtract expected work from each leg (or at least be intentional about it).
- Version drift in protobuf. Once a `.proto` is deployed, you cannot reuse a field number. `reserved 7, 9;` prevents someone from re-assigning later. Forgetting this breaks wire compatibility silently.
- “Why not just JSON-RPC?” — a valid interview probe. JSON-RPC is lighter-weight than gRPC but lacks streaming, codegen, and HTTP/2 flow-control. Fine for a small internal tool, not for a service mesh.
4c. SSE vs WebSocket vs WebRTC
Three ways to do “real-time.” The interviewer wants to see you pick the simplest one that solves the problem — not the coolest.
Start with the decision tree
flowchart TD
Q(["Do you need <b>real-time</b> updates?"])
D1{"Direction?"}
D2{"Payload type?"}
D3{"Latency bound?"}
POLL(["Long-polling / plain HTTP is fine ✅"])
SSE(["<b>SSE</b><br/>server → client<br/>text only, auto-reconnect"])
WS(["<b>WebSocket</b><br/>full-duplex<br/>text or binary"])
RTC(["<b>WebRTC</b><br/>peer-to-peer<br/>sub-100ms media + data"])
Q --> |no, updates can wait seconds| POLL
Q --> |yes| D1
D1 --> |server → client only| SSE
D1 --> |both directions| D2
D2 --> |text / structured| WS
D2 --> |audio / video / low-latency data| D3
D3 --> |p2p · every ms matters| RTC
D3 --> |OK with a server relay| WS
classDef question fill:#fef3c7,stroke:#d97706,stroke-width:2px,color:#78350f;
classDef answer fill:#dcfce7,stroke:#16a34a,stroke-width:1.5px,color:#14532d;
classDef fallback fill:#f3f4f6,stroke:#6b7280,stroke-width:1.5px,color:#1f2937;
class Q,D1,D2,D3 question
class SSE,WS,RTC answer
    class POLL fallback

One-line rule of thumb: SSE for feeds, WebSocket for chat, WebRTC for media.
SSE — the boring answer that’s often right
Server-Sent Events is just an HTTP response that never ends. The server holds the connection open and writes data: ...\n\n chunks whenever it has something to say. The browser has a built-in EventSource API that does reconnect + last-event-id for you.
sequenceDiagram
autonumber
participant C as 🖥️ Browser
participant S as 🌐 Server
rect rgb(219, 234, 254)
Note over C,S: ① Open the stream — one HTTP request, never-ending response
C->>+S: GET /stream · Accept: text/event-stream
S-->>C: 200 OK · Content-Type: text/event-stream<br/>Cache-Control: no-store · Connection: keep-alive
end
rect rgb(220, 252, 231)
Note over C,S: ② Server pushes events whenever it wants
S-->>C: event: price · data: {BTC: 67123}
S-->>C: event: price · data: {BTC: 67140}
S-->>C: 💓 : heartbeat (comment line keeps proxies awake)
S-->>-C: event: price · data: {BTC: 67089}
end
rect rgb(254, 226, 226)
Note right of C: ⚠️ connection drops
end
rect rgb(254, 243, 199)
Note over C,S: ③ EventSource auto-reconnects with Last-Event-ID
C->>S: GET /stream · Last-Event-ID: 42
    end

Why it’s underrated:
- It’s just HTTP — CDNs, proxies, auth cookies, browser DevTools, all already work.
- `EventSource` handles reconnect + backoff automatically.
- `id:` + `Last-Event-ID` gives you exactly-once replay for free if the server indexes by ID.
Where it bites:
- Unidirectional — for an upstream “ack,” use a second `POST` endpoint. Awkward if you need true bi-di.
- Text only — send JSON, not binary. Encode binary if you must.
- HTTP/1.1 6-connection limit per origin. If you open SSE on 7 tabs of your app, the 7th hangs. Fix: use HTTP/2 (same-origin streams are multiplexed).
- Proxy buffering. Nginx / CDNs love to buffer responses. Disable per-route (`proxy_buffering off;`, `X-Accel-Buffering: no`) — otherwise clients see nothing until the server flushes 8 KB.
Go handler:
```go
func priceStream(w http.ResponseWriter, r *http.Request) {
	// Headers the SSE spec requires.
	w.Header().Set("Content-Type", "text/event-stream")
	w.Header().Set("Cache-Control", "no-store")
	w.Header().Set("Connection", "keep-alive")
	// Defeat proxy buffering — crucial for nginx, Cloudflare.
	w.Header().Set("X-Accel-Buffering", "no")

	// The flusher lets us force chunks out immediately.
	flusher, ok := w.(http.Flusher)
	if !ok {
		http.Error(w, "streaming unsupported", http.StatusInternalServerError)
		return
	}

	// Heartbeat every 15s so intermediary idle timeouts don't close us.
	heartbeat := time.NewTicker(15 * time.Second)
	defer heartbeat.Stop()

	updates := subscribePrices(r.Context()) // chan PriceTick

	var id int
	for {
		select {
		case <-r.Context().Done():
			return // client disconnected
		case <-heartbeat.C:
			fmt.Fprint(w, ": ping\n\n") // comment line = keep-alive
			flusher.Flush()
		case tick, ok := <-updates:
			if !ok {
				return
			}
			id++
			// 'id:' makes it resumable via Last-Event-ID on reconnect.
			fmt.Fprintf(w, "id: %d\nevent: price\ndata: %s\n\n", id, tick.JSON())
			flusher.Flush()
		}
	}
}
```

WebSocket — when you need full-duplex
WebSocket upgrades an HTTP connection into a long-lived, full-duplex, message-framed TCP stream. After the upgrade, the two sides exchange binary or text frames in either direction with almost no per-message overhead.
sequenceDiagram
autonumber
participant C as 🖥️ Client
participant S as 🌐 Server
rect rgb(219, 234, 254)
Note over C,S: ① HTTP upgrade — the only HTTP round-trip in a WS session
C->>+S: GET /ws HTTP/1.1<br/>Upgrade: websocket<br/>Sec-WebSocket-Key: x3JJ…<br/>Sec-WebSocket-Version: 13
S-->>-C: 🎉 101 Switching Protocols<br/>Upgrade: websocket<br/>Sec-WebSocket-Accept: hash(key + GUID)
end
rect rgb(220, 252, 231)
Note over C,S: ② Full-duplex frames — either side sends any time
C->>S: TEXT · {action: subscribe, room: 42}
S-->>C: TEXT · {msg: welcome}
S-->>C: 🔢 BINARY · 0xCAFE…
end
rect rgb(254, 243, 199)
Note over C,S: ③ Keep-alive control frames
C->>S: PING
S-->>C: PONG
end
rect rgb(254, 226, 226)
Note over C,S: ④ Graceful close
C->>S: CLOSE (1000 normal)
S-->>C: CLOSE (ack)
    end

Highlights:
- `101 Switching Protocols` is the magic status code — the server accepts the upgrade, TCP stays open, the protocol changes under it.
- Client frames are masked (XOR with a per-frame key) to defeat cache-poisoning attacks on legacy proxies. Server frames are not.
- TEXT frames must be valid UTF-8; BINARY frames needn’t be. Use BINARY for protobuf, msgpack, images.
- PING/PONG frames are control-plane only. Use them; without keepalive, NAT timers + proxy idle timers (~60-120 s) will close the socket silently.
Go, with gorilla/ws (still the de-facto library even after its archival; gobwas/ws is the zero-alloc alternative):
```go
var upgrader = websocket.Upgrader{
	ReadBufferSize:  1024,
	WriteBufferSize: 1024,
	CheckOrigin: func(r *http.Request) bool {
		// Enforce same-origin. WebSocket doesn't respect CORS —
		// you enforce origin policy yourself here.
		return r.Header.Get("Origin") == "https://app.example.com"
	},
}

func wsHandler(w http.ResponseWriter, r *http.Request) {
	conn, err := upgrader.Upgrade(w, r, nil)
	if err != nil {
		return // Upgrade already wrote the error response.
	}
	defer conn.Close()

	// Ping/pong: send pings every 30s, expect pongs within 60s.
	conn.SetReadDeadline(time.Now().Add(60 * time.Second))
	conn.SetPongHandler(func(string) error {
		conn.SetReadDeadline(time.Now().Add(60 * time.Second))
		return nil
	})

	go func() {
		t := time.NewTicker(30 * time.Second)
		defer t.Stop()
		for range t.C {
			if err := conn.WriteControl(
				websocket.PingMessage, nil, time.Now().Add(5*time.Second),
			); err != nil {
				return
			}
		}
	}()

	for {
		msgType, data, err := conn.ReadMessage()
		if err != nil {
			return // client gone or deadline exceeded
		}
		// Echo server — replace with real routing.
		if err := conn.WriteMessage(msgType, data); err != nil {
			return
		}
	}
}
```

Gotchas:
- No built-in auth. The upgrade request is HTTP, so do auth there (cookie / `Authorization` header). Some browsers strip the `Authorization` header on WS upgrades — use a cookie or a token-in-URL.
- `CheckOrigin` is opt-in. Forget it and you’ve just built CSRF-over-WebSocket.
- No request/response semantics. You send frames and hope. Implement a correlation ID in your JSON envelope if you need req/resp on top.
- Horizontal scaling. WS sockets are sticky to one pod. When a pod dies, users reconnect — and may land on a pod with no state. Fan out via Redis pub/sub or a broker (NATS, Redis Streams, Kafka).
WebRTC — peer-to-peer, media-grade
WebRTC lets two browsers talk directly (peer-to-peer), bypassing your server for the heavy media streams. You still need a server — the signaling server — to help them find each other and exchange connection info.
sequenceDiagram
autonumber
participant A as 👤 Peer A
participant SIG as 📡 Signaling<br/>(your WS server)
participant ST as 🛰️ STUN
participant TN as 🔁 TURN
participant B as 👤 Peer B
rect rgb(219, 234, 254)
Note over A,B: ① Signal (through your server — usually WebSocket)
A->>+SIG: offer (SDP)
SIG->>+B: offer (SDP)
B->>-SIG: answer (SDP)
SIG->>-A: answer (SDP)
end
rect rgb(254, 243, 199)
Note over A,B: ② ICE — discover reachable addresses
A->>+ST: "what's my public IP?"
ST-->>-A: 203.0.113.1:51000
A->>SIG: ICE candidate (host + reflexive)
SIG->>B: (forwarded)
B->>SIG: ICE candidate (its own)
SIG->>A: (forwarded)
end
rect rgb(220, 252, 231)
Note over A,B: ③ Connectivity check — try direct first
A-->>B: STUN binding
B-->>A: STUN binding reply
Note over A,B: ✅ direct path works → done
end
rect rgb(254, 226, 226)
Note over A,B: ④ Fallback: symmetric NAT blocks direct → TURN relay
A-->>TN: relay allocate
TN-->>B: relayed packet
end
rect rgb(233, 213, 255)
Note over A,B: ⑤ Media / data flow — end-to-end encrypted
A-->>B: 🎥 video · 🎤 audio · 📂 data channel<br/>(DTLS-SRTP or SCTP over DTLS, all over UDP)
B-->>A: 🎥 video · 🎤 audio · 📂 data channel
    end

Four things you must know to pass a WebRTC interview:
- Signaling is your problem. WebRTC doesn’t dictate how the SDP offers/answers get across. People use WebSocket, SSE, long-poll, whatever. Pick one from the earlier part of this section.
- NAT traversal is why this is hard. Most peers are behind NAT (§2). STUN tells a peer its public IP. TURN relays traffic when STUN can’t produce a workable path (symmetric NAT, strict firewalls). Budget for ~10-20 % of calls needing TURN — and TURN bandwidth is your bill.
- ICE (Interactive Connectivity Establishment) is the algorithm that collects and prioritizes candidate addresses (host, server-reflexive, relayed), pings each pair, and picks the best one that works.
- Two flavors of traffic: media streams use DTLS-SRTP (encrypted RTP over UDP). Non-media data uses the DataChannel API, which is SCTP over DTLS over UDP. Both end-to-end encrypted.
When to reach for it (and when not to):
- ✅ Video/audio calls, screen share, live “remote desktop.”
- ✅ Sub-50 ms data — multiplayer games, collaborative tools where every ms shows.
- ❌ Chat. A WebSocket through your server is simpler, cheaper, and easier to moderate.
- ❌ Anything you need to log / record server-side. P2P means you’re not in the path.
Comparison, side by side
| | SSE | WebSocket | WebRTC |
|---|---|---|---|
| Direction | server → client | bi-directional | p2p (both) |
| Transport | HTTP text stream | TCP (post upgrade) | UDP (SCTP / SRTP) |
| Auto-reconnect | ✅ built-in | ❌ DIY | ❌ renegotiate |
| Binary | ❌ (text only) | ✅ | ✅ |
| Auth | cookies / headers | same as HTTP | out of band (signaling) |
| Works through strict proxies | ✅ (it’s HTTP) | mostly | often needs TURN |
| Infra complexity | lowest | medium | highest |
| Sample use cases | stock tickers, log tails, AI token stream | chat, dashboards, collaborative cursors | Zoom, Meet, Discord voice |
Interview gotchas for §4c
- “Why not long-poll?” — a classic warm-up. Long-polling works but each update is a new TCP + TLS handshake. For a dozen updates/sec, SSE/WS are dramatically cheaper.
- Scale the fan-out, not the socket. For 1 M concurrent WebSocket users, the limit isn’t TCP — it’s how fast you can broadcast a message to 1 M sockets. Keep a per-room subscriber index, fan-out via Redis pub/sub or Kafka, pin users to regions.
- AI token streaming. Both SSE and WebSocket work. Most LLM APIs (OpenAI, Anthropic) ship SSE — it’s simpler, and the stream is strictly server → client.
- `wss://` is mandatory in production. Mobile carriers and corporate proxies routinely strip or block plain `ws://`.
- WebRTC without a TURN budget is a demo. Your team-coffee prototype works in the office because everyone’s on the same NAT. Real users need TURN, and TURN bandwidth costs real money.
5. Load Balancing
A load balancer is the thing that lets you say “run N copies of my service” instead of “run my service.” It does three jobs at once:
- Horizontal scaling — spread load across replicas.
- Availability — health-check backends, take dead ones out of rotation.
- Deployment flexibility — gate traffic into new versions for blue/green, canary, rolling.
Every system-design interview touches one of these three.
Client-side vs dedicated load balancing
Two fundamentally different architectures, with very different failure modes.
flowchart LR
subgraph CLIENT ["🧭 Client-side LB"]
direction TB
C1["Client<br/>(with registry cache)"]
REG[("Service<br/>registry<br/>(Consul, etcd,<br/>k8s EndpointSlice)")]
B1["Backend A"]
B2["Backend B"]
B3["Backend C"]
C1 -.refresh.-> REG
C1 -->|picks a peer| B1
C1 --> B2
C1 --> B3
end
subgraph DED ["🏗️ Dedicated LB"]
direction TB
C2[Client] --> VIP["Load balancer<br/>(nginx · Envoy · ALB)"]
VIP --> D1[Backend A]
VIP --> D2[Backend B]
VIP --> D3[Backend C]
HC[["health<br/>checks"]] -.active.-> D1
HC -.-> D2
HC -.-> D3
VIP --- HC
end
classDef client fill:#fef3c7,stroke:#d97706,stroke-width:1.5px,color:#78350f;
classDef ded fill:#dbeafe,stroke:#3b6fd6,stroke-width:1.5px,color:#0f172a;
classDef backend fill:#dcfce7,stroke:#16a34a,stroke-width:1.5px,color:#14532d;
classDef infra fill:#e9d5ff,stroke:#7c3aed,stroke-width:1.5px,color:#4c1d95;
class C1,C2 client
class VIP,HC ded
class B1,B2,B3,D1,D2,D3 backend
class REG infra
| | Client-side | Dedicated |
|---|---|---|
| Who picks the backend? | the client library | a box in the middle |
| Extra network hop? | no | yes |
| Failure blast radius | one client affected | LB down = everything affected |
| Health-check work | every client | centralized at the LB |
| Best fit | internal RPC (gRPC, Finagle, mesh sidecars) | HTTP from unknown clients (browsers, mobile) |
| Typical infra | Consul / etcd + client library | ALB / NLB / nginx / Envoy / k8s Service |
Kubernetes Service objects are quietly a distributed dedicated LB: each node’s kube-proxy programs iptables/IPVS rules to DNAT traffic to a healthy pod, so there is no single chokepoint box. That’s a nice hybrid — client code talks to a single VIP, but the data plane is per-node.
L4 vs L7 — the axis you must know cold
| | L4 (transport) | L7 (application) |
|---|---|---|
| Inspects | IP + port + TCP flags | HTTP method, path, headers, cookies, gRPC metadata |
| Routing rules | "all :443 → pool X" | "/api/v2/* → pool A, cookie beta=1 → pool B" |
| TLS | passthrough (SNI-only) | terminate + re-inspect |
| CPU cost | low | high (parse each request) |
| Connection reuse | transparent | LB owns the pool to backends |
| Typical products | AWS NLB, HAProxy mode tcp, IPVS | ALB, nginx, Envoy, Traefik, Kong |
Rule of thumb: if you need header-based routing, canary by cookie, path rewrites, or per-route rate limits → L7. If you need raw throughput for arbitrary TCP (Redis, Postgres replicas, WebSocket pass-through) → L4.
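To make the rule of thumb concrete, here is roughly what each looks like in nginx — illustrative fragments only (addresses invented; the `server`/`stream` blocks live in different contexts of the real config):

```nginx
# L7: parse HTTP, route by path and cookie (inside the http {} context)
upstream pool_a { server 10.0.0.1:8080; }
upstream pool_b { server 10.0.0.2:8080; }

server {
    listen 443 ssl;
    location /api/v2/ { proxy_pass http://pool_a; }
    location / {
        # canary by cookie
        if ($cookie_beta = "1") { proxy_pass http://pool_b; }
        proxy_pass http://pool_a;
    }
}

# L4: no HTTP parsing, raw TCP passthrough (the stream {} context)
stream {
    upstream pg_replicas { server 10.0.1.1:5432; server 10.0.1.2:5432; }
    server {
        listen 5432;
        proxy_pass pg_replicas;
    }
}
```

Note the L4 block never sees a URL or header — which is exactly why it’s cheap, and exactly why it can’t do canary-by-cookie.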
Balancing algorithms
Assume 4 backends. Each algorithm decides where request N+1 goes.
flowchart LR
R[["Incoming requests<br/>1 · 2 · 3 · 4 · 5 · 6 · 7 · 8 · 9 · 10"]]
subgraph RR ["🔁 Round-robin"]
direction TB
A1["A: 1,5,9"]
A2["B: 2,6,10"]
A3["C: 3,7"]
A4["D: 4,8"]
end
subgraph WRR ["⚖️ Weighted (A=3 · B=1 · C=1 · D=1)"]
direction TB
B1["A: 1,2,3,7,8,9"]
B2["B: 4,10"]
B3["C: 5"]
B4["D: 6"]
end
subgraph LC ["📊 Least-connections"]
direction TB
C1["A: 3 open"]
C2["B: 4 open"]
C3["C: <b>1 open ← next req</b>"]
C4["D: 2 open"]
end
subgraph P2C ["🎲 Power of two choices"]
direction TB
D1["pick 2 random:<br/>C (1 open) vs A (3 open)"]
D2["send to C"]
end
R --> RR
R --> WRR
R --> LC
R --> P2C
classDef hub fill:#fef3c7,stroke:#d97706,stroke-width:2px,color:#78350f;
classDef ok fill:#dcfce7,stroke:#16a34a,stroke-width:1.5px,color:#14532d;
classDef hot fill:#fee2e2,stroke:#dc2626,stroke-width:1.5px,color:#7f1d1d;
classDef best fill:#dbeafe,stroke:#3b6fd6,stroke-width:2px,color:#0f172a;
class R hub
class A1,A2,A3,A4,B1,B2,B3,B4 ok
class C1,C2,C4,D1 ok
    class C3,D2 best

Quick take on each:
- Round-robin — zero state; pathological when backends have different capacity or when request cost varies.
- Weighted round-robin — tell the LB “backend A is 3× the size,” traffic splits 3:1:1:1. Typical during canary ramp.
- Least connections — usually the right default for long-lived HTTP and request-response with long tail. Requires the LB to count in-flight.
- Power of two choices (P2C) — pick two backends at random, send to the one with fewer active requests. Surprisingly close to optimal, no global state needed. Used by Envoy, Finagle, NGINX Plus.
- IP hash / session affinity (sticky sessions) — hashes source IP (or a cookie) to a backend. Use only when the app has in-memory session state; prefer refactoring the state out.
Consistent hashing, for cache locality and stateful services
When each backend holds different data (a sharded cache, a stateful computation, a per-user queue) you can’t send requests to just anyone — the answer is only on one specific node. Naive `hash(key) % N` remaps almost every key when N changes. Consistent hashing shifts only ~1/N.
flowchart LR
subgraph RING ["Hash ring (mod 2³²)"]
direction TB
A["🟩 Backend A<br/>vnodes @ 0, 120, 250"]
B["🟦 Backend B<br/>vnodes @ 60, 180, 310"]
C["🟧 Backend C<br/>vnodes @ 90, 200, 340"]
end
K1["key 'user:42' → hash 85"] -->|"nearest clockwise = 90"| C
K2["key 'session:abc' → hash 155"] -->|"180"| B
K3["key 'order:7' → hash 220"] -->|"250"| A
classDef key fill:#fef3c7,stroke:#d97706,stroke-width:1.5px,color:#78350f;
classDef a fill:#dcfce7,stroke:#16a34a,stroke-width:1.5px,color:#14532d;
classDef b fill:#dbeafe,stroke:#3b6fd6,stroke-width:1.5px,color:#0f172a;
classDef c fill:#fed7aa,stroke:#ea580c,stroke-width:1.5px,color:#7c2d12;
class K1,K2,K3 key
class A a
class B b
    class C c

Vnodes (virtual nodes) are the trick that makes the distribution uniform. Each physical backend claims ~150 random positions on the ring; a new backend pulls roughly the right slice of keys off its neighbors instead of inheriting whichever contiguous arc happened to be next to its single position. Redis Cluster, DynamoDB, Cassandra, memcached clients — all consistent-hash with vnodes under the hood.
Health checks — active and passive, together
Two complementary signals. In production you want both.
- Active — the LB probes `GET /healthz` every 2–5 s. Take a backend out of rotation after `N` consecutive misses. Cheap, predictable, but adds background load.
- Passive — the LB observes live traffic: 5xx responses, connection resets, timeouts. If a backend’s error rate spikes above a threshold, eject it for a cooldown (Envoy’s outlier detection).
The liveness vs readiness distinction matters for Kubernetes:

- `/livez` (liveness) — “am I alive?” If this fails, k8s restarts the container. Keep it trivial; don’t check the DB here (otherwise one slow DB takes out every pod).
- `/readyz` (readiness) — “am I ready to serve traffic?” If this fails, k8s takes you out of the Service’s endpoint list but does not restart you. Check dependencies here (DB, caches, downstream auth service).
Slow-start / warm-up. When a new backend comes up, don’t immediately send it 25 % of traffic — its connection pool is cold, the JVM’s JIT hasn’t C2-compiled the hot paths, Postgres connection handshakes haven’t been amortized yet. Envoy has `slow_start_config`; nginx has `slow_start=30s` in `upstream`. Without it, the first pod in a rolling deploy absorbs a latency spike every time.
TLS termination: where does the crypto happen?
flowchart LR
CL[Client] -->|"HTTPS"| LB
subgraph TERM_LB ["① Terminate at LB"]
LB1["LB<br/>(holds cert + key)"]
LB1 -->|"HTTP<br/><i>or</i> re-encrypted HTTPS"| B1[Backend]
end
subgraph TERM_BACK ["② Passthrough to backend"]
LB2["LB<br/>(L4, SNI-only)"]
LB2 -->|"HTTPS passthrough"| B2["Backend<br/>(holds cert + key)"]
end
subgraph TERM_MESH ["③ Mesh sidecar (mTLS)"]
LB3[LB] -->|"HTTPS"| SC1[Envoy sidecar]
SC1 -->|"localhost plaintext"| B3[Backend]
SC1 -.mTLS to peers.- SC2[other sidecars]
end
classDef lb fill:#dbeafe,stroke:#3b6fd6,stroke-width:1.5px,color:#0f172a;
classDef backend fill:#dcfce7,stroke:#16a34a,stroke-width:1.5px,color:#14532d;
classDef mesh fill:#e9d5ff,stroke:#7c3aed,stroke-width:1.5px,color:#4c1d95;
class LB1,LB2,LB3 lb
class B1,B2,B3 backend
class SC1,SC2 mesh
| | Terminate at LB | Passthrough | Mesh sidecar |
|---|---|---|---|
| Cert lives on | LB | every backend | sidecar + LB |
| L7 routing? | ✅ | ❌ (L4 only) | ✅ |
| End-to-end encrypted? | ❌ unless LB→backend re-encrypts | ✅ | ✅ (mTLS) |
| Crypto CPU cost | centralized on LB | spread to backends | on sidecars |
| Typical use | public web app | pinned certs, compliance | zero-trust service mesh |
Global / geo load balancing
The LB box above is regional. Routing users to the closest healthy region happens one level up:
- DNS-level — the authoritative DNS server returns different `A` records based on the resolver’s location (AWS Route 53 latency-based routing, GeoDNS). TTL is the enemy: a dead region bleeds traffic until TTLs expire everywhere. Keep global-health TTLs low (30-60 s).
- Anycast — the same IP is announced from multiple BGP points; routers pick the topologically nearest. CDNs and DNS root servers use anycast. Failover is sub-second because it’s a routing update, not a DNS refresh.
- App-level — the app decides, possibly overriding DNS. E.g. the web app pins users to their home shard after login.
§6 goes deeper into CDN / regional.
Here’s the minimum viable production stack — one region, the seven boxes an interviewer expects you to name:
flowchart LR
user@{ shape: circle, label: "User" }
cdn@{ shape: cloud, label: "CDN" }
subgraph region ["Region (one of many)"]
direction TB
lb@{ shape: stadium, label: "LB (nginx)" }
app@{ shape: rounded, label: "API (Go)" }
cache@{ shape: cyl, label: "Redis" }
db@{ shape: cyl, label: "Postgres" }
end
kafka@{ shape: stadium, label: "Kafka (cross-region)" }
user -->|"HTTPS"| cdn
cdn -->|"miss → origin"| lb
lb -->|"route"| app
app -->|"read/write cache"| cache
app -->|"sync write"| db
db -->|"CDC events"| kafka
classDef neutral fill:#dbeafe,stroke:#3b6fd6,stroke-width:1.5px,color:#0f172a;
classDef storage fill:#fed7aa,stroke:#ea580c,stroke-width:1.5px,color:#7c2d12;
classDef highlight fill:#e9d5ff,stroke:#7c3aed,stroke-width:2px,color:#4c1d95;
class user,cdn,app neutral
class cache,db,kafka storage
    class lb highlight

Seven boxes, one focal point (the LB), one flow from left to right. Replicate the region group N times for multi-region; the Kafka bus is the only thing that actually crosses the boundary.
Deployment patterns the LB enables
- Blue/green — keep blue running, deploy green alongside, flip all traffic in one LB config change. Roll back = flip back.
- Canary — weighted routing: 1 % → 10 % → 50 % → 100 % to the new version, watching error rate + latency at each step.
- Rolling — replace pods N at a time; LB takes the restarting pod out of rotation via readiness.
All three require the LB to separate “in rotation” from “running,” which is exactly what readiness probes + weighted pools give you.
Go — a minimal round-robin L7 reverse proxy
Illustrative, not production:
```go
package main

import (
	"net/http"
	"net/http/httputil"
	"net/url"
	"sync/atomic"
	"time"
)

type Backend struct {
	URL     *url.URL
	Healthy atomic.Bool
	Proxy   *httputil.ReverseProxy
}

type Pool struct {
	backends []*Backend
	idx      atomic.Uint64 // for round-robin
}

func NewPool(urls []string) *Pool {
	p := &Pool{}
	for _, raw := range urls {
		u, _ := url.Parse(raw)
		b := &Backend{URL: u}
		b.Proxy = httputil.NewSingleHostReverseProxy(u)
		// Mark backend unhealthy on transport errors.
		b.Proxy.ErrorHandler = func(w http.ResponseWriter, r *http.Request, err error) {
			b.Healthy.Store(false)
			http.Error(w, "bad gateway", http.StatusBadGateway)
		}
		b.Healthy.Store(true)
		p.backends = append(p.backends, b)
	}
	return p
}

func (p *Pool) NextHealthy() *Backend {
	n := uint64(len(p.backends))
	for i := uint64(0); i < n; i++ {
		b := p.backends[p.idx.Add(1)%n]
		if b.Healthy.Load() {
			return b
		}
	}
	return nil
}

func (p *Pool) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	b := p.NextHealthy()
	if b == nil {
		http.Error(w, "no healthy backend", http.StatusServiceUnavailable)
		return
	}
	b.Proxy.ServeHTTP(w, r)
}

// Active health check: every 5s, GET /healthz on each backend.
func (p *Pool) HealthLoop() {
	client := &http.Client{Timeout: 1 * time.Second}
	t := time.NewTicker(5 * time.Second)
	for range t.C {
		for _, b := range p.backends {
			resp, err := client.Get(b.URL.String() + "/healthz")
			b.Healthy.Store(err == nil && resp != nil && resp.StatusCode == 200)
			if resp != nil {
				resp.Body.Close()
			}
		}
	}
}
```

Shortcuts this deliberately takes: no retry on 5xx, no P2C, no TLS to backends, no hot-reload of the pool. Production LBs (Envoy, nginx) do all of those plus connection pooling, HTTP/2 multiplexing, circuit breaking, and metrics — the reason “just write your own” is almost always the wrong answer.
Interview gotchas for §5
- Thundering herd after an LB event. When the LB restarts or many backends come up in a rolling deploy, naive round-robin sends one big wave to the newest pod. Slow-start mode (Envoy, NGINX Plus) ramps weight up over `N` seconds. Ask about it.
- Session affinity is a scaling debt. It breaks the symmetry that lets you kill any pod without user impact. If you must, key affinity on `user_id` (cookie), not source IP — phones roam between Wi-Fi and LTE.
- L7 LBs are CPU-bound on TLS, not bandwidth. Plan capacity on handshakes-per-second, not Gbps. Session resumption + OCSP stapling + HTTP/2 keepalive ease the bill.
- `keepalive_timeout` mismatches → 502 storms. If the LB’s idle timeout is longer than the backend’s, the LB will send new requests on sockets the backend is about to close. Always keep LB idle ≤ backend idle − a couple of seconds.
- DNS TTL vs failover. A 300 s TTL means a dead region bleeds traffic for 300 s worldwide. Lower the TTL before you need to fail over (not during), or put anycast in front of the name.
- Don’t put an L7 LB in front of another L7 LB unless you love chasing ghost 502s. One hop that parses HTTP is enough; the second hop just adds surface area for header drift, keepalive mismatch, and H1↔H2 translation bugs.
6. Deep Dives — CDN, regional, resilience
Up to this point every protocol assumed the network cooperates. It doesn’t. Packets drop, regions fail, downstream services slow to a crawl, and your latency budget is shorter than any single hop’s tail. This section is the kit of patterns you reach for to keep a system standing when things go sideways.
The latency / availability math interviewers expect
Two numbers you should be able to derive on the whiteboard:
| SLA | Budget per year | Per month | Per week |
|---|---|---|---|
| 99.0% | 3.65 days | 7.2 h | 1.68 h |
| 99.9% (three 9s) | 8.76 h | 43.8 min | 10.1 min |
| 99.99% (four 9s) | 52.6 min | 4.4 min | 60.5 s |
| 99.999% (five 9s) | 5.26 min | 26.3 s | 6 s |
Composition rule. If you depend on N downstream services each at 99.9%, your availability is 0.999^N. Ten dependencies ≈ 99%. That’s why resilience patterns exist — they recover availability from components that individually aren’t good enough.
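The composition rule is worth verifying once yourself before the whiteboard; a throwaway Go check (`composedAvailability` is a name invented here):

```go
package main

import (
	"fmt"
	"math"
)

// composedAvailability returns the availability of a serial chain of
// n dependencies, each with the given per-service availability.
func composedAvailability(per float64, n int) float64 {
	return math.Pow(per, float64(n))
}

func main() {
	// Ten serial dependencies at three nines each.
	a := composedAvailability(0.999, 10)
	fmt.Printf("10 deps @ 99.9%% each → %.2f%% overall\n", a*100)
	// The downtime budget that implies per year, in hours.
	fmt.Printf("≈ %.0f hours of downtime per year\n", (1-a)*365*24)
}
```

Ten three-nines dependencies land at roughly 99.0% — a budget of days, not hours.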
CDNs: push the edge closer
A CDN caches static (and increasingly dynamic) responses at points of presence (PoPs) near your users. A request hits the nearest PoP; if cached, served from there. If not, the PoP fetches from origin, caches, returns. First byte goes from ~200 ms trans-Pacific to ~20 ms same-city.
flowchart LR
user@{ shape: circle, label: "User" }
pop@{ shape: cloud, label: "Edge PoP" }
origin@{ shape: rounded, label: "Origin" }
bucket@{ shape: disk, label: "Object store" }
user -->|"request"| pop
pop -->|"cache miss"| origin
origin -->|"read"| bucket
bucket -.->|"response"| origin
origin -.->|"fill cache"| pop
pop -.->|"response"| user
classDef neutral fill:#dbeafe,stroke:#3b6fd6,stroke-width:1.5px,color:#0f172a;
classDef highlight fill:#e9d5ff,stroke:#7c3aed,stroke-width:2px,color:#4c1d95;
classDef storage fill:#fed7aa,stroke:#ea580c,stroke-width:1.5px,color:#7c2d12;
class user,origin neutral
class pop highlight
    class bucket storage

Four knobs that actually matter in an interview:

- Cache keys. By default, URL = key. `Vary: Accept-Language` splits entries by language header. Get the key wrong → serve the wrong user’s content.
- TTL vs stale-while-revalidate. `Cache-Control: max-age=60, stale-while-revalidate=3600` = serve cached, asynchronously refetch after 60 s, serve a stale copy for up to 1 h if the origin is down. Trade freshness for resilience.
- Cache-stampede protection. When a popular URL expires, 10,000 clients hit the origin simultaneously. Fix: request coalescing at the edge (one origin fetch, fanned out to all waiters), or `stale-while-revalidate`.
- Purging. Pushing a fix? Tag-based invalidation (`Cache-Tag: article-42`) is far better than URL-based when one piece of content appears in many URLs.
Regional partitioning: blast-radius management
Single-region = single blast radius. If you lose us-east-1, you lose everything. Three typical multi-region postures:
flowchart LR
subgraph AP ["Active / Passive"]
direction TB
APpri["Primary<br/>100% of traffic"]
APstd["Standby<br/>0% · replicated"]
APpri -->|"async replication"| APstd
end
subgraph AA ["Active / Active"]
direction TB
AA1["Region A<br/>50%"]
AA2["Region B<br/>50%"]
AA1 <-->|"bi-directional sync"| AA2
end
subgraph CELL ["Cell-based"]
direction TB
C1["Cell 1<br/>users 0-33%"]
C2["Cell 2<br/>users 33-66%"]
C3["Cell 3<br/>users 66-100%"]
end
classDef neutral fill:#dbeafe,stroke:#3b6fd6,stroke-width:1.5px,color:#0f172a;
classDef ok fill:#dcfce7,stroke:#16a34a,stroke-width:1.5px,color:#14532d;
classDef warn fill:#fef3c7,stroke:#d97706,stroke-width:1.5px,color:#78350f;
class APpri neutral
class APstd warn
class AA1,AA2 ok
class C1,C2,C3 neutral
| Model | Failover time | Data loss risk | Operational cost |
|---|---|---|---|
| Active / Passive | minutes (DNS / BGP flip) | last async-replication window (seconds to minutes) | low — stand-by is cheap |
| Active / Active | seconds (already serving) | merge conflicts if both wrote | high — multi-master sync |
| Cell-based | blast radius = one cell | only that cell’s users | medium — many small cells |
Cell-based is AWS’s favorite pattern: each “cell” is a self-contained stack serving a slice of users. When a cell goes bad, only that slice is affected. Adding capacity = adding cells, not scaling one giant region.
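Cell assignment is usually a stable hash of the user ID; a sketch under that assumption (`cellFor` is a name invented here, FNV chosen for brevity):

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// cellFor maps a user ID to one of n cells. The mapping must be stable:
// the same user always lands in the same cell, so cell-local state works.
func cellFor(userID string, n uint32) uint32 {
	h := fnv.New32a()
	h.Write([]byte(userID))
	return h.Sum32() % n
}

func main() {
	for _, u := range []string{"alice", "bob", "carol"} {
		fmt.Printf("%s → cell %d\n", u, cellFor(u, 3))
	}
}
```

Note the caveat: plain modulo remaps most users when you change the cell count. Real cell routers use consistent hashing or an explicit user→cell directory so cells can be added without mass migration.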
Timeouts: the most underappreciated resilience primitive
If you remember nothing else: every network call needs a timeout. The default behavior of most language HTTP libraries is “wait forever” — which translates to goroutines/threads piling up, connection pools exhausting, the whole service grinding to a halt because of one slow downstream.
Timeout budget — carry a deadline through the call stack, not a timeout per hop. If the top-level request has 800 ms left, an internal call can’t use 500 ms if two more hops come after it.
sequenceDiagram
autonumber
participant U as User
participant A as API (budget: 800ms)
participant B as Service B (budget: 400ms)
participant C as Service C (budget: 150ms)
U->>A: request
Note over A: deadline = now + 800ms
A->>B: call (ctx deadline = 400ms)
Note over B: deadline = now + 400ms
B->>C: call (ctx deadline = 150ms)
Note over C: deadline = now + 150ms
C-->>B: 80ms
B-->>A: 220ms
    A-->>U: 370ms ✓ within budget

In Go, `context.Context` does this for you — `context.WithDeadline(parent, t)` clamps the child’s deadline to whichever is tighter. gRPC and well-behaved HTTP libraries read `ctx.Deadline()` and fail fast if there’s no time left.
Retries with exponential backoff + jitter
When a transient error happens (DNS blip, upstream restart, rate-limit), the first instinct is to retry. Done naively, retries turn a 1-second outage into a 30-second outage as clients synchronize their retry storms.
Three rules to retry well:
- Retry only idempotent operations. `GET`, `PUT`, `DELETE` — yes. `POST` — only if you carry an idempotency key.
- Cap the number of attempts. Usually 3–5. Infinite retries are a DDoS on yourself.
- Exponential backoff + jitter. Double the delay each attempt, then add randomness so N clients don’t all retry at the exact same moment.
sequenceDiagram
autonumber
participant C as Client
participant S as Upstream
C->>S: GET /resource
S--xC: 503 Service Unavailable
Note over C: wait 100-300ms<br/>(base 200ms + jitter)
C->>S: attempt 2
S--xC: 503
Note over C: wait 200-600ms<br/>(base 400ms + jitter)
C->>S: attempt 3
S--xC: 503
Note over C: wait 400-1200ms<br/>(base 800ms + jitter)
C->>S: attempt 4
    S-->>C: 200 OK ✅

The jitter is crucial. Without it, all clients that failed at t=0 retry simultaneously at t=200, crushing the service again. With full jitter (`sleep = rand(0, base * 2^attempt)`), retries smear across the whole interval.
Go — the retry that you copy-paste into every new service:
```go
import (
	"context"
	"errors"
	"math/rand"
	"time"
)

type retryableFn func(ctx context.Context) error

// retry runs fn with full-jitter exponential backoff, capped at maxAttempts.
// Returns the last error if all attempts fail. Respects the context deadline.
func retry(ctx context.Context, maxAttempts int, base time.Duration, fn retryableFn) error {
	var err error
	for attempt := 0; attempt < maxAttempts; attempt++ {
		err = fn(ctx)
		if err == nil {
			return nil
		}
		if !isRetryable(err) || attempt == maxAttempts-1 {
			return err
		}
		// Full jitter: sleep = random in [0, base * 2^attempt]
		maxSleep := base * (1 << attempt)
		sleep := time.Duration(rand.Int63n(int64(maxSleep)))
		select {
		case <-time.After(sleep):
		case <-ctx.Done():
			return errors.Join(err, ctx.Err())
		}
	}
	return err
}

func isRetryable(err error) bool {
	// 5xx, connection reset, deadline exceeded — yes.
	// 4xx (client error), validation errors — no.
	var te interface{ Timeout() bool }
	if errors.As(err, &te) && te.Timeout() {
		return true
	}
	// + protocol-specific heuristics (HTTP status, gRPC codes.Unavailable) …
	return false
}
```

Circuit breaker: stop hitting a dead service
A circuit breaker wraps calls to a downstream and fails fast when failures cross a threshold. Think of it as a fuse that opens to protect the rest of the system from cascading failures.
stateDiagram-v2
direction LR
[*] --> Closed
Closed --> Open : failures > threshold<br/>in rolling window
Open --> HalfOpen : after cool-down<br/>(e.g. 30s)
HalfOpen --> Closed : probe succeeds
HalfOpen --> Open : probe fails
note right of Closed
normal — calls go through
failure count incremented
end note
note right of Open
fail fast — no calls
return cached / fallback / 503
end note
note right of HalfOpen
let one probe through
decide based on result
end note
classDef good fill:#dcfce7,stroke:#16a34a,stroke-width:2px,color:#14532d;
classDef bad fill:#fee2e2,stroke:#dc2626,stroke-width:2px,color:#7f1d1d;
classDef probe fill:#fef3c7,stroke:#d97706,stroke-width:2px,color:#78350f;
class Closed good
class Open bad
    class HalfOpen probe

Why this matters:
- Without a breaker: a dead downstream takes 10 s to time out per call. At 1000 req/s incoming, that’s 10,000 goroutines stuck in flight → OOM within seconds.
- With a breaker in Open state: calls fail in microseconds with a known error. Goroutines complete, connection pools stay healthy, upstream clients can be told “this feature is degraded, retry in 30 s” instead of “your entire page timed out.”
Go with sony/gobreaker:
```go
import "github.com/sony/gobreaker/v2"

var paymentBreaker = gobreaker.NewCircuitBreaker[*PaymentResult](gobreaker.Settings{
	Name:        "payment-gateway",
	MaxRequests: 3,                // allowed through in Half-Open
	Interval:    60 * time.Second, // rolling window
	Timeout:     30 * time.Second, // Open → Half-Open cool-down
	ReadyToTrip: func(counts gobreaker.Counts) bool {
		failureRate := float64(counts.TotalFailures) / float64(counts.Requests)
		return counts.Requests >= 20 && failureRate >= 0.5
	},
	OnStateChange: func(name string, from, to gobreaker.State) {
		log.Printf("breaker %s: %s → %s", name, from, to)
	},
})

func chargeCard(ctx context.Context, req ChargeRequest) (*PaymentResult, error) {
	return paymentBreaker.Execute(func() (*PaymentResult, error) {
		return paymentClient.Charge(ctx, req)
	})
}
```

The knobs that matter (with reasonable defaults):
- Failure threshold — failure rate ≥ 50% over a rolling window of 20+ requests. Lower = more sensitive, more false trips.
- Cool-down — time in Open before trying Half-Open. Usually 15-60 s. Too short = hammer while still sick; too long = slow recovery.
- Half-Open probe count — how many requests before declaring healthy again. 1-5. Too many = expose more traffic to a still-broken service.
Bulkheads: isolate the blast radius
Don’t share pools across unrelated features. If payment and search both draw from one `http.Transport` (`MaxIdleConns: 100`) pool, a payment-gateway slowdown can starve search. Give each downstream its own pool (separate `http.Client`), or per-tenant pools for multi-tenant systems.
Paired with a breaker, this means one dead dependency degrades only its feature — the rest of the app keeps working.
Rate limiting, three places
- At the edge (CDN / API gateway) — by IP / API key. Rejects floods before they touch your code.
- At the service (middleware) — by user / tenant / endpoint. Enforces per-customer contracts.
- At the downstream call site (client-side) — token bucket per upstream. Shields your dependencies from you.
The classic algorithm is token bucket: capacity tokens refilled at rate/sec. Each request costs 1. If no tokens, 429. Bursts up to capacity, steady-state rate. Go’s golang.org/x/time/rate.Limiter is idiomatic.
Putting it together — the resilience stack
flowchart TB
REQ(["incoming request"])
TMO["① timeout budget<br/>ctx.WithDeadline"]
RLT["② rate limiter<br/>token bucket / leaky"]
BLK["③ bulkhead<br/>per-dependency pool"]
CB["④ circuit breaker<br/>closed / open / half-open"]
RET["⑤ retry<br/>exp backoff + jitter"]
CALL(["downstream call"])
FALLBACK(["fallback / 503 / cached"])
REQ --> TMO --> RLT --> BLK --> CB
CB -->|"closed"| RET --> CALL
CB -->|"open"| FALLBACK
CB -->|"half-open"| CALL
classDef step fill:#dbeafe,stroke:#3b6fd6,stroke-width:1.5px,color:#0f172a;
classDef good fill:#dcfce7,stroke:#16a34a,stroke-width:1.5px,color:#14532d;
classDef warn fill:#fef3c7,stroke:#d97706,stroke-width:1.5px,color:#78350f;
class TMO,RLT,BLK,CB,RET step
class CALL good
class FALLBACK warn
    class REQ step

Order matters: timeouts first so nothing can run forever, rate-limit before the expensive work, bulkhead to isolate, breaker to fail fast, retry on retryable errors only, then the actual call. Getting the order wrong (e.g. retrying before the breaker) amplifies bad behavior instead of absorbing it.
Interview gotchas for §6
- Retry storms after a mass timeout. If you set retries = 3 on every layer (client, gateway, service, downstream), a single slow call multiplies into 3⁴ = 81 attempts. Pick one layer to retry; the others pass the error up.
- `DELETE` isn’t always safe to retry. It’s idempotent semantically (`DELETE x` twice = same state) but the second DELETE on a not-found resource may return 404 — your caller needs to treat 404-after-DELETE as success, not failure.
- Breaker + retry interaction. A retry layer inside a breaker means one user retrying 3× accounts for 3 failures in the breaker’s window, tripping it faster than you’d expect. Decide: retry outside the breaker (breaker is the ultimate source of truth) or inside (retries are “part of one operation”).
- Cold-start after Open → Half-Open. If your downstream just came back and you send all your traffic in the first second, you kill it again. Use `MaxRequests` in Half-Open, or add a gradual weight ramp-up (see §5 slow-start).
- Monitor `OnStateChange`. A breaker silently tripping is worse than no breaker — users see fallbacks and you don’t know why. Page / log every state transition.
7. Interview cheat sheet
Three ways to use this section:
- Night-before review — read only this page, open the diagrams you don’t remember.
- During the interview — when the interviewer drops a keyword, the tables below have a starting sentence.
- Mock warm-up — cover the right column and quiz yourself.
Answer template for any networking question
Strong answers have four beats in order. Missing one is the usual reason a good technical answer feels mediocre:
- Frame the trade-off. Name the two or three things we’re choosing between (latency vs throughput, consistency vs availability, correctness vs cost).
- Pick a default. Give a concrete choice with numbers where you can.
- Call out the failure mode. Say out loud when your default breaks and what you’d reach for next.
- Tie to the specific system in the prompt. Generic answers rate generic.
When you hear X → how to open
The right-hand column is the first sentence, not the whole answer. Expand from there.
Design decisions
| Interviewer says | Open with |
|---|---|
| ”What happens when I type a URL?” | DNS → TCP 3-way → TLS 1.3 (1 RTT) → HTTP request → render. First byte floor = one RTT. HTTP/3 folds the handshake into QUIC. |
| ”TCP or UDP for this?” | Default TCP for correctness; UDP when ordering/retransmits are the app’s job (DNS, media, QUIC) or when HOL blocking matters. |
| ”REST, GraphQL, or gRPC?” | REST for public / CRUD / cacheable. GraphQL when one graph × many client shapes. gRPC for internal polyglot services with streaming. |
| ”WebSocket, SSE, or WebRTC?” | SSE for server → client feeds. WebSocket for bi-di text/binary. WebRTC only if you need media or sub-50ms peer-to-peer. |
| ”301 vs 302?“ | 301 = permanent, cached aggressively, pain to roll back. 302 = temporary, not cached. Use 307/308 to preserve the HTTP method. |
| ”How do you encrypt service-to-service traffic?“ | mTLS, usually delegated to a service mesh sidecar (Envoy / Linkerd). The mesh owns cert rotation, identity, and policy. |
Scaling
| Interviewer says | Open with |
|---|---|
| ”Scale this stateless service.” | One LB (L4 for raw throughput, L7 for routing / rewriting) fronts N replicas, with state in a DB or cache. Add health checks + slow-start to prevent thundering herd on rollouts. |
| ”Design a rate limiter.” | Token bucket: capacity C, refill rate R. Bursts up to C, steady-state R. Key by tenant / user / IP, persist counters in Redis for multi-node consistency. |
| ”Design for 1 M concurrent connections.” | The bottleneck is fan-out, not the sockets themselves. Per-room subscriber index, pub/sub broker (Redis / Kafka / NATS) to broadcast, region-pinned pods, connection-count health signal. |
| ”Deploy with zero downtime.” | Readiness gates rotation. Rolling replaces N pods at a time; blue/green keeps both versions hot and flips the LB; canary shifts traffic by weight (1% → 10% → 100%) while watching error rate. |
| ”Multi-region strategy?” | Active/passive if the cost of replication lag is tolerable; active/active if the app can handle conflict resolution; cell-based when blast radius is the primary concern. |
Failure + resilience
| Interviewer says | Open with |
|---|---|
| ”Postgres goes down — what happens?” | Clients carry deadlines; a circuit breaker opens after threshold so we fail fast instead of piling up goroutines. Serve cached / read-replica if the endpoint tolerates it; return a structured degradation (503 with a Retry-After) otherwise. |
| ”Why are your latencies spiking?” | Separate p50 from p99 first. Likely suspects: GC pauses, connection pool saturation, downstream tail latency, TCP retransmits on a flaky path, or a cold cache. Instrument with distributed tracing to find the hop. |
| ”Your service keeps 502-ing.” | Usually an LB ↔ backend keep-alive mismatch: LB reuses a connection the backend just closed. Align keepalive_timeout (LB < backend) and watch upstream_reset logs. |
| ”One user’s bad request is taking down the service.” | You need bulkheads. Separate connection pools per downstream so one slow dependency doesn’t starve the rest; rate-limit per user/tenant, not just globally. |
| ”What’s wrong with retrying every error?” | Retry storms. Each layer retries 3× → stacks multiplicatively (3⁴ = 81 attempts). Retry in one place (client), use exponential backoff with full jitter, and only for idempotent operations or calls with an idempotency key. |
The one-pager
If nothing else sticks, memorize this:
| Decision | Default | Why |
|---|---|---|
| Transport | TCP for correctness, UDP / QUIC for real-time | TCP head-of-line blocking is at the stream layer |
| HTTP version | HTTP/2 within a datacenter, HTTP/3 at the edge for mobile | /3 rides QUIC → no HOL, connection migration across networks |
| API style | REST public, gRPC internal, GraphQL when one schema × many clients | Each pattern matches a distinct constraint |
| Real-time | SSE server→client, WebSocket bi-di, WebRTC media / sub-50ms | Pick the simplest channel that solves the problem |
| Load balancing | L4 for raw throughput, L7 for HTTP-aware routing | L7 is CPU-bound on TLS, not bandwidth |
| LB algorithm | Least-connections as default, P2C when stateless, consistent hashing for shard affinity | Round-robin ignores uneven request cost; hashing preserves shard/cache locality |
| Resilience stack | timeout → rate-limit → bulkhead → breaker → retry → call | Order matters — retries before the breaker amplify failures |
| Retries | Exponential backoff with full jitter, cap 3–5 attempts, idempotent only | Prevents retry storms |
| Multi-region | Cell-based for blast-radius control, active / active for sub-minute RTO | Active/passive cheapest, active/active costliest |
| Caching | Cache-Control: max-age=N, stale-while-revalidate=M | Resilience usually beats freshness |
Pitfalls to volunteer
Interviewers reward candidates who surface failure modes before being asked. The list below is short enough to scan the day of; drop one or two where they fit the scenario:
- POST is not idempotent. Safe retries need an idempotency key propagated through every layer.
- `TIME_WAIT` port exhaustion on high-churn outbound clients. Use connection pools; avoid `tcp_tw_recycle` (deprecated, breaks NAT).
- `Connection: close` emitted on every response cripples the client pool.
- WebSocket without `CheckOrigin` is CSRF-over-WebSocket waiting to happen.
- WebRTC without a TURN budget is a demo, not a product — plan for 10–20 % of calls to need the relay.
- Session affinity is scaling debt. It breaks the symmetry that lets you terminate any pod.
- `keepalive_timeout` asymmetry between LB and backend produces 502 storms.
- DNS TTL during failover. A 300 s TTL means 300 s of bleeding traffic to a dead region.
- Retry inside a breaker double-counts failures. Decide which layer owns retry semantics.
- Reusing a protobuf field number silently breaks wire compatibility. Use `reserved`.
- Unbounded GraphQL queries are a DoS primitive. Enforce depth limits and persisted queries in production.
- 0-RTT TLS early data is replayable. Never use it for state-changing requests.
- HTTP/2 with a self-signed cert in Go needs `NextProtos = ["h2", "http/1.1"]` — otherwise the client silently falls back to HTTP/1.1.
- PMTUD black-holing. A middlebox dropping ICMP “fragmentation needed” packets stalls any segment over the path MTU.
Further reading
In order of depth-per-hour:
- System Design Primer — a curated reading list masquerading as a README. Best starting point.
- Designing Data-Intensive Applications (Kleppmann). Chapters 5, 6, and 7 on replication, partitioning, and transactions. The implicit syllabus of most system-design rounds.
- RFC 9110 (HTTP semantics) and RFC 9114 (HTTP/3) — readable, surprisingly short.
- AWS Builders’ Library — essays on the patterns in §5 and §6, written at the scale they were invented for.
- highscalability.com — post-mortems and architecture profiles from companies in production.
- ByteByteGo’s newsletter — weekly, diagram-heavy, short enough to read on a commute.
Wrapping up
A working mental model of networking is cumulative. You will not acquire it in one sitting — you will notice one week that a problem at the layer above is easier because you understood the one below. Use this post as the spine; fill the gaps with whatever production mystery you’re chasing that week.
Corrections and sharper phrasings are welcome. Open an issue on the blog’s repo and I’ll update with attribution.