<?xml version="1.0" encoding="UTF-8"?><rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Thomas Engineer</title><description>Backend notes · Go · Distributed systems</description><link>https://fuwari.vercel.app/</link><language>en</language><item><title>Networking Essentials</title><link>https://fuwari.vercel.app/posts/networking-essentials/</link><guid isPermaLink="true">https://fuwari.vercel.app/posts/networking-essentials/</guid><description>Everything you need for a system-design interview on networking — layers, TCP/UDP, HTTP versions, REST/GraphQL/gRPC, WebSocket/SSE/WebRTC, load balancing, CDN, and resilience patterns. Diagrams + Go code.</description><pubDate>Mon, 20 Apr 2026 08:37:20 GMT</pubDate><content:encoded>&lt;p&gt;Most system-design interviews touch networking. You don&apos;t need to recite RFCs, but you &lt;em&gt;do&lt;/em&gt; need to choose the right protocol, explain &lt;em&gt;why&lt;/em&gt;, and anticipate the failure modes.&lt;/p&gt;
&lt;p&gt;This post is my working cheat sheet — structured for re-reading before an interview, with diagrams, Go snippets, and the trade-offs that actually come up. I rewrote my original HelloInterview notes into seven sections that build on each other: the layer model, then each layer from the wire up to application protocols, then load balancing, then resilience.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;How to use this:&lt;/strong&gt; skim the headings and diagrams first. Second pass, read the &quot;why it matters&quot; paragraphs. Third pass, the code. Don&apos;t memorize — understand the trade-off.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;hr /&gt;
&lt;h2&gt;1. Networking 101&lt;/h2&gt;
&lt;p&gt;Every network interaction is a stack of responsibilities. Each layer talks only to the one directly above and below it, so you can swap implementations without breaking the others — the same HTTP request works whether it rides on Ethernet, Wi-Fi, or LTE.&lt;/p&gt;
&lt;p&gt;Textbooks teach the 7-layer OSI model. In practice everyone uses the &lt;strong&gt;4-layer TCP/IP model&lt;/strong&gt; because the top three OSI layers collapse into &quot;the application decides&quot; and the bottom two into the link layer. Know both names for the interview; use the 4-layer one when reasoning.&lt;/p&gt;
&lt;figure class=&quot;my-6&quot;&gt;
&lt;svg viewBox=&quot;0 0 560 340&quot; xmlns=&quot;http://www.w3.org/2000/svg&quot; class=&quot;w-full max-w-[520px] mx-auto block&quot; role=&quot;img&quot; aria-label=&quot;4-layer TCP/IP stack&quot;&gt;
&lt;defs&gt;
&lt;marker id=&quot;stack-arrow&quot; viewBox=&quot;0 0 10 10&quot; refX=&quot;9&quot; refY=&quot;5&quot; markerWidth=&quot;7&quot; markerHeight=&quot;7&quot; orient=&quot;auto&quot;&gt;
&lt;path d=&quot;M0,0 L10,5 L0,10 z&quot; fill=&quot;#1e40af&quot;/&gt;
&lt;/marker&gt;
&lt;/defs&gt;
&lt;!-- L7 Application — blue --&gt;
&lt;rect x=&quot;80&quot; y=&quot;10&quot; width=&quot;400&quot; height=&quot;60&quot; rx=&quot;10&quot; fill=&quot;#dbeafe&quot; stroke=&quot;#3b6fd6&quot; stroke-width=&quot;1.8&quot;/&gt;
&lt;text x=&quot;20&quot;  y=&quot;44&quot; font-family=&quot;ui-monospace,monospace&quot; font-size=&quot;13&quot; font-weight=&quot;600&quot; fill=&quot;#6b7b9a&quot;&gt;L7&lt;/text&gt;
&lt;text x=&quot;100&quot; y=&quot;35&quot; font-family=&quot;Geist Variable, Inter, sans-serif&quot; font-size=&quot;15&quot; font-weight=&quot;600&quot; fill=&quot;#0f172a&quot;&gt;Application&lt;/text&gt;
&lt;text x=&quot;100&quot; y=&quot;56&quot; font-family=&quot;ui-monospace,monospace&quot; font-size=&quot;12&quot; fill=&quot;#475569&quot;&gt;HTTP · DNS · gRPC · SSH&lt;/text&gt;
&lt;!-- L4 Transport — green --&gt;
&lt;rect x=&quot;80&quot; y=&quot;90&quot; width=&quot;400&quot; height=&quot;60&quot; rx=&quot;10&quot; fill=&quot;#dcfce7&quot; stroke=&quot;#16a34a&quot; stroke-width=&quot;1.8&quot;/&gt;
&lt;text x=&quot;20&quot;  y=&quot;124&quot; font-family=&quot;ui-monospace,monospace&quot; font-size=&quot;13&quot; font-weight=&quot;600&quot; fill=&quot;#6b7b9a&quot;&gt;L4&lt;/text&gt;
&lt;text x=&quot;100&quot; y=&quot;115&quot; font-family=&quot;Geist Variable, Inter, sans-serif&quot; font-size=&quot;15&quot; font-weight=&quot;600&quot; fill=&quot;#14532d&quot;&gt;Transport&lt;/text&gt;
&lt;text x=&quot;100&quot; y=&quot;136&quot; font-family=&quot;ui-monospace,monospace&quot; font-size=&quot;12&quot; fill=&quot;#475569&quot;&gt;TCP · UDP · QUIC&lt;/text&gt;
&lt;!-- L3 Internet — amber --&gt;
&lt;rect x=&quot;80&quot; y=&quot;170&quot; width=&quot;400&quot; height=&quot;60&quot; rx=&quot;10&quot; fill=&quot;#fef3c7&quot; stroke=&quot;#ca8a04&quot; stroke-width=&quot;1.8&quot;/&gt;
&lt;text x=&quot;20&quot;  y=&quot;204&quot; font-family=&quot;ui-monospace,monospace&quot; font-size=&quot;13&quot; font-weight=&quot;600&quot; fill=&quot;#6b7b9a&quot;&gt;L3&lt;/text&gt;
&lt;text x=&quot;100&quot; y=&quot;195&quot; font-family=&quot;Geist Variable, Inter, sans-serif&quot; font-size=&quot;15&quot; font-weight=&quot;600&quot; fill=&quot;#713f12&quot;&gt;Internet&lt;/text&gt;
&lt;text x=&quot;100&quot; y=&quot;216&quot; font-family=&quot;ui-monospace,monospace&quot; font-size=&quot;12&quot; fill=&quot;#475569&quot;&gt;IP · ICMP · routing&lt;/text&gt;
&lt;!-- L2 Link — purple --&gt;
&lt;rect x=&quot;80&quot; y=&quot;250&quot; width=&quot;400&quot; height=&quot;60&quot; rx=&quot;10&quot; fill=&quot;#e9d5ff&quot; stroke=&quot;#7c3aed&quot; stroke-width=&quot;1.8&quot;/&gt;
&lt;text x=&quot;20&quot;  y=&quot;284&quot; font-family=&quot;ui-monospace,monospace&quot; font-size=&quot;13&quot; font-weight=&quot;600&quot; fill=&quot;#6b7b9a&quot;&gt;L2&lt;/text&gt;
&lt;text x=&quot;100&quot; y=&quot;275&quot; font-family=&quot;Geist Variable, Inter, sans-serif&quot; font-size=&quot;15&quot; font-weight=&quot;600&quot; fill=&quot;#4c1d95&quot;&gt;Link&lt;/text&gt;
&lt;text x=&quot;100&quot; y=&quot;296&quot; font-family=&quot;ui-monospace,monospace&quot; font-size=&quot;12&quot; fill=&quot;#475569&quot;&gt;Ethernet · Wi-Fi · ARP · MAC&lt;/text&gt;
&lt;!-- Encapsulation arrows between layers --&gt;
&lt;g stroke=&quot;#1e40af&quot; stroke-width=&quot;2&quot; fill=&quot;none&quot; marker-end=&quot;url(#stack-arrow)&quot;&gt;
&lt;path d=&quot;M 280 72 L 280 86&quot;/&gt;
&lt;path d=&quot;M 280 152 L 280 166&quot;/&gt;
&lt;path d=&quot;M 280 232 L 280 246&quot;/&gt;
&lt;/g&gt;
&lt;!-- Encapsulation direction label --&gt;
&lt;g font-family=&quot;ui-monospace,monospace&quot; font-size=&quot;11&quot; fill=&quot;#6b7b9a&quot; text-anchor=&quot;middle&quot;&gt;
&lt;text x=&quot;280&quot; y=&quot;328&quot;&gt;encapsulates ↓&lt;/text&gt;
&lt;/g&gt;
&lt;/svg&gt;
&lt;figcaption class=&quot;text-center text-sm text-50 -mt-2&quot;&gt;The 4-layer TCP/IP stack. Each layer wraps the payload from the one above.&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h3&gt;What each layer actually does&lt;/h3&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Job in one sentence&lt;/th&gt;
&lt;th&gt;Unit&lt;/th&gt;
&lt;th&gt;Addresses&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Application&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&quot;What does this message mean?&quot;&lt;/td&gt;
&lt;td&gt;message&lt;/td&gt;
&lt;td&gt;URLs, hostnames&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Transport&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&quot;Who on that machine should get it, and is it reliable?&quot;&lt;/td&gt;
&lt;td&gt;segment (TCP) / datagram (UDP)&lt;/td&gt;
&lt;td&gt;port numbers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Internet&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&quot;Which machine on the internet, and how do we route there?&quot;&lt;/td&gt;
&lt;td&gt;packet&lt;/td&gt;
&lt;td&gt;IP addresses&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Link&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&quot;How do we put bits on this physical medium?&quot;&lt;/td&gt;
&lt;td&gt;frame&lt;/td&gt;
&lt;td&gt;MAC addresses&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;A packet at the link layer is literally a nested envelope: &lt;code&gt;[ Ethernet [ IP [ TCP [ HTTP ... ] ] ] ]&lt;/code&gt;. Each device along the route peels off the link-layer envelope to decide where to forward next, then reseals with a new one.&lt;/p&gt;
&lt;h3&gt;What happens when you type &lt;code&gt;https://example.com&lt;/code&gt; and press Enter&lt;/h3&gt;
&lt;p&gt;This is the single most common warm-up question. The full answer touches every layer:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;sequenceDiagram
    autonumber
    participant C as 🖥️ Client
    participant D as 🗺️ DNS
    participant S as 🌐 Server :443

    rect rgb(254, 249, 195)
    Note over C,D: Phase 1 — DNS resolution
    C-&amp;gt;&amp;gt;D: query A example.com
    activate D
    D--&amp;gt;&amp;gt;C: 93.184.216.34
    deactivate D
    end

    rect rgb(219, 234, 254)
    Note over C,S: Phase 2 — TCP 3-way handshake
    C-&amp;gt;&amp;gt;S: SYN  seq=x
    activate S
    S--&amp;gt;&amp;gt;C: SYN·ACK  seq=y, ack=x+1
    C-&amp;gt;&amp;gt;S: ACK  ack=y+1
    end

    rect rgb(220, 252, 231)
    Note over C,S: Phase 3 — TLS 1.3 handshake (1 RTT)
    C-&amp;gt;&amp;gt;S: ClientHello + key share
    S--&amp;gt;&amp;gt;C: ServerHello + cert + Finished
    Note over C,S: both sides derive the session key
    end

    rect rgb(233, 213, 255)
    Note over C,S: Phase 4 — HTTP over TLS
    C-&amp;gt;&amp;gt;S: GET / HTTP/1.1
    S--&amp;gt;&amp;gt;C: 200 OK · Content-Type: text/html
    deactivate S
    end
&lt;/code&gt;&lt;/pre&gt;
&lt;p class=&quot;text-center text-sm text-50 -mt-2&quot;&gt;A single HTTPS request touches DNS, TCP, TLS, and HTTP. HTTP/3 (QUIC) folds steps 2 and 3 into one handshake; TLS 1.3 session resumption can cut the TLS step to 0-RTT.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Mental shortcut for the interview:&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Resolve&lt;/strong&gt; — DNS turns &lt;code&gt;example.com&lt;/code&gt; into an IP (UDP 53, or TCP 53 for large answers).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Connect&lt;/strong&gt; — TCP 3-way handshake (SYN, SYN-ACK, ACK) to the IP on port 443.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Secure&lt;/strong&gt; — TLS handshake negotiates a session key; with TLS 1.3 this is 1 RTT, sometimes 0-RTT on resumption.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Request&lt;/strong&gt; — the client sends an HTTP request over the encrypted stream.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Respond&lt;/strong&gt; — the server sends HTML. The browser parses, finds asset URLs, and repeats.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Every one of those steps is a potential failure mode an interviewer can probe. Keep it at the tip of your tongue.&lt;/p&gt;
&lt;h3&gt;Ports you are expected to know&lt;/h3&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Port&lt;/th&gt;
&lt;th&gt;Protocol&lt;/th&gt;
&lt;th&gt;What&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;22&lt;/td&gt;
&lt;td&gt;TCP&lt;/td&gt;
&lt;td&gt;SSH&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;53&lt;/td&gt;
&lt;td&gt;UDP / TCP&lt;/td&gt;
&lt;td&gt;DNS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;80&lt;/td&gt;
&lt;td&gt;TCP&lt;/td&gt;
&lt;td&gt;HTTP&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;443&lt;/td&gt;
&lt;td&gt;TCP / UDP&lt;/td&gt;
&lt;td&gt;HTTPS (UDP if HTTP/3)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6379&lt;/td&gt;
&lt;td&gt;TCP&lt;/td&gt;
&lt;td&gt;Redis&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5432&lt;/td&gt;
&lt;td&gt;TCP&lt;/td&gt;
&lt;td&gt;Postgres&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;9092&lt;/td&gt;
&lt;td&gt;TCP&lt;/td&gt;
&lt;td&gt;Kafka&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;hr /&gt;
&lt;h2&gt;2. Network Layer&lt;/h2&gt;
&lt;p&gt;The network (L3) layer&apos;s job is to get a packet from a source IP to a destination IP, possibly across many routers. It doesn&apos;t care about ports, connections, or reliability — those are the transport layer&apos;s problem.&lt;/p&gt;
&lt;h3&gt;IPv4 vs IPv6 in one table&lt;/h3&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;IPv4&lt;/th&gt;
&lt;th&gt;IPv6&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Address size&lt;/td&gt;
&lt;td&gt;32 bits&lt;/td&gt;
&lt;td&gt;128 bits&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Total addresses&lt;/td&gt;
&lt;td&gt;~4.3 × 10⁹ (exhausted since 2011)&lt;/td&gt;
&lt;td&gt;~3.4 × 10³⁸&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Notation&lt;/td&gt;
&lt;td&gt;&lt;code&gt;192.168.1.1&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;2001:db8::1&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Header&lt;/td&gt;
&lt;td&gt;variable (20–60 B), checksummed&lt;/td&gt;
&lt;td&gt;fixed 40 B, no checksum&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NAT needed?&lt;/td&gt;
&lt;td&gt;yes, universally&lt;/td&gt;
&lt;td&gt;designed to not need it&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Packet fragmentation&lt;/td&gt;
&lt;td&gt;routers can fragment&lt;/td&gt;
&lt;td&gt;only the sender; routers drop + PMTUD&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Configuration&lt;/td&gt;
&lt;td&gt;DHCP or static&lt;/td&gt;
&lt;td&gt;SLAAC (stateless) + DHCPv6&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;In interviews: IPv6 adoption is slow because NAT + CGNAT let IPv4 limp along, and because dual-stack migration is politically painful. Design for both when you can, deploy behind a load balancer that terminates either.&lt;/p&gt;
&lt;h3&gt;What&apos;s in an IPv4 header&lt;/h3&gt;
&lt;p&gt;You don&apos;t need to memorize byte offsets, but knowing what fields exist explains a lot of real behavior — MTU issues, &lt;code&gt;traceroute&lt;/code&gt; output, and why &lt;code&gt;iptables&lt;/code&gt; rules reference TTL.&lt;/p&gt;
&lt;figure&gt;
&lt;svg viewBox=&quot;0 0 640 240&quot; xmlns=&quot;http://www.w3.org/2000/svg&quot; class=&quot;w-full max-w-[640px] mx-auto my-4&quot; role=&quot;img&quot; aria-label=&quot;IPv4 packet header layout&quot;&gt;
&lt;text x=&quot;80&quot;  y=&quot;16&quot; font-family=&quot;ui-monospace,monospace&quot; font-size=&quot;10&quot; font-weight=&quot;600&quot; fill=&quot;#6b7b9a&quot; text-anchor=&quot;middle&quot;&gt;0&lt;/text&gt;
&lt;text x=&quot;180&quot; y=&quot;16&quot; font-family=&quot;ui-monospace,monospace&quot; font-size=&quot;10&quot; font-weight=&quot;600&quot; fill=&quot;#6b7b9a&quot; text-anchor=&quot;middle&quot;&gt;8&lt;/text&gt;
&lt;text x=&quot;360&quot; y=&quot;16&quot; font-family=&quot;ui-monospace,monospace&quot; font-size=&quot;10&quot; font-weight=&quot;600&quot; fill=&quot;#6b7b9a&quot; text-anchor=&quot;middle&quot;&gt;16&lt;/text&gt;
&lt;text x=&quot;620&quot; y=&quot;16&quot; font-family=&quot;ui-monospace,monospace&quot; font-size=&quot;10&quot; font-weight=&quot;600&quot; fill=&quot;#6b7b9a&quot; text-anchor=&quot;middle&quot;&gt;31 bits&lt;/text&gt;
&lt;g fill=&quot;#f8fafc&quot; stroke=&quot;#6b7b9a&quot; stroke-width=&quot;1&quot;&gt;
&lt;rect x=&quot;40&quot;  y=&quot;24&quot; width=&quot;60&quot;  height=&quot;36&quot;/&gt;
&lt;rect x=&quot;100&quot; y=&quot;24&quot; width=&quot;60&quot;  height=&quot;36&quot;/&gt;
&lt;rect x=&quot;160&quot; y=&quot;24&quot; width=&quot;120&quot; height=&quot;36&quot;/&gt;
&lt;rect x=&quot;280&quot; y=&quot;24&quot; width=&quot;320&quot; height=&quot;36&quot;/&gt;
&lt;rect x=&quot;40&quot;  y=&quot;60&quot; width=&quot;240&quot; height=&quot;36&quot;/&gt;
&lt;rect x=&quot;280&quot; y=&quot;60&quot; width=&quot;80&quot;  height=&quot;36&quot;/&gt;
&lt;rect x=&quot;360&quot; y=&quot;60&quot; width=&quot;240&quot; height=&quot;36&quot;/&gt;
&lt;rect x=&quot;240&quot; y=&quot;96&quot; width=&quot;360&quot; height=&quot;36&quot;/&gt;
&lt;/g&gt;
&lt;g fill=&quot;#dbeafe&quot; stroke=&quot;#3b6fd6&quot; stroke-width=&quot;1&quot;&gt;
&lt;rect x=&quot;40&quot;  y=&quot;96&quot;  width=&quot;80&quot;  height=&quot;36&quot;/&gt;
&lt;rect x=&quot;120&quot; y=&quot;96&quot;  width=&quot;120&quot; height=&quot;36&quot;/&gt;
&lt;rect x=&quot;40&quot;  y=&quot;132&quot; width=&quot;560&quot; height=&quot;36&quot;/&gt;
&lt;rect x=&quot;40&quot;  y=&quot;168&quot; width=&quot;560&quot; height=&quot;36&quot;/&gt;
&lt;/g&gt;
&lt;rect x=&quot;40&quot; y=&quot;204&quot; width=&quot;560&quot; height=&quot;28&quot; fill=&quot;#f8fafc&quot; stroke=&quot;#6b7b9a&quot; stroke-width=&quot;1&quot; stroke-dasharray=&quot;3 3&quot;/&gt;
&lt;g font-family=&quot;Geist Variable, Inter, sans-serif&quot; font-size=&quot;11&quot; font-weight=&quot;500&quot; fill=&quot;#0f172a&quot; text-anchor=&quot;middle&quot;&gt;
&lt;text x=&quot;70&quot;  y=&quot;44&quot;&gt;Ver&lt;/text&gt;
&lt;text x=&quot;130&quot; y=&quot;44&quot;&gt;IHL&lt;/text&gt;
&lt;text x=&quot;220&quot; y=&quot;44&quot;&gt;ToS / DSCP&lt;/text&gt;
&lt;text x=&quot;440&quot; y=&quot;44&quot;&gt;Total length (bytes)&lt;/text&gt;
&lt;text x=&quot;160&quot; y=&quot;80&quot;&gt;Identification&lt;/text&gt;
&lt;text x=&quot;320&quot; y=&quot;80&quot;&gt;Flags&lt;/text&gt;
&lt;text x=&quot;480&quot; y=&quot;80&quot;&gt;Fragment offset&lt;/text&gt;
&lt;text x=&quot;80&quot;  y=&quot;116&quot;&gt;TTL&lt;/text&gt;
&lt;text x=&quot;180&quot; y=&quot;116&quot;&gt;Protocol&lt;/text&gt;
&lt;text x=&quot;420&quot; y=&quot;116&quot;&gt;Header checksum&lt;/text&gt;
&lt;text x=&quot;320&quot; y=&quot;152&quot;&gt;Source IP address&lt;/text&gt;
&lt;text x=&quot;320&quot; y=&quot;188&quot;&gt;Destination IP address&lt;/text&gt;
&lt;text x=&quot;320&quot; y=&quot;222&quot;&gt;Options (rarely used) + payload ↓&lt;/text&gt;
&lt;/g&gt;
&lt;g font-family=&quot;ui-monospace, monospace&quot; font-size=&quot;10&quot; fill=&quot;#6b7b9a&quot; text-anchor=&quot;middle&quot;&gt;
&lt;text x=&quot;70&quot;  y=&quot;56&quot;&gt;4&lt;/text&gt;
&lt;text x=&quot;130&quot; y=&quot;56&quot;&gt;4&lt;/text&gt;
&lt;text x=&quot;220&quot; y=&quot;56&quot;&gt;8&lt;/text&gt;
&lt;text x=&quot;440&quot; y=&quot;56&quot;&gt;16&lt;/text&gt;
&lt;text x=&quot;160&quot; y=&quot;92&quot;&gt;16&lt;/text&gt;
&lt;text x=&quot;320&quot; y=&quot;92&quot;&gt;3&lt;/text&gt;
&lt;text x=&quot;480&quot; y=&quot;92&quot;&gt;13&lt;/text&gt;
&lt;text x=&quot;80&quot;  y=&quot;128&quot;&gt;8&lt;/text&gt;
&lt;text x=&quot;180&quot; y=&quot;128&quot;&gt;8 (6=TCP, 17=UDP)&lt;/text&gt;
&lt;text x=&quot;420&quot; y=&quot;128&quot;&gt;16&lt;/text&gt;
&lt;text x=&quot;320&quot; y=&quot;164&quot;&gt;32&lt;/text&gt;
&lt;text x=&quot;320&quot; y=&quot;200&quot;&gt;32&lt;/text&gt;
&lt;/g&gt;
&lt;/svg&gt;
&lt;figcaption class=&quot;text-center text-sm text-50&quot;&gt;The highlighted fields are the ones you&apos;ll reference in interviews: TTL, Protocol, and the addresses. Total length caps at 65,535 bytes.&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;The two fields that come up the most:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;TTL&lt;/strong&gt; (Time To Live) — a hop counter. Each router decrements it; when it hits 0 the packet is dropped and an ICMP &quot;time exceeded&quot; is sent back. &lt;code&gt;traceroute&lt;/code&gt; exploits this by sending probes with &lt;code&gt;TTL=1, 2, 3…&lt;/code&gt; and listening for the ICMP replies.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Protocol&lt;/strong&gt; — tells the receiver how to interpret the payload: 6 for TCP, 17 for UDP, 1 for ICMP.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Routing, in one paragraph&lt;/h3&gt;
&lt;p&gt;Routers maintain &lt;strong&gt;routing tables&lt;/strong&gt; that map &quot;destination prefix → next hop.&quot; When a packet arrives, the router looks up the &lt;em&gt;longest matching prefix&lt;/em&gt; of the destination IP and forwards to the corresponding next hop. On the internet, routing tables are built dynamically by BGP between autonomous systems. On your laptop, the table has two useful entries — your subnet (&lt;code&gt;192.168.1.0/24 → direct&lt;/code&gt;) and everything else (&lt;code&gt;0.0.0.0/0 → your gateway&lt;/code&gt;).&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# See your routing table
$ ip route
default via 192.168.1.1 dev wlan0
192.168.1.0/24 dev wlan0 proto kernel scope link src 192.168.1.42

# Trace the hops to a destination
$ traceroute -n example.com
 1  192.168.1.1         1.4 ms
 2  100.64.0.1          8.2 ms       # CGNAT inside the ISP
 3  203.0.113.5         9.1 ms
 ...
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;NAT: why your home IP is a lie&lt;/h3&gt;
&lt;p&gt;Your laptop&apos;s &lt;code&gt;192.168.1.42&lt;/code&gt; is a &lt;strong&gt;private&lt;/strong&gt; IP, invalid on the public internet. Your router does &lt;strong&gt;Network Address Translation&lt;/strong&gt;: when you send a packet, it rewrites the source IP to the router&apos;s public IP and remembers the translation in a table. When the reply comes back, it rewrites the destination back to &lt;code&gt;192.168.1.42&lt;/code&gt; and forwards to your laptop.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;flowchart LR
    laptop[&quot;&amp;lt;b&amp;gt;Your laptop&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;&amp;lt;code&amp;gt;192.168.1.42&amp;lt;/code&amp;gt;&amp;lt;br/&amp;gt;src port 51000&quot;]
    router[&quot;&amp;lt;b&amp;gt;Router (NAT)&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;priv 192.168.1.1&amp;lt;br/&amp;gt;pub 203.0.113.17&amp;lt;br/&amp;gt;&amp;lt;i&amp;gt;keeps translation table&amp;lt;/i&amp;gt;&quot;]
    target[&quot;&amp;lt;b&amp;gt;example.com&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;&amp;lt;code&amp;gt;93.184.216.34&amp;lt;/code&amp;gt;&amp;lt;br/&amp;gt;dst port 443&quot;]

    laptop --&amp;gt;|&quot;outbound&amp;lt;br/&amp;gt;src 192.168.1.42:51000&quot;| router
    router --&amp;gt;|&quot;rewritten&amp;lt;br/&amp;gt;src 203.0.113.17:60321&quot;| target
    target --&amp;gt;|&quot;reply&amp;lt;br/&amp;gt;dst 203.0.113.17:60321&quot;| router
    router --&amp;gt;|&quot;rewritten&amp;lt;br/&amp;gt;dst 192.168.1.42:51000&quot;| laptop

    classDef neutral fill:#dbeafe,stroke:#3b6fd6,stroke-width:1.5px,color:#0f172a;
    classDef highlight fill:#e9d5ff,stroke:#7c3aed,stroke-width:2px,color:#4c1d95;
    class laptop,target neutral
    class router highlight
&lt;/code&gt;&lt;/pre&gt;
&lt;p class=&quot;text-center text-sm text-50 -mt-2&quot;&gt;One public IP fronts many private hosts by multiplexing on the source port. NAT breaks when two sides both need to initiate — see WebRTC in §4c.&lt;/p&gt;
&lt;h3&gt;CIDR notation, quick reference&lt;/h3&gt;
&lt;p&gt;&lt;code&gt;192.168.1.0/24&lt;/code&gt; means the first 24 bits are the network prefix, leaving 8 bits = 256 addresses (minus 2 for network + broadcast). Memorize these edge cases:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Prefix&lt;/th&gt;
&lt;th&gt;Size&lt;/th&gt;
&lt;th&gt;Common use&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/32&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;1 address&lt;/td&gt;
&lt;td&gt;single host&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/24&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;256&lt;/td&gt;
&lt;td&gt;small subnet, home LAN&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/16&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;65,536&lt;/td&gt;
&lt;td&gt;corp subnet&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/8&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;16 M&lt;/td&gt;
&lt;td&gt;legacy class A (&lt;code&gt;10.0.0.0/8&lt;/code&gt; private)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/0&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;all&lt;/td&gt;
&lt;td&gt;default route&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;RFC 1918 private ranges (never routable on the public internet): &lt;code&gt;10.0.0.0/8&lt;/code&gt;, &lt;code&gt;172.16.0.0/12&lt;/code&gt;, &lt;code&gt;192.168.0.0/16&lt;/code&gt;. &lt;code&gt;169.254.0.0/16&lt;/code&gt; is link-local (what you get if DHCP fails). &lt;code&gt;127.0.0.0/8&lt;/code&gt; is loopback.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;3. Transport Layer&lt;/h2&gt;
&lt;p&gt;IP gets a packet to a &lt;strong&gt;host&lt;/strong&gt;. The transport layer gets it to the &lt;strong&gt;right program on that host&lt;/strong&gt;, using a 16-bit port number. It also decides whether the stream is reliable, ordered, and flow-controlled (TCP) or fire-and-forget (UDP). Everything above this layer — HTTP, gRPC, DNS, SMTP — is just a convention layered on top of one of these two.&lt;/p&gt;
&lt;h3&gt;TCP: the 3-way handshake&lt;/h3&gt;
&lt;p&gt;Before any data flows, TCP opens a connection by exchanging three segments. Each side picks a random &lt;strong&gt;initial sequence number&lt;/strong&gt; (ISN) to defend against blind spoofing, and each side acks the other&apos;s ISN + 1.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;sequenceDiagram
    autonumber
    participant C as 🖥️ Client
    participant S as 🌐 Server

    Note over C: state: CLOSED
    Note over S: state: LISTEN
    rect rgb(219, 234, 254)
    C-&amp;gt;&amp;gt;S: SYN · seq=x
    activate S
    Note over C: SYN_SENT
    Note over S: SYN_RCVD
    S--&amp;gt;&amp;gt;C: SYN·ACK · seq=y, ack=x+1
    C-&amp;gt;&amp;gt;S: ACK · ack=y+1
    deactivate S
    end
    Note over C,S: ✅ ESTABLISHED — 1 full RTT before the first byte of payload
&lt;/code&gt;&lt;/pre&gt;
&lt;p class=&quot;text-center text-sm text-50 -mt-2&quot;&gt;Three segments, one RTT. Motivates keep-alive, connection pooling, and HTTP/2 multiplexing.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Two interview gotchas on the handshake:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;SYN flood.&lt;/strong&gt; If the server commits memory on every received &lt;code&gt;SYN&lt;/code&gt;, an attacker can flood it with spoofed &lt;code&gt;SYN&lt;/code&gt;s and exhaust the connection table. The fix is &lt;strong&gt;SYN cookies&lt;/strong&gt; — encode the connection state in the server&apos;s initial sequence number and allocate memory only when the client&apos;s final &lt;code&gt;ACK&lt;/code&gt; echoes it back, proving the request came from the real source.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Half-open connection.&lt;/strong&gt; If the client crashes after the 3-way handshake, the server has no idea. Keep-alive probes (TCP or application-level) exist to detect this; defaulting to &quot;forever-idle sockets are fine&quot; is wrong.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Reliability, built from primitives&lt;/h3&gt;
&lt;p&gt;TCP is a &lt;strong&gt;reliable, ordered, byte stream&lt;/strong&gt;. It achieves that on top of an unreliable IP layer with four mechanisms layered on each other:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Sequence numbers&lt;/strong&gt; on every byte. The receiver reorders out-of-order segments into a contiguous stream.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Cumulative ACKs.&lt;/strong&gt; &lt;code&gt;ack = N&lt;/code&gt; means &quot;I have received everything up to byte &lt;code&gt;N-1&lt;/code&gt;.&quot;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Retransmission on timeout.&lt;/strong&gt; The sender keeps a running estimate of RTT. If no ACK arrives within &lt;code&gt;RTO&lt;/code&gt;, resend.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Fast retransmit.&lt;/strong&gt; If the sender gets three duplicate ACKs (&lt;code&gt;ack = K&lt;/code&gt; four times), it infers a single segment was lost and resends &lt;em&gt;without&lt;/em&gt; waiting for the RTO.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The sender does not send one segment at a time — it keeps a whole &lt;strong&gt;window&lt;/strong&gt; of bytes in-flight:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;flowchart LR
    B1[&quot;1&quot;]:::acked --- B2[&quot;2&quot;]:::acked --- B3[&quot;3&quot;]:::acked
    B3 ==&amp;gt;|&quot;send base&quot;| B4
    B4[&quot;4&quot;]:::flight --- B5[&quot;5&quot;]:::flight --- B6[&quot;6&quot;]:::flight --- B7[&quot;7&quot;]:::flight
    B7 ==&amp;gt;|&quot;next byte to send&quot;| B8
    B8[&quot;8&quot;]:::avail --- B9[&quot;9&quot;]:::avail
    B9 ===|&quot;window edge&quot;| B10
    B10[&quot;10&quot;]:::blocked --- B11[&quot;11&quot;]:::blocked

    subgraph legend [&quot;Legend&quot;]
        direction LR
        L1[&quot; acked &quot;]:::acked
        L2[&quot; in-flight (unacked) &quot;]:::flight
        L3[&quot; can send now &quot;]:::avail
        L4[&quot; blocked (beyond window) &quot;]:::blocked
    end

    classDef acked   fill:#e5e7eb,stroke:#6b7b9a,color:#475569;
    classDef flight  fill:#93c5fd,stroke:#3b6fd6,stroke-width:1.5px,color:#0f172a;
    classDef avail   fill:#f1f5f9,stroke:#6b7b9a,color:#475569;
    classDef blocked fill:#fafafa,stroke:#6b7280,stroke-dasharray:3 3,color:#6b7280;
&lt;/code&gt;&lt;/pre&gt;
&lt;p class=&quot;text-center text-sm text-50 -mt-2&quot;&gt;The sender can have &lt;code&gt;window = min(cwnd, rwnd)&lt;/code&gt; bytes in-flight without waiting for ACKs. Every ACK slides the window right; every loss shrinks it.&lt;/p&gt;
&lt;h3&gt;Flow control vs congestion control&lt;/h3&gt;
&lt;p&gt;These sound alike but answer different questions, and interviewers &lt;em&gt;will&lt;/em&gt; test whether you know the difference.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Flow control&lt;/th&gt;
&lt;th&gt;Congestion control&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Protects&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;the receiver&lt;/td&gt;
&lt;td&gt;the network&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Lives on&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;the receiver advertises &lt;code&gt;rwnd&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;the sender computes &lt;code&gt;cwnd&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Signal&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;receiver&apos;s buffer space, sent back in every ACK&lt;/td&gt;
&lt;td&gt;packet loss + RTT trends&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Classic algorithm&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;simple: &lt;code&gt;rwnd&lt;/code&gt; in header&lt;/td&gt;
&lt;td&gt;Reno / CUBIC / BBR&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;strong&gt;Slow start + AIMD in one line each.&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;Slow start&lt;/em&gt;: new connection, &lt;code&gt;cwnd&lt;/code&gt; doubles every RTT until loss.&lt;/li&gt;
&lt;li&gt;&lt;em&gt;AIMD&lt;/em&gt; (Reno / CUBIC after slow start): &lt;strong&gt;A&lt;/strong&gt;dditive &lt;strong&gt;I&lt;/strong&gt;ncrease, &lt;strong&gt;M&lt;/strong&gt;ultiplicative &lt;strong&gt;D&lt;/strong&gt;ecrease. On each ACK, &lt;code&gt;cwnd += 1/cwnd&lt;/code&gt; (net effect: about +1 segment per RTT). On loss, &lt;code&gt;cwnd /= 2&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;BBR (Google&apos;s newer algorithm, used on YouTube + many CDNs) breaks from AIMD entirely — it models the path&apos;s bandwidth and RTT directly instead of treating loss as the primary signal. Worth mentioning when the interviewer asks about modern TCP behavior on lossy links or bufferbloated (over-buffered) paths, where loss-based algorithms either over-throttle or fill queues.&lt;/p&gt;
&lt;h3&gt;The states an application actually touches&lt;/h3&gt;
&lt;pre&gt;&lt;code&gt;stateDiagram-v2
    direction TB
    [*] --&amp;gt; CLOSED
    CLOSED --&amp;gt; SYN_SENT: active open&amp;lt;br/&amp;gt;send SYN
    CLOSED --&amp;gt; LISTEN: passive open
    SYN_SENT --&amp;gt; ESTABLISHED: recv SYN-ACK&amp;lt;br/&amp;gt;send ACK
    LISTEN --&amp;gt; SYN_RCVD: recv SYN&amp;lt;br/&amp;gt;send SYN-ACK
    SYN_RCVD --&amp;gt; ESTABLISHED: recv ACK
    ESTABLISHED --&amp;gt; FIN_WAIT_1: close()&amp;lt;br/&amp;gt;send FIN
    ESTABLISHED --&amp;gt; CLOSE_WAIT: recv FIN&amp;lt;br/&amp;gt;send ACK
    FIN_WAIT_1 --&amp;gt; FIN_WAIT_2: recv ACK
    FIN_WAIT_2 --&amp;gt; TIME_WAIT: recv FIN&amp;lt;br/&amp;gt;send ACK
    TIME_WAIT --&amp;gt; CLOSED: 2 × MSL timeout
    CLOSE_WAIT --&amp;gt; LAST_ACK: close()&amp;lt;br/&amp;gt;send FIN
    LAST_ACK --&amp;gt; CLOSED: recv ACK

    classDef good fill:#dcfce7,stroke:#16a34a,stroke-width:2px,color:#14532d;
    classDef closing fill:#fef3c7,stroke:#ca8a04,stroke-width:1.5px,color:#713f12;
    classDef terminal fill:#f3f4f6,stroke:#6b7280,stroke-width:1.5px,color:#1f2937;
    class ESTABLISHED good
    class FIN_WAIT_1,FIN_WAIT_2,CLOSE_WAIT,LAST_ACK,TIME_WAIT closing
    class CLOSED terminal
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&amp;lt;p class=&quot;text-center text-sm text-50 -mt-2&quot;&amp;gt;Simplified TCP state machine. Left branch = active side (initiator). Right branch = passive side (usually the server). Full spec has 11 states; these are the ones you&apos;ll reference in a debugging story.&amp;lt;/p&amp;gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Why &lt;code&gt;TIME_WAIT&lt;/code&gt; matters.&lt;/strong&gt; After an active close, the initiator stays in &lt;code&gt;TIME_WAIT&lt;/code&gt; for 2 × MSL (typically 60 s on Linux) before the quadruple &lt;code&gt;(src-ip, src-port, dst-ip, dst-port)&lt;/code&gt; can be reused. On a host that opens many outbound connections (a payment gateway, a scraper) this can exhaust ephemeral ports. Mitigations: enable &lt;code&gt;net.ipv4.tcp_tw_reuse&lt;/code&gt; for outbound connections (&lt;code&gt;SO_REUSEADDR&lt;/code&gt; solves a different problem: letting a restarted server rebind its listening port), use connection pooling, or reduce churn.&lt;/p&gt;
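&lt;p&gt;Connection pooling is the fix you can actually ship. A quick way to watch Go&apos;s &lt;code&gt;http.Transport&lt;/code&gt; do it: wrap the dialer to count real TCP connects against a local test server (illustrative; the count of 1 assumes sequential requests with drained bodies):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;package main

import (
    &quot;context&quot;
    &quot;fmt&quot;
    &quot;io&quot;
    &quot;net&quot;
    &quot;net/http&quot;
    &quot;net/http/httptest&quot;
    &quot;sync/atomic&quot;
)

// countDials issues n sequential requests against a local test server and
// reports how many real TCP connects the transport made.
func countDials(n int) int64 {
    srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        io.WriteString(w, &quot;ok&quot;)
    }))
    defer srv.Close()

    var dials int64
    client := &amp;amp;http.Client{Transport: &amp;amp;http.Transport{
        DialContext: func(ctx context.Context, network, addr string) (net.Conn, error) {
            atomic.AddInt64(&amp;amp;dials, 1) // one increment per real connect
            return (&amp;amp;net.Dialer{}).DialContext(ctx, network, addr)
        },
    }}
    for i := 0; i &amp;lt; n; i++ {
        resp, err := client.Get(srv.URL)
        if err != nil {
            panic(err)
        }
        io.Copy(io.Discard, resp.Body) // drain, or the conn can&apos;t return to the pool
        resp.Body.Close()
    }
    return atomic.LoadInt64(&amp;amp;dials)
}

func main() {
    fmt.Println(&quot;dials for 5 requests:&quot;, countDials(5)) // 1
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Five requests, one socket, and only one eventual &lt;code&gt;TIME_WAIT&lt;/code&gt; instead of five.&lt;/p&gt;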
&lt;h3&gt;UDP: 8 bytes and a shrug&lt;/h3&gt;
&lt;p&gt;UDP&apos;s header is the minimum viable protocol.&lt;/p&gt;
&lt;p&gt;&amp;lt;figure&amp;gt;
&amp;lt;svg viewBox=&quot;0 0 640 120&quot; xmlns=&quot;http://www.w3.org/2000/svg&quot; class=&quot;w-full max-w-[640px] mx-auto my-4&quot; role=&quot;img&quot; aria-label=&quot;UDP header layout (8 bytes)&quot;&amp;gt;
&amp;lt;g font-family=&quot;ui-monospace, monospace&quot; font-size=&quot;10&quot; font-weight=&quot;600&quot; fill=&quot;#6b7b9a&quot; text-anchor=&quot;middle&quot;&amp;gt;
&amp;lt;text x=&quot;80&quot;  y=&quot;16&quot;&amp;gt;0&amp;lt;/text&amp;gt;
&amp;lt;text x=&quot;340&quot; y=&quot;16&quot;&amp;gt;16&amp;lt;/text&amp;gt;
&amp;lt;text x=&quot;620&quot; y=&quot;16&quot;&amp;gt;31 bits&amp;lt;/text&amp;gt;
&amp;lt;/g&amp;gt;&lt;/p&gt;
&lt;p&gt;&amp;lt;g stroke=&quot;#3b6fd6&quot; stroke-width=&quot;1&quot; fill=&quot;#dbeafe&quot;&amp;gt;
&amp;lt;rect x=&quot;40&quot;  y=&quot;24&quot; width=&quot;280&quot; height=&quot;36&quot;/&amp;gt;
&amp;lt;rect x=&quot;320&quot; y=&quot;24&quot; width=&quot;280&quot; height=&quot;36&quot;/&amp;gt;
&amp;lt;/g&amp;gt;
&amp;lt;g stroke=&quot;#6b7b9a&quot; stroke-width=&quot;1&quot; fill=&quot;#f8fafc&quot;&amp;gt;
&amp;lt;rect x=&quot;40&quot;  y=&quot;60&quot; width=&quot;280&quot; height=&quot;36&quot;/&amp;gt;
&amp;lt;rect x=&quot;320&quot; y=&quot;60&quot; width=&quot;280&quot; height=&quot;36&quot;/&amp;gt;
&amp;lt;/g&amp;gt;&lt;/p&gt;
&lt;p&gt;&amp;lt;g font-family=&quot;Geist Variable, Inter, sans-serif&quot; font-size=&quot;11&quot; font-weight=&quot;500&quot; fill=&quot;#0f172a&quot; text-anchor=&quot;middle&quot;&amp;gt;
&amp;lt;text x=&quot;180&quot; y=&quot;44&quot;&amp;gt;Source port&amp;lt;/text&amp;gt;
&amp;lt;text x=&quot;460&quot; y=&quot;44&quot;&amp;gt;Destination port&amp;lt;/text&amp;gt;
&amp;lt;text x=&quot;180&quot; y=&quot;80&quot;&amp;gt;Length (header + data)&amp;lt;/text&amp;gt;
&amp;lt;text x=&quot;460&quot; y=&quot;80&quot;&amp;gt;Checksum (optional on IPv4)&amp;lt;/text&amp;gt;
&amp;lt;/g&amp;gt;
&amp;lt;g font-family=&quot;ui-monospace, monospace&quot; font-size=&quot;10&quot; fill=&quot;#6b7b9a&quot; text-anchor=&quot;middle&quot;&amp;gt;
&amp;lt;text x=&quot;180&quot; y=&quot;56&quot;&amp;gt;16&amp;lt;/text&amp;gt;
&amp;lt;text x=&quot;460&quot; y=&quot;56&quot;&amp;gt;16&amp;lt;/text&amp;gt;
&amp;lt;text x=&quot;180&quot; y=&quot;92&quot;&amp;gt;16&amp;lt;/text&amp;gt;
&amp;lt;text x=&quot;460&quot; y=&quot;92&quot;&amp;gt;16&amp;lt;/text&amp;gt;
&amp;lt;/g&amp;gt;
&amp;lt;/svg&amp;gt;
&amp;lt;figcaption class=&quot;text-center text-sm text-50&quot;&amp;gt;No seq number. No ack. No state. Each datagram is independent — if it matters to you, you handle it in the app.&amp;lt;/figcaption&amp;gt;
&amp;lt;/figure&amp;gt;&lt;/p&gt;
&lt;p&gt;What UDP &lt;em&gt;doesn&apos;t&lt;/em&gt; do: no handshake, no ordering, no retransmit, no flow or congestion control. What it &lt;em&gt;does&lt;/em&gt; do: stay out of your way. That makes it the substrate for DNS (usually 1 packet round-trip, no need for a connection), video/voice (loss is fine, reordering is worse than drop), game state (&quot;where is the tank &lt;em&gt;now&lt;/em&gt;&quot; is more useful than &quot;where was it 200 ms ago&quot;), and, since QUIC, modern HTTP itself.&lt;/p&gt;
&lt;h3&gt;TCP vs UDP, side by side&lt;/h3&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;TCP&lt;/th&gt;
&lt;th&gt;UDP&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Connection&lt;/td&gt;
&lt;td&gt;handshake before data&lt;/td&gt;
&lt;td&gt;none&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Header&lt;/td&gt;
&lt;td&gt;20 B minimum, up to 60 B&lt;/td&gt;
&lt;td&gt;8 B&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Ordering&lt;/td&gt;
&lt;td&gt;guaranteed&lt;/td&gt;
&lt;td&gt;app&apos;s problem&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Reliability&lt;/td&gt;
&lt;td&gt;guaranteed&lt;/td&gt;
&lt;td&gt;app&apos;s problem&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Flow control&lt;/td&gt;
&lt;td&gt;yes (&lt;code&gt;rwnd&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;no&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Congestion control&lt;/td&gt;
&lt;td&gt;yes (CUBIC / BBR / …)&lt;/td&gt;
&lt;td&gt;no (app must be well-behaved)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Latency floor&lt;/td&gt;
&lt;td&gt;1 RTT + slow start&lt;/td&gt;
&lt;td&gt;1 one-way trip&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Typical users&lt;/td&gt;
&lt;td&gt;HTTP/1.1, HTTP/2, gRPC, SSH, DB&lt;/td&gt;
&lt;td&gt;DNS, QUIC (HTTP/3), games, voice, video&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h3&gt;Head-of-line blocking, the reason HTTP/3 exists&lt;/h3&gt;
&lt;p&gt;TCP&apos;s ordering guarantee has a dark side. If segment 5 is dropped, segments 6–10 that &lt;em&gt;did&lt;/em&gt; arrive must sit in the kernel&apos;s receive buffer until segment 5 is retransmitted and filled in. Everything above TCP — including &lt;em&gt;independent&lt;/em&gt; HTTP/2 streams — has to wait. This is &lt;strong&gt;head-of-line (HOL) blocking&lt;/strong&gt; at the transport layer.&lt;/p&gt;
&lt;p&gt;HTTP/3 fixes this by building on QUIC (which rides on UDP) and doing its own stream-level ordering: one lost packet stalls only &lt;em&gt;its&lt;/em&gt; stream, not all the concurrent streams on the same connection. More on this in §4a.&lt;/p&gt;
&lt;h3&gt;Decision rubric&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Pick TCP when&lt;/strong&gt; the correctness of the byte stream matters more than the 1-RTT cost and you&apos;re OK waiting on retransmits:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;HTTP/1.1, HTTP/2, gRPC-over-HTTP/2&lt;/li&gt;
&lt;li&gt;SSH, SQL wire protocols, Kafka&lt;/li&gt;
&lt;li&gt;File transfer / replication&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Pick UDP when&lt;/strong&gt; you can tolerate loss or you need to own reordering yourself:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;DNS queries (1 packet, fits in MTU, retry at app level)&lt;/li&gt;
&lt;li&gt;QUIC / HTTP/3&lt;/li&gt;
&lt;li&gt;Real-time media (WebRTC, VoIP, game state)&lt;/li&gt;
&lt;li&gt;Multicast / broadcast (TCP is strictly 1-to-1)&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Go snippets&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;TCP client with sane timeouts.&lt;/strong&gt; The default &lt;code&gt;net.Dial&lt;/code&gt; has no timeout and will happily hang forever.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;package main

import (
    &quot;net&quot;
    &quot;time&quot;
)

func openConn() (net.Conn, error) {
    d := net.Dialer{
        Timeout:   3 * time.Second,   // connect timeout
        KeepAlive: 30 * time.Second,  // TCP keep-alive probes
    }
    conn, err := d.Dial(&quot;tcp&quot;, &quot;example.com:443&quot;)
    if err != nil {
        return nil, err
    }
    // Deadlines for read/write, refreshed per operation.
    _ = conn.SetDeadline(time.Now().Add(10 * time.Second))
    return conn, nil
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;UDP echo server.&lt;/strong&gt; The read loop is packet-oriented, not stream-oriented — one &lt;code&gt;ReadFromUDP&lt;/code&gt; returns exactly one datagram. Framing is the app&apos;s job.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;package main

import (
    &quot;log&quot;
    &quot;net&quot;
)

func main() {
    addr, _ := net.ResolveUDPAddr(&quot;udp&quot;, &quot;:9000&quot;)
    conn, err := net.ListenUDP(&quot;udp&quot;, addr)
    if err != nil {
        log.Fatal(err)
    }
    defer conn.Close()

    buf := make([]byte, 2048) // 1 datagram at a time
    for {
        n, peer, err := conn.ReadFromUDP(buf)
        if err != nil {
            log.Printf(&quot;read: %v&quot;, err)
            continue
        }
        // No framing guarantees: &apos;n&apos; bytes are one logical message.
        if _, err := conn.WriteToUDP(buf[:n], peer); err != nil {
            log.Printf(&quot;write to %s: %v&quot;, peer, err)
        }
    }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;Gotchas interviewers love&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Nagle&apos;s algorithm.&lt;/strong&gt; TCP coalesces small writes to reduce packet overhead, which interacts badly with delayed ACKs on the receiver. Latency-sensitive apps (interactive SSH, game clients) set &lt;code&gt;TCP_NODELAY&lt;/code&gt; to disable it.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;TIME_WAIT&lt;/code&gt; exhaustion.&lt;/strong&gt; A high-churn outbound client (think: aggressive HTTP client with no keep-alive) can run out of ephemeral source ports. Reuse connections or bump &lt;code&gt;ip_local_port_range&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;MTU / PMTUD blackhole.&lt;/strong&gt; If a middlebox drops ICMP &quot;fragmentation needed&quot; messages, the sender never learns to shrink its packets and the connection stalls on any segment larger than the path MTU. Common cause of &quot;works on my laptop, times out on corp VPN.&quot;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;UDP amplification DDoS.&lt;/strong&gt; A spoofed 50-byte DNS query can return a 3,000-byte response. Open resolvers and misconfigured NTP / memcached servers are classic reflectors. If you build a UDP service, cap the reply size and rate-limit per source.&lt;/li&gt;
&lt;/ul&gt;
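&lt;p&gt;The last bullet is easy to demo. A minimal per-source token bucket covers the rate-limiting half (a sketch: no locking, no eviction of idle sources, and the rate and burst numbers are invented):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;package main

import (
    &quot;fmt&quot;
    &quot;time&quot;
)

// perSourceLimiter is a token bucket per source address.
type perSourceLimiter struct {
    rate, burst float64 // tokens/sec, bucket size
    tokens      map[string]float64
    last        map[string]time.Time
}

func newLimiter(rate, burst float64) *perSourceLimiter {
    return &amp;amp;perSourceLimiter{rate: rate, burst: burst,
        tokens: map[string]float64{}, last: map[string]time.Time{}}
}

func (l *perSourceLimiter) allow(src string, now time.Time) bool {
    t, seen := l.tokens[src]
    if !seen {
        t = l.burst // a new source starts with a full bucket
    } else {
        t += now.Sub(l.last[src]).Seconds() * l.rate // refill since last packet
        if t &amp;gt; l.burst {
            t = l.burst
        }
    }
    l.last[src] = now
    if t &amp;lt; 1 {
        l.tokens[src] = t
        return false
    }
    l.tokens[src] = t - 1
    return true
}

func main() {
    lim := newLimiter(10, 3) // 10 datagrams/sec, burst of 3, per source
    now := time.Now()
    for i := 0; i &amp;lt; 5; i++ {
        fmt.Println(lim.allow(&quot;198.51.100.7&quot;, now)) // true ×3, then false ×2
    }
}
&lt;/code&gt;&lt;/pre&gt;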
&lt;h2&gt;4. Application Layer&lt;/h2&gt;
&lt;p&gt;Above the transport layer, protocols express &lt;em&gt;what&lt;/em&gt; the conversation is about. They&apos;re organized by purpose, not by position in a stack: HTTP is a request/response protocol, SMTP is a mail protocol, gRPC is an RPC system. This section is split into three groups of the ones an interviewer will actually probe — &lt;strong&gt;4a. HTTP family (with TLS)&lt;/strong&gt;, &lt;strong&gt;4b. API styles: REST vs GraphQL vs gRPC&lt;/strong&gt;, and &lt;strong&gt;4c. Real-time: SSE vs WebSocket vs WebRTC&lt;/strong&gt;.&lt;/p&gt;
&lt;h3&gt;4a. HTTP / HTTPS / HTTP/2 / HTTP/3&lt;/h3&gt;
&lt;p&gt;HTTP is the protocol of the web — a stateless, text-based request/response protocol on TCP port 80 (or 443 with TLS). &quot;Stateless&quot; is the operative word: every request is self-contained from the server&apos;s point of view. State (sessions, auth) lives in cookies, headers, or the database, not the protocol.&lt;/p&gt;
&lt;h4&gt;Anatomy of a request&lt;/h4&gt;
&lt;pre&gt;&lt;code&gt;GET /users/42?include=orders HTTP/1.1
Host: api.example.com
Authorization: Bearer eyJhbGciOi...
Accept: application/json
Accept-Encoding: gzip, br
If-None-Match: &quot;a3f7b9&quot;
Connection: keep-alive
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Each line is &lt;code&gt;Header: Value&lt;/code&gt;. Blank line separates headers from body. The server responds with a status line, headers, and body:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;HTTP/1.1 200 OK
Content-Type: application/json
Content-Length: 847
ETag: &quot;a3f7b9&quot;
Cache-Control: private, max-age=60
Connection: keep-alive

{&quot;id&quot;:42,&quot;name&quot;:&quot;Ada&quot;,&quot;orders&quot;:[...]}
&lt;/code&gt;&lt;/pre&gt;
&lt;h4&gt;Status codes you must know cold&lt;/h4&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Range&lt;/th&gt;
&lt;th&gt;Meaning&lt;/th&gt;
&lt;th&gt;Must-know examples&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;1xx&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;informational&lt;/td&gt;
&lt;td&gt;101 Switching Protocols (WebSocket upgrade)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;2xx&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;success&lt;/td&gt;
&lt;td&gt;200 OK · 201 Created · 204 No Content · 206 Partial Content&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;3xx&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;redirect / cache&lt;/td&gt;
&lt;td&gt;301 Moved Permanently · 302 Found · 304 Not Modified · 307/308 (preserve method)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;4xx&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;client error&lt;/td&gt;
&lt;td&gt;400 Bad Request · 401 Unauthorized · 403 Forbidden · 404 Not Found · 409 Conflict · 422 Unprocessable · 429 Too Many Requests&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;5xx&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;server error&lt;/td&gt;
&lt;td&gt;500 Internal · 502 Bad Gateway · 503 Unavailable · 504 Gateway Timeout&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;strong&gt;Easy-to-mix-up pair&lt;/strong&gt;: &lt;em&gt;401 means &quot;I don&apos;t know who you are&quot; — re-auth and retry. 403 means &quot;I know who you are, and you can&apos;t.&quot;&lt;/em&gt;&lt;/p&gt;
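&lt;p&gt;In Go the split looks like this (a sketch; the &lt;code&gt;X-User&lt;/code&gt; header stands in for whatever your real authentication middleware resolves):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;package main

import (
    &quot;fmt&quot;
    &quot;io&quot;
    &quot;net/http&quot;
    &quot;net/http/httptest&quot;
)

// authz shows the split: 401 = authenticate and retry; 403 = re-auth won&apos;t help.
func authz(w http.ResponseWriter, r *http.Request) {
    switch user := r.Header.Get(&quot;X-User&quot;); {
    case user == &quot;&quot;:
        w.Header().Set(&quot;WWW-Authenticate&quot;, `Bearer realm=&quot;api&quot;`)
        http.Error(w, &quot;unauthenticated&quot;, http.StatusUnauthorized)
    case user != &quot;admin&quot;:
        http.Error(w, &quot;forbidden&quot;, http.StatusForbidden)
    default:
        io.WriteString(w, &quot;secret&quot;)
    }
}

func main() {
    for _, user := range []string{&quot;&quot;, &quot;mallory&quot;, &quot;admin&quot;} {
        req := httptest.NewRequest(http.MethodGet, &quot;/admin&quot;, nil)
        if user != &quot;&quot; {
            req.Header.Set(&quot;X-User&quot;, user)
        }
        rec := httptest.NewRecorder()
        authz(rec, req)
        fmt.Println(user, rec.Code) // 401, then 403, then 200
    }
}
&lt;/code&gt;&lt;/pre&gt;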
&lt;h4&gt;Idempotency matters&lt;/h4&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Method&lt;/th&gt;
&lt;th&gt;Idempotent&lt;/th&gt;
&lt;th&gt;Safe (read-only)&lt;/th&gt;
&lt;th&gt;Body&lt;/th&gt;
&lt;th&gt;Cacheable&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;GET&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;no&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;HEAD&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;no&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OPTIONS&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;no&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PUT&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;yes&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DELETE&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;POST&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;yes&lt;/td&gt;
&lt;td&gt;rarely&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PATCH&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;yes&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The interview gotcha: &lt;strong&gt;POST is not idempotent&lt;/strong&gt; — if the client retries a POST because it never saw the response, it can create the same order twice. The canonical fix is an &lt;strong&gt;idempotency key&lt;/strong&gt;: a client-generated unique string the server dedupes on. Stripe, AWS, and every payment-adjacent API does this.&lt;/p&gt;
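&lt;p&gt;A minimal version of that dedupe as Go middleware (a sketch: it caches only the body, in memory; a real implementation persists the status and headers with a TTL):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;package main

import (
    &quot;fmt&quot;
    &quot;net/http&quot;
    &quot;net/http/httptest&quot;
    &quot;sync&quot;
)

// idempotent replays the first response for a given Idempotency-Key,
// so a client retry does not repeat the side effect.
func idempotent(next http.Handler) http.Handler {
    var mu sync.Mutex
    seen := map[string][]byte{}
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        key := r.Header.Get(&quot;Idempotency-Key&quot;)
        if key == &quot;&quot; {
            next.ServeHTTP(w, r) // no key, no dedupe
            return
        }
        mu.Lock()
        cached, ok := seen[key]
        mu.Unlock()
        if ok {
            w.Write(cached) // retry: replay the stored response
            return
        }
        rec := httptest.NewRecorder() // capture the downstream body
        next.ServeHTTP(rec, r)
        mu.Lock()
        seen[key] = rec.Body.Bytes()
        mu.Unlock()
        w.Write(rec.Body.Bytes())
    })
}

func main() {
    var orders int
    h := idempotent(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        orders++ // the side effect we must not repeat
        fmt.Fprintf(w, &quot;order-%d&quot;, orders)
    }))
    for i := 0; i &amp;lt; 2; i++ { // a client retry of the same POST
        req := httptest.NewRequest(http.MethodPost, &quot;/orders&quot;, nil)
        req.Header.Set(&quot;Idempotency-Key&quot;, &quot;key-123&quot;)
        rec := httptest.NewRecorder()
        h.ServeHTTP(rec, req)
        fmt.Println(rec.Body.String()) // order-1 both times
    }
}
&lt;/code&gt;&lt;/pre&gt;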
&lt;h4&gt;Caching, briefly&lt;/h4&gt;
&lt;p&gt;HTTP caching is coordinated between server, browser, and intermediaries (CDN, proxies). The knobs:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;Cache-Control: public, max-age=3600&lt;/code&gt; — how long, where.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;ETag: &quot;...&quot;&lt;/code&gt; / &lt;code&gt;Last-Modified: ...&lt;/code&gt; — identity of the current version. Client sends &lt;code&gt;If-None-Match&lt;/code&gt; / &lt;code&gt;If-Modified-Since&lt;/code&gt; on revalidation; server replies &lt;code&gt;304 Not Modified&lt;/code&gt; and no body.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Vary: Accept-Encoding, Authorization&lt;/code&gt; — split cache entries by these request headers.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;CDNs use the same primitives — §6 covers the edge story.&lt;/p&gt;
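&lt;p&gt;The revalidation dance is a few lines in Go (illustrative payload and &lt;code&gt;max-age&lt;/code&gt;; a strong ETag here is just a hash of the bytes):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;package main

import (
    &quot;crypto/sha256&quot;
    &quot;fmt&quot;
    &quot;net/http&quot;
    &quot;net/http/httptest&quot;
)

// etagHandler derives a strong ETag from the payload bytes and answers
// matching revalidations with 304 and no body.
func etagHandler(payload []byte) http.Handler {
    sum := sha256.Sum256(payload)
    etag := fmt.Sprintf(`&quot;%x&quot;`, sum[:8])
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        w.Header().Set(&quot;ETag&quot;, etag)
        w.Header().Set(&quot;Cache-Control&quot;, &quot;private, max-age=60&quot;)
        if r.Header.Get(&quot;If-None-Match&quot;) == etag {
            w.WriteHeader(http.StatusNotModified) // client reuses its cached copy
            return
        }
        w.Write(payload)
    })
}

func main() {
    h := etagHandler([]byte(`{&quot;id&quot;:42}`))

    first := httptest.NewRecorder()
    h.ServeHTTP(first, httptest.NewRequest(http.MethodGet, &quot;/users/42&quot;, nil))

    req := httptest.NewRequest(http.MethodGet, &quot;/users/42&quot;, nil)
    req.Header.Set(&quot;If-None-Match&quot;, first.Header().Get(&quot;ETag&quot;))
    second := httptest.NewRecorder()
    h.ServeHTTP(second, req)

    fmt.Println(first.Code, second.Code) // 200 304
}
&lt;/code&gt;&lt;/pre&gt;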
&lt;h3&gt;HTTPS: HTTP + TLS&lt;/h3&gt;
&lt;p&gt;HTTPS is HTTP encrypted with TLS. The TLS handshake runs once when the connection is established, producing symmetric keys for the rest of the connection.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;sequenceDiagram
    autonumber
    participant C as 🔐 Client
    participant S as 🌐 Server

    rect rgb(219, 234, 254)
    Note over C,S: TLS 1.3 — fresh handshake (1 RTT)
    C-&amp;gt;&amp;gt;+S: ClientHello&amp;lt;br/&amp;gt;(supported ciphers · key share · SNI)
    S--&amp;gt;&amp;gt;-C: ServerHello · cert · Finished&amp;lt;br/&amp;gt;(picks cipher · sends its key share)
    Note over C,S: 🔑 both sides derive the session key
    C-&amp;gt;&amp;gt;S: Finished (encrypted)
    C-&amp;gt;&amp;gt;+S: HTTP GET / (encrypted)
    S--&amp;gt;&amp;gt;-C: HTTP 200 OK (encrypted)
    end

    rect rgb(220, 252, 231)
    Note over C,S: TLS 1.3 resumption — 0 RTT (early data)
    C-&amp;gt;&amp;gt;+S: ClientHello + &amp;lt;b&amp;gt;early data&amp;lt;/b&amp;gt; (encrypted with PSK)
    S--&amp;gt;&amp;gt;-C: ServerHello + response (no handshake wait)
    Note right of S: ⚠️ early data is &amp;lt;br/&amp;gt;replayable — don&apos;t use for&amp;lt;br/&amp;gt;state-changing requests
    end
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;What TLS gives you&lt;/strong&gt; — three things interviewers will ask:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Confidentiality&lt;/strong&gt; — AEAD ciphers (AES-GCM, ChaCha20-Poly1305) encrypt the payload.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Integrity&lt;/strong&gt; — the same AEAD MAC detects tampering.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Authentication&lt;/strong&gt; — the server&apos;s cert, signed by a CA the client trusts, proves you&apos;re talking to &lt;code&gt;example.com&lt;/code&gt;, not an attacker.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;strong&gt;Common interview probes:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;Why is TLS 1.3 faster than 1.2?&lt;/em&gt; One RTT vs two. TLS 1.2 separated key-exchange from Finished; TLS 1.3 combines them and removes obsolete ciphers.&lt;/li&gt;
&lt;li&gt;&lt;em&gt;What is SNI?&lt;/em&gt; Server Name Indication — the hostname the client is asking for, sent &lt;em&gt;unencrypted&lt;/em&gt; in ClientHello. Lets one IP host multiple certs. Encrypted Client Hello (ECH) fixes the leak.&lt;/li&gt;
&lt;li&gt;&lt;em&gt;0-RTT trade-off?&lt;/em&gt; On resumed sessions, the client can send data in its first flight. Great for latency; the data is &lt;em&gt;replayable&lt;/em&gt; if the attacker captures and retransmits it. Don&apos;t use 0-RTT for state-changing requests.&lt;/li&gt;
&lt;/ul&gt;
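&lt;p&gt;In Go, pinning the floor at TLS 1.3 is one field (a sketch: pass &lt;code&gt;nil&lt;/code&gt; roots to use the system trust store; the &lt;code&gt;httptest&lt;/code&gt; server in &lt;code&gt;main&lt;/code&gt; exists only to prove the negotiated version locally):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;package main

import (
    &quot;crypto/tls&quot;
    &quot;crypto/x509&quot;
    &quot;fmt&quot;
    &quot;net/http&quot;
    &quot;net/http/httptest&quot;
)

// newTLSClient refuses anything below TLS 1.3 and asks for HTTP/2 via ALPN.
// Pass nil roots to use the system trust store.
func newTLSClient(roots *x509.CertPool) *http.Client {
    return &amp;amp;http.Client{
        Transport: &amp;amp;http.Transport{
            TLSClientConfig: &amp;amp;tls.Config{
                MinVersion: tls.VersionTLS13,
                RootCAs:    roots,
            },
            ForceAttemptHTTP2: true,
        },
    }
}

func main() {
    // Local TLS server purely to inspect the negotiated version.
    srv := httptest.NewTLSServer(http.NotFoundHandler())
    defer srv.Close()
    pool := x509.NewCertPool()
    pool.AddCert(srv.Certificate())

    resp, err := newTLSClient(pool).Get(srv.URL)
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()
    fmt.Println(resp.TLS.Version == tls.VersionTLS13) // true
}
&lt;/code&gt;&lt;/pre&gt;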
&lt;h3&gt;HTTP/2: binary, multiplexed, one connection&lt;/h3&gt;
&lt;p&gt;HTTP/1.1 opens one TCP connection per in-flight request (browsers cap ~6 per origin). HTTP/2 keeps a single connection and &lt;strong&gt;multiplexes&lt;/strong&gt; many streams over it.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;graph LR
    subgraph H11 [&quot;🐢 HTTP/1.1 — many TCP connections, head-of-line serial&quot;]
        direction TB
        C1[&quot;TCP #1&amp;lt;br/&amp;gt;GET /index.html&quot;]
        C2[&quot;TCP #2&amp;lt;br/&amp;gt;GET /app.js&quot;]
        C3[&quot;TCP #3&amp;lt;br/&amp;gt;GET /style.css&quot;]
        C4[&quot;TCP #4&amp;lt;br/&amp;gt;GET /logo.png&quot;]
    end

    subgraph H2 [&quot;🚀 HTTP/2 — one connection, multiplexed streams&quot;]
        direction TB
        T([&quot;1 TCP + TLS&quot;])
        T --&amp;gt; S1[&quot;stream 1&amp;lt;br/&amp;gt;/index.html&quot;]
        T --&amp;gt; S2[&quot;stream 3&amp;lt;br/&amp;gt;/app.js&quot;]
        T --&amp;gt; S3[&quot;stream 5&amp;lt;br/&amp;gt;/style.css&quot;]
        T --&amp;gt; S4[&quot;stream 7&amp;lt;br/&amp;gt;/logo.png&quot;]
    end

    classDef old fill:#fee2e2,stroke:#dc2626,stroke-width:1.5px,color:#7f1d1d;
    classDef new fill:#dcfce7,stroke:#16a34a,stroke-width:1.5px,color:#14532d;
    classDef hub fill:#dbeafe,stroke:#3b6fd6,stroke-width:2px,color:#0f172a;
    class C1,C2,C3,C4 old
    class S1,S2,S3,S4 new
    class T hub
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Key improvements over 1.1:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Binary framing.&lt;/strong&gt; Every message is split into &lt;code&gt;DATA&lt;/code&gt; / &lt;code&gt;HEADERS&lt;/code&gt; / &lt;code&gt;SETTINGS&lt;/code&gt; frames. Cheap to parse, no ambiguity.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Multiplexing.&lt;/strong&gt; Many concurrent streams on one connection. No head-of-line blocking at the &lt;em&gt;HTTP&lt;/em&gt; layer.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;HPACK header compression.&lt;/strong&gt; Redundant headers (cookies, UA, &lt;code&gt;Host&lt;/code&gt;) are table-indexed instead of resent. Huge win for short requests.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Server push.&lt;/strong&gt; Server can pre-send assets it knows the client will need. (Deprecated by most browsers — misused more often than helpful.)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Stream priorities.&lt;/strong&gt; Clients can weight streams; used to deliver CSS/JS before images.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;The catch.&lt;/strong&gt; HTTP/2 multiplexing lives &lt;em&gt;above&lt;/em&gt; TCP. If one packet is lost, TCP stalls delivery of &lt;em&gt;all&lt;/em&gt; streams until retransmission arrives — HOL blocking at the transport layer, exactly the problem we flagged in §3.&lt;/p&gt;
&lt;h3&gt;HTTP/3: HTTP over QUIC over UDP&lt;/h3&gt;
&lt;p&gt;HTTP/3 replaces TCP with &lt;strong&gt;QUIC&lt;/strong&gt;, a transport protocol built on UDP that combines &quot;TCP semantics + TLS 1.3&quot; into one unified handshake with independent streams:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;HTTP/1.1&lt;/th&gt;
&lt;th&gt;HTTP/2&lt;/th&gt;
&lt;th&gt;HTTP/3&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Transport&lt;/td&gt;
&lt;td&gt;TCP&lt;/td&gt;
&lt;td&gt;TCP&lt;/td&gt;
&lt;td&gt;QUIC (UDP)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Security&lt;/td&gt;
&lt;td&gt;optional TLS&lt;/td&gt;
&lt;td&gt;TLS mandatory (in practice)&lt;/td&gt;
&lt;td&gt;TLS 1.3 mandatory, integrated&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Framing&lt;/td&gt;
&lt;td&gt;text&lt;/td&gt;
&lt;td&gt;binary frames&lt;/td&gt;
&lt;td&gt;binary frames&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multiplexing&lt;/td&gt;
&lt;td&gt;no (multiple TCP)&lt;/td&gt;
&lt;td&gt;yes (1 TCP)&lt;/td&gt;
&lt;td&gt;yes (QUIC streams)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Connection open&lt;/td&gt;
&lt;td&gt;TCP + TLS = 2-3 RTT&lt;/td&gt;
&lt;td&gt;TCP + TLS = 2-3 RTT&lt;/td&gt;
&lt;td&gt;1 RTT (0 RTT on resumption)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Head-of-line blocking&lt;/td&gt;
&lt;td&gt;yes&lt;/td&gt;
&lt;td&gt;yes, at TCP layer&lt;/td&gt;
&lt;td&gt;no — per-stream loss&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Connection migration&lt;/td&gt;
&lt;td&gt;no (IP change breaks it)&lt;/td&gt;
&lt;td&gt;no&lt;/td&gt;
&lt;td&gt;yes (connection ID)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Deployed by&lt;/td&gt;
&lt;td&gt;Everything&lt;/td&gt;
&lt;td&gt;~70% of web&lt;/td&gt;
&lt;td&gt;CDNs + big sites, growing&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;strong&gt;Connection migration&lt;/strong&gt; is the underrated killer feature: QUIC identifies a connection by an ID in the header, not by the 4-tuple. Your phone switches from Wi-Fi to 5G, the IP changes, TCP would reset — QUIC just keeps going.&lt;/p&gt;
&lt;h3&gt;Go &lt;code&gt;http.Client&lt;/code&gt;: the timeouts you must set&lt;/h3&gt;
&lt;p&gt;The zero-value &lt;code&gt;http.Client{}&lt;/code&gt; has &lt;em&gt;no&lt;/em&gt; timeout. A single slow server can hang your entire service. Always configure:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;package main

import (
    &quot;context&quot;
    &quot;net&quot;
    &quot;net/http&quot;
    &quot;time&quot;
)

// prodClient is a reusable, properly-bounded HTTP client.
// One per process — it pools connections internally.
var prodClient = &amp;amp;http.Client{
    Timeout: 10 * time.Second, // total budget: connect + redirects + reading the body
    Transport: &amp;amp;http.Transport{
        DialContext: (&amp;amp;net.Dialer{
            Timeout:   3 * time.Second, // TCP connect
            KeepAlive: 30 * time.Second,
        }).DialContext,
        TLSHandshakeTimeout:   3 * time.Second,
        ResponseHeaderTimeout: 5 * time.Second,
        ExpectContinueTimeout: 1 * time.Second,
        IdleConnTimeout:       90 * time.Second,
        MaxIdleConns:          100,
        MaxIdleConnsPerHost:   10, // bump for high-throughput clients
        ForceAttemptHTTP2:     true,
    },
}

func fetch(ctx context.Context, url string) (*http.Response, error) {
    // Prefer request-level context over client Timeout when the deadline
    // must propagate across service boundaries.
    req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
    if err != nil {
        return nil, err
    }
    req.Header.Set(&quot;Accept&quot;, &quot;application/json&quot;)
    req.Header.Set(&quot;User-Agent&quot;, &quot;myservice/1.0&quot;)
    return prodClient.Do(req)
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;Three timeouts in this snippet and why each exists:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;Dialer.Timeout&lt;/code&gt;&lt;/strong&gt; — how long TCP connect can take. Defends against unreachable hosts.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;TLSHandshakeTimeout&lt;/code&gt;&lt;/strong&gt; — how long TLS can take &lt;em&gt;after&lt;/em&gt; connect.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;ResponseHeaderTimeout&lt;/code&gt;&lt;/strong&gt; — how long the server can take to send status + headers. A slow backend blocking here looks like a hung request — this bounds it without cutting off a legitimately large streaming body.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Bonus: request-level deadlines.&lt;/strong&gt; Prefer &lt;code&gt;ctx, cancel := context.WithTimeout(parent, 800*time.Millisecond)&lt;/code&gt; at the call site over mutating the client — the deadline then propagates cleanly through gRPC/HTTP/database layers downstream.&lt;/p&gt;
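&lt;p&gt;That pattern in miniature (the 800 ms budget and the simulated 2 s downstream call are invented numbers):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;package main

import (
    &quot;context&quot;
    &quot;errors&quot;
    &quot;fmt&quot;
    &quot;time&quot;
)

// callWithBudget attaches an 800 ms budget. A child context can only
// shrink the parent&apos;s deadline, never extend it.
func callWithBudget(parent context.Context) error {
    ctx, cancel := context.WithTimeout(parent, 800*time.Millisecond)
    defer cancel()
    select {
    case &amp;lt;-time.After(2 * time.Second): // simulated slow downstream call
        return nil
    case &amp;lt;-ctx.Done():
        return ctx.Err()
    }
}

func main() {
    err := callWithBudget(context.Background())
    fmt.Println(errors.Is(err, context.DeadlineExceeded)) // true
}
&lt;/code&gt;&lt;/pre&gt;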
&lt;h3&gt;Interview gotchas for §4a&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;Connection: close&lt;/code&gt; vs keep-alive.&lt;/strong&gt; HTTP/1.0 closed by default. HTTP/1.1 keep-alive by default. Servers that emit &lt;code&gt;Connection: close&lt;/code&gt; on every response will cripple your client&apos;s connection pool.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Cookie scoping.&lt;/strong&gt; &lt;code&gt;Domain=example.com&lt;/code&gt; includes subdomains. &lt;code&gt;Secure&lt;/code&gt; restricts to HTTPS. &lt;code&gt;HttpOnly&lt;/code&gt; hides from JS. &lt;code&gt;SameSite=Lax&lt;/code&gt; is the sane default to block CSRF.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Redirect traps.&lt;/strong&gt; &lt;code&gt;301&lt;/code&gt; is cached aggressively. If you deploy &lt;code&gt;301 /old → /new&lt;/code&gt; and later change your mind, clients may never retry. Use &lt;code&gt;302&lt;/code&gt; or &lt;code&gt;307&lt;/code&gt; during rollouts.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;Content-Length&lt;/code&gt; vs &lt;code&gt;Transfer-Encoding: chunked&lt;/code&gt;.&lt;/strong&gt; A response has exactly one. If a reverse proxy (nginx, HAProxy) buffers a chunked response to add &lt;code&gt;Content-Length&lt;/code&gt;, latency on streaming endpoints goes up. Turn buffering off at the proxy for SSE/gRPC-web.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;HTTP/2 with self-signed certs.&lt;/strong&gt; Go&apos;s &lt;code&gt;http.Transport&lt;/code&gt; disables HTTP/2 if you set &lt;code&gt;TLSClientConfig.InsecureSkipVerify = true&lt;/code&gt; without also setting &lt;code&gt;NextProtos = []string{&quot;h2&quot;, &quot;http/1.1&quot;}&lt;/code&gt;. Debugging headache at 2am.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;4b. REST vs GraphQL vs gRPC&lt;/h3&gt;
&lt;p&gt;Three API styles, three different philosophies. The interviewer usually isn&apos;t asking &quot;which is best&quot; — they want to see you reason about the trade-off for the problem at hand.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;REST&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;GraphQL&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;gRPC&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Transport&lt;/td&gt;
&lt;td&gt;HTTP/1.1 or 2&lt;/td&gt;
&lt;td&gt;HTTP POST (usually &lt;code&gt;/graphql&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;HTTP/2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Serialization&lt;/td&gt;
&lt;td&gt;JSON (usually)&lt;/td&gt;
&lt;td&gt;JSON&lt;/td&gt;
&lt;td&gt;Protobuf (binary)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Schema&lt;/td&gt;
&lt;td&gt;OpenAPI (optional)&lt;/td&gt;
&lt;td&gt;SDL (required)&lt;/td&gt;
&lt;td&gt;&lt;code&gt;.proto&lt;/code&gt; (required)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Endpoint shape&lt;/td&gt;
&lt;td&gt;many resource URLs&lt;/td&gt;
&lt;td&gt;single endpoint&lt;/td&gt;
&lt;td&gt;RPC methods per service&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Who picks the fields?&lt;/td&gt;
&lt;td&gt;server&lt;/td&gt;
&lt;td&gt;client&lt;/td&gt;
&lt;td&gt;server&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Over-/under-fetching&lt;/td&gt;
&lt;td&gt;easy to hit&lt;/td&gt;
&lt;td&gt;solved&lt;/td&gt;
&lt;td&gt;solved per method&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Streaming&lt;/td&gt;
&lt;td&gt;chunked / SSE&lt;/td&gt;
&lt;td&gt;subscriptions (via WS)&lt;/td&gt;
&lt;td&gt;native: server / client / bidi&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Browser-friendly&lt;/td&gt;
&lt;td&gt;yes&lt;/td&gt;
&lt;td&gt;yes&lt;/td&gt;
&lt;td&gt;no (needs gRPC-Web or Connect)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tooling&lt;/td&gt;
&lt;td&gt;curl, Postman, every lang&lt;/td&gt;
&lt;td&gt;any GraphQL client&lt;/td&gt;
&lt;td&gt;protoc + code generation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Caching&lt;/td&gt;
&lt;td&gt;HTTP cache works out of the box&lt;/td&gt;
&lt;td&gt;hard; client-side libs (Relay, Apollo)&lt;/td&gt;
&lt;td&gt;none built-in; app layer&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Best fit&lt;/td&gt;
&lt;td&gt;public APIs, CRUD, docs-as-product&lt;/td&gt;
&lt;td&gt;many clients, varied views on same data&lt;/td&gt;
&lt;td&gt;internal microservices, high-throughput&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h4&gt;The over-fetching / under-fetching problem&lt;/h4&gt;
&lt;p&gt;The single clearest argument for GraphQL. Imagine a mobile screen that needs user name, last order ID, and unread notification count.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;flowchart TB
    Need([&quot;📱 Mobile needs: &amp;lt;b&amp;gt;name&amp;lt;/b&amp;gt; · &amp;lt;b&amp;gt;lastOrderId&amp;lt;/b&amp;gt; · &amp;lt;b&amp;gt;unreadCount&amp;lt;/b&amp;gt;&quot;])

    subgraph REST [&quot;🔁 REST — 3 round trips&quot;]
        direction TB
        R1[&quot;GET /users/42&amp;lt;br/&amp;gt;&amp;lt;i&amp;gt;returns 20 fields, keeps 1&amp;lt;/i&amp;gt;&quot;]
        R2[&quot;GET /users/42/orders?limit=1&amp;lt;br/&amp;gt;&amp;lt;i&amp;gt;returns full Order, keeps 1 id&amp;lt;/i&amp;gt;&quot;]
        R3[&quot;GET /users/42/notifications/unread-count&quot;]
        R1 --&amp;gt; R2 --&amp;gt; R3
    end

    subgraph GQL [&quot;⚡ GraphQL — 1 round trip, exact shape&quot;]
        direction TB
        G1[&quot;POST /graphql&amp;lt;br/&amp;gt;user(id: 42) &amp;amp;#123; name, lastOrder &amp;amp;#123; id &amp;amp;#125;, unreadCount &amp;amp;#125;&quot;]
    end

    subgraph GRPC [&quot;🛰️ gRPC — 1 round trip, server-defined shape&quot;]
        direction TB
        P1[&quot;UserService.GetProfile(id=42)&amp;lt;br/&amp;gt;&amp;lt;i&amp;gt;server-defined aggregate method&amp;lt;/i&amp;gt;&quot;]
    end

    Need --&amp;gt; REST
    Need --&amp;gt; GQL
    Need --&amp;gt; GRPC

    classDef need fill:#fef3c7,stroke:#d97706,stroke-width:2px,color:#78350f;
    classDef rest fill:#fee2e2,stroke:#dc2626,stroke-width:1.5px,color:#7f1d1d;
    classDef gql fill:#dcfce7,stroke:#16a34a,stroke-width:1.5px,color:#14532d;
    classDef grpc fill:#e0e7ff,stroke:#4f46e5,stroke-width:1.5px,color:#312e81;
    class Need need
    class R1,R2,R3 rest
    class G1 gql
    class P1 grpc
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;REST gets you the JSON but wastes bandwidth (the user endpoint alone returns 20 fields you keep one of) and demands multiple round trips. GraphQL lets the client ask for exactly what it wants. gRPC solves it too, but by defining a &lt;em&gt;server-side&lt;/em&gt; aggregate method — if the mobile and web teams want different shapes you end up with &lt;code&gt;GetProfileForWeb&lt;/code&gt; and &lt;code&gt;GetProfileForMobile&lt;/code&gt;, which is fine for a handful of clients but doesn&apos;t scale like GraphQL does.&lt;/p&gt;
&lt;h4&gt;REST, done well&lt;/h4&gt;
&lt;p&gt;The mental model is &lt;strong&gt;resources (nouns) acted on by HTTP methods (verbs)&lt;/strong&gt;. URLs name the resources hierarchically; methods supply the verbs; status codes report the outcome.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;GET    /v1/users                   list
GET    /v1/users/42                read
POST   /v1/users                   create
PUT    /v1/users/42                replace (idempotent)
PATCH  /v1/users/42                partial update
DELETE /v1/users/42                delete
GET    /v1/users/42/orders         sub-resource
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Conventions that save you from bike-shed arguments:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Cursor pagination&lt;/strong&gt; — &lt;code&gt;?cursor=eyJpZCI6Mjc...&amp;amp;limit=50&lt;/code&gt;. Not offset/limit — that&apos;s O(N) on the DB.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Filter/sort&lt;/strong&gt; as query params — &lt;code&gt;?status=active&amp;amp;sort=-createdAt&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Versioning&lt;/strong&gt; — keep it in the path (&lt;code&gt;/v1/&lt;/code&gt;, &lt;code&gt;/v2/&lt;/code&gt;). Header-based versioning is clever and will bite you during debugging.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Errors&lt;/strong&gt; — RFC 7807 &lt;code&gt;application/problem+json&lt;/code&gt;: &lt;code&gt;{&quot;type&quot;:&quot;...&quot;, &quot;title&quot;:&quot;...&quot;, &quot;detail&quot;:&quot;...&quot;, &quot;instance&quot;:&quot;...&quot;}&lt;/code&gt;. Stop inventing shapes.&lt;/li&gt;
&lt;/ul&gt;
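&lt;p&gt;A minimal sketch of what the opaque cursor can be under the hood: base64-encoded JSON carrying the last row&apos;s sort key. The &lt;code&gt;Cursor&lt;/code&gt; shape and field name here are illustrative, not a standard:&lt;/p&gt;

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// Cursor is one possible cursor payload: the sort key of the last row
// the previous page returned. The field name is illustrative.
type Cursor struct {
	ID int64 `json:"id"`
}

// Encode produces the opaque token that goes in ?cursor=.
func Encode(c Cursor) string {
	b, _ := json.Marshal(c)
	return base64.RawURLEncoding.EncodeToString(b)
}

// Decode parses the token back. The next page is then
// SELECT ... WHERE id > $1 ORDER BY id LIMIT $2: an index seek,
// not the O(N) scan that OFFSET forces.
func Decode(token string) (Cursor, error) {
	var c Cursor
	b, err := base64.RawURLEncoding.DecodeString(token)
	if err != nil {
		return c, err
	}
	return c, json.Unmarshal(b, &c)
}

func main() {
	tok := Encode(Cursor{ID: 27})
	c, _ := Decode(tok)
	fmt.Println(tok, c.ID)
}
```

&lt;p&gt;Because the token is opaque to clients, you can later add a timestamp or shard hint to the struct without breaking anyone.&lt;/p&gt;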
&lt;p&gt;&lt;strong&gt;HATEOAS&lt;/strong&gt; (hypermedia links in responses) is the theoretically pure form but is almost never shipped. If the interviewer asks, explain what it is and note most production APIs don&apos;t bother.&lt;/p&gt;
&lt;h4&gt;GraphQL: one endpoint, client-shaped responses&lt;/h4&gt;
&lt;p&gt;The server publishes a typed schema; the client sends a query describing the shape it wants; the runtime walks the query and calls &lt;strong&gt;resolvers&lt;/strong&gt; to fetch each field.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;type User {
  id: ID!
  name: String!
  email: String!
  orders(limit: Int = 10): [Order!]!
  unreadCount: Int!
}

type Order {
  id: ID!
  total: Money!
  items: [LineItem!]!
}

type Query {
  user(id: ID!): User
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Client sends:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;query Profile($id: ID!) {
  user(id: $id) {
    name
    orders(limit: 1) { id total { amount currency } }
    unreadCount
  }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;The famous trap — N+1 queries.&lt;/strong&gt; The &lt;code&gt;orders&lt;/code&gt; resolver is called once per &lt;code&gt;User&lt;/code&gt;. If you&apos;re listing 50 users and blindly loop, that&apos;s 50 DB round-trips. The fix is a &lt;strong&gt;DataLoader&lt;/strong&gt; — batches + caches per-request:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;// pseudo-Go: inside an HTTP request-scoped loader
loader := dataloader.New(func(ctx context.Context, ids []int) []*Order {
    // one SQL: SELECT ... WHERE user_id = ANY($1)
    return db.OrdersByUserIDs(ctx, ids)
})
// Each resolver call just does: loader.Load(userID) → coalesced into 1 query
&lt;/code&gt;&lt;/pre&gt;
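&lt;p&gt;To make the batching concrete, here is a deliberately simplified, request-scoped loader in plain Go. It skips the goroutine coalescing that real loaders such as &lt;code&gt;graph-gophers/dataloader&lt;/code&gt; add; all names are illustrative:&lt;/p&gt;

```go
package main

import "fmt"

// Order is a stand-in for the real model.
type Order struct{ ID, UserID int }

// OrderLoader caches per request and resolves cache misses with ONE
// batched fetch instead of one query per user.
type OrderLoader struct {
	cache   map[int]*Order
	fetch   func(userIDs []int) map[int]*Order // one SQL with = ANY($1)
	Batches int                                // how many times fetch ran
}

func NewOrderLoader(fetch func([]int) map[int]*Order) *OrderLoader {
	return &OrderLoader{cache: map[int]*Order{}, fetch: fetch}
}

// LoadAll returns orders for all requested users, hitting the cache
// first and batching whatever is missing into a single fetch.
func (l *OrderLoader) LoadAll(userIDs []int) map[int]*Order {
	var miss []int
	for _, id := range userIDs {
		if _, ok := l.cache[id]; !ok {
			miss = append(miss, id)
		}
	}
	if len(miss) > 0 {
		l.Batches++
		for id, o := range l.fetch(miss) {
			l.cache[id] = o
		}
	}
	out := make(map[int]*Order, len(userIDs))
	for _, id := range userIDs {
		out[id] = l.cache[id]
	}
	return out
}

func main() {
	l := NewOrderLoader(func(ids []int) map[int]*Order {
		m := map[int]*Order{}
		for _, id := range ids {
			m[id] = &Order{ID: id * 100, UserID: id}
		}
		return m
	})
	l.LoadAll([]int{1, 2, 3})
	fmt.Println(l.Batches) // 50 users would still be one batch
}
```

&lt;p&gt;Resolving 50 users through &lt;code&gt;LoadAll&lt;/code&gt; still costs one batched query, and repeat loads within the same request hit the cache.&lt;/p&gt;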
&lt;p&gt;&lt;strong&gt;Other GraphQL-isms to have an answer for:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Mutations&lt;/strong&gt; are a separate root type; they execute sequentially (not parallel like &lt;code&gt;Query&lt;/code&gt; fields).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Subscriptions&lt;/strong&gt; push server → client; transport is usually WebSocket.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Persisted queries&lt;/strong&gt; — client registers queries at build time; at runtime it only sends the query ID. Saves bandwidth, forbids arbitrary queries, defuses the &quot;malicious client writes an expensive query&quot; attack.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Caching&lt;/strong&gt; is the hardest part. Apollo Client normalizes objects by &lt;code&gt;__typename + id&lt;/code&gt; client-side; server-side is usually cache-miss territory unless you&apos;re doing persisted queries + HTTP cache headers.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;gRPC: typed RPCs over HTTP/2&lt;/h4&gt;
&lt;p&gt;You write a &lt;code&gt;.proto&lt;/code&gt;, the compiler generates typed client + server stubs in every language you use.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;syntax = &quot;proto3&quot;;
package user.v1;

service UserService {
  rpc GetProfile(GetProfileRequest) returns (UserProfile);
  rpc WatchProfile(GetProfileRequest) returns (stream UserProfile);   // server stream
  rpc ImportUsers(stream UserInput) returns (ImportSummary);          // client stream
  rpc Chat(stream ChatMessage) returns (stream ChatMessage);           // bidirectional
}

message GetProfileRequest { string user_id = 1; }

message UserProfile {
  string user_id = 1;
  string name    = 2;
  string email   = 3;
  int32  unread_count = 4;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Go server, unary method:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;type userServer struct {
    pb.UnimplementedUserServiceServer
    db *sql.DB
}

func (s *userServer) GetProfile(
    ctx context.Context, req *pb.GetProfileRequest,
) (*pb.UserProfile, error) {
    // context carries the client&apos;s deadline + cancellation + metadata
    var p pb.UserProfile
    err := s.db.QueryRowContext(ctx,
        `SELECT user_id, name, email, unread_count FROM users WHERE user_id=$1`,
        req.GetUserId(),
    ).Scan(&amp;amp;p.UserId, &amp;amp;p.Name, &amp;amp;p.Email, &amp;amp;p.UnreadCount)
    if err == sql.ErrNoRows {
        return nil, status.Errorf(codes.NotFound, &quot;user %s not found&quot;, req.GetUserId())
    }
    if err != nil {
        return nil, status.Errorf(codes.Internal, &quot;db: %v&quot;, err)
    }
    return &amp;amp;p, nil
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Go client call, with deadline:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;conn, err := grpc.NewClient(&quot;user-svc:50051&quot;,
    grpc.WithTransportCredentials(insecure.NewCredentials()))
if err != nil { return err }
defer conn.Close()

client := pb.NewUserServiceClient(conn)
ctx, cancel := context.WithTimeout(ctx, 300*time.Millisecond)
defer cancel()

profile, err := client.GetProfile(ctx, &amp;amp;pb.GetProfileRequest{UserId: &quot;42&quot;})
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;The four streaming modes&lt;/strong&gt; — interviewers love this picture:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;sequenceDiagram
    autonumber
    participant C as 🖥️ Client
    participant S as 🛰️ Server

    rect rgb(219, 234, 254)
    Note over C,S: 1️⃣ Unary — classic request / response
    C-&amp;gt;&amp;gt;+S: GetProfile(id=42)
    S--&amp;gt;&amp;gt;-C: UserProfile
    end

    rect rgb(220, 252, 231)
    Note over C,S: 2️⃣ Server streaming — one request, many responses
    C-&amp;gt;&amp;gt;+S: WatchProfile(id=42)
    S--&amp;gt;&amp;gt;C: UserProfile v1
    S--&amp;gt;&amp;gt;C: UserProfile v2
    S--&amp;gt;&amp;gt;-C: UserProfile v3 ...
    end

    rect rgb(254, 243, 199)
    Note over C,S: 3️⃣ Client streaming — many requests, one summary
    C-&amp;gt;&amp;gt;+S: ImportUsers (user_1)
    C-&amp;gt;&amp;gt;S: ImportUsers (user_2)
    C-&amp;gt;&amp;gt;S: ImportUsers (user_3)
    S--&amp;gt;&amp;gt;-C: ImportSummary (count=3)
    end

    rect rgb(237, 214, 255)
    Note over C,S: 4️⃣ Bidirectional — full-duplex, interleaved
    C-&amp;gt;&amp;gt;+S: Chat (hello)
    S--&amp;gt;&amp;gt;C: Chat (hi)
    C-&amp;gt;&amp;gt;S: Chat (how are you)
    S--&amp;gt;&amp;gt;-C: Chat (good)
    end
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;Why gRPC wins for internal microservices:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Protobuf is small (1.5-5× smaller than JSON on the wire) and fast to marshal.&lt;/li&gt;
&lt;li&gt;HTTP/2 multiplexing + long-lived connections → low latency and no HTTP-level head-of-line blocking &lt;em&gt;within a service mesh&lt;/em&gt; (TCP-level head-of-line blocking can still bite on lossy links).&lt;/li&gt;
&lt;li&gt;&lt;code&gt;context.Context&lt;/code&gt; — deadlines and cancellations propagate across service hops out of the box.&lt;/li&gt;
&lt;li&gt;Status codes are a closed set (&lt;code&gt;codes.NotFound&lt;/code&gt;, &lt;code&gt;codes.DeadlineExceeded&lt;/code&gt;, …), not free-form strings.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Where gRPC hurts:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Browsers can&apos;t speak gRPC natively&lt;/strong&gt; (needs trailers HTTP/2 makes awkward in the browser). Solutions: &lt;strong&gt;gRPC-Web&lt;/strong&gt; (proxy translates) or &lt;strong&gt;Connect&lt;/strong&gt; (gRPC-compatible, works over HTTP/1.1 too).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Bigger learning curve&lt;/strong&gt; — protoc toolchain, codegen in every language, backward-compat discipline (&lt;code&gt;reserved&lt;/code&gt;, field numbers never reused).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Observability&lt;/strong&gt; is trickier than REST (no URL pattern in logs; need OTel/tracing from day one).&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;Decision rubric&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Pick REST when&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The API is public or consumed by many external devs.&lt;/li&gt;
&lt;li&gt;Humans browse it (Postman, curl, docs).&lt;/li&gt;
&lt;li&gt;You want the HTTP cache to do real work (CDN, browser).&lt;/li&gt;
&lt;li&gt;CRUD on resources is most of what you&apos;re doing.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Pick GraphQL when&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Many clients (web, iOS, Android) need different slices of the same underlying data graph.&lt;/li&gt;
&lt;li&gt;Backends-for-frontends would otherwise proliferate.&lt;/li&gt;
&lt;li&gt;The team can invest in schema review, DataLoader discipline, and persisted-query infra.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Pick gRPC when&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Internal service-to-service traffic, especially polyglot teams (Go + Python + Java).&lt;/li&gt;
&lt;li&gt;Strong typing across languages is worth the toolchain cost.&lt;/li&gt;
&lt;li&gt;Streaming or low-latency RPC is a first-class need.&lt;/li&gt;
&lt;li&gt;You have a service mesh, tracing, and observability to absorb the operational tax.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;Interview gotchas for §4b&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;REST isn&apos;t an RFC.&lt;/strong&gt; There&apos;s no committee defining &quot;correct REST.&quot; Different teams mean different things. Lead with your own definition.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;GraphQL security surface.&lt;/strong&gt; A single expressive query can be a denial-of-service primitive — one field can resolve into 10,000 DB reads. Production deployments need &lt;strong&gt;query depth limits&lt;/strong&gt;, &lt;strong&gt;query cost analysis&lt;/strong&gt;, &lt;strong&gt;persisted queries&lt;/strong&gt;, and &lt;strong&gt;rate-limiting by user&lt;/strong&gt;, not by endpoint.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;gRPC deadline inheritance.&lt;/strong&gt; If a &lt;code&gt;GetProfile&lt;/code&gt; handler calls three downstream services sequentially and just passes along the same context, each call thinks it has the whole 300ms, so one slow leg can burn the entire budget and leave nothing for the rest. Budget it out: subtract the expected cost of each leg (or at least be intentional about it).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Version drift in protobuf.&lt;/strong&gt; Once a &lt;code&gt;.proto&lt;/code&gt; is deployed, you cannot reuse a field number. &lt;code&gt;reserved 7, 9;&lt;/code&gt; prevents someone from re-assigning later. Forgetting this breaks wire compatibility silently.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&quot;Why not just JSON-RPC?&quot;&lt;/strong&gt; — a valid interview probe. JSON-RPC is lighter-weight than gRPC but lacks streaming, codegen, and HTTP/2 flow-control. Fine for a small internal tool, not for a service mesh.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;4c. SSE vs WebSocket vs WebRTC&lt;/h3&gt;
&lt;p&gt;Three ways to do &quot;real-time.&quot; The interviewer wants to see you pick the simplest one that solves the problem — not the coolest.&lt;/p&gt;
&lt;h4&gt;Start with the decision tree&lt;/h4&gt;
&lt;pre&gt;&lt;code&gt;flowchart TD
    Q([&quot;Do you need &amp;lt;b&amp;gt;real-time&amp;lt;/b&amp;gt; updates?&quot;])
    D1{&quot;Direction?&quot;}
    D2{&quot;Payload type?&quot;}
    D3{&quot;Latency bound?&quot;}

    POLL([&quot;Long-polling / plain HTTP is fine ✅&quot;])
    SSE([&quot;&amp;lt;b&amp;gt;SSE&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;server → client&amp;lt;br/&amp;gt;text only, auto-reconnect&quot;])
    WS([&quot;&amp;lt;b&amp;gt;WebSocket&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;full-duplex&amp;lt;br/&amp;gt;text or binary&quot;])
    RTC([&quot;&amp;lt;b&amp;gt;WebRTC&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;peer-to-peer&amp;lt;br/&amp;gt;sub-100ms media + data&quot;])

    Q --&amp;gt; |no, updates can wait seconds| POLL
    Q --&amp;gt; |yes| D1
    D1 --&amp;gt; |server → client only| SSE
    D1 --&amp;gt; |both directions| D2
    D2 --&amp;gt; |text / structured| WS
    D2 --&amp;gt; |audio / video / low-latency data| D3
    D3 --&amp;gt; |p2p · every ms matters| RTC
    D3 --&amp;gt; |OK with a server relay| WS

    classDef question fill:#fef3c7,stroke:#d97706,stroke-width:2px,color:#78350f;
    classDef answer fill:#dcfce7,stroke:#16a34a,stroke-width:1.5px,color:#14532d;
    classDef fallback fill:#f3f4f6,stroke:#6b7280,stroke-width:1.5px,color:#1f2937;
    class Q,D1,D2,D3 question
    class SSE,WS,RTC answer
    class POLL fallback
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;One-line rule of thumb&lt;/strong&gt;: &lt;strong&gt;SSE for feeds, WebSocket for chat, WebRTC for media.&lt;/strong&gt;&lt;/p&gt;
&lt;h3&gt;SSE — the boring answer that&apos;s often right&lt;/h3&gt;
&lt;p&gt;Server-Sent Events is just an HTTP response that never ends. The server holds the connection open and writes &lt;code&gt;data: ...\n\n&lt;/code&gt; chunks whenever it has something to say. The browser has a built-in &lt;code&gt;EventSource&lt;/code&gt; API that does reconnect + last-event-id for you.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;sequenceDiagram
    autonumber
    participant C as 🖥️ Browser
    participant S as 🌐 Server

    rect rgb(219, 234, 254)
    Note over C,S: ① Open the stream — one HTTP request, never-ending response
    C-&amp;gt;&amp;gt;+S: GET /stream · Accept: text/event-stream
    S--&amp;gt;&amp;gt;C: 200 OK · Content-Type: text/event-stream&amp;lt;br/&amp;gt;Cache-Control: no-store · Connection: keep-alive
    end

    rect rgb(220, 252, 231)
    Note over C,S: ② Server pushes events whenever it wants
    S--&amp;gt;&amp;gt;C: event: price · data: &amp;amp;#123;BTC: 67123&amp;amp;#125;
    S--&amp;gt;&amp;gt;C: event: price · data: &amp;amp;#123;BTC: 67140&amp;amp;#125;
    S--&amp;gt;&amp;gt;C: 💓 : heartbeat (comment line keeps proxies awake)
    S--&amp;gt;&amp;gt;-C: event: price · data: &amp;amp;#123;BTC: 67089&amp;amp;#125;
    end

    rect rgb(254, 226, 226)
    Note right of C: ⚠️ connection drops
    end

    rect rgb(254, 243, 199)
    Note over C,S: ③ EventSource auto-reconnects with Last-Event-ID
    C-&amp;gt;&amp;gt;S: GET /stream · Last-Event-ID: 42
    end
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;Why it&apos;s underrated:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;It&apos;s just &lt;strong&gt;HTTP&lt;/strong&gt; — CDNs, proxies, auth cookies, browser DevTools, all already work.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;EventSource&lt;/code&gt; handles reconnect + backoff automatically.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;id:&lt;/code&gt; + &lt;code&gt;Last-Event-ID&lt;/code&gt; gives you resumable, at-least-once replay for free if the server indexes events by ID.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Where it bites:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Unidirectional&lt;/strong&gt; — for an upstream &quot;ack,&quot; use a second &lt;code&gt;POST&lt;/code&gt; endpoint. Awkward if you need true bi-di.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Text only&lt;/strong&gt; — send JSON, not binary. Encode binary if you must.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;HTTP/1.1 6-connection limit per origin.&lt;/strong&gt; If you open SSE on 7 tabs of your app, the 7th hangs. Fix: use HTTP/2 (same-origin streams are multiplexed).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Proxy buffering.&lt;/strong&gt; Nginx / CDNs love to buffer responses. Disable per-route (&lt;code&gt;proxy_buffering off;&lt;/code&gt;, &lt;code&gt;X-Accel-Buffering: no&lt;/code&gt;) — otherwise clients see nothing until the server flushes 8 KB.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Go handler:&lt;/strong&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;func priceStream(w http.ResponseWriter, r *http.Request) {
    // Headers SSE spec requires.
    w.Header().Set(&quot;Content-Type&quot;, &quot;text/event-stream&quot;)
    w.Header().Set(&quot;Cache-Control&quot;, &quot;no-store&quot;)
    w.Header().Set(&quot;Connection&quot;, &quot;keep-alive&quot;)
    // Defeat proxy buffering — crucial for nginx, Cloudflare.
    w.Header().Set(&quot;X-Accel-Buffering&quot;, &quot;no&quot;)

    // The flusher lets us force chunks out immediately.
    flusher, ok := w.(http.Flusher)
    if !ok {
        http.Error(w, &quot;streaming unsupported&quot;, http.StatusInternalServerError)
        return
    }

    // Heartbeat every 15 s so intermediary idle timeouts don&apos;t close us.
    heartbeat := time.NewTicker(15 * time.Second)
    defer heartbeat.Stop()

    updates := subscribePrices(r.Context()) // chan PriceTick

    var id int
    for {
        select {
        case &amp;lt;-r.Context().Done():
            return // client disconnected
        case &amp;lt;-heartbeat.C:
            fmt.Fprint(w, &quot;: ping\n\n&quot;) // comment line = keep-alive
            flusher.Flush()
        case tick, ok := &amp;lt;-updates:
            if !ok {
                return
            }
            id++
            // &apos;id:&apos; makes it resumable via Last-Event-ID on reconnect.
            fmt.Fprintf(w, &quot;id: %d\nevent: price\ndata: %s\n\n&quot;,
                id, tick.JSON())
            flusher.Flush()
        }
    }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;WebSocket — when you need full-duplex&lt;/h3&gt;
&lt;p&gt;WebSocket upgrades an HTTP connection into a long-lived, full-duplex, &lt;strong&gt;message-framed&lt;/strong&gt; TCP stream. After the upgrade, the two sides exchange binary or text frames in either direction with almost no per-message overhead.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;sequenceDiagram
    autonumber
    participant C as 🖥️ Client
    participant S as 🌐 Server

    rect rgb(219, 234, 254)
    Note over C,S: ① HTTP upgrade — the only HTTP round-trip in a WS session
    C-&amp;gt;&amp;gt;+S: GET /ws HTTP/1.1&amp;lt;br/&amp;gt;Upgrade: websocket&amp;lt;br/&amp;gt;Sec-WebSocket-Key: x3JJ…&amp;lt;br/&amp;gt;Sec-WebSocket-Version: 13
    S--&amp;gt;&amp;gt;-C: 🎉 101 Switching Protocols&amp;lt;br/&amp;gt;Upgrade: websocket&amp;lt;br/&amp;gt;Sec-WebSocket-Accept: hash(key + GUID)
    end

    rect rgb(220, 252, 231)
    Note over C,S: ② Full-duplex frames — either side sends any time
    C-&amp;gt;&amp;gt;S: TEXT · &amp;amp;#123;action: subscribe, room: 42&amp;amp;#125;
    S--&amp;gt;&amp;gt;C: TEXT · &amp;amp;#123;msg: welcome&amp;amp;#125;
    S--&amp;gt;&amp;gt;C: 🔢 BINARY · 0xCAFE…
    end

    rect rgb(254, 243, 199)
    Note over C,S: ③ Keep-alive control frames
    C-&amp;gt;&amp;gt;S: PING
    S--&amp;gt;&amp;gt;C: PONG
    end

    rect rgb(254, 226, 226)
    Note over C,S: ④ Graceful close
    C-&amp;gt;&amp;gt;S: CLOSE (1000 normal)
    S--&amp;gt;&amp;gt;C: CLOSE (ack)
    end
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;Highlights:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;101 Switching Protocols&lt;/code&gt;&lt;/strong&gt; is the magic status code — the server accepts the upgrade, TCP stays open, the protocol changes under it.&lt;/li&gt;
&lt;li&gt;Client frames are &lt;strong&gt;masked&lt;/strong&gt; (XOR with a per-frame key) to defeat cache-poisoning attacks on legacy proxies. Server frames are not.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;TEXT&lt;/strong&gt; frames must be valid UTF-8; &lt;strong&gt;BINARY&lt;/strong&gt; frames don&apos;t. Use BINARY for protobuf, msgpack, images.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;PING/PONG&lt;/strong&gt; frames are control-plane only. Use them; without keepalive, NAT timers + proxy idle timers (~60-120 s) will close the socket silently.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Go, with gorilla/websocket&lt;/strong&gt; (still the de-facto library even after its archival; &lt;code&gt;gobwas/ws&lt;/code&gt; is the zero-alloc alternative):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;var upgrader = websocket.Upgrader{
    ReadBufferSize:  1024,
    WriteBufferSize: 1024,
    CheckOrigin: func(r *http.Request) bool {
        // Enforce same-origin. WebSocket doesn&apos;t respect CORS —
        // you enforce origin policy yourself here.
        return r.Header.Get(&quot;Origin&quot;) == &quot;https://app.example.com&quot;
    },
}

func wsHandler(w http.ResponseWriter, r *http.Request) {
    conn, err := upgrader.Upgrade(w, r, nil)
    if err != nil {
        return // Upgrade already wrote the error response.
    }
    defer conn.Close()

    // Ping/pong: send pings every 30s, expect pongs within 60s.
    conn.SetReadDeadline(time.Now().Add(60 * time.Second))
    conn.SetPongHandler(func(string) error {
        conn.SetReadDeadline(time.Now().Add(60 * time.Second))
        return nil
    })

    go func() {
        t := time.NewTicker(30 * time.Second)
        defer t.Stop()
        for range t.C {
            if err := conn.WriteControl(
                websocket.PingMessage, nil,
                time.Now().Add(5*time.Second),
            ); err != nil {
                return
            }
        }
    }()

    for {
        msgType, data, err := conn.ReadMessage()
        if err != nil {
            return // client gone or deadline exceeded
        }
        // Echo server — replace with real routing.
        if err := conn.WriteMessage(msgType, data); err != nil {
            return
        }
    }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;Gotchas:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;No built-in auth.&lt;/strong&gt; The upgrade request &lt;em&gt;is&lt;/em&gt; HTTP, so do auth there (cookie / &lt;code&gt;Authorization&lt;/code&gt; header). Some browsers strip the &lt;code&gt;Authorization&lt;/code&gt; header on WS upgrades — use a cookie or a token-in-URL.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;CheckOrigin&lt;/code&gt; is opt-in.&lt;/strong&gt; Forget it and you&apos;ve just built CSRF-over-WebSocket.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;No request / response semantics.&lt;/strong&gt; You send frames and hope. Implement a correlation-ID in your JSON envelope if you need req/resp on top.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Horizontal scaling.&lt;/strong&gt; WS sockets are sticky to one pod. When a pod dies, its users reconnect — and may land on a pod with no state. Fan out via Redis pub/sub or a broker (NATS, Redis Streams, Kafka).&lt;/li&gt;
&lt;/ul&gt;
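&lt;p&gt;The correlation-ID envelope from the gotcha above can be tiny. This is a hypothetical shape; the field names and the pending-map are illustrative, not a standard:&lt;/p&gt;

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Envelope is a hypothetical request/response wrapper layered on top of
// WebSocket frames: the requester generates CorrelationID, the other side
// echoes it back, and the reply is matched to the pending request.
type Envelope struct {
	CorrelationID string          `json:"cid"`
	Type          string          `json:"type"`
	Payload       json.RawMessage `json:"payload,omitempty"`
}

// pending maps in-flight correlation IDs to reply channels.
type pending map[string]chan Envelope

// send registers a request and returns the channel its reply will land on.
func (p pending) send(id string) chan Envelope {
	ch := make(chan Envelope, 1)
	p[id] = ch
	return ch
}

// deliver routes an incoming frame to its waiting requester, if any.
func (p pending) deliver(e Envelope) bool {
	ch, ok := p[e.CorrelationID]
	if !ok {
		return false // reply for an unknown / timed-out request
	}
	delete(p, e.CorrelationID)
	ch <- e
	return true
}

func main() {
	p := pending{}
	reply := p.send("req-1")

	// Simulate the read loop receiving the peer's frame.
	var e Envelope
	_ = json.Unmarshal([]byte(`{"cid":"req-1","type":"subscribed"}`), &e)
	p.deliver(e)

	fmt.Println((<-reply).Type)
}
```

&lt;p&gt;In a real read loop you&apos;d guard the map with a mutex and pair each entry with a timeout, but the matching logic stays this small.&lt;/p&gt;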
&lt;h3&gt;WebRTC — peer-to-peer, media-grade&lt;/h3&gt;
&lt;p&gt;WebRTC lets two browsers talk &lt;strong&gt;directly&lt;/strong&gt; (peer-to-peer), bypassing your server for the heavy media streams. You still need a server — the &lt;strong&gt;signaling&lt;/strong&gt; server — to help them find each other and exchange connection info.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;sequenceDiagram
    autonumber
    participant A as 👤 Peer A
    participant SIG as 📡 Signaling&amp;lt;br/&amp;gt;(your WS server)
    participant ST as 🛰️ STUN
    participant TN as 🔁 TURN
    participant B as 👤 Peer B

    rect rgb(219, 234, 254)
    Note over A,B: ① Signal (through your server — usually WebSocket)
    A-&amp;gt;&amp;gt;+SIG: offer (SDP)
    SIG-&amp;gt;&amp;gt;+B: offer (SDP)
    B-&amp;gt;&amp;gt;-SIG: answer (SDP)
    SIG-&amp;gt;&amp;gt;-A: answer (SDP)
    end

    rect rgb(254, 243, 199)
    Note over A,B: ② ICE — discover reachable addresses
    A-&amp;gt;&amp;gt;+ST: &quot;what&apos;s my public IP?&quot;
    ST--&amp;gt;&amp;gt;-A: 203.0.113.1:51000
    A-&amp;gt;&amp;gt;SIG: ICE candidate (host + reflexive)
    SIG-&amp;gt;&amp;gt;B: (forwarded)
    B-&amp;gt;&amp;gt;SIG: ICE candidate (its own)
    SIG-&amp;gt;&amp;gt;A: (forwarded)
    end

    rect rgb(220, 252, 231)
    Note over A,B: ③ Connectivity check — try direct first
    A--&amp;gt;&amp;gt;B: STUN binding
    B--&amp;gt;&amp;gt;A: STUN binding reply
    Note over A,B: ✅ direct path works → done
    end

    rect rgb(254, 226, 226)
    Note over A,B: ④ Fallback: symmetric NAT blocks direct → TURN relay
    A--&amp;gt;&amp;gt;TN: relay allocate
    TN--&amp;gt;&amp;gt;B: relayed packet
    end

    rect rgb(233, 213, 255)
    Note over A,B: ⑤ Media / data flow — end-to-end encrypted
    A--&amp;gt;&amp;gt;B: 🎥 video · 🎤 audio · 📂 data channel&amp;lt;br/&amp;gt;(DTLS-SRTP or SCTP over DTLS, all over UDP)
    B--&amp;gt;&amp;gt;A: 🎥 video · 🎤 audio · 📂 data channel
    end
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;Four things you must know to pass a WebRTC interview:&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Signaling is your problem.&lt;/strong&gt; WebRTC doesn&apos;t dictate &lt;em&gt;how&lt;/em&gt; the SDP offers/answers get across. People use WebSocket, SSE, long-poll, whatever. Pick one from the earlier part of this section.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;NAT traversal&lt;/strong&gt; is why this is hard. Most peers are behind NAT (§2). &lt;strong&gt;STUN&lt;/strong&gt; tells a peer its public IP:port. &lt;strong&gt;TURN&lt;/strong&gt; relays traffic when STUN can&apos;t produce a workable path (symmetric NAT, strict firewalls). Budget for ~10-20 % of calls needing TURN — and TURN bandwidth is your bill.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;ICE&lt;/strong&gt; (Interactive Connectivity Establishment) is the algorithm that collects and prioritizes candidate addresses (host, server-reflexive, relayed), pings each pair, and picks the best one that works.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Two flavors of traffic:&lt;/strong&gt; media streams use &lt;strong&gt;DTLS-SRTP&lt;/strong&gt; (encrypted RTP over UDP). Non-media data uses the &lt;strong&gt;DataChannel&lt;/strong&gt; API, which is SCTP over DTLS over UDP. Both end-to-end encrypted.&lt;/li&gt;
&lt;/ol&gt;
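&lt;p&gt;Because WebRTC leaves signaling to you, the signaling server can be very small: it only forwards opaque blobs (SDP offers/answers, ICE candidates) between peers, never parsing them. A toy in-memory relay, with channels standing in for the WebSocket connections you&apos;d use in practice:&lt;/p&gt;

```go
package main

import (
	"fmt"
	"sync"
)

// hub is a toy signaling relay: peers register under an ID and messages
// are forwarded verbatim. Peer IDs here are illustrative.
type hub struct {
	mu    sync.Mutex
	peers map[string]chan string
}

func newHub() *hub { return &hub{peers: map[string]chan string{}} }

// join registers a peer and returns its inbox.
func (h *hub) join(id string) chan string {
	h.mu.Lock()
	defer h.mu.Unlock()
	ch := make(chan string, 8)
	h.peers[id] = ch
	return ch
}

// relay forwards one signaling message; the hub never inspects it.
func (h *hub) relay(to, msg string) bool {
	h.mu.Lock()
	ch, ok := h.peers[to]
	h.mu.Unlock()
	if ok {
		ch <- msg
	}
	return ok
}

func main() {
	h := newHub()
	inboxB := h.join("peer-b")
	h.join("peer-a")

	// Peer A's SDP offer travels through the hub as an opaque string.
	h.relay("peer-b", "offer: v=0 ...")
	fmt.Println(<-inboxB)
}
```

&lt;p&gt;Everything interesting (ICE, DTLS, media) happens peer-to-peer afterwards; the relay only exists so the two sides can bootstrap.&lt;/p&gt;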
&lt;p&gt;&lt;strong&gt;When to reach for it (and when not to):&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;✅ Video/audio calls, screen share, live &quot;remote desktop.&quot;&lt;/li&gt;
&lt;li&gt;✅ Sub-50 ms data — multiplayer games, collaborative tools where every ms shows.&lt;/li&gt;
&lt;li&gt;❌ Chat. A WebSocket through your server is simpler, cheaper, and easier to moderate.&lt;/li&gt;
&lt;li&gt;❌ Anything you need to log / record server-side. P2P means you&apos;re not in the path.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Comparison, side by side&lt;/h3&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;SSE&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;WebSocket&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;WebRTC&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Direction&lt;/td&gt;
&lt;td&gt;server → client&lt;/td&gt;
&lt;td&gt;bi-directional&lt;/td&gt;
&lt;td&gt;p2p (both)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Transport&lt;/td&gt;
&lt;td&gt;HTTP text stream&lt;/td&gt;
&lt;td&gt;TCP (post upgrade)&lt;/td&gt;
&lt;td&gt;UDP (SCTP / SRTP)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Auto-reconnect&lt;/td&gt;
&lt;td&gt;✅ built-in&lt;/td&gt;
&lt;td&gt;❌ DIY&lt;/td&gt;
&lt;td&gt;❌ renegotiate&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Binary&lt;/td&gt;
&lt;td&gt;❌ (text only)&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Auth&lt;/td&gt;
&lt;td&gt;cookies / headers&lt;/td&gt;
&lt;td&gt;same as HTTP&lt;/td&gt;
&lt;td&gt;out of band (signaling)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Works through strict proxies&lt;/td&gt;
&lt;td&gt;✅ (it&apos;s HTTP)&lt;/td&gt;
&lt;td&gt;mostly&lt;/td&gt;
&lt;td&gt;often needs TURN&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Infra complexity&lt;/td&gt;
&lt;td&gt;lowest&lt;/td&gt;
&lt;td&gt;medium&lt;/td&gt;
&lt;td&gt;highest&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sample use cases&lt;/td&gt;
&lt;td&gt;stock tickers, log tails, AI token stream&lt;/td&gt;
&lt;td&gt;chat, dashboards, collaborative cursors&lt;/td&gt;
&lt;td&gt;Zoom, Meet, Discord voice&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h3&gt;Interview gotchas for §4c&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&quot;Why not long-poll?&quot;&lt;/strong&gt; — a classic warm-up. Long-polling &lt;em&gt;works&lt;/em&gt; but each update is a new TCP + TLS handshake. For a dozen updates/sec, SSE/WS are dramatically cheaper.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Scale the fan-out, not the socket.&lt;/strong&gt; For 1 M concurrent WebSocket users, the limit isn&apos;t TCP — it&apos;s how fast you can broadcast a message to 1 M sockets. Keep a per-room subscriber index, fan-out via Redis pub/sub or Kafka, pin users to regions.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;AI token streaming.&lt;/strong&gt; Both SSE and WebSocket work. Most LLM APIs (OpenAI, Anthropic) ship SSE — it&apos;s simpler, and the stream is strictly server → client.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;wss://&lt;/code&gt; is mandatory in production.&lt;/strong&gt; Mobile carriers and corporate proxies routinely strip or block plain &lt;code&gt;ws://&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;WebRTC without a TURN budget is a demo.&lt;/strong&gt; Your team-coffee prototype works in the office because everyone&apos;s on the same NAT. Real users need TURN, and TURN bandwidth costs real money.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr /&gt;
&lt;h2&gt;5. Load Balancing&lt;/h2&gt;
&lt;p&gt;A load balancer is the thing that lets you say &quot;run N copies of my service&quot; instead of &quot;run my service.&quot; It does three jobs at once:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Horizontal scaling&lt;/strong&gt; — spread load across replicas.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Availability&lt;/strong&gt; — health-check backends, take dead ones out of rotation.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Deployment flexibility&lt;/strong&gt; — gate traffic into new versions for blue/green, canary, rolling.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Every system-design interview touches one of these three.&lt;/p&gt;
&lt;h3&gt;Client-side vs dedicated load balancing&lt;/h3&gt;
&lt;p&gt;Two fundamentally different architectures, with very different failure modes.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;flowchart LR
    subgraph CLIENT [&quot;🧭 Client-side LB&quot;]
        direction TB
        C1[&quot;Client&amp;lt;br/&amp;gt;(with registry cache)&quot;]
        REG[(&quot;Service&amp;lt;br/&amp;gt;registry&amp;lt;br/&amp;gt;(Consul, etcd,&amp;lt;br/&amp;gt;k8s EndpointSlice)&quot;)]
        B1[&quot;Backend A&quot;]
        B2[&quot;Backend B&quot;]
        B3[&quot;Backend C&quot;]
        C1 -.refresh.-&amp;gt; REG
        C1 --&amp;gt;|picks a peer| B1
        C1 --&amp;gt; B2
        C1 --&amp;gt; B3
    end

    subgraph DED [&quot;🏗️ Dedicated LB&quot;]
        direction TB
        C2[Client] --&amp;gt; VIP[&quot;Load balancer&amp;lt;br/&amp;gt;(nginx · Envoy · ALB)&quot;]
        VIP --&amp;gt; D1[Backend A]
        VIP --&amp;gt; D2[Backend B]
        VIP --&amp;gt; D3[Backend C]
        HC[[&quot;health&amp;lt;br/&amp;gt;checks&quot;]] -.active.-&amp;gt; D1
        HC -.-&amp;gt; D2
        HC -.-&amp;gt; D3
        VIP --- HC
    end

    classDef client fill:#fef3c7,stroke:#d97706,stroke-width:1.5px,color:#78350f;
    classDef ded fill:#dbeafe,stroke:#3b6fd6,stroke-width:1.5px,color:#0f172a;
    classDef backend fill:#dcfce7,stroke:#16a34a,stroke-width:1.5px,color:#14532d;
    classDef infra fill:#e9d5ff,stroke:#7c3aed,stroke-width:1.5px,color:#4c1d95;
    class C1,C2 client
    class VIP,HC ded
    class B1,B2,B3,D1,D2,D3 backend
    class REG infra
&lt;/code&gt;&lt;/pre&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Client-side&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Dedicated&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Who picks the backend?&lt;/td&gt;
&lt;td&gt;the client library&lt;/td&gt;
&lt;td&gt;a box in the middle&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Extra network hop?&lt;/td&gt;
&lt;td&gt;no&lt;/td&gt;
&lt;td&gt;yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Failure blast radius&lt;/td&gt;
&lt;td&gt;one client affected&lt;/td&gt;
&lt;td&gt;LB down = everything affected&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Health-check work&lt;/td&gt;
&lt;td&gt;every client&lt;/td&gt;
&lt;td&gt;centralized at the LB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Best fit&lt;/td&gt;
&lt;td&gt;internal RPC (gRPC, Finagle, mesh sidecars)&lt;/td&gt;
&lt;td&gt;HTTP from unknown clients (browsers, mobile)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Typical infra&lt;/td&gt;
&lt;td&gt;Consul / etcd + client library&lt;/td&gt;
&lt;td&gt;ALB / NLB / nginx / Envoy / k8s Service&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Kubernetes &lt;code&gt;Service&lt;/code&gt; objects are quietly a &lt;em&gt;distributed&lt;/em&gt; dedicated LB: each node&apos;s kube-proxy programs iptables/IPVS rules to DNAT traffic to a healthy pod, so there is no single chokepoint box. That&apos;s a nice hybrid — client code talks to a single VIP, but the data plane is per-node.&lt;/p&gt;
&lt;h3&gt;L4 vs L7 — the axis you must know cold&lt;/h3&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;L4 (transport)&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;L7 (application)&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Inspects&lt;/td&gt;
&lt;td&gt;IP + port + TCP flags&lt;/td&gt;
&lt;td&gt;HTTP method, path, headers, cookies, gRPC metadata&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Routing rules&lt;/td&gt;
&lt;td&gt;&quot;all :443 → pool X&quot;&lt;/td&gt;
&lt;td&gt;&quot;&lt;code&gt;/api/v2/*&lt;/code&gt; → pool A, cookie &lt;code&gt;beta=1&lt;/code&gt; → pool B&quot;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;TLS&lt;/td&gt;
&lt;td&gt;passthrough (SNI-only)&lt;/td&gt;
&lt;td&gt;terminate + re-inspect&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CPU cost&lt;/td&gt;
&lt;td&gt;low&lt;/td&gt;
&lt;td&gt;high (parse each request)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Connection reuse&lt;/td&gt;
&lt;td&gt;transparent&lt;/td&gt;
&lt;td&gt;LB owns the pool to backends&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Typical products&lt;/td&gt;
&lt;td&gt;AWS NLB, HAProxy &lt;code&gt;mode tcp&lt;/code&gt;, IPVS&lt;/td&gt;
&lt;td&gt;ALB, nginx, Envoy, Traefik, Kong&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;strong&gt;Rule of thumb&lt;/strong&gt;: if you need header-based routing, canary by cookie, path rewrites, or per-route rate limits → L7. If you need raw throughput for arbitrary TCP (Redis, Postgres replicas, WebSocket pass-through) → L4.&lt;/p&gt;
&lt;h3&gt;Balancing algorithms&lt;/h3&gt;
&lt;p&gt;Assume 4 backends. Each algorithm decides where request &lt;code&gt;N+1&lt;/code&gt; goes.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;flowchart LR
    R[[&quot;Incoming requests&amp;lt;br/&amp;gt;1 · 2 · 3 · 4 · 5 · 6 · 7 · 8 · 9 · 10&quot;]]

    subgraph RR [&quot;🔁 Round-robin&quot;]
        direction TB
        A1[&quot;A: 1,5,9&quot;]
        A2[&quot;B: 2,6,10&quot;]
        A3[&quot;C: 3,7&quot;]
        A4[&quot;D: 4,8&quot;]
    end
    subgraph WRR [&quot;⚖️ Weighted (A=3 · B=1 · C=1 · D=1)&quot;]
        direction TB
        B1[&quot;A: 1,2,3,7,8,9&quot;]
        B2[&quot;B: 4,10&quot;]
        B3[&quot;C: 5&quot;]
        B4[&quot;D: 6&quot;]
    end
    subgraph LC [&quot;📊 Least-connections&quot;]
        direction TB
        C1[&quot;A: 3 open&quot;]
        C2[&quot;B: 4 open&quot;]
        C3[&quot;C: &amp;lt;b&amp;gt;1 open ← next req&amp;lt;/b&amp;gt;&quot;]
        C4[&quot;D: 2 open&quot;]
    end
    subgraph P2C [&quot;🎲 Power of two choices&quot;]
        direction TB
        D1[&quot;pick 2 random:&amp;lt;br/&amp;gt;C (1 open) vs A (3 open)&quot;]
        D2[&quot;send to C&quot;]
    end

    R --&amp;gt; RR
    R --&amp;gt; WRR
    R --&amp;gt; LC
    R --&amp;gt; P2C

    classDef hub fill:#fef3c7,stroke:#d97706,stroke-width:2px,color:#78350f;
    classDef ok fill:#dcfce7,stroke:#16a34a,stroke-width:1.5px,color:#14532d;
    classDef hot fill:#fee2e2,stroke:#dc2626,stroke-width:1.5px,color:#7f1d1d;
    classDef best fill:#dbeafe,stroke:#3b6fd6,stroke-width:2px,color:#0f172a;
    class R hub
    class A1,A2,A3,A4,B1,B2,B3,B4 ok
    class C1,C2,C4,D1 ok
    class C3,D2 best
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Quick take on each:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Round-robin&lt;/strong&gt; — zero state; pathological when backends have different capacity or when request cost varies.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Weighted round-robin&lt;/strong&gt; — tell the LB &quot;backend A is 3× the size,&quot; traffic splits 3:1:1:1. Typical during canary ramp.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Least connections&lt;/strong&gt; — usually the right default when connections are long-lived or request cost has a long tail. Requires the LB to track in-flight requests per backend.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Power of two choices (P2C)&lt;/strong&gt; — pick two backends at random, send to the one with fewer active requests. Surprisingly close to optimal, no global state needed. Used by Envoy, Finagle, NGINX Plus.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;IP hash / session affinity (sticky sessions)&lt;/strong&gt; — hashes source IP (or a cookie) to a backend. Use only when the app has in-memory session state; prefer refactoring the state out.&lt;/li&gt;
&lt;/ul&gt;
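&lt;p&gt;P2C is small enough to sketch. A toy selection loop in Go, assuming each backend exposes an in-flight counter (the backend names and counts here are made up):&lt;/p&gt;

```go
package main

import (
	"fmt"
	"math/rand"
	"sync/atomic"
)

// backend tracks its own in-flight request count.
type backend struct {
	name     string
	inFlight atomic.Int64
}

// pickP2C samples two distinct backends uniformly at random and returns
// the less-loaded of the pair: no global scan, no shared state.
// Assumes len(pool) >= 2.
func pickP2C(pool []*backend) *backend {
	i := rand.Intn(len(pool))
	j := rand.Intn(len(pool) - 1)
	if j >= i {
		j++ // ensure the second pick differs from the first
	}
	a, b := pool[i], pool[j]
	if b.inFlight.Load() < a.inFlight.Load() {
		return b
	}
	return a
}

func main() {
	pool := []*backend{{name: "A"}, {name: "B"}, {name: "C"}, {name: "D"}}
	pool[0].inFlight.Store(3)
	pool[1].inFlight.Store(4) // most loaded: never wins a pairwise comparison
	pool[2].inFlight.Store(1) // least loaded: wins every pair it appears in
	pool[3].inFlight.Store(2)

	counts := map[string]int{}
	for n := 0; n < 10000; n++ {
		counts[pickP2C(pool).name]++
	}
	fmt.Println(counts)
}
```

&lt;p&gt;Sampling two and comparing avoids the herd behavior of global least-connections: no single backend can be everyone&apos;s simultaneous first choice.&lt;/p&gt;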
&lt;h4&gt;Consistent hashing, for cache locality and stateful services&lt;/h4&gt;
&lt;p&gt;When each backend holds &lt;em&gt;different&lt;/em&gt; data (a sharded cache, a stateful computation, a per-user queue) you can&apos;t send requests to just anyone — the answer is only on one specific node. Naive &lt;code&gt;hash(key) % N&lt;/code&gt; remaps almost 100% of keys when &lt;code&gt;N&lt;/code&gt; changes. &lt;strong&gt;Consistent hashing&lt;/strong&gt; moves only ~&lt;code&gt;1/N&lt;/code&gt; of the keys.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;flowchart LR
    subgraph RING [&quot;Hash ring (mod 2³²)&quot;]
        direction TB
        A[&quot;🟩 Backend A&amp;lt;br/&amp;gt;vnodes @ 0, 120, 250&quot;]
        B[&quot;🟦 Backend B&amp;lt;br/&amp;gt;vnodes @ 60, 180, 310&quot;]
        C[&quot;🟧 Backend C&amp;lt;br/&amp;gt;vnodes @ 90, 200, 340&quot;]
    end

    K1[&quot;key &apos;user:42&apos; → hash 85&quot;] --&amp;gt;|&quot;nearest clockwise = 90&quot;| C
    K2[&quot;key &apos;session:abc&apos; → hash 155&quot;] --&amp;gt;|&quot;180&quot;| B
    K3[&quot;key &apos;order:7&apos; → hash 220&quot;] --&amp;gt;|&quot;250&quot;| A

    classDef key fill:#fef3c7,stroke:#d97706,stroke-width:1.5px,color:#78350f;
    classDef a fill:#dcfce7,stroke:#16a34a,stroke-width:1.5px,color:#14532d;
    classDef b fill:#dbeafe,stroke:#3b6fd6,stroke-width:1.5px,color:#0f172a;
    classDef c fill:#fed7aa,stroke:#ea580c,stroke-width:1.5px,color:#7c2d12;
    class K1,K2,K3 key
    class A a
    class B b
    class C c
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;Vnodes&lt;/strong&gt; (virtual nodes) are the trick that makes the distribution uniform. Each physical backend claims ~150 random positions on the ring; a new backend pulls roughly the right slice of keys off its neighbors instead of inheriting whichever contiguous arc happened to be next to its single position. Redis Cluster, DynamoDB, Cassandra, &lt;code&gt;memcached&lt;/code&gt; clients — all consistent-hash with vnodes under the hood.&lt;/p&gt;
&lt;h3&gt;Health checks — active and passive, together&lt;/h3&gt;
&lt;p&gt;Two complementary signals. In production you want both.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Active&lt;/strong&gt; — the LB probes &lt;code&gt;GET /healthz&lt;/code&gt; every 2–5 s. A failure → take out of rotation after &lt;code&gt;N&lt;/code&gt; consecutive misses. Cheap, predictable, but adds background load.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Passive&lt;/strong&gt; — the LB observes live traffic: 5xx responses, connection resets, timeouts. If a backend&apos;s error rate spikes above a threshold, eject it for a cooldown (Envoy&apos;s &lt;strong&gt;outlier detection&lt;/strong&gt;).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The &lt;code&gt;/livez&lt;/code&gt; vs &lt;code&gt;/readyz&lt;/code&gt; distinction matters for Kubernetes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;/livez&lt;/code&gt; (liveness) — &quot;am I alive?&quot; If this fails, k8s restarts the container. Keep it trivial; don&apos;t check DB here (otherwise one slow DB takes out every pod).&lt;/li&gt;
&lt;li&gt;&lt;code&gt;/readyz&lt;/code&gt; (readiness) — &quot;am I ready to serve traffic?&quot; If this fails, k8s takes you out of the Service&apos;s endpoint list but does not restart you. Check dependencies here (DB, caches, downstream auth service).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Slow-start / warm-up.&lt;/strong&gt; When a new backend comes up, don&apos;t immediately send it 25% of traffic — its connection pools are cold, its caches are empty, the JIT hasn&apos;t compiled the hot paths, and database connection handshakes haven&apos;t amortized yet. Envoy has &lt;code&gt;slow_start_config&lt;/code&gt;; nginx has &lt;code&gt;slow_start=30s&lt;/code&gt; in &lt;code&gt;upstream&lt;/code&gt;. Without it, the first pod in a rolling deploy absorbs a latency spike every time.&lt;/p&gt;
&lt;h3&gt;TLS termination: where does the crypto happen?&lt;/h3&gt;
&lt;pre&gt;&lt;code&gt;flowchart LR
    CL[Client] --&amp;gt;|&quot;HTTPS&quot;| LB

    subgraph TERM_LB [&quot;① Terminate at LB&quot;]
        LB1[&quot;LB&amp;lt;br/&amp;gt;(holds cert + key)&quot;]
        LB1 --&amp;gt;|&quot;HTTP&amp;lt;br/&amp;gt;&amp;lt;i&amp;gt;or&amp;lt;/i&amp;gt; re-encrypted HTTPS&quot;| B1[Backend]
    end

    subgraph TERM_BACK [&quot;② Passthrough to backend&quot;]
        LB2[&quot;LB&amp;lt;br/&amp;gt;(L4, SNI-only)&quot;]
        LB2 --&amp;gt;|&quot;HTTPS passthrough&quot;| B2[&quot;Backend&amp;lt;br/&amp;gt;(holds cert + key)&quot;]
    end

    subgraph TERM_MESH [&quot;③ Mesh sidecar (mTLS)&quot;]
        LB3[LB] --&amp;gt;|&quot;HTTPS&quot;| SC1[Envoy sidecar]
        SC1 --&amp;gt;|&quot;localhost plaintext&quot;| B3[Backend]
        SC1 -.mTLS to peers.- SC2[other sidecars]
    end

    classDef lb fill:#dbeafe,stroke:#3b6fd6,stroke-width:1.5px,color:#0f172a;
    classDef backend fill:#dcfce7,stroke:#16a34a,stroke-width:1.5px,color:#14532d;
    classDef mesh fill:#e9d5ff,stroke:#7c3aed,stroke-width:1.5px,color:#4c1d95;
    class LB1,LB2,LB3 lb
    class B1,B2,B3 backend
    class SC1,SC2 mesh
&lt;/code&gt;&lt;/pre&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Terminate at LB&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Passthrough&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Mesh sidecar&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Cert lives on&lt;/td&gt;
&lt;td&gt;LB&lt;/td&gt;
&lt;td&gt;every backend&lt;/td&gt;
&lt;td&gt;sidecar + LB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;L7 routing?&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌ (L4 only)&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;End-to-end encrypted?&lt;/td&gt;
&lt;td&gt;❌ unless LB→backend re-encrypts&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅ (mTLS)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Crypto CPU cost&lt;/td&gt;
&lt;td&gt;centralized on LB&lt;/td&gt;
&lt;td&gt;spread to backends&lt;/td&gt;
&lt;td&gt;on sidecars&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Typical use&lt;/td&gt;
&lt;td&gt;public web app&lt;/td&gt;
&lt;td&gt;pinned certs, compliance&lt;/td&gt;
&lt;td&gt;zero-trust service mesh&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h3&gt;Global / geo load balancing&lt;/h3&gt;
&lt;p&gt;The LB box above is regional. Routing users to the closest healthy &lt;em&gt;region&lt;/em&gt; happens one level up:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;DNS-level&lt;/strong&gt; — the authoritative DNS server returns different &lt;code&gt;A&lt;/code&gt; records based on the resolver&apos;s location (AWS Route 53 latency-based routing, GeoDNS). TTL is the enemy: a dead region bleeds traffic until TTLs expire everywhere. Keep TTLs on globally balanced names low (30-60 s).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Anycast&lt;/strong&gt; — the same IP is announced from multiple BGP points; routers pick the topologically nearest. CDNs and DNS root servers use anycast. Failover is sub-second because it&apos;s a routing update, not a DNS refresh.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;App-level&lt;/strong&gt; — the app decides, possibly overriding DNS. E.g. the web app pins users to their home shard after login.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;§6 goes deeper into CDN / regional.&lt;/p&gt;
&lt;p&gt;Here&apos;s the minimum viable production stack — one region, the seven boxes an interviewer expects you to name:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;flowchart LR
    user@{ shape: circle, label: &quot;User&quot; }
    cdn@{ shape: cloud, label: &quot;CDN&quot; }

    subgraph region [&quot;Region (one of many)&quot;]
        direction TB
        lb@{ shape: stadium, label: &quot;LB (nginx)&quot; }
        app@{ shape: rounded, label: &quot;API (Go)&quot; }
        cache@{ shape: cyl, label: &quot;Redis&quot; }
        db@{ shape: cyl, label: &quot;Postgres&quot; }
    end

    kafka@{ shape: stadium, label: &quot;Kafka (cross-region)&quot; }

    user --&amp;gt;|&quot;HTTPS&quot;| cdn
    cdn --&amp;gt;|&quot;miss → origin&quot;| lb
    lb --&amp;gt;|&quot;route&quot;| app
    app --&amp;gt;|&quot;read/write cache&quot;| cache
    app --&amp;gt;|&quot;sync write&quot;| db
    db --&amp;gt;|&quot;CDC events&quot;| kafka

    classDef neutral fill:#dbeafe,stroke:#3b6fd6,stroke-width:1.5px,color:#0f172a;
    classDef storage fill:#fed7aa,stroke:#ea580c,stroke-width:1.5px,color:#7c2d12;
    classDef highlight fill:#e9d5ff,stroke:#7c3aed,stroke-width:2px,color:#4c1d95;
    class user,cdn,app neutral
    class cache,db,kafka storage
    class lb highlight
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;em&gt;Six logos, one focal point (the LB), one flow from left to right. Replicate the &lt;code&gt;region&lt;/code&gt; group N times for multi-region; the Kafka bus is the only thing that actually crosses the boundary.&lt;/em&gt;&lt;/p&gt;
&lt;h3&gt;Deployment patterns the LB enables&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Blue/green&lt;/strong&gt; — keep blue running, deploy green alongside, flip all traffic in one LB config change. Roll back = flip back.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Canary&lt;/strong&gt; — weighted routing: 1 % → 10 % → 50 % → 100 % to the new version, watching error rate + latency at each step.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Rolling&lt;/strong&gt; — replace pods N at a time; LB takes the restarting pod out of rotation via readiness.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;All three require the LB to separate &quot;in rotation&quot; from &quot;running,&quot; which is exactly what readiness probes + weighted pools give you.&lt;/p&gt;
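&lt;p&gt;The weighted pools behind a canary ramp are usually nginx-style &lt;em&gt;smooth&lt;/em&gt; weighted round-robin, which interleaves a 9:1 split evenly instead of sending nine requests in a row to stable. A sketch (the peer names are illustrative):&lt;/p&gt;

```go
package main

import "fmt"

// peer carries a static weight and a running "current" score.
type peer struct {
	name            string
	weight, current int
}

// nextSWRR is nginx's smooth weighted round-robin: on every pick each
// peer gains its weight, the highest current score wins, and the winner
// pays back the total. A 9:1 weighting yields an evenly interleaved
// split rather than bursts.
func nextSWRR(peers []*peer) *peer {
	total := 0
	var best *peer
	for _, p := range peers {
		p.current += p.weight
		total += p.weight
		if best == nil || p.current > best.current {
			best = p
		}
	}
	best.current -= total
	return best
}

func main() {
	peers := []*peer{{name: "stable", weight: 9}, {name: "canary", weight: 1}}
	for i := 0; i < 10; i++ {
		fmt.Print(nextSWRR(peers).name, " ")
	}
	fmt.Println()
}
```

&lt;p&gt;Ramping the canary from 1% to 10% to 50% is then just rewriting the weights between picks.&lt;/p&gt;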
&lt;h3&gt;Go — a minimal round-robin L7 reverse proxy&lt;/h3&gt;
&lt;p&gt;Illustrative, not production:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;package main

import (
    &quot;net/http&quot;
    &quot;net/http/httputil&quot;
    &quot;net/url&quot;
    &quot;sync/atomic&quot;
    &quot;time&quot;
)

type Backend struct {
    URL      *url.URL
    Healthy  atomic.Bool
    Proxy    *httputil.ReverseProxy
}

type Pool struct {
    backends []*Backend
    idx      atomic.Uint64 // for round-robin
}

func NewPool(urls []string) *Pool {
    p := &amp;amp;Pool{}
    for _, raw := range urls {
        u, _ := url.Parse(raw)
        b := &amp;amp;Backend{URL: u}
        b.Proxy = httputil.NewSingleHostReverseProxy(u)
        // Mark backend unhealthy on transport errors.
        b.Proxy.ErrorHandler = func(w http.ResponseWriter, r *http.Request, err error) {
            b.Healthy.Store(false)
            http.Error(w, &quot;bad gateway&quot;, http.StatusBadGateway)
        }
        b.Healthy.Store(true)
        p.backends = append(p.backends, b)
    }
    return p
}

func (p *Pool) NextHealthy() *Backend {
    n := uint64(len(p.backends))
    for i := uint64(0); i &amp;lt; n; i++ {
        b := p.backends[p.idx.Add(1)%n]
        if b.Healthy.Load() {
            return b
        }
    }
    return nil
}

func (p *Pool) ServeHTTP(w http.ResponseWriter, r *http.Request) {
    b := p.NextHealthy()
    if b == nil {
        http.Error(w, &quot;no healthy backend&quot;, http.StatusServiceUnavailable)
        return
    }
    b.Proxy.ServeHTTP(w, r)
}

// Active health check: every 5s, GET /healthz on each backend.
func (p *Pool) HealthLoop() {
    client := &amp;amp;http.Client{Timeout: 1 * time.Second}
    t := time.NewTicker(5 * time.Second)
    for range t.C {
        for _, b := range p.backends {
            resp, err := client.Get(b.URL.String() + &quot;/healthz&quot;)
            b.Healthy.Store(err == nil &amp;amp;&amp;amp; resp != nil &amp;amp;&amp;amp; resp.StatusCode == 200)
            if resp != nil {
                resp.Body.Close()
            }
        }
    }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Shortcuts this deliberately takes: no retry on 5xx, no P2C, no TLS backend, no hot-reload of the pool. Production LBs (Envoy, nginx) do all of those plus connection pooling, HTTP/2 multiplexing, circuit breaking, and metrics — the reason &quot;just write your own&quot; is almost always the wrong answer.&lt;/p&gt;
&lt;h3&gt;Interview gotchas for §5&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Thundering herd after an LB event.&lt;/strong&gt; When the LB restarts or many backends come up in a rolling deploy, naive round-robin sends one big wave to the newest pod. &lt;strong&gt;Slow-start mode&lt;/strong&gt; (Envoy, NGINX Plus) ramps weight up over &lt;code&gt;N&lt;/code&gt; seconds. Ask about it.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Session affinity is a scaling debt.&lt;/strong&gt; Breaks the symmetry that lets you kill any pod without user impact. If you must, key affinity on &lt;code&gt;user_id&lt;/code&gt; (cookie), &lt;em&gt;not&lt;/em&gt; source IP — phones roam between Wi-Fi and LTE.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;L7 LBs are CPU-bound on TLS&lt;/strong&gt;, not bandwidth. Plan capacity on handshakes-per-second, not gbps. Session resumption + OCSP stapling + HTTP/2 keepalive ease the bill.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;keepalive_timeout&lt;/code&gt; mismatches → 502 storms.&lt;/strong&gt; If the LB&apos;s idle timeout is &lt;em&gt;longer&lt;/em&gt; than the backend&apos;s, the LB will send new requests on sockets the backend is about to close. Always keep LB idle ≤ backend idle − a couple of seconds.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;DNS TTL vs failover.&lt;/strong&gt; A 300 s TTL means a dead region bleeds traffic for 300 s worldwide. Lower the TTL &lt;em&gt;before&lt;/em&gt; you need to fail over (not during), or put anycast in front of the name.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Don&apos;t put an L7 LB in front of another L7 LB&lt;/strong&gt; unless you love chasing ghost 502s. One hop that parses HTTP is enough; the second hop just adds surface area for header drift, keepalive mismatch, and H1↔H2 translation bugs.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr /&gt;
&lt;h2&gt;6. Deep Dives — CDN, regional, resilience&lt;/h2&gt;
&lt;p&gt;Up to this point every protocol assumed the network cooperates. It doesn&apos;t. Packets drop, regions fail, downstream services slow to a crawl, and your latency budget is shorter than any single hop&apos;s tail. This section is the kit of patterns you reach for to keep a system standing when things go sideways.&lt;/p&gt;
&lt;h3&gt;The latency / availability math interviewers expect&lt;/h3&gt;
&lt;p&gt;Two numbers you should be able to derive on the whiteboard:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;SLA&lt;/th&gt;
&lt;th&gt;Budget per year&lt;/th&gt;
&lt;th&gt;Per month&lt;/th&gt;
&lt;th&gt;Per week&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;99.0%&lt;/td&gt;
&lt;td&gt;3.65 days&lt;/td&gt;
&lt;td&gt;7.3 h&lt;/td&gt;
&lt;td&gt;1.68 h&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;99.9% (three 9s)&lt;/td&gt;
&lt;td&gt;8.76 h&lt;/td&gt;
&lt;td&gt;43.8 min&lt;/td&gt;
&lt;td&gt;10.1 min&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;99.99% (four 9s)&lt;/td&gt;
&lt;td&gt;52.6 min&lt;/td&gt;
&lt;td&gt;4.4 min&lt;/td&gt;
&lt;td&gt;60.5 s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;99.999% (five 9s)&lt;/td&gt;
&lt;td&gt;5.26 min&lt;/td&gt;
&lt;td&gt;26.3 s&lt;/td&gt;
&lt;td&gt;6 s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;strong&gt;Composition rule.&lt;/strong&gt; If you depend on N downstream services each at 99.9%, your availability is &lt;code&gt;0.999^N&lt;/code&gt;. Ten dependencies ≈ 99%. That&apos;s why resilience patterns exist — they recover availability from components that individually aren&apos;t good enough.&lt;/p&gt;
&lt;h3&gt;CDNs: push the edge closer&lt;/h3&gt;
&lt;p&gt;A CDN caches static (and increasingly dynamic) responses at points of presence (PoPs) near your users. A request hits the nearest PoP; if cached, served from there. If not, the PoP fetches from origin, caches, returns. First byte goes from ~200 ms trans-Pacific to ~20 ms same-city.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;flowchart LR
    user@{ shape: circle, label: &quot;User&quot; }
    pop@{ shape: cloud, label: &quot;Edge PoP&quot; }
    origin@{ shape: rounded, label: &quot;Origin&quot; }
    bucket@{ shape: disk, label: &quot;Object store&quot; }

    user --&amp;gt;|&quot;request&quot;| pop
    pop --&amp;gt;|&quot;cache miss&quot;| origin
    origin --&amp;gt;|&quot;read&quot;| bucket
    bucket -.-&amp;gt;|&quot;response&quot;| origin
    origin -.-&amp;gt;|&quot;fill cache&quot;| pop
    pop -.-&amp;gt;|&quot;response&quot;| user

    classDef neutral fill:#dbeafe,stroke:#3b6fd6,stroke-width:1.5px,color:#0f172a;
    classDef highlight fill:#e9d5ff,stroke:#7c3aed,stroke-width:2px,color:#4c1d95;
    classDef storage fill:#fed7aa,stroke:#ea580c,stroke-width:1.5px,color:#7c2d12;
    class user,origin neutral
    class pop highlight
    class bucket storage
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;Four knobs that actually matter in an interview:&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Cache keys.&lt;/strong&gt; By default, URL = key. &lt;code&gt;Vary: Accept-Language&lt;/code&gt; splits entries by language header. Get the key wrong → serve the wrong user&apos;s content.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;TTL vs stale-while-revalidate.&lt;/strong&gt; &lt;code&gt;Cache-Control: max-age=60, stale-while-revalidate=3600&lt;/code&gt; = fresh for 60 s, then serve the stale copy for up to 1 h while refetching in the background. Add &lt;code&gt;stale-if-error&lt;/code&gt; to also serve stale when the origin is down. Trade freshness for latency and resilience.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Cache-stampede protection.&lt;/strong&gt; When a popular URL expires, 10,000 clients hit the origin simultaneously. Fix: request coalescing at the edge (one request fans out), or &lt;code&gt;stale-while-revalidate&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Purging.&lt;/strong&gt; Pushing a fix? Tag-based invalidation (&lt;code&gt;Cache-Tag: article-42&lt;/code&gt;) is far better than URL-based when one piece of content appears in many URLs.&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;Regional partitioning: blast-radius management&lt;/h3&gt;
&lt;p&gt;Single-region = single blast radius. If you lose us-east-1, you lose everything. Three typical multi-region postures:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;flowchart LR
    subgraph AP [&quot;Active / Passive&quot;]
        direction TB
        APpri[&quot;Primary&amp;lt;br/&amp;gt;100% of traffic&quot;]
        APstd[&quot;Standby&amp;lt;br/&amp;gt;0% · replicated&quot;]
        APpri --&amp;gt;|&quot;async replication&quot;| APstd
    end

    subgraph AA [&quot;Active / Active&quot;]
        direction TB
        AA1[&quot;Region A&amp;lt;br/&amp;gt;50%&quot;]
        AA2[&quot;Region B&amp;lt;br/&amp;gt;50%&quot;]
        AA1 &amp;lt;--&amp;gt;|&quot;bi-directional sync&quot;| AA2
    end

    subgraph CELL [&quot;Cell-based&quot;]
        direction TB
        C1[&quot;Cell 1&amp;lt;br/&amp;gt;users 0-33%&quot;]
        C2[&quot;Cell 2&amp;lt;br/&amp;gt;users 33-66%&quot;]
        C3[&quot;Cell 3&amp;lt;br/&amp;gt;users 66-100%&quot;]
    end

    classDef neutral fill:#dbeafe,stroke:#3b6fd6,stroke-width:1.5px,color:#0f172a;
    classDef ok fill:#dcfce7,stroke:#16a34a,stroke-width:1.5px,color:#14532d;
    classDef warn fill:#fef3c7,stroke:#d97706,stroke-width:1.5px,color:#78350f;
    class APpri neutral
    class APstd warn
    class AA1,AA2 ok
    class C1,C2,C3 neutral
&lt;/code&gt;&lt;/pre&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Failover time&lt;/th&gt;
&lt;th&gt;Data loss risk&lt;/th&gt;
&lt;th&gt;Operational cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Active / Passive&lt;/td&gt;
&lt;td&gt;minutes (DNS / BGP flip)&lt;/td&gt;
&lt;td&gt;last async-replication window (seconds to minutes)&lt;/td&gt;
&lt;td&gt;low — stand-by is cheap&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Active / Active&lt;/td&gt;
&lt;td&gt;seconds (already serving)&lt;/td&gt;
&lt;td&gt;merge conflicts if both wrote&lt;/td&gt;
&lt;td&gt;high — multi-master sync&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cell-based&lt;/td&gt;
&lt;td&gt;blast radius = one cell&lt;/td&gt;
&lt;td&gt;only that cell&apos;s users&lt;/td&gt;
&lt;td&gt;medium — many small cells&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;strong&gt;Cell-based&lt;/strong&gt; is AWS&apos;s favorite pattern: each &quot;cell&quot; is a self-contained stack serving a slice of users. When a cell goes bad, only that slice is affected. Adding capacity = adding cells, not scaling one giant region.&lt;/p&gt;
&lt;h3&gt;Timeouts: the most underappreciated resilience primitive&lt;/h3&gt;
&lt;p&gt;If you remember nothing else: &lt;strong&gt;every network call needs a timeout.&lt;/strong&gt; The default behavior of most language HTTP libraries is &quot;wait forever&quot; — which translates to goroutines/threads piling up, connection pools exhausting, the whole service grinding to a halt because of one slow downstream.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Timeout budget&lt;/strong&gt; — carry a deadline through the call stack, not a timeout per hop. If the top-level request has 800 ms left, an internal call can&apos;t use 500 ms if two more hops come after it.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;sequenceDiagram
    autonumber
    participant U as User
    participant A as API (budget: 800ms)
    participant B as Service B (budget: 400ms)
    participant C as Service C (budget: 150ms)

    U-&amp;gt;&amp;gt;A: request
    Note over A: deadline = now + 800ms
    A-&amp;gt;&amp;gt;B: call (ctx deadline = 400ms)
    Note over B: deadline = now + 400ms
    B-&amp;gt;&amp;gt;C: call (ctx deadline = 150ms)
    Note over C: deadline = now + 150ms
    C--&amp;gt;&amp;gt;B: 80ms
    B--&amp;gt;&amp;gt;A: 220ms
    A--&amp;gt;&amp;gt;U: 370ms ✓ within budget
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In Go, &lt;code&gt;context.Context&lt;/code&gt; does this for you — &lt;code&gt;context.WithDeadline(parent, t)&lt;/code&gt; clamps the child&apos;s deadline to whichever is tighter. gRPC and well-behaved HTTP libraries read &lt;code&gt;ctx.Deadline()&lt;/code&gt; and fail fast if there&apos;s no time left.&lt;/p&gt;
&lt;h3&gt;Retries with exponential backoff + jitter&lt;/h3&gt;
&lt;p&gt;When a transient error happens (DNS blip, upstream restart, rate-limit), the first instinct is to retry. Done naively, retries turn a 1-second outage into a 30-second outage as clients synchronize their retry storms.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Three rules to retry well:&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Retry only idempotent operations.&lt;/strong&gt; &lt;code&gt;GET&lt;/code&gt;, &lt;code&gt;PUT&lt;/code&gt;, &lt;code&gt;DELETE&lt;/code&gt; — yes. &lt;code&gt;POST&lt;/code&gt; — only if you carry an idempotency key.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Cap the number of attempts.&lt;/strong&gt; Usually 3–5. Infinite retries is a DDoS on yourself.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Exponential backoff + jitter.&lt;/strong&gt; Double the delay each attempt, then add randomness so N clients don&apos;t all retry at the exact same moment.&lt;/li&gt;
&lt;/ol&gt;
&lt;pre&gt;&lt;code&gt;sequenceDiagram
    autonumber
    participant C as Client
    participant S as Upstream

    C-&amp;gt;&amp;gt;S: GET /resource
    S--xC: 503 Service Unavailable

    Note over C: wait 100-300ms&amp;lt;br/&amp;gt;(base 200ms + jitter)
    C-&amp;gt;&amp;gt;S: attempt 2
    S--xC: 503

    Note over C: wait 200-600ms&amp;lt;br/&amp;gt;(base 400ms + jitter)
    C-&amp;gt;&amp;gt;S: attempt 3
    S--xC: 503

    Note over C: wait 400-1200ms&amp;lt;br/&amp;gt;(base 800ms + jitter)
    C-&amp;gt;&amp;gt;S: attempt 4
    S--&amp;gt;&amp;gt;C: 200 OK ✅
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The jitter is crucial. Without it, all clients that failed at &lt;code&gt;t=0&lt;/code&gt; retry simultaneously at &lt;code&gt;t=200&lt;/code&gt;, crushing the service again. With &lt;strong&gt;full jitter&lt;/strong&gt; (&lt;code&gt;sleep = rand(0, base * 2^attempt)&lt;/code&gt;), retries smear across the whole interval.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Go — the retry that you copy-paste into every new service:&lt;/strong&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import (
    &quot;context&quot;
    &quot;errors&quot;
    &quot;math/rand&quot;
    &quot;time&quot;
)

type retryableFn func(ctx context.Context) error

// retry runs fn with full-jitter exponential backoff, capped at maxAttempts.
// Returns the last error if all attempts fail. Respects the context deadline.
func retry(ctx context.Context, maxAttempts int, base time.Duration, fn retryableFn) error {
    if maxAttempts &lt; 1 {
        maxAttempts = 1 // guard: zero attempts must not report success
    }
    var err error
    for attempt := 0; attempt &lt; maxAttempts; attempt++ {
        err = fn(ctx)
        if err == nil {
            return nil
        }
        if !isRetryable(err) || attempt == maxAttempts-1 {
            return err
        }
        // Full jitter: sleep = random in [0, base * 2^attempt]
        maxSleep := base * (1 &amp;lt;&amp;lt; attempt)
        sleep := time.Duration(rand.Int63n(int64(maxSleep)))

        select {
        case &amp;lt;-time.After(sleep):
        case &amp;lt;-ctx.Done():
            return errors.Join(err, ctx.Err())
        }
    }
    return err
}

func isRetryable(err error) bool {
    // 5xx, connection reset, deadline exceeded — yes.
    // 4xx (client error), validation errors — no.
    var te interface{ Timeout() bool }
    if errors.As(err, &amp;amp;te) &amp;amp;&amp;amp; te.Timeout() { return true }
    // + protocol-specific heuristics (HTTP status, gRPC codes.Unavailable) …
    return false
}
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;Circuit breaker: stop hitting a dead service&lt;/h3&gt;
&lt;p&gt;A circuit breaker wraps calls to a downstream and fails fast when failures cross a threshold. Think of it as a fuse that opens to protect the rest of the system from cascading failures.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;stateDiagram-v2
    direction LR
    [*] --&amp;gt; Closed

    Closed --&amp;gt; Open : failures &amp;gt; threshold&amp;lt;br/&amp;gt;in rolling window
    Open --&amp;gt; HalfOpen : after cool-down&amp;lt;br/&amp;gt;(e.g. 30s)
    HalfOpen --&amp;gt; Closed : probe succeeds
    HalfOpen --&amp;gt; Open : probe fails

    note right of Closed
        normal — calls go through
        failure count incremented
    end note
    note right of Open
        fail fast — no calls
        return cached / fallback / 503
    end note
    note right of HalfOpen
        let one probe through
        decide based on result
    end note

    classDef good fill:#dcfce7,stroke:#16a34a,stroke-width:2px,color:#14532d;
    classDef bad fill:#fee2e2,stroke:#dc2626,stroke-width:2px,color:#7f1d1d;
    classDef probe fill:#fef3c7,stroke:#d97706,stroke-width:2px,color:#78350f;
    class Closed good
    class Open bad
    class HalfOpen probe
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;Why this matters:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Without a breaker&lt;/strong&gt;: a dead downstream takes 10 s to time out per call. At 1000 req/s incoming, that&apos;s 10,000 goroutines stuck in flight → OOM within seconds.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;With a breaker&lt;/strong&gt; in Open state: calls fail in microseconds with a known error. Goroutines complete, connection pools stay healthy, upstream clients can be told &quot;this feature is degraded, retry in 30 s&quot; instead of &quot;your entire page timed out.&quot;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Go with &lt;code&gt;sony/gobreaker&lt;/code&gt;:&lt;/strong&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import (
    &quot;log&quot;
    &quot;time&quot;

    &quot;github.com/sony/gobreaker/v2&quot;
)

var paymentBreaker = gobreaker.NewCircuitBreaker[*PaymentResult](gobreaker.Settings{
    Name:        &quot;payment-gateway&quot;,
    MaxRequests: 3,             // allowed through in Half-Open
    Interval:    60 * time.Second, // rolling window
    Timeout:     30 * time.Second, // Open → Half-Open cool-down
    ReadyToTrip: func(counts gobreaker.Counts) bool {
        failureRate := float64(counts.TotalFailures) / float64(counts.Requests)
        return counts.Requests &amp;gt;= 20 &amp;amp;&amp;amp; failureRate &amp;gt;= 0.5
    },
    OnStateChange: func(name string, from, to gobreaker.State) {
        log.Printf(&quot;breaker %s: %s → %s&quot;, name, from, to)
    },
})

func chargeCard(ctx context.Context, req ChargeRequest) (*PaymentResult, error) {
    return paymentBreaker.Execute(func() (*PaymentResult, error) {
        return paymentClient.Charge(ctx, req)
    })
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;The knobs that matter&lt;/strong&gt; (with reasonable defaults):&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Failure threshold&lt;/strong&gt; — failure rate ≥ 50% over a rolling window of 20+ requests. Lower = more sensitive, more false trips.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Cool-down&lt;/strong&gt; — time in Open before trying Half-Open. Usually 15-60 s. Too short = hammer while still sick; too long = slow recovery.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Half-Open probe count&lt;/strong&gt; — how many requests before declaring healthy again. 1-5. Too many = expose more traffic to a still-broken service.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Bulkheads: isolate the blast radius&lt;/h3&gt;
&lt;p&gt;Don&apos;t share pools across unrelated features. If &lt;code&gt;payment&lt;/code&gt; and &lt;code&gt;search&lt;/code&gt; both draw from one &lt;code&gt;http.Transport.MaxIdleConns = 100&lt;/code&gt; pool, a payment-gateway slowdown can starve search. Give each downstream its own pool (separate &lt;code&gt;http.Client&lt;/code&gt;), or per-tenant pools for multi-tenant systems.&lt;/p&gt;
&lt;p&gt;Paired with a breaker, this means one dead dependency degrades only &lt;em&gt;its&lt;/em&gt; feature — the rest of the app keeps working.&lt;/p&gt;
&lt;h3&gt;Rate limiting, three places&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;At the edge&lt;/strong&gt; (CDN / API gateway) — by IP / API key. Rejects floods before they touch your code.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;At the service&lt;/strong&gt; (middleware) — by user / tenant / endpoint. Enforces per-customer contracts.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;At the downstream call site&lt;/strong&gt; (client-side) — token bucket per upstream. Shields your dependencies from you.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The classic algorithm is &lt;strong&gt;token bucket&lt;/strong&gt;: &lt;code&gt;capacity&lt;/code&gt; tokens refilled at &lt;code&gt;rate&lt;/code&gt;/sec. Each request costs 1. If no tokens, 429. Bursts up to &lt;code&gt;capacity&lt;/code&gt;, steady-state &lt;code&gt;rate&lt;/code&gt;. Go&apos;s &lt;code&gt;golang.org/x/time/rate.Limiter&lt;/code&gt; is idiomatic.&lt;/p&gt;
&lt;h3&gt;Putting it together — the resilience stack&lt;/h3&gt;
&lt;pre&gt;&lt;code&gt;flowchart TB
    REQ([&quot;incoming request&quot;])
    TMO[&quot;①  timeout budget&amp;lt;br/&amp;gt;ctx.WithDeadline&quot;]
    RLT[&quot;②  rate limiter&amp;lt;br/&amp;gt;token bucket / leaky&quot;]
    BLK[&quot;③  bulkhead&amp;lt;br/&amp;gt;per-dependency pool&quot;]
    CB[&quot;④  circuit breaker&amp;lt;br/&amp;gt;closed / open / half-open&quot;]
    RET[&quot;⑤  retry&amp;lt;br/&amp;gt;exp backoff + jitter&quot;]
    CALL([&quot;downstream call&quot;])
    FALLBACK([&quot;fallback / 503 / cached&quot;])

    REQ --&amp;gt; TMO --&amp;gt; RLT --&amp;gt; BLK --&amp;gt; CB
    CB --&amp;gt;|&quot;closed&quot;| RET --&amp;gt; CALL
    CB --&amp;gt;|&quot;open&quot;| FALLBACK
    CB --&amp;gt;|&quot;half-open&quot;| CALL

    classDef step fill:#dbeafe,stroke:#3b6fd6,stroke-width:1.5px,color:#0f172a;
    classDef good fill:#dcfce7,stroke:#16a34a,stroke-width:1.5px,color:#14532d;
    classDef warn fill:#fef3c7,stroke:#d97706,stroke-width:1.5px,color:#78350f;
    class TMO,RLT,BLK,CB,RET step
    class CALL good
    class FALLBACK warn
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Order matters: &lt;strong&gt;timeouts first&lt;/strong&gt; so nothing can run forever, &lt;strong&gt;rate-limit&lt;/strong&gt; before the expensive work, &lt;strong&gt;bulkhead&lt;/strong&gt; to isolate, &lt;strong&gt;breaker&lt;/strong&gt; to fail fast, &lt;strong&gt;retry&lt;/strong&gt; on retryable errors only, then the actual call. Getting the order wrong (e.g. retrying before the breaker) amplifies bad behavior instead of absorbing it.&lt;/p&gt;
&lt;h3&gt;Interview gotchas for §6&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Retry storms after a mass timeout.&lt;/strong&gt; If every layer makes up to 3 attempts (client, gateway, service, downstream), a single slow call multiplies into 3⁴ = 81 attempts. Pick &lt;em&gt;one&lt;/em&gt; layer to retry; the others pass the error up.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;DELETE&lt;/code&gt; isn&apos;t always safe to retry.&lt;/strong&gt; It&apos;s idempotent semantically (&lt;code&gt;DELETE x&lt;/code&gt; twice = same state) but the &lt;em&gt;second&lt;/em&gt; DELETE on a not-found resource may return 404 — your caller needs to treat 404-after-DELETE as success, not failure.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Breaker + retry interaction.&lt;/strong&gt; The retry layer inside a breaker means one user retrying 3× accounts for 3 failures in the breaker&apos;s window, tripping it faster than you&apos;d expect. Decide: retry outside the breaker (breaker is the ultimate source of truth) &lt;em&gt;or&lt;/em&gt; inside (retries are &quot;part of one operation&quot;).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Cold-start after &lt;code&gt;Open → HalfOpen&lt;/code&gt;.&lt;/strong&gt; If your downstream just came back and you send all your traffic in the first second, you kill it again. Use &lt;code&gt;MaxRequests&lt;/code&gt; in Half-Open, or add a gradual weight ramp-up (see §5 slow-start).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Monitor &lt;code&gt;OnStateChange&lt;/code&gt;.&lt;/strong&gt; A breaker silently tripping is worse than no breaker — users see fallbacks and you don&apos;t know why. Page / log every state transition.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr /&gt;
&lt;h2&gt;7. Interview cheat sheet&lt;/h2&gt;
&lt;p&gt;Three ways to use this section:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Night-before review&lt;/strong&gt; — read only this page, open the diagrams you don&apos;t remember.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;During the interview&lt;/strong&gt; — when the interviewer drops a keyword, the tables below have a starting sentence.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Mock warm-up&lt;/strong&gt; — cover the right column and quiz yourself.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Answer template for any networking question&lt;/h3&gt;
&lt;p&gt;Strong answers have four beats in order. Missing one is the usual reason a good technical answer &lt;em&gt;feels&lt;/em&gt; mediocre:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Frame the trade-off.&lt;/strong&gt; Name the two or three things we&apos;re choosing between (latency vs throughput, consistency vs availability, correctness vs cost).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Pick a default.&lt;/strong&gt; Give a concrete choice with numbers where you can.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Call out the failure mode.&lt;/strong&gt; Say out loud when your default breaks and what you&apos;d reach for next.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Tie to the specific system in the prompt.&lt;/strong&gt; Generic answers rate generic.&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;When you hear X → how to open&lt;/h3&gt;
&lt;p&gt;The right-hand column is the &lt;em&gt;first sentence&lt;/em&gt;, not the whole answer. Expand from there.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Design decisions&lt;/strong&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Interviewer says&lt;/th&gt;
&lt;th&gt;Open with&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;em&gt;&quot;What happens when I type a URL?&quot;&lt;/em&gt;&lt;/td&gt;
&lt;td&gt;DNS → TCP 3-way → TLS 1.3 (1 RTT) → HTTP request → render. Each handshake adds at least one RTT before first byte; HTTP/3 folds transport + TLS into one QUIC handshake.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;em&gt;&quot;TCP or UDP for this?&quot;&lt;/em&gt;&lt;/td&gt;
&lt;td&gt;Default TCP for correctness; UDP when ordering/retransmits are the app&apos;s job (DNS, media, QUIC) or when HOL blocking matters.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;em&gt;&quot;REST, GraphQL, or gRPC?&quot;&lt;/em&gt;&lt;/td&gt;
&lt;td&gt;REST for public / CRUD / cacheable. GraphQL when one graph × many client shapes. gRPC for internal polyglot services with streaming.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;em&gt;&quot;WebSocket, SSE, or WebRTC?&quot;&lt;/em&gt;&lt;/td&gt;
&lt;td&gt;SSE for server → client feeds. WebSocket for bi-di text/binary. WebRTC only if you need media or sub-50ms peer-to-peer.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;em&gt;&quot;301 vs 302?&quot;&lt;/em&gt;&lt;/td&gt;
&lt;td&gt;301 = permanent, cached aggressively, pain to roll back. 302 = temporary, not cached. Use 307/308 to preserve the HTTP method.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;em&gt;&quot;How do you encrypt service-to-service traffic?&quot;&lt;/em&gt;&lt;/td&gt;
&lt;td&gt;mTLS, usually delegated to a service mesh sidecar (Envoy / Linkerd). The mesh owns cert rotation, identity, and policy.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;strong&gt;Scaling&lt;/strong&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Interviewer says&lt;/th&gt;
&lt;th&gt;Open with&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;em&gt;&quot;Scale this stateless service.&quot;&lt;/em&gt;&lt;/td&gt;
&lt;td&gt;One LB (L4 for raw throughput, L7 for routing / rewriting) fronts N replicas, with state in a DB or cache. Add health checks + slow-start to prevent thundering herd on rollouts.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;em&gt;&quot;Design a rate limiter.&quot;&lt;/em&gt;&lt;/td&gt;
&lt;td&gt;Token bucket: capacity &lt;code&gt;C&lt;/code&gt;, refill rate &lt;code&gt;R&lt;/code&gt;. Bursts up to &lt;code&gt;C&lt;/code&gt;, steady-state &lt;code&gt;R&lt;/code&gt;. Key by tenant / user / IP, persist counters in Redis for multi-node consistency.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;em&gt;&quot;Design for 1 M concurrent connections.&quot;&lt;/em&gt;&lt;/td&gt;
&lt;td&gt;The bottleneck is fan-out, not the sockets themselves. Per-room subscriber index, pub/sub broker (Redis / Kafka / NATS) to broadcast, region-pinned pods, connection-count health signal.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;em&gt;&quot;Deploy with zero downtime.&quot;&lt;/em&gt;&lt;/td&gt;
&lt;td&gt;Readiness gates rotation. &lt;strong&gt;Rolling&lt;/strong&gt; replaces N pods at a time; &lt;strong&gt;blue/green&lt;/strong&gt; keeps both versions hot and flips the LB; &lt;strong&gt;canary&lt;/strong&gt; shifts traffic by weight (1% → 10% → 100%) while watching error rate.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;em&gt;&quot;Multi-region strategy?&quot;&lt;/em&gt;&lt;/td&gt;
&lt;td&gt;Active/passive if the cost of replication lag is tolerable; active/active if the app can handle conflict resolution; cell-based when blast radius is the primary concern.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;strong&gt;Failure + resilience&lt;/strong&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Interviewer says&lt;/th&gt;
&lt;th&gt;Open with&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;em&gt;&quot;Postgres goes down — what happens?&quot;&lt;/em&gt;&lt;/td&gt;
&lt;td&gt;Clients carry deadlines; a circuit breaker opens after threshold so we fail fast instead of piling up goroutines. Serve cached / read-replica if the endpoint tolerates it; return a structured degradation (503 with a Retry-After) otherwise.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;em&gt;&quot;Why are your latencies spiking?&quot;&lt;/em&gt;&lt;/td&gt;
&lt;td&gt;Separate p50 from p99 first. Likely suspects: GC pauses, connection pool saturation, downstream tail latency, TCP retransmits on a flaky path, or a cold cache. Instrument with distributed tracing to find the hop.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;em&gt;&quot;Your service keeps 502-ing.&quot;&lt;/em&gt;&lt;/td&gt;
&lt;td&gt;Usually an LB ↔ backend keep-alive mismatch: LB reuses a connection the backend just closed. Align &lt;code&gt;keepalive_timeout&lt;/code&gt; (LB &amp;lt; backend) and watch &lt;code&gt;upstream_reset&lt;/code&gt; logs.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;em&gt;&quot;One user&apos;s bad request is taking down the service.&quot;&lt;/em&gt;&lt;/td&gt;
&lt;td&gt;You need bulkheads. Separate connection pools per downstream so one slow dependency doesn&apos;t starve the rest; rate-limit per user/tenant, not just globally.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;em&gt;&quot;What&apos;s wrong with retrying every error?&quot;&lt;/em&gt;&lt;/td&gt;
&lt;td&gt;Retry storms. Four layers each making up to 3 attempts stack multiplicatively (3⁴ = 81 attempts). Retry in one place (client), use exponential backoff with full jitter, and only for &lt;em&gt;idempotent&lt;/em&gt; operations or calls with an idempotency key.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h3&gt;The one-pager&lt;/h3&gt;
&lt;p&gt;If nothing else sticks, memorize this:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Decision&lt;/th&gt;
&lt;th&gt;Default&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Transport&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;TCP&lt;/strong&gt; for correctness, &lt;strong&gt;UDP / QUIC&lt;/strong&gt; for real-time&lt;/td&gt;
&lt;td&gt;TCP head-of-line blocking is at the stream layer&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;HTTP version&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;HTTP/2&lt;/strong&gt; within a datacenter, &lt;strong&gt;HTTP/3&lt;/strong&gt; at the edge for mobile&lt;/td&gt;
&lt;td&gt;/3 rides QUIC → no HOL, connection migration across networks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;API style&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;REST&lt;/strong&gt; public, &lt;strong&gt;gRPC&lt;/strong&gt; internal, &lt;strong&gt;GraphQL&lt;/strong&gt; when one schema × many clients&lt;/td&gt;
&lt;td&gt;Each pattern matches a distinct constraint&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Real-time&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;SSE&lt;/strong&gt; server→client, &lt;strong&gt;WebSocket&lt;/strong&gt; bi-di, &lt;strong&gt;WebRTC&lt;/strong&gt; media / sub-50ms&lt;/td&gt;
&lt;td&gt;Pick the simplest channel that solves the problem&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Load balancing&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;L4&lt;/strong&gt; for raw throughput, &lt;strong&gt;L7&lt;/strong&gt; for HTTP-aware routing&lt;/td&gt;
&lt;td&gt;L7 is CPU-bound on TLS, not bandwidth&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LB algorithm&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Least-connections&lt;/strong&gt; as default, &lt;strong&gt;P2C&lt;/strong&gt; when stateless, &lt;strong&gt;consistent hashing&lt;/strong&gt; for shard affinity&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Resilience stack&lt;/td&gt;
&lt;td&gt;&lt;code&gt;timeout → rate-limit → bulkhead → breaker → retry → call&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Order matters — retries before the breaker amplify failures&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Retries&lt;/td&gt;
&lt;td&gt;Exponential backoff with &lt;strong&gt;full jitter&lt;/strong&gt;, cap 3–5 attempts, idempotent only&lt;/td&gt;
&lt;td&gt;Prevents retry storms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multi-region&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Cell-based&lt;/strong&gt; for blast-radius control, &lt;strong&gt;active / active&lt;/strong&gt; for sub-minute RTO&lt;/td&gt;
&lt;td&gt;Active/passive cheapest, active/active costliest&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Caching&lt;/td&gt;
&lt;td&gt;&lt;code&gt;Cache-Control: max-age=N, stale-while-revalidate=M&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Resilience usually beats freshness&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h3&gt;Pitfalls to volunteer&lt;/h3&gt;
&lt;p&gt;Interviewers reward candidates who surface failure modes before being asked. The list below is short enough to scan the day of; drop one or two where they fit the scenario:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;POST is not idempotent.&lt;/strong&gt; Safe retries need an idempotency key propagated through every layer.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;TIME_WAIT&lt;/code&gt; port exhaustion&lt;/strong&gt; on high-churn outbound clients. Use connection pools; avoid &lt;code&gt;tcp_tw_recycle&lt;/code&gt; (deprecated, breaks NAT).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;Connection: close&lt;/code&gt;&lt;/strong&gt; emitted on every response cripples the client pool.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;WebSocket without &lt;code&gt;CheckOrigin&lt;/code&gt;&lt;/strong&gt; is CSRF-over-WebSocket waiting to happen.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;WebRTC without a TURN budget&lt;/strong&gt; is a demo, not a product — plan for 10–20 % of calls to need the relay.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Session affinity&lt;/strong&gt; is scaling debt. It breaks the symmetry that lets you terminate any pod.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;keepalive_timeout&lt;/code&gt; asymmetry&lt;/strong&gt; between LB and backend produces 502 storms.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;DNS TTL during failover.&lt;/strong&gt; A 300 s TTL means 300 s of bleeding traffic to a dead region.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Retry inside a breaker&lt;/strong&gt; double-counts failures. Decide which layer owns retry semantics.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Reusing a protobuf field number&lt;/strong&gt; silently breaks wire compatibility. Use &lt;code&gt;reserved&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Unbounded GraphQL queries&lt;/strong&gt; are a DoS primitive. Enforce depth limits and persisted queries in production.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;0-RTT TLS early data is replayable.&lt;/strong&gt; Never use it for state-changing requests.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;HTTP/2 with a self-signed cert&lt;/strong&gt; in Go needs &lt;code&gt;NextProtos = [&quot;h2&quot;, &quot;http/1.1&quot;]&lt;/code&gt; — otherwise the client silently falls back to HTTP/1.1.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;PMTUD black-holing.&lt;/strong&gt; A middlebox dropping ICMP &quot;fragmentation needed&quot; packets stalls any segment over the path MTU.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Further reading&lt;/h3&gt;
&lt;p&gt;In order of depth-per-hour:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&quot;https://github.com/donnemartin/system-design-primer&quot;&gt;System Design Primer&lt;/a&gt;&lt;/strong&gt; — a curated reading list masquerading as a README. Best starting point.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Designing Data-Intensive Applications&lt;/strong&gt; (Kleppmann). Chapters 5, 6, and 7 on replication, partitioning, and transactions. The implicit syllabus of most system-design rounds.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;RFC 9110 (HTTP semantics)&lt;/strong&gt; and &lt;strong&gt;RFC 9114 (HTTP/3)&lt;/strong&gt; — readable, surprisingly short.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&quot;https://aws.amazon.com/builders-library/&quot;&gt;AWS Builders&apos; Library&lt;/a&gt;&lt;/strong&gt; — essays on the patterns in §5 and §6, written at the scale they were invented for.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&quot;http://highscalability.com/&quot;&gt;highscalability.com&lt;/a&gt;&lt;/strong&gt; — post-mortems and architecture profiles from companies in production.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&quot;https://blog.bytebytego.com/&quot;&gt;ByteByteGo&apos;s newsletter&lt;/a&gt;&lt;/strong&gt; — weekly, diagram-heavy, short enough to read on a commute.&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;Wrapping up&lt;/h3&gt;
&lt;p&gt;A working mental model of networking is cumulative. You will not acquire it in one sitting — you will notice one week that a problem at the layer above is easier because you understood the one below. Use this post as the spine; fill the gaps with whatever production mystery you&apos;re chasing that week.&lt;/p&gt;
&lt;p&gt;Corrections and sharper phrasings are welcome. Open an issue on the blog&apos;s repo and I&apos;ll update with attribution.&lt;/p&gt;
</content:encoded></item><item><title>Two Sum — hash map in one pass</title><link>https://fuwari.vercel.app/posts/daily-two-sum/</link><guid isPermaLink="true">https://fuwari.vercel.app/posts/daily-two-sum/</guid><description>One pass through the array, remember complements in a map. O(n) time, O(n) space.</description><pubDate>Mon, 20 Apr 2026 02:15:00 GMT</pubDate><content:encoded>&lt;h2&gt;Intuition&lt;/h2&gt;
&lt;p&gt;Instead of checking every pair (O(n²)), remember what we&apos;ve seen. For each number &lt;code&gt;x&lt;/code&gt;, the &lt;em&gt;complement&lt;/em&gt; we need is &lt;code&gt;target - x&lt;/code&gt;. If we already saw it, we&apos;re done.&lt;/p&gt;
&lt;h2&gt;Go&lt;/h2&gt;
&lt;pre&gt;&lt;code&gt;func twoSum(nums []int, target int) []int {
    seen := make(map[int]int, len(nums))
    for i, x := range nums {
        if j, ok := seen[target-x]; ok {
            return []int{j, i}
        }
        seen[x] = i
    }
    return nil
}
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;Trap&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Don&apos;t put &lt;code&gt;seen[x] = i&lt;/code&gt; before the lookup. You&apos;d match &lt;code&gt;x + x = target&lt;/code&gt; against itself with the same index.&lt;/li&gt;
&lt;li&gt;Guaranteed exactly one solution — so no &quot;best pair&quot; logic needed.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Complexity&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Time: &lt;strong&gt;O(n)&lt;/strong&gt; — one pass, map ops are amortized O(1)&lt;/li&gt;
&lt;li&gt;Space: &lt;strong&gt;O(n)&lt;/strong&gt; — worst case we store every element&lt;/li&gt;
&lt;/ul&gt;
</content:encoded></item><item><title>Designing a graph-based payroll engine</title><link>https://fuwari.vercel.app/posts/payroll-dag-engine/</link><guid isPermaLink="true">https://fuwari.vercel.app/posts/payroll-dag-engine/</guid><description>How we modeled interdependent salary components as a DAG and used Kahn&apos;s topological sort to parallelize calculation across 6,000 employees in 30 seconds.</description><pubDate>Sat, 18 Apr 2026 14:30:00 GMT</pubDate><content:encoded>&lt;h2&gt;The problem&lt;/h2&gt;
&lt;p&gt;Payroll looks simple until you meet a real enterprise ruleset. A single employee&apos;s pay is a pile of &lt;strong&gt;interdependent&lt;/strong&gt; line items:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;base = rank.base_salary * effort_ratio&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;overtime = base * 1.5 * overtime_hours&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;income_tax = (base + overtime + bonuses) * tax_bracket_fn(...)&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;net = base + overtime + bonuses - income_tax - social_insurance - ...&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Each component can reference any other. HR changes formulas constantly. Worse: some companies have &lt;strong&gt;custom&lt;/strong&gt; components — a &quot;Tet bonus&quot; formula that only exists for that client.&lt;/p&gt;
&lt;p&gt;Hard-coding the calculation order meant every new rule was a code deploy. Lead time for a new payroll component was weeks.&lt;/p&gt;
&lt;h2&gt;The insight&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;These formulas form a directed acyclic graph.&lt;/strong&gt; A component is a node. &quot;Uses the value of X&quot; is an edge from X to the current component. The order you must compute them in is a &lt;strong&gt;topological sort&lt;/strong&gt; of that graph.&lt;/p&gt;
&lt;p&gt;Once you see it that way, two things fall out:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Configuration, not code.&lt;/strong&gt; Store the formula text + dependencies in the database. Users change rules via UI.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Parallelism for free.&lt;/strong&gt; Nodes at the same topological level have no dependencies between them — compute them concurrently.&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;Kahn&apos;s algorithm&lt;/h2&gt;
&lt;p&gt;Kahn&apos;s is the classic O(V + E) topological sort. You keep a queue of nodes with no remaining incoming edges:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;type Node struct {
    ID       string
    Formula  string          // &quot;base * 1.5 * overtime_hours&quot;
    DependsOn []string
}

func topoLevels(nodes []*Node) [][]*Node {
    inDeg := make(map[string]int, len(nodes))
    children := make(map[string][]*Node, len(nodes))

    for _, n := range nodes {
        inDeg[n.ID] = len(n.DependsOn)
        for _, dep := range n.DependsOn {
            children[dep] = append(children[dep], n)
        }
    }

    var levels [][]*Node
    ready := make([]*Node, 0)
    for _, n := range nodes {
        if inDeg[n.ID] == 0 {
            ready = append(ready, n)
        }
    }

    for len(ready) &amp;gt; 0 {
        level := ready
        levels = append(levels, level)
        ready = nil
        for _, n := range level {
            for _, c := range children[n.ID] {
                inDeg[c.ID]--
                if inDeg[c.ID] == 0 {
                    ready = append(ready, c)
                }
            }
        }
    }
    return levels
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If the graph has a cycle (&lt;code&gt;A&lt;/code&gt; depends on &lt;code&gt;B&lt;/code&gt; which depends on &lt;code&gt;A&lt;/code&gt;), some nodes will never hit in-degree 0 — detect it by comparing total processed nodes to the input length. We reject the ruleset at save time rather than at compute time.&lt;/p&gt;
&lt;h2&gt;Parallel calculation per level&lt;/h2&gt;
&lt;p&gt;The critical trick: &lt;strong&gt;within a level, nodes are independent.&lt;/strong&gt; For 6,000 employees × ~30 components each, the naive sequential version took ~5 minutes. Level-parallel with &lt;code&gt;sync.WaitGroup&lt;/code&gt; and a bounded goroutine pool collapsed it to ~30 seconds.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;func computeEmployee(emp *Employee, levels [][]*Node) (map[string]float64, error) {
    vals := make(map[string]float64)

    for _, level := range levels {
        // Snapshot earlier levels&apos; values: goroutines only ever read
        // this copy, so the writes to vals below never race with the
        // reads inside evaluateFormula.
        snap := make(map[string]float64, len(vals))
        for k, v := range vals {
            snap[k] = v
        }

        results := make([]float64, len(level))
        errs := make([]error, len(level))
        var wg sync.WaitGroup
        for i, node := range level {
            wg.Add(1)
            go func(i int, n *Node) {
                defer wg.Done()
                v, err := evaluateFormula(n.Formula, emp, snap)
                if err != nil {
                    errs[i] = fmt.Errorf(&quot;%s: %w&quot;, n.ID, err)
                    return
                }
                results[i] = v
            }(i, node)
        }
        wg.Wait()

        // Merge sequentially after the level completes — no mutex needed.
        for i, node := range level {
            if errs[i] != nil {
                return nil, errs[i]
            }
            vals[node.ID] = results[i]
        }
    }
    return vals, nil
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In production we cap the pool — unlimited goroutines spawn thousands for a big company, thrashing the CPU and killing the formula interpreter cache.&lt;/p&gt;
&lt;h2&gt;What I&apos;d do differently&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Cache compiled formulas.&lt;/strong&gt; Parsing the expression per employee per component is 80% of the time. The DAG structure is shared across the whole company.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Structural sharing for the &lt;code&gt;vals&lt;/code&gt; map.&lt;/strong&gt; For identical employees (same rank, same contract type), most components have identical values — don&apos;t recompute.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Persistent levels.&lt;/strong&gt; The level structure rarely changes. Recompute only on ruleset update.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr /&gt;
&lt;p&gt;The whole thing is ~800 lines of Go. Business users have added 40+ custom components in the last year. Zero deploys.&lt;/p&gt;
</content:encoded></item></channel></rss>