Caching stores frequently accessed data in fast storage so future requests skip the slow source, cutting latency and reducing DB load. A cache hit returns data directly; a cache miss fetches from the source, then populates the cache for subsequent requests.
Cache-aside (also called lazy loading) is the most common caching pattern. The application code owns the cache interaction: the cache is never written to directly on a write, only populated on a read miss.
Read flow (a sketch in code follows the steps):
1. The application checks the cache for the key.
2. Hit: return the cached value directly.
3. Miss: fetch from the DB, write the result into the cache, then return it.
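A minimal sketch of the read path using redis-py, assuming a Redis instance on localhost; fetch_user_from_db is a hypothetical stand-in for the real DB query:

```python
import json

import redis

r = redis.Redis()  # assumes a Redis instance on localhost:6379


def fetch_user_from_db(user_id: int) -> dict:
    # Hypothetical stand-in for the real DB query.
    return {"id": user_id, "name": "example"}


def get_user(user_id: int) -> dict:
    key = f"user:{user_id}"
    cached = r.get(key)
    if cached is not None:                # hit: skip the DB entirely
        return json.loads(cached)
    user = fetch_user_from_db(user_id)    # miss: go to the source of truth
    r.setex(key, 3600, json.dumps(user))  # populate for subsequent requests
    return user
```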
Why it works: only data that is actually requested gets cached, so the cache stays lean. The main downside is the cold start: the first request for any key always hits the DB.
When to use it: most read-heavy workloads, such as user profiles, product pages, and feed items. It pairs well with Redis as the cache layer and any relational or document DB as the source of truth.
Trade-offs: stale data is possible between a DB write and the next repopulation of the key. Explicit invalidation (delete the key on write) or short TTLs mitigate this.
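The matching write path, sketched with the same redis-py client as above; update_user_in_db is a hypothetical stand-in for the real DB update:

```python
def update_user_in_db(user_id: int, fields: dict) -> None:
    ...  # hypothetical stand-in for the real DB update


def update_user(user_id: int, fields: dict) -> None:
    update_user_in_db(user_id, fields)  # write the source of truth first
    r.delete(f"user:{user_id}")         # invalidate; the next read repopulates
```

Deleting the key rather than rewriting it lets the next read repopulate from the fresh DB row, which avoids racing concurrent writers on the cached value.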
How you handle writes determines whether the cache stays consistent with the database.
Write-through: every write updates the cache and the DB synchronously before returning to the caller. The cache is always consistent, but writes are slower because both stores must acknowledge. Best when read-after-write consistency is critical (e.g., user settings, financial balances).
Write-back (write-behind): writes go to the cache first; a background process flushes dirty entries to the DB asynchronously. Writes are fast, but a crash before the flush loses data. Best for high-write workloads where occasional loss is acceptable (metrics, counters, analytics).
Write-around: writes go directly to the DB, bypassing the cache entirely. The cache is populated only on the next read miss. This prevents cache pollution from data that is written frequently but rarely re-read (bulk imports, audit logs), at the cost of higher read latency on the first access.
Rule of thumb: Write-through for correctness, write-back for throughput, write-around for write-heavy but read-cold data.
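A sketch of the three strategies side by side; db_write and cache_set are hypothetical stand-ins for real client calls:

```python
import queue
import threading


def db_write(key: str, value: dict) -> None: ...   # hypothetical DB client call
def cache_set(key: str, value: dict) -> None: ...  # hypothetical cache client call


dirty: "queue.Queue[tuple[str, dict]]" = queue.Queue()


def write_through(key: str, value: dict) -> None:
    # Both stores acknowledge before returning: consistent but slower writes.
    db_write(key, value)
    cache_set(key, value)


def write_back(key: str, value: dict) -> None:
    # Cache only on the hot path; entries still queued are lost on a crash.
    cache_set(key, value)
    dirty.put((key, value))


def write_around(key: str, value: dict) -> None:
    # DB only; the cache fills on the next read miss.
    db_write(key, value)


def flush_worker() -> None:
    # Background flush of dirty entries: the write-back half of the pattern.
    while True:
        key, value = dirty.get()
        db_write(key, value)
        dirty.task_done()


threading.Thread(target=flush_worker, daemon=True).start()
```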
When the cache is full, something must be evicted. The policy determines which entry goes.
LRU (Least Recently Used): evicts the entry that was accessed longest ago. Optimal for workloads where recent access predicts future access (sessions, user feeds, API responses). Redis implements an approximate LRU natively.
LFU (Least Frequently Used): evicts the entry accessed the fewest times over its lifetime. Better than LRU when a burst of one-time reads would otherwise pollute the cache, at the cost of heavier bookkeeping.
FIFO (First In, First Out): evicts the oldest-inserted entry regardless of access pattern. Simple but often wrong: a frequently accessed entry inserted early gets evicted before a cold entry inserted recently.
TTL (Time To Live): entries expire after a fixed duration regardless of access. Not strictly an eviction policy, but it works alongside LRU/LFU to bound staleness. Always set a TTL; an unbounded cache grows until it exhausts memory.
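For illustration, a minimal in-process LRU with a TTL bound. Note this is exact LRU via an ordered dict, not the sampled approximation Redis uses:

```python
import time
from collections import OrderedDict


class LRUCache:
    """Minimal LRU cache with a per-entry TTL bound."""

    def __init__(self, capacity: int, ttl: float):
        self.capacity = capacity
        self.ttl = ttl
        self._data: "OrderedDict[str, tuple[float, object]]" = OrderedDict()

    def get(self, key: str):
        item = self._data.get(key)
        if item is None:
            return None
        expires_at, value = item
        if time.monotonic() > expires_at:   # TTL bounds staleness
            del self._data[key]
            return None
        self._data.move_to_end(key)         # mark as most recently used
        return value

    def set(self, key: str, value) -> None:
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = (time.monotonic() + self.ttl, value)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used
```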
Sizing tip: Aim for a cache hit rate above 90% for read-heavy systems. If hit rate is low, the cache is too small or the key space is too diverse โ consider a consistent hashing ring across multiple cache nodes.
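A sketch of such a ring with virtual nodes; the node names are placeholders:

```python
import bisect
import hashlib


class HashRing:
    """Consistent hashing ring with virtual nodes per cache server."""

    def __init__(self, nodes: list, vnodes: int = 100):
        self._ring = []
        for node in nodes:
            for i in range(vnodes):  # virtual nodes smooth the key distribution
                self._ring.append((self._hash(f"{node}#{i}"), node))
        self._ring.sort()
        self._hashes = [h for h, _ in self._ring]

    @staticmethod
    def _hash(s: str) -> int:
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def node_for(self, key: str) -> str:
        # The first ring position clockwise from the key's hash owns the key.
        idx = bisect.bisect(self._hashes, self._hash(key)) % len(self._hashes)
        return self._ring[idx][1]


ring = HashRing(["cache-a:6379", "cache-b:6379", "cache-c:6379"])
print(ring.node_for("user:42"))  # this key routes to a stable node
```

Adding or removing a node remaps only the keys owned by its ring segments, so most of the cache stays warm.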
A Content Delivery Network (CDN) places cache nodes at the network edge, geographically close to users, so that static and cacheable dynamic content is served without a round-trip to the origin.
How it works:
1. The user requests an asset, e.g. https://cdn.example.com/app.js.
2. The edge node closest to the user checks its local cache.
3. Hit: the edge serves the response directly.
4. Miss: the edge fetches from the origin, caches the response, and serves it.
Cache-Control headers tell CDNs (and browsers) how long to cache a response:
Cache-Control: public, max-age=31536000, immutable caches for one year and never revalidates. Use for content-hashed assets (app.a3f2bc.js).
Cache-Control: no-cache requires revalidation with the origin before serving (via ETag/Last-Modified). Good for HTML pages.
Cache-Control: private lets the browser cache but forbids the CDN from caching. Use for authenticated responses.
Cache busting: append a content hash to the filename (app.a3f2bc.js) so a new deploy changes the URL, forcing the CDN to fetch the new file. Do not change TTLs: keep them long and change URLs instead.
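A sketch of choosing these headers per response type, with the framework wiring omitted; hashed_name shows one way to derive a content-hashed filename:

```python
import hashlib
import pathlib


def cache_headers(path: str, authenticated: bool = False) -> dict:
    """Pick a Cache-Control policy based on the kind of response."""
    if authenticated:
        # Browser may cache, but shared caches (CDNs) must not.
        return {"Cache-Control": "private, max-age=0"}
    if path.endswith((".js", ".css")):
        # Content-hashed assets: cache for a year, never revalidate.
        return {"Cache-Control": "public, max-age=31536000, immutable"}
    # HTML: always revalidate with the origin via ETag/Last-Modified.
    return {"Cache-Control": "no-cache"}


def hashed_name(path: str) -> str:
    """Cache busting: embed a content hash in the filename (app.a3f2bc.js)."""
    p = pathlib.Path(path)
    digest = hashlib.sha256(p.read_bytes()).hexdigest()[:6]
    return f"{p.stem}.{digest}{p.suffix}"
```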
Invalidation / purge: CDNs support API-driven cache purges for emergency rollouts. This is expensive at scale; prefer hash-based URLs over purging.
A cache stampede (thundering herd) occurs when a hot key's TTL expires and thousands of concurrent requests all miss the cache simultaneously, flooding the DB.
Why it happens: Every request that finds the expired key independently decides to fetch from DB and repopulate. Under high concurrency this sends N parallel DB queries for the same data.
Mutex lock: when a cache miss occurs, the first request acquires a short-lived lock (e.g., SET key:lock 1 NX EX 2). All other requests either wait or serve the slightly stale value while the lock holder fetches and repopulates. Prevents the thundering herd at the cost of a brief lock wait.
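A sketch of the lock pattern with redis-py; fetch is the caller-supplied DB loader, and a production version would cap retries or fall back to a stale copy rather than recursing indefinitely:

```python
import json
import time

import redis

r = redis.Redis()


def get_with_lock(key: str, fetch, ttl: int = 300):
    """On a miss, only one caller rebuilds the value; the rest wait briefly."""
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)
    # SET key:lock 1 NX EX 2: only the first misser acquires the lock.
    if r.set(f"{key}:lock", "1", nx=True, ex=2):
        try:
            value = fetch()  # a single DB fetch on behalf of the whole herd
            r.setex(key, ttl, json.dumps(value))
            return value
        finally:
            r.delete(f"{key}:lock")
    time.sleep(0.05)  # brief lock wait, then retry the cache
    return get_with_lock(key, fetch, ttl)
```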
Probabilistic early refresh: Before the key actually expires, a small fraction of requests voluntarily refresh it early. As the TTL approaches zero the probability of triggering a refresh rises. No lock needed; works well for non-critical content.
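One way to sketch this: store a logical expiry alongside the value and refresh with rising probability inside a refresh window. The parameter names are illustrative, and the redis-py client from the earlier sketch is reused:

```python
import json
import random
import time


def get_with_early_refresh(key: str, fetch, ttl: int = 300, window: int = 30):
    """Refresh probabilistically during the last `window` seconds of the TTL."""
    raw = r.get(key)
    if raw is not None:
        entry = json.loads(raw)
        remaining = entry["expires_at"] - time.time()
        # Serve the cached value with probability remaining/window, so the
        # refresh probability rises toward 1 as the TTL approaches zero.
        if remaining > 0 and random.random() > (window - remaining) / window:
            return entry["value"]
    value = fetch()
    entry = {"value": value, "expires_at": time.time() + ttl}
    r.setex(key, ttl + window, json.dumps(entry))  # key outlives logical expiry
    return value
```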
Request coalescing: The cache layer (or a proxy like Nginx/Varnish) collapses concurrent identical requests into a single upstream fetch and fans the response out to all waiting clients. Transparent to the application.
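A sketch of in-process coalescing, modeled loosely on Go's singleflight: concurrent callers for the same key share one upstream fetch.

```python
import threading


class _Call:
    def __init__(self):
        self.done = threading.Event()
        self.result = None


class SingleFlight:
    """Collapse concurrent identical requests into one upstream fetch."""

    def __init__(self):
        self._mu = threading.Lock()
        self._calls: dict = {}

    def do(self, key: str, fetch):
        with self._mu:
            call = self._calls.get(key)
            leader = call is None
            if leader:
                call = _Call()
                self._calls[key] = call
        if leader:
            try:
                call.result = fetch()  # the single upstream fetch
            finally:
                call.done.set()        # fan the result out to all waiters
                with self._mu:
                    self._calls.pop(key, None)
        else:
            call.done.wait()           # followers wait; they never hit upstream
        # Error handling elided: a raising fetch leaves result as None for waiters.
        return call.result
```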
Never set all keys to the same TTL: stagger TTLs with a small random jitter (e.g., base_ttl + rand(0, 60)) so keys don't expire in synchronised waves.
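A tiny helper applying that jitter on every write, reusing the redis-py client from the earlier sketches:

```python
import random


def set_with_jitter(key: str, value: str, base_ttl: int = 3600) -> None:
    # base_ttl + rand(0, 60): expiries spread over a minute, not one instant.
    r.setex(key, base_ttl + random.randint(0, 60), value)
```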