Achieving a 3ms TTFB: Edge Caching & Core Web Vitals (2026)
Up-to-date Feed
View AllHome / Blog / Achieving a 3ms TTFB: Edge Caching & Core Web Vitals (2026)
Achieving a 3ms TTFB: Edge Caching & Core Web Vitals (2026)
How to migrate API logic to Edge networks and leverage SWR to achieve near-instantaneous server response times.
Published April 24, 2026 · Last updated June 14, 2026 · By Abu Sufyan, Full-Stack Systems Engineer
Quick Answer
To achieve a 3ms Time to First Byte (TTFB), you must completely bypass origin server computations and database lookups during the initial request. This is accomplished by deploying your frontend to a global Edge CDN, utilizing a strict stale-while-revalidate (SWR) caching directive to serve from RAM, and optimizing dynamic Brotli compression to Level 4 instead of Level 11 to avoid CPU bottlenecks.
👉 Try the IP & Geo Lookup Tool free → — Test your local network routing and latency directly from your browser.
Why This Happens (In-Depth Analysis)
Three months ago, I was looking at our Web Vitals dashboard and noticed our Next.js App Router API was occasionally serving users a brutal 400ms Time to First Byte (TTFB). We were running on Vercel's global Edge network. We were supposed to be getting 10ms responses. So what was going wrong?
After attaching DataDog tracers to our edge functions, I realized we had fundamentally misunderstood how Cache-Control headers behave during V8 isolate cold starts. We were blindly trusting the defaults.
Time to First Byte (TTFB) is the exact time it takes for a user's browser to receive the very first packet of data from your server. In 2026, if you can't hit a TTFB under 50ms, Google's Helpful Content system will silently demote your Core Web Vitals scores. The reality is that physics plays a huge role here. The speed of light in a fiber-optic cable is roughly 200,000 km/s. If a user in Singapore requests a server located in Virginia, USA (roughly 15,000 km), the physical round-trip time (RTT) is 150ms. The only way to bypass this physical limit is to replicate content to edge servers located physically close to the user.
But proximity isn't enough. Here are the raw architectural bottlenecks we discovered:
- Brotli Level 11 CPU Spikes: I had set our compression to the maximum level (11) thinking it would speed up the network transfer. However, Brotli Level 11 requires immense CPU cycles to build the compression tree. For dynamic edge responses, this added 150ms of blocking computation before the first byte was sent.
- Cold Start Latency: Relying on on-demand serverless functions meant that when traffic spiked, new V8 isolates had to boot up, parse the JavaScript bundles, and execute. This routinely added 300ms to responses.
- Synchronous Revalidation: We were using a standard
max-agecache. When the cache expired, the very next user would be forced to wait for the entire origin server round-trip before getting a response.
How to Fix It (Step-by-Step Tutorial)
Fixing a 400ms TTFB requires moving the computation out of the critical request path. Here is how we engineered our 3ms response times.
1. Implement Stale-While-Revalidate (SWR) Caching
To serve a page in 3ms, the Edge Node must never wait for backend computations, database queries, or serverless cold starts. The content must reside directly in the edge node's local Random Access Memory (RAM).
We achieved this by forcing aggressive Incremental Static Regeneration (ISR) using stale-while-revalidate:
Cache-Control: public, max-age=0, s-maxage=31536000, stale-while-revalidate=86400
public, max-age=0: Tells the user's browser not to cache the HTML file locally. If we update the site layout, we need them to request the fresh structure immediately, not load a stale local file.s-maxage=31536000: Tells the Edge CDN to hold the pre-rendered HTML in its cache for a full year.stale-while-revalidate=86400: When a user requests a page older than 24 hours, the Edge node still returns the stale cached page instantly (yielding our 3ms TTFB). In the background, it triggers a non-blocking process to rebuild the page with fresh data.
2. Tune Brotli Compression Levels for Edge Execution
Once the Edge node retrieves the HTML from RAM, it must compress it before transmitting it over the TCP wire. Brotli outperforms Gzip because it uses a pre-compiled dictionary of 13,000 common HTML/CSS syntax words.
However, you must separate static build-time compression from dynamic edge compression:
// next.config.js
module.exports = {
// Use Vercel's optimized defaults for dynamic routes (Brotli Level 4)
compress: true,
// Custom headers to ensure origin doesn't double-compress
async headers() {
return [
{
source: '/api/:path*',
headers: [
{ key: 'Cache-Control', value: 'public, s-maxage=86400, stale-while-revalidate=43200' }
],
},
]
}
}
- Static Assets: Pre-compress your CSS, JS, and Fonts at build time using Brotli Level 11. The CPU cost is paid once during CI/CD.
- Dynamic Responses: Let the Edge server compress API JSON or dynamic HTML on the fly using Brotli Level 4. It executes with sub-millisecond CPU overhead.
3. Eliminate Main-Thread Blocks
An instantaneous TTFB is useless if your page is packed with heavy, blocking JavaScript files that lock up the browser's main rendering thread. Move all third-party analytics to non-blocking background execution using asynchronous strategies.
import Script from 'next/script'
export default function RootLayout({ children }) {
return (
<html>
<body>
{children}
<Script
id="gtag-init"
src="https://www.googletagmanager.com/gtag/js?id=G-123"
strategy="lazyOnload" // Postpones loading until the page is fully interactive
/>
</body>
</html>
)
}
Faster way: use Ping & Latency Testing Tools
Measuring TTFB correctly is vital. Stop using Chrome DevTools to measure network latency—Chrome includes DNS resolution and layout render blocks in its TTFB metric. I had to use raw curl commands to see the actual server response times. You can use our IP Address & Geo Lookup to quickly identify where you are testing from and ensure your edge routing is behaving as expected without writing complex terminal commands.
Edge Cases Most Guides Miss
The TCP Head-of-Line Blocking Problem Even with a 3ms server response, users on poor networks may experience delays due to TCP packet loss. TCP requires packets to arrive in order. If packet #2 drops, packet #3 cannot be processed until #2 is retransmitted. To solve this, ensure your CDN is utilizing HTTP/3 (QUIC), which runs over UDP and allows independent data streams to process out of order.
Cache Header Conflicts
If you have multiple CDN layers (e.g., Cloudflare in front of Vercel), your outer CDN might strip or override the stale-while-revalidate directive. Always verify your HTTP response headers in production to ensure the inner Cache-Control rules propagate fully to the user.
Vary Headers Destroying Cache Hit Ratios
Be incredibly careful with the Vary header. If your server sends Vary: User-Agent, your CDN will create a separate cache entry for every single browser version that visits your site, plunging your cache hit ratio to near zero. Only vary by necessary parameters like Accept-Encoding.
Comprehensive FAQ
What is the physical limit of web response times?
The physical limit is dictated by the speed of light. In a fiber-optic cable, light travels roughly 200,000 km/s. If a user in Singapore requests a server in Virginia (15,000 km away), the absolute minimum round-trip time is 150ms. Replicating content to global edge servers is the only way to bypass this limitation.
How does stale-while-revalidate prevent caching delays?
The Stale-While-Revalidate (SWR) cache strategy returns existing cached content instantly to the user (achieving that 3ms TTFB) while simultaneously triggering a non-blocking background network request to fetch and rebuild the data. The next user will receive the freshly built content.
Why does Brotli compress web assets better than Gzip?
Brotli utilizes a modern compression algorithm paired with a built-in static dictionary containing over 13,000 common words, phrases, and standard HTML/CSS/JS syntax structures. Gzip, on the other hand, must build its compression dictionary from scratch for every single file it processes.
Does a fast TTFB guarantee good Core Web Vitals?
No. TTFB is a foundational metric, but if your page contains heavy JavaScript payloads, unoptimized images, or render-blocking CSS, your Largest Contentful Paint (LCP) and Cumulative Layout Shift (CLS) scores will still suffer. A 3ms TTFB simply guarantees the browser can start parsing your code immediately.
About the Author
Abu Sufyan — Full-stack developer and infrastructure engineer with expertise in edge computing, Vercel/Next.js performance optimization, and distributed systems. Founder of WebToolkit Pro. GitHub
Related tools:
- IP Address & Geo Lookup — Check your routing paths and geographical IP data instantly.
- JSON Validator & Formatter — Format and validate your JSON API responses locally in your browser.
{
"@context": "https://schema.org",
"@type": "Article",
"headline": "Achieving a 3ms TTFB: Edge Caching & Core Web Vitals (2026)",
"description": "Learn how we achieved a 3ms Time to First Byte (TTFB) by optimizing Vercel Edge caching, Stale-While-Revalidate protocols, and Brotli compression levels.",
"datePublished": "2026-04-24",
"dateModified": "2026-06-14",
"author": {
"@type": "Person",
"name": "Abu Sufyan",
"url": "https://github.com/abusufyan-netizen"
},
"publisher": {
"@type": "Organization",
"name": "WebToolkit Pro",
"url": "https://wtkpro.site"
},
"mainEntityOfPage": {
"@type": "WebPage",
"@id": "https://wtkpro.site/blog/3ms-ttfb-performance-study/"
}
}
{
"@context": "https://schema.org",
"@type": "FAQPage",
"mainEntity": [
{
"@type": "Question",
"name": "What is the physical limit of web response times?",
"acceptedAnswer": {
"@type": "Answer",
"text": "The physical limit is dictated by the speed of light. In a fiber-optic cable, light travels roughly 200,000 km/s. If a user in Singapore requests a server in Virginia (15,000 km away), the absolute minimum round-trip time is 150ms. Replicating content to global edge servers is the only way to bypass this limitation."
}
},
{
"@type": "Question",
"name": "How does stale-while-revalidate prevent caching delays?",
"acceptedAnswer": {
"@type": "Answer",
"text": "The Stale-While-Revalidate (SWR) cache strategy returns existing cached content instantly to the user (achieving that 3ms TTFB) while simultaneously triggering a non-blocking background network request to fetch and rebuild the data. The next user will receive the freshly built content."
}
},
{
"@type": "Question",
"name": "Why does Brotli compress web assets better than Gzip?",
"acceptedAnswer": {
"@type": "Answer",
"text": "Brotli utilizes a modern compression algorithm paired with a built-in static dictionary containing over 13,000 common words, phrases, and standard HTML/CSS/JS syntax structures. Gzip, on the other hand, must build its compression dictionary from scratch for every single file it processes."
}
},
{
"@type": "Question",
"name": "Does a fast TTFB guarantee good Core Web Vitals?",
"acceptedAnswer": {
"@type": "Answer",
"text": "No. TTFB is a foundational metric, but if your page contains heavy JavaScript payloads, unoptimized images, or render-blocking CSS, your Largest Contentful Paint (LCP) and Cumulative Layout Shift (CLS) scores will still suffer. A 3ms TTFB simply guarantees the browser can start parsing your code immediately."
}
}
]
}
API Latency Calculator
Calculate real-world API response time budgets and TTFB targets.
wtkpro.site
WebToolkit Pro Team
Contributing Author