Spicy Bytes with Fastly: Hot Takes on Cloud Costs, AI Scaling, and Why People Still Matter

June 23, 2026

Elton Carneiro

If you've ever stared at a cloud bill and wondered where that extra zero came from, you're not alone. At Wasabi, we hear it constantly from organizations that can trace the problem back to architecture decisions made before anyone fully understood the cost implications. Some of them can also trace it back to a hyperscaler that had every incentive to keep it that way.

I was recently joined by Jesse von Doom, Director of Product Management at Fastly, for the latest episode of Spicy Bytes, Wasabi TV's riff on Hot Ones. Filmed at NESN Studio in Fenway Park and hosted by Shannon Lynch, Sr. Industry Marketing Manager at Wasabi, the show puts industry guests through seven questions and seven increasingly hot chicken wings. Between bites, Jesse and I covered AI infrastructure, cloud economics, and the scaling mistakes that can catch even experienced teams off guard.

Here are some highlights.

Why are cloud costs so hard to control?

When Shannon asked who's to blame for insane cloud bills, Jesse and I agreed: hyperscalers, with architecture complexity as the reason most organizations never push back. To be fair, large enterprises often need that complexity to achieve the scale they're operating at. But for many organizations, it's just extra layers of management that tend to be expensive when you get them wrong.

In cloud architecture, the origin is where your data lives, such as a central server or storage platform. The edge is a distributed network of servers located close to end users worldwide. Getting those two layers to work together efficiently requires architectural expertise most teams don't have in house. Hyperscalers know that, and in my experience, that complexity isn't accidental. Mistakes cost money. Hyperscalers benefit from those mistakes.

One cost category most organizations don't plan for: egress, the fees most cloud providers charge when data is retrieved from storage or transferred out of their environment. Most organizations budget for ingestion and underestimate egress entirely. At AI scale, that gap becomes significant. As I like to say, petabytes are the new gigabytes.
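To make the scale point concrete, here is a minimal sketch of the arithmetic. The per-GB rate is a placeholder I picked so the numbers are easy to read, not a quote from any provider's price list, and the helper name is hypothetical.

```python
# Hypothetical helper and placeholder rate: neither comes from a real price list.
GB_PER_PB = 1_000_000  # decimal petabyte, as storage is usually billed


def monthly_egress_cost(egress_gb: float, rate_per_gb: float) -> float:
    """Return one month's egress charge at a flat per-GB rate."""
    if egress_gb < 0 or rate_per_gb < 0:
        raise ValueError("volume and rate must be non-negative")
    return egress_gb * rate_per_gb


# Assumed rate, chosen only to make the arithmetic visible.
ASSUMED_RATE_PER_GB = 0.05

small = monthly_egress_cost(500, ASSUMED_RATE_PER_GB)            # 500 GB read back
large = monthly_egress_cost(2 * GB_PER_PB, ASSUMED_RATE_PER_GB)  # 2 PB read back
no_fee = monthly_egress_cost(2 * GB_PER_PB, 0.0)                 # no egress fee

print(f"500 GB read back: ${small:,.2f}")
print(f"2 PB read back:   ${large:,.2f}")
print(f"2 PB, no egress:  ${no_fee:,.2f}")
```

The rate is the same in both cases; only the volume changes. That is why a line item that rounds to nothing at gigabyte scale becomes a budget problem at petabyte scale.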

Wasabi removes egress costs entirely on the storage side with no egress or API fees, regardless of how much data moves. Fastly extends that cost advantage at the delivery layer by reducing how often data needs to move at all. When a Fastly edge node receives a request for content stored in Wasabi, it fetches the object once and caches a copy at an edge node close to the end user. Every subsequent request is served from that cache without going back to Wasabi. Fewer round trips mean less data movement, faster response times, and a better experience for the end user.
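The fetch-once, serve-many pattern described above can be sketched as a simple read-through cache. This is an illustration of the general technique, not Fastly's implementation: the class and function names are hypothetical, and it ignores expiry, eviction, invalidation, and the fact that a real edge network has many nodes.

```python
from typing import Callable, Dict


class EdgeCache:
    """Toy read-through cache standing in for a single edge node (hypothetical)."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._fetch_from_origin = fetch_from_origin
        self._store: Dict[str, bytes] = {}
        self.origin_fetches = 0  # how many times we had to go back to origin

    def get(self, key: str) -> bytes:
        # Cache miss: fetch the object from origin once and keep a copy.
        if key not in self._store:
            self._store[key] = self._fetch_from_origin(key)
            self.origin_fetches += 1
        # Cache hit (or freshly filled): serve from the local copy.
        return self._store[key]


def fake_origin(key: str) -> bytes:
    """Stand-in for an object store; returns dummy bytes for any key."""
    return f"contents of {key}".encode()


edge = EdgeCache(fake_origin)
responses = [edge.get("videos/intro.mp4") for _ in range(1000)]
print(f"requests served: {len(responses)}, origin fetches: {edge.origin_fetches}")
```

A thousand requests for the same object result in a single trip to the origin. Everything else is served from the copy held at the edge.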

There's another cost driver that's harder to see on the bill: reaching for AI to optimize infrastructure before the underlying architecture is sound. Organizations that skip straight to AI-assisted scaling get surprised when it breaks or costs more than expected under real load. My recommended framework is simple: define a process, make sure it works well, make sure it works at scale, then figure out how to optimize it using AI.

Where do edge and origin fit in AI workloads?

Getting the cost equation right means getting the architecture right first. And in AI infrastructure, that starts with understanding where different workloads actually belong.

AI workloads aren't one-size-fits-all, and where processing happens matters. Model training, large-scale inference, and compute-intensive workloads require the raw processing power of centralized data centers. Edge nodes are optimized for speed and proximity, not heavy compute, but that's exactly what makes them valuable for a different class of AI workload: low-latency inference, real-time user interactions, chatbot responses, and model routing.

One mistake I see organizations make is treating AI as the scaling mechanism for all of it. Routing everything through an LLM or pushing all processing to the edge creates cost and latency problems rather than solving them. What users want is fast, responsive interaction. As Jesse put it: "A lot of the stuff we're seeing for scaling of AI, I think people are going too far in on AI as the actual scaling factor. It turns out what people want is a lot of real-time interaction, and the AI stuff can kind of happen in the background." With many LLM processes taking multiple seconds, the benefit of shaving off even 100 milliseconds by moving to the edge just isn’t worth the effort or extra cost.
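A quick back-of-the-envelope calculation shows why. The 100 millisecond saving is the figure from our conversation; the 3-second LLM response and the 200-millisecond cached response are assumptions chosen for illustration.

```python
def relative_speedup(total_ms: float, saved_ms: float) -> float:
    """Fraction of the end-to-end response time removed by saving `saved_ms`."""
    if total_ms <= 0:
        raise ValueError("total_ms must be positive")
    if not 0 <= saved_ms <= total_ms:
        raise ValueError("saved_ms must be between 0 and total_ms")
    return saved_ms / total_ms


SAVED_MS = 100            # saving discussed in the article
ASSUMED_LLM_MS = 3000     # assumption: a multi-second LLM response
ASSUMED_CACHED_MS = 200   # assumption: a response served from an edge cache

llm_gain = relative_speedup(ASSUMED_LLM_MS, SAVED_MS)
cached_gain = relative_speedup(ASSUMED_CACHED_MS, SAVED_MS)

print(f"LLM call:        {llm_gain:.1%} faster")
print(f"cached response: {cached_gain:.1%} faster")
```

Under these assumptions, the same 100 milliseconds is a small slice of a slow LLM round trip but half of a fast cached response. That is the case for spending edge effort on the real-time interactions rather than on the model call itself.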

Jesse's framing resonated with me: “just enough AI.” The original instinct (to place LLM processing closer to users, and run everything through a model) turns out to be tricky in practice. Caching, personalization, and smart routing at the edge can do more work than most organizations realize, without a live LLM call every time.

Fastly's AI Accelerator is built around that principle. When AI processes a query, it translates the concept into a numerical vector, known as an embedding. Semantic caching compares those vectors, so a query that is close in meaning to one already answered can be served from the cache. The AI Accelerator identifies caching opportunities within that process, reducing how often a live LLM call is required without degrading the user experience. For organizations over-centralizing into GPU-heavy infrastructure without a delivery strategy, skipping that layer is a cost that compounds in ways that are difficult to unwind.
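Here is a minimal sketch of the general idea behind semantic caching, under loud assumptions. It is not the AI Accelerator's API or algorithm. The embedding is a toy word-count vector rather than a learned model, the similarity threshold is arbitrary, and every name is hypothetical.

```python
import math
import re
from collections import Counter
from typing import Callable, List, Tuple


def toy_embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words count vector (real systems use a model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class SemanticCache:
    """Reuse a stored answer when a new query is close enough in meaning."""

    def __init__(self, llm: Callable[[str], str], threshold: float = 0.8) -> None:
        self._llm = llm
        self._threshold = threshold  # assumed cutoff, not a recommended value
        self._entries: List[Tuple[Counter, str]] = []
        self.llm_calls = 0

    def ask(self, query: str) -> str:
        vector = toy_embed(query)
        # Find the most similar query answered so far, if any.
        best = max(self._entries, key=lambda e: cosine(vector, e[0]), default=None)
        if best is not None and cosine(vector, best[0]) >= self._threshold:
            return best[1]  # semantic hit: no live LLM call
        self.llm_calls += 1  # miss: call the model and remember the answer
        answer = self._llm(query)
        self._entries.append((vector, answer))
        return answer


def fake_llm(query: str) -> str:
    """Stand-in for a live model call."""
    return f"answer to: {query}"


cache = SemanticCache(fake_llm)
first = cache.ask("What are your store opening hours?")
second = cache.ask("What are the store opening hours?")  # near-duplicate wording
third = cache.ask("How do I reset my password?")         # unrelated question
print(f"queries: 3, live LLM calls: {cache.llm_calls}")
```

The second question is worded differently from the first but lands close enough in vector space to reuse the stored answer, so only two of the three queries reach the model. A production system would also need expiry and safeguards against returning a cached answer to a question that only looks similar.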

Why do people still matter in an AI-driven world?

That GPU dependency has another cost that doesn't show up on the infrastructure bill. Jesse and I kept coming back to the same question: if organizations are making huge bets on hardware, what are they giving up to do it?

The answer, more often than not, is people. Organizations shedding engineers to fund GPU capacity are making a specific bet: that hardware will outperform the domain expertise those people carry. Jesse was direct. "What we spend on talent is significantly higher than what we spend on computer equipment. […] I would bet on our people every single time."

It's also a bet with an expiration date. GPUs are the dominant compute platform in AI today, but this is the first generation of the technology. Alternatives (tensor processing units, inference-optimized CPUs) are already available or in development. The people who understand your business, your workflows, and your customers will still be relevant when the hardware landscape looks completely different.

I'd add an operational dimension to that. You can use AI to optimize a workflow. You can't use it to fix one that was never well designed. Knowing the difference still requires a person, someone with enough context to know whether what AI is producing is actually good.

"AI is a tool. People use the tools. Great engineering teams using AI are what really excite me. But how we get there matters. We get there through creativity, engineering expertise, and human problem-solving."

Jesse von Doom, Director of Product Management, Fastly

What do Wasabi and Fastly do together?

The through-line across this conversation was consistent: the organizations getting AI infrastructure right aren't the ones spending the most on hardware. They're the ones doing the architectural thinking first and keeping the people who know the difference.

My take on where this is all heading: There has never been a faster path from idea to product. AI can help organizations optimize their operations in ways that weren't possible even a few years ago. But none of that works without sound architecture underneath it, and without the people who understand the business well enough to know whether what AI is producing is actually good.

That's where Wasabi and Fastly come in. Wasabi provides affordable cloud storage with no egress fees, removing one of the most unpredictable cost variables in AI infrastructure. Fastly sits in front of it as the delivery layer, reducing redundant origin traffic and putting content closer to users. Store it affordably, move it efficiently, and deliver it fast.

See the full Spicy Bytes episode

Learn more about the Wasabi + Fastly partnership, get more of Jesse and Elton's take on hyperscalers, and find out which parts of today's AI infrastructure they think won't age well.

Egress is the cost of data leaving a storage system, moving from your origin to a delivery layer or from your infrastructure to end users. Most cloud providers charge for it. At AI scale, those charges compound quickly. Wasabi charges no egress fees, removing one of the more unpredictable variables in AI infrastructure costs.

The origin is where your data or application lives, like a central server or storage platform. The edge is a distributed network of servers close to end users. Serving content from the edge reduces latency and origin load. Fastly operates the edge layer; Wasabi operates as the origin.

Fastly's AI Accelerator optimizes AI workloads at the edge by identifying caching opportunities in how AI processes queries. This reduces how often a live LLM call is required, improving response times and lowering compute costs without degrading the user experience.

“Just enough AI” is Jesse von Doom's framing for disciplined AI scaling: use AI where it adds clear value, and handle everything else through proven infrastructure. The goal is a fast, responsive user experience. The AI work happens in the background where it's needed, not everywhere by default.
