We Were Overpaying GCP by 57%. Here's Every Fix.

We Were Bleeding Money and Didn't Know It

Here's the thing about cloud bills: they go up slowly. You don't notice it month to month because no single service spikes. It just... creeps. And before you know it, you're paying significantly more than last quarter for the same product serving the same users.

That was us. We're a small team running QuerySafe on Google Cloud Platform. We expected our bill to grow with users. Instead, it was growing on its own. Same traffic, higher costs. Every month, a little more.

So we blocked a day, opened the billing console, and went line by line through every single service. What we found was embarrassing. A load balancer we didn't need. Containers running 24/7 for a product that gets traffic in bursts. AI calls for empty messages. Temp files nobody was cleaning up. VMs left on from a task three weeks ago.

We fixed all of it. Total savings: 57%. Here's the breakdown.

The Billing Breakdown

Service	% Change
Networking	↓ 45%
Cloud SQL	↓ 4%
Artifact Registry	↓ 55%
Vertex AI	↓ 77%
Cloud Run	↓ 97%
Cloud Storage	↓ 61%
Compute Engine	↓ 54%
Cloud Build	↓ 33%

[Screenshot: our GCP billing dashboard after optimization, showing a 57% cost reduction across all services]

The screenshot is from our actual GCP billing console. Every optimization described below is reflected in these numbers.

Networking: The Hidden Tax ↓ 45%

Networking was our single largest line item. More expensive than our database. More expensive than AI inference. For an app with modest traffic, that didn't make sense.

We dug in and found a Cloud Load Balancer that had been provisioned during initial setup. It was billing us hourly for its forwarding rules, plus a per-gigabyte charge for data processed. But Cloud Run already handles load balancing and SSL termination on its own, so the external load balancer was duplicating work the platform did natively.

We removed it, pointed our custom domain through Cloud Run's built-in domain mapping, and networking dropped by 45%. No performance impact. No downtime during the switch.

Default setups from tutorials and quickstart guides often include components you don't actually need. Worth questioning every piece in the chain.

Cloud Run: Right-Sizing the Runtime ↓ 97%

97% sounds dramatic, but the fix was simple. Our containers were over-provisioned: multiple always-on instances, more memory than needed, CPU allocated even when nothing was happening.

We made three changes:

- Reduced minimum instances to what the traffic actually required.
- Added a startup probe so new instances only receive traffic once they are ready, which kept cold starts predictable enough that we didn't need warm instances sitting idle.
- Switched to request-based billing (CPU allocated only during requests), so we stopped paying for idle compute between requests.
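
As a rough sketch, the instance and billing changes above map onto a handful of Cloud Run service settings. The Python helper below only assembles a `gcloud run services update` command. The service name, region, and resource values are hypothetical placeholders, not the values from the setup described here. The startup probe is not shown and would be configured separately on the service.

```python
import shlex


def cloud_run_update_command(service, region, min_instances=0,
                             memory="512Mi", cpu="1"):
    """Build a `gcloud run services update` command for a right-sized service.

    All default values are illustrative assumptions, not recommendations.
    """
    if min_instances < 0:
        raise ValueError("min_instances must be zero or greater")
    return [
        "gcloud", "run", "services", "update", service,
        f"--region={region}",
        # Keep only as many warm instances as traffic actually requires.
        f"--min-instances={min_instances}",
        # Smaller per-instance memory and CPU limits.
        f"--memory={memory}",
        f"--cpu={cpu}",
        # Throttle CPU outside of request handling, so idle time is not
        # billed as always-allocated CPU.
        "--cpu-throttling",
    ]


if __name__ == "__main__":
    # "my-service" and "us-central1" are hypothetical names.
    print(shlex.join(cloud_run_update_command("my-service", "us-central1")))
```

Printing the command rather than running it leaves room to review the values first. The right memory and CPU limits depend on the workload, so they are worth checking against observed usage before applying.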

Cloud Run is built to be pay-per-use, but settings like minimum instances and always-allocated CPU quietly turn it into an always-on service. If you don't actively review the configuration, you end up paying for compute that's sitting there doing nothing.

Vertex AI: Smarter Model Selection ↓ 77%

AI inference was our third biggest expense. Three things helped here:

- Switched to a cheaper model variant. Output quality held up for our use case, at a lower cost per token.
- Added response caching for repeated queries. A lot of chatbot questions are near-identical, and regenerating the same answer every time wastes tokens.
- Validated inputs before sending them to the model. Empty messages, duplicate requests, and malformed queries were hitting the API and burning credits for no reason.
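
The caching and validation steps above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in production. `call_model` is a hypothetical stand-in for whatever function sends a prompt to the model API, and the length limit is an assumed value.

```python
import hashlib


class CachedModelClient:
    """Wrap a model call with input validation and an in-memory response cache.

    `call_model` is a hypothetical callable that takes a prompt string and
    returns the model's text response.
    """

    def __init__(self, call_model, max_chars=4000):
        self._call_model = call_model
        self._max_chars = max_chars  # assumed limit, tune for your model
        self._cache = {}

    @staticmethod
    def _normalize(message):
        # Collapse whitespace and case so near-identical questions share a key.
        return " ".join(message.lower().split())

    def ask(self, message):
        if not isinstance(message, str):
            raise ValueError("message must be a string")
        normalized = self._normalize(message)
        if not normalized:
            raise ValueError("empty message, not sent to the model")
        if len(normalized) > self._max_chars:
            raise ValueError("message too long, not sent to the model")

        key = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if key not in self._cache:
            # Only cache misses reach the paid API.
            self._cache[key] = self._call_model(message.strip())
        return self._cache[key]
```

The cache here only matches questions that are identical after whitespace and case normalization, and it lives in process memory. A shared store with an expiry time would be a natural next step if several instances serve the same chatbot.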

AI costs scale with how disciplined you are about what hits the model. Every unnecessary token is money out the door.

Cloud Storage: Clean Up After Yourself ↓ 61%

Training a chatbot generates temporary files: extracted text, intermediate processing artifacts, PDF conversions. These were piling up. Nobody was deleting them because nobody was looking.

We set up lifecycle policies to auto-delete temp files after a few days, cleaned out leftover training data from chatbots that had been deleted, and made the pipeline clean up after itself every run.
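
For illustration, a lifecycle policy of this kind can be generated as a small JSON document. The sketch below builds one in Python. The `tmp/` prefix and the three-day retention period are assumptions for the example, not the values from the setup described here.

```python
import json


def temp_file_lifecycle(prefix, max_age_days):
    """Return a Cloud Storage lifecycle config that deletes old temp objects.

    `prefix` and `max_age_days` are caller-supplied; the values used in the
    example below are hypothetical.
    """
    if max_age_days < 0:
        raise ValueError("max_age_days must be zero or greater")
    return {
        "rule": [
            {
                "action": {"type": "Delete"},
                "condition": {
                    "age": max_age_days,        # days since object creation
                    "matchesPrefix": [prefix],  # only touch temp objects
                },
            }
        ]
    }


if __name__ == "__main__":
    # Hypothetical prefix and retention period.
    print(json.dumps(temp_file_lifecycle("tmp/", 3), indent=2))
```

Saved to a file, the output can be applied to a bucket with `gcloud storage buckets update gs://BUCKET --lifecycle-file=FILE`. Scoping the rule to a prefix keeps it from deleting anything outside the temp area.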

Storage is cheap. But "cheap times forever" isn't cheap anymore.

Compute Engine: Stop Paying for Idle VMs ↓ 54%

We had VMs that were spun up for specific jobs and never turned off afterward: background processing, builds, maintenance scripts. The work ran for a few minutes, but the VM stayed alive for hours.

We moved everything possible to serverless: Cloud Run, Cloud Functions, and Cloud Build jobs that spin up, do the work, and shut down. If something doesn't need to be running 24/7, it shouldn't be.
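
One way to spot forgotten VMs is to flag instances that have been running longer than any job should take. The sketch below assumes input shaped like the JSON from `gcloud compute instances list --format=json` and reads only the `name`, `status`, and `lastStartTimestamp` fields. Long uptime is only a proxy for idleness, so the result is a list to review, not a list to delete.

```python
from datetime import datetime, timedelta, timezone


def long_running_vms(instances, max_uptime, now=None):
    """Return names of RUNNING instances that have been up longer than max_uptime.

    `instances` is a list of dicts with `name`, `status`, and
    `lastStartTimestamp` keys; `max_uptime` is a timedelta chosen by the caller.
    """
    now = now or datetime.now(timezone.utc)
    flagged = []
    for vm in instances:
        if vm.get("status") != "RUNNING":
            continue
        started = vm.get("lastStartTimestamp")
        if not started:
            continue
        # Timestamps are RFC 3339 strings with an explicit UTC offset.
        uptime = now - datetime.fromisoformat(started)
        if uptime > max_uptime:
            flagged.append(vm["name"])
    return flagged
```

Run on a schedule with a threshold such as `timedelta(hours=6)`, a check like this would surface a VM left on for three weeks long before the next bill does.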

The Compound Effect

None of this was groundbreaking on its own. Removing a load balancer, right-sizing containers, caching AI responses, deleting temp files, shutting down idle VMs. Basic operational hygiene. But stacked together, they cut total spend by 57%.

Same traffic. Same performance. Same uptime. We just stopped paying for things we didn't need.

Five Takeaways

1. Look at your bill every month. Not quarterly. Monthly. Costs creep up when nobody's watching.
2. Don't trust defaults. Quickstart guides are built for simplicity, not for your actual workload. Your production setup should look different from the tutorial.
3. Right-size everything. If you're provisioning more than you're using, you're paying for air.
4. Use what the platform gives you. If Cloud Run already does load balancing and SSL, don't bolt on another service that does the same thing.
5. Clean up after yourself. Set lifecycle policies. If your pipeline creates temp files, your pipeline should delete temp files.
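
The monthly review in the first takeaway can be partly automated. As a minimal sketch, the function below compares per-service costs for two months and flags anything that grew by more than a threshold. The input dictionaries are assumed to come from a billing export, and the 10% threshold is an arbitrary starting point.

```python
def cost_drift(previous, current, threshold=0.10):
    """Return services whose cost rose by more than `threshold` month over month.

    `previous` and `current` map service name to that month's cost.
    The result is a list of (service, fractional_increase) pairs, largest
    first. Services with no cost last month are reported with an increase
    of infinity so that new line items always surface.
    """
    flagged = []
    for service, cost in current.items():
        before = previous.get(service, 0.0)
        if before == 0:
            if cost > 0:
                flagged.append((service, float("inf")))  # new line item
            continue
        increase = (cost - before) / before
        if increase > threshold:
            flagged.append((service, increase))
    return sorted(flagged, key=lambda item: item[1], reverse=True)
```

A check like this does not replace reading the bill line by line, but it points the review at the services that moved while traffic stayed flat.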

Frequently Asked Questions

How much can you realistically save on GCP?

It depends on how much waste exists in your current setup. Most teams we've talked to have at least 20-30% of unnecessary spend sitting in default configurations, idle resources, and over-provisioned services. We hit 57% because we'd never done a proper audit before.

Do I need a load balancer with Cloud Run?

Usually not. Cloud Run has built-in load balancing and SSL termination. If you set up an external Cloud Load Balancer on top of it, you're paying twice for the same thing. There are edge cases where you need one (like custom WAF rules or multi-region routing), but most standard deployments don't.

How do I reduce Vertex AI costs?

Three things worked for us: switch to a cheaper model variant if quality holds up, cache responses for repeated or similar queries, and validate inputs before they hit the API so you're not burning tokens on empty or malformed requests.

Does right-sizing Cloud Run affect performance?

Not if you do it properly. The key is adding a startup probe so new instances only take traffic once they are ready, then reducing minimum instances to match actual traffic. We saw no difference in response times after the change.

How often should I audit my cloud bill?

Monthly, at minimum. Cloud costs drift. New services get added, old ones stay running, traffic patterns change. A quick monthly review catches waste before it compounds. Set a calendar reminder.
