Case Study

From Firebase to Cloud Run: How We Cut Infrastructure Costs by 60%

Published Feb 25, 2026 · 8 min read

Where We Started

QuerySafe launched on Firebase. It was the natural choice for a fast-moving startup: managed hosting, serverless functions, a real-time database, and authentication out of the box. We shipped our first version in weeks instead of months.

But as the platform grew from a prototype to a production system handling enterprise workloads, the cracks started showing. Firebase is excellent for getting started. It is less excellent when your application needs to run ML models, process large documents, and serve low-latency AI responses at scale.

Why We Migrated

Four problems pushed us toward a migration:

1. Cold Start Latency

Firebase Cloud Functions spin down when idle. When a user sent their first message to a chatbot, the function had to cold start: load the runtime, import dependencies, and initialize connections. This added 3 to 5 seconds of latency before the user saw any response. For a conversational AI product, that delay is unacceptable.

2. Unpredictable Firestore Costs

Firestore charges per document read and write. Every chatbot conversation generates reads for the chatbot config, the knowledge base metadata, conversation history, and user profile. A single chat message could trigger 8 to 12 document reads. At moderate scale, these reads added up fast, and the monthly bill became unpredictable.

3. No Native ML Support

QuerySafe's training pipeline needs to load a SentenceTransformer embedding model, build FAISS vector indices, and process large PDF documents with vision APIs. Cloud Functions have a 540-second timeout, limited memory, and no persistent file system. We were fighting the platform instead of building features.

4. Limited Control

As we added features like document processing, vision-based PDF extraction, and URL crawling, we needed control over system dependencies, file storage, and background processing. Firebase's serverless model, which was an advantage early on, became a constraint.

The New Architecture

We migrated to a containerized architecture on Google Cloud, staying within the GCP ecosystem but moving to services designed for our workload:

| Component | Before (Firebase) | After (Cloud Run) |
| --- | --- | --- |
| Compute | Cloud Functions (Node.js) | Cloud Run (Django, containerized) |
| Database | Firestore (NoSQL) | Cloud SQL PostgreSQL |
| File Storage | Firebase Storage | Cloud Storage + local container volumes |
| Vector Search | External service | FAISS (in-container) |
| Authentication | Firebase Auth | Django sessions + OTP |
| Hosting | Firebase Hosting | Firebase Hosting (unchanged for static site) |

Cloud Run is the centerpiece. It runs our Django application in a Docker container with full control over dependencies, file system access, and resource allocation. It auto-scales to zero when idle and scales up under load, just like Cloud Functions, but without the cold start penalty and with much higher resource limits.
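Deploying a container this way is a single command. The sketch below is illustrative: the service name, region, and resource limits are placeholders, not our actual configuration.

```shell
# Build from source and deploy to Cloud Run (names are placeholders).
gcloud run deploy querysafe-api \
  --source . \
  --region us-central1 \
  --memory 2Gi \
  --cpu 2 \
  --max-instances 10
```

The `--memory` and `--cpu` flags are where Cloud Run pulls ahead of Cloud Functions: you size the container for the workload instead of working within fixed function tiers.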

The Cost Impact

The migration reduced our monthly infrastructure spend by approximately 60%. Here is the breakdown at comparable scale:

| Cost Category | Firebase Stack | Cloud Run Stack |
| --- | --- | --- |
| Compute | $350–500/mo | $80–150/mo |
| Database | $300–450/mo | $70–120/mo |
| Storage & Networking | $100–200/mo | $50–80/mo |
| Vector DB | $100–150/mo | $0 (FAISS in-container) |
| Total | $850–1,300/mo | $200–350/mo |

The biggest single saving came from replacing Firestore with Cloud SQL. Firestore's per-read pricing model is expensive for read-heavy workloads like chatbot conversations. Cloud SQL charges a flat monthly rate for the instance, regardless of query volume. At our scale, this alone cut database costs by over 70%.
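The per-read versus flat-rate difference is easy to put on the back of an envelope. All prices and volumes below are illustrative assumptions, not quoted GCP rates or QuerySafe's actual traffic; real Firestore bills also include writes, storage, and egress.

```python
# Back-of-envelope: per-read billing vs a flat-rate database instance.
# Every number here is an illustrative assumption, not a quoted price.
PRICE_PER_100K_READS = 0.06  # USD per 100k document reads, assumed

def monthly_read_cost(reads_per_message, messages_per_month):
    """Reads are billed per document, so cost scales with the fan-out."""
    total_reads = reads_per_message * messages_per_month
    return total_reads / 100_000 * PRICE_PER_100K_READS

# The 8-12 read fan-out multiplies every message's cost:
low = monthly_read_cost(8, 1_000_000)    # light traffic
high = monthly_read_cost(12, 5_000_000)  # heavier traffic

# Break-even point where an assumed $70/mo flat-rate instance becomes
# cheaper than paying per read:
FLAT_INSTANCE_PER_MONTH = 70.0
breakeven_reads = FLAT_INSTANCE_PER_MONTH / PRICE_PER_100K_READS * 100_000
```

The point of the sketch is the shape, not the exact figures: per-read cost grows linearly with traffic and with read fan-out, while the flat-rate line stays put, so a read-heavy workload crosses the break-even point quickly.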

The second major saving was eliminating the external vector database. By running FAISS directly inside our Cloud Run container, we removed a $100 to $150 monthly line item entirely. FAISS is fast, free, and handles our index sizes without issue.

Performance Improvements

Cost was not the only motivation. Performance improved across every metric:

Cold start time dropped from 3–5 seconds to under 1 second. Cloud Run's minimum instances feature keeps a warm container ready. Users no longer wait for a function to spin up before getting a response.

Chatbot response latency improved measurably. The embedding model loads once when the container starts and stays in memory. On Firebase, every function invocation loaded the model from scratch. In-container FAISS queries are sub-millisecond, compared to network round-trips to an external vector database.
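The load-once pattern can be sketched in a few lines. Here `functools.lru_cache` stands in for however the model is actually cached, and a short sleep stands in for the multi-second SentenceTransformer load.

```python
import functools
import time

@functools.lru_cache(maxsize=1)
def get_embedder():
    """Load the embedding model once per container lifetime."""
    time.sleep(0.1)  # stand-in for the expensive model load
    return {"model": "embedder"}  # stand-in for the loaded model object

# The first request pays the load cost; every later request reuses the
# in-memory instance, instead of reloading per invocation as Cloud
# Functions forced us to.
first = get_embedder()
second = get_embedder()
```

Any module-level singleton achieves the same thing; the key is that the container process, unlike a function invocation, lives long enough for the cache to matter.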

Training pipeline throughput increased. Cloud Run containers can run for up to 60 minutes with generous memory and CPU allocation. Document processing, vision API calls, embedding generation, and FAISS indexing all happen in a single container with access to local disk. On Cloud Functions, we were constantly working around timeout and memory limits.

Lessons Learned

Every migration teaches you something. Here are the lessons worth sharing:

Schema design matters more than you think. Moving from Firestore's document model to PostgreSQL required careful schema design. Firestore encourages denormalized, nested data. PostgreSQL rewards normalized, relational thinking. We spent more time on data modeling than on any other part of the migration.
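The document-to-relational shift can be sketched with `sqlite3` standing in for Cloud SQL PostgreSQL. Table and field names here are illustrative, not QuerySafe's actual schema.

```python
import sqlite3

# Firestore encourages nesting conversation data inside the chatbot doc:
doc = {
    "name": "support-bot",
    "conversations": [
        {"user": "alice", "message": "hi"},
        {"user": "bob", "message": "pricing?"},
    ],
}

# Relational modeling splits that into two tables joined by a key
# (sqlite3 used here as a stand-in for PostgreSQL):
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE chatbot (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE conversation (
        id INTEGER PRIMARY KEY,
        chatbot_id INTEGER REFERENCES chatbot(id),
        username TEXT,
        message TEXT
    );
""")
bot_id = db.execute(
    "INSERT INTO chatbot (name) VALUES (?)", (doc["name"],)
).lastrowid
db.executemany(
    "INSERT INTO conversation (chatbot_id, username, message)"
    " VALUES (?, ?, ?)",
    [(bot_id, c["user"], c["message"]) for c in doc["conversations"]],
)

# One indexed query replaces re-reading the whole nested document:
rows = db.execute(
    "SELECT username, message FROM conversation WHERE chatbot_id = ?",
    (bot_id,),
).fetchall()
```

The payoff is that conversation history can be queried, paginated, and aggregated without fetching the parent document, which is exactly the access pattern Firestore charged us for on every message.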

Cloud Run's min-instances is worth the cost. Setting minimum instances to 1 for our primary service costs a few dollars per month but eliminates cold starts entirely for the first user request. For a product where latency matters, this is non-negotiable.
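In gcloud terms, the setting is a one-line change; the service name and region below are placeholders.

```shell
# Keep one warm instance so the first request never hits a cold start.
gcloud run services update querysafe-api \
  --region us-central1 \
  --min-instances 1
```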

Keep your static site on Firebase Hosting. We did not migrate everything. Firebase Hosting is excellent for static sites: global CDN, automatic SSL, clean URLs, fast deploys. Our public marketing site still runs on Firebase Hosting. Migrate what needs migrating, keep what works.

Container-based deployment gives you freedom. With Cloud Run, we control every dependency. Need a specific version of PyMuPDF for PDF processing? Add it to the Dockerfile. Need ffmpeg for future audio processing? One line in the build. On Cloud Functions, every dependency had to fit within the platform's constraints.
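A sketch of what that freedom looks like in practice; the base image, packages, and module path are illustrative, not our actual Dockerfile.

```dockerfile
FROM python:3.11-slim

# System packages Cloud Functions never let us pin:
RUN apt-get update && apt-get install -y --no-install-recommends ffmpeg \
    && rm -rf /var/lib/apt/lists/*

# Exact Python dependency versions, PyMuPDF included:
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . /app
WORKDIR /app
CMD ["gunicorn", "--bind", ":8080", "project.wsgi"]
```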

Was It Worth It?

Absolutely. The migration took focused effort, but the results speak for themselves: 60% lower infrastructure costs, faster response times, a training pipeline that can handle enterprise-scale document processing, and a platform architecture that does not fight us when we build new features.

If you are building an AI application on Firebase and starting to feel the constraints, the path to Cloud Run is well-trodden. The GCP ecosystem makes it possible to migrate incrementally, service by service, without a big-bang rewrite.


QuerySafe is built on infrastructure designed to scale efficiently.
