Most Software Scalability Advice Is Premature. Here’s What Actually Matters

    Matt Watson
    By Matt Watson · CEO of Full Scale, 4x Founder, Author of Product Driven
    Updated 13 min read
    software-scalability hero, Full Scale
    In this article

    Open almost any guide on software scalability and the first thing it tells you to do is break your app into microservices. I want to push back on that. The same tradeoffs show up in how to build a backend for your app without over-engineering it.

    I bootstrapped VinSolutions to $35 million in annual recurring revenue running on basically one big database server with a few read replicas. We sold it for $147 million. Later I built Stackify, where we eventually ran around 2,000 sharded SQL Server databases to handle the load. So I have lived on both ends of this, the simple end and the genuinely hard end.

    Here is what twenty years of being the person who actually scaled the systems taught me: most teams spend real money solving scalability problems they will never have.

    Software scalability is your system’s ability to handle more load, whether that is more users, more data, or more transactions, without slowing down or falling over. That is all it means. The good news is that for the vast majority of products, the cloud already solved the hard part. Get a handful of basic architecture patterns right, and you can scale by adding servers and get back to building your product.

    Keep it simple and boring until you can’t.

    That is almost the whole post. The rest is just what “the basics” actually are, where the one real hard decision lives, and why most of the scary scaling problems are a sign you have already won.

    What software scalability actually means

    Scalability comes in two flavors, and the difference shapes everything else. It also decides whether you need a backend rebuild or just a refactor.

    Vertical scaling means making one machine bigger by adding CPU, memory, or disk. It is the simplest move you can make, and it carries you much further than most engineers expect.

    Horizontal scaling means running more machines and spreading the work across them. You add web servers behind a load balancer, add database replicas, and add background workers that pull from a queue.

    Vertical scalingHorizontal scaling
    What it isMake one machine biggerRun more machines
    DifficultyEasy, almost no design workNeeds the right architecture up front
    CeilingLimited by the biggest box you can buyEffectively unlimited
    Best whenEarly on, or for a quick capacity bumpYou expect real, sustained growth

    Vertical scaling is easy but has a ceiling, because you can only buy so big a box. Horizontal scaling has almost no ceiling, but it asks more of your architecture up front. The trick is making horizontal scaling cheap and boring to reach for, so that the day you need it, it is already sitting there waiting for you.

    The cloud already solved the expensive part

    Twenty years ago, scaling meant buying servers, racking them, and praying your capacity planning was right. At VinSolutions I babysat a maxed-out Dell because adding capacity meant a purchase order and a drive to the data center.

    That world is gone. Today you can start a server in seconds and pay for it by the hour. Worldwide spending on public cloud services reached $723 billion in 2025, and it keeps climbing, because renting compute on demand is now the default way software gets built. The cloud providers even handle the adding-and-removing of servers for you with auto-scaling, so capacity follows your traffic up and back down without anyone filing a ticket. E-commerce platforms feel those spikes more than anything else, which is why custom e-commerce development is mostly backend engineering with a storefront on top.

    People forget the catch: “cloud” is just someone else’s servers. The cloud did not repeal the laws of computer science. It made capacity something you rent instead of own, which is a huge deal, but it cannot design your software for you.

    The infrastructure side of scaling is mostly a commodity now. What is left is the part you still own: building an app that can actually use all that cheap capacity.

    The basic patterns that actually matter

    This is the part worth your attention. Get these right and horizontal scaling stays cheap and boring to reach for. Skip them and no amount of cloud spend will save you.

    Design your app to run more than one instance

    The single most important rule is this: any web server should be able to handle any request, with nothing important stored on that one machine. That is called a stateless application, and it is the core idea behind the widely used twelve-factor app approach.

    When your app is stateless, scaling out is trivial. You put a load balancer in front and run two servers, or ten, or a hundred. Any of them can serve any user because none of them is special. The moment a server starts holding something the others do not have, like a logged-in user’s session in local memory, you have quietly broken your ability to add more servers.

    Get this one decision right early. It is cheap to design for on day one and painful to retrofit later. The monolith vs microservices call is exactly that kind of early decision.

    Store files in shared storage, not on the web server

    A classic scaling killer I have watched break real systems: a user uploads a file, and the app saves it to the local disk of whichever web server handled the request. It works perfectly with one server. Add a second server and half your file requests start failing, because the file lives on the other machine.

    Put uploads in shared object storage from the start, something like Amazon S3 or its equivalents. Then it does not matter which server handles the next request. The file is in one shared place every server can reach.

    Use a queue for the slow stuff

    Not every task needs to happen while the user waits.

    Sending email, generating a report, processing a video, syncing data to another system, none of that should block the page from loading. Hand those jobs to a queue and let background workers chew through them. The user gets an instant response, and you scale the slow work separately by adding more workers when the queue gets deep.

    Queuing is one of the biggest wins in this whole list, and it is almost always cheaper to add than the brute-force alternative of throwing bigger servers at a slow request.

    Design the database well, then index it

    This is where scaling actually breaks, every time.

    I was the default database administrator at every company I built, mostly because nobody else wanted the job. Two decades of SQL Server tuning later, I have the scars to prove it. The database is almost always the first thing to fall over under load, and it is the last thing the cloud can magic away for you.

    Most database scaling problems come down to two things: a schema that was not designed for how the data is actually queried, and missing indexes. A well-placed index can turn a query that takes ten seconds into one that takes ten milliseconds. There is a reason resources like Use The Index, Luke exist, indexing is the highest-return database skill there is, and most teams under-invest in it.

    Modern managed databases help here. Cloud services now handle provisioning, failover, connection pooling, and even read replicas for you, which takes real operational pain off your plate. What no managed service will do is fix a bad schema or write your missing indexes. That part is still yours.

    Add read replicas when reads outpace what one database can serve, with one honest caveat: a replica lags the primary by a fraction of a second, so a user can occasionally not see their own write right away. Plan for that. Tune your slow queries before you assume you need a bigger machine. Smart database design and indexing will take you further than any architecture diagram.

    Building a development team?

    See how Full Scale can help you hire senior engineers in days, not months.

    Cache the expensive things

    If you compute the same answer over and over, compute it once and remember it.

    Caching frequently requested data in memory, with a tool like Redis or a content delivery network for static files, takes enormous pressure off your database and speeds up everything users see. Just have a plan for when cached data goes stale, because a fast wrong answer is still a wrong answer.

    Measure before you scale anything

    I built Stackify around this exact problem. It was an application monitoring product whose whole job was telling engineers where their software was actually slow. The lesson from building it stuck with me: you cannot scale what you cannot see.

    Most teams guess at their bottleneck and scale the wrong thing. They add web servers when the database is the problem, or buy a bigger database when one missing index is the problem. Put real monitoring on your system before you spend a dollar on capacity, so you scale the thing that is actually breaking. The monitoring also tells you when “until you can’t” has arrived, before your users find out for you. That same visibility is the backbone of keeping an application healthy after launch, because apps break even when nobody changes the code.

    The one genuinely hard call: sharding

    Everything above is straightforward engineering. Sharding is the place where scaling gets genuinely hard, and it is the one decision you actually have to think about in advance.

    Sharding means splitting your data across many separate databases, so no single database holds all of it. At Stackify we ran around 2,000 sharded SQL Server databases, because we were ingesting application logs and performance data at a volume no single database could hold. That is real scale, and it cost us real complexity. Sharded systems are harder to query, harder to keep consistent, and harder to operate. Every developer who touches the system has to understand the sharding scheme, which is exactly the kind of complexity that becomes technical debt if you take it on too early.

    This is the one place I tell people to design ahead, and the reason it is the exception matters. Data volume is a property of what you are building, not a bet on how successful you will be. If you are building logging, analytics, or anything that ingests events, you know on day one that you will hold billions of rows. That is knowable. Hoping for a million users is not.

    So the honest rule is this: if your product could realistically hold millions or billions of records in a single table, design for sharding up front, because bolting it on later is brutal. If it could not, do not shard. You are buying a problem you do not have.

    This is the same instinct I apply to every architecture decision. Never introduce complexity unless it gives a 5x improvement or there is genuinely no other way to get there.

    Most scaling problems are rich-people problems

    Here is the reframe I wish someone had handed me at twenty-two.

    Most of the dramatic scaling problems you read about are rich-people problems. When you are first starting out, you are extremely unlikely to have any of them. And if you do hit them, it means you have been wildly successful, you probably have a lot of customers and a lot of revenue, and frankly you can afford to hire people to solve them. So congratulations.

    The expensive mistake runs the other direction. It is the team that builds for a million users while they are still hunting for their first hundred. They shard a database that holds ten thousand rows. They split a working app into twelve microservices that now need a platform team to operate. All of that effort is runway, and they spent it preparing for traffic that never arrived.

    To be fair to microservices, they do solve a real problem. Once you have a lot of engineers stepping on each other in one codebase, splitting into independently deployable services lets teams ship without waiting on each other. That is an organizational win, not a performance one, and it is worth paying for when you have the org to justify it. Notice that even the industry has cooled on the microservices-by-default reflex, with the “modular monolith” coming back into fashion precisely because most teams were buying complexity they did not need.

    Pendo found that 80% of software features are rarely or never used. The same disease infects architecture. Engineers love building for a future that feels impressive and usually never shows up. This is the same discipline behind a good MVP, where cutting scope beats adding it. Customers do not buy cool architecture. They buy products that solve their problem today.

    This is exactly the trap I describe in Product Driven: teams stay busy building things that do not move the business, because building feels like progress. Premature scaling is that trap wearing an engineering costume. The discipline is the same one that governs the build versus buy decision, spend your effort where it actually pays off, and not one hour before.

    Scalability is a team decision, not just an architecture one

    There is one more thing the guides leave out. Scalability is not only about patterns. It is about whether you have engineers who have done this before and, just as important, who know when not to.

    A senior engineer who has scaled a real system will tell you that you do not need sharding yet. A junior one, or an AI assistant, will happily build it because you asked. Knowing which patterns to skip is worth more than knowing how to implement all of them, and that judgment only comes from having been burned. This is one of the quieter challenges every CTO faces: protecting the team from its own urge to over-build.

    It is also why I am wary of hiring the cheapest developer you can find to save money, a mistake I call cheapshoring. Scaling judgment is exactly the kind of expensive experience that gets cut when you optimize purely for hourly rate.

    At Full Scale we augment our clients’ teams with senior engineers who have built and scaled production systems, so the people writing your scaling code have actually scaled something before. When we build custom software for a client, the default is the boring, proven stack, not the resume-driven one.

    AMC Theatres runs the world’s largest movie-theatre ticketing platform with this exact model. As their CIO Derrick Leggett puts it, “It’s a fully integrated team. It’s just some of the people happen to be living in the Philippines.” The engineers scaling that platform are treated as full members of the team, not a vendor on the other side of a wall. That is what makes scaling decisions get made well.

    Frequently asked questions

    What is software scalability?

    Software scalability is an application’s ability to handle increasing load, including more users, more data, and more transactions, without losing performance or going down. A scalable system can grow by adding resources, and ideally shrink again when demand drops so you are not paying for capacity you no longer need.

    What is the difference between vertical and horizontal scaling?

    Vertical scaling makes a single machine more powerful by adding CPU, memory, or storage. It is simple but limited by how large a single server can get. Horizontal scaling adds more machines and distributes the load across them, which has almost no ceiling but requires your application to be designed for it, starting with stateless app servers.

    When should you start worrying about software scalability?

    Worry sooner about the cheap decisions and later about the expensive ones. Make your app stateless and use shared file storage from day one, because those are nearly free to design for and painful to retrofit. Hold off on sharding, microservices, and other heavy machinery until real load forces your hand. Most products never reach that point.

    Do you need microservices to scale?

    No. Plenty of large, high-traffic products run on a well-built single application, often called a monolith. Microservices solve specific organizational and scaling problems, and they add real operational complexity in exchange. Reach for them when you have a concrete problem they solve, not as a default starting point.

    How do you scale a database?

    Start with good schema design and proper indexing, which solve the majority of database performance problems. Add read replicas when read traffic outgrows a single database, and cache frequently requested data to reduce load. Sharding, splitting data across many databases, is the last resort, reserved for genuinely massive data volumes because of the complexity it introduces.

    Build software that scales when it needs to, and not before

    The best scalability strategy for most teams is restraint. Get the cheap patterns right early, design your database with care, and refuse to build for traffic you do not have yet. When you do hit real scale, it will mean your product worked, and you will have the resources to handle it.

    If you want senior engineers who have already scaled production systems on your team, book a 15-minute call with Full Scale and we will talk through what your roadmap actually needs.

    Get Product-Driven Insights

    Weekly insights on building better software teams, scaling products, and the future of offshore development.

    Subscribe on Substack

    Ready to add senior engineers to your team?

    Book a 15-minute call. Tell us your stack and where the gaps are, and we'll show you the engineers we'd put on your team.