Cloud Cost Optimization for Engineering Teams: How to Cut AWS/Azure Bills 30%

In this article
- What cloud cost optimization actually is
- Why your cloud bill keeps growing
- Quick wins: cut 15 to 20 percent in your first week
- The durable fix is ownership, not another tool
- A 30-60-90 day plan
- How to optimize without breaking production
- You are optimizing the wrong number
- When not to optimize
- From a runaway bill to a number you own
- Frequently asked questions
The cloud bill is too high. Your CFO is right about that much. They are just wrong about why.
I have watched plenty of engineering leaders open an AWS invoice that doubled overnight and have no idea why. I have watched VPs of Engineering try to explain an Azure bill to a board when they cannot explain it to themselves. That is not a sign you are bad at engineering. It is a sign that nobody owns the number, and that cloud billing is designed to be hard to read.
Here is what most cloud cost optimization guides will not tell you. Real cloud cost optimization is mostly an ownership problem, not a tooling problem. A dashboard can show you that you are bleeding money. It cannot stop the bleeding. A person who owns the bill can.
The waste is real and it is getting worse. Flexera’s 2026 State of the Cloud report found that wasted cloud spend rose to 29 percent, the first increase in five years, driven by AI workloads and pricing that keeps getting more complicated. In the same research, 84 percent of organizations name managing cloud spend as their top cloud challenge, a result that has held for years.
This guide gives you both halves of the fix: quick wins you can start this week, and the ownership system that keeps the savings from creeping back. I am writing it from the operator’s chair, not the vendor’s. Years ago I ran the infrastructure behind Stackify, an application monitoring product that ingested logs and performance data from thousands of applications. At one point we were spending more than a million dollars a year on Azure. That bill was not driven by one rogue server. It was driven by architecture and the volume of data we kept, and I will come back to what that taught me.
What cloud cost optimization actually is
Most leaders hear “cloud cost optimization” and picture turning stuff off. That is treating a fever with an ice bath. You handle a symptom and ignore the disease.
Cloud cost optimization is the ongoing practice of keeping visibility, ownership, and control over what you spend in the cloud, so you balance performance against cost on purpose instead of by accident. It is a habit, not a one-time cleanup. Turning off idle resources is part of it. Tagging, real dashboards, and someone who actually owns the bill are the part that makes the savings last.
That distinction matters because cloud cost management and cloud cost optimization get used as if they are the same thing. Management is knowing where the money goes. Optimization is doing something about it. You need the first to get the second, but the first alone has never saved anyone a dollar.

Why your cloud bill keeps growing
Your bill did not triple because the company grew threefold. Growth is part of it. The rest comes from a handful of drivers nobody is watching, and almost all of them trace back to ownership.
Nobody can see what they spend. This is the root of it. Harness surveyed 700 engineering leaders and developers for its FinOps in Focus report and projected $44.5 billion in infrastructure cloud waste for 2025, roughly 21 percent of enterprise cloud spend, driven by the disconnect between finance and engineering. Only 43 percent of teams had real-time data on idle resources. More than half of developers said purchasing commitments came down to guesswork. You cannot expect engineers to spend carefully when the bill arrives weeks later and lands on someone else’s desk.
Fear keeps everything over-provisioned. One bad launch day leaves a scar. After that, every service runs at peak capacity around the clock, just in case. You pay for busiest-hour capacity during the 22 hours a day that are quiet. Teams call that a safety strategy. It is really an expensive trauma response, and one of the biggest line items nobody questions. The fix is auto-scaling with a sane floor and some headroom, so capacity tracks demand instead of fear. The one exception is latency-critical or slow-booting workloads, where steady capacity is the right call.
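To make "a sane floor and some headroom" concrete, here is a minimal sketch of the sizing rule behind most auto-scaling policies. The function name, the 25 percent headroom, and the floor of two instances are illustrative assumptions, not settings from any particular cloud provider.

```python
import math


def desired_instances(current_load_rps, per_instance_rps, floor=2, headroom=0.25):
    """Instances needed for the current load plus headroom, never below the floor.

    All parameters are hypothetical: load and per-instance capacity are in
    requests per second, headroom is a fraction of current load.
    """
    if per_instance_rps <= 0:
        raise ValueError("per_instance_rps must be positive")
    needed = math.ceil(current_load_rps * (1 + headroom) / per_instance_rps)
    return max(floor, needed)


# Quiet hour: demand is tiny, so the floor holds.
print(desired_instances(current_load_rps=50, per_instance_rps=100))   # 2
# Peak hour: capacity follows demand plus headroom.
print(desired_instances(current_load_rps=900, per_instance_rps=100))  # 12
```

The point of the floor is to absorb a sudden spike while new instances boot. The headroom covers the gap between a metric crossing a threshold and capacity actually arriving.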
Zombie resources never die. A staging cluster from two years ago. Dev environments nobody has touched since the engineer who made them left. Storage volumes attached to nothing. They keep billing because deleting them feels risky and nobody owns the decision to do it.
Architecture is the bill. This is the one I learned the hard way. At Stackify we sharded the database per customer, on purpose, from day one, because I had been burned at VinSolutions by one giant database we never sharded early enough. That choice meant we ran around 2,000 SQL Server databases. An Azure engineer once told me their platform had no database that performs “like a Ferrari,” but you could have “an unlimited fleet of Hondas.” So we built for the fleet. It worked, and it also created a class of per-shard cost that never goes away. The largest numbers on your bill are usually decisions someone made about architecture and data retention, not a checkbox you forgot to flip. The most useful lever there is unglamorous: set data-retention and tiering policies on purpose, because at Stackify the real cost was the sheer volume of data we chose to keep.
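A retention and tiering policy can be as simple as a table of age thresholds. The sketch below is a hypothetical illustration in Python. The tier names and the 30, 90, and 365 day cutoffs are assumptions chosen for the example, not what Stackify used and not the tier names of any specific storage service.

```python
from datetime import date

# Hypothetical policy: (maximum age in days, tier). Thresholds are illustrative.
RETENTION_POLICY = [
    (30, "hot"),
    (90, "cool"),
    (365, "archive"),
]


def tier_for(created: date, today: date) -> str:
    """Return the storage tier for data of a given age, or 'delete' past retention."""
    age_days = (today - created).days
    for max_age, tier in RETENTION_POLICY:
        if age_days <= max_age:
            return tier
    return "delete"


print(tier_for(date(2024, 1, 1), date(2024, 1, 15)))  # hot
print(tier_for(date(2024, 1, 1), date(2024, 3, 1)))   # cool
print(tier_for(date(2023, 1, 1), date(2024, 6, 1)))   # delete
```

Writing the policy down as data is the useful part. It turns "how long do we keep this" into a decision someone made on purpose and can revisit.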
And now there is AI. GPU and inference spend is the fastest-growing line on most 2026 cloud bills, and it is the same ownership problem wearing a new costume: idle GPUs, oversized instances, and nobody watching the meter. Flexera tied that first rise in cloud waste in five years directly to AI workloads. New line item, same root cause.
Quick wins: cut 15 to 20 percent in your first week
Before any strategy, stop the bleeding. These are low-risk moves your team can start on a Tuesday afternoon without filing a change request. None of them touch production in a scary way.
- Kill zombie resources. Run an inventory, filter for anything not touched in 30 days, email the owners, and delete after a one-week grace period. Leave backups and disaster recovery alone. This is usually the single biggest win, often 8 to 12 percent.
- Right-size the obvious offenders. Find instances that have sat under 30 percent CPU and memory utilization for weeks. Start with non-production, downsize during quiet hours, and watch for a day before you touch anything in production. Call it 5 to 8 percent.
- Delete unattached storage. Storage volumes attached to nothing still cost money. Snapshot anything large for safety, then delete volumes that have sat unattached for more than 30 days.
- Clean up old snapshots. Keep a sane retention window, delete the rest, and set a lifecycle policy so the bloat does not come back.
- Shut down non-production after hours. Dev and staging do not need to run nights and weekends. Schedule them for working hours with a manual override, and check for overnight pipelines first.
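The after-hours rule in the last item can be sketched as a single decision function. This is a hypothetical illustration: the 8:00 to 19:00 weekday window, the environment names, and the two flags are assumptions, and a real scheduler would call something like this before stopping or starting resources.

```python
from datetime import datetime

# Assumed working window, local time. Adjust to your team's actual hours.
WORK_START_HOUR = 8
WORK_END_HOUR = 19


def should_be_running(env, now, override=False, has_overnight_pipeline=False):
    """Decide whether an environment should be up at the given time."""
    if env == "production":
        return True  # this schedule never touches production
    if override or has_overnight_pipeline:
        return True  # manual override, or a nightly job still needs the environment
    is_weekday = now.weekday() < 5  # Monday is 0, Sunday is 6
    in_hours = WORK_START_HOUR <= now.hour < WORK_END_HOUR
    return is_weekday and in_hours


print(should_be_running("staging", datetime(2024, 1, 2, 10)))  # Tuesday morning: True
print(should_be_running("staging", datetime(2024, 1, 6, 10)))  # Saturday: False
```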
Add it up and most teams find 15 to 20 percent in the first week. That is real money, and it buys you the credibility to do the harder work next.
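The zombie cleanup is mostly a filter over an inventory. Here is a minimal sketch that assumes you have already exported your resources into a list. The `Resource` record and its fields are hypothetical, and real exports from AWS or Azure will look different.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Resource:
    # Hypothetical inventory record, not a cloud provider's schema.
    name: str
    owner: str
    last_used: date
    is_backup_or_dr: bool = False


def find_zombies(inventory, today, idle_days=30):
    """Resources untouched for more than idle_days, excluding backups and DR."""
    cutoff = today - timedelta(days=idle_days)
    return [r for r in inventory
            if r.last_used < cutoff and not r.is_backup_or_dr]


def delete_after(notified_on, grace_days=7):
    """Earliest deletion date once the owner has been emailed."""
    return notified_on + timedelta(days=grace_days)


today = date(2024, 6, 30)
inventory = [
    Resource("staging-cluster-old", "platform", date(2024, 1, 1)),
    Resource("api-dev", "payments", date(2024, 6, 20)),
    Resource("dr-replica", "platform", date(2023, 12, 1), is_backup_or_dr=True),
]

for r in find_zombies(inventory, today):
    print(f"email {r.owner}: {r.name} is scheduled for deletion on {delete_after(today)}")
```

Only `staging-cluster-old` is flagged here. The dev API was used ten days ago, and the disaster recovery replica is excluded no matter how idle it looks.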

The durable fix is ownership, not another tool
Quick wins prove what is possible. They also creep right back if nothing changes. You will cut 20 percent, celebrate, and watch the bill climb again within two months, because the thing that caused the waste, no clear owner, is still in place.
So make it visible and make it owned. Those are the two moves that actually last.
Visibility first. Tag every resource with four things: the team that owns it, the project it belongs to, the environment, and a cost center for finance. Enforce tagging when a resource is created, not in a quarterly cleanup. Then put four numbers on a dashboard everyone can see: spend against budget month to date, spend by team, the ratio of production to non-production spend, and week-over-week change so anomalies surface fast.
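Enforcing tags at creation time comes down to refusing any request that is missing one of the four. The sketch below shows that check in plain Python. The tag keys and the function names are assumptions for illustration, and in practice the same rule would live in your provisioning pipeline or a cloud policy engine.

```python
# The four tags named above, under assumed key names.
REQUIRED_TAGS = ("team", "project", "environment", "cost_center")


def missing_tags(tags):
    """Return the required tag keys that are absent or blank."""
    missing = []
    for key in REQUIRED_TAGS:
        value = tags.get(key)
        if value is None or not str(value).strip():
            missing.append(key)
    return missing


def validate_on_create(resource_name, tags):
    """Reject a resource at creation time if any required tag is missing."""
    missing = missing_tags(tags)
    if missing:
        raise ValueError(
            f"{resource_name}: missing required tags: {', '.join(missing)}"
        )


validate_on_create("orders-db", {
    "team": "payments",
    "project": "checkout",
    "environment": "production",
    "cost_center": "CC-1042",
})  # passes silently

print(missing_tags({"team": "payments", "environment": " "}))
# ['project', 'environment', 'cost_center']
```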
Then ownership. Decide whether one person owns the bill or each team owns its slice, and say it out loud. The honest version of “everyone owns cloud cost” is that no one does. Start with showback, where teams see their costs without taking a budget hit, because it builds awareness without a fight. Move to chargeback, where the cost lands on the team’s budget, once the data is solid and the org is mature enough to handle it.
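A showback report is a group-by over tagged billing line items. This sketch assumes each line item is a dictionary with `team`, `environment`, and `cost` keys. That shape is hypothetical, and a real billing export will need mapping into it first.

```python
from collections import defaultdict


def showback(line_items):
    """Total cost per team, largest first. Untagged spend gets its own bucket."""
    by_team = defaultdict(float)
    for item in line_items:
        by_team[item.get("team") or "untagged"] += item["cost"]
    return dict(sorted(by_team.items(), key=lambda kv: kv[1], reverse=True))


def production_share(line_items):
    """Fraction of total spend that is tagged as production."""
    total = sum(item["cost"] for item in line_items)
    prod = sum(item["cost"] for item in line_items
               if item.get("environment") == "production")
    return prod / total if total else 0.0


line_items = [
    {"team": "payments", "environment": "production", "cost": 1200.0},
    {"team": "payments", "environment": "staging", "cost": 300.0},
    {"team": "search", "environment": "production", "cost": 500.0},
    {"team": None, "environment": "dev", "cost": 500.0},
]

print(showback(line_items))
print(production_share(line_items))
```

Keeping an explicit "untagged" bucket is deliberate. It shows how much of the bill still has no owner, which is the number you want to drive toward zero.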
The good news is that engineers want this. In the Harness research, 62 percent of developers said they want more control over and responsibility for cloud costs. Engineers are not the bottleneck. The missing piece is giving them the numbers and naming who is accountable.
A 30-60-90 day plan
One-time optimization is a band-aid. A schedule is what turns it into a practice.
Days 1 to 30: quick wins and visibility. Run all five quick wins. Stand up tagging across your biggest spenders and a basic dashboard in AWS Cost Explorer or Azure Cost Management. Name the ownership model. Target 15 to 20 percent and make sure the CFO sees it.
Days 31 to 60: structure. Buy reserved instances or savings plans for the baseline you are certain you will use. They typically cut 30 to 50 percent off the cost of that usage, but they lock you in, so commit only the floor and let on-demand absorb the peaks. Most teams overcommit because they assume they will grow into it, and you are probably not Google. Turn on auto-scaling, add storage tiering, and set cost anomaly alerts that flag any team jumping more than 20 percent in a week. Another 10 to 15 percent is realistic here.
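Two of these steps reduce to short calculations: committing only the floor, and flagging a 20 percent weekly jump. The sketch below is a simplified model under stated assumptions. It uses a flat on-demand rate, a 40 percent commitment discount picked from inside the 30 to 50 percent range, and hypothetical function names. Real reserved instance and savings plan pricing has more moving parts.

```python
def commitment_floor(hourly_usage):
    """Commit only to the lowest usage level seen in the sample."""
    return min(hourly_usage)


def blended_cost(hourly_usage, commit, on_demand_rate, discount):
    """Cost when `commit` units are discounted and the rest runs on demand."""
    committed = commit * on_demand_rate * (1 - discount) * len(hourly_usage)
    overflow = sum(max(0, used - commit) for used in hourly_usage) * on_demand_rate
    return committed + overflow


def weekly_anomalies(last_week, this_week, threshold=0.20):
    """Teams whose spend rose by more than the threshold week over week."""
    flagged = {}
    for team, current in this_week.items():
        previous = last_week.get(team, 0.0)
        # Teams with no prior spend are skipped here; handle them separately.
        if previous > 0 and (current - previous) / previous > threshold:
            flagged[team] = (current - previous) / previous
    return flagged


usage = [10, 12, 30, 10]  # instances in use per hour, illustrative
floor = commitment_floor(usage)
print(floor)
print(blended_cost(usage, commit=floor, on_demand_rate=1.0, discount=0.4))
print(weekly_anomalies({"payments": 1000.0, "search": 500.0},
                       {"payments": 1250.0, "search": 550.0}))
```

In this toy sample the commitment covers 10 instances and on-demand absorbs the spike to 30. Committing to the peak instead would mean paying for 30 instances in hours that only need 10.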
Days 61 to 90: culture. Hold a 30-minute cost review every month. Show engineers what their choices actually cost. Put cost into design discussions and architecture reviews, not as a gate, but as a normal input. This is the part that keeps the gains from regressing.
These ranges overlap rather than stack. Most teams land near 30 percent sustained, which is roughly the share the industry wastes, not some magic 50 percent.

How to optimize without breaking production
Let me be honest about the real reason teams do not optimize. They are scared of breaking something. I have seen leaders pay an extra $50,000 a month because they were too nervous to touch anything. The most expensive thing in your cloud is not waste. It is the fear that stops you from fixing it.
Handle that with a risk ladder instead of a leap. Tier the work. Green is safe and reversible: zombie cleanup, snapshot policies, non-prod scheduling. Yellow needs care: right-sizing production, reserved purchases. Red is anything touching a critical system with no redundancy, and you save it for last with the most testing.
The process is the same every time. Pick a target, write down the savings you expect, and test it in a non-production environment. Watch it for 48 hours and confirm nothing degraded. Roll it to 10 percent of production traffic, monitor for a week, then expand to the rest, and keep a rollback plan ready the whole time. Boring beats brave here.
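The risk ladder and the rollout steps can be written down as data, so the order of work is not left to memory. This is a hypothetical sketch. The change names are labels invented for the example, and treating any unlisted change as yellow is an assumption, not a rule from the ladder itself.

```python
from enum import IntEnum


class Risk(IntEnum):
    GREEN = 1   # safe and reversible
    YELLOW = 2  # needs care
    RED = 3     # critical system with no redundancy


# Green examples from the ladder, under assumed labels.
GREEN_CHANGES = {"zombie_cleanup", "snapshot_policy", "nonprod_schedule"}

# Observation stages from the process: (stage, hours to watch before moving on).
ROLLOUT_STAGES = [
    ("non-production test", 48),
    ("10 percent of production traffic", 7 * 24),
]


def classify(change, critical=False, redundant=True):
    """Place a change on the ladder."""
    if critical and not redundant:
        return Risk.RED
    if change in GREEN_CHANGES:
        return Risk.GREEN
    return Risk.YELLOW  # assumption: anything unlisted is handled with care


def order_work(changes):
    """Sort (change, critical, redundant) tuples so the safest work comes first."""
    return sorted(changes, key=lambda c: classify(*c))


backlog = [
    ("rightsize_production", True, False),
    ("reserved_purchase", False, True),
    ("zombie_cleanup", False, True),
]
for change in order_work(backlog):
    print(classify(*change).name, change[0])
```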
You are optimizing the wrong number
Here is the part every other guide skips, because the companies writing them sell cloud tools.
In every engineering org I have run or worked with, the cloud is a fraction of what the team costs. Infrastructure usually lands somewhere around 20 to 30 percent of technical spend, and people are the rest. I have watched a leader grind an AWS bill from $50,000 down to $40,000 a month while spending $200,000 a month on the team running it. That is hunting for coins in the couch while the mortgage runs.
If you want the bigger savings, look at how you staff the team, not just how you size your instances.
That does not mean hiring the cheapest developers you can find. Doing that is a mistake I call cheapshoring, and it costs more than it saves once you count the rework and turnover. The move that actually works is staff augmentation with senior offshore engineers who join your team directly. At Full Scale that runs about $35 an hour, fully loaded, against $150,000 to $180,000 a year for a senior U.S. hire. The developer works in your tools, your standups, and your repo, the same as anyone else on staff. You are changing the cost of the team, not the quality of it. That is the same logic as reducing your software development costs without slowing delivery.
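To put the hourly rate and the annual salary on the same footing, here is the arithmetic as a short sketch. The 2,080 hour year (40 hours times 52 weeks) is an assumption, and the U.S. figure is the salary range quoted above, so treat the comparison as rough. The cloud share calculation uses the $50,000 and $200,000 monthly figures from earlier in this section.

```python
HOURS_PER_YEAR = 40 * 52  # assumed full-time year: 2,080 hours


def annual_cost(hourly_rate, hours=HOURS_PER_YEAR):
    """Annualize an hourly rate under the assumed hours per year."""
    return hourly_rate * hours


def cloud_share(cloud_monthly, team_monthly):
    """Cloud spend as a fraction of cloud plus team spend."""
    return cloud_monthly / (cloud_monthly + team_monthly)


print(annual_cost(35))                    # 72800
print(cloud_share(50_000, 200_000))       # cloud share before the cut
print(10_000 / (50_000 + 200_000))        # the $10,000 cut as a share of the total
```

Under these assumptions, $35 an hour annualizes to $72,800, and the $10,000 a month trimmed from the AWS bill is 4 percent of the combined $250,000.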
And if you are tempted to follow 37signals out of the cloud entirely, read the fine print first. They cut roughly $2 million a year by moving to their own hardware, and they project savings of more than $10 million over five years. Good for them, but the lesson is not to leave the cloud. It is that your bill is a set of choices, not a default you inherit. Most engineering orgs should fix ownership and right-sizing long before they buy a rack of servers.

When not to optimize
Trust is worth more than a few points of savings, so here is when to leave the bill alone. Do not touch a critical production system that has no redundancy. Leave legacy systems you do not fully understand until you do. Avoid changes during a launch, a holiday, or any high-traffic window. And never optimize without monitoring in place to catch a regression. Sometimes the right move is patience: build visibility, understand what you are running, then cut with confidence.
From a runaway bill to a number you own
Your cloud bill is not out of control because your engineers are careless. It is out of control because nobody owns it, and most teams treat optimization as a project instead of a habit. Make the spend visible, give it an owner, start with quick wins this week, and hold the line with a simple monthly rhythm.
Then go after the bigger number. Cloud is a slice of your technical spend. The team is the rest. Own both, and the math on your whole engineering budget changes. This is the kind of ownership thinking that runs through everything we believe about building teams, and it is the core of my book Product Driven.
If you want help with the bigger lever, talk to us about building your team. We have made 1,000+ engineer placements for 200+ companies since 2018, at a flat $35 an hour with 93%+ retention.
Frequently asked questions
How much can I realistically save with cloud cost optimization?
Most teams find 15 to 20 percent in the first week from quick wins like killing idle resources and right-sizing, then another 10 to 15 percent from structural changes over the next two months. A sustained cut around 30 percent is a realistic target, which lines up with the roughly 29 percent of cloud spend the industry wastes. Anything above 50 percent usually means the original setup was badly over-built.
Will cloud cost optimization slow down my engineering team?
Done right, no. The goal is visibility and ownership, not restrictions. When engineers can see what their choices cost, they make better ones on their own. Build guardrails, not gates.
What are the biggest cloud cost wastes most teams miss?
Idle and zombie resources, instances running under 30 percent utilization, storage attached to nothing, and paying for peak capacity during off-peak hours. Together these are usually most of the waste, and most of it traces back to nobody owning the bill.
Should I optimize cloud costs or hire an offshore team first?
Optimize the cloud first, because the quick wins take a week or two and prove the value. Then look at the team, because salaries are the bigger number. Staff augmentation is the longer-term lever, and the two compound.
What is the difference between cloud cost management and cloud cost optimization?
Management tells you where the money goes. Optimization is the work of bringing it down. You need the first to get the second, but tracking your spend, on its own, changes nothing.
How often should I review cloud costs?
A 30-minute review every month, a quick anomaly check every week, and real-time alerts in between. Build it into the normal rhythm instead of running disruptive optimization sprints that pull engineers off product work.



