8 Symptoms of Ineffective Cloud Cost Management and How to Fix Them

Published in

FAUN — Developer Community 🐾

7 min read · Jul 29, 2024

Several studies, such as the Cloud Spend Optimization Report 2024 by Vertice, find that organizations waste up to a third of their cloud spend. The report also shows that 43% of the participating organizations have not yet implemented a cost management strategy. Implementing a cost management strategy, that is, establishing FinOps practices, is a vital part of cloud governance, but it is not a trivial task. The FinOps methodology is a combination of systems, best practices, and culture that aims to optimize cloud spend and provide transparency. It is a cross-functional approach that involves finance, engineering, and business teams. The goal is to ensure that the organization gets the most value out of its cloud spending and has the data it needs to make informed business decisions about future investments.
AWS provides a comprehensive set of tools that help you collect and visualize the data that you need to implement cost management effectively.¹

The most crucial part is the adoption of the Cloud Intelligence Dashboards Framework. It’s an open-source framework, lovingly cultivated and maintained by AWS, that gives customers the power to get high-level and granular insight into their cost and usage data.²

To check if your organization is managing cloud costs effectively and to find out how to mitigate issues, here are eight common symptoms that indicate that your organization could improve:

1. No cost governance at all

The organization has no idea where the cloud spending is going. There is no cost governance in place and no one is responsible for managing cloud costs continuously. Even if the actual cloud spend is within budget, an organization without cost governance cannot optimize its cloud spend and tends to overspend and waste money. One reason for the missing cost governance is that organizations claim that money is not a first-class citizen to them because they have been running profitably for years. However, this is a dangerous assumption and comes with a big risk. There will be times when the organization faces financial difficulties and is forced to cut costs. Implementing a cost governance model takes time and knowledge, both of which are hard to come by when you are under pressure. You may not put all your effort into cost governance when you have other priorities, but you should at least keep evolving your cost governance model.
A minimum viable cost governance model should include a monthly cost review that checks for cost anomalies and for cost optimization opportunities.

You should be able to answer questions like:

What are the top cost drivers?
Why did the cost increase or decrease? Take a look at the top 5 movers and bottom 5 movers.
If you detect a cost anomaly, investigate it and take action. Put measures in place to prevent it and check again next month.
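The "top movers" question from the review above can be answered with a few lines of code. The following is a minimal sketch, assuming you have already exported the cost per service for two consecutive months (for example from Cost Explorer); the function name `top_movers` and the sample figures are hypothetical and only serve as illustration.

```python
from typing import Dict, List, Tuple

Mover = Tuple[str, float]


def top_movers(
    previous: Dict[str, float],
    current: Dict[str, float],
    n: int = 5,
) -> Tuple[List[Mover], List[Mover]]:
    """Return the n largest cost increases and the n largest cost decreases."""
    # A service missing in one month counts as 0 for that month.
    services = set(previous) | set(current)
    deltas = {
        name: round(current.get(name, 0.0) - previous.get(name, 0.0), 2)
        for name in services
    }
    # Largest increase first; ties are broken by service name.
    increases = sorted(
        (item for item in deltas.items() if item[1] > 0),
        key=lambda item: (-item[1], item[0]),
    )[:n]
    # Largest decrease (most negative delta) first.
    decreases = sorted(
        (item for item in deltas.items() if item[1] < 0),
        key=lambda item: (item[1], item[0]),
    )[:n]
    return increases, decreases


# Hypothetical monthly cost per service in USD.
previous_month = {"Amazon EC2": 4200.0, "Amazon RDS": 1800.0, "Amazon S3": 650.0}
current_month = {"Amazon EC2": 5100.0, "Amazon RDS": 1750.0, "Amazon S3": 700.0}

increases, decreases = top_movers(previous_month, current_month)
for name, delta in increases + decreases:
    print(f"{name}: {delta:+.2f} USD")
```

Each entry in the two lists is a starting point for the "why" question: a large positive delta deserves an explanation, and a large negative one may point to an optimization worth repeating elsewhere.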

2. No internal chargeback model

Big organizations often have multiple teams or even business units that use cloud resources. Typically they also use a federated AWS account setup with consolidated billing. Without an internal chargeback model, teams have little to no incentive to optimize cloud spend. Especially when you have savings plans in place, top spenders get discounts while all others pay the full price. An internal chargeback model is a way to allocate cloud costs to the teams that use the resources and distribute savings evenly. It helps to create cost awareness and accountability.
To start implementing a chargeback model, you can use AWS Cost Allocation Tags. You can tag resources with the team name or business unit and use the AWS Cost Explorer to create cost allocation reports. Gather an overview of common shared costs, like costs for shared network components, DDoS protection service fees, and also discounts that you can distribute to the teams. The chargeback model aims to be a fair and transparent way to distribute costs and savings, regardless of the workload size. This helps to unveil the true cost of cloud resources per workload.
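To make the idea concrete, here is a minimal sketch of one possible allocation rule. It assumes that shared costs and discounts are distributed in proportion to each team's directly tagged cost; this proportional key is an assumption for illustration, not the only fair option, and the team names and amounts are hypothetical.

```python
from typing import Dict


def chargeback(
    direct_costs: Dict[str, float],
    shared_costs: float,
    discounts: float,
) -> Dict[str, float]:
    """Allocate shared costs and discounts proportionally to direct cost."""
    total_direct = sum(direct_costs.values())
    if total_direct <= 0:
        raise ValueError("direct costs must sum to a positive amount")
    # Shared costs increase the bill, discounts reduce it.
    net_adjustment = shared_costs - discounts
    return {
        team: round(cost + net_adjustment * cost / total_direct, 2)
        for team, cost in direct_costs.items()
    }


# Hypothetical figures in USD: tagged cost per team at on-demand prices,
# shared network and DDoS protection fees, and Savings Plans discounts.
tagged = {"checkout": 6000.0, "search": 3000.0, "analytics": 1000.0}
invoice = chargeback(tagged, shared_costs=1500.0, discounts=2000.0)
for team, amount in invoice.items():
    print(f"{team}: {amount:.2f} USD")
```

With this rule every team receives the same relative discount, regardless of which account the Savings Plan happened to be applied to.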

3. Avoiding commitment

Organizations that are reluctant to commit to a baseline of consumed resources are missing out on significant cost savings. Reserved Instances and Savings Plans are a great way to save money compared to on-demand pricing. In the past, organizations that ran on-premises were forced to commit to a certain amount of resources for a long period (a.k.a. CAPEX). After migrating, strictly following the promise of paying only for what you use (a.k.a. OPEX) is not always the best choice either.
Typically, the answer is a balance between both, but finding the optimum is tricky and requires considerable knowledge and experience.

As a rule of thumb, your baseline production traffic is ideally covered by Reserved Instances and Savings Plans. Reserved Instances are a good choice for databases (because they are the only commitment option there), while compute resources are better covered by Savings Plans. Compute Savings Plans are more flexible and apply to EC2, Fargate, and Lambda usage. You should constantly monitor your usage and adjust your Reserved Instances and Savings Plans accordingly to avoid wasting money.

Savings Plans should consistently cover around 90–95% of your total compute usage. If you are above 95%, you risk committing to more than you actually use. If you are below 90%, you are likely to miss out on potential savings.
Since this is a complex topic, organizations with federated AWS accounts should manage Reserved Instances and Savings Plans centrally. This way, you can optimize the usage of Reserved Instances and Savings Plans across all accounts. Together with the aforementioned internal chargeback model, you can distribute the savings equally across all teams.
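The 90–95% coverage target mentioned above is easy to check once you know how much of your eligible compute spend was covered by Savings Plans. The sketch below assumes both amounts are measured the same way (for example as on-demand equivalent spend for the same period); the function names and status labels are hypothetical.

```python
def savings_plan_coverage(covered_spend: float, total_eligible_spend: float) -> float:
    """Share of eligible compute spend covered by Savings Plans, in percent."""
    if total_eligible_spend <= 0:
        raise ValueError("total eligible spend must be positive")
    return 100.0 * covered_spend / total_eligible_spend


def coverage_status(coverage_pct: float, low: float = 90.0, high: float = 95.0) -> str:
    """Classify coverage against the target band from the rule of thumb."""
    if coverage_pct < low:
        return "under-covered"  # potential savings are left on the table
    if coverage_pct > high:
        return "over-covered"  # risk of paying for unused commitment
    return "on-target"


# Hypothetical monthly figures in USD.
coverage = savings_plan_coverage(covered_spend=8200.0, total_eligible_spend=10000.0)
print(f"{coverage:.1f}% -> {coverage_status(coverage)}")
```

Running such a check per month, centrally and across all accounts, shows when a commitment needs to be increased or should simply be left to expire.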

4. No cost awareness during development

Developers are often not aware of the cost implications of their code, and architects do not treat cost optimization as a first-class citizen either. Rapid prototypes are implemented as if they had to handle production-like traffic.

To mitigate this, you should create a cost-aware culture in your organization. Infrastructure-as-Code makes it easier to bootstrap, scale, and delete environments. Shut down development environments outside business hours, when they are not used. Implement automated account reset or nuke mechanisms for development environments. Use AWS Budgets to set cost limits and alerts, and implement strict remediation rules.

5. Burning money on development environments

Development environments typically do not need to run 24/7; they are only used during business hours. On-prem, there is no need to shut them down when they are not used, but in a cloud environment leaving them running leads to unnecessary costs. To mitigate this, you should implement efficient autoscaling for development environments and shut them down outside business hours. Using AWS Instance Scheduler³ is a great way to automate this process and typically results in more savings than you could generate by purely optimizing with Reserved Instances and Savings Plans.
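A quick back-of-the-envelope calculation shows why scheduling pays off. The sketch below assumes purely hourly on-demand billing and ignores costs that continue while an instance is stopped (such as storage); the hourly rate and the 10-hours-on-5-days schedule are hypothetical.

```python
from typing import Dict

HOURS_PER_WEEK = 24 * 7  # 168


def scheduled_savings(
    hourly_rate: float,
    hours_per_day: float,
    days_per_week: int,
) -> Dict[str, float]:
    """Compare weekly always-on cost with a business-hours schedule."""
    if not 0 <= hours_per_day <= 24:
        raise ValueError("hours_per_day must be between 0 and 24")
    if not 0 <= days_per_week <= 7:
        raise ValueError("days_per_week must be between 0 and 7")
    running_hours = hours_per_day * days_per_week
    return {
        "always_on": round(hourly_rate * HOURS_PER_WEEK, 2),
        "scheduled": round(hourly_rate * running_hours, 2),
        "savings_pct": round(100.0 * (1 - running_hours / HOURS_PER_WEEK), 1),
    }


# Hypothetical development instance at 0.20 USD per hour,
# running 10 hours a day from Monday to Friday.
print(scheduled_savings(hourly_rate=0.20, hours_per_day=10, days_per_week=5))
```

With this schedule the instance runs 50 of 168 hours per week, so roughly 70% of the always-on compute cost disappears without any commitment.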

6. Not continuously improving

The capabilities and services of cloud providers are constantly evolving. New instance types typically offer better performance at a lower price and emerging features change how you can architect solutions.

Organizations that do not continuously revise and improve their cloud solutions are missing out on further savings. To get out of Reserved Instances before the term ends, you can sell them on the AWS Reserved Instance Marketplace⁴. Savings Plans offer much more flexibility, and you can easily switch instance types. However, you may need to adapt your Savings Plans to new coverage requirements in the future. AWS Compute Optimizer can help you spot potential improvements. It provides recommendations for right-sizing your instances and helps you identify underutilized resources.

7. Constantly running on oversized production environments

Coming from the on-prem world, where you had to plan for peak traffic, organizations tend to oversize their production environments. This may make sense in a lift-and-shift migration, but to fully leverage the benefits of the cloud you should right-size your production environments as soon as possible.

Organizations tend to postpone that task for fear of breaking things, or simply because they do not trust their own system or the scaling capabilities of the cloud provider. They prefer running an unoptimized workload to investing in optimization. However, the only way to constantly evolve and adapt is to embrace a working culture that allows teams to fail fast. Services like AWS Compute Optimizer can help you right-size your instances and identify underutilized resources, which mitigates the risk of breaking things or optimizing at the wrong end.

8. Copying other organizations’ architecture

Even though you may have a workload similar to that of another organization, such as a competitor, copying their architecture is typically not the best choice.

Context always matters, and every business that is meant to last needs at least one unique selling point that sets it apart from its competitors. You should always evaluate your requirements and constraints and design your architecture accordingly. Constantly review your strategy and architecture so that you can adapt them to the ever-changing environment. If you copy the success of others, you may also copy their failures, along with optimizations that only make sense for their specific use case. Strive to find the best solution for your specific use case. This does not mean you cannot learn from others, but keep evaluating whether their solution is the best fit for you.

Implementing a cost management strategy is a vital part of cloud governance and helps you to optimize cloud spend.
The aforementioned eight symptoms are common in organizations that are not managing cloud costs effectively.

They are a starting point to help you evaluate your cost management strategy.
Once you understand why you are wasting money on cloud resources, you can take action to stop it.

Thanks for reading, and happy cost optimization!
I am happy to receive your feedback and answer your questions.

[1] Data collection framework

[2] CUDOS demo dashboards

[3] AWS Instance Scheduler

[4] Reserved Instances Marketplace