How to reduce your AWS costs
I get this question A LOT, and the answer is never one-size-fits-all, but I can say that it requires a combination of tools, processes, and people.
Let's start with the tools (I'm an architect, after all).
Tools
In order to save costs, you need good visibility into what you are running and where.
Cost Explorer
The easiest tool to start with: it lets you get a quick view of your costs. I recommend enabling hourly granularity and the resource-level view; you'll see in the next section why those are important.
Cost and Usage Report
If you want to take the raw data and derive the insights yourself, the report is delivered as CSV files to an S3 bucket you choose. I would recommend using a solution such as CUDOS (below) on top of it.
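To give a feel for the raw data, here is a minimal sketch of summing costs per service from a CUR-style CSV. The sample rows are made up, and while the column names below follow the CUR naming convention, treat them as an assumption and check the header row of your own report.

```python
import csv
import io
from collections import defaultdict

# Tiny illustrative CUR-style extract. Real reports have hundreds of
# columns; these names follow the CUR convention but verify against
# your own report's header row.
CUR_SAMPLE = """\
lineItem/ProductCode,lineItem/ResourceId,lineItem/UnblendedCost
AmazonEC2,i-0abc,1.25
AmazonEC2,i-0def,0.75
AmazonS3,my-bucket,0.10
"""

def cost_by_product(cur_csv: str) -> dict:
    """Sum unblended cost per product code from a CUR-style CSV."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(cur_csv)):
        totals[row["lineItem/ProductCode"]] += float(row["lineItem/UnblendedCost"])
    return dict(totals)

print(cost_by_product(CUR_SAMPLE))
# {'AmazonEC2': 2.0, 'AmazonS3': 0.1}
```

In practice you would run this kind of aggregation in Athena rather than in Python, which is exactly what CUDOS sets up for you.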
CUDOS
A solution built on the Cost and Usage Report, Athena, and QuickSight. The solution and complete instructions are available in the Well-Architected Labs.
Compute Optimizer
Compute Optimizer analyzes your activity and suggests alternative instance types based on your usage. CUDOS also includes an additional dashboard for Compute Optimizer findings.
AWS Accounts
If you don't already separate your workloads into different accounts, I highly recommend doing so. I found that it's easier to control costs when you have a clear understanding of ownership. It's easier to answer "Who owns this EBS volume?" when the account is owned by a small team. With too many chefs in the kitchen, it's hard to verify that everyone cleans up after themselves.
Tagging
Now that I think about it, I will create a separate post just for tagging. Stay tuned.
What should you be looking at?
If your architecture is not cloud native, then your costs are probably driven by EC2. Using any of the tools above, you should be able to get an hourly view of your EC2 usage.
EC2 hourly view
If you separated your environments into separate accounts, it should be easy to filter the costs per environment, and the results should look very different.
In dev/test environments you should see five "fingers", representing the five working days of the week. Most instances should be off during off hours and weekends, so the chart should resemble the image below:
The important thing to note is that there should be a minimal number of EC2 instances running 24×7, and those should be covered by a Savings Plan or Reserved Instances. The rest can run on Spot Instances where possible.
Shutting down EC2 instances during off hours can yield around 70% in cost savings, and using Spot Instances can save up to 80%.
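The 70% figure is easy to sanity-check. Assuming, purely for illustration, a 10-hour workday five days a week:

```python
HOURS_PER_WEEK = 24 * 7          # 168
working_hours = 10 * 5           # e.g. 08:00-18:00, Monday-Friday (assumed)

# Fraction of on-demand cost saved by stopping the instance off-hours.
savings = 1 - working_hours / HOURS_PER_WEEK
print(f"{savings:.0%}")  # 70%
```

A shorter or longer workday shifts the number a few points either way, but the order of magnitude holds.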
In production environments these charts should reflect the load pattern. The number of instances should increase during heavy-load hours and be kept to a minimum when load is low. If there is no increase during peak load, the instances may be over-provisioned and need to be rightsized, or you are not utilizing auto scaling.
The hourly view is available in Cost Explorer. If you would like to query the Cost and Usage Report for the hourly view, I shared the queries here.
Now let's say you find an unusually large number of instances running 24×7, and you want to identify those instances and find the owner, the account ID, etc. You can find a query to list those instances here.
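If you prefer to do this offline, here is a rough sketch of the idea behind such a query: an instance is a 24×7 candidate if it appears in every hour of the window. The input rows below are made up for illustration; in practice they would come from the CUR or Cost Explorer resource-level data.

```python
from collections import defaultdict

# Hypothetical hourly usage rows: (instance_id, hour_index) over one week.
rows = [("i-always", h) for h in range(168)] + \
       [("i-daytime", h) for h in range(168) if 8 <= h % 24 < 18]

def always_on(rows, hours_in_window=168):
    """Return instance ids that appear in every hour of the window."""
    seen = defaultdict(set)
    for instance_id, hour in rows:
        seen[instance_id].add(hour)
    return [i for i, hours in seen.items() if len(hours) == hours_in_window]

print(always_on(rows))  # ['i-always']
```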
Auto-stop instances
For all the instances running 24×7, check which ones should actually be running 24×7 and verify the others are shut down automatically during off hours. Shutting down during off hours can save about 70% of the EC2 costs.
There is a solution for automatically shutting down EC2 instances, and it's available here.
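The core of any auto-stop solution is a simple schedule check. A minimal sketch, assuming a hypothetical 08:00–19:00 Monday–Friday policy:

```python
from datetime import datetime

def should_run(now: datetime) -> bool:
    """Decide whether a dev/test instance should be running right now.

    Policy assumed for illustration: weekdays, 08:00-18:59 only.
    """
    is_weekday = now.weekday() < 5     # Monday=0 .. Friday=4
    in_work_hours = 8 <= now.hour < 19
    return is_weekday and in_work_hours

print(should_run(datetime(2024, 1, 10, 14, 0)))  # Wednesday 14:00 -> True
print(should_run(datetime(2024, 1, 13, 14, 0)))  # Saturday  14:00 -> False
```

In a real deployment this check would typically run on a schedule, for example an EventBridge rule triggering a Lambda that stops or starts the tagged instances.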
The power of trend
For several services, even the smallest upward cost trend is an indication of misconfiguration. Take EBS, for example: volumes with even a slight, steady increase in cost indicate that EBS usage is not being monitored and should be reviewed. I had a customer with the smallest of trends; once they looked into it, they realized they had neglected their EBS volumes for years, and 40% of the volumes were not in use. Because each increase was so small, the cost quietly accumulated over time.
The same goes for S3 buckets. If you see an increasing cost trend for an S3 bucket, it usually means the bucket has no lifecycle policy in place and was probably overlooked for a while.
EBS snapshots are also a common cost-saving opportunity.
Look into your resources, and see where you have an increase over time. You might find that these trends add up to large sums.
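A trend check like this is easy to automate. A minimal sketch, with hypothetical monthly cost series:

```python
def steadily_increasing(monthly_costs):
    """True if every month costs more than the previous one."""
    return all(b > a for a, b in zip(monthly_costs, monthly_costs[1:]))

# Hypothetical monthly spend ($) for two resources.
ebs_costs = [102.0, 104.5, 107.2, 110.9, 114.0]   # small but relentless growth
s3_costs  = [55.0, 54.1, 56.3, 54.9, 55.2]        # noisy but flat

print(steadily_increasing(ebs_costs))  # True  -> worth reviewing
print(steadily_increasing(s3_costs))   # False
```

Running something like this over per-resource monthly totals from the CUR surfaces exactly the "small but accumulating" cases described above.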
Reserved instances and Savings plans
I saved these for last because that's when you need to address them: last. First:
- Use Auto Scaling
- Shut down dev and test instances during off hours
- Use Spot Instances
- Verify all instances are rightsized using Compute Optimizer
- Verify only the required instances are running 24×7
Then, and only then, purchase Savings Plans or Reserved Instances for the instances that run 24×7.
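One simple sizing heuristic, an assumption on my part rather than an official AWS recommendation, is to commit only to the floor of your hourly spend and leave the peaks to On-Demand or Spot:

```python
# Hypothetical hourly on-demand EC2 spend ($/hour) over a sample day,
# after rightsizing and off-hours shutdowns are already in place.
hourly_spend = [4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0,
                9.0, 12.0, 12.0, 14.0, 14.0, 12.0, 12.0, 10.0,
                9.0, 6.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0]

# Conservative commitment: cover only the always-on baseline with a
# Savings Plan, and let autoscaled or Spot capacity absorb the peaks.
commitment = min(hourly_spend)
print(f"Commit to ${commitment:.2f}/hour")  # Commit to $4.00/hour
```

Committing above the baseline risks paying for unused commitment; the Savings Plans recommendations in Cost Explorer do a more sophisticated version of this analysis over your real usage history.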
Processes
Set up a quarterly process to review your Savings Plans and Reserved Instances, to see whether new ones are needed and whether any existing plans are about to expire.
Set up a monthly review of the costs of the different departments. The cost dashboard should be visible to everyone in the organization.
Set up incentive plans to encourage teams to reduce costs. Some companies share the cost KPIs of all teams and reward the teams that excel at cost savings.
People
The biggest change in cost comes when systems are modernized and built with cost in mind. The problem is that developers don't always design their systems with cost in mind, or don't consider the total cost of ownership.
For example:
- Developing your own solution instead of using a SaaS product to avoid the subscription cost, while ignoring the cost of the instances and the development hours
- Not considering serverless solutions
- Ignoring the cost of licenses because they were already purchased (even though they will need to be renewed over time)
There are many more examples, but the point is: developers and architects need to be educated on cloud patterns and the available services and tools; otherwise you might find yourself in constant technical debt.
I thought about summarizing this- but I don’t want you to think I created this using ChatGPT…