AWS Elasticsearch Ultrawarm cost analysis

Paulo Martins published in Cloud/Infrastructure

2020-08-09

In 2010, elastic.co created the Elasticsearch engine. Five years later, in 2015, AWS launched its Elasticsearch Service.

Last year, an OSS licensing war between AWS and elastic.co led AWS to create their own “flavor” of Elasticsearch called OpenDistro. At first sight, OpenDistro is very much like the “original product”, but as we work with more advanced use cases, we start to notice some differences. Most of them relate to the many fantastic plugins provided by elastic.co that are proprietary and that we can’t use in AWS’s offering.

There is a lot of information online about the many drawbacks of using the AWS service. Even though the service has been evolving over the last few years, the two solutions are not yet comparable. If you want to know more, I recommend that you take a look at these posts:

On the other hand, using AWS also comes with some advantages. In my opinion, there are at least three strong reasons for using the solution provided by AWS:

It’s already in AWS and, thus, we don’t have to create a new account or go over the entire company procedure of getting a new tool.
The integration with the entire AWS IAM is handy. If we already follow the least privilege best practices for our infrastructure, with the managed AWS solution, we can use the same tool we already know. Since Feb 2020, we can “use roles to define granular permissions for indices, documents, or fields and to extend Kibana with read-only views and secure multi-tenant support”.
We can leverage Ultrawarm storage to reduce costs. We will explore Ultrawarm in more detail in the following sections, and we’ll discuss why this storage is an interesting addition to the service.

Ultrawarm

At their 2019 re:Invent conference, AWS revealed a new feature for their Elasticsearch service offering. This new feature was called Ultrawarm.

In a nutshell, Ultrawarm is S3-backed storage for the managed AWS Elasticsearch service that promises to reduce storage costs by up to 90%. We can use OpenDistro’s Index State Management policies to send our “non-active” indexes to Ultrawarm. Non-active indexes are those we no longer write to and expect to query less frequently. Searches against these indexes will be slower, but we can still query them in the same way.
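
As a rough illustration of such a policy, the sketch below builds an Index State Management policy document in Python and prints it as JSON. The state names, the 7-day and 30-day thresholds, and the helper name `build_ism_policy` are assumptions made for this example. The `warm_migration` action is the one AWS documents for moving an index to Ultrawarm, but the exact fields should be checked against the ISM documentation for your domain version.

```python
import json


def build_ism_policy(hot_days: int, retention_days: int) -> dict:
    """Build a hypothetical ISM policy with three states: hot -> warm -> delete."""
    return {
        "policy": {
            "description": "Example only: move indexes to Ultrawarm, then delete them",
            "default_state": "hot",
            "states": [
                {
                    # New indexes start here and stay until they are hot_days old.
                    "name": "hot",
                    "actions": [],
                    "transitions": [
                        {
                            "state_name": "warm",
                            "conditions": {"min_index_age": f"{hot_days}d"},
                        }
                    ],
                },
                {
                    # Entering this state migrates the index to Ultrawarm storage.
                    "name": "warm",
                    "actions": [{"warm_migration": {}}],
                    "transitions": [
                        {
                            "state_name": "delete",
                            "conditions": {"min_index_age": f"{retention_days}d"},
                        }
                    ],
                },
                {
                    # Final state: the index is removed once retention has passed.
                    "name": "delete",
                    "actions": [{"delete": {}}],
                    "transitions": [],
                },
            ],
        }
    }


policy = build_ism_policy(hot_days=7, retention_days=30)
print(json.dumps(policy, indent=2))
```

The printed JSON is the kind of body one would submit when creating the policy; how it is attached to indexes (for example through an index template) is left out of this sketch.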

This new offering became generally available in May 2020, and it is currently one of the most valuable features of the managed service of AWS.

As we might imagine, storage that is up to 90% cheaper will have a meaningful impact on the monthly cost of our Elasticsearch cluster.

Cost Analysis

In the following sections, we will dive into two different scenarios and try to understand the impact of using Ultrawarm in our cluster:

First Scenario - 100 GB/day w/ retention for 30 days
Second Scenario - 150 GB/day w/ retention for 7 years

The goal of this analysis is to help us measure the difference of having all indexes stored in regular storage versus sending some indexes to Ultrawarm.

Method

The scenarios are very similar. We create a new index every day, and after X days, our policy in the Index State Management deletes old indexes.

For the Ultrawarm scenarios, we will move older data to Ultrawarm and keep the most recent data on hot instances for better performance.

There are many ways of setting up an Elasticsearch cluster to support load. Before we start the analysis, we must make some assumptions on how to scale the cluster:

We follow the AWS recommendations to calculate storage requirements.
All data is replicated once.
We don’t have very complex workloads.
Ireland (eu-west-1) region is used as reference for all prices.

For each scenario, we will follow four steps to hopefully approximate real-life scenarios, namely:

Calculate the storage requirements

AWS provides a set of simplified formulas that we can use as a starting point to load-test our cluster. The formula to calculate the storage is:

$$ \mathrm{Minimum\ Storage\ Requirement} = \mathrm{Source\ Data} \times ( 1 + \mathrm{Number\ of\ Replicas}) \times 1.45 $$

Where:

$$ \mathrm{Source\ Data} = \mathrm{Data\ per\ day} \times \mathrm{Retention\ days} $$
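
The formula translates directly into a small helper. This is a minimal sketch: the function and parameter names are made up for this example, and the default of one replica mirrors the assumption that all data is replicated once.

```python
def minimum_storage_gb(
    gb_per_day: float,
    retention_days: int,
    replicas: int = 1,
    overhead: float = 1.45,
) -> float:
    """Minimum storage in GB, following the simplified AWS formula."""
    source_data = gb_per_day * retention_days
    return source_data * (1 + replicas) * overhead


# The two scenarios analyzed in this post.
first_scenario = minimum_storage_gb(gb_per_day=100, retention_days=30)
second_scenario = minimum_storage_gb(gb_per_day=150, retention_days=2555)
print(round(first_scenario), round(second_scenario))
```

The same helper can be called with only the hot or only the Ultrawarm retention days to size each tier separately.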

Choose data instance types

To choose the number and type of instances, we pick enough instances for their combined storage limits to cover the required storage. For simplicity, all instances will be from the r5 family.
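
Sizing by storage limit is a ceiling division. The sketch below assumes a limit of 1,024 GB of storage per r5.large.elasticsearch instance; that figure is an assumption for illustration and should be checked against the current AWS limits table. The function name is also hypothetical.

```python
import math


def instances_needed(required_storage_gb: float, max_storage_per_instance_gb: float) -> int:
    """Smallest number of instances whose combined storage covers the requirement."""
    if max_storage_per_instance_gb <= 0:
        raise ValueError("max_storage_per_instance_gb must be positive")
    return math.ceil(required_storage_gb / max_storage_per_instance_gb)


# Assumption: 1,024 GB per r5.large.elasticsearch instance.
R5_LARGE_LIMIT_GB = 1024

all_hot = instances_needed(8700, R5_LARGE_LIMIT_GB)    # 30 days kept hot
hot_week = instances_needed(2030, R5_LARGE_LIMIT_GB)   # 7 days kept hot
print(all_hot, hot_week)
```

This only accounts for storage. A real cluster would also need to be checked against CPU, memory, and shard-count constraints.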

Choose master instance types

AWS published a table that we can use as a guideline to choose the instances for the Master nodes:

Instance Count	Recommended Minimum Dedicated Master Instance Type
1–10	c5.large.elasticsearch
10–30	c5.xlarge.elasticsearch
30–75	c5.2xlarge.elasticsearch
75–200	r5.4xlarge.elasticsearch

The minimum recommendation is 3 dedicated masters. To keep it simple, we will keep those 3 master nodes and change the instance types depending on the instance count of the data nodes.
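
The guideline table can be encoded as a lookup. This is a sketch with a hypothetical function name. Because the published ranges overlap at their boundaries (10, 30, 75), it assumes the upper bound of each range is inclusive.

```python
# (upper bound of the instance-count range, recommended master type),
# copied from the guideline table above.
MASTER_GUIDELINE = [
    (10, "c5.large.elasticsearch"),
    (30, "c5.xlarge.elasticsearch"),
    (75, "c5.2xlarge.elasticsearch"),
    (200, "r5.4xlarge.elasticsearch"),
]


def recommended_master_type(instance_count: int) -> str:
    """Return the minimum recommended dedicated master type for a cluster size."""
    if instance_count < 1:
        raise ValueError("instance_count must be at least 1")
    for upper_bound, instance_type in MASTER_GUIDELINE:
        # Assumption: a count equal to a shared boundary falls in the lower range.
        if instance_count <= upper_bound:
            return instance_type
    raise ValueError("instance_count is outside the guideline table")


print(recommended_master_type(9))
print(recommended_master_type(93))
```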

Total Cost and comparison

Sum all the costs and analyze the differences.

First Scenario | 100 GB/day w/ retention for 30 days

Let us start with a simple scenario where we need to store 100 GB of data per day and have 30 days of retention. When enabling Ultrawarm, we want to keep the indexes of the most recent week (7 days) available for quick search, and we will store the remaining 23 days in Ultrawarm instances.

With this scenario, we get the following results:

Calculate the storage requirements

Using our formula:

$$ \mathrm{Minimum\ Storage\ Requirement} = 30 \times 100\ \mathrm{GB} \times 2 \times 1.45 = 8700\ \mathrm{GB} $$

	Retention (days)	Price per GB ($)	Cost ($)
Price without Ultrawarm
All Hot	30	0.15	1296.30
Price with 23 days of retention in Ultrawarm
Hot	7	0.15	302.47
Ultrawarm	23	0.02	160.08
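
The storage costs in this table can be reproduced with a few lines of Python. Note the assumption: the table shows prices rounded to two decimals, and the totals only match when unrounded prices of $0.149 (hot) and $0.024 (Ultrawarm) per GB per month are used, with the same storage formula applied to both tiers. Those two prices are inferred from the figures, not quoted from a price list.

```python
# Assumed unrounded monthly prices per GB (inferred from the table totals).
HOT_PRICE_PER_GB = 0.149
ULTRAWARM_PRICE_PER_GB = 0.024


def storage_gb(gb_per_day: float, days: int, replicas: int = 1, overhead: float = 1.45) -> float:
    """Storage requirement in GB for the given number of retained days."""
    return gb_per_day * days * (1 + replicas) * overhead


def monthly_storage_cost(gb_per_day: float, days: int, price_per_gb: float) -> float:
    """Monthly storage cost in dollars, rounded to cents."""
    return round(storage_gb(gb_per_day, days) * price_per_gb, 2)


all_hot = monthly_storage_cost(100, 30, HOT_PRICE_PER_GB)
hot = monthly_storage_cost(100, 7, HOT_PRICE_PER_GB)
ultrawarm = monthly_storage_cost(100, 23, ULTRAWARM_PRICE_PER_GB)
print(all_hot, hot, ultrawarm)
```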

Choose data instance types

	Instance type	Quantity	Price per hour ($)	Cost ($)
Price without Ultrawarm
Hot	r5.large.elasticsearch	9	0.21	1302.91
Price with 23 days of retention in Ultrawarm
Hot	r5.large.elasticsearch	2	0.21	230.93
Ultrawarm	ultrawarm1.medium.elasticsearch	4	0.262	838.82

Choose master instance types

	Master Instances ($)
Master Instance type	c5.large
Quantity	3
Price per hour ($)	0.14
Total Master instance cost ($)	306.72

Total Cost and comparison

By using Ultrawarm for this use case, the monthly cost goes from $2,905.93 to $1,839.02, a monthly difference of $-1,066.91.
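
To see where the difference comes from, the sketch below splits it by cost category. The figures are copied from the tables above; the dictionary keys and function name are only illustrative.

```python
# Monthly costs in dollars, copied from the first-scenario tables.
without_ultrawarm = {
    "storage": 1296.30,
    "data_instances": 1302.91,
    "masters": 306.72,
}
with_ultrawarm = {
    "storage": 302.47 + 160.08,          # hot + Ultrawarm storage
    "data_instances": 230.93 + 838.82,   # hot + Ultrawarm instances
    "masters": 306.72,
}


def difference_by_category(before: dict, after: dict) -> dict:
    """Cost change per category (negative means cheaper), rounded to cents."""
    return {name: round(after[name] - before[name], 2) for name in before}


difference = difference_by_category(without_ultrawarm, with_ultrawarm)
total_difference = round(sum(difference.values()), 2)
print(difference, total_difference)
```

Most of the reduction comes from storage, while the Ultrawarm instances absorb a large part of what is saved on hot data instances.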

Second Scenario | 150 GB/day w/ retention for 7 years

For the second scenario, we want to see how Ultrawarm works at scale. In this scenario, we have 150 GB of data arriving every day, and we need to store it for 7 years while still being able to access it easily. When enabling Ultrawarm, we want to keep the indexes for the most recent month (30 days) available for quick search, and we will store the remaining 2525 days in Ultrawarm instances.

Note that in the following calculations I will simplify things. For a cluster of this size, we would most likely need to spend more time choosing the right instances, masters, and so on.

Calculate the storage requirements

Using our formula:

$$ \mathrm{Minimum\ Storage\ Requirement} = 2555 \times 150\ \mathrm{GB} \times 2 \times 1.45 = 1111425\ \mathrm{GB} \approx 1.1\ \mathrm{PB} $$

	Retention (days)	Price per GB ($)	Cost ($)
Price without Ultrawarm
All Hot	2555	0.15	165,602.33
Price with 6 years and 11 months of retention in Ultrawarm
Hot	30	0.15	1,944.45
Ultrawarm	2525	0.02	26,361.00

Choose data instance types

We now need to use the ultrawarm1.large.elasticsearch instance type, which supports up to 20 TB of storage per instance.

	Instance type	Quantity	Price per hour ($)	Cost ($)
Price without Ultrawarm
Hot	r5.12xlarge.elasticsearch	93	4.99	332,827.33
Price with 6 years and 11 months of retention in Ultrawarm
Hot	r5.large.elasticsearch	13	0.21	2,969.14
Ultrawarm	ultrawarm1.large.elasticsearch	55	2.955	116,845.13

Choose master instance types

	Master Instances ($)
Master Instance type	r5.4xlarge.elasticsearch
Quantity	3
Price per hour ($)	4.991
Total Master instance cost ($)	10,780.56

Total Cost and comparison

By using Ultrawarm for this use case, the monthly cost goes from $509,210.22 to $158,900.28, a monthly difference of $-350,309.94.
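
Absolute differences are hard to compare across clusters of such different sizes, so the sketch below expresses both scenarios as a relative saving. All figures are copied from the tables in this post; the function name is hypothetical.

```python
def relative_saving(cost_without: float, cost_with: float) -> float:
    """Fraction of the monthly bill saved by enabling Ultrawarm."""
    return (cost_without - cost_with) / cost_without


# Monthly totals: storage + data instances + masters.
scenario_one = relative_saving(
    cost_without=1296.30 + 1302.91 + 306.72,
    cost_with=302.47 + 160.08 + 230.93 + 838.82 + 306.72,
)
scenario_two = relative_saving(
    cost_without=165602.33 + 332827.33 + 10780.56,
    cost_with=1944.45 + 26361.00 + 2969.14 + 116845.13 + 10780.56,
)
print(f"{scenario_one:.1%} {scenario_two:.1%}")
```

With these figures, the first scenario saves a bit more than a third of the monthly bill and the second a bit more than two thirds.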

Conclusion

In both scenarios, we reduced the storage costs of our cluster with Ultrawarm. However, when we consider the added cost of Ultrawarm instances, the gains in the first scenario are not as significant.

In the end, the potential gains depend heavily on the percentage of the data we need to keep in hot storage for quick access. If we have a use case where the reduction in query performance is acceptable, moving part of our data to Ultrawarm can have a sizeable impact on our AWS bill.

Analyzing logs is a great example of such a use case. We still want to be able to search old logs, but most of the time we’ll query more recent events. In this case, we can move most of our data to Ultrawarm and reap the rewards.