How we [Cisco] optimized performance on Snowflake to reduce costs 15%

Manickaraja Kumarappan
14 min read · Apr 14, 2021

Performance Overview

Snowflake is a powerful cloud data warehouse platform. It can do wonders, but only if you keep the associated cost under control. Because it is a pay-per-use model, costs that are not monitored and controlled can quickly skyrocket and cause discomfort.

Snowflake comes with a lot of checks and balances to optimize cost right out of the box, but to make the best use of them you must understand what works and what does not. Concurrency and scalability, along with optimal performance, are the core goals of any data warehouse environment. As the data in a warehouse grows over time, if enough focus and attention are not spent on optimizing queries, their response times can become unacceptable.

To overcome such scenarios, this white paper explains a few key concepts that can be applied as appropriate to control Snowflake usage cost while keeping performance at its best. By implementing several of these levers, we were able to curtail our Snowflake usage cost by 15% to 20%.

Snowflake Cache

Snowflake comprises three different types of cache:

a) Metadata cache

b) Warehouse cache

c) Result cache

Snowflake Architecture

Metadata cache

The metadata cache is used to optimize the performance of the cloud services layer. Whenever a user submits a query, it needs to be validated both syntactically and semantically. To do that, the Snowflake execution engine needs the data dictionary for the account, and that information is cached at the metadata layer. Parsed query plans are also cached, so the next time a user executes a similar query the previously parsed plan is re-used for a faster compilation time. This cache is maintained and optimized by Snowflake itself and requires zero user intervention.

Warehouse cache

The warehouse cache sits inside the compute (middle) layer. Almost every query executed in Snowflake needs an active warehouse (exceptions being metadata-driven queries such as SHOW TABLES and pre-computed aggregate queries such as SELECT COUNT(*) or SELECT MIN(col)). When a warehouse is provisioned, like any other compute server it comes with predefined CPU, memory and storage. Whenever a query is executed, the associated data is retrieved from the cloud storage layer and stored on the memory/disks of the provisioned warehouse. The next time another query needs the same data, it is simply re-used from local disk. This is called the local disk cache. Data that is not available in the local disk cache results in remote I/O.

As everyone is aware, remote I/O is a very costly operation and can be time consuming, so the more the warehouse cache is used, the faster the query response. Unlike many other databases, you cannot directly dictate the size or contents of the warehouse cache. However, be aware that the size of the warehouse cache is directly proportional to the size of the warehouse used. For example, the cache associated with an X-Small warehouse is 128 times smaller than the cache associated with a 4XL warehouse. This is one of the reasons to use an adequately sized warehouse when a query has to deal with a humongous data set.

Whenever a warehouse is suspended, the corresponding cache gets cleared immediately. Snowflake does not guarantee the existence of the cache even if you resume right after suspending a warehouse. This brings up another critical point: how often should a warehouse auto-suspend? The default suspension time is 10 minutes, but there is a trade-off between cost and performance when setting it. If it is set too aggressively, the purpose is defeated because the same data needs to be cached again and again. The following points should be considered before deciding on a warehouse suspension time (a sample setting is sketched after the list):

a) How often queries will be executed

b) How large is the data set involved

c) How much overlap exists among the tables used by similar queries
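A minimal sketch of the relevant setting (the warehouse name is hypothetical; the value is in seconds):

    -- Suspend the warehouse after 10 minutes of inactivity and resume on demand
    ALTER WAREHOUSE reporting_wh SET
      AUTO_SUSPEND = 600
      AUTO_RESUME = TRUE;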

Figure — A
Figure — B

Figure — A depicts the query profile of a query fired on a cold warehouse (one that was in suspended status until query execution started). It took 6 seconds, 80% of the time was spent on I/O, and "Percentage scanned from cache" shows 0%, meaning no warehouse cache was used in this case.

A similar query was submitted again (Figure — B) while the warehouse was warm (before it got suspended), and it completed in just 2 seconds. In this case 60% of the data scanned came from cache, which resulted in a 66% improvement in query response time. In most cases, an increase in cache efficiency translates directly into an improvement in query response time.

Result cache

The result cache is a mechanism by which Snowflake re-uses a persisted result when the same query is submitted again. Results are persisted for 24 hours, extendable up to a maximum of 31 days. However, the result cache works only if the queries are exactly identical, the exceptions being white space and new line characters. Even a case difference between the queries means the result cache will not be leveraged. For instance, the result of "select col1 from tab1" will not be reused when running "SELECT COL1 FROM TAB1".

Figure — A
Figure — B

In the above example, Figure — A shows a query that ran (for the first time in the past 24 hours) for 4.5 seconds. When the same query was triggered again later (Figure — B), it returned the result in a couple of milliseconds, and the query profile shows that the result cache was used.

The result cache is used across users/roles as long as the role submitting the identical query has access to all of the underlying tables. For example, if user A queries data X through role R and, within 24 hours, user B queries the same data X through role S, the result cache will still be used and the result returned without actually executing the query.
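If you want to verify whether a query truly benefits from the result cache (for example while benchmarking warehouse sizes), the session-level USE_CACHED_RESULT parameter can be toggled; a minimal sketch:

    -- Disable result cache reuse for the current session (useful for benchmarking)
    ALTER SESSION SET USE_CACHED_RESULT = FALSE;

    -- Re-enable it once testing is done
    ALTER SESSION SET USE_CACHED_RESULT = TRUE;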

Warehouse

Every query that needs compute resources (CPU, memory and storage) needs a warehouse. Snowflake warehouses come in t-shirt sizes. The table below depicts the warehouse sizes, the associated resource availability and the credit/cost implications.
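As a quick reference, the credits billed per hour roughly double with each size step (standard rates per the Snowflake documentation at the time of writing):

    X-Small    1 credit/hour
    Small      2 credits/hour
    Medium     4 credits/hour
    Large      8 credits/hour
    X-Large    16 credits/hour
    2X-Large   32 credits/hour
    3X-Large   64 credits/hour
    4X-Large   128 credits/hour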

While a larger warehouse comes with more resources, there is a misconception that the larger the warehouse, the better the query response time. This is not always true. Upsizing a warehouse must be taken up holistically, as the associated cost will also be higher. Warehouses are sized along two dimensions:

1) Concurrency

2) Complexity

Each warehouse can in turn be a multi-cluster warehouse. The cluster count determines how many instances of the same warehouse run in parallel, which comes in handy when the system is expected to undergo a heavy concurrent load. The maximum cluster count can be set to a number greater than one, so that whenever a high load hits the warehouse, additional clusters are started automatically. They are also shut down automatically, back to the minimum cluster size, as the load decreases.
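A minimal sketch of a multi-cluster configuration (the warehouse name and counts are illustrative):

    -- Let the warehouse scale out to up to 4 clusters under concurrent load
    ALTER WAREHOUSE reporting_wh SET
      MIN_CLUSTER_COUNT = 1
      MAX_CLUSTER_COUNT = 4
      SCALING_POLICY = 'STANDARD';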

On the other hand, complex queries that involve huge data sets or a large number of joins may need more compute resources, which demands upsizing the warehouse. Upsizing is always manual, using the ALTER WAREHOUSE command to change the warehouse size. However, every time you scale up a warehouse, the associated cache gets wiped out. So pay careful attention before doing so; it is better to size the warehouse based on the workload and leave it running at the same size until the workload changes significantly.
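Resizing itself is a single statement (the warehouse name and target size are illustrative):

    -- Scale the warehouse up for a complex workload
    ALTER WAREHOUSE reporting_wh SET WAREHOUSE_SIZE = 'XLARGE';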

Whenever a warehouse is started, a minimum of one minute is billed; after that it is billed per second for as long as the warehouse is up. So setting a suspension time of less than a minute does not make sense. Always weigh cost against cache before deciding on a number for auto suspension.

To summarize, whenever a warehouse is designed for a specific workload, four different dimensions should be considered:

a) Dataset volume

b) Complexity of query

c) Associated query response time SLA

d) Cost to run query

It is easy to optimize along one dimension while causing sub-optimization along another.

Clustering

Clustering plays a crucial role in query optimization. Every table in Snowflake is partitioned by default into micro partitions. Each micro partition is designed to store between 50 MB and 500 MB of uncompressed data. When a table has no explicit partitioning (clustering) key defined, it is partitioned based on its natural key or the order of insertion.

A clustering key can be defined on a single column or on more than one column (composite columns), and functions can be used while defining it. If you are defining a composite clustering key, it is always advised to order the columns from least cardinality to most cardinality. At any given point, the health of a clustering key can be evaluated by its clustering depth and overlaps; SYSTEM$CLUSTERING_INFORMATION can be invoked to health-check the clustering key on any given table.
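A minimal sketch, assuming a hypothetical sales table clustered on a low-cardinality column followed by a higher-cardinality one:

    -- Define a composite clustering key, least cardinality first
    ALTER TABLE sales CLUSTER BY (region, sale_date);

    -- Health-check the clustering key (average depth, overlaps, partition histogram)
    SELECT SYSTEM$CLUSTERING_INFORMATION('sales', '(region, sale_date)');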

While choosing a clustering column, consider the query patterns on the tables being clustered. The best bet is a column that is often used in filters and joins. The efficiency of a clustering key can be determined easily from the associated query profile.

Figure — A
Figure — B
Figure — C

Figure — A shows the same query run back-to-back, once prior to clustering and once post clustering, on a huge table with billions of rows. Figure — B shows the query profile of the execution before clustering and Figure — C the profile post clustering. As you may notice, post clustering the response time came down from 5.3 seconds to milliseconds. This can be inferred from the "Partitions total" versus "Partitions scanned" section of the query profile; this is called partition pruning. In other words, the query without a clustering key scanned 52% of the partitions, whereas with a clustering key it scanned only 7.5%. This explains the 134% boost in query performance resulting from clustering by an appropriate key.

While clustering can improve performance, the associated re-clustering cost must be considered as well. Once a clustering key is defined on a table, every DML operation on the table demands a re-clustering operation, which needs compute and hence incurs cost. Snowflake is capable of auto-clustering: it scans for changes and re-clusters in the background, but the cost of that operation is passed back to the customer. One way to minimize this cost is to check, while inserting data, whether it can be inserted in the same order as the clustering key definition. Another tip: if a table undergoes too many changes in a day, auto-clustering can be suspended using the ALTER TABLE command and resumed once a day to do cumulative re-clustering after the batch of changes.
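A minimal sketch of pausing automatic clustering during heavy DML and resuming it afterwards (the table name is illustrative):

    -- Pause background re-clustering while the table is being heavily modified
    ALTER TABLE sales SUSPEND RECLUSTER;

    -- ... run the day's batch loads ...

    -- Resume so Snowflake re-clusters the accumulated changes in one pass
    ALTER TABLE sales RESUME RECLUSTER;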

Materialized view

Materialized views are physical structures that store pre-computed/aggregated data from a physical table. Whenever the underlying table undergoes a DML operation, the associated materialized view must also be refreshed to keep the data consistent. While this is taken care of automatically by Snowflake, it comes with an associated compute cost. As this white paper is being published, Snowflake does not support materialized views that involve joins, so a materialized view is always 1-to-1 with its base table. Snowflake has also released a feature by which the optimizer can automatically use a materialized view (if one exists) whenever a query performs the same aggregations as the materialized view definition. Below are some useful pointers to consider before creating a materialized view.

Based on history, find out how many times an aggregation query is triggered and the compute cost of executing it without a materialized view. Create a materialized view, find out the compute cost with the materialized view, and add the materialized view refresh cost. If the sum of the two is less than the original compute cost, it is a good use case for creating the materialized view.

If the aggregated data can instead be preserved as a separate physical table as part of the ETL process, and the compute cost to build such a table is less than the materialized view refresh cost, it is best to do so. This can help save cost without compromising performance.
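A minimal sketch of a single-table aggregation candidate (the table, columns and view name are illustrative):

    -- Pre-aggregate a frequently queried rollup; joins are not allowed in the definition
    CREATE MATERIALIZED VIEW daily_sales_mv AS
    SELECT sale_date, region, SUM(amount) AS total_amount, COUNT(*) AS order_count
    FROM sales
    GROUP BY sale_date, region;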

Query History

Snowflake’s query_history is a view shared from Snowflake’s internal metadata database. It has an entry for each and every query that runs against a Snowflake account, and it is a great resource for optimizing cost based on query patterns and history. It can be queried from either snowflake.information_schema or the snowflake.account_usage schema. Below is a subset of columns from this view that provide useful insight and can thus help improve query performance and reduce Snowflake cost at the system level (a sample query follows the list).

  • WAREHOUSE_SIZE – Along with TOTAL_ELAPSED_TIME, can be used to determine the approximate cost of a query.
  • EXECUTION_STATUS – Can help identify the cost associated with failed queries.
  • PERCENTAGE_SCANNED_FROM_CACHE – Useful for validating the size and uptime of a warehouse.
  • PARTITIONS_SCANNED – Along with PARTITIONS_TOTAL, can help identify the efficiency of pruning and hence the need to cluster a table.
  • QUEUED_PROVISIONING_TIME – Along with QUEUED_OVERLOAD_TIME, can help in designing multi-cluster warehouses.
  • QUERY_LOAD_PERCENT – Useful for determining warehouse load.
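A minimal sketch of mining the account-level view for the most expensive recent queries (the time window and limit are arbitrary):

    -- Most expensive queries over the last 7 days, with pruning and cache indicators
    SELECT query_id,
           warehouse_name,
           warehouse_size,
           execution_status,
           total_elapsed_time / 1000 AS elapsed_seconds,
           percentage_scanned_from_cache,
           partitions_scanned,
           partitions_total,
           queued_provisioning_time,
           queued_overload_time,
           query_load_percent
    FROM snowflake.account_usage.query_history
    WHERE start_time >= DATEADD('day', -7, CURRENT_TIMESTAMP())
    ORDER BY total_elapsed_time DESC
    LIMIT 50;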

Best Practices

Given below are some best practices, based on our experience, that can help reduce overall Snowflake cost.

Optimize storage cost

Review the data retention period for all permanent tables in Snowflake. While Snowflake comes with the very cool time travel feature out of the box, remember that there is a cost associated with it: the greater the number of days for which you retain history, the larger the storage cost. Always consider the data set size, the amount of change those data sets undergo and the business requirement for time travel before arriving at an optimal data retention value. It is ideal to set different time travel durations for different types of databases.

Also be aware of the different table types that Snowflake provides out of the box and their associated properties. For instance, a table that does not need resiliency can be created as a "Transient" table, which helps save the associated storage cost. Tables created only for a particular session can be "Temporary" tables, which get dropped automatically at the end of the session.
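A minimal sketch (table names and columns are illustrative):

    -- Transient: no fail-safe and shorter time travel, hence lower storage cost
    CREATE TRANSIENT TABLE stg_orders (order_id NUMBER, payload VARIANT);

    -- Temporary: visible only to this session and dropped automatically when it ends
    CREATE TEMPORARY TABLE tmp_orders AS SELECT * FROM stg_orders;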

Zero-copy cloning is a very cool feature available right out of the box. It can be used to keep multiple copies of data at any given point in time with zero to minimal storage cost, depending on the changes the respective base table undergoes.
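A minimal sketch (table names are illustrative):

    -- The clone shares the base table's micro partitions; storage grows only as the copies diverge
    CREATE TABLE sales_uat CLONE sales;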

Whenever you drop a table, its data is still retained for the duration set by the DATA_RETENTION_TIME_IN_DAYS parameter. If you know for sure that you do not want to retain time travel for a table that is being dropped, it is better to use the ALTER TABLE command to set DATA_RETENTION_TIME_IN_DAYS=0 and then proceed with dropping the table, so that the time travel data is purged immediately and the associated storage cost is saved.
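A minimal sketch (the table name is illustrative):

    -- Purge time travel history immediately, then drop the table
    ALTER TABLE old_sales SET DATA_RETENTION_TIME_IN_DAYS = 0;
    DROP TABLE old_sales;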

Optimize compute cost

Always keep checking whether the right-sized warehouse is being used. To give some perspective, in one of our use cases a single-sized warehouse was used for ingesting data from a source system into Snowflake. Ingestion involves different query types such as PUT, COPY, INSERT, MERGE, SELECT, DELETE etc. Since it also involves data sets of very different sizes (some are lookup tables with fewer than 1,000 rows, whereas some are transaction tables with billions of rows), one size did not prove to be an optimal fit for all. The solution was modified to dynamically use different warehouse sizes for different tables depending on the underlying data size, and this resulted in $375,000 USD in savings per year (a sketch of the switching approach follows below). While it is not always straightforward to arrive at the correct size the very first time, this can very well be a trial-and-error process where the optimal size is determined after multiple iterations. The Query History section of this document gives an idea of how to pick a candidate to optimize and play around with that query to arrive at the optimal warehouse size.
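A minimal sketch of the switching approach, assuming hypothetical warehouses and stages sized per table class:

    -- Small lookup tables: stay on an X-Small warehouse
    USE WAREHOUSE ingest_xs_wh;
    COPY INTO lookup_country FROM @ingest_stage/lookup_country/;

    -- Billion-row transaction tables: switch the session to a larger warehouse
    USE WAREHOUSE ingest_l_wh;
    COPY INTO sales_transactions FROM @ingest_stage/sales_transactions/;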

Efficient query techniques

Table join efficacy

  • A self-join is always a costly operation, especially if the table is huge. Check queries that involve such an operation and look at possible ways to avoid it.
  • Many-to-many joins should be avoided under all scenarios. While joining two tables on key columns, make sure they are either one-to-many or many-to-one. A many-to-many join results in a product (popularly known as a Cartesian join) and hence an enormous number of rows in the result that has to be passed to the subsequent step.

Aggregation

Aggregation operations such as GROUP BY, ORDER BY and DISTINCT are always costly, especially when the result data set is huge. While these operations cannot be avoided completely, look at using them efficiently. For instance, if the underlying table does not have any duplicates, there is no need to use DISTINCT or GROUP BY; using such operations will only deteriorate query performance and make it costly.

Filter efficiency

  • Any inequality usage in a filter defeats the purpose of clustering and results in a full table scan. Examples of such operations are <>, NOT IN, NOT LIKE.
  • Any usage of functions in a filter prevents clustering from being used and results in a full table scan. A potential solution for such a use case is to use the function itself while defining the clustering key. For example, if a column employee_name is always filtered with the UPPER function, then cluster the table using UPPER(employee_name) instead of just employee_name (see the sketch after this list).
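A minimal sketch, assuming a hypothetical employees table that is always filtered with UPPER:

    -- Cluster on the same expression the filters use so pruning still applies
    ALTER TABLE employees CLUSTER BY (UPPER(employee_name));

    -- This filter can now prune partitions instead of scanning the full table
    SELECT * FROM employees WHERE UPPER(employee_name) = 'JANE DOE';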

JSON parsing

  • Using PARSE_JSON while extracting JSON data into a VARIANT column is additional overhead and will prove expensive. Directly extracting JSON data into a VARIANT is supported out of the box.
  • Using the LATERAL FLATTEN function to traverse JSON elements results in optimized performance in a query (see the sketch after this list).
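A minimal sketch, assuming a hypothetical orders table with a VARIANT column raw holding a line_items array:

    -- Explode the nested array once and project typed columns from each element
    SELECT o.raw:order_id::NUMBER      AS order_id,
           item.value:sku::STRING      AS sku,
           item.value:quantity::NUMBER AS quantity
    FROM orders o,
         LATERAL FLATTEN(input => o.raw:line_items) item;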

Common Table Expressions (CTE)

  • Using a common table expression to store an intermediate result and reuse it further down the query is always preferable to creating temporary/transient tables.
  • Since a CTE is resolved in memory during query execution, huge I/O savings can be reaped compared to physicalizing such data (see the sketch after this list).
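A minimal sketch (table and column names are illustrative):

    -- The intermediate result lives only for the duration of the query
    WITH active_orders AS (
        SELECT customer_id, amount
        FROM orders
        WHERE status = 'ACTIVE'
    )
    SELECT customer_id, SUM(amount) AS total_amount
    FROM active_orders
    GROUP BY customer_id;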

Out of memory (OOM) error

At times, when an inefficient table join results in billions of rows, Snowflake query execution may fail with an out of memory error. If such an incident occurs, always examine the data in the underlying tables, the join conditions, the filters used, the clustering key efficiency and, last but not least, the size of the warehouse used.

Union Operation

  • A UNION operator will generally perform better than an OR condition. For instance, a filter that says org_id=1 or bu_id=1 can be rewritten as two queries with a UNION operator in between, one with the filter org_id=1 and the other with the filter bu_id=1 (see the sketch after this list).
  • UNION ALL always performs better than UNION. UNION should only be used if you want to filter out duplicates in the result set; otherwise, UNION ALL should be used in place of UNION by default.
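A minimal sketch of the rewrite (table and columns are illustrative); note that UNION also deduplicates rows that satisfy both predicates:

    -- Original filter
    SELECT * FROM revenue WHERE org_id = 1 OR bu_id = 1;

    -- Rewritten as two prunable queries combined with UNION
    SELECT * FROM revenue WHERE org_id = 1
    UNION
    SELECT * FROM revenue WHERE bu_id = 1;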

Insert Operation

  • Row-by-row inserts are always very costly. Not only are they inefficient, they also keep the warehouse up under minimal load and hence burn credits.
  • The best method for batch inserts is to leverage the COPY INTO operation.
  • If COPY INTO is not possible, the next best alternative is multi-row inserts (see the sketch after this list).
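A minimal sketch (the stage, file format and table are illustrative):

    -- Preferred: bulk load staged files in one operation
    COPY INTO orders
    FROM @orders_stage
    FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1);

    -- Fallback: a single multi-row insert instead of many single-row inserts
    INSERT INTO orders (order_id, amount) VALUES
        (101, 25.00),
        (102, 17.50),
        (103, 42.75);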

Custom Dashboards

  • While Snowflake provides multiple views for checking the credits utilized, it is a best practice to build custom dashboards for easy consumption.
  • For instance, hourly warehouse credit usage can be queried from the metering_history view. However, it may not be easily consumable, and trend analysis could be difficult.
  • A custom dashboard can be built on top of this data showing trends by warehouse and by date. It will help end users identify their usage patterns and fine-tune them accordingly.
  • Individual dashboards can be built to find patterns in storage cost, clustering cost, materialized view refresh cost, overall Snowflake cost etc.
  • Usage of query tags is highly recommended so that workloads can be grouped logically and the associated cost derived for each (see the sketch after this list).
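A minimal sketch of the kind of queries such a dashboard could be built on (the warehouse_metering_history view in account_usage is the per-warehouse variant of metering history; the tag value is illustrative):

    -- Daily credit consumption per warehouse over the last 30 days
    SELECT warehouse_name,
           DATE_TRUNC('day', start_time) AS usage_date,
           SUM(credits_used)             AS credits
    FROM snowflake.account_usage.warehouse_metering_history
    WHERE start_time >= DATEADD('day', -30, CURRENT_TIMESTAMP())
    GROUP BY 1, 2
    ORDER BY 1, 2;

    -- Tag a workload so its queries (and cost) can be grouped later
    ALTER SESSION SET QUERY_TAG = 'finance_daily_etl';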


Manickaraja Kumarappan

A distinguished senior level Information Technology architect with extensive experience in cloud native technologies, analytics and database administration