You are currently viewing Part 1:Top 100+ Snowflake Interview Questions and Answers: A Guide to Mastering the Snowflake Interview:2023(Real Time Interview Questions for Freshers & Experienced candidates)

Part 1:Top 100+ Snowflake Interview Questions and Answers: A Guide to Mastering the Snowflake Interview:2023(Real Time Interview Questions for Freshers & Experienced candidates)

As data warehousing becomes increasingly critical for organizations of all sizes, Snowflake has emerged as a leading cloud-based platform for storing, processing, and analyzing large volumes of structured and semi-structured data. As a result, many companies are seeking to hire experienced Snowflake professionals to help manage and optimize their data operations.

If you are preparing for a Snowflake interview, it is essential to be familiar with the key concepts and features of the platform. In this post, we will cover the top 100 Snowflake interview questions and answers to help you prepare for your next interview.

1.What’s unique with snowflake architecture?
How snowflake architecture is different from traditional data bases(RDBMS like SQL Server, Oracle, My SQL) ?

Snowflake is a cloud-based data warehousing platform that enables organizations to store, process, and analyze large volumes of structured and semi-structured data in real-time. It differs from traditional data warehousing solutions in that it is built on a scalable, elastic, and multi-cluster architecture that allows it to handle massive data sets with ease. 

Snowflake has three layer architecture

Storage layer => that stores data in hybrid columnar format and saved in cloud

Query Processing => Virtual Warehouse to run queries/for compute resources (muscle)

Cloud Services => brain of snowflake architecture coordinates between storage and query processing layers

2.Why to go with Snowflake instead of other Cloud Data warehouses like AWS Redshift, Azure Synapse, Google Big query?

  • Snowflake is available on all cloud platforms like AWS, Azure and GCP (Multi cloud). Other cloud Data warehouses were restricted to that one cloud example: Redshift will be available in only AWS, Big Query will be available in GCP.
  • Unique features like Data Market place, Data Exchange, Zero Copy Clone, Time travel, Data sharing, Streams for CDC
  • Separate Storage and compute costs, Zero Administration/Maintenance (partitioning, data backups, index maintenance, performance tuning will be taken care by snowflake itself).
  • No software or hardware required to be installed or upgraded as snowflake is SaaS (Software as a service). Synapse and BigQuery are PaaS (Platform as a service)

3.What is Virtual Warehouse?

 Ans.

To perform any operations, we require resources. In snowflake, these resources are called virtual warehouse. 

A virtual warehouse is a cluster of compute nodes/machines

Cluster is a group of nodes/machines connected together

Compute Resources is a combination of CPU + Memory+ Temporary storage

4.Different sizes of warehouses?

 Ans.

XS(1) will have one compute nodes/machines, S(2) will have 2 compute nodes, M(4), L(8), XL(16), 2XL(32), 3XL(64), 4XL(128) — generally available

5XL(256) and 6 XL(512) are in preview state

5.When we create a warehouse from UI, What is default warehouse size?

 Ans.

XL(16) is the default warehouse size, when created from UI

 

6.When we create a warehouse in worksheet, what is default warehouse size?

 Ans.

XS(1) is the default warehouse size, when created in worksheet

 

7.What is the default Auto suspend time in a Warehouse?

Ans.

10 minutes is the default auto suspend time in a warehouse

8.What is Auto resume in Virtual Warehouse?

Ans.

If the warehouse is in suspended state and if we set Auto Resume as true then as soon as a query is ran using this warehouse it will automatically get resumed

9.Can we increase or decrease the warehouse size any time?

Ans.

Yes, on fly (instantly) it will get resized

10.What will happen to already running queries if we resize the warehouse?

 Ans.

Already running queries will still use the old warehouse size but subsequent queries in queue will run with the resized warehouse

11.Can we start or stop the warehouse any time?

Ans.  

Yes, suspend will stop it and resume will start it again

12.What is difference between scale up and scale out?

Ans.

Increasing the warehouse size is scale up and it’s to increase performance of complex queries and for large volumes of data (when remote disk spillage happens)

Adding clusters to a warehouse is scale out and it is to handle concurrency (parallel queries) issues

13.How to set a warehouse to auto scale mode?

 Ans.

While creating a warehouse If you set max_cluster_count (10)> min_cluster_count(1) then the Virtual warehouse will be in auto scale mode and

it will start with minimum number of clusters and scale out to maximum clusters based on workloads on demand

14.What is maximize mode?

Ans.

When the max_cluster_count and min_cluster_count are given as the same number, then that warehouse is said to be running a maximize mode.

Ex:-  

 Maximized Mode=> max_cluster_count(5)= min_cluster_count (5)

Snowflake starts all the clusters so that maximum resources are available while the warehouse is running.

15.What is horizontal scaling and vertical scaling?

Ans.

Scale up is named as Horizontal scaling i.e, increasing the compute capacity

Scale out is named as Vertical scaling i.e, increasing number of cluster in a warehouse

 

16.Can we automate scale up?

Ans.

No, we have to manual resize(increase/decrease) the warehouse

       Scaling out is an automated process taken care by snowflake based on your work load

17.What is a scaling policy?

Ans.

Scaling policy is used to determine when to start or shut down a warehouse.

Standard (default policy) – performance (starts immediately when a query is queued)

Economy — conserves credits (cluster starts only if it estimates there is enough query load to keep the cluster busy for at least 6 minutes)

18.What is time travel in snowflake?

Ans.

Time travel enables accessing historical data (i.e. data that has been changed or deleted) at any point within a defined period based on retention period. If we drop a table, with time travel we can restore it if it is with in retention period

19.What’s the maximum retention period for permanent tables?

Ans.

For Standard edition, maximum retention period is 1 day

For Enterprise edition and above, maximum retention period is 90 days

20.What’s the maximum retention period for temporary tables and transient tables?

Ans.

 Maximum Retention period for temporary tables and transient tables is 1 day

 

21.What’s the default retention period in snowflake?

Ans.

   1 day

22.What is failsafe?

Ans.

Fail-safe is a data recovery service intended only for use when all other recovery options have been attempted offers protection of data in case of disaster.

23.Can we configure failsafe period?

Ans.

 Its Non-configurable 7 days period and can be defined only for permanent

24.Can we disable Fail-safe in Snowflake?

Ans.

Fail safe cannot be disabled on a table

25.Can we interact/read/access data from in failsafe?

Ans.

No user interaction and recoverable only by snowflake, We need to approach snowflake team to get data from failsafe

26.What are the differences between Time Travel and Fail-safe?

Ans.

Time travel: Helps the user to access the historical data at any point within the retention period and restore the previous state of the table.

Fail safe: internally used by snowflake to restore the data during the times of hardware failure/disaster, user can’t access/read data from Fail safe

After Time travel period is over, next 7 days data will be stored in Snowflake

27.Will time travel contribute to additional storage costs?

Ans.

 Yes, since it has to maintain snapshots/historical data

28.Will failsafe contribute to additional storage costs?

Ans.

Yes, since it has to maintain historical data for 7 days

Leave a Reply