Welcome back to our Snowflake Interview Questions & Answers series! To make the most of this post, we recommend reading Part 1 and Part 2 if you haven't already done so.
1. What is a Stream?
Ans.
A stream is an object in Snowflake used for change data capture (CDC).
Types of streams:
STANDARD STREAMS: Capture all changes (INSERT, DELETE, UPDATE)
APPEND-ONLY STREAMS: Capture only inserted records
INSERT-ONLY STREAMS: Can be created only on external tables
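As a quick sketch (the table names `orders` and `ext_orders` are hypothetical), the three stream types can be created like this:

```sql
-- Standard stream: captures inserts, updates and deletes
CREATE OR REPLACE STREAM orders_std_stream ON TABLE orders;

-- Append-only stream: captures only inserted rows
CREATE OR REPLACE STREAM orders_append_stream ON TABLE orders APPEND_ONLY = TRUE;

-- Insert-only stream: supported only on external tables
CREATE OR REPLACE STREAM ext_orders_stream ON EXTERNAL TABLE ext_orders INSERT_ONLY = TRUE;
```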
2. What are the additional fields/columns that get added in streams?
Ans.
Three additional metadata columns get added in streams, namely:
METADATA$ACTION
METADATA$ISUPDATE
METADATA$ROW_ID
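These metadata columns can be selected alongside the table columns when querying the stream (stream name `orders_std_stream` is hypothetical):

```sql
-- The metadata columns describe what kind of change each row represents
SELECT METADATA$ACTION,    -- INSERT or DELETE
       METADATA$ISUPDATE,  -- TRUE when the row is part of an update
       METADATA$ROW_ID,    -- stable identifier for the source row
       *
FROM orders_std_stream;
```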
3. How will a stream behave in case of an update on the source table?
Ans.
The stream records two entries for an update:
a DELETE of the old record and an INSERT of the new record, both with METADATA$ISUPDATE = TRUE.
4. What is the alternative to streams in Snowflake?
Ans.
Enabling change tracking on a table and querying it with the CHANGES clause is an alternative to streams.
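A minimal sketch of this approach (table name `orders` is hypothetical):

```sql
-- Step 1: enable change tracking on the table
ALTER TABLE orders SET CHANGE_TRACKING = TRUE;

-- Step 2: query changes since a point in time using the CHANGES clause
SELECT *
FROM orders
  CHANGES (INFORMATION => DEFAULT)   -- DEFAULT = full delta; APPEND_ONLY = inserts only
  AT (TIMESTAMP => DATEADD('hour', -1, CURRENT_TIMESTAMP()));
```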
5. What are the limitations of Tasks?
Ans.
A task can run only one SQL statement (or a single stored procedure call).
Earlier, we had to specify manually which warehouse a task should use; Snowflake has since introduced serverless tasks (Snowflake-managed compute).
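A sketch of a serverless task (task, table and stream names are hypothetical); note there is no WAREHOUSE clause, so Snowflake manages the compute:

```sql
CREATE OR REPLACE TASK load_orders_task
  SCHEDULE = '5 MINUTE'
  USER_TASK_MANAGED_INITIAL_WAREHOUSE_SIZE = 'XSMALL'  -- serverless sizing hint
AS
  INSERT INTO orders_tgt SELECT * FROM orders_std_stream;  -- the single allowed SQL statement

-- Tasks are created in a suspended state and must be resumed
ALTER TASK load_orders_task RESUME;
```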
6. Does Snowflake enforce constraints?
Ans.
We can create constraints, but they are not enforced, except the NOT NULL constraint.
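This is easy to demonstrate (table name `dept` is hypothetical):

```sql
CREATE OR REPLACE TABLE dept (
  dept_id   INT PRIMARY KEY,
  dept_name STRING NOT NULL
);

INSERT INTO dept VALUES (1, 'HR');
INSERT INTO dept VALUES (1, 'Finance');  -- succeeds: PRIMARY KEY is not enforced
INSERT INTO dept VALUES (2, NULL);       -- fails: NOT NULL is enforced
```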
7. What are the different caches in Snowflake?
Ans.
There are three types of caches in Snowflake, namely:
Result Cache
Metadata Cache
Warehouse Cache
8. If you rerun the same query, where will the query get data from?
Ans.
The result cache is persisted for 24 hours unless the base object changes. If the same query is rerun, the results are retrieved from the result cache.
9. How many days is the result cache valid?
Ans.
The result cache is valid for 24 hours; each time the query is rerun within that window, the validity is extended by another 24 hours, up to a maximum of 31 days.
If the underlying base table data changes, the result cache is invalidated.
10. What is the maximum Time Travel retention period?
Ans.
90 days is the maximum retention period, available in Enterprise edition and above.
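The retention period is set per object via a parameter (table name `orders` is hypothetical):

```sql
-- Values greater than 1 require Enterprise edition or above
ALTER TABLE orders SET DATA_RETENTION_TIME_IN_DAYS = 90;
```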
11. How do we load historical/existing data already in cloud storage into Snowflake using Snowpipe?
Ans.
The ALTER PIPE ... REFRESH command loads files staged within the last 7 days. Use the COPY command to load data older than 7 days (e.g., the last 6 months of data).
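A sketch of both paths (pipe, table, and stage names are hypothetical):

```sql
-- Loads files staged within the last 7 days through the pipe
ALTER PIPE my_pipe REFRESH;

-- For files older than 7 days, run COPY directly against the stage
COPY INTO orders
FROM @my_ext_stage
FILE_FORMAT = (TYPE = 'CSV');
```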
12.Will Snowpipe load duplicate files (same filename) into the table?
Ans.
No. Snowpipe maintains load metadata for 14 days and skips files with the same name during that period.
13.How snowpipe copy command is different from regular copy command?
Ans.
Regular COPY command:
• The default value for ON_ERROR is ABORT_STATEMENT
• It supports FORCE = TRUE
• It stores load metadata for 64 days
Snowpipe COPY command:
• The default value for ON_ERROR is SKIP_FILE
• It does not support FORCE = TRUE
• It stores load metadata for 14 days
14. What is cloning, and what happens when we clone an object?
Can we perform DML operations on cloned tables? How will it impact the original tables?
Ans.
Cloning takes a snapshot of an object without actually copying its data. The original and cloned tables behave as independent tables, and we can perform DML on either without impacting the other.
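A minimal sketch (table names are hypothetical):

```sql
-- Zero-copy clone: shares micro-partitions until either table is modified
CREATE OR REPLACE TABLE orders_clone CLONE orders;

-- DML on the clone does not affect the original table
DELETE FROM orders_clone WHERE order_date < '2020-01-01';
```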
15. For what purpose did you use Cloning?
Ans.
To create Lower environments (DEV, STAGE/QA) from production, we can use cloning.
16. What will happen if we clone a database?
Ans.
All the database objects get cloned, except internal stages and snowpipes that reference internal stages.
17. What is Data Sharing?
Ans.
Secure Data Sharing is a Snowflake feature that enables sharing selected objects (tables, secure views, secure UDFs) in a database from one Snowflake account with another.
18. What are different types of Data shares?
Ans.
Data shares in Snowflake are of two types:
INBOUND SHARE: Appears automatically in the CONSUMER account, from which we can create a shared database.
OUTBOUND SHARE: Created in the PROVIDER account to share data with the consumer.
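A sketch of setting up an outbound share on the provider side (share, database, and account names are hypothetical):

```sql
-- Provider account: create the share and grant access to selected objects
CREATE SHARE sales_share;
GRANT USAGE ON DATABASE sales_db TO SHARE sales_share;
GRANT USAGE ON SCHEMA sales_db.public TO SHARE sales_share;
GRANT SELECT ON TABLE sales_db.public.orders TO SHARE sales_share;
ALTER SHARE sales_share ADD ACCOUNTS = consumer_account;

-- Consumer account: the inbound share is materialized as a database
-- CREATE DATABASE sales_shared FROM SHARE provider_account.sales_share;
```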
19. Can we perform DML operations on objects in Data shared database?
Ans.
No, shared databases are read-only. We cannot perform DML operations on them.
20. What is clustering in Snowflake?
Ans.
Clustering is a method used to improve query performance by avoiding unnecessary scanning of micro-partitions.
21. How can we improve the performance of a query?
Ans.
Increase the warehouse size to improve query performance.
If the table is very large, create a clustering key on the columns used in joins and filter conditions.
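A sketch of defining and checking a clustering key (table and column names are hypothetical):

```sql
-- Cluster on the columns most often used in filters and joins
ALTER TABLE orders CLUSTER BY (order_date, region);

-- Inspect how well the table is clustered on those columns
SELECT SYSTEM$CLUSTERING_INFORMATION('orders', '(order_date, region)');
```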
22. How do we check the status of a snowpipe?
Ans.
The status of a snowpipe can be viewed by running the following command:
SELECT SYSTEM$PIPE_STATUS('<PIPE_NAME>');
23. How to set up auto-refresh for Snowpipe?
Ans.
With AUTO_INGEST = TRUE on the pipe and an SQS/SNS event notification from AWS, the pipe ingests new files automatically.
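A sketch of an auto-ingest pipe (pipe, table, and stage names are hypothetical):

```sql
CREATE OR REPLACE PIPE orders_pipe
  AUTO_INGEST = TRUE
AS
  COPY INTO orders
  FROM @my_ext_stage
  FILE_FORMAT = (TYPE = 'JSON');

-- Use the notification_channel value from SHOW PIPES
-- to configure the S3 event notification (SQS queue ARN)
SHOW PIPES;
```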
24. What major challenges did you face in creating stored procedures in Snowflake?
Ans.
Data type compatibility is a common challenge when creating stored procedures, since some JavaScript data types are not compatible with SQL data types such as TIMESTAMP_NTZ. To handle that, Snowflake has a special JavaScript data type: SfDate.
25. How do we do error handling in a stored procedure?
Ans.
Using a JavaScript try/catch block, we can handle errors in a stored procedure.
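A sketch of try/catch error handling in a JavaScript stored procedure (procedure and table names are hypothetical):

```sql
CREATE OR REPLACE PROCEDURE safe_load()
RETURNS STRING
LANGUAGE JAVASCRIPT
AS
$$
  try {
    snowflake.execute({sqlText: "INSERT INTO orders_tgt SELECT * FROM orders_src"});
    return "Load succeeded";
  } catch (err) {
    // Return (or log to an error table) the failure details instead of aborting
    return "Load failed: " + err.message;
  }
$$;
```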
26. Can we return a table from a Snowflake stored procedure?
Ans.
No, a stored procedure returns a single value (one row, one column). If we have to return a table, we can use a table-valued function (UDTF) instead.
27. How is data stored in Snowflake?
Ans.
Data is stored in Snowflake in a compressed columnar format, in micro-partitions of 50–500 MB each (of uncompressed data).
28. How do we establish a connection between S3 and Snowflake?
Ans.
By creating a storage integration and an external stage, we can integrate Snowflake with AWS S3.
29. How to set up auto-refresh for external tables?
Ans.
Set AUTO_REFRESH = TRUE on the external table and configure an SQS event notification on the AWS S3 bucket.
30. How do we load semi-structured data into Snowflake?
Ans.
Snowflake has a special data type called VARIANT that can hold semi-structured data, and we can flatten nested JSON into relational rows using the LATERAL FLATTEN function.
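A self-contained sketch (table name `raw_json` and the sample JSON are hypothetical):

```sql
CREATE OR REPLACE TABLE raw_json (v VARIANT);

-- PARSE_JSON must be used via INSERT ... SELECT, not VALUES
INSERT INTO raw_json
  SELECT PARSE_JSON('{"id": 1, "items": [{"sku": "A"}, {"sku": "B"}]}');

-- Flatten the nested array into one relational row per element
SELECT r.v:id::INT        AS id,
       f.value:sku::STRING AS sku
FROM raw_json r,
LATERAL FLATTEN(input => r.v:items) f;
```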
31. What is the output of SELECT TRY_TO_NUMBER('10'), TRY_TO_NUMBER('12_A');
Ans.
10, NULL
32. How does a sequence in Snowflake behave in the following?
SELECT SEQ1.NEXTVAL, SEQ1.NEXTVAL;
Ans.
The output of the given statement is 1, 2.
Whereas in most RDBMSs it would be 1, 1.
86. Is TRUNCATE a DDL or DML command in Snowflake?
Ans.
Although TRUNCATE is traditionally considered a DDL command, Snowflake classifies TRUNCATE as a DML command, alongside INSERT, UPDATE, DELETE, and MERGE.
33. Write a query to get the complete name of an employee from the first name, middle name, and last name.
Ans.
SELECT first_name || NVL(' ' || middle_name, '') || ' ' || last_name AS complete_name FROM emp;
(COALESCE alone would return only the first non-NULL of the three columns, not the full name; the NVL handles a NULL middle name.)
34. What is the output of the following query SELECT CONCAT(123,NULL);
Ans.
NULL
35. What is the default value for MAX_CONCURRENCY_LEVEL ?
Ans.
The maximum number of concurrent or parallel statements a warehouse can execute is called MAX_CONCURRENCY_LEVEL.
The default value for MAX_CONCURRENCY_LEVEL is 8
36. What are the different types of connectors supported by Snowflake?
Ans.
Snowflake supports the following connectors and drivers:
• JDBC Driver
• ODBC Driver
• Node.js Driver
• Python Connector
• .NET Driver
• Spark Connector (spark 3.1,3.2,3.3)
• Kafka Connector
37. What is the alternative way of capturing DMLs on any object other than streams?
Ans.
Enabling change tracking can help in capturing DMLs on any object.
38. What are the COPY options which are not supported in Snowpipe?
Ans.
The following options are not supported for Snowpipe:
• ON_ERROR = ABORT_STATEMENT
• FORCE = TRUE | FALSE
• PURGE = TRUE | FALSE
• VALIDATION_MODE = RETURN_ERRORS | RETURN_ALL_ERRORS | RETURN_n_ROWS
• SIZE_LIMIT = num
39. How can you resume a stale pipe?
Ans.
If a snowpipe is paused for more than 14 days, it becomes stale.
A stale snowpipe can be resumed using the following command:
SELECT SYSTEM$PIPE_FORCE_RESUME('<pipe_name>', 'staleness_check_override');
40. How can you delete files explicitly from a stage?
Ans.
Files in internal/external stages can be explicitly deleted as follows:
REMOVE @%table_stage; -- to remove from a table stage
REMOVE @~; -- to remove from a user stage
REMOVE @int_stage; -- to remove from a named internal/external stage
41. How to disable the result cache in Snowflake?
Ans.
The result cache can be disabled in Snowflake as follows:
ALTER SESSION SET USE_CACHED_RESULT = FALSE;
42. What is remote disk spillage? How do you resolve it?
Ans.
If the available memory of a virtual warehouse is not sufficient to hold the data, the warehouse starts spilling data to local disk and, when that is exhausted, to remote storage. The latter is called remote disk spillage.
We can avoid remote disk spillage by resizing the warehouse to a larger size.