Welcome back to our Snowflake Interview Questions & Answers series! To make the most of this post, we recommend reading Part 1 and Part 2 if you haven't already done so.
1. What is a Stream?
Ans.
A stream is an object in Snowflake used for change data capture (CDC).
Types of streams:
STANDARD STREAMS: Capture all changes (INSERT, DELETE, UPDATE)
APPEND-ONLY STREAMS: Capture only inserted records
INSERT-ONLY STREAMS: Can be created only on external tables
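As a quick sketch (the table names `orders` and `ext_orders` are hypothetical), the three stream types can be created like this:

```sql
-- Standard stream: captures inserts, updates and deletes
CREATE OR REPLACE STREAM orders_std_stream ON TABLE orders;

-- Append-only stream: captures only inserted rows
CREATE OR REPLACE STREAM orders_append_stream ON TABLE orders APPEND_ONLY = TRUE;

-- Insert-only stream: supported only on external tables
CREATE OR REPLACE STREAM ext_orders_stream ON EXTERNAL TABLE ext_orders INSERT_ONLY = TRUE;
```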
2. What are the additional fields/columns that get added in streams?
Ans.
Three additional metadata columns get added in streams, namely:
METADATA$ACTION
METADATA$ISUPDATE
METADATA$ROW_ID
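These metadata columns can be selected alongside the table columns when querying the stream (stream name `orders_std_stream` is hypothetical):

```sql
-- The metadata columns describe what kind of change each row represents
SELECT METADATA$ACTION,    -- INSERT or DELETE
       METADATA$ISUPDATE,  -- TRUE when the row is part of an update
       METADATA$ROW_ID,    -- stable identifier for the source row
       *
FROM orders_std_stream;
```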
3. How will a stream behave in case of an update on the source table?
Ans.
The stream records two entries for an update:
a DELETE of the old record and an INSERT of the new record, both with METADATA$ISUPDATE = TRUE.
4. What is the alternative to streams in Snowflake?
Ans.
Enabling change tracking on a table and querying it with the CHANGES clause is an alternative to streams.
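A minimal sketch of this approach (table name `orders` is hypothetical):

```sql
-- Step 1: enable change tracking on the table
ALTER TABLE orders SET CHANGE_TRACKING = TRUE;

-- Step 2: query changes since a point in time using the CHANGES clause
SELECT *
FROM orders
  CHANGES (INFORMATION => DEFAULT)   -- DEFAULT = full delta; APPEND_ONLY = inserts only
  AT (TIMESTAMP => DATEADD('hour', -1, CURRENT_TIMESTAMP()));
```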
5. What are the limitations of Tasks?
Ans.
A task can run only one SQL statement (or a single stored procedure call).
Earlier, we had to specify manually which warehouse a task should use; Snowflake has since introduced serverless tasks (Snowflake-managed compute).
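A sketch of a serverless task (task, table and stream names are hypothetical); note there is no WAREHOUSE clause, so Snowflake manages the compute:

```sql
CREATE OR REPLACE TASK load_orders_task
  SCHEDULE = '5 MINUTE'
  USER_TASK_MANAGED_INITIAL_WAREHOUSE_SIZE = 'XSMALL'  -- serverless sizing hint
AS
  INSERT INTO orders_tgt SELECT * FROM orders_std_stream;  -- the single allowed SQL statement

-- Tasks are created in a suspended state and must be resumed
ALTER TASK load_orders_task RESUME;
```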
6. Does Snowflake enforce constraints?
Ans.
We can create constraints, but they are not enforced, except the NOT NULL constraint.
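This is easy to demonstrate (table name `dept` is hypothetical):

```sql
CREATE OR REPLACE TABLE dept (
  dept_id   INT PRIMARY KEY,
  dept_name STRING NOT NULL
);

INSERT INTO dept VALUES (1, 'HR');
INSERT INTO dept VALUES (1, 'Finance');  -- succeeds: PRIMARY KEY is not enforced
INSERT INTO dept VALUES (2, NULL);       -- fails: NOT NULL is enforced
```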
7. What are the different caches in Snowflake?
Ans.
There are three types of caches in Snowflake, namely:
Result Cache
Metadata Cache
Warehouse Cache
8. If you rerun the same query, where will the query get data from?
Ans.
The result cache is persisted for 24 hours unless the base object changes. If the same query is rerun, the results are retrieved from the result cache.
9. How many days is the result cache valid?
Ans.
The result cache is valid for 24 hours; each time the query is rerun within that window, the validity is extended by another 24 hours, up to a maximum of 31 days.
If the underlying base table data changes, the result cache is invalidated.
10. What is the maximum Time Travel retention period?
Ans.
90 days is the maximum retention period, available in Enterprise edition and above.
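The retention period is set per object via a parameter (table name `orders` is hypothetical):

```sql
-- Values greater than 1 require Enterprise edition or above
ALTER TABLE orders SET DATA_RETENTION_TIME_IN_DAYS = 90;
```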
11. How do we load historical/existing data already in cloud storage into Snowflake using Snowpipe?
Ans.
The ALTER PIPE ... REFRESH command loads files staged within the last 7 days. Use the COPY command to load data older than 7 days (e.g., the last 6 months of data).
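A sketch of both paths (pipe, table, and stage names are hypothetical):

```sql
-- Loads files staged within the last 7 days through the pipe
ALTER PIPE my_pipe REFRESH;

-- For files older than 7 days, run COPY directly against the stage
COPY INTO orders
FROM @my_ext_stage
FILE_FORMAT = (TYPE = 'CSV');
```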
12.Will Snowpipe load duplicate files (same filename) into the table?
Ans.
No. Snowpipe maintains load metadata for 14 days and skips files with the same name during that period.
13.How snowpipe copy command is different from regular copy command?
Ans.
Regular COPY command:
• The default value for ON_ERROR is ABORT_STATEMENT
• It supports FORCE = TRUE
• It stores load metadata for 64 days
Snowpipe COPY command:
• The default value for ON_ERROR is SKIP_FILE
• It does not support FORCE = TRUE
• It stores load metadata for 14 days
14. What is cloning, and what happens when we clone an object?
Can we perform DML operations on cloned tables? How will it impact the original tables?
Ans.
Cloning takes a snapshot of an object without actually copying its data. The original and cloned tables behave as independent tables, and we can perform DML on either without impacting the other.
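A minimal sketch (table names are hypothetical):

```sql
-- Zero-copy clone: shares micro-partitions until either table is modified
CREATE OR REPLACE TABLE orders_clone CLONE orders;

-- DML on the clone does not affect the original table
DELETE FROM orders_clone WHERE order_date < '2020-01-01';
```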
15. For what purpose did you use Cloning?
Ans.
To create Lower environments (DEV, STAGE/QA) from production, we can use cloning.
16. What will happen if we clone a database?
Ans.
All the database objects get cloned, except internal stages and snowpipes that reference internal stages.
17. What is Data Sharing?
Ans.
Secure Data Sharing is a Snowflake feature that enables sharing selected objects (tables, secure views, secure UDFs) in a database from one Snowflake account with another.
18. What are different types of Data shares?
Ans.
Data shares in Snowflake are of two types:
INBOUND SHARE: Appears automatically in the CONSUMER account, from which we can create a shared database.
OUTBOUND SHARE: Created in the PROVIDER account to share data with the consumer.
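A sketch of setting up an outbound share on the provider side (share, database, and account names are hypothetical):

```sql
-- Provider account: create the share and grant access to selected objects
CREATE SHARE sales_share;
GRANT USAGE ON DATABASE sales_db TO SHARE sales_share;
GRANT USAGE ON SCHEMA sales_db.public TO SHARE sales_share;
GRANT SELECT ON TABLE sales_db.public.orders TO SHARE sales_share;
ALTER SHARE sales_share ADD ACCOUNTS = consumer_account;

-- Consumer account: the inbound share is materialized as a database
-- CREATE DATABASE sales_shared FROM SHARE provider_account.sales_share;
```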
19. Can we perform DML operations on objects in Data shared database?
Ans.
No, shared databases are read-only. We cannot perform DML operations on them.
20. What is clustering in Snowflake?
Ans.
Clustering is a method used to improve query performance by avoiding unnecessary scanning of micro-partitions.
21. How can we improve the performance of a query?
Ans.
Increase the warehouse size to improve query performance.
If the table is very large, create a clustering key on the columns used in joins and filter conditions.
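A sketch of defining and checking a clustering key (table and column names are hypothetical):

```sql
-- Cluster on the columns most often used in filters and joins
ALTER TABLE orders CLUSTER BY (order_date, region);

-- Inspect how well the table is clustered on those columns
SELECT SYSTEM$CLUSTERING_INFORMATION('orders', '(order_date, region)');
```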
22. How do we check the status of a snowpipe?
Ans.
The status of a snowpipe can be viewed by running the following command:
SELECT SYSTEM$PIPE_STATUS('<PIPE_NAME>');
23. How to set up auto-refresh for Snowpipe?
Ans.
With AUTO_INGEST = TRUE on the pipe and an SQS/SNS event notification from AWS, the pipe ingests new files automatically.
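A sketch of an auto-ingest pipe (pipe, table, and stage names are hypothetical):

```sql
CREATE OR REPLACE PIPE orders_pipe
  AUTO_INGEST = TRUE
AS
  COPY INTO orders
  FROM @my_ext_stage
  FILE_FORMAT = (TYPE = 'JSON');

-- Use the notification_channel value from SHOW PIPES
-- to configure the S3 event notification (SQS queue ARN)
SHOW PIPES;
```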
24. What major challenges did you face in creating stored procedures in Snowflake?
Ans.
Data type compatibility is a common challenge when creating stored procedures, since some JavaScript data types are not compatible with SQL data types such as TIMESTAMP_NTZ. To handle that, Snowflake has a special JavaScript data type: SfDate.
25. How do we do error handling in a stored procedure?
Ans.
Using a JavaScript try/catch block, we can handle errors in a stored procedure.
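A sketch of try/catch error handling in a JavaScript stored procedure (procedure and table names are hypothetical):

```sql
CREATE OR REPLACE PROCEDURE safe_load()
RETURNS STRING
LANGUAGE JAVASCRIPT
AS
$$
  try {
    snowflake.execute({sqlText: "INSERT INTO orders_tgt SELECT * FROM orders_src"});
    return "Load succeeded";
  } catch (err) {
    // Return (or log to an error table) the failure details instead of aborting
    return "Load failed: " + err.message;
  }
$$;
```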
26. Can we return a table from a Snowflake stored procedure?
Ans.
No, a stored procedure returns a single value (one row, one column). If we have to return a table, we can use a table-valued function (UDTF) instead.
27. How is data stored in Snowflake?
Ans.
Data is stored in Snowflake in a compressed columnar format, in micro-partitions of 50–500 MB each (of uncompressed data).
28. How do we establish a connection between S3 and Snowflake?
Ans.
By creating a storage integration and an external stage, we can integrate Snowflake with AWS S3.
29. How to set up auto-refresh for external tables?
Ans.
Set AUTO_REFRESH = TRUE on the external table and configure an SQS event notification on the AWS S3 bucket.
30. How do we load semi-structured data into Snowflake?
Ans.
Snowflake has a special data type called VARIANT that can hold semi-structured data, and we can flatten nested JSON into relational rows using the LATERAL FLATTEN function.
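A self-contained sketch (table name `raw_json` and the sample JSON are hypothetical):

```sql
CREATE OR REPLACE TABLE raw_json (v VARIANT);

-- PARSE_JSON must be used via INSERT ... SELECT, not VALUES
INSERT INTO raw_json
  SELECT PARSE_JSON('{"id": 1, "items": [{"sku": "A"}, {"sku": "B"}]}');

-- Flatten the nested array into one relational row per element
SELECT r.v:id::INT        AS id,
       f.value:sku::STRING AS sku
FROM raw_json r,
LATERAL FLATTEN(input => r.v:items) f;
```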
31. What is the output of SELECT TRY_TO_NUMBER('10'), TRY_TO_NUMBER('12_A');
Ans.
10, NULL
32. How does a sequence in Snowflake behave in the following?
SELECT SEQ1.NEXTVAL, SEQ1.NEXTVAL;
Ans.
The output of the given statement is 1, 2.
Whereas in most RDBMSs it would be 1, 1.
86. Is TRUNCATE a DDL or DML command in Snowflake?
Ans.
Although TRUNCATE is traditionally considered a DDL command, Snowflake classifies TRUNCATE as a DML command, alongside INSERT, UPDATE, DELETE, and MERGE.
33. Write a query to get the complete name of an employee from the first name, middle name, and last name.
Ans.
SELECT first_name || NVL(' ' || middle_name, '') || ' ' || last_name AS complete_name FROM emp;
(COALESCE alone would return only the first non-NULL of the three columns, not the full name; the NVL handles a NULL middle name.)
34. What is the output of the following query SELECT CONCAT(123,NULL);
Ans.
NULL
35. What is the default value for MAX_CONCURRENCY_LEVEL ?
Ans.
The maximum number of concurrent or parallel statements a warehouse can execute is called MAX_CONCURRENCY_LEVEL.
The default value for MAX_CONCURRENCY_LEVEL is 8
36. What are the different types of connectors supported by Snowflake?
Ans.
Snowflake supports the following connectors and drivers:
• JDBC Driver
• ODBC Driver
• Node.js Driver
• Python Connector
• .NET Driver
• Spark Connector (spark 3.1,3.2,3.3)
• Kafka Connector
37. What is the alternative way of capturing DMLs on any object other than streams?
Ans.
Enabling change tracking can help in capturing DMLs on any object.
38. What are the COPY options which are not supported in Snowpipe?
Ans.
The following options are not supported for Snowpipe:
• ON_ERROR = ABORT_STATEMENT
• FORCE = TRUE | FALSE
• PURGE = TRUE | FALSE
• VALIDATION_MODE = RETURN_ERRORS | RETURN_ALL_ERRORS | RETURN_n_ROWS
• SIZE_LIMIT = num
39. How can you resume a stale pipe?
Ans.
If a snowpipe is paused for more than 14 days, it becomes stale.
A stale snowpipe can be resumed using the following command:
SELECT SYSTEM$PIPE_FORCE_RESUME('<pipe_name>', 'staleness_check_override');
40. How can you delete files explicitly from a stage?
Ans.
Files in internal/external stages can be explicitly deleted as follows:
REMOVE @%table_stage; -- to remove from a table stage
REMOVE @~; -- to remove from a user stage
REMOVE @int_stage; -- to remove from a named internal/external stage
41. How to disable the result cache in Snowflake?
Ans.
The result cache can be disabled in Snowflake as follows:
ALTER SESSION SET USE_CACHED_RESULT = FALSE;
42. What is remote disk spillage? How do you resolve it?
Ans.
If the available memory of a virtual warehouse is not sufficient to hold the data, the warehouse starts spilling data to local disk and, when that is exhausted, to remote storage. The latter is called remote disk spillage.
We can avoid remote disk spillage by resizing the warehouse to a larger size.