Ingest Excel files to Snowflake

Currently there is no Excel file format in Snowflake. Although Excel file cannot be natively loaded to Snowflake from Azure Blob Storage using copy into or snowpipe, it can be ingested using Snowpark with Python using pandas library.

Advantages of using Snowpark with Python:

✅ Ingesting files not supported by native Snowflake file formats
✅ Auto-Creating Table
✅ Any custom formatting supported by Python is available

Stored Procedure definition

--Create procedure
CREATE OR REPLACE PROCEDURE load_excel_file(INPUT_EXCEL_NAME STRING, INPUT_DESTINATION_TABLE STRING)
returns string not null
language python
runtime_version = '3.8'
packages = ('snowflake-snowpark-python', 'pandas', 'openpyxl') -- openpyxl required for pandas to read xlsx
imports = ('@AZURE_BLOB_STORAGE/EXCEL_FILE1.xlsx', '@AZURE_BLOB_STORAGE/EXCEL_FILE2.xlsx')
handler = 'load_excel_file_py'
as
$$

# Import the required modules
import pandas
import sys

# Retrieve the Snowflake import directory
IMPORT_DIRECTORY_NAME = "snowflake_import_directory"
import_dir = sys._xoptions[IMPORT_DIRECTORY_NAME]

# Define main function which load data
def load_excel_file_py(snowpark_session, excel_name, destination_table: str):
# Read excel using Pandas
excel_df = pandas.read_excel(import_dir + excel_name, destination_table)
# Replace headers (not supported characters)
excel_df.columns = excel_df.columns.str.lower().str.replace("#", "").str.replace("(", "").str.replace(")", "").str.replace("-", "_").str.replace("&", "_").str.replace(" ", "_").str.replace("/", "_")
# Convert columns to string 
excel_df = excel_df.astype(str)
# Write the results of the dataframe into a target table
snowpark_session.write_pandas(excel_df, destination_table, quote_identifiers= False, auto_create_table= True, overwrite=True)
# Return Success Message
return f"Succeeded: Results inserted into table {destination_table}"

$$
;

Execute Stored Procedure

Stored Procedured can be triggered either by business user or ETL processing. If Excel file is modified in Azure Blob Storage, executing procedure will re-load latest file version. Stored Procedure doesn’t need to be re-created. If there is new excel file to be loaded, it should be added first in import section of Stored Procedure.

call load_excel_file('EXCEL_FILE1.xlsx','Worksheet1');