Cloud Storage Unload

    Article Summary

    This article is specific to the following platforms: Snowflake and BigQuery.

    Cloud Storage Unload

    The Cloud Storage Unload component writes files from a table into a specified Google Cloud Storage (GCS) bucket.

    For Google BigQuery users: this component cannot unload views. To unload a view, first create a table with that view's metadata using a Create Table component. Next, use a Table Input component to select your view, and connect it to a Table Output component to copy the data into the new table. Finally, use the new table in the Cloud Storage Unload component.

    For Snowflake users: this component can unload views from your Snowflake data warehouse.
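
    The BigQuery view workaround above can also be expressed directly in SQL. The sketch below builds the equivalent statements as Python strings; the dataset, view, table, and bucket names are hypothetical placeholders, and the EXPORT DATA statement is BigQuery's SQL counterpart to the unload step, not the component itself.

```python
# Sketch of the BigQuery view workaround as SQL. All object names passed in
# below are hypothetical placeholders, not real datasets or buckets.

def view_workaround_sql(dataset: str, view: str, table: str, gcs_uri: str) -> list[str]:
    """Return the SQL steps for materialising a view and then unloading it."""
    return [
        # Step 1: copy the view's rows into a real table (views cannot be unloaded).
        f"CREATE TABLE {dataset}.{table} AS SELECT * FROM {dataset}.{view};",
        # Step 2: unload the new table to GCS, mirroring the Cloud Storage Unload step.
        (
            f"EXPORT DATA OPTIONS(uri='{gcs_uri}', format='CSV', "
            f"overwrite=true, header=true) AS SELECT * FROM {dataset}.{table};"
        ),
    ]

for statement in view_workaround_sql("mydataset", "myview", "myview_copy",
                                     "gs://my-bucket/unload/file_*.csv"):
    print(statement)
```

    Note that BigQuery's EXPORT DATA requires a wildcard (`*`) in the URI when the result may span multiple files.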


    Properties

    Snowflake Properties

    Property | Setting | Description
    Name | String | A human-readable name for the component.
    Stage | Select | Select your stage. To learn more about stages in Matillion ETL, read Manage Stages.
    Google Storage URL Location | GCS Bucket | Input, or select via the file path tree, the URL of the Google Cloud Storage bucket to which files will be written. This follows a format such as gs://<bucket-name>/<path>.
    When a user enters a forward slash character (/) after a folder name, validation of the file path is triggered. This works in the same manner as the Go button.
    File Prefix | String | A prefix added to files during the unload. The default prefix is "data".
    Storage Integration | Select | Select the storage integration. Storage integrations are required to permit Snowflake to read data from and write to a cloud storage location. Integrations must be set up in advance of selecting them in Matillion ETL. To learn more about setting up a storage integration, read Storage Integration Setup Guide.
    Note: Storage integrations can be configured to support Amazon S3, Google Cloud Storage, or Microsoft Azure cloud storage regardless of the cloud provider that hosts your Snowflake account.
    Warehouse | Select | A Snowflake warehouse to perform the data unload. The special value, [Environment Default], uses the warehouse specified in the Matillion ETL environment. For more information about warehouses, please refer to the Snowflake documentation.
    Database | Select | A Snowflake database. The special value, [Environment Default], uses the database specified in the Matillion ETL environment. For more information about databases, please refer to the Snowflake documentation.
    Schema | Select | A Snowflake schema. A Snowflake schema is a logical grouping of database objects such as tables or views. The special value, [Environment Default], uses the schema specified in the Matillion ETL environment.
    Target Table | Select | Select the table from which data will be unloaded into the GCS bucket.
    Warning: Upon each run, this table will be recreated, dropping any existing table of the same name.
    Format | Select | A named file format that describes a set of staged data to access or load into Snowflake tables. The default is [Custom].
    File Type | Select | The file type for the file format. Available file types are: CSV, JSON, and PARQUET.
    Note: Component properties beneath this property will change depending on the chosen file type. Matillion ETL provides settings based on the specific file type.
    For additional information on file type options, please refer to the Snowflake documentation.
    Compression | Select | Select the compression method if you wish to compress your data. If you do not wish to compress at all, select NONE. The default setting is AUTO. For more information, please refer to the relevant Snowflake documentation for your file type.
    Nest Columns | Boolean | (JSON only) When "True", columns are nested. Default setting is "False".
    Record Delimiter | String | (CSV only) Input a delimiter for records. This can be one or more single-byte or multi-byte characters that separate records in a file.
    Notes:
    • Accepted characters include common escape sequences, octal values (prefixed by \\), or hex values (prefixed by 0x). Also accepts a value of NONE.
    • This delimiter is limited to a maximum of 20 characters.
    • While multi-character delimiters are supported, the record delimiter cannot be a substring of the field delimiter, and vice versa. For example, if the record delimiter is "aa", the field delimiter cannot be "aabb".
    • The specified delimiter must be a valid UTF-8 character and not a random sequence of bytes.
    • Do not specify characters used for other file type options such as Escape or Escape Unenclosed Field.
    • The default setting is the new line character.
    Field Delimiter | String | (CSV only) Input a delimiter for fields. This can be one or more single-byte or multi-byte characters that separate fields in a file.
    Notes:
    • Accepted characters include common escape sequences, octal values (prefixed by \\), or hex values (prefixed by 0x). Also accepts a value of NONE.
    • This delimiter is limited to a maximum of 20 characters.
    • While multi-character delimiters are supported, the field delimiter cannot be a substring of the record delimiter, and vice versa. For example, if the field delimiter is "aa", the record delimiter cannot be "aabb".
    • The specified delimiter must be a valid UTF-8 character and not a random sequence of bytes.
    • Do not specify characters used for other file type options such as Escape or Escape Unenclosed Field.
    • The default setting is a comma (,).
    Date Format | String | (CSV only) Specify the date format. Snowflake accepts dates in the most common forms, including YYYY-MM-DD and DD-MON-YYYY, among others. Additionally, all accepted timestamps are valid inputs for dates. Please refer to the Snowflake documentation for more information about date and time formats.
    Time Format | String | (CSV only) Specify the time format. Snowflake supports a single TIME data type for storing times in the form of HH:MI:SS. TIME supports an optional precision parameter for fractional seconds, e.g. TIME(3). Time precision ranges from 0 (seconds) to 9 (nanoseconds). The default precision is 9. Please refer to the Snowflake documentation for more information about time and date formats.
    Timestamp Format | String | (CSV only) Specify the timestamp format. Snowflake supports three variations of timestamp: TIMESTAMP_LTZ, TIMESTAMP_NTZ, and TIMESTAMP_TZ. To learn more, please consult the Snowflake documentation.
    Escape | Select | (CSV only) Specify a single character to be used as the escape character for field values that are enclosed. Default is NONE.
    Escape Unenclosed Field | String | (CSV only) Specify a single character to be used as the escape character for unenclosed field values only. Accepts common escape sequences, octal values, or hex values. Also accepts a value of NONE. The default is a backslash (\\).
    If a character is specified in the "Escape" field, it will override this field.
    If you have set a value in the property Field Optionally Enclosed, all fields will become enclosed, rendering the Escape Unenclosed Field property redundant; in that case, it is ignored.
    Field Optionally Enclosed | String | (CSV only) Specify a character used to enclose strings. The value can be NONE, a single quote character ('), or a double quote character ("). To use the single quote character, use the octal or hex representation (0x27) or the double single-quoted escape (''). Default is NONE.
    Note: When a field contains one of these characters, escape the field using the same character. For example, to escape a string like 1 "2" 3, use double quotation marks: 1 ""2"" 3.
    Null If | String | Specify a string to convert to SQL NULL values. When unloading, if a SQL NULL value is found, it is replaced with the first value listed in this field. For unloads, only the first string that you specify in this property is used.
    Trim Space | Boolean | When "True", removes white space from fields. Default setting is "False".
    Overwrite | Boolean | When "True", overwrites existing files with matching names in the location where files are stored. The option does not remove any existing files that do not match the names of the files that the COPY command unloads. Default setting is "False".
    Single File | Boolean | When "True", the unload runs in serial rather than parallel. This results in a slower unload, but a single, complete file. The default setting is "False".
    When "True", no file extension is used in the output filename (regardless of the file type, and regardless of whether or not the file is compressed).
    When "False", a filename prefix must be included in the path.
    Max File Size | Number | A number, greater than 0, that specifies the upper size limit (in bytes) of each file generated in parallel per thread.
    Note: the actual file size and number of files unloaded are determined by the total amount of data and the number of nodes available for parallel processing.
    Include Headers | Boolean | When "True", header rows are included. Default setting is "False".
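
    Under the hood, the Snowflake properties above correspond to a COPY INTO <location> statement. The sketch below assembles such a statement from a subset of the properties; the stage, prefix, and table names are hypothetical, and many options (storage integration, named formats, date/time formats) are omitted for brevity.

```python
# Sketch: how a few of the Snowflake properties map onto a COPY INTO <location>
# statement. Stage, prefix, and table names are hypothetical placeholders.

def copy_into_sql(stage: str, prefix: str, table: str, *,
                  file_type: str = "CSV", compression: str = "AUTO",
                  field_delimiter: str = ",", overwrite: bool = False,
                  single: bool = False, max_file_size: int = 16777216,
                  header: bool = False) -> str:
    """Build an unload statement mirroring the component's properties."""
    file_format = f"TYPE = {file_type} COMPRESSION = {compression}"
    if file_type == "CSV":
        # Field Delimiter applies to CSV output only.
        file_format += f" FIELD_DELIMITER = '{field_delimiter}'"
    return (f"COPY INTO @{stage}/{prefix} FROM {table} "
            f"FILE_FORMAT = ({file_format}) "
            f"OVERWRITE = {str(overwrite).upper()} "
            f"SINGLE = {str(single).upper()} "
            f"MAX_FILE_SIZE = {max_file_size} "
            f"HEADER = {str(header).upper()};")

print(copy_into_sql("my_stage", "data", "my_table", overwrite=True, header=True))
```

    The OVERWRITE, SINGLE, MAX_FILE_SIZE, and HEADER options here correspond directly to the Overwrite, Single File, Max File Size, and Include Headers properties.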

    BigQuery Properties

    Property | Setting | Description
    Name | String | A human-readable name for the component.
    Project | Select | Select the BigQuery project. [Environment Default] is set as the default and uses the project defined in the environment.
    Dataset | Select | Select the BigQuery dataset. [Environment Default] is set as the default and uses the dataset defined in the environment. For more information on Google Cloud Datasets, visit the official documentation.
    Table | Select | Select the table from which data will be unloaded to the GCS bucket.
    Google Storage URL Location | Filepath/Select | Select the Google Cloud Storage bucket. Users can click through the file tree, or use the URL template: gs://<bucket>/<path>.
    When a user enters a forward slash character (/) after a folder name, validation of the file path is triggered. This works in the same manner as the Go button.
    Output Object Name | String | Specify a name for the output object (the object that will be created in the chosen GCS bucket).
    Format | Select | Select the format of the data. Users can select one of: AVRO, CSV, JSON (New line delimited).
    Include Header | Yes/No | (CSV format only) Select "Yes" to add a header line containing column names to the top of each file. The default setting is "Yes".
    Compression | Select | (AVRO format only) Select the AVRO file format compression type. Options include: Deflate, Snappy, or no compression (None).
    (CSV, JSON formats only) Select whether or not output files are to be compressed via gzip compression.
    Delimiter | Delimiting Character | (CSV format only) Specify a delimiter character to separate columns. The default value is a comma (,). A [TAB] character can be specified as "\t".
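
    Both property tables require a Google Storage URL of the form gs://<bucket>/<path>. As a quick illustration of that format, the helper below (not part of Matillion ETL) splits such a URL into its bucket and path parts and rejects malformed values.

```python
# Illustrative helper (not part of Matillion ETL): split a gs://<bucket>/<path>
# URL into its bucket and path components, rejecting malformed values.

def parse_gs_url(url: str) -> tuple[str, str]:
    """Return (bucket, path) for a gs:// URL; raise ValueError if malformed."""
    if not url.startswith("gs://"):
        raise ValueError(f"not a gs:// URL: {url!r}")
    bucket, _, path = url[len("gs://"):].partition("/")
    if not bucket:
        raise ValueError(f"missing bucket name: {url!r}")
    return bucket, path

print(parse_gs_url("gs://my-bucket/unload/2024/"))  # ('my-bucket', 'unload/2024/')
```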