Cloud Storage Unload

    Article Summary

    This article is specific to the following platforms: Snowflake and BigQuery.

    Cloud Storage Unload

    The Cloud Storage Unload component writes files from a table into a specified Google Cloud Storage (GCS) bucket.

    For Google BigQuery users: this component cannot unload views. To unload a view, first create a table with that view's metadata using a Create Table component. Next, use a Table Input component to select your view, and connect it to a Table Output component to copy the data into the new table. Finally, use the new table in the Cloud Storage Unload component.

    For Snowflake users: this component can unload views from your Snowflake data warehouse.
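
    The BigQuery view workaround above can also be expressed directly in SQL. The sketch below builds the equivalent statements as Python strings; the dataset, view, table, and bucket names are hypothetical placeholders, and the EXPORT DATA statement is BigQuery's SQL counterpart to the unload step, not the component itself.

```python
# Sketch of the BigQuery view workaround as SQL. All object names passed in
# below are hypothetical placeholders, not real datasets or buckets.

def view_workaround_sql(dataset: str, view: str, table: str, gcs_uri: str) -> list[str]:
    """Return the SQL steps for materialising a view and then unloading it."""
    return [
        # Step 1: copy the view's rows into a real table (views cannot be unloaded).
        f"CREATE TABLE {dataset}.{table} AS SELECT * FROM {dataset}.{view};",
        # Step 2: unload the new table to GCS, mirroring the Cloud Storage Unload step.
        (
            f"EXPORT DATA OPTIONS(uri='{gcs_uri}', format='CSV', "
            f"overwrite=true, header=true) AS SELECT * FROM {dataset}.{table};"
        ),
    ]

for statement in view_workaround_sql("mydataset", "myview", "myview_copy",
                                     "gs://my-bucket/unload/file_*.csv"):
    print(statement)
```

    Note that BigQuery's EXPORT DATA requires a wildcard (`*`) in the URI when the result may span multiple files.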


    Properties

    Snowflake Properties

    Property | Setting | Description
    Name | String | A human-readable name for the component.
    Stage | Select | Select your stage. To learn more about stages in Matillion ETL, read Manage Stages.
    Google Storage URL Location | GCS Bucket | Input, or select via the file path tree, the URL of the Google Cloud Storage bucket to which files will be written. This follows a format such as gs://<bucket-name>/<path>.
    When a user enters a forward slash character (/) after a folder name, validation of the file path is triggered. This works in the same manner as the Go button.
    File Prefix | String | A prefix added to files during the unload. The default prefix is "data".
    Storage Integration | Select | Select the storage integration. Storage integrations are required to permit Snowflake to read data from and write to a cloud storage location. Integrations must be set up in advance of selecting them in Matillion ETL. To learn more about setting up a storage integration, read Storage Integration Setup Guide.
    Note: Storage integrations can be configured to support Amazon S3, Google Cloud Storage, or Microsoft Azure cloud storage regardless of the cloud provider that hosts your Snowflake account.
    Warehouse | Select | A Snowflake warehouse to perform the data unload. The special value, [Environment Default], uses the warehouse specified in the Matillion ETL environment. For more information about warehouses, please refer to the Snowflake documentation.
    Database | Select | A Snowflake database. The special value, [Environment Default], uses the database specified in the Matillion ETL environment. For more information about databases, please refer to the Snowflake documentation.
    Schema | Select | A Snowflake schema. A Snowflake schema is a logical grouping of database objects such as tables or views. The special value, [Environment Default], uses the schema specified in the Matillion ETL environment.
    Target Table | Select | Select the table from which data will be unloaded into the GCS bucket.
    Warning: Upon each run, this table will be recreated, dropping any existing table of the same name.
    Format | Select | A named file format that describes a set of staged data to access or load into Snowflake tables. The default is [Custom].
    File Type | Select | The file type for the file format. Available file types are: CSV, JSON, and PARQUET.
    Note: Component properties beneath this property will change depending on the chosen file type. Matillion ETL provides settings based on the specific file type.
    For additional information on file type options, please refer to the Snowflake documentation.
    Compression | Select | Select the compression method if you wish to compress your data. If you do not wish to compress at all, select NONE. The default setting is AUTO. For more information, please refer to the relevant Snowflake documentation for your file type.
    Nest Columns | Boolean | (JSON only) When "True", columns are nested. Default setting is "False".
    Record Delimiter | String | (CSV only) Input a delimiter for records. This can be one or more single-byte or multi-byte characters that separate records in a file.
    Notes:
    • Accepted characters include common escape sequences, octal values (prefixed by \\), or hex values (prefixed by 0x). Also accepts a value of NONE.
    • This delimiter is limited to a maximum of 20 characters.
    • While multi-character delimiters are supported, the record delimiter cannot be a substring of the field delimiter, and vice versa. For example, if the record delimiter is "aa", the field delimiter cannot be "aabb".
    • The specified delimiter must be a valid UTF-8 character and not a random sequence of bytes.
    • Do not specify characters used for other file type options such as Escape or Escape Unenclosed Field.
    • The default setting is the new line character.
    Field Delimiter | String | (CSV only) Input a delimiter for fields. This can be one or more single-byte or multi-byte characters that separate fields in a file.
    Notes:
    • Accepted characters include common escape sequences, octal values (prefixed by \\), or hex values (prefixed by 0x). Also accepts a value of NONE.
    • This delimiter is limited to a maximum of 20 characters.
    • While multi-character delimiters are supported, the field delimiter cannot be a substring of the record delimiter, and vice versa. For example, if the field delimiter is "aa", the record delimiter cannot be "aabb".
    • The specified delimiter must be a valid UTF-8 character and not a random sequence of bytes.
    • Do not specify characters used for other file type options such as Escape or Escape Unenclosed Field.
    • The default setting is a comma (,).
    Date Format | String | (CSV only) Specify the date format. Snowflake accepts dates in the most common forms, including YYYY-MM-DD and DD-MON-YYYY, among others. Additionally, all accepted timestamps are valid inputs for dates. Please refer to the Snowflake documentation for more information about date and time formats.
    Time Format | String | (CSV only) Specify the time format. Snowflake supports a single TIME data type for storing times in the form of HH:MI:SS. TIME supports an optional precision parameter for fractional seconds, e.g. TIME(3). Time precision ranges from 0 (seconds) to 9 (nanoseconds). The default precision is 9. Please refer to the Snowflake documentation for more information about time and date formats.
    Timestamp Format | String | (CSV only) Specify the timestamp format. Snowflake supports three variations of timestamp: TIMESTAMP_LTZ, TIMESTAMP_NTZ, and TIMESTAMP_TZ. To learn more, please consult the Snowflake documentation.
    Escape | Select | (CSV only) Specify a single character to be used as the escape character for field values that are enclosed. Default is NONE.
    Escape Unenclosed Field | String | (CSV only) Specify a single character to be used as the escape character for unenclosed field values only. Accepts common escape sequences, octal values, or hex values. Also accepts a value of NONE. The default is a backslash (\\).
    If a character is specified in the "Escape" field, it will override this field.
    If you have set a value in the property Field Optionally Enclosed, all fields will become enclosed, rendering the Escape Unenclosed Field property redundant; in that case, it is ignored.
    Field Optionally Enclosed | String | (CSV only) Specify a character used to enclose strings. The value can be NONE, a single quote character ('), or a double quote character ("). To use the single quote character, use the octal or hex representation (0x27) or the double single-quoted escape (''). Default is NONE.
    Note: When a field contains one of these characters, escape the field using the same character. For example, to escape a string like 1 "2" 3, use double quotation marks: 1 ""2"" 3.
    Null If | String | Specify a string to convert to SQL NULL values. When unloading, if a SQL NULL value is found, it is replaced with the first value listed in this field. For unloads, only the first string that you specify in this property is used.
    Trim Space | Boolean | When "True", removes white space from fields. Default setting is "False".
    Overwrite | Boolean | When "True", overwrites existing files with matching names in the location where files are stored. The option does not remove any existing files that do not match the names of the files that the COPY command unloads. Default setting is "False".
    Single File | Boolean | When "True", the unload runs in serial rather than parallel. This results in a slower unload, but a single, complete file. The default setting is "False".
    When "True", no file extension is used in the output filename (regardless of the file type, and regardless of whether or not the file is compressed).
    When "False", a filename prefix must be included in the path.
    Max File Size | Number | A number, greater than 0, that specifies the upper size limit (in bytes) of each file generated in parallel per thread.
    Note: the actual file size and number of files unloaded are determined by the total amount of data and the number of nodes available for parallel processing.
    Include Headers | Boolean | When "True", header rows are included. Default setting is "False".
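
    Under the hood, the Snowflake properties above correspond to a COPY INTO <location> statement. The sketch below assembles such a statement from a subset of the properties; the stage, prefix, and table names are hypothetical, and many options (storage integration, named formats, date/time formats) are omitted for brevity.

```python
# Sketch: how a few of the Snowflake properties map onto a COPY INTO <location>
# statement. Stage, prefix, and table names are hypothetical placeholders.

def copy_into_sql(stage: str, prefix: str, table: str, *,
                  file_type: str = "CSV", compression: str = "AUTO",
                  field_delimiter: str = ",", overwrite: bool = False,
                  single: bool = False, max_file_size: int = 16777216,
                  header: bool = False) -> str:
    """Build an unload statement mirroring the component's properties."""
    file_format = f"TYPE = {file_type} COMPRESSION = {compression}"
    if file_type == "CSV":
        # Field Delimiter applies to CSV output only.
        file_format += f" FIELD_DELIMITER = '{field_delimiter}'"
    return (f"COPY INTO @{stage}/{prefix} FROM {table} "
            f"FILE_FORMAT = ({file_format}) "
            f"OVERWRITE = {str(overwrite).upper()} "
            f"SINGLE = {str(single).upper()} "
            f"MAX_FILE_SIZE = {max_file_size} "
            f"HEADER = {str(header).upper()};")

print(copy_into_sql("my_stage", "data", "my_table", overwrite=True, header=True))
```

    The OVERWRITE, SINGLE, MAX_FILE_SIZE, and HEADER options here correspond directly to the Overwrite, Single File, Max File Size, and Include Headers properties.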

    BigQuery Properties

    Property | Setting | Description
    Name | String | A human-readable name for the component.
    Project | Select | Select the BigQuery project. [Environment Default] is set as the default and uses the project defined in the environment.
    Dataset | Select | Select the BigQuery dataset. [Environment Default] is set as the default and uses the dataset defined in the environment. For more information on Google Cloud Datasets, visit the official documentation.
    Table | Select | Select the table from which data will be unloaded to the GCS bucket.
    Google Storage URL Location | Filepath/Select | Select the Google Cloud Storage bucket. Users can click through the file tree, or use the URL template: gs://<bucket>/<path>.
    When a user enters a forward slash character (/) after a folder name, validation of the file path is triggered. This works in the same manner as the Go button.
    Output Object Name | String | Specify a name for the output object (the object that will be created in the chosen GCS bucket).
    Format | Select | Select the format of the data. Users can select one of: AVRO, CSV, JSON (New line delimited).
    Include Header | Yes/No | (CSV format only) Select "Yes" to add a header line containing column names to the top of each file. The default setting is "Yes".
    Compression | Select | (AVRO format only) Select the AVRO file format compression type. Options include: Deflate, Snappy, or no compression (None).
    (CSV, JSON formats only) Select whether or not output files are to be compressed via gzip compression.
    Delimiter | Delimiting Character | (CSV format only) Specify a delimiter character to separate columns. The default value is a comma (,). A [TAB] character can be specified as "\t".
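
    Both property tables require a Google Storage URL of the form gs://<bucket>/<path>. As a quick illustration of that format, the helper below (not part of Matillion ETL) splits such a URL into its bucket and path parts and rejects malformed values.

```python
# Illustrative helper (not part of Matillion ETL): split a gs://<bucket>/<path>
# URL into its bucket and path components, rejecting malformed values.

def parse_gs_url(url: str) -> tuple[str, str]:
    """Return (bucket, path) for a gs:// URL; raise ValueError if malformed."""
    if not url.startswith("gs://"):
        raise ValueError(f"not a gs:// URL: {url!r}")
    bucket, _, path = url[len("gs://"):].partition("/")
    if not bucket:
        raise ValueError(f"missing bucket name: {url!r}")
    return bucket, path

print(parse_gs_url("gs://my-bucket/unload/2024/"))  # ('my-bucket', 'unload/2024/')
```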