Cloud Storage Unload

Creates files in a specified Google Cloud Storage bucket and populates them with data from a Google Cloud Platform table.



Properties

| Property | Setting | Description |
| Name | Text | The descriptive name for the component. |
| Project | Text | The name of the Google Cloud Project the source table exists on. |
| Dataset | Select | Select the table dataset. The special value, [Environment Default], will use the dataset defined in the environment. For more information on Google Cloud Datasets, visit the official documentation. |
| Table | Text | The table or view to unload to a bucket. |
| Destination Path | Text | The URL of the GS bucket to load the data into. |
| Format | Select | CSV; JSON: this requires an additional "JSON Format"; Avro: this requires an additional "Avro Format". |
| Include Header | Select | Defaults to 'Yes'. Include a header line at the top of each file with column names. |
| Compression | Select | Output files can be compressed using GZIP compression if selected. |
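
For readers who want to see how these properties relate to a command-line unload, the following is a minimal sketch only. It assumes the source table lives in BigQuery and that the unload is comparable to a `bq extract` job; the component's actual implementation is not documented here. The function name `build_extract_command` and all project, dataset, table and bucket names are hypothetical.

```python
from urllib.parse import urlparse

# Assumed mapping from the component's Format choices to the values
# accepted by the bq CLI's --destination_format flag.
_FORMATS = {"CSV": "CSV", "JSON": "NEWLINE_DELIMITED_JSON", "Avro": "AVRO"}


def build_extract_command(project, dataset, table, destination_path,
                          fmt="CSV", include_header=True, gzip=False):
    """Return an argv list for `bq extract` that mirrors the properties above.

    Hypothetical helper for illustration; it only builds the command and
    does not run it.
    """
    if fmt not in _FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")

    # Destination Path must be a Cloud Storage URL such as gs://bucket/object.
    parsed = urlparse(destination_path)
    if parsed.scheme != "gs" or not parsed.netloc:
        raise ValueError("destination_path must look like gs://bucket/object")

    # BigQuery extract jobs accept GZIP for CSV and JSON, but not for Avro.
    if gzip and fmt == "Avro":
        raise ValueError("GZIP compression is not available for Avro extracts")

    cmd = ["bq", "extract", f"--destination_format={_FORMATS[fmt]}"]
    if fmt == "CSV":
        # Include Header only has meaning for CSV output.
        cmd.append(f"--print_header={'true' if include_header else 'false'}")
    if gzip:
        cmd.append("--compression=GZIP")
    cmd += [f"{project}:{dataset}.{table}", destination_path]
    return cmd


if __name__ == "__main__":
    print(" ".join(build_extract_command(
        "my-project", "my_dataset", "source_emails",
        "gs://my-bucket/docs_unload", gzip=True)))
```

If the `bq` CLI is installed and authenticated, the returned list could be passed to `subprocess.run(cmd, check=True)`. Building the command as a list rather than a single string avoids shell quoting problems with table or bucket names.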

Example

In this example, we have a table of email data that we wish to back up in a bucket for long-term storage. We also want to create a copy of the table to transform, leaving the original intact. One of the many ways to do this is to unload the table to a bucket using the Cloud Storage Unload component, then reload that data into a new table using the Cloud Storage Load component. The job layout is shown below.

The Cloud Storage Unload component properties are shown below. The component is pointed at the table we want to unload through the 'Table' property, and we choose a 'Google Storage Location URL' in which to place the physical file. At this location, a file with the name given in 'Output Object Name' will be created containing your data. The format, header and compression settings are left to the user's discretion.

This creates the file 'docs_unload' in the storage bucket. This file is then read back in by the Cloud Storage Load component and loaded into the 'docs_email' table created by the Create Table component. Sampling the new 'docs_email' table in a Transformation job confirms that the data has been unloaded and loaded correctly.