Create Table

    Create Table Component

    Create or replace a table.

    Matillion ETL for Azure Synapse: when using the FLOAT data type, users can enter any size value from 1 to 2,147,483,647 inclusive. If a FLOAT with a size less than or equal to 7 is used, a column of type REAL will be produced in the database; note that this column will still appear as a FLOAT type within Matillion ETL, simply with a smaller size. Conversely, if a FLOAT with a size greater than or equal to 8 is used, the traditional FLOAT type is used for the column both in the database and in Matillion ETL. For more information, please refer to float and real (Transact-SQL).
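    This size threshold mirrors standard Transact-SQL behaviour, where REAL holds roughly 7 decimal digits of precision and the 8-byte FLOAT roughly 15. A minimal sketch of the underlying T-SQL (hypothetical table and column names):

    ```sql
    -- In Transact-SQL, FLOAT(24) is a synonym for REAL (~7 decimal digits),
    -- while FLOAT(53) is the full 8-byte FLOAT (~15 decimal digits).
    CREATE TABLE dbo.float_demo (
        small_float FLOAT(24),  -- stored as REAL
        large_float FLOAT(53)   -- stored as FLOAT
    );
    ```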

    Matillion ETL for Delta Lake on Databricks: users may experience casting errors if using binary values.

    Warning: Depending on the chosen settings, this component is destructive. Take care when running this component as it may remove existing data.


    Properties

    Snowflake Properties

    Property | Setting | Description
    NameStringA human-readable name for the component.
    Create/ReplaceSelectCreate: (default option for this property) this option creates a new table. Creating a new table will generate an error if a table with the same name already exists. However, existing data will not be destroyed.
    Create if not exists: this option will only create a new table when a table of the same name does not already exist. This option will not destroy existing data. If the schema of the existing table does not match the schema defined in this component, no attempt is made to fix or correct the schema, which could lead to errors later on in the ETL job if:
    1. Users did not expect a table to already exist.
    2. Users did not expect to have a different schema to the one defined in this component.
    Replace: this option drops any existing table of the same name and then creates a new table. Selecting this option guarantees that after the component succeeds, the table matches the schema defined in this component. However, any data from the existing table (to be replaced) will be lost. Note: Since other database objects might depend upon this table,
    drop ... cascade
    is used, which may remove many other database objects.
    DatabaseSelectChoose the database that the newly created table will belong to.
    SchemaSelectSelect the table schema. The special value, [Environment Default], will use the schema defined in the environment. For more information on using multiple schemas, see Schema Support.
    New Table NameTextThe name of the table to create or replace. Note: This field is case-sensitive by default, since Matillion ETL uses quoted identifiers. To change this behaviour, please consult the Snowflake documentation on case sensitivity.
    Table TypeSelectThe type of Snowflake table to create:
    Permanent: a table that holds data indefinitely and that can be restored using Snowflake's Time Travel.
    Temporary: a table that is automatically destroyed at the end of the Snowflake session.
    Transient: a table that holds data indefinitely, but that cannot be restored.
    ColumnsColumn NameThe name of each new column.
    Data TypeVARCHAR: this type can hold any kind of data, subject to a maximum size. More...
    NUMBER: this type is suitable for whole-number types (no decimals). More...
    Note: You can enter AUTOINCREMENT or IDENTITY as the default value for any numeric-type column and the column will automatically give an incremented value on each new row starting from 1. This feature is useful when creating a unique key on the table.
    FLOAT: this type is suitable for numeric types, with or without decimals. More...
    BOOLEAN: this type is suitable for data that is either 'true' or 'false'. More...
    DATE: this type is suitable for dates without times. More...
    TIMESTAMP: this type is suitable for timestamps. More...
    TIME: this type is suitable for times. More...
    VARIANT: a flexible type that can be used for any purpose. More...
    SizeFor Text types, this is the maximum length. This is a limit on the number of bytes, not characters. Generally speaking, when data is stored using UTF-8, any non-ASCII character will count as 2 or more bytes.
    PrecisionRelevant only for Numeric types, this is the maximum number of digits that may appear to the right of the decimal point.
    Default ValueThe default value for this column, applied to any row that does not specify one.
    Not NullTrue if this column does not accept null values.
    UniqueMark this column as unique, declaring that all values in it are unique with respect to one another. You may have multiple unique columns per table.
    CommentA location to store descriptive VARCHAR comments against columns. Used to contextualize the content being stored in tables in your database.
    Default DDL CollationStringSet the default DDL collation. Setting this parameter forces all subsequently created columns in the affected table to have the specified collation as the default, unless the collation for the column is explicitly defined in the DDL. For more information, refer to the Snowflake documentation.
    Primary KeysSelectDeclare one column to be a primary key. Primary keys are, by default, unique. Users can only have one primary key per table.
    Clustering KeysSelectSpecify clustering key(s) to use. Defining clustering keys can allow for more efficient sorting and scanning of large tables. More...
    Data Retention Time in DaysIntegerSet a number of days for which data is retained after deletion. More...
    CommentTextAttach a comment to the table. More...
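    Taken together, the properties above correspond to a single Snowflake CREATE TABLE statement. The following is an illustrative sketch only (all table and column names are hypothetical, not the component's exact output); note the quoted identifiers, which are why table names are case-sensitive by default:

    ```sql
    CREATE OR REPLACE TRANSIENT TABLE "analytics"."public"."orders" (
        "order_id"   NUMBER(38,0) AUTOINCREMENT,   -- increments from 1 on each new row
        "customer"   VARCHAR(255) NOT NULL,
        "amount"     FLOAT,
        "created_at" TIMESTAMP,
        PRIMARY KEY ("order_id")
    )
    CLUSTER BY ("created_at")
    DATA_RETENTION_TIME_IN_DAYS = 1
    COMMENT = 'Orders created by the nightly load';
    ```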

    Redshift Properties

    Property | Setting | Description
    NameStringA human-readable name for the component.
    SchemaSelectSelect the table schema. The special value, [Environment Default], will use the schema defined in the environment. For more information on using multiple schemas, see Schema Support.
    New Table NameTextThe name of the table to create or replace.
    Create/ReplaceSelectCreate: (default option for this property) this option creates a new table. Creating a new table will generate an error if a table with the same name already exists. However, existing data will not be destroyed.
    Create if not exists: this option will only create a new table when a table of the same name does not already exist. This option will not destroy existing data. If the schema of the existing table does not match the schema defined in this component, no attempt is made to fix or correct the schema, which could lead to errors later on in the ETL job if:
    1. Users did not expect a table to already exist.
    2. Users did not expect to have a different schema to the one defined in this component.
    Replace: this option drops any existing table of the same name and then creates a new table. Selecting this option guarantees that after the component succeeds, the table matches the schema defined in this component. However, any data from the existing table (to be replaced) will be lost.
    Table MetadataColumn NameThe name of the new column.
    Data TypeSelect the data type. For more information, read Data types.
    Text: this type can hold any kind of data, subject to a maximum size. More...
    Integer: this type is suitable for whole-number types (no decimals). Selecting the integer type with Size 1-4 activates the Redshift type SMALLINT. Size 5-9 activates the Redshift type INTEGER. Size 10 or greater activates the Redshift type BIGINT. More...
    Numeric: this type is suitable for numeric types, with or without decimals. More...
    Real: this type is suitable for data of a single precision floating-point number. More...
    Double Precision: this type is suitable for data of a double precision floating-point number. More...
    Boolean: this type is suitable for data that is either 'true' or 'false'. More...
    Date: this type is suitable for dates without times. More...
    DateTime: this type is suitable for dates, times, or timestamps (both date and time). More...
    SUPER: use the SUPER data type to store semi-structured data or documents as values. More...
    SizeFor Text types, this is the maximum length. This is a limit on the number of bytes, not characters. With Redshift, since all data is stored using UTF-8, any non-ASCII character will count as 2 or more bytes.
    For Numeric types, this is the total number of digits allowed, whether before or after the decimal point.
    Decimal PlacesRelevant only for 'Numeric' data, it is the maximum number of digits that may appear to the right of the decimal point.
    Encoding TypeThe Redshift compression encoding. See the Redshift documentation for details of the various available encodings.
    Note: Although users can manually specify the encoding type here, it is advised to use automatic compression analysis to select the optimal compression. When loading into empty tables, this is performed by the S3 Load (unless you disable COMPUPDATE). It can also be performed by the Table Output component, provided you are truncating the table.
    Allow NullableWhen 'True', Matillion ETL specifies that the column accepts null values. When 'False', Matillion ETL specifies that the column is not allowed to contain null values. Default is 'True'.
    CommentA location to store descriptive text comments against columns. Used to contextualize the content being stored in tables in your database.
    Distribution StyleSelectAll: copy rows to all nodes in the Redshift Cluster.
    Auto: Redshift assigns an optimal distribution style based on the size of the table data.
    Even: the leader node distributes the rows across the slices in a round-robin fashion, regardless of the values in any particular column.
    Key: the rows are distributed according to the values in one column. The leader node places matching values on the same node slice. If you distribute a pair of tables on the joining keys, the leader node collocates the rows on the slices according to the values in the joining columns, so that matching values from the common columns are physically stored together.
    For more information, please refer to the Redshift documentation.
    Distribution KeySelectThis property is only activated if the Distribution Style is set to 'Key'. Users must select which column is used to determine which cluster node the row is stored on.
    Sort KeySelectThis setting is optional, and allows users to specify the columns from the input that should be set as the table's sort key. Sort keys are critical for efficient performance. For more information, we advise users to refer to the Redshift documentation.
    Sort Key OptionsSelectChoose whether the sort key is of a 'compound' or 'interleaved' variety. For more information, please refer to the Redshift documentation.
    Primary KeySelectThis setting is optional, and specifies the column from the input that should be set as the table's primary key.
    Identity ColumnsColumn NameThe name of the column. It must match the name of a column defined in the Table Metadata property, which will be set as an identity column. An identity column contains automatically generated values. The specified column must be of Data Type 'Integer' or 'Numeric' with zero (0) decimal places.
    SeedSpecifies the starting value.
    StepSpecifies the increment between values.
    Backup TableSelectSpecify whether the created table is to be included in automated and manual cluster snapshots. 'No' has no effect on automatic replication of data to other nodes within the cluster, meaning that tables set to 'No' in this property are still restored in the event of a node failure. The default setting is 'Yes'.
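    A hedged sketch of Redshift DDL combining several of these properties, including an identity column with seed 1 and step 1 (table, column, and encoding choices are hypothetical, not the component's exact output):

    ```sql
    CREATE TABLE IF NOT EXISTS "public"."page_views" (
        "view_id"   BIGINT IDENTITY(1, 1),          -- seed 1, step 1
        "user_id"   INTEGER NOT NULL ENCODE az64,
        "url"       VARCHAR(512) ENCODE lzo,
        "viewed_at" TIMESTAMP,
        PRIMARY KEY ("view_id")
    )
    BACKUP NO
    DISTSTYLE KEY
    DISTKEY ("user_id")
    COMPOUND SORTKEY ("viewed_at");
    ```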

    BigQuery Properties

    Property | Setting | Description
    NameStringA human-readable name for the component.
    ProjectSelectSelect the project that the newly created table will belong to.
    DatasetSelectSelect the table dataset. For more information on using datasets, please refer to the BigQuery documentation.
    New Table NameTextThe name of the table to create or replace.
    Table MetadataField NameThe name of the new field.
    Data TypeFor more information on available BigQuery data types please refer to the GCP documentation.
    String: this type can hold any kind of data, subject to a maximum size.
    Integer: this type is suitable for whole-number types (no decimals).
    Float: this type is suitable for numeric types, with or without decimals.
    Numeric: this data type is suitable for data of an exact numeric value, allowing 38 digits of precision and 9 decimal digits of scale.
    Boolean: this type is suitable for data that is either 'true' or 'false'.
    Date: a formatted date object without time. See the GCP documentation.
    Time: a formatted time object without date. See the GCP documentation.
    DateTime: a formatted timestamp containing both date and time that is easily readable by the user. See the GCP documentation.
    Timestamp: this type is a timestamp left unformatted (exists as Unix/Epoch Time).
    ModeThe field mode. Default is 'NULLABLE'.
    NULLABLE: Field allows null values.
    REQUIRED: Field does not accept null values.
    REPEATED: Field can accept multiple values.
    CommentA location to store descriptive text comments against columns. Used to contextualize the content being stored in tables in your database.
    Define Nested MetadataWhen the Define Nested Metadata checkbox is ticked inside the 'Table Metadata' property, a tree-structure can be defined for metadata.
    Create/ReplaceSelectCreate: (default option for this property) this option creates a new table. Creating a new table will generate an error if a table with the same name already exists. However, existing data will not be destroyed.
    Create if not exists: this option will only create a new table when a table of the same name does not already exist. This option will not destroy existing data. If the schema of the existing table does not match the schema defined in this component, no attempt is made to fix or correct the schema, which could lead to errors later on in the ETL job if:
    1. Users did not expect a table to already exist.
    2. Users did not expect to have a different schema to the one defined in this component.
    Replace: this option drops any existing table of the same name and then creates a new table. Selecting this option guarantees that after the component succeeds, the table matches the schema defined in this component. However, any data from the existing table (to be replaced) will be lost.
    PartitioningSelectChoose whether to create a date-partitioned table or not.
    Column: Partitions the table's data according to the specification in the Partitioning Field property, activated when this option is selected.
    Day: Selecting Day will create the table as a partitioned table, which will load data into separate date partitions that can be queried with the '_PARTITIONTIME' pseudo-column. Select Day when your data is spread out over a wide range of dates or if your data is continually added over time. This allows large datasets to be split into multiple parts with each part corresponding to a single day of data.
    Hour: Select Hour if your tables have a high volume of data spanning a short date range (typically less than six months of timestamp values). Hourly partitioning allows the addressing of data at hour-level granularity, such as when appending, truncating, or deleting data from a particular partition.
    Month: Select Month if your tables have a relatively small amount of data for each day, but span a wide date range.
    None: No partitioning.
    Year: Select Year if your tables have a relatively small amount of data for each day, but span a wide date range. Both Year and Month are recommended options if your workflow requires frequently updating or adding rows that span a wide date range (for example, more than a few hundred dates).
    For more information, we recommend the section "Daily partitioning versus hourly, monthly, or yearly partitioning" in Introduction to partitioned tables.
    Enabling partitioning on a table will activate additional properties in the Table Input component, which allows the partition column to be included in the input.
    Partitioning FieldSelectThis parameter appears when 'Column' is selected in the Partitioning property. Partitioning Field has a drop-down, which should be populated only by Date or Timestamp column types defined in the Table Metadata. For more information, see here.
    Partition Expiration TimeTextSpecify the number of milliseconds before the partition expires (field can be left blank for no expiration).
    KMS EncryptionSelectSupport for customer-managed encryption. By default, this is set to 'No'. Your encryption keys are stored within Cloud KMS. For more information, see here.
    LocationSelectThis parameter appears when KMS Encryption is set to 'Yes'. Choose a location that matches the location of the BigQuery dataset.
    Key RingSelectThis parameter appears when KMS Encryption is set to 'Yes'. Choose from a group of keys. On the GCP console in KMS, a Key Ring or multiple Key Rings will have been created.
    KeySelectThis parameter appears when KMS Encryption is set to 'Yes'. Choose the Encryption Key from the Key Ring.
    Cluster ColumnsSelectCreates a clustered table. Allows the optional selection of up to four columns. Columns of types Integer, String, Boolean, Date, and Timestamp are supported. For more information, see here.
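    An illustrative sketch of the equivalent BigQuery DDL with 'Column' partitioning on a DATE field and a clustering column (project, dataset, and column names are hypothetical; note that while the component's Partition Expiration Time is given in milliseconds, the corresponding DDL option is expressed in days):

    ```sql
    CREATE TABLE IF NOT EXISTS `my-project.my_dataset.events` (
        event_id   INT64,
        event_name STRING,
        payload    STRING,
        event_date DATE
    )
    PARTITION BY event_date                   -- 'Column' partitioning on a DATE field
    CLUSTER BY event_name                     -- up to four clustering columns allowed
    OPTIONS (partition_expiration_days = 30);
    ```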

    Synapse Properties

    Property | Setting | Description
    NameStringA human-readable name for the component.
    Create/ReplaceSelectSelect the function of the component from the following options: Create: (default setting) this option creates a new table. Creating a new table will generate an error if a table with the same name already exists. However, existing data will not be destroyed.
    Create If Not Exists: this option will only create a new table when a table of the same name does not already exist. This option will not destroy existing data.
    Replace: this option drops any existing table of the same name and then creates a new table. Selecting this option guarantees that after the Orchestration Job has run, the table matches the schema defined in this component. However, any data from the existing table (that is, the table to be replaced upon the new job run) will be lost.
    SchemaSelectSelect the table schema. The special value, [Environment Default], will use the schema defined in the environment.
    For more information on schemas, please see the Azure Synapse documentation.
    New Table NameStringSpecify the name of the table to create or replace.
    Note: This field is case-sensitive by default, since Matillion ETL uses quoted identifiers. For more information, please refer to the Azure Synapse documentation.
    Table MetadataColumn NameProvide the name of each new column.
    Data TypeSelect the data type. The available data types are: DATE: this data type is suitable for dates without times. More...
    DATETIME: this data type is suitable for timestamps. More...
    TIME: this data type is suitable for times. More...
    INTEGER: this data type is suitable for whole number types (no decimals). More...
    NUMERIC: this data type is suitable for numeric types, with or without decimals. More...
    TEXT: this data type is suitable for text types. More...
    FLOAT: this data type is suitable for approximate number data types for use with floating point numeric data. More...
    BOOLEAN: this data type is suitable for data whose values are either "true" or "false". More...
    SizeThe size column sets the size, except for the data types BOOLEAN, DATE, DATETIME, and TIME.
    For more information about DATE, read date (Transact-SQL).

    For DATETIME, the size column sets the precision of the datetime offset. Accepted values range from 0 to 7, representing fractional seconds from 0 to 9999999.
    If the DATETIME size is set to 0, the returned datetime will be 26 positions (YYYY-MM-DD hh:mm:ss {+|-}hh:mm).
    If the DATETIME size is set to a precision of 1, the returned datetime will be 28 positions (26 + precision + 1) (YYYY-MM-DD hh:mm:ss.n {+|-}hh:mm).
    If the DATETIME size is set to a precision of 7, the returned datetime will be 34 positions (26 + precision + 1) (YYYY-MM-DD hh:mm:ss.nnnnnnn {+|-}hh:mm).
    For more information, read datetime (Transact-SQL).

    For TIME, the size column sets the precision of the time offset. Accepted values range from 0 to 7, representing fractional seconds from 0 to 9999999.
    If the TIME size is set to 0, the returned time will be 8 positions (hh:mm:ss).
    If the TIME size is set to 1, the returned time will be 10 positions (8 + precision + 1) (hh:mm:ss.n).
    If the TIME size is set to 7, the returned time will be 16 positions (8 + precision + 1) (hh:mm:ss.nnnnnnn).
    ScaleDefine the scale. More...
    Allow NullableSelect whether to allow nullable values. More...
    Distribution StyleSelectSelect the distribution style:
    Hash: This setting assigns each row to one distribution by hashing the value stored in the distribution_column_name. The algorithm is deterministic, meaning it always hashes the same value to the same distribution. The distribution column should be defined as NOT NULL, because all rows that have NULL are assigned to the same distribution.
    Replicate: This setting stores one copy of the table on each Compute node. For SQL Data Warehouse, the table is stored on a distribution database on each Compute node. For Parallel Data Warehouse, the table is stored in an SQL Server filegroup that spans the Compute node. This behavior is the default for Parallel Data Warehouse.
    Round Robin: Distributes the rows evenly in a round-robin fashion. This is the default behaviour.
    For more information, please read this article.
    Distribution ColumnSelectSelect the column to act as the distribution column. This property is only available when the Distribution Style property is set to "Hash".
    Index TypeSelectSelect the table indexing type. Options include:
    Clustered: A clustered index may outperform a clustered columnstore table when a single row needs to be retrieved quickly. The disadvantage of using a clustered index is that the only queries that benefit are those that use a highly selective filter on the clustered index column. Choosing this option prompts the Index Column Grid property.
    Clustered Column Store: This is the default setting. Clustered columnstore tables offer both the highest level of data compression and the best overall query performance, especially for large tables. Choosing this option prompts the Index Column Order property.
    Heap: Users may find that using a heap table is faster for temporarily landing data in Synapse SQL pool. This is because loads to heaps are faster than to index tables, and in some cases, the subsequent read can be done from cache. When a user is loading data only to stage it before running additional transformations, loading the table to a heap table is much faster than loading the data to a clustered columnstore table.
    For more information, please consult the Azure Synapse documentation.
    Index Column GridNameThe name of each column.
    SortAssign a sort orientation of either ascending (Asc) or descending (Desc).
    Index Column OrderMultiple SelectSelect the columns in the order to be indexed.
    Partition KeySelectSelect the table's partition key. Table partitions determine how rows are grouped and stored within a distribution.
    For more information on table partitions, please refer to this article.
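    A hedged sketch of Synapse dedicated SQL pool DDL combining hash distribution, the default clustered columnstore index, and a range partition (table, column names, and boundary values are hypothetical):

    ```sql
    CREATE TABLE "dbo"."fact_sales" (
        "sale_id"  INT NOT NULL,
        "store_id" INT NOT NULL,
        "amount"   NUMERIC(18, 2),
        "sold_on"  DATE
    )
    WITH (
        DISTRIBUTION = HASH ("store_id"),    -- co-locates rows with equal store_id
        CLUSTERED COLUMNSTORE INDEX,         -- the default index type
        PARTITION ("sold_on" RANGE RIGHT FOR VALUES ('2023-01-01', '2024-01-01'))
    );
    ```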

    Delta Lake Properties

    Property | Setting | Description
    NameStringA human-readable name for the component.
    Create MethodSelectSelect the function of the component from the following options:
    Create: (default setting) this option creates a new table. Creating a new table will generate an error if a table with the same name already exists. However, existing data will not be destroyed.
    Create If Not Exists: this option will only create a new table when a table of the same name does not already exist. This option will not destroy existing data.
    Replace: this option drops any existing table of the same name and then creates a new table. Selecting this option guarantees that after the Orchestration Job has run, the table matches the schema defined in this component. However, any data from the existing table (that is, the table to be replaced upon the new job run) will be lost. This method is restricted to tables of file type Delta.
    CatalogSelectSelect a Databricks Unity Catalog. The special value, [Environment Default], will use the catalog specified in the Matillion ETL environment setup. Selecting a catalog will determine which databases are available in the next parameter.
    DatabaseSelectSelect the Delta Lake database. The special value, [Environment Default], will use the database specified in the Matillion ETL environment setup.
    Table NameStringThe name of the table to create or replace.
    Table MetadataColumn NameThe column name from the input flow. Add as many rows to the editor as you need, one per input column.
    Data TypeSelect from INTEGER, NUMBER, FLOAT, TEXT, TIMESTAMP, DATE, BOOLEAN, BINARY as the data type for this column.
    SizeSet the data type size.
    ScaleSet the data type scale.
    CommentA location to store descriptive text comments against columns. Used to contextualize the content being stored in tables in your database.
    Table TypeSelectSelect whether the new table will be of type Managed or Unmanaged. The default setting is Managed.
    Read Managed table for more information.
    UsingSelectSelect the file type. Available types: CSV, Delta, JSON, ORC, Parquet. This property is only available when Table Type is set to Unmanaged.
    LocationS3 Bucket(AWS only) Specify the S3 bucket location.
    This property is only available when Table Type is set to Unmanaged.
    Storage AccountSelect(Azure only) Select an Azure Blob Storage account. An Azure storage account contains all of your Azure Storage data objects: blobs, files, queues, tables, and disks. For more information, read Storage account overview.
    This property is only available when Table Type is set to Unmanaged.
    Blob ContainerSelect(Azure only) A Blob Storage location. The available blob containers will depend on the selected storage account.
    This property is only available when Table Type is set to Unmanaged.
    Partition KeysColumn SelectSpecify any columns to be used as partition keys.
    Table PropertiesKeySpecify any table metadata using key=value pairs. Specify the key.
    ValueSpecify the key's value.
    CommentStringUse this property to prescribe any meta comments about the table.
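    For an Unmanaged table, the properties above correspond to Databricks SQL DDL along these lines (bucket path, table, and property values are hypothetical, not the component's exact output):

    ```sql
    CREATE TABLE IF NOT EXISTS analytics.events (
        event_id   INT,
        event_name STRING,
        event_date DATE
    )
    USING DELTA
    PARTITIONED BY (event_date)
    LOCATION 's3://example-bucket/events/'      -- unmanaged: data lives at this path
    TBLPROPERTIES ('delta.appendOnly' = 'true') -- key=value table properties
    COMMENT 'Raw events landed by the pipeline';
    ```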

    Variable Exports

    This component makes the following values available to export into variables:

    Source | Description
    Table RecreatedWhether or not the table was (re)created. This is useful when "Create/replace" is set to 'Create if not exists' so users can tell whether the table needed to be created or not.

    Strategy

    Generates any required DDL to manage an existing table of the same name, followed by a CREATE TABLE statement.
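    For the 'Replace' option on Snowflake, for example, that sequence might look like the following sketch (hypothetical names; the CASCADE is what makes the operation destructive to dependent objects):

    ```sql
    DROP TABLE IF EXISTS "public"."orders" CASCADE;  -- also removes dependent objects
    CREATE TABLE "public"."orders" (
        "order_id" NUMBER(38,0),
        "amount"   FLOAT
    );
    ```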

