File Iterator

    The File Iterator component lets users loop over matching files in a remote file system.

    The component searches the selected remote file system for files, running its attached component once for each file found. File names and path names are mapped into environment variables, which can then be referenced from the attached component(s).

    To attach the iterator to another component, use the blue output connector and link to the desired component. To detach, right-click on the attached component and click Disconnect from Iterator.

    If you need to iterate more than one component, put them into a separate orchestration job or transformation job and attach a Run Orchestration or Run Transformation component to the iterator. In this way, you can run an entire ETL flow multiple times, once for each matched file and its set of variable values.

    All iterator components are limited to a maximum of 5000 iterations.
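
    The looping behaviour described above can be pictured with a minimal Python sketch. It is not Matillion ETL code: the function `iterate_files`, the callback `run_attached_component`, and the variable names `file_path` and `file_name` are hypothetical, and truncating the list at Max Iterations (rather than failing) is an assumption made for illustration.

```python
from typing import Callable, Dict, List

MAX_ITERATIONS_LIMIT = 5000  # cap stated in the documentation


def iterate_files(
    matched_files: List[Dict[str, str]],
    run_attached_component: Callable[[Dict[str, str]], None],
    max_iterations: int = MAX_ITERATIONS_LIMIT,
) -> int:
    """Run the attached component once per matched file.

    Each entry in matched_files is a hypothetical mapping of variable
    names to the values for one file.
    """
    if max_iterations > MAX_ITERATIONS_LIMIT:
        raise ValueError("Max Iterations cannot exceed 5000")
    count = 0
    # Assumption: files beyond max_iterations are simply not processed.
    for variables in matched_files[:max_iterations]:
        # The attached component sees this file's values as variables.
        run_attached_component(dict(variables))
        count += 1
    return count


seen: List[Dict[str, str]] = []
files = [
    {"file_path": "sales/2021", "file_name": "sales_01.gz"},
    {"file_path": "sales/2021", "file_name": "sales_02.gz"},
]
runs = iterate_files(files, seen.append)
print(runs, [v["file_name"] for v in seen])
```

    Wrapping several components in one job and calling that job from the callback corresponds to attaching a Run Orchestration or Run Transformation component to the iterator.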


    Properties

    Property | Setting | Description
    Name | String | A human-readable name for the component.
    Input Data Type | Select | Select the remote file system to search. Available data types include: Azure Blob Storage, Cloud Storage, FTP, HDFS, S3, SFTP, and Windows Fileshare.
    Input Data URL | String / Select | Select or input the URL, including the full path and file name, that will point to the files to download to the selected staging area. Once you have selected the connection's Input Data Type, Matillion ETL will provide a template URL string.
      Note: Special characters used in this field (e.g. in usernames and passwords) must be URL-safe. For more information, please refer to our Safe Characters documentation.
    Domain | String | Input your connection domain.
    SFTP Key | String | Input your SFTP private key. This property will only be used if the data source requests it. This property is only available when the Input Data Type is set to SFTP.
    Username | String | Input your URL connection username. This property will only be used if the data source requests it.
    Password | String | Input your URL connection password. This property will only be used if the data source requests it. Users can store passwords in the component itself, or use the secure Password Manager feature (recommended).
    Set Home Directory as Root | Select | No: Designates that the URL path is from the server root.
      Yes: Designates that the URL path is relative to the user's home directory. Default setting is Yes.
      This property is only available when the Input Data Type is set to either FTP or SFTP.
    Recursive | Select | No: Only search for files within the folder identified by the Input Data URL.
      Yes: Consider files in subdirectories when searching for files.
      This property is only available when the Input Data Type is set to FTP, SFTP, or Windows Fileshare.
    Max Recursion Depth | Integer | Set the maximum recursion depth into subdirectories. This property is only available when Recursive is set to Yes.
    Ignore Hidden | Select | No: Include "hidden" files.
      Yes: Ignore "hidden" files, even if they otherwise match the Filter Regex. Default setting is Yes.
    Max Iterations | Integer | Set the total number of iterations to perform. As mentioned earlier, the maximum cannot exceed 5000.
    Filter Regex | String | The Java-standard regular expression tested against each candidate file's full path. To match all files, specify .*
    Concurrency | Select | Concurrent: Iterations are run concurrently. This requires all "Variables to Iterate" to be defined as copied variables, so that each iteration gets its own copy of the variable, isolated from the same variable being used by other concurrent executions.
      Sequential: Iterations are done in sequence, waiting for each to complete before starting the next. This is the default setting.
      Note: The maximum concurrency is limited by the number of available threads (2x the number of processors on your cloud instance).
    Variables | Variable | An existing environment variable to hold the given value of the Path Selection.
    Variables | File Attribute | For each matched file, the target variable can be populated with the Base Path, the Subfolder (useful when recursing), the Filename, or the date when the file was Last Modified. You can export any or all of these into variables used by each iteration.
      For the Last Modified attribute, the date is formatted as ISO8601, with a UTC indicator. For example, 2021-01-04T10:45:15.123Z
      Users may experience a lag in how their data warehousing platform updates the last modified date, for example between when Matillion ETL interacts with the file and the actual last modified date. This behaviour is a limitation of the platform and is subject to that platform's metadata.
      See the example section in the full documentation for the difference between these.
    Break on Failure | Select | No: Attempt to run the attached component for each iteration, regardless of success or failure. This is the default setting.
      Yes: If the attached component does not run successfully, fail immediately.
      Note i: If a failure occurs during any iteration, the failure link is followed. This parameter controls whether it is followed immediately or after all iterations have been attempted.
      Note ii: This property is only available when Concurrency is set to Sequential. When set to Concurrent, all iterations will be attempted.
    Record Values In Task History | Select | Choose whether to record iteration values in the Matillion ETL Task History. The default setting is Yes.
    Stop On Condition | Select | Select Yes to stop the iteration based on a condition specified in the Condition property. The default setting is No.
      This property is only available when Concurrency is set to Sequential.
    Mode | Select | Select the method of creating the condition.
      Simple: A no-code Condition UI will open, where users must specify an Input Variable, Qualifier, Comparator, and Value using drop-down menus and text fields. This is the default setting.
      Advanced: An editor will open, where users must write the condition manually using JavaScript.
    Condition (Simple mode) | Input Variable | An input variable to form a condition around.
    Condition (Simple mode) | Qualifier | Is: Compares the input variable to the value using the comparator.
      Not: Reverses the effect of the comparison, so "Equals" becomes "Not equals", "Less than" becomes "Greater than or equal to", etc.
    Condition (Simple mode) | Comparator | Select the comparator. Available comparison operators include "Less than", "Less than or equal to", "Equal to", "Greater than or equal to", "Greater than", and "Blank".
    Condition (Simple mode) | Value | Specify the value to be compared.
    Condition (Advanced mode) | Text Editor | Manually write the condition in the editor. This editor accepts conditions written in JavaScript.
    Combine Conditions | Select | Use the defined conditions in combination with one another according to either And or Or.
      This property is only available when Mode is set to Simple.
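
    To make the interaction between Filter Regex, Ignore Hidden, Max Recursion Depth, and the File Attribute values more concrete, here is a minimal Python sketch. It rests on several assumptions that the text above does not confirm: that the regex must match the whole path, that "hidden" means a file name starting with a dot, and that depth 0 means "directly inside the base path". Python's `re` module stands in for Java's regular expressions, and the function name `file_attributes` is hypothetical.

```python
import re
from datetime import datetime, timezone
from typing import Dict, Optional


def file_attributes(
    base_path: str,
    full_path: str,
    last_modified: datetime,
    filter_regex: str = ".*",
    ignore_hidden: bool = True,
    max_recursion_depth: Optional[int] = None,
) -> Optional[Dict[str, str]]:
    """Return the exportable attributes for one candidate file, or None if skipped."""
    relative = full_path[len(base_path):].strip("/")
    parts = relative.split("/")
    filename = parts[-1]
    subfolder = "/".join(parts[:-1])

    # Assumption: a "hidden" file is one whose name starts with a dot.
    if ignore_hidden and filename.startswith("."):
        return None
    # Assumption: depth counts the subfolder levels below the base path.
    if max_recursion_depth is not None and len(parts) - 1 > max_recursion_depth:
        return None
    # Assumption: the regex has to match the entire full path.
    if re.fullmatch(filter_regex, full_path) is None:
        return None

    # ISO 8601 with millisecond precision and a trailing "Z" for UTC.
    stamp = last_modified.astimezone(timezone.utc).isoformat(timespec="milliseconds")
    return {
        "Base Path": base_path,
        "Subfolder": subfolder,
        "Filename": filename,
        "Last Modified": stamp.replace("+00:00", "Z"),
    }


modified = datetime(2021, 1, 4, 10, 45, 15, 123000, tzinfo=timezone.utc)
attrs = file_attributes(
    "/data/in", "/data/in/2021/sales_01.gz", modified, filter_regex=r".*/sales_.*\.gz"
)
hidden = file_attributes("/data/in", "/data/in/.sales_tmp.gz", modified)
too_deep = file_attributes(
    "/data/in", "/data/in/2021/sales_01.gz", modified, max_recursion_depth=0
)
print(attrs, hidden, too_deep)
```

    Because the pattern is tested against the full path rather than the file name alone, a pattern meant for file names usually needs a leading `.*` or `.*/`, as in the usage above.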

    Variable Exports

    This component makes the following values available to export into variables:

    Source | Description
    Iteration Attempted | The number of iterations that this component attempts to reach (Max Iterations parameter).
    Iteration Generated | The number of iterations that have been initiated. Iterators terminate after failure, so this number will be the successful iterations plus any potential failures.
    Iteration Successful | The number of iterations successfully performed. This is the max iteration number, minus failures and any unattempted iterations (since the component terminates after failure).
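
    One way to read the three counters is sketched below in Python. This is an interpretation of the descriptions above, not Matillion ETL's implementation: the function `run_iterations`, its parameters, and the use of exceptions to signal a failed iteration are all assumptions.

```python
from typing import Callable, Dict, Sequence


def run_iterations(
    values: Sequence[str],
    run_component: Callable[[str], None],
    max_iterations: int,
    break_on_failure: bool = False,
) -> Dict[str, int]:
    """Run sequential iterations and return the three exported counters."""
    attempted = max_iterations  # target count, taken from Max Iterations
    generated = 0               # iterations actually started
    successful = 0              # iterations that finished without error
    for value in values[:max_iterations]:
        generated += 1
        try:
            run_component(value)
            successful += 1
        except Exception:
            # Assumption: a failed iteration surfaces as an exception here.
            if break_on_failure:
                break
    return {
        "Iteration Attempted": attempted,
        "Iteration Generated": generated,
        "Iteration Successful": successful,
    }


def component(value: str) -> None:
    if value == "bad":
        raise RuntimeError("attached component failed")


names = ["a", "b", "bad", "c"]
stop_early = run_iterations(names, component, max_iterations=4, break_on_failure=True)
run_all = run_iterations(names, component, max_iterations=4, break_on_failure=False)
print(stop_early)
print(run_all)
```

    With Break on Failure enabled, the third iteration fails and the fourth is never started, so Generated (3) is lower than Attempted (4). With it disabled, all four iterations are started and only the failed one is missing from Successful.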


    Example

    This example shows how specific files can be transferred from an S3 bucket to a Google Cloud Storage bucket. This will be done by using the File Iterator component in conjunction with the Data Transfer component.

    The File Iterator component is set up to point to an Input Data URL (this is the Base Path). The File Iterator recurses into any folders it finds (each is a Subfolder) and matches files such as "sales_.*.gz" (this is the Filename).

    In this example, the variable mapping is set up to provide both the "subfolder" and the "filename" into environment variables.

    Those variables can then be referenced from the attached Data Transfer component both in the Input Data URL and Target Object Name.

    At runtime, any matching files are uploaded to the Google Cloud Storage bucket.
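
    The variable substitution in this example can be sketched in Python as follows. The bucket name `example-source-bucket`, the variable names `subfolder` and `filename`, and the sample file list are hypothetical. Python's `string.Template` is used only because it shares the `${name}` placeholder style; it is not how Matillion ETL resolves variables internally.

```python
from string import Template

# Hypothetical values a Data Transfer component might be configured with.
input_data_url = Template("s3://example-source-bucket/${subfolder}/${filename}")
target_object_name = Template("${subfolder}/${filename}")

# Hypothetical files found by the iterator, one dict per iteration.
matched = [
    {"subfolder": "2021/01", "filename": "sales_north.gz"},
    {"subfolder": "2021/02", "filename": "sales_south.gz"},
]

# Each iteration resolves the same templates with that file's values.
transfers = [
    (input_data_url.substitute(f), target_object_name.substitute(f))
    for f in matched
]
for source, target in transfers:
    print(source, "->", target)
```

    Reusing the Subfolder value in the target name keeps the source folder layout in the destination bucket; dropping it would place every file at the top level.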

