Hi! I'm iterating through a bunch of files in Google Storage. This works fine, but I would like to process them in a certain order. The file names contain dates, and I would like to read the oldest file first. Any suggestions on how to do this? Br, Cris
Kalyan Arangam —
The file iterator itself seems to return filenames in alphabetical order. If your files are named appropriately (e.g. prefix_yyyymmdd), alphabetical order is also chronological order, so it may already return them oldest first.
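To illustrate why the naming convention matters, here is a small sketch using hypothetical file names. A zero-padded year-month-day stamp sorts chronologically as plain text, while a day-first stamp does not:

```python
# Hypothetical file names following the prefix_yyyymmdd convention.
names = ["sales_20200315.csv", "sales_20191231.csv", "sales_20200101.csv"]

# Plain string sorting is lexicographic; with a zero-padded year-month-day
# stamp, lexicographic order is also chronological order.
ordered = sorted(names)
print(ordered)
# ['sales_20191231.csv', 'sales_20200101.csv', 'sales_20200315.csv']

# The same approach fails for day-first stamps (prefix_ddmmyyyy):
day_first = ["sales_15032020.csv", "sales_31122019.csv", "sales_01012020.csv"]
ordered_day_first = sorted(day_first)
print(ordered_day_first)
# ['sales_01012020.csv', 'sales_15032020.csv', 'sales_31122019.csv']
# (the 2019 file ends up last, so this order is not chronological)
```

If your files use a day-first or month-first stamp, you will need to sort on a parsed date instead of the raw name.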
Add a file iterator to your job, point it at your folder, and map a variable to Filename. Then attach a Python Script component to it that prints the filename.
What order do you see?
Another approach is to load the filenames into a BigQuery table and iterate over them with a Table Iterator or Grid Iterator, using a sort order.
You could also read the list of filenames from a Python component using the boto library, sort them as needed, and then act on each file.
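Once you have the list of names (however you obtain it), sorting them by the date they contain might look like the sketch below. It assumes a hypothetical dd-mm-yyyy stamp such as `sales_31-12-2019.csv`, so adjust `DATE_RE` and `DATE_FMT` to match your actual naming. Listing the files from storage is not shown here.

```python
import re
from datetime import datetime

# Assumed naming: a dd-mm-yyyy stamp somewhere in the file name.
DATE_RE = re.compile(r"(\d{2}-\d{2}-\d{4})")
DATE_FMT = "%d-%m-%Y"


def file_date(name: str) -> datetime:
    """Extract and parse the date stamp embedded in a file name."""
    match = DATE_RE.search(name)
    if match is None:
        raise ValueError(f"no date found in {name!r}")
    return datetime.strptime(match.group(1), DATE_FMT)


def oldest_first(names):
    """Return the file names sorted by their embedded date, oldest first."""
    return sorted(names, key=file_date)


if __name__ == "__main__":
    names = ["sales_15-03-2020.csv", "sales_31-12-2019.csv", "sales_01-01-2020.csv"]
    for name in oldest_first(names):
        print(name)  # replace with whatever processing each file needs
```

Sorting on a parsed `datetime` rather than on the raw string means the order stays correct regardless of how the date is laid out in the name. A file without a recognizable date raises an error instead of being silently misplaced.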
Hope that gives you some pointers on what's possible.