XML files

Hi All!
Any good examples of how to read XML files?
We have XML files with corresponding XSD files in GCP Storage. I would like to fetch this data and load it into a BQ table.
Any ideas?
Cheers,
Cris


#xml #xsd #gcp #bigquery

6 Community Answers

Matillion Agent  

Veronica Kupetz —

Hi Cristian,

Thanks for reaching out. There is no direct way to read in an XML file, but you can use the API Query component. A few additional steps are needed to set this up in an Orchestration job:

  • Use a Bash Script component to download the XML file from Google Cloud Storage to the Matillion server
  • Reference the file URI on the local system
  • Delete the local file

Details:
1. Use a Bash Script component to run the gsutil cp command, which copies the file from Google Cloud Storage to your Matillion instance. For example:

gsutil cp gs://<bucket_name>/<file_name.xml> /tmp/<file_name.xml>

2. Create your API Profile

3. Since the file is available locally, set the following under Connection Options on the API Query component:

URI = /tmp/<file_name.xml>

4. Delete the local file with the following command, again run from a Bash Script component:

rm /tmp/<file_name.xml>

So the job would look like this:

START ---> BASH Script (download file) ---> API Query (process/load file) ---> BASH Script (delete local file)
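
For anyone who wants to rehearse the same download, process, delete sequence outside Matillion, here is a minimal Python sketch. It assumes gsutil is installed and authenticated on the machine. The names gsutil_copy, with_local_copy, process and fetch are made up for illustration and are not part of Matillion.

```python
import subprocess
from pathlib import Path
from typing import Callable, TypeVar

T = TypeVar("T")


def gsutil_copy(gcs_uri: str, local_path: Path) -> None:
    """Copy one object from Google Cloud Storage using the gsutil CLI."""
    # Same command as the first Bash Script step: gsutil cp <source> <destination>
    subprocess.run(["gsutil", "cp", gcs_uri, str(local_path)], check=True)


def with_local_copy(
    gcs_uri: str,
    local_path: Path,
    process: Callable[[Path], T],
    fetch: Callable[[str, Path], None] = gsutil_copy,
) -> T:
    """Download the file, hand it to `process`, then always delete it."""
    fetch(gcs_uri, local_path)
    try:
        return process(local_path)
    finally:
        # Equivalent of the final rm step; runs even if `process` raises.
        local_path.unlink(missing_ok=True)
```

The try/finally is the one design difference from the linear job above: the local file is removed even when the processing step fails.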

I can share an example job that does this, but I will send it to your e-mail directly, since this message came through the support center. Hope this helps.

Best Regards,
Veronica


Cristian Ivanoff —

Hi Veronica,
Thanks for the suggestion. I think this will work for us. I wanted to test it first, though, before adding components to copy the file, so I manually copied the XML file to the Matillion instance as /tmp/file.xml.
I chose to create an RSD file in the API profile and it seems to work. But when I run a test, the table created is empty.

I also tried the "Generate" button, but this doesn't work: I just get "Error:Null". So I don't understand what's wrong.

Could you guide me?

Cheers,
Cristian


Matillion Agent  

Veronica Kupetz —

Hi Cristian,

Glad to hear this may work for you. Were you able to walk through the sample job I sent over via e-mail? This is a good way to go through the process. Here are some additional steps on how to set up the example I shared with you. I hope this helps with your similar setup.

  1. Select Project -> Manage API Profiles, select the + button to add a new profile, and name it (Sample in this case)
  2. Once in Configure API Profile, select the + button under Files. Change the file type to XML and name the file. In the sample, you can name it “books”. Copy the books.xml file I shared into the console. Select OK to save and exit out. This will save the books.xml file to the following directory in your instance: /usr/share/tomcat/api_profiles/
  3. Select Project -> Manage API Profiles and edit the “Sample” profile, or whatever you have named it. Select Generate within Configure API Profile
    1. Add the Table Name (books for this sample)
    2. For XPath to Repeat Element, add the following: /catalog/book
    3. Data Format = XML
    4. URI = /usr/share/tomcat/api_profiles/Sample/books.xml
  4. Once you complete those steps, select OK to save and select “Test” to see the sample “books” data. This is a good way to check that your repeat element is set up correctly.
  5. While in the Configure API Profile screen, select the “books.rsd” file and comment out the following section in the file (the connection parameter will specify the URI location to your /tmp directory):
<!--<rsb:set attr="uri" value="/usr/share/tomcat/api_profiles/Sample/books.xml" />
     NOTE: This is used in the Connection Parameters via the API Query Component -->
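
To get a feel for what the XPath to Repeat Element in step 3 selects, here is a rough Python sketch using only the standard library. It illustrates the general idea (one row per matched element), not how the API Query component is implemented. The sample document is a small made-up catalog, not the books.xml shared by e-mail, and rows_for_repeat_element is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Made-up sample data with the same /catalog/book shape as the steps above.
SAMPLE = """\
<catalog>
  <book id="b1"><title>First Title</title><price>10.50</price></book>
  <book id="b2"><title>Second Title</title><price>7.25</price></book>
</catalog>
"""


def rows_for_repeat_element(xml_text: str, repeat_path: str) -> list[dict[str, str]]:
    """Return one dict per element matched by an absolute path like /catalog/book."""
    root = ET.fromstring(xml_text)
    # ElementTree searches relative to the root, so split off the root name first.
    root_name, _, relative = repeat_path.strip("/").partition("/")
    if root.tag != root_name:
        return []  # the path does not start at this document's root element
    matches = root.findall(relative) if relative else [root]
    rows = []
    for element in matches:
        row = dict(element.attrib)  # attributes become columns
        for child in element:
            row[child.tag] = (child.text or "").strip()  # child elements too
        rows.append(row)
    return rows


for row in rows_for_repeat_element(SAMPLE, "/catalog/book"):
    print(row)
```

A path that matches nothing, such as /catalog/magazine, returns no rows at all. That is one way a test can end up with an empty table.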

I hope these detailed steps help you test the example. Once you get that running, you can follow similar steps for your use case.

Best Regards,
Veronica


Cristian Ivanoff —

Hi Veronica,
Your example files work fine. I think there's a problem with the XML format, which I don't fully understand. The error contains "Query Failed: [CreateSchema]". I'm a total XML ignoramus :D and it's an SAP XML file with the heading <ns0:ZFPR002 xmlns:ns0="http://ZFPR002.V3">, which seems to cause trouble, I think. Maybe it should have a stylesheet or something associated with it.

Any clues?
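
A note on that heading, offered as a guess rather than a diagnosis: XML does not need a stylesheet to be parsed, but the ns0: prefix puts the root element in a namespace. Namespace-aware parsers identify such an element by its namespace URI plus local name, so a path written with the bare name may not match it. Whether that is what triggers the [CreateSchema] error is not confirmed here. The Python sketch below only shows the general behaviour. The ROW and FIELD elements are made up, and split_tag and strip_namespaces are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

# Root element copied from the post; the ROW/FIELD children are invented.
SAMPLE = """\
<ns0:ZFPR002 xmlns:ns0="http://ZFPR002.V3">
  <ROW><FIELD>a</FIELD></ROW>
  <ROW><FIELD>b</FIELD></ROW>
</ns0:ZFPR002>
"""


def split_tag(tag: str) -> tuple[str, str]:
    """Split an ElementTree tag into (namespace_uri, local_name)."""
    if tag.startswith("{"):
        uri, _, local = tag[1:].partition("}")
        return uri, local
    return "", tag


def strip_namespaces(root: ET.Element) -> ET.Element:
    """Rename every element to its local name so plain paths can match."""
    for element in root.iter():
        if isinstance(element.tag, str):  # skip comments and processing instructions
            element.tag = split_tag(element.tag)[1]
    return root


root = ET.fromstring(SAMPLE)
print(root.tag)              # the parser's name for the root, including the namespace URI
print(split_tag(root.tag))   # namespace URI and local name, separated
strip_namespaces(root)
print(root.tag)              # local name only
```

Stripping namespaces is a blunt tool: it only renames elements (not attributes) and can merge names that the namespaces were keeping apart, so it suits a quick experiment on a copy of the file more than a production load.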


Matillion Agent  

Veronica Kupetz —

Hi Cristian,

Hope you had a good weekend. Can you send over a few items to support@matillion.com and reference case #20486? This will help in figuring out the issue:

  1. Is it possible to share a sample of the XML file you are trying to parse?
  2. Can you share screenshots of the errors?
  3. Can you share a copy of your RSD file and include the exported job you have created so far?

Best Regards,
Veronica


Cristian Ivanoff —

A new ticket was created - 20858.

Br
Cristian
