Thanks for coming to InDesignSecrets.com, the world's #1 resource for all things InDesign!

Getting Started With XML in InDesign

Recently, I worked on a project where I had to bring XML data into InDesign. Before this, I had heard about XML but never worked with it, so this project was a big challenge for me. I learned a lot along the way and thought it would be helpful to others if I shared the details of my experience.

XML Basics

Like HTML, XML is a markup language for describing data. XML stands for eXtensible Markup Language. “Extensible” means that it allows users to define their own tags, unlike HTML, where you’re limited to the predefined tags. Most of the tag rules you know from HTML apply to XML as well, but XML is stricter: tag names are case-sensitive, and every element must be properly closed. For instance, a tag must begin with < and end with >, and a beginning <tag> must have a matching ending </tag>. For example:

<tagname>Some text</tagname>
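To see the stricter rules in action, here is a small Python sketch. It uses Python’s standard xml.etree.ElementTree parser purely for illustration (InDesign isn’t involved, and the helper name is_well_formed is my own) to check whether a snippet is well-formed XML:

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as a well-formed XML document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# A user-defined tag with a matching closing tag is fine.
print(is_well_formed("<tagname>Some text</tagname>"))  # True
# A missing closing tag is an error in XML.
print(is_well_formed("<tagname>Some text"))  # False
# Tag names are case-sensitive, so </TagName> does not close <tagname>.
print(is_well_formed("<tagname>Some text</TagName>"))  # False
```

A browser will often forgive the second and third snippets in HTML, but an XML parser rejects them outright.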

The Project Template and Data

For my project, here’s what the template and XML looked like.

Sample Template PDF

Sample XML Data

So the first big question is… how can this data be imported into InDesign? For starters, the XML and the template don’t match. The template has only four columns under a Category head, but each record in the XML file has 12 tags and no Category.
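Before importing anything, it can help to get a quick summary of how the XML is organized. The Python sketch below is not part of the InDesign workflow; it assumes a hypothetical, trimmed-down sample that uses only tag names mentioned in this article (the real file has 12 tags per record, and the placeholder values are made up), and reports the root tag, the number of records, and the tags inside the first record:

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down stand-in for the job feed. The real file has
# 12 tags per <Job>; only tag names mentioned in the article are used here.
SAMPLE = """<Jobs>
  <Job>
    <JobID>1001</JobID>
    <Title>Placeholder title 1</Title>
    <LocationDescription>Blackpool, Lancashire</LocationDescription>
    <RecruiterName>Placeholder recruiter 1</RecruiterName>
    <SalaryDescription>Placeholder salary 1</SalaryDescription>
  </Job>
  <Job>
    <JobID>1002</JobID>
    <Title>Placeholder title 2</Title>
    <LocationDescription>Placeholder location 2</LocationDescription>
    <RecruiterName>Placeholder recruiter 2</RecruiterName>
    <SalaryDescription>Placeholder salary 2</SalaryDescription>
  </Job>
</Jobs>"""


def summarize(xml_text):
    """Return (root tag, number of records, child tag names of the first record)."""
    root = ET.fromstring(xml_text)
    records = list(root)  # direct children of the root, i.e. the <Job> records
    first_tags = [child.tag for child in records[0]] if records else []
    return root.tag, len(records), first_tags


root_tag, count, tags = summarize(SAMPLE)
print(root_tag, count, len(tags))  # Jobs 2 5
print(tags)
```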

XML Process

Here’s what I came up with for a process:

1. Create a new document, and design the layout as per your requirements.

2. Now open the Tags Panel from the Window > Utilities > Tags menu.

3. Also, open the Structure Pane from the View > Structure > Show Structure menu.

4. In the Tags Panel and Structure pane, you’ll see the default Root Tag. Every document must have a single root tag that will encompass all the other tags and data. A root tag doesn’t have to be called ‘Root’. It can have any name. In my example, I have a root tag called Jobs.

5. In the new InDesign document, import the XML either from the File menu or the fly-out menu of the Structure pane. You’ll see an Import XML dialog box, something like this:

6. Select the XML file you want to import, turn on Show XML Import Options and Merge Content, and click Open. You’ll then be greeted with another dialog box full of options. Deselect all of them for now; we’ll deal with them later. Click OK.

7. The XML is imported into the document, but for now it resides only in the Structure pane. All the tags used in the XML are now visible in the Tags panel.

8. In the Structure pane, you’ll see a Jobs tag instead of the root tag, with a gray triangle before it.

9. The triangle indicates that the Jobs tag contains child tags (with or without data). When you twirl open the Jobs tag, it will show its content. In the example below, you’ll see that the main tag contains child tags as well.

Tip: To expand all the tags including child tags, Alt+click/Option+click on the gray triangle.

10. When you expand the <Job> child tag, you’ll see one full record containing subtags. The tag icon with horizontal lines means that the tag contains text.

11. To check, let’s select the <LocationDescription> tag and drag it into our document.

12. In the above example, you’ll see that the text is now imported into our InDesign document, and the tag icon has changed in the Structure pane. The icon now looks like a paragraph with a drop cap ‘T’, which means that this tag is now in use. Also, the tag name <LocationDescription> is underlined, which means that the tag’s content is currently selected.

13. You can verify the content in the XML file. Go to the first record in the XML file and look for the <LocationDescription> tag; you’ll find the text “Blackpool, Lancashire”.
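If you’d rather check the value with a script than scroll through the file, a short Python lookup does the same job. This is a sketch against a hypothetical one-record sample; only the “Blackpool, Lancashire” text comes from the actual data:

```python
import xml.etree.ElementTree as ET

# Hypothetical one-record sample; only the location text comes from the article.
SAMPLE = """<Jobs>
  <Job>
    <JobID>1001</JobID>
    <LocationDescription>Blackpool, Lancashire</LocationDescription>
  </Job>
</Jobs>"""

root = ET.fromstring(SAMPLE)
# find()/findtext() return the FIRST match, i.e. the first record's value.
location = root.findtext("Job/LocationDescription")
print(location)  # Blackpool, Lancashire
```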

14. This drag-and-drop method gives us the flexibility to import particular text from the XML into InDesign. Similarly, to see the entire content of a record, simply drag and drop the <Job> child tag into your document.

15. If you look at the same text in Story Editor, you’ll see the tags as well. In fact, using Story Editor is a best practice when you’re working with tagged data, since it helps you rearrange the tagged content in the layout without messing up the tags.

16. The data looks okay, but it is not in the format we’re looking for. There are line breaks, spaces, tabs, etc., which need to be cleaned up when we import the XML. Let’s clean this up by removing the unwanted text and/or spaces.
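Where do those stray returns and tabs come from? Usually from the indentation in the XML file itself: to a parser, the whitespace between tags is text like any other. The Python sketch below (a hypothetical, indented sample; the helper name is my own) lists every text run that contains nothing but whitespace:

```python
import xml.etree.ElementTree as ET

# Hypothetical indented sample: the tabs and newlines are only there for readability.
SAMPLE = "<Jobs>\n\t<Job>\n\t\t<JobID>1001</JobID>\n\t</Job>\n</Jobs>"


def whitespace_only_text(xml_text):
    """List (tag, 'text' or 'tail', value) for every whitespace-only text run."""
    root = ET.fromstring(xml_text)
    found = []
    for elem in root.iter():
        # .text is the text right after the opening tag;
        # .tail is the text right after the closing tag.
        for kind, value in (("text", elem.text), ("tail", elem.tail)):
            if value is not None and value.strip() == "":
                found.append((elem.tag, kind, value))
    return found


for item in whitespace_only_text(SAMPLE):
    print(item)
```

Every one of those runs is a candidate for the unwanted line breaks and tabs that show up when the XML is dragged into the layout.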

17. If you look back at the template, you’ll notice that we need only four tags to be imported: Job ID, Recruiter, Job Type, and Salary. The tags in the XML that we’re going to use are <JobID>, <RecruiterName>, <Title>, and <SalaryDescription>. We also need to rearrange these by moving <RecruiterName> after <JobID>.
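The same comparison can be done in a few lines of Python if you want a checklist before editing anything. This sketch assumes a hypothetical one-record sample with the tags in the order implied above (<Title> before <RecruiterName>); it reports which tags the template doesn’t use, whether any needed tag is missing, and the order the needed tags appear in the file:

```python
import xml.etree.ElementTree as ET

# Template columns, in template order: Job ID, Recruiter, Job Type, Salary.
NEEDED = ["JobID", "RecruiterName", "Title", "SalaryDescription"]

# Hypothetical one-record sample (the real records carry 12 tags).
SAMPLE = """<Jobs><Job>
  <JobID>1001</JobID><Title>Placeholder title</Title>
  <LocationDescription>Blackpool, Lancashire</LocationDescription>
  <RecruiterName>Placeholder recruiter</RecruiterName>
  <SalaryDescription>Placeholder salary</SalaryDescription>
</Job></Jobs>"""


def compare_tags(xml_text, needed=NEEDED):
    """Return (tags to drop, needed tags missing, needed tags in file order)."""
    first = ET.fromstring(xml_text).find("Job")
    present = [child.tag for child in first]
    unused = [t for t in present if t not in needed]
    missing = [t for t in needed if t not in present]
    file_order = [t for t in present if t in needed]
    return unused, missing, file_order


unused, missing, file_order = compare_tags(SAMPLE)
print(unused)      # ['LocationDescription']
print(missing)     # []
print(file_order)  # ['JobID', 'Title', 'RecruiterName', 'SalaryDescription']
```

Comparing file_order with NEEDED shows exactly the one move described above: <RecruiterName> has to come before <Title>.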

18. Let’s go back to the step where we imported the XML into a new InDesign document with all the XML Import Options turned off.

19. Now delete all the tags from the second tag onward. Select the second <Job> tag, scroll to the bottom until you see the last <Job> tag, and Shift-click it. Once you have the selection, click the Trash icon at the top right of the Structure pane.

20. This will now leave you with only one Tag / Record within the root tag.
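If you’d rather keep a one-record sample file on disk instead of trimming the structure inside InDesign, the same trim can be scripted. This is a minimal Python sketch on a hypothetical three-record sample; it is an optional alternative, not the method used in this walkthrough:

```python
import xml.etree.ElementTree as ET

# Hypothetical three-record sample.
SAMPLE = "<Jobs><Job><JobID>1</JobID></Job><Job><JobID>2</JobID></Job><Job><JobID>3</JobID></Job></Jobs>"


def keep_first_record(xml_text):
    """Return the XML with every <Job> except the first one removed."""
    root = ET.fromstring(xml_text)
    # findall() returns a list, so removing items while looping over it is safe.
    for job in root.findall("Job")[1:]:
        root.remove(job)
    return ET.tostring(root, encoding="unicode")


print(keep_first_record(SAMPLE))  # <Jobs><Job><JobID>1</JobID></Job></Jobs>
```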

21. Now select the <Jobs> (root) tag and drag it into the layout. You can verify which element is selected in the layout by the underline under the <Jobs> (root) tag in the Structure pane. Also, notice that the text is even messier here than the text we dragged previously.

22. Open the text in Story Editor to clean it up. Pay special attention to the tags. Remove all the extra carriage returns, tabs, and spaces. Also, remove all the unwanted tags and move the <RecruiterName> tag before the <Title> tag.

23. After the cleanup, it will look like this, but it still needs some work.

24. Since the output is needed in tabular form, we’ll replace the returns with tabs. The <Job> and </Job> tags mark the start and end of each record.
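To picture what the cleaned-up record looks like as a text stream, here is a Python sketch that builds the same single-record pattern: a tab after each field and a hard return after the last one. The tag names come from the article; the placeholder text (each field simply contains its own tag name) is hypothetical:

```python
import xml.etree.ElementTree as ET

# Template field order (tag names from the article).
FIELDS = ["JobID", "RecruiterName", "Title", "SalaryDescription"]


def build_record_template(fields=FIELDS):
    """Build one <Job> whose fields are tab-separated and end with a return."""
    job = ET.Element("Job")
    for i, name in enumerate(fields):
        child = ET.SubElement(job, name)
        child.text = name  # placeholder text standing in for the real value
        # A tab follows each field; the last field is followed by a hard return instead.
        child.tail = "\n" if i == len(fields) - 1 else "\t"
    return job


job = build_record_template()
print(repr("".join(job.itertext())))
# 'JobID\tRecruiterName\tTitle\tSalaryDescription\n'
```

Because the tabs and the return sit inside <Job>, they belong to the record itself, which is what lets them repeat for every record later on.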

25. Between these tags sits one full record, starting with <JobID> and ending with </SalaryDescription>. Also, notice the return symbol (hard return) after the </SalaryDescription> tag, which ensures that each record ends with a hard return. Without this return, all the text will run together in one single paragraph. The final look will be like this:

26. Now format the text to give it a nice clean look. Formatting can also be done after importing the complete data set, but I prefer to do it first. Format the text as desired. I have applied some minimal formatting here.

27. Once our basic template or text frame is ready to hold data, we’ll import the XML once again to fetch the complete data. Select Import XML from the File menu or the fly-out menu of the Structure pane. In the Import XML dialog box, select the XML file you want to import, check Show XML Import Options and Merge Content, and click Open.

28. You’ll be prompted again with another dialog box full of options. This time we’ll select some of the options, as shown in the image (described below). Click OK to import the XML.

Create Link: This option creates a link, which allows us to update the text without re-importing if any changes are made to the original XML file. This is similar to an image link.

Clone repeating text elements: In our template, we formatted only one record and applied tags to it. This option duplicates that one formatted instance for every record in the XML file.

Only import elements that match existing structure: We have used only 4 tags in our template, but there are 12 tags in the XML file. This option ignores all the unused tags.

Do not import contents of whitespace-only elements: In the beginning, when we imported a record, we saw that it also brought in spaces, tabs, and returns along with the text. This option ignores all those extra spaces and imports only the text into the desired layout.
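As a rough mental model of what “Clone repeating text elements” and “Only import elements that match existing structure” do together, here is a Python analogy. It is only an analogy under my own assumptions, not how InDesign implements these options: the one-record template is copied once per <Job>, only the tags that exist in the template are filled in, and the extra tag in the hypothetical data is ignored:

```python
import copy
import xml.etree.ElementTree as ET

# One formatted record, as in the template: tab-separated fields, return at the end.
TEMPLATE = "<Job><JobID/>\t<RecruiterName/>\t<Title/>\t<SalaryDescription/>\n</Job>"

# Hypothetical data with an extra tag the template does not use.
DATA = """<Jobs>
  <Job><JobID>1</JobID><Title>T1</Title><LocationDescription>L1</LocationDescription>
       <RecruiterName>R1</RecruiterName><SalaryDescription>S1</SalaryDescription></Job>
  <Job><JobID>2</JobID><Title>T2</Title><LocationDescription>L2</LocationDescription>
       <RecruiterName>R2</RecruiterName><SalaryDescription>S2</SalaryDescription></Job>
</Jobs>"""


def merge_records(template_xml, data_xml):
    """Clone the template once per <Job>, filling only the tags the template contains."""
    template = ET.fromstring(template_xml)
    out = ET.Element("Jobs")
    for record in ET.fromstring(data_xml).findall("Job"):
        clone = copy.deepcopy(template)  # like "clone repeating text elements"
        for field in clone:  # like "only import elements that match existing structure"
            field.text = (record.findtext(field.tag) or "").strip()
        out.append(clone)
    return out


print(repr("".join(merge_records(TEMPLATE, DATA).itertext())))
# '1\tR1\tT1\tS1\n2\tR2\tT2\tS2\n'
```

Note that the field order comes from the template, not from the data, so <RecruiterName> lands in the second column even though it comes later in each XML record.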

29. All the desired data is imported into the document with the template’s formatting applied.

30. If you open the tags in the Structure pane, you’ll see that every record now contains only 4 tags instead of 12.

31. If you look at the Links panel, you’ll see the name of the XML file, which means that any changes made to the XML file can be updated instantly in the InDesign file, without re-importing.

32. Let’s try this by making a small change to the XML file. XML can be edited in any text editor; I have used Adobe ExtendScript Toolkit here. In the third record (line three of the InDesign file), I have changed the title from “Agricultural Technician” to “Farming Technician”.
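The same edit can also be scripted, which is handy when the change has to be repeated. The Python sketch below uses a hypothetical three-record sample (only the two titles come from the article) and replaces the <Title> of the third record:

```python
import xml.etree.ElementTree as ET

# Hypothetical three-record sample; only the third title comes from the article.
SAMPLE = """<Jobs>
  <Job><JobID>1</JobID><Title>Title 1</Title></Job>
  <Job><JobID>2</JobID><Title>Title 2</Title></Job>
  <Job><JobID>3</JobID><Title>Agricultural Technician</Title></Job>
</Jobs>"""


def set_title(xml_text, record_index, new_title):
    """Return the XML with the <Title> of the given record (0-based) replaced."""
    root = ET.fromstring(xml_text)
    title = root.findall("Job")[record_index].find("Title")
    title.text = new_title
    return ET.tostring(root, encoding="unicode")


updated = set_title(SAMPLE, 2, "Farming Technician")  # index 2 = third record
print("Farming Technician" in updated)  # True
```

To save the change back to the linked file, you could write the tree out with ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True). Be aware that ElementTree rewrites the whole file and drops XML comments by default, so keep a backup of the original.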

33. As soon as I save the XML file, the Links panel shows that the file has been modified.

34. After updating the link in the Links panel, the text is updated in the InDesign document.

Off to a Good Start

Since I’m still a novice with XML, this is just a basic workflow, but it was good enough to get the job done. I didn’t attempt to use features like XSLT (which is beyond my expertise) to transform the data. If you have any thoughts or advice on how to improve this workflow, please share them in the comments.

Masood Ahmad


Masood Ahmad has been working with InDesign since v2.0, mostly with the Middle East version. He started his career as a Linguistic Operator in 1996 and is presently working as an Associate Service Delivery Manager at Express KCS, India. Client communication, understanding requirements, distributing jobs, and monitoring service delivery are part of his daily schedule. He assists his team with their work and tries to teach them the best and most efficient ways of working. He is especially interested in giving training sessions inside and outside the company. With his troubleshooting skills, he tries to deal with all sorts of work. Visit http://indesignsecrets.com/author/masoodahmad to see his articles or email him at masood.designs@gmail.com

11 Comments on “Getting Started With XML in InDesign”

  1. Hello. I’m glad you are writing about the XML capabilities in InDesign. I have good experience using the Structure panel in InDesign to create HTML structure, which I then use for creating EPUB 3. I export an XML file from InDesign and rename it as XHTML. Then I pack it into a ZIP file together with my own CSS, JS, and SVG files. Finally, I rename the file to EPUB. I have full control over the HTML structure I have created. The EPUB is fully responsive thanks to the styles in CSS and the SVG images.

    • Hi Robert,

      Can you please share your experience with this as a demo, so that I can try the same?
      Thanks

      • Hi Amit,
        I will be glad to share my experience here. I create the entire HTML structure manually in the InDesign Structure panel and export it to XML. I do not use XML to import into InDesign, only for export. I use a predefined structure of text frames with tags and attributes in InDesign.
        For the HTML to work well, you need finished global styles in CSS, JavaScript, etc. The root element in InDesign will not be “root” but “html”. Inside the “html” tag I then put the “head” and “body” tags… Then all the attributes (class, id, src, and WAI-ARIA attributes!!! for screen readers) are defined. It looks complicated, but a template can be created for many things. It is relatively easy to link every chapter with global styles to your own CSS.
        If I want to try different styles, I don’t have to change the global styles in the CSS file right away. Instead, I create a text frame in InDesign that is not meant for the printed publication; it sits outside the page.
        This text frame is tagged “style”. I place the “style” tag inside the “head” tag and add the “type” attribute… This way I can override the CSS properties that I have in the global CSS file. All style changes are saved in the “style” tag in InDesign.
        When I do not like the current styles in HTML, I can change these styles every time I export from InDesign.

        Now I have ready-made templates for the workbook for Physics classes for primary school. If you are still interested, please write to my mail janakrobert@seznam.cz
        Wishing you a nice day
        Robert

  2. Be extremely careful with removing the no break codes. I had a job last year and the first part of the file had the no break code either preceded or followed by a regular space. So I deleted all the no break codes.

    Later on, I discovered that the no break code had also been used for regular spaces. And I ended up with a lot of sentences with no space between words.

    Didn’t realize until after I had tagged the file and had a thousand pages done in InDesign (book was 2,000 pages).

    I normally take the XML file the customer gives us (once a year). It’s from their website and covers what their university is offering that year: the instruction, course descriptions, etc.

    And I’ve got to turn that mess into a book.

    • Thanks Dwayne for sharing your experience, I’ll take care of it too.

      Cleanup on 2,000 pages, oh my God! I hope I never have to repeat that.

      • Yeah, that wouldn’t be fun. Another reason I screwed up was that I opened a second file in TextWrangler. Well, that converts XML to HTML, and HTML does not recognize the no-break space, so it deletes it.

  3. Masood, I think you are wonderful to have posted this article. Well done. Practically all of my clients ask for XML but the documentation floating around, even from Adobe, is either lightweight or, if detailed, disjointed. It is so refreshing to read something that investigates a real workflow.

    • Wow! That’s a real relief. Thanks, Alistair, and I’m glad you found it useful. I was a little worried when submitting this to the IS team, but your kind words have given me confidence.

      The idea behind writing about XML is that when I got this project, I asked colleagues who had worked with it before. Everyone told me about something called a DTD, but when I asked them to show or teach me, they drew a blank. The information I collected from the net was also not very useful; as you said, it was “lightweight or, if detailed, disjointed”.

      Please do share your experience on XML with us so that we can also improve our knowledge.

  4. Hi Masood, if you want to be considered the king of XML, please explain how to insert a hyperlink into XML associated with a label. For example, the text “click here” with the link from our “ApplyURL” tag behind it. When we export the publication to PDF, the text “click here” must carry the link rather than showing http://www….
    I hope I explained it clearly.
    Can you help us?
    Thanks in advance
