Tools of Change Notes: XML in Practice

Mike’s notes from the XML in Practice session at the O’Reilly Tools of Change Conference 2/9/09

This was the second half of a day-long set of XML tutorials. In the morning, there was an Introduction to XML for Publishers talk that I did not attend. If you’re interested, you can find the description of that session and the presenter’s slides here.

The descriptions, slides, handouts, and working files for XML in Practice can be found here.

Speaker 1: Bill Kasdorf (Apex Content Solutions)

Bill spoke about XML models for books. He gave quick-hit looks at several XML schemas for describing book content: ISO 12083, TEI, NLM, DocBook, DITA, and DTBook. He described how there is often no easily identifiable target schema for most publishers because “books are messy,” meaning they are often more complex than they seem and require some degree of customization, no matter what schema you choose for your XML workflow. I wholeheartedly agree. Here’s my 2 cents on two of the most popular schemas for books, DocBook and DITA:

The advantage of working with off-the-shelf schemas is that they are widely understood and a number of XML tools support them out-of-the-box. If you can avoid customization, you’ll save time, work, and money. But there is no magic. Books are messy.

DocBook was designed to describe technical documentation. Though it describes narrative documents (books, of course), it may not (probably won’t) fit your content precisely, and you’ll either have to make do or end up doing other sorts of customization (and paying for it somewhere else along the line). Still, it has been around a long time (since long before there was such a thing as XML). It wouldn’t have survived and thrived if it hadn’t been very useful for publishers.

DITA (Darwin Information Typing Architecture) was also designed to describe technical documentation, but in a very different way. It describes independent chunks of content called topics, linked together to make documents. It’s relatively new and has become very popular very quickly. Lots of tools support DITA out of the box. You just press the DITA button and start writing. It is wonderfully efficient for creating those chunks. But it is very hard to bring existing narrative content into DITA (or to make nicely-flowing narrative content out of DITA), because it is so granular and modular. One of the keys to choosing a schema is finding one not only with appropriate tags for your content, but also with the right level of granularity.
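To make the granularity point a bit more concrete, here is a minimal sketch in Python. The three XML fragments are simplified illustrations I made up (the titles, the ids, and the .dita file names are all hypothetical), and they are not validated against the real DocBook or DITA DTDs. The idea is only to show the shape of each approach: DocBook nests chapters inside one book document, while DITA keeps each topic as its own chunk and uses a separate map to assemble them.

```python
import xml.etree.ElementTree as ET

# DocBook-style (simplified): one document, with chapters nested in the book.
DOCBOOK = """<book>
  <title>Field Guide</title>
  <chapter>
    <title>Getting Started</title>
    <para>Narrative text flows from one paragraph to the next.</para>
    <para>The chapter, not the paragraph, is the unit of structure.</para>
  </chapter>
</book>"""

# DITA-style (simplified): each topic is a standalone chunk of content...
DITA_TOPIC = """<topic id="getting-started">
  <title>Getting Started</title>
  <body><p>A self-contained chunk that can be reused elsewhere.</p></body>
</topic>"""

# ...and a separate map links topics together to make a document.
# The href values are hypothetical file names.
DITA_MAP = """<map>
  <title>Field Guide</title>
  <topicref href="getting-started.dita"/>
  <topicref href="next-steps.dita"/>
</map>"""


def tag_counts(xml_text):
    """Parse an XML string and count how often each element name occurs."""
    counts = {}
    for element in ET.fromstring(xml_text).iter():
        counts[element.tag] = counts.get(element.tag, 0) + 1
    return counts


book = tag_counts(DOCBOOK)
topic = tag_counts(DITA_TOPIC)
dita_map = tag_counts(DITA_MAP)

for label, counts in (("DocBook book", book), ("DITA topic", topic), ("DITA map", dita_map)):
    print(label, counts)
```

Notice that the DITA map holds no content of its own, only references. That is what makes reuse easy, and also what makes it hard to pour a long, flowing narrative into topics.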

If you want a wider view of the issues involved with creating an XML publishing workflow, you might want to check out the ongoing series of articles on that topic by Eric Damitz, one of my partners on Publicious.net. OK, now back to the O’Reilly show.

Bill Kasdorf also spoke a little about epub. epub is an XML standard for producing eBooks. It is composed of three open standards, the Open Publication Structure (OPS), Open Packaging Format (OPF) and Open Container Format (OCF), all produced by the IDPF.
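For the curious, here is a rough sketch of how the container part (OCF) fits together, using Python's standard zipfile module. An epub is a zip archive whose first entry is an uncompressed file named mimetype, plus a META-INF/container.xml that points to the package (OPF) file. The path OEBPS/content.opf, the chapter file name, and the function name build_epub_skeleton are my own assumptions for illustration, and the OPF and chapter bodies are placeholders, so the result is a skeleton and not a valid, readable eBook.

```python
import io
import zipfile

# container.xml tells a reading system where the package (OPF) file lives.
# The full-path value is an assumed location chosen for this sketch.
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def build_epub_skeleton(opf_text, chapter_xhtml):
    """Return the bytes of a zip archive laid out like an epub container."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # The mimetype entry goes first and is stored without compression.
        archive.writestr("mimetype", "application/epub+zip",
                         compress_type=zipfile.ZIP_STORED)
        archive.writestr("META-INF/container.xml", CONTAINER_XML,
                         compress_type=zipfile.ZIP_DEFLATED)
        # Placeholder package file and content (hypothetical names and bodies).
        archive.writestr("OEBPS/content.opf", opf_text,
                         compress_type=zipfile.ZIP_DEFLATED)
        archive.writestr("OEBPS/chapter1.xhtml", chapter_xhtml,
                         compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()


epub_bytes = build_epub_skeleton(
    "<!-- placeholder: a real OPF package document goes here -->",
    "<!-- placeholder: real XHTML content goes here -->",
)

with zipfile.ZipFile(io.BytesIO(epub_bytes)) as check:
    entry_names = [info.filename for info in check.infolist()]
    first_entry = check.infolist()[0]
    mimetype_value = check.read("mimetype")

print(entry_names)
```

Renaming a finished epub to .zip and opening it is a quick way to see the same layout in a real file.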

IDPF maintains user forums where you can ask questions and learn more about epub.

Font rights are an issue. (I believe all fonts are born with certain inalienable rights: life, ligatures, and the pursuit of happiness. But I digress.) IDPF is working on a font-mangling spec for epub to prevent pirates (Arrr!) from extracting fonts from epub files.

Speakers 2 & 3: Bob Kelly, The American Physical Society and John Gardner, ViewPlus Technologies, Inc.

Their topic was Universally Usable Mainstream Online Publishing. They spoke about the challenges and benefits of working in XML to make content accessible. Here are a few interesting technologies they referenced.

ONIX is an XML standard for marketing and distribution information about published material.

Fire Vox is a free, open source talking browser extension for Firefox. Install it and it acts as a screen reader for content in Firefox.

The IVEO learning system by ViewPlus enables you to output an SVG file in three sensory modalities: sight, sound, and touch (utilizing an embosser). If this interests you, hang on to your copy of InDesign CS3, since CS4 does not export SVG.

Speaker 4: Norman Walsh (Mark Logic Co.)

Norm’s topic was “Where is the New When.” He spoke about the possibilities of adding geospatial information to content. He cited new opportunities for delivering (pushing) content to people with devices (basically phones with GPS) that know where they are. The example: you can get a message that the bar down the block has half-price drinks right now. You can also seek out geospatial content. So you can search for a sushi restaurant nearby that has a movie theater and an ATM within a certain radius. Clearly we’re talking Big City stuff, but you get the point.

What is meant by “where is the new when?” One example is that instead of having our search results ordered by most recent first, we can have them ordered by distance from our location.
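Here is a small sketch of that idea in Python, not anything Norm showed. It sorts the same set of results two ways: newest first (the “when”) and nearest first (the “where”). The restaurant names, coordinates, and dates are made-up sample data, and distance is estimated with the haversine great-circle formula on a spherical Earth, which is one reasonable choice among several.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def newest_first(results):
    """The 'when' ordering: most recently published results first."""
    return sorted(results, key=lambda r: r["published"], reverse=True)


def nearest_first(results, here):
    """The 'where' ordering: results closest to our location first."""
    lat, lon = here
    return sorted(results, key=lambda r: haversine_km(lat, lon, r["lat"], r["lon"]))


# Hypothetical search results (names, positions, and dates are invented).
results = [
    {"name": "Sushi A", "lat": 40.7614, "lon": -73.9776, "published": "2009-02-09"},
    {"name": "Sushi B", "lat": 40.7306, "lon": -73.9866, "published": "2009-02-01"},
    {"name": "Sushi C", "lat": 40.7590, "lon": -73.9845, "published": "2009-01-15"},
]
here = (40.7580, -73.9855)  # assumed current location of the reader

by_when = [r["name"] for r in newest_first(results)]
by_where = [r["name"] for r in nearest_first(results, here)]
print("newest first: ", by_when)
print("nearest first:", by_where)
```

The content is the same in both cases. Only the sort key changes, which is why the geospatial data has to be in the content before anyone can search on it.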

Some digital cameras have GPS chips to tag the images taken with them. Norm showed a map that aggregated data from hundreds of geotagged Photoshop images. It looked like one of those “Earth at night” images. The datapoints made a perfect outline of the US, with cities, and even major highways visible (apparently people take a lot of pictures as they drive down the road). Not sure exactly how this relates to publishing books, magazines, journals, etc, but it was neat.

Speaker 5: Marisa DeMeglio (DAISY Consortium)

In XML circles, DAISY stands for Digital Accessible Information SYstem. It is the stage name for the NISO Z39.86 standard. (You can see why they needed to come up with something catchy like “DAISY”). It is a standard for making digital content accessible to blind, visually impaired, print-disabled, and learning-disabled people. When you follow the DAISY standard, you produce a Digital Talking Book (dtbook). The dtbook DTD is the set of tags and rules for creating a valid dtbook XML file. A subset of the DAISY format comprises the NIMAS (National Instructional Material Accessibility Standard) for K-12 core student-facing content. There are several tools worth knowing about for working with DAISY.

The Save as DAISY add-in for Microsoft Word allows you to save content from Word XP, Word 2003, and Word 2007 as DAISY output.

Odt2dtbook is an OpenOffice.org Writer extension that allows you to save content from OpenOffice documents as DAISY output.

The DAISY Pipeline is a very cool, open source package for converting files to and from DTBook format.

Design Science makes a product called MathDaisy (currently in beta, PC-only : p), which handles the math content for either the DAISY Pipeline or the Save As DAISY add-in for Word.

And some obscure application called InDesign CS4 can export dtbook XML in an epub document (via the Export to Digital Editions feature).

Next up: Google Book Search, Adobe, and Quark.


13 Comments on “Tools of Change Notes: XML in Practice”

  1. Ah I’m scared… so many things to do with XML scare me!

    Ok, I’ve read a lot of publicious.net for Eriks write ups and they are very good indeed. I’ve read a lot about XML workflow and how and when to implement them. However, I am waiting in earnest for a “how-to” post for XML. I suppose there isn’t an easy way to do a “how-to” though? Well how about sample XML workflows and files with some easy 1 – 2 – 3 steps ? Am I asking too much? It maybe a little too much, but I’ve never seen or worked with XML, I’d love to see it working and how it works and how to get it to work, even at a basic level.

    Mike, your posts on the conferences are very good indeed. Many thanks for taking the time to not only attend but to write up reviews on everything you’re seeing and doing. It’s very much appreciated.

    XML still terrifies me though. So much to learn.

  2. I hope the XML for InDesign book will get an update this year (the previous version was fine but lacked working with tables) and will give even more day-to-day use examples…

  3. Eugene-

    No doubt Eric will get to more specifics as the series continues. Fear not the XML. It wants to be your friend. I think your comment captures the whole thing: what drives a lot of people’s terror is precisely the fact that there aren’t many good, clear how-tos available.

    I think there are a lot of reasons for that. Here’s some.

    1. It’s hard to write something that applies to a large audience. There is no magic bullet. InDesign can only go so far on its own. To really close the loop and get the job done, every solution I’ve seen ends up being a custom solution. And some of them don’t reliably work, when you get right down to it. Which brings me to…

    2. Because it’s so foreign and complex, there’s a lot of value in that information. Consultants and developers are the gatekeepers on XML. That’s how they make their living, so they won’t give it away for free. Also, some specifics are valuable business assets that folks can’t share without getting in trouble.

    3. Uh, I forgot what 3 was.

    4. IMHO, the companies who make the software and solutions for XML-based publishing workflows have done a lousy job of reaching out to traditional publishing people, and teaching the specifics. Some expensive, insanely complex applications don’t even have user guides. Just some meager HTML help that assumes you already know what a namespace is. So you go on the forums, and they’re posting code you don’t understand. Then you Google. Then you give up.

    5. This stuff takes a long time to explain and to comprehend. There’s much jargon to deal with just to cover the foundation topics. But in the end, it’s still just text. Hang in there.

  4. Well Mike, thanks for the info, I’m sure most of what you’ve said crossed my mind before, and I’m sure other readers will value it as much as I do, so thank you for that.

    I’ll be sure to keep on eye on publicious.net for more of Eric’s postings (sorry for the spelling error in his name last time). I am looking forward to getting into actually using XML and understanding it.

  5. Eugene.

    In my travels through the internets, I’ve discovered a few things…A few things I’m not happy with.

    1. No information on XML Workflow for the layout/typesetter…

    2. No step by step howto guides…

    3. The only examples are of Database to Indesign Catalogues, or business card creation. Nothing on Books.

    4. They only describe what to do in InDesign.

    5. The most important part of the XML Workflow, is the Code…They NEVER EVER explain how that’s generated from word.

    It’s always assumed you already have the XML text file already done. Or have a degree in Computer Science.

    The XML website (http://www.xml.com/) is confusing and crammed full of I don’t know what.

    The W3Schools website (http://www.w3schools.com/xml/default.asp) is worse… You don’t know what’s an ad and what’s a lesson…

    They really IMHO make it impossible to learn anything for people with limited time…

    Unless of course you work in India, and have 200 people coding your one word file…

    None of this makes sense…

    I guess if you had the money you could fly from Ireland to the States to attend a conference and walk away thinking they talked a lot about nothing relevant to you…

    Or I could fly from Australia to the States.

    The “XML Gurus” are not making it easy…

    [*as I cower back to the internets looking for a solution*]

    Marcus Stringer
    Midland Typesetters
    Australia.

    Excuse any spelling errors as I do not check in forums…

  6. Thanks for that Marcus, it more or less is the same this end. I’ve spent nearly €3,000 learning xhtml, css, flash, html, fireworks, dreamweaver and more web apps. There are no XML to InDesign or vice versa courses, and I find as Marcus did, very little information out there on “how-to” for XML and InDesign. It can be frustrating, as you can imagine.

    I think my €3,000 would have been better spent flying to the States for the seminars – definitely next time, I hope.

  7. There are one stop shops out there for solutions to the XML Workflow…

    I am in talks presently with Typefi…
    (thanks to the guys here for introducing me to Typefi)

    But these solutions are really really expensive…

    And…the problem I’ve found is that no-one really wants to get into the nitty gritty, of a step by step solution, they seem to just point you in the direction of an XML book or course…

    I would do it… if I knew anything about it… maybe I should keep a diary… It’d be a horror story…

    Step One: Simply Tag your word file using the XML codes, which you have already learned from…..???????????????

    Step Two: Simply Create a DTD

    Step Three: Import into InDesign Template.

    I’m sorry, on re-reading this, It sounds like I’m really bitter about this whole process…
    Maybe I am… Maybe it’s because there is nothing out there which helps me, and thus I will lose business to India who can do it on the cheap. Which means I will have to sack staff because I won’t be able to financially keep them because the work isn’t there…

    All down to not finding any help on XML workflows.

    Very dramatic… This won’t happen, as my business is large enough to pay for a “One Stop Shop”…

    But what about the people who can’t…

  8. Everyone seems to want a simple how-to book to show them the secrets to XML in InDesign. I guess you haven’t seen my book, “A Designer’s Guide to Adobe InDesign and XML.” (Adobe Press 2008) http://xmlfordesigners.com/designerguide.html

    Four years ago I was desperate to learn XML and searched frantically for answers with little luck. There were few resources. So, I wrote my own. From the outside XML seems hard and esoteric. But it is actually quite simple and straightforward. If you aren’t afraid of HTML then you shouldn’t be afraid of XML. One marks up text for display (HTML), the other marks up text for content (XML).

    InDesign does a great job with XML both on import and export. Sure, there are things to be desired. If you are desperate for every feature of XML then use Framemaker. It has XML down as a science. But for most tasks InDesign is all you need.

    I’m surprised by all the complaints about DocBook. I have an entire chapter in my book regarding it. It is a very elaborate DTD that has gobs of built-in flexibility and can adapt to most existing structures. You’ll have to show me a layout that can’t adapt.

    On the other hand, if you know you are creating content that will be structured in the end, perhaps you could design it to the standard in the first place. It takes only a little time to familiarize the production and design team with legal structures and methods for adapting elements to a book so they work within the guidelines. One simple way we use is to create complex illustrations outside of InDesign, so that the captions, headings and any call-outs are incorporated into the graphic itself and not in the layout.

    CS3 and CS4 have much better XML features than CS2 or CS. You need to upgrade if you are working with XML.

    In the last two years I have spoken at several InDesign and Creative Suite Conferences on XML. I teach XML with the sensibilities of a designer, not a programmer or coder. I will be in Orlando in May at the InDesign User Group demonstrating XML and at the InDesign Conference in November.

    If you need help sooner, I am also available for on-site training or seminars. Or you could just buy the book. It teaches you almost everything you need to know to get up and started.

  9. Marcus-

    I’m glad you’re talking to Typefi. Their product has a lot going for it, and they have come up with some truly ingenious methods and workarounds for dealing with InDesign’s XML shortcomings.

    If you need workflow & project management too, you might check out something like PageSeeder.
    http://www.pageseeder.com/

    I don’t know if DITA’s appropriate for your work, but if it is, a total end-to-end DITA solution that operates in the browser is DocZone.
    http://www.doczone.com/

    But like you say, none of these is cheap. Plus, once you get in, it can be hard to get out if it’s not working. I think the truth is that people who can’t engage a solution on the scale of Typefi are in for a rough ride.

  10. Eugene,
    If you can organize a meeting of the InDesign users’ group in Dublin, I’d be happy to show you how I’ve been using XML to automate the production of cattle catalogues (yes, I’m in Ireland!).

    I strongly recommend Jim’s book, but should add that it came out a little too late for me. I had to learn the hard way with CS2 a few years ago.

  11. Marcus, I’m not getting what you are trying to do with XML. Yes, you can tag text in Word (Windows-only) but to what effect? What are you trying to do with the XML?

    InDesign uses XML in three basic ways.

    1. On Import – Usually to create dynamic documents or to simplify the formatting process. (Map Tags to Styles).

    2. To manage assets – Import images or text based on a content management system (CMS).

    3. On export – Usually to create content for the CMS systems or directly for the Web or other InDesign documents. Many are now using the XML for electronic books. Adding the XML structure can be done manually in InDesign or by using Map Styles to Tags. Elaborate structures can be created using Nested styles to help minimize the manual process.

    If you need to learn some basics on XML, check out w3schools.com. They have wonderful free tutorials that teach everything you need to know about XML, DTD, XSLT and a host of other Web-based technologies. It was very helpful to me.

    For example, it is really helpful if you learn how to read a DTD before you apply the XML structure so you can do it correctly.

    If you load the DTD in InDesign it automatically loads the XML tag list and gives you a method to proof your structure. The error messages are kind of cryptic but it’s better than nothing.

    I hope this helps.
