
eBook Production Backwards: From EPUB to InDesign


[This is a guest post by Nick Barreto, who makes ebooks and other digital products at Quercus Books in London. He’s a fellow denizen of the #eprdctn Twitter interest group (ebook design and production people from around the world) and originally titled this post “#eprdctn Backwards.”]

The Problem

Recently I had an interesting conundrum. We released a title as an ebook which was due to be published in print in a few months’ time. It turned out that the files we had for this title were old image PDFs, and the quality really left something to be desired. We couldn’t exactly print it as it was, and rekeying the entire thing would be time-consuming and expensive.

We did, however, have perfectly good ebook assets for this title, which we had made using OCR and some careful QA (quality assurance). This is probably something of a rarity, but it will no doubt become more and more common as backlist titles continue to be published as ebooks. At the very worst, we wouldn’t need to rekey because we had the text in a digital format. The problem, however, is that EPUB is a format InDesign can create (with a little help), but isn’t able to import.

I knew someone must have encountered a similar issue before, and that there is a wealth of InDesign plugins that extend its functionality, so I went digging online.

The Solution: ickmull

Turns out, I got lucky: I came across ickmull. Ickmull was created to let you import XHTML files into InDesign. Because the text of an EPUB is stored as a collection of XHTML files, this means it works for EPUB, too! I’ll explain how it works, then go through the actual process of getting it done.

Ickmull is actually an XSL Transformation, which is a way to manipulate XML files. XHTML is an XML-compliant version of HTML, so XSL works on these files. XHTML is also the required format for HTML files in an EPUB. What ickmull does is transform your XHTML files into an ICML file, which you can import into InDesign.
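To make the idea of an XSL transformation concrete, here is a toy example run with xsltproc, the same command-line tool used later in this post. It is not ickmull and it does not produce real ICML. The file names toy.xsl and sample.xhtml and the bracketed output format are invented for illustration. The stylesheet reads each XHTML paragraph and writes it out as a line tagged with its class name, which is loosely the kind of mapping a real XHTML-to-ICML stylesheet has to perform.

```shell
#!/bin/bash
# Toy illustration of an XSL transformation (NOT ickmull, NOT real ICML).
# Creates a tiny XHTML file and a tiny stylesheet in a temp folder, then runs xsltproc.
set -euo pipefail

workdir="$(mktemp -d)"
cd "$workdir"

# A minimal XHTML document with two classed paragraphs (hypothetical content).
cat > sample.xhtml <<'EOF'
<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <p class="chapter-title">Chapter One</p>
    <p class="body-text">It was a dark and stormy night.</p>
  </body>
</html>
EOF

# A toy stylesheet: each XHTML <p> becomes one line of text, prefixed by its class.
cat > toy.xsl <<'EOF'
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:h="http://www.w3.org/1999/xhtml">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:for-each select="//h:p">
      <xsl:value-of select="concat('[', @class, '] ', normalize-space(.))"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
EOF

# Apply the stylesheet to the XHTML file and print the result.
xsltproc toy.xsl sample.xhtml
```

The real ickmull stylesheet is far more involved, but the principle is the same: the XHTML is the input, the stylesheet holds the mapping rules, and xsltproc applies one to the other.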

ICML is the native file format for Adobe InCopy documents.  When used in conjunction with InDesign, InCopy lets writers, editors and designers collaborate on files without stepping on each other’s toes. It’s got many great uses, but for this purpose we’re just using it as a styled text format to place into InDesign.

[Note from AMC: Why is this Google Code project called “ickmull”? When I asked, they told me it’s how you pronounce ICML. An inside geek joke.]

The Process

The first thing you need to do is unpack your EPUB file so you can get at its internal files. I personally use EPUB unzip (Mac only), which makes it as easy as a drag and drop. It’s actually an update to these scripts that Anne-Marie wrote about a few years ago.

EPUB Package

The contents of the XHTML files (the ones that end with an .xhtml or .html extension, as in the screenshot above) are what you’re concerned with here. (Note that your EPUB file may well have a different internal structure. If you’re not familiar with EPUB, I highly recommend Liz Castro’s book to get you started.) It’s easiest to take these files out and put them in their own folder.
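Copying the chapter files out by hand is fine for a short book. For a long one, a find command can do the gathering for you. This sketch assumes the unpacked EPUB is in a folder called MyBook_unzipped and the destination is called ickmull-work. Both names, and the helper name collect_xhtml, are hypothetical. The script flattens the folder structure, so two files with the same name in different subfolders would overwrite each other.

```shell
#!/bin/bash
# Copy every .xhtml/.html file from an unpacked EPUB into one flat working folder.
# collect_xhtml is a hypothetical helper name.
set -euo pipefail

collect_xhtml() {
  local src="$1" dest="$2"
  mkdir -p "$dest"
  # -type f: regular files only; \( ... -o ... \): match either extension
  find "$src" -type f \( -name '*.xhtml' -o -name '*.html' \) -exec cp {} "$dest"/ \;
}

# Example (hypothetical folder names):
# collect_xhtml MyBook_unzipped ickmull-work
```

Remember to copy the ickmull .xsl file into that same folder afterwards.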

Then, download the ickmull .xsl file from here and put it in the same folder. I used tkbr2icml-v044.xsl because it is the newest.

Now, running XSL is something I usually do in oXygen XML Editor, a fantastic Mac, Windows, and Linux application for handling XML; I highly recommend it for professional EPUB editing. I couldn’t live without it. You can, however, easily run ickmull without it. It does involve going into the command line, though.

Running ickmull from the command line

On the Mac, open up Terminal (Applications > Utilities > Terminal) and type “cd ” (without the quotes, of course) at the prompt. Don’t press Return or anything else yet. Drag and drop the folder in which you’ve placed your XHTML files and the ickmull.xsl anywhere onto the Terminal window.


Dragging and dropping like this tells Terminal the exact location (filepath) of the folder containing the files you want to work on. It also saves you from typing out the filepath and potentially making a mistake. Press the Return key and you’ll see the prompt change to include the folder’s name. Now you’re ready to run ickmull.
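If xsltproc later reports that it “failed to load external entity” for the stylesheet, one likely cause is that Terminal isn’t actually in the folder that holds it. This pre-flight check is a sketch. The function name check_ickmull_folder is hypothetical, and it assumes the stylesheet is named tkbr2icml-v044.xsl as in this post.

```shell
#!/bin/bash
# Pre-flight check: is the ickmull stylesheet in the current folder, and is xsltproc available?
# check_ickmull_folder is a hypothetical helper name.
set -u

check_ickmull_folder() {
  local xsl="${1:-tkbr2icml-v044.xsl}"   # stylesheet name used in this post; pass another if yours differs
  echo "Current folder: $(pwd)"
  if [ ! -f "$xsl" ]; then
    echo "Missing: $xsl is not in this folder" >&2
    return 1
  fi
  if ! command -v xsltproc >/dev/null 2>&1; then
    echo "Missing: xsltproc is not on your PATH" >&2
    return 1
  fi
  echo "OK: found $xsl and xsltproc"
}

# Example:
# check_ickmull_folder
```

If you type the path instead of dragging the folder, put quotes around it when it contains spaces, for example cd "/Users/me/Desktop/My Book" (a hypothetical path).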

You’ll be doing this using the xsltproc command, which comes built into pretty much every Mac. All you need to do is type in the following command (or copy and paste):

xsltproc --output NEWFILENAME.icml --novalid tkbr2icml-v044.xsl FILENAME.html

NEWFILENAME, in the line above, stands for the name of the file you want ickmull to create. FILENAME is the name of the file you’re transforming. You’ll need to do this once for each file. In my case, they’re all called ch01.html, ch02.html, ch03.html, and so on. So I run:

xsltproc --output ch01.icml --novalid tkbr2icml-v044.xsl ch01.html
xsltproc --output ch02.icml --novalid tkbr2icml-v044.xsl ch02.html
xsltproc --output ch03.icml --novalid tkbr2icml-v044.xsl ch03.html

… and so on. After you do this you’ll have a .icml file for each of your html files. All that’s left to do is to import them into InDesign.
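Typing one command per chapter gets tedious. Here is a minimal loop sketch to run from the same folder. It handles both .html and .xhtml files and names each output after its input, so ch01.html becomes ch01.icml. It assumes the stylesheet is still called tkbr2icml-v044.xsl, and the function name ickmull_all is hypothetical.

```shell
#!/bin/bash
# Convert every .html/.xhtml file in the current folder to .icml with the ickmull stylesheet.
# ickmull_all is a hypothetical helper name.
set -euo pipefail

ickmull_all() {
  local xsl="${1:-tkbr2icml-v044.xsl}"
  local src out
  shopt -s nullglob            # patterns with no matches expand to nothing
  for src in *.html *.xhtml; do
    out="${src%.*}.icml"       # strip the last extension: ch01.html -> ch01.icml
    echo "Converting $src -> $out"
    xsltproc --output "$out" --novalid "$xsl" "$src"
  done
  shopt -u nullglob            # restore default globbing
}

# Run from the folder holding your XHTML files and the stylesheet:
# ickmull_all
```

The loop leaves the stylesheet itself alone because it only matches .html and .xhtml files.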

If you’re on a Windows computer you can still use xsltproc, by following the instructions here, or you can use another XSLT processor.

Placing into InDesign

Now you’ve got your ICML files, all you need to do is place them in an InDesign document. Create a new blank document in the dimensions you want, then use File > Place to load the ICML files onto the cursor. Hold down Shift as you click on the page to auto-add as many pages as necessary.

After you’ve placed the first file, you’ll notice the paragraphs are all unstyled and look the same. They will, however, have brought styles with them. It’s just that each style has the same lack of settings.


If you edit those styles to give them the look you want, you’ll start to see your document change. Note that any italics or bold will have come through as “i” or “b” in the Character Styles panel, not the Paragraph Styles panel. Feel free to delete any unused styles to make your life easier.

The Caveats

Now that you’ve got all the text from your EPUB into InDesign and used your styles to set it up how you want it, there should be very little work left to get it into print-ready shape. There are, however, a few caveats to keep in mind with this method.

If you try to make any text changes, you’ll notice that InDesign will ask you to ‘check out’ that file. This is because all your text is actually linked to the native ICML files you placed, and InDesign is preventing two people from editing the same content at once (you, and someone with the same file open in InCopy). It thinks you’re working in a collaborative InDesign/InCopy workflow.

You can quickly change that. Select all the entries ending with .icml in the Links panel and choose Unlink from the Links panel menu.  Now the text frames are normal InDesign text frames again and you don’t need to check anything out or keep track of the ICML files.

Additionally, when I did this I had an issue where the first few paragraphs of every .icml file had the italic character style applied. It didn’t take much cleanup to sort out, but I thought it was worth noting.

The final thing to keep in mind is that I did this with EPUB files that had good, clean HTML code. I have no idea what you’ll get if you try it with some of the horrible tag-soup EPUB files that you can get from some conversion houses. Not all EPUB is created equal, so your mileage may vary.
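XSLT only works on well-formed XML, so one cheap sanity check before transforming is xmllint. It comes from libxml2, the same project as xsltproc, and is normally present on a Mac. This sketch only reports which files fail to parse. It doesn’t fix them, and passing the check says nothing about whether the markup is tidy or semantic. Depending on your files, named entities such as &nbsp; that rely on the XHTML DTD may also be flagged. The function name check_wellformed is hypothetical.

```shell
#!/bin/bash
# Report which .html/.xhtml files in the current folder are not well-formed XML.
# check_wellformed is a hypothetical helper name.
set -u

check_wellformed() {
  local f bad=0
  shopt -s nullglob
  for f in *.html *.xhtml; do
    # --noout: parse only and print nothing on success; parse errors are discarded here
    if ! xmllint --noout "$f" 2>/dev/null; then
      echo "NOT well-formed: $f"
      bad=$((bad + 1))
    fi
  done
  shopt -u nullglob
  [ "$bad" -eq 0 ]   # return status 0 only if every file parsed
}

# Example, run from the folder with your XHTML files:
# check_wellformed
```

To see the actual parser messages for a flagged file, run xmllint --noout on that file directly.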

Anne-Marie “Her Geekness” Concepción is the co-founder (with David Blatner) and CEO of Creative Publishing Network, which produces InDesignSecrets, InDesign Magazine, and other resources for creative professionals. Through her cross-media design studio, Seneca Design & Training, Anne-Marie develops ebooks and trains and consults with companies who want to master the tools and workflows of digital publishing. She has authored over 20 courses on lynda.com on these topics and others. Keep up with Anne-Marie by subscribing to her ezine, HerGeekness Gazette, and contact her by email at [email protected] or on Twitter @amarie
  • This is extra-geeky goodness! No way to batch process all the xhtml files?

    • Gabriele says:

      One possible strategy is to concatenate the HTML files together (I use a slightly modified version of htmlcat, https://freecode.com/projects/htmlcat), then run the XSL on the resulting file.

      • Rick Gordon says:

        @Gabriele, wouldn’t that require rebuilding the OPF and nav/ncx files as well?

      • Gabriele says:

        The ickmull.xsl stylesheet converts (i.e. transforms) XHTML->ICML, so OPF and NCX are not considered. The assumption here is that you place the individual ICML files in the correct order, or that you have a single XHTML file concatenated in the proper order. I used to use a Python script to convert legacy EPUBs into a single XHTML source; it extracted the XHTML files in spine order and renamed them by prepending a padded number, so that the bash shell would expand the * wildcard in the proper order. Is that what you mean by “rebuilding the OPF”? Maybe I’m missing something.

    • Bruno Herfst says:

      You can set up batch processing like this:

      #!/bin/bash
      # Run from the folder that contains tkbr2icml-v044.xsl
      for filename in /Folder/*.html; do
        base="${filename%.html}"
        xsltproc --output "$base.icml" --novalid tkbr2icml-v044.xsl "$filename"
      done

  • Rick Gordon says:

    @Gabriele, that clarifies it for me. Thanks.

  • Sharon says:

    You had me up until you gave the instructions for using the cd line only on a Mac. When I drag the folder onto the cd window on my PC, nothing happens. Our ebook is coming out before the print book, so most of the final editing and proofing has been done in the EPUB file. Now I have to put it into InDesign to prepare it for printing. I thought I had found the answer.

  • Sharon, well the drag-dropping is only so you don’t have to type in the path … it’s a shortcut. You could always type it in manually.

    Also, Terminal on the Mac is the command-line interface to its UNIX core. Windows isn’t based on UNIX, so assuming you’ve found the Windows equivalent to Terminal, it probably uses different commands to do things.

    I think in your case you should consider buying oXyGen Author (which is avail for Mac and Windows) and running the XSL there. … Possible?

    • Sharon says:

      I can manage the dragging and dropping to the command line in windows and have used it from time to time for other things, but there’s no way I can afford to buy oXyGen Author for just occasional use. I’m not an expert in XML or HTML, although I can do passable CSS. I’ll either have to find some other way to transform the epub to IDD or just copy and paste and then battle the paragraph styles that will follow. Thanks.

  • @Sharon A lower-cost Mac XML editor for working with XML and XSLT is Ximplify. It has built-in support for XSLT version 1.0 and will shortly have support for external parsers like Saxon. It is a nice little program with a very responsive developer.

  • Sam says:

    You opened my eyes to a few things I didn’t know about. Very helpful. Please do more in-depth tips like this soon!

  • mr1m says:

    Hi,

    First let me declare that I know little of coding.
    That said, I really need to import an EPUB file into InDesign.
    I’m running Windows. I got as far as placing the necessary xsltproc files in the proper Windows folder (System32).
    I even checked the version list to see if it is working, and thank God it is.
    What has me baffled is which parameters to use in cmd. I don’t know if dragging and dropping will work. Please help out a poor know-it-nill with the parameters needed to convert the extracted EPUB HTML files to ICML from cmd.

  • Himanshu Arora says:

    I have been able to convert HTML into ICML, but when I import it into InDesign, it only displays the text, not the images and attached assets.

    Could you please provide a solution for this? It would be really helpful.

  • Thank you for this tutorial!!! I have a successful ebook that I am converting into paperback and have been putting it off due to the daunting task of conversion via InDesign, with which I have little experience. Your tutorial makes me feel confident that this will be easier than I thought! Much appreciated.

  • Epubber says:

    Everything went fine until I got to the command xsltproc --output NEW FILENAME.icml --novalid tkbr2icml-v044.xsl FILENAME.xhtml, which caused the terminal to respond: warning: failed to load external entity "tkbr2icml-v044.xsl"
    cannot parse tkbr2icml-v044.xsl

    I put the ickmull file in the same folder as the xhtml files. Does it matter that my files are xhtml and not .html as in the command line example offered in this tutorial?

  • Anon says:

    If it’s of help to anyone, here’s some code:

    https://gist.github.com/mhulse/4f3fe401c76c2ec2a79382279c7c42bd

  • John Malcolmson says:

    Stuck. I can’t get to the ickmull directory with Terminal on OS X. It says “No such file or directory” regardless of where I put it (root, desktop, etc.). Here’s a pic of what Terminal produces:
    https://www.dropbox.com/s/tuj46lkp88rvjpp/path.png?dl=0

    I have installed Oxygen XML editor, is there a way to produce same result with this software (bypassing terminal)?

    Thanks in advance.

  • Michael Hamilton says:

    I have a question. I have an InDesign file made up of scanned images of text. I want to save it as a PDF for ebook readers, but the files are too big. I tried exporting as PDF (Interactive) at 72 dpi, low res, and the text is just illegible at that point. Is there any way to make a small file without ruining the legibility?

  • Felix Vigor says:

    I would like to use this method to import EPUBs with many footnotes into InDesign. I have done a test with an EPUB containing a single XHTML file with well-linked text and footnotes. I got an ICML file and put it into a new InDesign document. However, the footnotes don’t appear at the bottom of each page, as they do when I import text from Word; instead they all appear together at the end. Am I doing something wrong?
