eBook Production Backwards: From EPUB to InDesign
[This is a guest post by Nick Barreto, who makes ebooks and other digital products at Quercus Books in London. He's a fellow denizen of the #eprdctn Twitter interest group (ebook design and production people from around the world) and originally titled this "#eprdctn Backwards".]
Recently I had an interesting conundrum. We released a title as an ebook which was due to be published in print in a few months’ time. It turned out that the files we had for this title were old image PDFs, and the quality really left something to be desired. We couldn’t exactly print it as it was, and rekeying the entire thing would be time consuming and expensive.
We did, however, have perfectly good ebook assets for this title, which we had made using OCR and some careful QA (quality assurance). This is probably something of a rarity, but it will no doubt become more and more common as backlist titles continue to be published as ebooks. At the very worst, we wouldn’t need to rekey because we had the text in a digital format. The problem, however, is that EPUB is a format InDesign can create (with a little help), but isn’t able to import.
I knew someone must have encountered a similar issue before, and that there is a wealth of InDesign plugins that extend its functionality, so I went to do some digging online.
The Solution: ickmull
Turns out, I got lucky. I came across ickmull. Ickmull was created to allow you to import XHTML files into InDesign. As EPUB is actually a collection of XHTML files, this means it works for EPUB, too! I’ll explain how it works, then go through the actual process of getting it done.
Ickmull is actually an XSL Transformation, which is a way to manipulate XML files. XHTML is an XML-compliant version of HTML, so XSL works on these files. XHTML is also the required format for HTML files in an EPUB. What ickmull does is transform your XHTML files into an ICML file, which you can import into InDesign.
ICML is the native file format for Adobe InCopy documents. When used in conjunction with InDesign, InCopy lets writers, editors and designers collaborate on files without stepping on each other’s toes. It’s got many great uses, but for this purpose we’re just using it as a styled text format to place into InDesign.
[Note from AMC: Why is this Google Code project called "ickmull"? When I asked, they told me it's how you pronounce ICML. An inside geek joke.]
The first thing you need to do is unpack your EPUB file so you can get at its internal files. I personally use EPUB unzip (Mac only), which makes it as easy as a drag and drop. It’s actually an update to these scripts that Anne-Marie wrote about a few years ago.
The XHTML files (the ones that end with an .xhtml or .html extension, as in the screen shot above) are what you’re concerned with here. Note that your EPUB file may well have a different internal structure; if you’re not familiar with EPUB, I highly recommend Liz Castro’s book to get you started. It’s going to be easiest if you copy these files out and put them in their own folder.
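If you are already comfortable in a terminal, the unpacking and gathering can also be done from the command line. What follows is only a minimal sketch: it relies on the fact that an EPUB is a ZIP archive and assumes the unzip tool is installed. The function names unpack_epub and collect_xhtml, and the file and folder names in the usage comments, are made up for illustration.

```shell
# Unpack an EPUB (which is a ZIP archive) into a folder.
# $1 = the .epub file, $2 = the folder to unpack into
unpack_epub() {
  mkdir -p "$2"
  unzip -q -o "$1" -d "$2"
}

# Copy every .html/.xhtml file found under the unpacked EPUB
# into one flat working folder.
# $1 = the unpacked EPUB folder, $2 = the working folder (keep it outside $1)
# Caveat: files with the same name in different subfolders overwrite each other.
collect_xhtml() {
  mkdir -p "$2"
  find "$1" -type f \( -name '*.html' -o -name '*.xhtml' \) -exec cp {} "$2"/ \;
}

# Usage (hypothetical names):
#   unpack_epub mybook.epub mybook_unpacked
#   collect_xhtml mybook_unpacked icml_work
```

Copying rather than moving the files leaves the unpacked EPUB intact, in case you need to go back to it.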
Then, download the ickmull .xsl file from here and put it in the same folder. I used tkbr2icml-v044.xsl because it is the newest.
Now, running XSL is something I usually do in oXygen XML Editor, a fantastic Mac, Windows, and Linux application for handling XML. I highly recommend it for professional EPUB editing; I couldn’t live without it. You can, however, easily run ickmull without it. It does involve going into the command line, though.
Running ickmull from the command line
On the Mac, open up Terminal (Applications > Utilities > Terminal) and type “cd ” (without the quotes, of course) at the prompt. Don’t press Return or anything else yet. Drag and drop the folder in which you’ve placed your XHTML files and the ickmull.xsl anywhere onto the Terminal window.
Dragging and dropping like this tells Terminal the exact location (filepath) of the folder containing the files you want to work on, and saves you from having to type out the filepath and potentially making a mistake. Press the Return key and you’ll see the prompt change a little to show the new folder. Now you’re ready to run ickmull.
You’ll be doing this using the xsltproc command, which comes built into pretty much every Mac. All you need to do is type in the following command (or copy and paste):
xsltproc --output NEWFILENAME.icml --novalid tkbr2icml-v044.xsl FILENAME.html
NEWFILENAME, in the line above, stands for the name of the file you want ickmull to create. FILENAME is the name of the file you’re transforming. You’ll need to run the command once for each file you have. In my case, they’re all called ch01.html, ch02.html, ch03.html, and so on. So I run:
xsltproc --output ch01.icml --novalid tkbr2icml-v044.xsl ch01.html
xsltproc --output ch02.icml --novalid tkbr2icml-v044.xsl ch02.html
xsltproc --output ch03.icml --novalid tkbr2icml-v044.xsl ch03.html
… and so on. After you do this you’ll have a .icml file for each of your html files. All that’s left to do is to import them into InDesign.
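Typing one command per chapter gets tedious for a long book. As a sketch, the same xsltproc call can be wrapped in a shell loop. The function name convert_all is made up for illustration, and the loop assumes you have already used cd to enter the folder that holds both the XHTML files and the stylesheet.

```shell
# Run the ickmull stylesheet over every .html and .xhtml file
# in the current folder, writing one .icml file per input file.
# $1 = the stylesheet, e.g. tkbr2icml-v044.xsl
convert_all() {
  xsl="$1"
  if ! command -v xsltproc >/dev/null 2>&1; then
    echo "xsltproc not found" >&2
    return 1
  fi
  for f in *; do
    case "$f" in
      *.html|*.xhtml)
        out="${f%.*}.icml"   # ch01.html becomes ch01.icml
        xsltproc --output "$out" --novalid "$xsl" "$f" || echo "failed: $f" >&2
        ;;
    esac
  done
}

# Usage:
#   convert_all tkbr2icml-v044.xsl
```

A file that fails to transform is reported and skipped, so one bad chapter does not stop the rest.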
If you’re on a Windows computer you can still use xsltproc, by following the instructions here, or you can use another XSLT processor.
Placing into InDesign
Now that you’ve got your ICML files, all you need to do is place them in an InDesign document. Create a new blank document in the dimensions you want, then use File > Place to load the ICML files onto the cursor. Hold down Shift as you click on the page to auto-add as many pages as necessary.
After you’ve placed the first file, you’ll notice the paragraphs are all unstyled and look the same. They will, however, have brought styles with them. It’s just that each style has the same lack of settings.
If you edit those styles to give them the look you want, you’ll start to see your document change. Note that any italics or bold will have come through as either “i” or “b” in the Character Styles panel, not Paragraph Styles. Feel free to delete any unused styles to make your life easier.
Now that you’ve got all the text from your EPUB into InDesign, and you’ve used your styles to set it up how you want it, there should be very little work left to get it into print-ready shape. There are, however, a few caveats to keep in mind with this method.
If you try to make any text changes, you’ll notice that InDesign will ask you to ‘check out’ that file. This is because all your text is actually linked to the native ICML files you placed, and InDesign is preventing two people from editing the same content at once (you, and someone with the same file open in InCopy). It thinks you’re working in a collaborative InDesign/InCopy workflow.
You can quickly change that. Select all the entries ending with .icml in the Links panel and choose Unlink from the Links panel menu. Now the text frames are normal InDesign text frames again and you don’t need to check anything out or keep track of the ICML files.
Additionally, I had an issue when I did this where the first few paragraphs of every .icml had the italic character style applied. It didn’t take much cleanup to sort out, but I thought it was worth noting.
The final thing to keep in mind is that I did this with EPUB files that had good, clean HTML code. I have no idea what you’ll get if you try it with some of the horrible tag-soup EPUB files that you can get from some conversion houses. Not all EPUB is created equal, so your mileage may vary.
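One rough way to gauge how clean your files are before transforming them: an XSLT processor needs well-formed XML, so a file that fails to parse will not transform. The sketch below uses xmllint, a companion tool from the libxml2 project that xsltproc is built on; I am assuming it is installed alongside xsltproc on your machine. The function name check_wellformed is made up for illustration. It only tests whether the markup parses, not whether it is tidy or sensibly structured.

```shell
# Report which of the given files are not well-formed XML.
# Returns 0 if every file parses, 1 otherwise.
check_wellformed() {
  bad=0
  for f in "$@"; do
    if ! xmllint --noout "$f" >/dev/null 2>&1; then
      echo "not well-formed: $f"
      bad=1
    fi
  done
  return "$bad"
}

# Usage:
#   check_wellformed ch01.html ch02.html ch03.html
```

Any file it flags is worth opening and repairing by hand before you run ickmull on it.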