eBook Production Backwards: From EPUB to InDesign
[This is a guest post by Nick Barreto, who makes ebooks and other digital products at Quercus Books in London. He’s a fellow denizen of the #eprdctn Twitter interest group (ebook design and production people from around the world) and originally titled this “#eprdctn Backwards”.]
The Problem
Recently I had an interesting conundrum. We released a title as an ebook which was due to be published in print in a few months’ time. It turned out that the files we had for this title were old image PDFs, and the quality really left something to be desired. We couldn’t exactly print it as it was, and rekeying the entire thing would be time consuming and expensive.
We did, however, have perfectly good ebook assets for this title, which we had made using OCR and some careful QA (quality assurance). This is probably something of a rarity, but it will no doubt become more and more common as backlist titles continue to be published as ebooks. At the very worst, we wouldn’t need to rekey because we had the text in a digital format. The problem, however, is that EPUB is a format InDesign can create (with a little help), but isn’t able to import.
I knew someone must have encountered a similar issue before, and that there is a wealth of InDesign plugins that extend its functionality, so I went digging online.
The Solution: ickmull
Turns out, I got lucky. I came across ickmull. Ickmull was created to allow you to import XHTML files into InDesign. As EPUB is actually a collection of XHTML files, this means it works for EPUB, too! I’ll explain how it works, then go through the actual process of getting it done.
Ickmull is actually an XSL Transformation, which is a way to manipulate XML files. XHTML is an XML-compliant version of HTML, so XSL works on these files. XHTML is also the required format for HTML files in an EPUB. What ickmull does is transform your XHTML files into an ICML file, which you can import into InDesign.
ICML is the native file format for Adobe InCopy documents. When used in conjunction with InDesign, InCopy lets writers, editors and designers collaborate on files without stepping on each other’s toes. It’s got many great uses, but for this purpose we’re just using it as a styled text format to place into InDesign.
[Note from AMC: Why is this Google Code project called “ickmull”? When I asked, they told me it’s how you pronounce ICML. An inside geek joke.]
The Process
The first thing you need to do is unpack your EPUB file so you can get at its internal files. I personally use EPUB unzip (Mac only), which makes it as easy as a drag and drop. It’s actually an update to these scripts that Anne-Marie wrote about a few years ago.
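If you would rather stay in Terminal, an EPUB is an ordinary ZIP archive, so the unzip command that ships with macOS and most Linux systems can unpack it. Here is a minimal sketch. The helper name unpack_epub and the filename book.epub are placeholders of mine, not part of any tool.

```shell
#!/bin/bash
# unpack_epub: extract an EPUB (which is a ZIP archive) into a folder
# named after the file. "unpack_epub" is a hypothetical helper name.
unpack_epub() {
  local epub="$1"
  local outdir="${epub%.epub}_unpacked"
  # -q keeps unzip quiet, -o overwrites existing files without prompting
  mkdir -p "$outdir" && unzip -q -o "$epub" -d "$outdir" && echo "$outdir"
}

# Usage (placeholder filename):
#   unpack_epub book.epub
```

On success the function prints the name of the folder it created, so you know where to look for the XHTML files.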
The contents of the XHTML files (the ones that end with an .xhtml or .html extension, as in the screen shot above) are what you’re concerned with here. Note that your EPUB file may well have a different internal structure. If you’re not familiar with EPUB, I highly recommend Liz Castro’s book to get you started. It’s going to be easiest if you take these files out and put them in their own folder.
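Gathering the files can also be done from the command line. The sketch below copies every .xhtml and .html file it finds under an unpacked EPUB folder into one flat working folder. The helper name collect_xhtml and both folder names are my own placeholders, and it assumes no two files in different subfolders share a name (the later copy would overwrite the earlier one).

```shell
#!/bin/bash
# collect_xhtml: copy every .xhtml/.html file found under SRC into the
# flat folder DEST. Hypothetical helper; keep DEST outside of SRC.
collect_xhtml() {
  local src="$1" dest="$2"
  mkdir -p "$dest"
  find "$src" -type f \( -name '*.xhtml' -o -name '*.html' \) -exec cp {} "$dest"/ \;
}

# Usage (placeholder folder names):
#   collect_xhtml book_unpacked ickmull_work
```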
Then, download the ickmull .xsl file from here and put it in the same folder. I used tkbr2icml-v044.xsl because it is the newest.
Now, running XSL is something I usually do in oXygen XML Editor, which is a fantastic Mac, Windows, or Linux application for handling XML and I highly recommend it for professional EPUB editing. I couldn’t live without it. You can, however, easily run ickmull without it. It does involve going into the command line, though.
Running ickmull from the command line
On the Mac, open up Terminal (Applications > Utilities > Terminal) and type “cd ” (without the quotes, of course) at the prompt. Don’t press Return or anything else yet. Drag and drop the folder in which you’ve placed your XHTML files and the ickmull.xsl anywhere onto the Terminal window.
Dragging and dropping like this tells Terminal the exact location (filepath) of the folder containing the files you want to work on, and saves you from having to write out the filepath and potentially making a mistake. Press the Return key and you’ll see the entry point change a little. Now you’re ready to run ickmull.
You’ll be doing this using the xsltproc command, which comes built into pretty much every Mac. All you need to do is type in the following command (or copy and paste):
xsltproc --output NEWFILENAME.icml --novalid tkbr2icml-v044.xsl FILENAME.html
NEWFILENAME, in the line above, stands for the name of the file you want to create with ickmull. FILENAME is the name of the file you’re doing the transform to. You’ll need to do it once for each file you have. In my case, they’re all called ch01.html, ch02.html, ch03.html, and so on. So I run:
xsltproc --output ch01.icml --novalid tkbr2icml-v044.xsl ch01.html
xsltproc --output ch02.icml --novalid tkbr2icml-v044.xsl ch02.html
xsltproc --output ch03.icml --novalid tkbr2icml-v044.xsl ch03.html
… and so on. After you do this you’ll have a .icml file for each of your html files. All that’s left to do is to import them into InDesign.
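With a long book it is easy to skip a chapter. As a quick sanity check, the sketch below lists any .html or .xhtml file in the current folder that does not yet have a matching .icml file. The function name missing_icml is hypothetical, and it assumes the ICML files were named after their source files, as in the commands above.

```shell
#!/bin/bash
# missing_icml: print every .html/.xhtml file in the current folder that
# has no .icml file with the same base name. Hypothetical helper name.
missing_icml() {
  local f
  for f in *.html *.xhtml; do
    [ -e "$f" ] || continue              # the pattern matched nothing
    [ -e "${f%.*}.icml" ] || echo "$f"   # no ICML yet: report the source
  done
}

# Usage: cd into the working folder, then run
#   missing_icml
```

If it prints nothing, every source file has been converted.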
If you’re on a Windows computer you can still use xsltproc, by following the instructions here, or you can use another XSLT processor.
Placing into InDesign
Now you’ve got your ICML files, all you need to do is place them in an InDesign document. Create a new blank document in the dimensions you want, then use File > Place to load the ICML files onto the cursor. Hold down Shift as you click on the page to auto-add as many pages as necessary.
After you’ve placed the first file, you’ll notice the paragraphs are all unstyled and look the same. They will, however, have brought styles with them. It’s just that each style has the same lack of settings.
If you edit those styles, give them the look you want, you’ll start to see your document change. Note that any italics or bold will have come through as either “i” or “b” in the Character Styles panel, not Paragraph Styles. Feel free to delete any unused styles to make your life easier.
The Caveats
Now you’ve got all the text from your EPUB into InDesign, and you’ve used your styles to set it up how you want it, there should be very little work left to get it into print-ready shape. There are, however, a few caveats to keep in mind with this method.
If you try to make any text changes, you’ll notice that InDesign will ask you to ‘check out’ that file. This is because all your text is actually linked to the native ICML files you placed, and InDesign is preventing two people from editing the same content at once (you, and someone with the same file open in InCopy). It thinks you’re working in a collaborative InDesign/InCopy workflow.
You can quickly change that. Select all the entries ending with .icml in the Links panel and choose Unlink from the Links panel menu. Now the text frames are normal InDesign text frames again and you don’t need to check anything out or keep track of the ICML files.
Additionally, I had an issue when I did this where the first few paragraphs of every .icml had the italic character style applied. It didn’t take much cleanup to sort out, but I thought it was worth noting.
The final thing to keep in mind is that I did this with EPUB files that had good, clean HTML code. I have no idea what you’ll get if you try it with some of the horrible tag-soup EPUB files that you can get from some conversion houses. Not all EPUBs are created equal, so your mileage may vary.
This is extra-geeky goodness! No way to batch process all the xhtml files?
One possible strategy is to concatenate the HTML files (I use a slightly modified version of htmlcat, https://freecode.com/projects/htmlcat) and then run the XSL on the resulting file.
@Gabriele, wouldn’t that require rebuilding the OPF and nav/ncx files as well?
The ickmull.xsl stylesheet converts (i.e. transforms) XHTML->ICML, so OPF and NCX are not considered. The assumption here is that you place the individual ICML files in the correct order, or that you have a single XHTML file concatenated in the proper order. I used to use a Python script to convert legacy EPUBs into a single XHTML source. It extracted the XHTML files in spine order and renamed them with a zero-padded number prefix, so that the bash shell would expand the * wildcard in the proper order. Is that what you mean by “rebuilding the OPF”? Maybe I’m missing something.
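The renaming trick described above can be sketched in plain shell, assuming you already know the spine order and pass the files in that order. This is only an illustration of the idea, not the Python script mentioned in the comment, and the name number_in_order is hypothetical.

```shell
#!/bin/bash
# number_in_order: copy files into DEST, prefixing each name with a
# zero-padded counter so that a * wildcard expands in reading order.
# Arguments: DEST folder, then the files in spine order.
number_in_order() {
  local dest="$1"
  shift
  mkdir -p "$dest"
  local n=1 f
  for f in "$@"; do
    cp "$f" "$dest/$(printf '%03d' "$n")_$(basename "$f")"
    n=$((n + 1))
  done
}

# Usage (placeholder names, listed in reading order):
#   number_in_order ordered cover.html intro.html ch01.html
```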
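You can set up batch processing like this:
#!/bin/bash
# Convert every .html file in /Folder to an .icml file with the same base name.
# Run this from the folder that contains tkbr2icml-v044.xsl.
for filename in /Folder/*.html; do
  base=${filename%.html}
  xsltproc --output "$base.icml" --novalid tkbr2icml-v044.xsl "$filename"
done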
@Gabriele, that clarifies it for me. Thanks.
By the way, the aforementioned python script to extract HTML parts in spine order from a EPUB file is available here: https://github.com/gabalese/epub-helper/blob/master/epubextractrename.py
Requires Python 3 and lxml, and was used before I published pyepub (https://github.com/gabalese/pyepub), a library which could make the task much less tedious. I might share a py2/xml.etree version, if someone requests it.
You had me up until you gave the instructions for using the cd line only on a Mac. When I drag the folder onto the command window on my PC nothing happens. Our ebook is coming out before the print book, so most of the final editing and proofing has been done in the EPUB file. Now I have to put it into InDesign to prepare it for printing. I thought I had found the answer.
Sharon, well the drag-dropping is only so you don’t have to type in the path … it’s a shortcut. You could always type it in manually.
Also, Terminal on the Mac is the command line interface to its UNIX core. Windows isn’t based on UNIX, so assuming you’ve found the Windows equivalent to Terminal, it probably uses different commands to do stuff.
I think in your case you should consider buying oXyGen Author (which is available for Mac and Windows) and running the XSL there. … Possible?
I can manage the dragging and dropping to the command line in windows and have used it from time to time for other things, but there’s no way I can afford to buy oXyGen Author for just occasional use. I’m not an expert in XML or HTML, although I can do passable CSS. I’ll either have to find some other way to transform the epub to IDD or just copy and paste and then battle the paragraph styles that will follow. Thanks.
@Sharon A lower-cost Mac XML editor for working with XML and XSLT is Ximplify. It has built-in support for XSLT version 1.0 and will shortly have support for external parsers like Saxon. It is a nice little program with a very responsive developer.
You opened my eyes to a few things I didn’t know about. Very helpful. Please do more in-depth tips like this soon!
Hi,
First let me declare that I know little of coding.
That said, I really need to import an EPUB file into InDesign.
I am running Windows. I got as far as placing the necessary xsltproc files in the proper Windows folder, aka system32.
I even checked the version list to see if it is working and, thank God, it is.
What has me baffled is which parameters to use in cmd. I don’t know if dragging and dropping will work. Please help out a poor know-nothing on which parameters to use to convert the HTML files extracted from the EPUB to ICML with cmd.
I have been able to convert HTML into ICML, but when I import it into InDesign, it displays only the text and not the images and attached assets.
Could you please provide a solution for this? It would be really helpful.
Thank you for this tutorial!!! I have a successful ebook that I am converting into paperback and have been putting it off due to the daunting task of conversion via InDesign, with which I have little experience. Your tutorial makes me feel confident that this will be easier than I thought! Much appreciated.
Everything went fine until I got to the command xsltproc --output NEWFILENAME.icml --novalid tkbr2icml-v044.xsl FILENAME.xhtml, which caused the terminal to respond: warning: failed to load external entity "tkbr2icml-v044.xsl"
cannot parse tkbr2icml-v044.xsl
I put the ickmull file in the same folder as the xhtml files. Does it matter that my files are xhtml and not .html as in the command line example offered in this tutorial?
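For what it is worth, the extension should not matter to xsltproc. The "failed to load external entity" warning typically means xsltproc could not find the file it names, here the stylesheet, in the folder Terminal is currently in. The sketch below checks for the stylesheet first and then converts both .html and .xhtml files. The function name convert_all is hypothetical, and the default stylesheet name is simply the one used in the article.

```shell
#!/bin/bash
# convert_all: run the ickmull stylesheet over every .html and .xhtml
# file in the current folder. Hypothetical helper name.
# Optional argument: the stylesheet filename.
convert_all() {
  local xsl="${1:-tkbr2icml-v044.xsl}"
  # Check that the stylesheet is really in the current folder first.
  if [ ! -f "$xsl" ]; then
    echo "Stylesheet $xsl not found in $(pwd)" >&2
    return 1
  fi
  local f
  for f in *.html *.xhtml; do
    [ -e "$f" ] || continue   # skip a pattern that matched nothing
    xsltproc --output "${f%.*}.icml" --novalid "$xsl" "$f"
  done
}

# Usage: cd into the folder with your files and the stylesheet, then run
#   convert_all
```

If the check fails, the message shows which folder Terminal is actually in, which is usually enough to spot a wrong path or a slightly different stylesheet filename.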
If it’s of help to anyone, here’s some code:
https://gist.github.com/mhulse/4f3fe401c76c2ec2a79382279c7c42bd
Stuck. I can’t get to the ickmull directory with terminal OS X. Says No such file or directory regardless of where I put it (root, desktop, etc). Here’s a pic of what terminal produces:
https://www.dropbox.com/s/tuj46lkp88rvjpp/path.png?dl=0
I have installed Oxygen XML editor, is there a way to produce same result with this software (bypassing terminal)?
Thanks in advance.
I have a question. I have an indesign file made up of scanned images of text. I want to save it as a pdf for ebook readers, but the files are too big. I tried exporting as pdf(interactive) 72dpi at low res and the text is just illegible at that point. Any ways to make a small file without ruining the legibility?
I would like to use this method to import EPUBs with many footnotes into InDesign. I have done a test with an EPUB containing a single XHTML file with well-linked text and footnotes. I got an ICML file and placed it into a new InDesign document. However, the footnotes do not appear at the bottom of each page, as they do when I import text from Word. Instead, they all appear together at the end. Am I doing something wrong?