How to Export Basic HTML Out of CS2
Let's say you have a magazine, annual report or newsletter you've done in InDesign CS2. Something longish with a number of discrete articles or sections within it. The web designers in your company need you to give them the content in a web-friendly format, one .html file with basic XHTML tags per article or section, in a format that's easy to integrate into the company's existing web site and CSS.
Guess what: With a little bit of preparation, you can give them what they want, using the CS2 features you already have. No need to cut and paste every dang text frame into text files and apply bold tags manually, no need to buy a plug-in (if there were an Export to HTML one available, which there's not), no need to export to RTF or PDF and rely on another program's often unusable conversion. Yes, InDesign CS3 has an Export to XHTML feature, but here's a solution that you can use in the meantime. [Edit: For a detailed comparison of the two methods, see my comments below this post. --AM.]
I have to say that I didn't come up with this on my own. It was an off-hand comment from Jim Maivald that showed me the light. Jim's the leader of Chicago's InDesign User Group and a long-time friend, colleague and occasional freelancer of mine. We were talking XML - his speciality - and he mentioned something about how you can export HTML using InDesign's XML tools. "Show me!" says I. Which was kind of hard, since we were on the phone. But the basic procedure he explained was straightforward and do-able by most print designers, I thought.
And that's what I'm going to write about here. I'm sure there are more tweaks and scripts and techniques you could use to get even more useful HTML out of ID CS2, but I'll leave that up to you to explain in your comments to this post.
We want a stand-alone .html file for each article or section in our publication. Each file needs to contain basic HTML mark-up indicating titles, subheads, body text (each paragraph surrounded by <p> tags), emboldened and italicized text, and bulleted and numbered lists. It's possible to also include images, but they require just enough extra work that I'll leave that for another post. [Edit: That post appears here. --AM].
Most modern web sites use CSS to customize the look of the basic text formatting tags, so we don't have to worry about the final HTML file looking like the web site, we just need to have the proper tags in place. That way, once the web designer links the HTML file to the CSS file, our plain Jane "Header 1" (<h1>) titles take on the look the designer specified for the <h1> tag in the CSS file.
It works just like placing a Word file in InDesign: the Word file's paragraph styles pick up the formatting specified in InDesign's Paragraph Styles palette, as long as the style names are exactly the same.
Step 1: Style All Text You Want to Export
In each article you want to export to HTML, go through the text and make sure everything is linked to a paragraph or character style. Local formatting will get stripped out during the export (that's a good thing), so if you want to hang on to your bolds and italics, create and apply Bold and Italic character styles to the text instead. You can use Find/Change Formatting to automate this; also Dave Saunders has a script called "Preserve Local Formatting" (download from our Plugins and Scripts page) that could help.
If you have paragraph styles for bulleted and numbered paragraphs, you'll get better results in the HTML if the paragraph style calls for InDesign's auto-bullets and auto-numbers feature (instead of you typing in the bullets or numbers yourself). If not, it just makes for a little extra clean-up by your web team. But make sure that your bulleted and numbered lists are linked to their own paragraph styles, regardless.
Step 2: Load Your Tags Palette
Open the Tags palette (Window > Tags) and add the basic HTML tags one by one: div, h1, h2, h3, h4 (h1=biggest title, h2=slightly smaller), p, img, strong (for bold), em (short for emphasis, makes web text italic), and for your lists, at least li (list item), and perhaps ol (numbered list) and ul (bulleted list) as well... more on lists below. You'll also want to add "html," "table," and "td" to the list of tags.
Too much work? Download my HTML tags file, unzip it and choose Load Tags from your Tags palette menu. Point it at the file you downloaded (it's a small, cross-platform xml file) to import the tags I listed above.
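If you'd rather build a tags file yourself than download one, here's a minimal Python sketch. It rests on an assumption worth testing on your own setup: that Load Tags picks up element names from any well-formed XML file. The function name build_tags_xml and the file layout are made up for illustration and may not match the downloadable file.

```python
import xml.etree.ElementTree as ET

# Tag names from Step 2. "html" is used as the root element here.
HTML_TAGS = ["div", "h1", "h2", "h3", "h4", "p", "img", "strong", "em",
             "li", "ol", "ul", "table", "td"]


def build_tags_xml(tags, root_name="html"):
    """Return an XML string holding one empty element per tag name.

    Assumption: InDesign's Load Tags reads element names from the file.
    """
    root = ET.Element(root_name)
    for name in tags:
        ET.SubElement(root, name)
    return ET.tostring(root, encoding="unicode")


# Print the file contents; redirect the output to a .xml file to save it.
print('<?xml version="1.0" encoding="UTF-8"?>')
print(build_tags_xml(HTML_TAGS))
```

Save the output as something like html-tags.xml and point Load Tags at it. If a tag doesn't show up in the palette, add it by hand as described above.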
You'll see the default Root tag in the palette. Select it and click the trashcan icon to delete it, and at the prompt replace it with the "html" tag you just added.
Step 3: Tweak InDesign Structure Defaults
Open up the Structure panel (View > Structure > Show Structure) and choose Tagging Preset Options from the Structure panel's fly-out menu. Choose "div" for Text Frames, "table" for Tables, and "td" for table cells. (Actually, this method doesn't translate InDesign tables into HTML tables well at all. I'm open to suggestions, but at least you'll have the correct tag for each cell.) Click OK to close the Options dialog box.
While you're here in the Structure panel menu, choose Show Text Snippets toward the bottom. That'll help you identify content in the panel later on.
At this point the Structure panel should show only one tag, "html," at the top.
Step 4: Automatic Tagging with Map Styles to Tags
Now you need to specify which styles in your layout should become which tags in the HTML. Do this by opening the Map Styles to Tags dialog box from the Tags palette menu.
You likely have many more paragraph and character styles than you have HTML tags; so you'll be mapping multiple styles to the same tag. For example, usually all Body style variations (Basic Paragraph, Body first, Body indented, Sidebar body, etc.) should be mapped to the same basic "p" tag, which sort of stands for a generic body text paragraph that will be styled by the CSS. Remember, we're trying to keep things simple!
Map major headlines to h1, map subheads to h2 or perhaps h3, and so on. Map your character styles for bold and italic to "strong" and "em" respectively. Don't map any style to the following tags, which are applied or used elsewhere: table, td, div, img, ol and ul.
Lists can be partially automated. Map all bulleted and numbered list styles to the same "li" (list item) tag. You'll have to insert the "ol" (numbered) and "ul" (bulleted) tags manually, which I'll get to in a bit.
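It can help to plan the mapping on paper before you open the dialog box. The Python sketch below is only a planning aid and doesn't talk to InDesign. The style names in it are hypothetical examples, so swap in your own. It shows the many-styles-to-one-tag idea and flags any style that points at a tag this step says to leave unmapped.

```python
# Hypothetical style names; replace them with the ones in your document.
STYLE_TO_TAG = {
    "Headline": "h1",
    "Subhead": "h2",
    "Minor subhead": "h3",
    "Basic Paragraph": "p",
    "Body first": "p",
    "Body indented": "p",
    "Sidebar body": "p",
    "Bulleted list": "li",
    "Numbered list": "li",
    "Bold": "strong",
    "Italic": "em",
}

# Tags that should not be the target of any style mapping.
RESERVED_TAGS = {"table", "td", "div", "img", "ol", "ul"}


def check_mapping(mapping):
    """Return the style names that map to a reserved tag (ideally none)."""
    return sorted(style for style, tag in mapping.items()
                  if tag in RESERVED_TAGS)


def styles_by_tag(mapping):
    """Group style names under the tag they map to."""
    grouped = {}
    for style, tag in mapping.items():
        grouped.setdefault(tag, []).append(style)
    return grouped


print(check_mapping(STYLE_TO_TAG))       # prints: []
print(styles_by_tag(STYLE_TO_TAG)["p"])  # the four body styles
```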
Are you ready? Now that InDesign knows how to tag 90% of your content, it'll do so, as soon as you click the OK button in the Map Styles to Tags dialog box.
Click OK and bang! Everything in the document is tagged with an HTML tag. Well, almost everything ...
Step 5: A Few Manual Tags (optional)
If you want to experiment with references to your images in the HTML file, you'll have to manually tag your images with the "img" tag. Just select an image and click the img tag, and you'll see a reference to the image appear in the Structure panel. You can shift-click multiple images and just click once on the "img" tag to apply the tag to each image in the selection.
Also, you'll have to surround your bulleted and numbered lists with the opening and closing "ul" or "ol" tags manually, as far as I can tell (maybe someone has a suggestion?).
It's easiest to do this in the Story Editor (Edit > Story Editor), where the XML tags are quite visible. Just drag over an entire range of bulleted or numbered paragraphs to select them, making sure the selection includes the first opening li tag and the last paragraph's closing li tag. Then click the "ul" tag in the Tags palette to turn the selection into a bulleted list in the HTML, or click the "ol" tag for a numbered list. Here's a before and after:
Use the same technique whenever you need to "double-tag" something, like adding em tags to some text you already tagged with strong (resulting in bold italics).
If you make a boo-boo, try Undo. Otherwise, you should know that you can't delete the tags themselves with the backspace or delete keys once they've been applied; instead, select the item or block of text and click the Untag button in the Tags palette, then click on a different tag. If there's content you don't want included in the HTML export, Untag it and leave it untagged.
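If wrapping lists by hand gets tedious, one possible alternative is to leave the li elements unwrapped in InDesign and group them after export. This is a rough Python sketch of that idea, not something InDesign does for you. It assumes the exported file is well-formed XML with the li elements sitting directly inside a div. The function name wrap_list_items is made up for illustration. Because bulleted and numbered styles both map to li, the script can't tell them apart, so you pick "ul" or "ol" yourself.

```python
import xml.etree.ElementTree as ET


def wrap_list_items(parent, list_tag="ul"):
    """Wrap each run of consecutive <li> children of parent in one list_tag."""
    regrouped = []
    current_list = None
    for child in list(parent):
        if child.tag == "li":
            if current_list is None:
                # Start a new list element at the first <li> of a run.
                current_list = ET.Element(list_tag)
                regrouped.append(current_list)
            current_list.append(child)
        else:
            # Any other element ends the current run.
            current_list = None
            regrouped.append(child)
    # Swap the old children for the regrouped ones.
    for child in list(parent):
        parent.remove(child)
    parent.extend(regrouped)


before = "<div><p>Intro</p><li>One</li><li>Two</li><p>Outro</p></div>"
div = ET.fromstring(before)
wrap_list_items(div)
after = ET.tostring(div, encoding="unicode")
print(after)
# prints: <div><p>Intro</p><ul><li>One</li><li>Two</li></ul><p>Outro</p></div>
```

Call it once per div in the exported file. It only looks at direct children, so nested structures would need more work.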
Step 6: Tweak the Structure
If you look at your Structure panel now, you'll see that all the tagged content has been added under the "html" (nee "Root") element. Each story - a text frame or threaded set of frames - is in its own "div" tag. Twirl open the disclosure triangle to the left of each div to reveal each paragraph of the story tagged with "p" or other tags. If you remembered to turn on Show Text Snippets from the Structure panel menu (back in Step 3), you should see the first few words of each paragraph next to their tags.
You'll note that the order of the stories here doesn't match their page order in the layout. You can easily re-order them by dragging the parent div tags in the Structure panel and dropping them in the correct location, if necessary. If you're not sure which div tag refers to which story, and you can't tell from the text snippet, select a text frame in the layout and look for which div sprouts an underline, indicating the parent frame.
Step 7: Export the HTML
To export everything you've tagged in the layout into one long HTML file, begin by selecting the top html tag in the Structure panel. If you want to export individual articles or sections, select the relevant div tags in the Structure panel before you export, so only those selected stories are included in the HTML file. (You could alternatively shift-click frames in the layout, which selects their divs in the Structure panel.)
Then choose Export XML from the Structure panel menu. In the Open/Save dialog box, name the file and change the .xml suffix to .html (important if you want to preview the results in your browser). Then click OK, which opens up the last dialog box we'll have to deal with, XML Export Options.
In the General Options section, turn on the View XML check box and choose the name of your favorite browser or web authoring program. Also, if you had selected specific divs to export, be sure to turn on the check box for "Export from Selected Element." The other panel in XML Options is specific to images. Since I'm not dealing with images in this post, I'll ignore it. You can too, you have my permission.
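If you have a lot of articles, another route is to export everything once from the top html tag and split the result afterward. Here's a hedged Python sketch of that approach. It assumes the export is well-formed XML with one div per story directly under the html element, as described in Step 6. The function names and the article-01.html naming pattern are invented for illustration.

```python
import xml.etree.ElementTree as ET


def split_articles(xml_text):
    """Return one serialized string per top-level <div> in the export."""
    root = ET.fromstring(xml_text)
    fragments = []
    for div in root.findall("div"):
        div.tail = None  # drop whitespace that followed the closing tag
        fragments.append(ET.tostring(div, encoding="unicode"))
    return fragments


def write_articles(xml_text, prefix="article"):
    """Write each story to its own file: article-01.html, article-02.html..."""
    for number, fragment in enumerate(split_articles(xml_text), start=1):
        with open(f"{prefix}-{number:02d}.html", "w", encoding="utf-8") as fh:
            fh.write(fragment)


sample = "<html><div><h1>First</h1></div>\n<div><h1>Second</h1></div></html>"
for fragment in split_articles(sample):
    print(fragment)
```

To use it on a real export, read the exported file's contents and pass them to write_articles. Re-order the divs in the Structure panel first, since the files are numbered in structure order.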
Step 8: View the Results
As soon as you click the Export button in the XML Export dialog box, your browser or web authoring program opens and shows you the results of your efforts. Even though the HTML file is not fully "legitimate" - it lacks a header section, a DOCTYPE declaration, and other niceties - most programs should be able to show you the text and formatting. Bolds and italics should be intact, and all your titles and subheads should look something like titles and subheads.
It's plain, all right, but at least you have all the basics in place so that you or your web team can easily plonk the source code you supplied into your existing web site for some instant custom formatting, courtesy of the CSS.
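If your web team would rather receive complete pages than bare fragments, the missing header and DOCTYPE can be added after export. The Python sketch below wraps one exported div in a minimal XHTML 1.0 Strict page. The function name wrap_fragment and the styles.css file name are placeholders, so use whatever stylesheet your site actually links to.

```python
from html import escape

PAGE_TEMPLATE = """<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>{title}</title>
<link rel="stylesheet" type="text/css" href="{css_href}" />
</head>
<body>
{body}
</body>
</html>
"""


def wrap_fragment(fragment, title, css_href="styles.css"):
    """Wrap exported markup (one div, not the html root) in a full page."""
    return PAGE_TEMPLATE.format(
        title=escape(title),            # escape &, < and > in the title
        css_href=escape(css_href, quote=True),
        body=fragment,
    )


page = wrap_fragment("<div><h1>News</h1><p>Hello.</p></div>", "News & Views")
print(page)
```

Pass it a single story's div rather than the whole export, since the template already supplies its own html element.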