Building an Index Using Character Styles or an External Word List
Rachel wrote:
Is it possible to create a TOC or index from a character style?
We’ve heard this request so many times, along with requests for building an index based on a pre-existing list of words, that it’s a wonder Adobe hasn’t already built these features into InDesign. Fortunately, our friend Marc Autret has stepped up to the plate with two free scripts that seem to fill the gap perfectly.
We’ve mentioned Mr. Autret’s scripts before, including the one for swapping items on a page. Back in Episode 78, we also talked a bit about how his Index Brutal script (must be said aloud with a French accent) could create indexes that merge page numbers together. Index Brutal is an awesome script that can build an index for your documents based on a list of words in a text file. He has built in a number of very cool additional features, such as a way to search for one term, but index it under a different word. You can find more about this (including his new English-language instructions) on his Web site.
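The word-list approach can be sketched outside InDesign in a few lines of Python. This is a minimal illustration of the general technique, not Index Brutal’s code: pages are given as a plain dictionary of page number to text, and the "search > heading" line syntax (find one term, file it under another) is an assumption for illustration. parse_word_list, build_index, and the sample data are hypothetical.

```python
import re
from collections import defaultdict

def parse_word_list(lines):
    """Return (search_term, heading) pairs from a word list.

    Assumed (hypothetical) syntax: a plain line indexes the term under itself;
    'search > heading' finds 'search' but files it under 'heading'.
    """
    pairs = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if ">" in line:
            search, heading = (part.strip() for part in line.split(">", 1))
        else:
            search = heading = line
        pairs.append((search, heading))
    return pairs

def build_index(pages, word_list):
    """pages maps page number -> page text; returns heading -> sorted page list."""
    index = defaultdict(set)
    for search, heading in parse_word_list(word_list):
        # Whole-word, case-insensitive match; \b is Unicode-aware for str patterns.
        pattern = re.compile(r"\b" + re.escape(search) + r"\b", re.IGNORECASE)
        for page, text in pages.items():
            if pattern.search(text):
                index[heading].add(page)
    # One entry per heading, with its page numbers merged, sorted, and deduplicated.
    return {heading: sorted(nums) for heading, nums in sorted(index.items())}

pages = {
    4: "The radiator coupling must be tightened.",
    7: "Replace the coupling if the radiator leaks.",
    9: "See the cooling system overview.",
}
word_list = ["radiator", "coupling", "cooling > radiator"]
for heading, nums in build_index(pages, word_list).items():
    print(heading, ", ".join(map(str, nums)))
```

Collecting pages in a set per heading is what merges the page numbers, so a term found under two different search words still produces a single entry.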
His second script, “IndexMatic,” which builds an index based on character styles, is new and seems to work well, but it needs some brave people to test it and give him feedback to fine-tune it. You can download IndexMatic from this link. I believe he plans to merge the two scripts in the future to make one über-indexing script. Currently, this is an “InDesignSecrets” exclusive! If you use it, please post comments and suggestions in the space below this blog post.
Here is a screen capture of the script’s primary dialog box, just to give you a sense of its abilities:
[Screen capture: the IndexMatic primary dialog box]
I hope you enjoy using it. Thank you, Marc!
I see “Active Document” and “All Documents” as Targets. I assume this means All Open Documents, but I’m wondering, can this index an entire Book? Or do you have to have all the documents in a Book open at once to index the entire thing?
Either way this sounds like a great script. I’ll see if I can’t work up some testing for it.
-Derek =)D:
Hi Derek,
You’re right: “All documents” means exactly “all open documents.” So, to index a Book, you actually have to open the underlying documents before running IndexMatic.
Maybe I could add an “auto-open” option (for targeting an entire Book)… Maybe…
Marc,
It is probably better the way you have it, actually. I like to have separate documents for Table of Contents, Introduction and Conclusion sections, which I usually wouldn’t want to be indexed. So, I wouldn’t actually want a full Index for an entire book, now that I think about it a bit more.
-Derek =)D:
The script does not work with nested styles, which is what I need.
I have a large list of doctors (several thousand) that is being published by an insurance company. The listings were sent as Excel files, which I massaged into a consistent format that I could paste into InDesign. This is the format:
SPECIALTY
Last Name, First Name [tab] City [tab] Phone Number
People most likely will look for docs based on specialty, but occasionally they’ll want to look for a doctor by name, so I need an index of just their names. Space is tight, so I need to exclude their cities and phone numbers. Their names are already in Lastname, Firstname format, so I needn’t worry about that. The problem is that their name, city, and phone number are all on one line, separated by tabs. Nested styles can understand tabs, so it’s easy for InDesign to grab their names. It appears that the script does not understand nested styles though, so I’m back to using a macro I developed last year to manually index all of these names.
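Since each listing is one tab-delimited line, the text to index is simply everything before the first tab, which is what a nested style running “up to 1 tab” picks out. A minimal Python sketch of that extraction, assuming the listings are available as plain text lines in the format above (names_for_index and the sample data are hypothetical):

```python
def names_for_index(lines):
    """Return the name field (text before the first tab) of each listing line.

    Lines without a tab, such as the SPECIALTY headings, are skipped.
    """
    names = []
    for line in lines:
        if "\t" not in line:
            continue  # specialty heading, not a listing
        name = line.split("\t", 1)[0].strip()
        if name:
            names.append(name)
    return names

listings = [
    "CARDIOLOGY",
    "Smith, John\tBoston\t555-0100",
    "O'Neil, Mary\tSalem\t555-0199",
]
print(names_for_index(listings))
```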
JT, you are correct: It doesn’t work with nested styles. I should have mentioned that in my original post. It is a frustrating limitation of InDesign because the character styles aren’t truly applied to the text, so scripts can’t “see” them.
It might work if there were a separate script that converted all nested styles into real styles. That might have other useful applications, too. For example, currently, if you export a file with nested styles out as an RTF document, I think you lose the nested styles, too. An “Expand Appearance” script would help in that situation, as well.
How about the PRESERVELOCALFORMATING.jsx script that you guys linked me up with about a year ago. I think it was Dave Saunders who wrote it. That script is the biggest and most useful script in my script arsenal.
Even when the formatting comes from a nested style, it detects whether the text is bold, bold italic, etc., and then applies a character style accordingly.
I’m sure some scripter could point you toward a way of picking out nested-style text and applying the character style to it.
That kind of defeats the purpose of having a nested style, though. I found a very useful technique: using a nested style and also applying a character style to the same text, so that you can get that text into two different variables. That came in very handy for a book I was setting.
Thanks for your feedback.
About the Nested Style Problem, I wrote this short message to David during the script development:
Nested styles are very tricky to handle because they are not really applied to the characters. Since the “nested-style effect” depends dynamically on the context (according to the settings made in the parent paragraph style), InDesign apparently cannot target nested-styled text from the Find/Change dialog. Also, did you notice that when you select a nested-styled character, its character style is not highlighted in the Character Styles panel?
So, thinking about the functionality that JT is looking for, I wonder whether the feature missing from IndexMatic is, in fact, a feature missing from InDesign (“Flat Nested Styles”).
For my part, I don’t know how to script that, but Dave Saunders is a great script developer, and I’m sure that if he knows how to actually apply nested styles to the characters concerned, he will tell us soon.
I just want to point out that it could be a problem to build this functionality into IndexMatic, because it would modify the document before processing…
Thank you for the clarification, Marc. I agree that “expand nested styles” (or “flatten nested styles” or whatever it should be called) should not be part of IndexMatic. It’s a separate script.
I would see this as a temporary measure… You’d save the file, then “expand” the nested styles, then process with IndexMatic, then use Revert to get back to where you were.
Eugene, the script you mention doesn’t entirely do what we need here. But I’ll ask Dave S. if he thinks such a thing can be done.
Just a small piece of feedback:
I tried something quite simple: I applied a character style to some (German) names and gave the script a try. The list I got looks like this:
aschemann 8
beise 11
döbele 7
heimes 11
höfert 4
jipp 8
kneitschel 5
kollak 6
kroner 9
mamerow 9
margulies 9
mei 4, 7
ner 4, 7
plötz 10
reuter 11
schmidt 7
schneider 11
schwarz 11
specht 11
spornitz 5, 10
As you can see, the names are uncapitalized, and there is a problem with the German “sharp s” (ß):
“mei 4, 7” and “ner 4, 7” should be “Meißner 4, 7”.
Robert
David,
I disagree that the preserve local formatting script doesn’t do it. My copy does (although only for italic, bold, subscript, and superscript), so I’m not sure how useful that is.
CS3 has a new text property, appliedNestedStyle, which looks like the answer to a maiden’s prayer, but unfortunately you’d have to examine every character to work out where runs of appliedNestedStyle occur, because nested styles are invisible when you use textStyleRanges.
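The character-by-character approach boils down to run-length grouping: read one value per character, then collapse consecutive equal values into ranges. A minimal Python sketch, assuming the per-character appliedNestedStyle values have already been read into a list (here they are made up, with None meaning no nested style); nested_style_ranges is a hypothetical helper, not part of the InDesign API:

```python
from itertools import groupby

def nested_style_ranges(text, styles):
    """Collapse per-character style names into (style, start, end, substring) runs.

    styles[i] stands in for the appliedNestedStyle value of character i;
    None means no nested style. end is exclusive.
    """
    if len(text) != len(styles):
        raise ValueError("need exactly one style value per character")
    runs = []
    pos = 0
    for style, group in groupby(styles):
        length = len(list(group))
        if style is not None:
            runs.append((style, pos, pos + length, text[pos:pos + length]))
        pos += length
    return runs

text = "Smith, John\tBoston"
styles = ["IndexMe"] * 11 + [None] * 7
print(nested_style_ranges(text, styles))
```

In a real document the per-character read is the expensive step, which is consistent with this approach only being practical for short texts.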
On the other hand, if a nested style applies any of the four formatting attributes that my script looks for, then my script will find them because searching for formatting works even when the formatting is the result of a nested style.
But I don’t think this is getting us closer to the goal. Presumably, you want to have an “Index Me” style that doesn’t actually apply any detectable formatting.
I guess what you could do is to have the script change that style to apply something detectable but benign, e.g., an AltBlack swatch. Now, instead of searching for the style, search for the AltBlack formatting. When finished, delete AltBlack and replace with Black.
Obviously this falls on its nose if the original character style applied Green, but you get the idea.
Robert, I can’t speak to the ß problem, but please try the “Case Formatting” pop-up menu in the script to control how the index is capitalized. The “raw strings” option might be best for you?
Dave, I see your point! We need to pick some formatting that Find/Change can find, but that is virtually never used in the document. I would vote for:
* set the text to 100.01% vertical scaling
* strikethrough with color set to None
* or a stylistic set in the OpenType features
Perhaps if one of these is in the nested style, you can then find it and apply the “indexme” style?
Watch out for using 100.01% of anything. Rounding errors in floating point numbers could make that impossible to find. The other ideas are good. But whatever, they could be temporarily added by the indexing script. It could look down a list of features for one that is not already applied by the style, temporarily add that to the style, do the search based on that, then remove it from the style.
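The “temporarily add a feature the style doesn’t already use” idea can be sketched as: pick the first unused candidate, add it for the duration of the search, and always remove it afterwards. A minimal Python sketch that models a character style as a plain dictionary; the attribute names are illustrative stand-ins, and both helpers are hypothetical:

```python
from contextlib import contextmanager

def pick_marker_attribute(style_attributes, candidates):
    """Return the first (name, value) candidate the style does not already set, or None."""
    for name, value in candidates:
        if name not in style_attributes:
            return name, value
    return None

@contextmanager
def temporary_marker(style_attributes, candidates):
    """Add an unused marker attribute to the style for the duration of a search."""
    picked = pick_marker_attribute(style_attributes, candidates)
    if picked is None:
        raise RuntimeError("every candidate attribute is already used by the style")
    name, value = picked
    style_attributes[name] = value
    try:
        yield name, value
    finally:
        del style_attributes[name]  # put the style back exactly as it was

# Illustrative stand-ins for the attributes suggested in this thread.
candidates = [
    ("strikethrough", True),    # with the stroke color set to None
    ("stylisticSet", 1),
    ("fillColor", "AltBlack"),
]
index_me = {"strikethrough": True, "pointSize": 9}

with temporary_marker(index_me, candidates) as marker:
    print("search for text formatted with", marker)
print(index_me)
```

The try/finally removes the marker even if the search fails. It does not address the case where the same attribute was also applied locally to some of the text.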
Only possible problem is if that same attribute was locally applied to some text that also happens to have the nested style applied — I think that’s why I like my idea of creating a swatch. Given that it didn’t exist before the script was run, it can’t already be applied anywhere. Although, of course, it could wipe out other local styling — geez! These sneaky workarounds always bite you.
Thanks Robert,
The ß-bug is a very, very strange thing.
The bad news is that similar “holes” seem to appear with other non-ASCII characters… on Mac OS only! I don’t know why, and that’s a serious problem. A French user noticed that LATIN SMALL LETTER OE (œ, U+0153) creates the same hole as your “sharp s” (U+00DF).
(The “case formatting” option will not change this.)
It’s not easy for me to fix this bug because it doesn’t occur in the Windows version of InDesign, and where I am today I can only use that version.
Could somebody please tell me whether the GREP expression [[:alpha:]]+ catches all the characters of the word Meißner (in the Find feature)?
And what about the alternative \w+ syntax?
I suspect that the POSIX class [[:alpha:]] could be a problem on Mac OS… Just a guess…
Dave and David, thank you for your suggestions.
The first method — checking the appliedNestedStyle Text property on each character — could be a solution for short documents. But that’s so frustrating!
The second method, based on a temporary “conversion” of the nested style into something that Find can see, is a genius idea.
Hmmm, I have to think about it…
David,
Sorry, in my hurry to test the script I overlooked the case formatting option.
Marc,
I tried the two GREP-searches you suggested on “Meißner” and in both cases it finds “Mei” and “ner” but not the sharp s.
Robert
I did some more testing with the names and found that there is no way to get two (or more) words into one entry in the index. Compound names (like Leutheusser-Schnarrenberger) are especially common in Germany, and it is not possible to keep them together. Would it be possible to respect nonbreaking hyphens and nonbreaking spaces to solve this?
A second thing: words containing a discretionary hyphen do not appear in the index at all.
Robert
Well, I missed loads of this and I caught up this morning. Seems like the preservelocalformatting thing might work after all.
Nested styles need a rule to trigger them, don’t they? For example, the first word of a sentence, or text between two em spaces, would get the nested style. I’m guessing it’s for body text, but the nested style has to have some trigger.
Isn’t it possible to GREP-search for the triggers and then apply a character style? For example, you can find the text between em spaces and apply a style to what you find. It wouldn’t affect the nested style either.
It’s just a matter of searching your text using the criteria you set up in the nested styles.
You could then, technically, use the FindChangeByList script that ships with CS3 to find all the nested-style text and apply the style you want, in one big swoop.
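The “search for the trigger” idea can be sketched with lookarounds, so the em spaces themselves stay outside the match, just as they would stay outside the nested style. A minimal Python sketch, assuming the trigger is exactly one em space (U+2003) on each side of the text; styled_spans and the sample paragraph are hypothetical:

```python
import re

EM_SPACE = "\u2003"
# Text between two em spaces; the lookbehind/lookahead keep the em spaces out of the match.
BETWEEN_EM_SPACES = re.compile(r"(?<=\u2003)[^\u2003\r\n]+(?=\u2003)")

def styled_spans(paragraph):
    """Return (start, end) spans a 'between em spaces' nested style would cover."""
    return [m.span() for m in BETWEEN_EM_SPACES.finditer(paragraph)]

para = "Chapter 3" + EM_SPACE + "Radiator couplings" + EM_SPACE + "page 41"
for start, end in styled_spans(para):
    print(repr(para[start:end]))
```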
Does any of that sound doable?
Robert said: I tried the two GREP-searches you suggested on “Meißner” and in both cases it finds “Mei” and “ner” but not the sharp s.
OK, here we have the culprit!! On Mac OS, it seems that InDesign DOES NOT recognize ß or other non-ASCII Latin characters AS word characters!
If other Mac users confirm this, it’s an InDesign bug, because the GREP classes [[:alpha:]] (POSIX) and \w are supposed to match any alphabetic/word character.
So I have to rewrite the GREP expression to work around the bug…
Robert said: Would it be possible to respect nonbreaking hyphens and nonbreaking spaces to solve this? A second thing: words containing a discretionary hyphen do not appear at all in the index.
Good point. I need to add some extra parameters to the dialog to let users extend the GREP search scope.
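A quick way to see why a letter class that misses non-ASCII characters produces exactly the “mei” / “ner” split, and one possible workaround, is to reproduce it in Python. This does not run InDesign’s GREP engine; it only assumes the Mac behavior is equivalent to an ASCII-only letter class. The explicit-range fallback is a hypothetical rewrite that covers the Latin-1 Supplement and Latin Extended-A letters only.

```python
import re

name = "Meißner"

# An ASCII-only letter class reproduces the reported symptom: ß splits the word.
ascii_letters = re.compile(r"[A-Za-z]+")
print(ascii_letters.findall(name))

# Unicode-aware letter class: \w minus digits and underscore.
unicode_letters = re.compile(r"[^\W\d_]+")
print(unicode_letters.findall(name))

# Explicit fallback: ASCII letters plus Latin-1 Supplement and Latin Extended-A
# letters (the gaps at U+00D7 and U+00F7 skip the multiplication and division signs).
explicit_latin = re.compile(r"[A-Za-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u017F]+")
print(explicit_latin.findall(name))
print(explicit_latin.findall("cœur"))
```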
That’s easy, but I need some time to analyse the different cases.
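One way to extend the search scope along the lines Robert asks for: let a word continue across a nonbreaking hyphen or nonbreaking space, and drop discretionary hyphens before matching, since a letters-only pattern would otherwise split such a word in two. A minimal Python sketch, assuming these characters reach the script as U+2011, U+00A0, and U+00AD; index_terms and the sample text are hypothetical:

```python
import re

NB_HYPHEN = "\u2011"    # nonbreaking hyphen
NB_SPACE = "\u00a0"     # nonbreaking space
SOFT_HYPHEN = "\u00ad"  # discretionary (soft) hyphen

# A word: letters, optionally joined to more letters by a nonbreaking hyphen or space.
WORD = re.compile(r"[^\W\d_]+(?:[\u2011\u00a0][^\W\d_]+)*")

def index_terms(text):
    """Return index terms, keeping nonbreaking joins and ignoring soft hyphens."""
    cleaned = text.replace(SOFT_HYPHEN, "")
    terms = []
    for match in WORD.finditer(cleaned):
        # Show the joiners as an ordinary hyphen and space in the index entry.
        terms.append(match.group().replace(NB_HYPHEN, "-").replace(NB_SPACE, " "))
    return terms

sample = "Leutheusser" + NB_HYPHEN + "Schnarrenberger and Wald" + SOFT_HYPHEN + "meister"
print(index_terms(sample))
```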
Robert: Also, make sure you are choosing “longest string” from the Search For pop-up menu. If you apply the character style to “silly rabbit” and use “alphabetic words,” then you will get two entries (“silly” and “rabbit”). But if you choose “longest string” then you get one entry.
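The difference between the two Search For options can be sketched as two ways of turning one styled run into entries. A minimal Python sketch; the mode names and entries_from_run are illustrative, not IndexMatic’s internals:

```python
import re

def entries_from_run(styled_text, mode):
    """Turn one styled text run into index entries.

    mode "words": every alphabetic word becomes its own entry.
    mode "longest": the whole run (trimmed) becomes a single entry.
    """
    if mode == "words":
        return re.findall(r"[^\W\d_]+", styled_text)
    if mode == "longest":
        stripped = styled_text.strip()
        return [stripped] if stripped else []
    raise ValueError("mode must be 'words' or 'longest'")

print(entries_from_run("silly rabbit", "words"))
print(entries_from_run("silly rabbit", "longest"))
```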
I solved my problem by using a GREP search and the IndexMatic script. Here’s how, in case anyone reading this post has the same problem:
Again, the format is:
Doctor Name [tab] City, ST [tab] Phone Number
For the index, I care only about the name. The names are already formatted as I need them, I just need to get them into an index.
I do a GREP find-and-replace with the following search string:
^[\l\u\d, \.\-"'\(\)]+
This finds strings starting at the beginning of a paragraph, and consisting of upper- and lowercase letters, commas, spaces, periods, hyphens, single- and double quotes, parentheses, and digits (for the heck of it): characters that appear in people’s names in Roman-based languages. When InDesign encounters the tab character, it stops the match. I restrict this search to text with the paragraph style I applied to all the listings.
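To try the same idea outside InDesign, here is a rough Python translation of that GREP. Python has no equivalent of InDesign’s \l and \u (lowercase and uppercase letter), so [^\W\d_] (any letter) stands in for both; that substitution is an assumption, and name_fields and the listings are made up. The paragraph-style restriction is not modeled.

```python
import re

# Letters, or digits, comma, space, period, hyphen, quotes, parentheses,
# anchored at the start of each line; a tab is not in the set, so it ends the match.
NAME = re.compile(r"^(?:[^\W\d_]|[\d, .\-\"'()])+", re.MULTILINE)

def name_fields(text):
    """Return the leading run of name characters on each line."""
    return [m.group().rstrip() for m in NAME.finditer(text)]

listings = (
    "Smith, John A.\tBoston, MA\t555-0100\n"
    "O'Brien-Lee, Mary (Molly)\tSalem, MA\t555-0199\n"
    "Döbele, Karl\tLowell, MA\t555-0142"
)
print(name_fields(listings))
```

Because the city field also contains commas and spaces, it is the tab, not the comma, that has to end the match.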
I replace this string with a special character style. I include a special color with this character style so it will be easy to see what has been matched and what hasn’t. Then I run the IndexMatic script on this character style.
It only took a few seconds for InDesign to apply this character style to 6200+ names, and about 10 minutes for IndexMatic to crank out the index. Compare that to hours doing it by hand or even with a keystroke-saving macro.
I couldn’t have done it without stuff I learned from this site, so… thanks!
I need to make an index (first ever for me) and this script seems to be just the thing I need, but I have two questions:
Can the script search for multi-word keywords (for example, radiator coupling) and create multi-word index entries without my having to do a find-and-replace on the word list my customer will supply?
And is it preferable to list words that have multiple spellings or abbreviations once, using the > operator, or simply to list every spelling?