Thanks for coming to InDesignSecrets.com, the world's #1 resource for all things InDesign!

Stop Hyphenated (Compound) Words from Hyphenating

Cathy wrote:

Is there an easy way to change preferences so that it will not hyphenate a hyphenated word? For instance, “over-the-mountain” will break as “over-the-moun-” and bring “tain” onto the next line. Our editors do not like it even if it breaks at “over-the-” and brings “mountain” onto the next line; they consider that a bad break.

Once again, my answer depends on whether or not you work by the hour. If you’re paid hourly, then I suggest you painfully, slowly, search through the entire document for all the hyphenated words and apply the No Break style to them.

But if you’re trying to get this done efficiently, there is a super easy way to go about it: GREP Styles.

First, make a character style that simply applies the No Break formatting.


Then, in your body text paragraph style, add a GREP Style that assigns your character style to this GREP expression: \b\w+?-\w+?\b

That’s a hyphen in the middle of all that.

Now, this will only work with phrases with a single hyphen in them. If you also wanted to capture the “over-the-mountain” phrases, you’d have to alter this a little bit. There’s probably a clever way to do it in a single GREP expression, but I can’t think of it right now, so I’d just add a second GREP style that applies the same No Break character style to \b\w+?-\w+?-\w+?\b
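To see why the second GREP style is needed, here is a minimal sketch that runs both expressions through Python's `re` module. Python is only a stand-in here: InDesign's GREP engine may differ in details, and the helper name `covered` and the sample sentence are made up for illustration.

```python
import re

# The two expressions from the post, copied as-is.
ONE_HYPHEN = re.compile(r"\b\w+?-\w+?\b")
TWO_HYPHENS = re.compile(r"\b\w+?-\w+?-\w+?\b")

def covered(pattern, text):
    """Return the substrings the pattern matches, i.e. the text a style would cover."""
    return [m.group(0) for m in pattern.finditer(text)]

sample = "We went over-the-mountain in double-space mode."

one = covered(ONE_HYPHEN, sample)
two = covered(TWO_HYPHENS, sample)

print(one)  # the single-hyphen expression stops after the second word of a longer compound
print(two)  # the two-hyphen expression ignores single-hyphen compounds
```

In this sketch the first expression covers only “over-the” and leaves “mountain” unprotected, which is why both styles are applied together.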

I hope that helps!

David Blatner

David Blatner is the co-founder of the Creative Publishing Network, InDesign Magazine, and the author or co-author of 15 books, including Real World InDesign. His InDesign videos at Lynda.com are among the most watched InDesign training in the world. You can find more about David at 63p.com.

40 Comments on “Stop Hyphenated (Compound) Words from Hyphenating”

  1. Ah — a challenge! Try this for “over-the-mountain”, it ought to work. Plus points if you can figure it out ;-)

     \w+(-\w+)+

    An easy way to check where, exactly, a GREP style gets applied is to temporarily add something highly visible to the applied character style. My recommendation is a “highlight”: a big fat yellow underline, shifted up a couple of points. After verifying it works as intended, you can remove it again from the style.

    • \w+(-\w+)+ didn’t seem to do anything but David’s \b\w+?-\w+?\b certainly did the trick. Thank you! This little tip is extremely useful for me.

      • Jupiter, are you sure it does not work for you? I admit I made it up while writing, but I just tested it with a generous underline to see where it gets applied. It works exactly as I expected: on “over-the-mountain”, “double-space”, and even “mother-in-law”. Maybe you forgot to set “No Break” in the style?
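For anyone who wants to check the single-expression version outside InDesign, here is a rough sketch using Python's `re` module as a stand-in for the GREP engine (an assumption; the two engines are not identical). The function name `compounds` and the sample sentence are invented for the example.

```python
import re

# The single expression from this thread: a word, then one or more "-word" repeats.
COMPOUND = re.compile(r"\w+(-\w+)+")

def compounds(text):
    # group(0) is the whole match; the parenthesised group only keeps its last repeat,
    # so finditer is used here rather than findall.
    return [m.group(0) for m in COMPOUND.finditer(text)]

found = compounds("My mother-in-law went over-the-mountain, double-space and all.")
print(found)
```

Because the parenthesised group can repeat, the same expression picks up compounds with one, two, or more hyphens.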

  2. Isn’t it more correct to say “Stop Compound Words from Hyphenating”?

    An alternative to ‘no break’ is to use a discretionary hyphen at the start of the word (or compound word).

    Because of the relative performance hit with having lots of GREP styles, I generally prefer using GREP Find/Change.

    Find: \w+(-\w+)+
    Replace: ~-$0

    Which translates to: Find a word (“\w+”) followed by one or more words that are each preceded by a hyphen (“(-\w+)+”); then replace with a discretionary hyphen (“~-”) followed by the previously found text (“$0”).
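The same Find/Change can be sketched in Python to show what ends up in the text. This is only an approximation: it assumes the soft hyphen (U+00AD) stands in for InDesign's `~-`, and it uses `\g<0>`, which is Python's spelling of `$0`. The function name `protect_compounds` is hypothetical.

```python
import re

# Assumption: U+00AD (soft hyphen) plays the role of InDesign's ~- metacharacter.
DISCRETIONARY_HYPHEN = "\u00ad"

def protect_compounds(text):
    """Prefix every hyphenated compound with a discretionary hyphen."""
    # \g<0> re-inserts the whole match, like $0 in the Change field.
    return re.sub(r"\w+(-\w+)+", DISCRETIONARY_HYPHEN + r"\g<0>", text)

result = protect_compounds("Go over-the-mountain twice.")
print(repr(result))
```

Unlike a GREP style, this changes the text itself, so it has to be run again whenever new compounds are typed or imported.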

    • Jongware: Great! Of course: putting the words in parentheses makes them a little group that can be repeated, I believe.

      Caleb: That’s a terrific alternative. Thank you! (And good point about “compound words.” I’ve updated the title.)

  3. The problem with this solution is it won’t break on the hyphen between the two compound words either.

    • Fred: If you want the words to be able to break at the hyphen between the words, then you could apply No Break to the word before and after, but not to the hyphen. To do that, apply it to \w+?(?=-) and also to (?<=-)\w+ (the first is "any word that is followed by a hyphen" and the second is "any word that is preceded by a hyphen").
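A quick way to confirm that the lookahead and lookbehind leave the hyphens themselves alone is to collect the matched spans. The sketch below uses Python's `re` module as a stand-in for InDesign's GREP (an assumption), and `styled_spans` is a made-up helper name.

```python
import re

BEFORE_HYPHEN = re.compile(r"\w+?(?=-)")  # any word that is followed by a hyphen
AFTER_HYPHEN = re.compile(r"(?<=-)\w+")   # any word that is preceded by a hyphen

def styled_spans(text):
    """Return the sorted (start, end) spans that either expression would cover."""
    spans = set()
    for pattern in (BEFORE_HYPHEN, AFTER_HYPHEN):
        for m in pattern.finditer(text):
            spans.add(m.span())
    return sorted(spans)

text = "over-the-mountain"
spans = styled_spans(text)
parts = [text[a:b] for a, b in spans]
hyphens_styled = any("-" in part for part in parts)

print(parts, hyphens_styled)
```

Every word is covered and no hyphen is, so in principle the line can still break right after a hyphen.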

  4. I don’t like using GREP styles for this purpose, as it’s on body text and I find that running GREP styles on large amounts of text can cause some slow down with InDesign.

    I do a Find Replace

    (\w+)\-(\w+)\-(\w+)

    Change to
    $1~~$2~~$3

    That just finds all the double hyphenated words and changes the hyphens to non-breaking hyphens

    That way – no “over-the-mountain” or other instances are ever broken over a line, it won’t even go to

    “over-” “the-mountain”

    or

    “over-the-” “mountain”

    It will always be on the same line.
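Here is a small sketch of that Find/Change in Python, assuming the Unicode non-breaking hyphen (U+2011) stands in for InDesign's `~~`. The function name is hypothetical, and Python's `re` is only an approximation of InDesign's GREP.

```python
import re

# Assumption: U+2011 (non-breaking hyphen) plays the role of InDesign's ~~.
NON_BREAKING_HYPHEN = "\u2011"

def lock_three_part_compounds(text):
    """Swap the hyphens in word-word-word compounds for non-breaking hyphens."""
    replacement = NON_BREAKING_HYPHEN.join([r"\1", r"\2", r"\3"])
    return re.sub(r"(\w+)-(\w+)-(\w+)", replacement, text)

locked = lock_three_part_compounds("over-the-mountain and double-space")
longer = lock_three_part_compounds("I-Love-Indesign-GREP")

print(locked)
print(longer)
```

In this sketch, compounds with a single hyphen are left untouched, and a four-part compound keeps an ordinary hyphen before its last part, so the query as written targets exactly the double-hyphenated case.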

  5. @Eugene, I do agree with you that GREP styles slow things down a bit, but as you know we are trying to automate the tasks, as David usually says, “…in an efficient way”. Your idea is perfect, but it will also eat up a few manual steps each time the text changes.

    @David, I think you will agree with me that Jongware’s code is much better than yours, as the code given by you only works for two- or three-part hyphenated words, whereas there is no limit with Jongware’s code. I tried it and it works perfectly on “I-Love-Indesign-GREP” and “The-most-powerful-feature-in-InDesign”.

    @Jupiter, please try again, there isn’t anything wrong with Jongware’s code :)

    @Caleb, your style is the best one. I’ll keep it handy.
    Thanks to all.

  6. I agree that an automated way is better to avoid it at all costs. If you’re happy with the performance side of things by all means use it.

    At the end of each document I create, I run a series of GREP find-and-replace queries, which takes 5 minutes at most.

    Dash to En-dash
    on a find by find basis as it tends to find the hyphen on things like “pre- and post-apocalyptic”

    Double Punctuation
    ([[:punct:]])\1

    again find by find – as it picks up things like “(Sample (text))”

    Multi-space to single space

    Space before punctuation
    \s+([,.;:”’\/)])
    $1

    as some people tend to do things like “… end of sentence .” or “However , a space before a punctuation is incorrect .”

    Those would be my main find change at the end of a chapter/article/section etc.

    I find it far more efficient to work this way.

    Although – if the GREP solution works in this instance – by all means go for it.
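Two of the cleanup queries above can be tried out in Python as a rough sketch. Python's `re` has no `[[:punct:]]` class, so this assumes ASCII punctuation from `string.punctuation` as a substitute; the function names are invented for the example.

```python
import re
import string

# Assumption: ASCII punctuation stands in for [[:punct:]], which Python's re lacks.
PUNCT_CLASS = "[" + re.escape(string.punctuation) + "]"
DOUBLE_PUNCT = re.compile("(" + PUNCT_CLASS + r")\1")

# The space-before-punctuation expression, copied from the comment.
SPACE_BEFORE_PUNCT = re.compile(r"\s+([,.;:”’\/)])")

def doubled_punctuation(text):
    """List every doubled punctuation mark so each one can be reviewed by hand."""
    return [m.group(0) for m in DOUBLE_PUNCT.finditer(text)]

def close_up_punctuation(text):
    """Remove whitespace that sits directly before a punctuation mark."""
    return SPACE_BEFORE_PUNCT.sub(r"\1", text)

hits = doubled_punctuation("Wait,, see (Sample (text)).")
fixed = close_up_punctuation("However , a space before punctuation is incorrect .")

print(hits)
print(fixed)
```

The first query also flags the legitimate “))” in “(Sample (text))”, which is the reason for stepping through it find by find instead of using Change All.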

  7. These are all great solutions, as long as your goal is to not have any compound words break at all. While I don’t mind compound words breaking AT THE HYPHEN, I don’t want compound words adding a hyphen in order to break. So,

    twentieth-
    century

    is OK. But

    twentieth-cen-
    tury

    is not. These solutions are somewhat limiting, as they always keep the compound word together by forcing it to the next line, which may solve one problem, but create another, like a very loose line. In my example above, I would prefer the break to happen between the hyphen after “twentieth” and “century”.

    I have found that applying No Break can be finicky as well. Applying No Break to just “century” will bring “twentieth” with it, which, in my opinion, it shouldn’t (Bug?).

    For these reasons, I have stayed away from automated solutions to compound words and walk the lines instead, breaking as necessary. And no, I don’t get paid by the hour.

      • I did David, and I should have made it more clear, but this is what I meant when I was talking about the “finicky-ness” of No Break. My experience is that even when using your combination GREP Styles to apply No Break to the words before and after the hyphen, ID will pull both words down to the next line.

    • Jamie: Hm. I tested this and it worked! But now… you’re right… I cannot get it to hyphenate if the words on either side of the hyphen are set to No Break. I wonder if that is a bug or by design. And I wonder how I got it working originally.

      • David/Jamie, have you both tried applying No Break to just the word after the hyphen? I think this is what you were looking for :)

        (?<=-)\w+

        Just kidding, David. Your code is fine, except that you don’t need to apply No Break to the words both before and after the hyphen. Just apply it to the word immediately following the hyphen, and InDesign will avoid hyphenating the word at the end. I tried this on CS5 and it works well.

        Apart from this, if you only apply it to the word before the hyphen:
        \w+(?=-)

        then, InDesign will definitely pull the whole compound word onto the next line. Just try it.

      • David, I just ran some tests with CS4 and found the exact same solution: *excluding* the hyphen with your lookbehind and lookahead works for me — all the time.

        Be aware that if you test several compound words in a single paragraph, InDesign may simply have decided to re-format the entire paragraph! That may be skewing your conclusions. I do my tests on a large, real-world document and it seems to work as expected.

  8. David: I’m not surprised that you got it to work once. That has been my experience as well—once in a while, it will work, but 9 times out of 10, it won’t, and it will bring both words down to the next line.

    The more I think about this, the more I think there must be a bug going on. I’m going to file it and hopefully Adobe can fix it, or give an explanation of its behavior.

  9. GREP Styles are great because they’re automatically applied when new text is added, but I accomplish this task with a very simple Find/Change that adds a discretionary hyphen in front of all dashes.

  10. I’ve been battling this problem. I have a compound word that was hyphenating poorly. I wanted it to hyphenate following “self” but it wanted to do this:

    self-in-
    terest

    In this case I didn’t need a GREP solution. I tried all combinations of no breaks, discretionary hyphens, etc. mentioned above.

    I finally solved it by making the hyphen between the two words a discretionary hyphen; then it broke like this:

    self-
    interest

  11. If I read this article correctly, the solutions listed above won’t find hyphenated compounds that contain punctuation. I have been using the following for a long time; it finds punctuated words as well. It finds all words that are either before or after a hyphen, and it works for over-the-mountain-and-through-the-woods. Like others, I prefer to use a GREP find/change rather than a GREP style because I don’t do that much editing after this has been run, but it could be used as a GREP style. I apply a No Break override, rather than a character style, so I don’t strip out italics.

    (?<=-)[\w’]+|[\w’]+(?=-)

    I hope you find it helpful!
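To illustrate what the apostrophe in the character class changes, here is a minimal comparison in Python (a stand-in for InDesign's GREP, which may behave differently in details). The `\w`-only pattern is a hypothetical variant written just for contrast, and `pieces` is a made-up helper.

```python
import re

# The expression from this comment, with the curly apostrophe in the class.
WITH_APOSTROPHE = re.compile(r"(?<=-)[\w’]+|[\w’]+(?=-)")
# Hypothetical variant without the apostrophe, for comparison only.
WORD_CHARS_ONLY = re.compile(r"(?<=-)\w+|\w+(?=-)")

def pieces(pattern, text):
    """Return the substrings the pattern matches."""
    return [m.group(0) for m in pattern.finditer(text)]

sample = "a ne’er-do-well"
with_apostrophe = pieces(WITH_APOSTROPHE, sample)
without = pieces(WORD_CHARS_ONLY, sample)

print(with_apostrophe)
print(without)
```

Without the apostrophe in the class, only the “er” of “ne’er” is matched in this sketch, so the first part of the word would stay unprotected.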

  12. Working with Adobe, it does appear there is a bug, at least in InDesign CC, affecting how compound words break when applying No Break to them. Hopefully we’ll get a fix before too long.

  13. After reading this thread a few times through and trying a few different solutions, I still can’t get my compound words to break on the hyphens:

    over-the-
    mountain

    For me, this is the correct and preferred style. I have tried everything to get this to break this way, but I cannot. I prefer to use a no-break character style or a no-break override as compared to a GREP style as I want this to be part of every compound word in a book.

    As Jamie points out, this seems to be a bug. Has anyone figured out a way to get compound words to break on their hyphens?

  14. It seems that Adobe changed something with CS6 and CC so that when you have “no break” on the words before and after a hyphen, it keeps the entire phrase together.

    So now, in my Multi-Find Change workflow, I have added a new query that runs right after the one that applies nobreak to hyphenated compounds. It adds a discretionary break after all hyphens. Looks like this (in regular Text find/change dialog, not the GREP one):

    find -
    change to -^k

    That works now, but it requires the extra step.

    I will continue to search for other options. The hyphens do not have “no break” on them, so finding the hyphens and applying something to them doesn’t seem like it will help. I wonder if something can be done to the language definition to fix this? Or can we add a hyphenate here command to the hyphen character in the spelling dictionary? I will play around with this, but would love to hear if anyone else has solved it.

  15. @Matt-

    An investigation by Adobe determined that “it is not a bug but a new behavior that will require modification to the existing behavior.” They’ve added it to their backlog. Translation = keep searching for workarounds…I don’t think it will get fixed, errr, “modified” any time soon.

  16. GUESS WHAT! I was talking with an Adobe InDesign team member today and told him about this problem with hyphenated compounds and the hyphens not working when both words had “no break” applied . . . and he said, “Oh, yeah, that’s because as of CS6 we switched from the Proximity hyphenation dictionary to Hunspell.” All you have to do is switch your hyphenation dictionary back to Proximity in preferences and it should work.

    I tried this, and sure enough, I no longer need to add the discretionary line break after the hyphens.

    Still need to use the GREP find/change to apply “no break” to each word.

    Now I need to figure out what we lose by switching back to Proximity – why did they change in the first place?

  17. After some serious hunting around for a way to figure this out, I finally came across this gem, which seems to have done the trick for me. You need to create a “no language” character style (set that in advanced char. formatting field), and apply it using positive lookbehind and positive lookahead in a paragraph style, which applies the new character style to the first upper / lowercase bit of the word excluding the hyphen, and then the last bit of the word, also excluding the hyphen. You can set that character style to have a different colour or something if you want to check how it’s working. So in my main “body” paragraph style I have:

    Apply:
    “no-lang-compound-word-break”
    To:
    (?<=-)[\l\u]+

    And another:
    Apply:
    “no-lang-compound-word-break”
    To:
    [\l\u]+(?=-)

    Kudos to JohnB here for thinking of it, it hadn't occurred to me to use "no language": http://graphicdesign.stackexchange.com/a/62586
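A rough sketch of which parts of a compound those two GREP styles would pick up. Python's `re` has no `\l` or `\u`, so this assumes `[^\W\d_]` (any letter) as an approximation of `[\l\u]`; the function name `no_language_parts` and the sample text are invented.

```python
import re

# Assumption: [^\W\d_] (a letter, not a digit or underscore) approximates InDesign's [\l\u].
LETTER = r"[^\W\d_]"
AFTER = re.compile(r"(?<=-)" + LETTER + "+")   # letters right after a hyphen
BEFORE = re.compile(LETTER + r"+(?=-)")        # letters right before a hyphen

def no_language_parts(text):
    """Return the letter runs that would receive the "no language" style."""
    spans = {m.span() for p in (AFTER, BEFORE) for m in p.finditer(text)}
    return [text[a:b] for a, b in sorted(spans)]

parts = no_language_parts("twentieth-century and mid-1990s")
print(parts)
```

In this approximation digits are not matched, so “1990s” is left alone, and plain words such as “and” are never touched.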

  18. Alex,

    The reason I use a local override for hyphenated compounds, rather than a character style, is that I want to be able to apply other character styles to some words that may be part of a compound (for instance, italics). If I understand correctly, you’re using a GREP style, right? So, if you have a compound that has an italic character style on it, does the GREP style applying the “no language” style co-exist with the italic character style?

    I know that you can only apply one character style at a time manually, but that you can have two if one is done manually and another is done via nested styles or GREP styles . . . but I don’t know how well the two coexist. Do we have a rule to follow for which takes precedence, or how multiple character styles interact?

    • Whatever works for you, but you can indeed apply a character style (e.g. italics) manually onto the word. I believe the grep style would go “under” the main character style applied to a selection by the user. I agree it’s not a terribly neat workaround, and it depends on your workflow, but I work on a lot of books where this is extremely handy! In an ideal world, Adobe would recognise that hyphenating words at other points is considered bad typographic practice and would fix in future, but this will do me for the next X years until that happens…

    • Yes, there is an internal hierarchy InDesign uses when applying multiple character styles to the same text (manually, GREP style, Nested style, Line style, Drop Cap style).

      Usually, though, the formatting will be a combination of all character styles. Example: a line style makes the first line all caps but a nested style makes the first word bold, results in that word being bold and all caps. But if there’s a conflict — a line style makes the text green but a nested style makes the first word red — then the hierarchy takes over. Nested styles are higher up in the pecking order, so in that example, the first word would be red, and the rest of the line, green.

      Here’s the hierarchy (what beats what when there are conflicting attributes):

      1. Manually-applied character style (select text and click a character style)
      2. GREP Style
      3. Drop cap
      4. Nested style
      5. Nested Line style
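The combine-then-override rule described above can be modelled in a few lines of Python. This is only a sketch of the idea, not of how InDesign implements it; the source keys and attribute names are made up for illustration.

```python
# Highest priority first, following the list above (key names are hypothetical).
PRECEDENCE = ["manual", "grep", "drop_cap", "nested", "nested_line"]

def resolve(applied):
    """Merge attribute dicts keyed by source; higher-priority sources win conflicts."""
    result = {}
    for source in reversed(PRECEDENCE):          # start with the lowest priority
        result.update(applied.get(source, {}))   # higher priorities overwrite later
    return result

# The example from the comment: a line style (all caps, green) plus a
# nested style on the first word (bold, red).
first_word = resolve({
    "nested_line": {"case": "all caps", "color": "green"},
    "nested": {"weight": "bold", "color": "red"},
})
print(first_word)
```

Non-conflicting attributes (all caps and bold) are simply combined, while the conflicting color comes from the nested style because it sits higher in the list.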

  19. My tactic is to apply “no language” instead of “no break”; that way the double hyphen disappears.
    Using GREP to find these words finds everything that has hyphens, which doesn’t suit me: my books have too many hyphenated words, especially in Portuguese.

    • If you use the “no language” method and also use spell check, be aware that spell check will skip all “no language” text.
