GREP Tricks: Using LookBehind and LookAhead expressions to convert a SPACE into a NON-BREAK SPACE


    • #72063
      Allan Shearer
      Participant

      Hi Folks

      I thought I’d share some of my GREP tricks that I find quite useful.

      This one: Use GREP’s “Positive LookBehind” and “Positive LookAhead” expressions to help you ‘glue’ strings of text together – to prevent unwanted breaks.

      Let’s say you have the following which you NEVER want to break:

      140 miles
      10 gallons
      18 kg
      4 days

      Notice that each is separated with a normal ‘space’. Of course, you COULD insert a ‘non-break space’ instead … but … you could also just use a GREP command to find these expressions and CHANGE the normal space into a non-break space (sort of). How?

      First, create a Character Style called “No Break”. The ONLY attribute you should turn on here is the “No Break” option on the Basic Character Formats page.

      Then, for whichever Paragraph Styles you wish to apply this setting to, add a new GREP Style: choose the No Break character style in the Apply Style field, and enter the following in the To Text field:

      (?<=\d) (?=mile)|(?<=\d) (?=gallon)|(?<=\d) (?=kg)|(?<=\d) (?=day)

      First, notice the use of the | character in the above. This is an ‘OR’ command. Thus, we can list different expressions to search for and simply separate them with an OR.

      Next, notice the repeated use of (?<=…) and (?=…). These are called “Positive LookBehind” and “Positive LookAhead” expressions. They only CHECK what comes before and after; what they check is NOT part of the match, so the character style is applied ONLY to what lies OUTSIDE of them. Thus, in each case above, the ONLY character outside these LookBehind and LookAhead expressions is the ‘space’ character.

      As a result of the above GREP command(s), whenever “a digit” is followed by “a space” and then the word “mile”, the No Break character style is applied to that “space”. The same goes for “a digit” followed by “a space” followed by “gallon”, etc.
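To see the effect outside InDesign, here is a minimal Python sketch; Python's standard `re` module accepts the same lookaround syntax, so Allan's expression is copied unchanged. It is only an approximation: InDesign applies the No Break character style to the found space, whereas this sketch swaps the space for a literal non-break space (U+00A0). The helper name `glue_units` and the sample sentence are made up for illustration.

```python
import re

# Allan's expression, unchanged: each alternative matches ONLY the space,
# because the lookbehind/lookahead parts check their context without consuming it.
UNIT_SPACE = re.compile(
    r"(?<=\d) (?=mile)|(?<=\d) (?=gallon)|(?<=\d) (?=kg)|(?<=\d) (?=day)"
)

NBSP = "\u00a0"  # stand-in for "space with the No Break style applied"


def glue_units(text: str) -> str:
    """Hypothetical helper: replace each qualifying space with a non-break space."""
    return UNIT_SPACE.sub(NBSP, text)


sample = "We drove 140 miles, used 10 gallons, carried 18 kg and took 4 days."
print([m.group() for m in UNIT_SPACE.finditer(sample)])  # [' ', ' ', ' ', ' ']
print(glue_units(sample).replace(NBSP, "~"))
# We drove 140~miles, used 10~gallons, carried 18~kg and took 4~days.
```

Every match is a single space, which is exactly why the style ends up on the space alone.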

      Now, one might initially ask: “Why use the LookBehind and LookAhead expressions? Why not just look for ‘\d mile’ or ‘\d gallon’, etc.?”

      Well, in the case of “mile”, “kg” and “day”, that will no doubt suffice. However, with “gallon” you may actually WANT the word to hyphenate if necessary, as in “10 gal-lons”. That would NOT happen if you applied the No Break character style to the WHOLE string of text found. That’s why we apply the No Break character style ONLY to the ‘space’, converting it (sort of) into a “non-break space”.
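The difference is easy to check by looking at what each pattern actually matches. This short Python sketch (standard `re` module; the sample text is invented) prints the matched text and its position: the plain `\d gallon` pattern consumes the digit, the space and the word, so a No Break style applied to that match would also cover “gallon”, while the lookaround version matches the space alone.

```python
import re

text = "Fill it with 10 gallons."

plain = re.search(r"\d gallon", text)
lookaround = re.search(r"(?<=\d) (?=gallon)", text)

# The plain pattern consumes the digit, the space and the word.
print(repr(plain.group()), plain.span())            # '0 gallon' (14, 22)
# The lookaround pattern consumes only the space between them.
print(repr(lookaround.group()), lookaround.span())  # ' ' (15, 16)
```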

      Neat trick, eh? (there’s my Canadian side kicking-in).

      You might especially want to use this trick with longer words which you definitely want to allow to hyphenate, such as:

      Subtitle A … you can use: (?<=Subtitle) (?=\u)|(?<=Subtitles) (?=\u)
      Chapter 10 … you can use: (?<=Chapter) (?=\d)|(?<=Chapters) (?=\d)
      Section 18 … you can use: (?<=Section) (?=\d)|(?<=Sections) (?=\d)
      Case Study 12-1 … you can use: (?<=Case Study) (?=\d)|(?<=Case Studies) (?=\d)
      etc.

      Notice in the above examples we have used the OR separator so that both the singular and plural of each can be found. Also note that the first example is looking for an UPPERCASE letter rather than a digit.
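Here is a Python sketch of the same four label patterns, under one stated assumption: InDesign's `\u` (any uppercase letter) has no counterpart in Python's `re` (there `\u` starts a Unicode escape such as `\u00a0`), so it is swapped for `[A-Z]`, which covers ASCII capitals only. The second part shows why singular and plural get their own lookbehinds: Python, like InDesign's classic lookbehind, requires a fixed-width lookbehind, so an optional “s” inside one lookbehind is rejected.

```python
import re

# Allan's patterns, with InDesign's \u replaced by [A-Z] (ASCII capitals only).
LABEL_SPACE = re.compile(
    r"(?<=Subtitle) (?=[A-Z])|(?<=Subtitles) (?=[A-Z])"
    r"|(?<=Chapter) (?=\d)|(?<=Chapters) (?=\d)"
    r"|(?<=Section) (?=\d)|(?<=Sections) (?=\d)"
    r"|(?<=Case Study) (?=\d)|(?<=Case Studies) (?=\d)"
)

text = "See Subtitle A, Chapter 10, Sections 18 and Case Study 12-1."
print(LABEL_SPACE.sub("~", text))
# See Subtitle~A, Chapter~10, Sections~18 and Case Study~12-1.

# One lookbehind with an optional "s" is variable-width, so it is rejected.
try:
    re.compile(r"(?<=Chapters?) (?=\d)")
    optional_s_allowed = True
except re.error as err:
    optional_s_allowed = False
    print("rejected:", err)
```

Note that only the space before the number is glued in “Case Study 12-1”; the space inside “Case Study” is left alone.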

      Now … just a warning when you use the LookAhead and LookBehind expressions. You will actually see that there are FOUR such expressions:

      Positive LookBehind looks like: (?<=)
      Negative LookBehind looks like: (?<!)
      Positive LookAhead looks like: (?=)
      Negative LookAhead looks like: (?!)

      In our example above, we were using the **Positive** LookBehind and LookAhead. This means, in our example above: when “a digit” **IS FOUND** before “a space” and the word “day” **IS FOUND** following it … then, apply the Character Style. This is all good. But . . .

      If you were to choose the **Negative** LookBehind and LookAhead, well … you might not like the results, in this particular example. Why? Because … it would pretty much (almost!) change ALL of your “spaces” into “non-break spaces”. YIKES!

      Why? Because it essentially changes the logic in the same expression to: when “a digit” is **NOT FOUND** before “a space” and the word “day” is **NOT FOUND** following it … then … well, you can imagine the mayhem that would unfold! :)
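A quick Python count makes the “mayhem” concrete. This sketch uses only the single “day” alternative from the explanation above and an invented sentence: the positive version matches one space, while the negative version matches every space that is neither after a digit nor before “day”.

```python
import re

text = "We drove 140 miles in 4 days last week"

positive = re.compile(r"(?<=\d) (?=day)")  # a digit before AND "day" after
negative = re.compile(r"(?<!\d) (?!day)")  # NO digit before AND NO "day" after

print(text.count(" "))              # 8 spaces in total
print(len(positive.findall(text)))  # 1: only the space in "4 days"
print(len(negative.findall(text)))  # 6: every space except in "140 miles" and "4 days"
```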

      It can make for an interesting exercise in InDesign, though. When I tried this ‘Negative’ rendition I ended up with every single line hyphenated! That was COOL … but not practical. (Maybe, somehow, that might work well in a hyphenation-testing document.)

      Hopefully you’ll find this helpful.

      Next I’ll write up another trick in which the LookBehind and LookAhead expressions have been ever-so helpful.

      Yours,

      Allan

    • #72068
      Eugene Tyson
      Member

      Here’s another little trick to find the same thing :)

      (?s)(\d+\smiles).+?days

      See if you can figure it out :)

    • #72069
      Eugene Tyson
      Member

      You don’t really need the brackets around \d+\smiles:

      (?s)\d+\smiles.+?days

    • #72070

      Why not just do a search on any digit/space/any letter and replace with “no break” character style sheet?

    • #72072
      Eugene Tyson
      Member

      Because that would find “10 Fake Street”

      What we want to do is find

      10 Miles
      111 Gallons
      1222 Kg
      13330 Days

      The single-line modifier (?s) lets the . match line breaks too, so the search can run across multiple lines.

      Essentially I am saying:
      Find one or more digits and a space up to “miles”, then carry on searching, across line breaks, up to “days”.
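Here is what the (?s) modifier changes, sketched in Python, where the inline `(?s)` is the same as the `re.DOTALL` flag. The four-line sample reuses Allan's list. Note that the match is the whole run of text from “140 miles” to “days”, not the individual spaces, which is the point Allan makes in his reply.

```python
import re

block = "140 miles\n10 gallons\n18 kg\n4 days"

# Without (?s), "." stops at line breaks, so ".+?" can never reach "days".
print(re.search(r"\d+\smiles.+?days", block))  # None

# With (?s), "." also matches "\n", so the match spans all four lines.
match = re.search(r"(?s)\d+\smiles.+?days", block)
print(repr(match.group()))  # the whole four-line block
```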

    • #72073
      Allan Shearer
      Participant

      Oh sorry … I wasn’t trying to include the line break in that find. I was just presenting a list of examples that I want to find.

      In my case, I cannot simply look for any digit followed by any character (or vice versa). Only specific combinations need to be glued together, and the number in each combination can vary.

      Allan

      • #72077

        To be honest, Eugene, our proof room would keep “10 Fake Street” together (i.e., not have the “10” ending a line).

        The same rule applies with things such as “10 percent” and things like that. The number always starts the next line in most cases.

        Anyway–that’s the only reason I do it like I mentioned.

    • #72076
      Eugene Tyson
      Member

      Ah ok lol

      Fair enough.

    • #72145
      Aaron Troia
      Participant

      I’m wondering why you piped and wrote out all the lookbehinds and lookaheads. A lookahead can contain alternatives of different lengths (unlike a lookbehind, which has to be fixed-length), so you could’ve done something like this:

      (?<=\d) (?=mile|gallon|kg|day)

      or, if you need the lookbehind side to be variable-length too (I use this more in HTML/CSS after ePub export), you could use Keep (\K):

      \d+\K (?=mile|gallon|kg|day)
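A quick Python check that the condensed lookahead finds exactly the same spaces as the long, piped version (the sample string is invented). Python's built-in `re` module does not support `\K`, so only the lookahead form is shown here.

```python
import re

long_form = re.compile(
    r"(?<=\d) (?=mile)|(?<=\d) (?=gallon)|(?<=\d) (?=kg)|(?<=\d) (?=day)"
)
# One lookbehind, with the alternatives moved inside a single lookahead.
short_form = re.compile(r"(?<=\d) (?=mile|gallon|kg|day)")

sample = "140 miles, 10 gallons, 18 kg, 4 days, 3 hours"
long_spans = [m.span() for m in long_form.finditer(sample)]
short_spans = [m.span() for m in short_form.finditer(sample)]
print(long_spans == short_spans)  # True: both find the same four spaces
```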

    • #72149
      Allan Shearer
      Participant

      Thanks A.A.

      Yeah, I didn’t bother to refine it further. Just hammered out some examples that would be somewhat easy to read, rather than optimising it and possibly making it a little more difficult for a novice (like me!) to read.

      Thanks for the refinement.

      Allan

    • #72152
      Aaron Troia
      Participant

      Oh, gotcha! Sorry for the nit-picking. I am glad you are posting GREP Tricks; lookbehinds and lookaheads are very powerful and very useful in InDesign :)

    • #72153
      Allan Shearer
      Participant

      >>Sorry for the nit-picking

      Not at all! Quite welcome … that’s how I (we) learn. :)

      I wish I could post images here … cuz I’d love to show what I’ve managed to accomplish using GREP. I really put it through its paces: it was absolutely pivotal to our laying out of 450+ page books, and GREP helped me create 1000s of IMAGES. Say what!? Well … sort of. :)

      Allan

    • #88654
      Na Na
      Member

      I’ve been trying to use this grep LookBehind technique without success.

      I’m trying to make the ‘%’ sign a different style (which makes it smaller), but only in instances where it follows a number, i.e., 45% would have a small % but ALC% wouldn’t.

      I’d expect that the following would achieve the desired character style to be applied.

      (?<=\d+)%

    • #88660
      Peter Kahrel
      Participant

      You can’t use variable-length lookbehinds — not with the classic (?<= ). But you don't need to check if what precedes the % is a number: a digit will do. So (?<=\d)% would serve your purpose. For interest’s sake, if you really wanted a variable-length lookbehind, use \K: \d+\K%.

      Peter
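Peter's point can be reproduced in Python, whose `re` module has the same fixed-width restriction on lookbehinds: the pattern from the question is rejected at compile time, while the single-digit lookbehind tells “45%” from “ALC%”. Python's `re` has no `\K`, so that variant is not shown; the sample tokens are invented.

```python
import re

# The lookbehind from the question: "\d+" has no fixed width, so re rejects it.
try:
    re.compile(r"(?<=\d+)%")
    variable_lookbehind_ok = True
except re.error as err:
    variable_lookbehind_ok = False
    print("rejected:", err)

# Peter's fix: a single digit before the % is enough.
PCT_AFTER_DIGIT = re.compile(r"(?<=\d)%")

for token in ["45%", "ALC%", "100%"]:
    print(token, bool(PCT_AFTER_DIGIT.search(token)))
# 45% True
# ALC% False
# 100% True
```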

      • #88662
        Peter Kahrel
        Participant

        No, there shouldn’t be. Sorry, don’t know how that got there.

        P.

    • #88661
      Na Na
      Member

      Thanks Peter.

      In that first example you mentioned, the non-variable lookbehind, is there meant to be a space between the ‘<’ and ‘=’?

    • #88663
      Peter Kahrel
      Participant

      (Posted in the wrong place.)
      No, that space shouldn’t be there. Sorry.
      P.

    • #14323378
      Sadik Erd
      Member

      Hi,
      I have a question about a similar issue, if this forum is still alive :)

      I’m trying to use a positive lookahead to get the words. Here is my text:
      Name1 surname1, name2 surname2, name3 surname3*, name4 surname4.

      Now I want to get “name3 surname3” only (the one before the *).

      I used this => (?<=(,\s))(.*?)(?=[*])

      But it doesn’t work. It returns “name2 surname2, name3 surname3”.

      Can somebody help me, please?

    • #14323375
      David Popham
      Participant

      One option would be \w+\s\w+(?=\*). That’s assuming that both the name and surname are one-word names, as opposed to something like “von Grebmer” or “O’Connell”.

    • #14323374
      David Popham
      Participant

      Wish this forum allowed editing of replies. Should have said one-word names AND names made up only of letters, digits or underscores, because \w doesn’t match hyphens or apostrophes.
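Why the original attempt over-matches can be seen in a Python sketch using Sadik's sample line. A regex engine reports the leftmost possible match: the first position preceded by “, ” is the start of “name2”, and making `.*?` lazy only shortens where the match ends, not where it starts. David's pattern instead anchors two word runs directly before the “*”. The hyphenated name in the last line is an invented example of the limitation he mentions.

```python
import re

line = "Name1 surname1, name2 surname2, name3 surname3*, name4 surname4."

# Sadik's attempt: starts after the FIRST ", " and grows lazily until "*".
attempt = re.search(r"(?<=(,\s))(.*?)(?=[*])", line)
print(attempt.group())  # name2 surname2, name3 surname3

# David's suggestion: exactly two word runs immediately before "*".
fix = re.search(r"\w+\s\w+(?=\*)", line)
print(fix.group())  # name3 surname3

# \w does not match a hyphen, so a hyphenated surname is not found.
print(re.search(r"\w+\s\w+(?=\*)", "Anne Smith-Jones*"))  # None
```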

    • #1250924
      Kevin Arthur
      Member

      “If you really wanted a variable-length lookbehind, use: \d+\K%.”

      Excellent tip for positive variable-length lookbehind, but what if I need it to be a _negative_ variable-length lookbehind? Any way to achieve that as well?

      Like if I need to find the phrase “XYZ” except for all instances where it is directly preceded by “ABCDE” or “FGH”.

    • #1250934
      Peter Kahrel
      Participant

      \K is for positive lookbehind only as far as I know. The classic lookbehind doesn’t handle variable-length arguments, so you can’t use operators like + and *. But in your case you can enumerate the options, in which case you can use the classic lookbehind:

      ((?<!ABCDE)(?<!FGH))XYZ
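Peter's stacked negative lookbehinds work the same way in Python: both assertions are tested at the same position, so “XYZ” is skipped if either prefix precedes it. The sample string is invented. The second part shows, for Python's `re` at least, why the prefixes sit in separate lookbehinds: folding them into one alternation gives branches of different lengths, which is not fixed-width.

```python
import re

# Two fixed-width negative lookbehinds at the same position: both must succeed,
# so "XYZ" is skipped after "ABCDE" and after "FGH".
NOT_AFTER = re.compile(r"((?<!ABCDE)(?<!FGH))XYZ")

text = "ABCDEXYZ FGHXYZ QXYZ XYZ"
print([m.start() for m in NOT_AFTER.finditer(text)])  # [17, 21]

# One lookbehind with alternatives of length 5 and 3 is rejected by re.
try:
    re.compile(r"(?<!ABCDE|FGH)XYZ")
    mixed_width_ok = True
except re.error:
    mixed_width_ok = False
print(mixed_width_ok)  # False
```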

    • #1251024
      Kevin Arthur
      Member

      Sweet! Works like a charm. Thanks Peter!

      Found myself a good use for this technique already: doing a find-and-replace to change words ending in American-style -ize to British-style -ise (or vice versa, obviously).

      My exception list is probably not complete, but at least this will skip a lot of the words that should keep their -ize ending, speeding up the process. If the GREP proves to stand the test over time, I might even include this in my Auto/ReplaceAll collection.

      Find:
      ((?<!(?i)ass)(?<!Bel)(?<!(?i)caps)(?<!(?i)ma)(?<!(?i)pr)(?<!(?i)se)(?<!(?i) s)(?<!(?i)^s)(?<!(?i)downs)(?<!(?i)mids)(?<!(?i)multis)(?<!(?i)outs)(?<!(?i)overs)(?<!(?i)pints)(?<!(?i)res)(?<!(?i)rights)(?<!(?i)supers)(?<!(?i)unders)(?<!(?i)unis)(?<!(?i)ups))ize

      Replace:
      ise
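A Python port of Kevin's find/replace needs one change: recent versions of Python's `re` only accept a global inline flag such as `(?i)` at the very start of a pattern, so each case-insensitive lookbehind below uses a scoped `(?i:...)` group instead. This sketch keeps only a handful of Kevin's exceptions and, as an assumption, replaces his “ s” and “^s” checks for “size” with a word boundary `\b`; it is an illustration, not his complete list.

```python
import re

# A handful of Kevin's exceptions (not his full list). Each (?i:...) group is
# case-insensitive on its own; "Bel" stays case-sensitive, as in his pattern.
CASELESS_EXCEPTIONS = ["ass", "caps", "ma", "pr", "se", "downs"]

IZE = re.compile(
    "".join(f"(?<!(?i:{prefix}))" for prefix in CASELESS_EXCEPTIONS)
    + r"(?<!Bel)"       # Belize
    + r"(?<!(?i:\bs))"  # "size" at the start of a word (stands in for " s" and "^s")
    + r"ize"
)

words = "organize prize seize maize Belize size capsize downsize realize"
print(IZE.sub("ise", words))
# organise prize seize maize Belize size capsize downsize realise
```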

      This tip made my day. Thank you for sharing!
