November 27 2007 • 12:23 AM

FindBetween: A Useful GREP String

Hey, with the help of Peter Kahrel’s GREP in InDesign CS3 book, I was able to figure out how to do something in InDesign that I’ve always said was possible to do with GREP, but didn’t really know how. Not only is it a handy Find/Change action, but it’s very easy to modify for different situations that designers are often confronted with.

The action is this: Find [some text] that’s in between [whatever], and then apply formatting to just the text, not what’s surrounding it. One example would be italicizing parenthetical text without italicizing the parentheses themselves: the words inside (this) and (that other thing) all change at once, throughout the story or document, with a simple click.

Here’s the GREP expression that finds the content between an opening and a closing parenthesis, but doesn’t include the parentheses themselves in the found instances:

(?<=\().*?(?=\))

If you copy and paste that GREP string into the Find What field in the Edit > Find/Change > GREP panel of InDesign CS3, and then click the Find First button, InDesign selects the first instance of parenthetical text, but not the parentheses themselves. Cool!
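If you’d like to experiment with the expression outside InDesign, here’s a minimal sketch using Python’s standard re module. Python’s regex flavor isn’t identical to InDesign’s, but the lookbehind (?<=...), the lookahead (?=...) and the lazy .*? should behave the same way for this simple pattern. The sample sentence is made up.

```python
import re

# The same lookaround pattern used in InDesign's GREP panel:
#   (?<=\()  the match must be preceded by "(" (not included in the match)
#   .*?      then as few characters as possible
#   (?=\))   up to, but not including, the next ")"
PAREN_CONTENT = re.compile(r"(?<=\().*?(?=\))")

text = "Turn (this) and (that other thing) into italics."
found = PAREN_CONTENT.findall(text)
print(found)  # ['this', 'that other thing']
```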

That means that back in the Find/Change > GREP panel, you can change just the found text’s formatting by specifying what you want in the Change Format area (leave the Change To text field blank). Here I’m specifying that InDesign should italicize parenthetical text, but leave the parentheses untouched:

[Screenshot: the Find/Change GREP panel, with italic chosen in the Change Format area]
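Plain Python strings carry no formatting, so this sketch stands in for the Change Format step by wrapping each match in hypothetical <i>...</i> markers (an assumption for illustration only). Because the parentheses are never part of the match, they end up outside the markers, which is the same reason InDesign leaves them untouched.

```python
import re

PAREN_CONTENT = re.compile(r"(?<=\().*?(?=\))")

def italicize_parenthetical(text: str) -> str:
    """Wrap only the text inside parentheses in <i>...</i> markers.

    The <i> markers are a hypothetical stand-in for italic formatting;
    the parentheses are not part of the match, so they stay outside.
    """
    return PAREN_CONTENT.sub(lambda m: f"<i>{m.group(0)}</i>", text)

print(italicize_parenthetical("Turn (this) and (that other thing) into italics."))
# Turn (<i>this</i>) and (<i>that other thing</i>) into italics.
```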

Easily Modify the Search
Below is the same GREP string again. The characters that govern what the “surrounding items” should be are the \( near the start and the \) near the end:

(?<=\().*?(?=\))

The first of these is an opening parenthesis, which needs to be “escaped” with a backslash so InDesign knows it’s a literal parenthesis, not more GREP code. The second is the closing parenthesis, again preceded by a backslash to escape it.
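Python’s re module follows the same escaping rule, which makes it easy to see why the backslashes matter. In this sketch the unescaped version is read as grouping syntax rather than as literal parentheses, and Python refuses to compile it; re.escape() is the standard-library way to add the backslashes for you.

```python
import re

# Unescaped: "(" and ")" are treated as grouping syntax, so this no longer
# means "text between parentheses". Python rejects it outright.
try:
    re.compile(r"(?<=().*?(?=))")
    unescaped_ok = True
except re.error:
    unescaped_ok = False

# Escaped: \( and \) mean literal parenthesis characters.
escaped = re.compile(r"(?<=\().*?(?=\))")

# re.escape() inserts the backslash for characters that need it.
print(re.escape("("), re.escape(")"))       # \( \)
print(unescaped_ok)                         # False
print(escaped.findall("a (literal) test"))  # ['literal']
```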

I typed these in myself, but it’s simpler to let InDesign drop in the special code that GREP needs. Just choose the special character you want from the dropdown menu next to the GREP Find What field:

[Screenshot: the special-characters dropdown menu next to the Find What field]

So, to find text surrounded by a pair of em dashes (but not the em dashes themselves), you’d replace both escaped parentheses with the code for an em dash. According to the dropdown menu, GREPese for an em dash is a tilde followed by an underscore:

(?<=~_).*?(?=~_)
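Python has no ~_ shorthand; you use the em dash character itself (U+2014). The find_between() helper below is hypothetical, a sketch that builds the same lookaround from any pair of delimiters and escapes them with re.escape() so characters like ( or [ work too. It also shows one quirk: because lookarounds don’t consume the delimiters, when the opening and closing characters are identical, a dash that closes one match can also open the next.

```python
import re

def find_between(opening: str, closing: str, text: str) -> list[str]:
    """Return the text between each opening/closing pair, without the delimiters.

    Hypothetical helper: builds (?<=opening).*?(?=closing) with both
    delimiters escaped so they are matched literally.
    """
    pattern = f"(?<={re.escape(opening)}).*?(?={re.escape(closing)})"
    return re.findall(pattern, text)

EM_DASH = "\u2014"

print(find_between(EM_DASH, EM_DASH, f"text{EM_DASH}like this{EM_DASH}and more"))
# ['like this']
print(find_between(EM_DASH, EM_DASH, f"a{EM_DASH}b{EM_DASH}c{EM_DASH}d"))
# ['b', 'c']  (the middle dash closes one match and opens the next)
print(find_between("(", ")", "same (result) as before"))
# ['result']
```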

Here’s the string to find text in between any kind of double quotes, nice and simple:

(?<=~").*?(?=~")
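Python has no InDesign-style wildcard for quotation marks, so this sketch spells the characters out: a straight quote or curly opening quote (U+201C) before the match, and a straight quote or curly closing quote (U+201D) after it. Which characters count as “any kind of double quotes” is an assumption here, and the second example shows a limitation of straight quotes.

```python
import re

OPENING = "[\"\u201c]"  # straight quote or curly left double quote
CLOSING = "[\"\u201d]"  # straight quote or curly right double quote
QUOTED = re.compile(f"(?<={OPENING}).*?(?={CLOSING})")

print(QUOTED.findall("She said \u201chello there\u201d and then \"goodbye\"."))
# ['hello there', 'goodbye']

# Limitation: a straight quote can be either an opening or a closing quote,
# so the text *between* two quoted phrases is matched as well.
print(QUOTED.findall('"a" and "b"'))
# ['a', ' and ', 'b']
```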

In case you’re wondering, Peter says this type of GREP search is called a “Lookaround” — a combination of a Lookahead and a Lookbehind. All the “look*” type of GREP searches share one thing in common … they let you tell InDesign to find some text based on a character that precedes or follows it, but not to include that character in the found text itself.
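To see the difference a lookaround makes, this Python sketch runs an ordinary pattern and the lookaround version against the same made-up sentence and prints each match with its start and end positions.

```python
import re

text = "Turn (this) into italics."

# Ordinary pattern: the parentheses are consumed, so they are part of the match.
plain = re.search(r"\(.*?\)", text)
# Lookaround pattern: the parentheses are only checked, not included.
around = re.search(r"(?<=\().*?(?=\))", text)

print(plain.group(0), plain.span())    # (this) (5, 11)
print(around.group(0), around.span())  # this (6, 10)
```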

Save the GREP String

If you think you’d find this useful, don’t forget to click the disc icon in the GREP panel of Find/Change so you can save it and recall it from the dropdown menu of saved searches from now on. I called mine “FindBetween.”

Just for fun (and to test out sharing saved searches), I uploaded my FindBetween search (it’s a tiny XML file) in case you want to download it. Drop it into the GREP folder inside the Preferences > Adobe InDesign > Version 5.0 > Find-Change Queries folder on your hard drive.

30 Responses discussing this post.

  1. Eugene
    November 27th, 2007 • 1:09 am

    Hey that’s a neat GREP. Very useful. That is certainly better than how I was approaching the problem.

    May I ask, though, what would happen if you had parentheses inside parentheses? What if the text read something like

    (This is my example (what would happen to it)) ?

    That can appear in math, too:

    ($max-depth*k) - (($depth+1)*$max-depth - 1)

    When I tried a similar GREP search and it didn’t work because of those parameters, I was often inclined to just select all the text inside the parentheses with something like

    (\(.+\))

    and replace it with

    $0

    applying the character style Inside Parentheses Italic,

    then find [\(\)] in the Inside Parentheses Italic style

    and change that to Roman or no character style.

    Sorry if my syntax is off, but I don’t have CS3 available to me right now to test it.

    That is really cool though, I must give it a go.

    Also, would it be possible to reverse engineer this so that it would only find the parentheses and not the text inside? I often get text where an [Emphasis Added] is in italic. That’s fine, but the [ and ] are in italic too. So the search would find the [ before Emphasis Added and the ] after Emphasis Added and change them to Roman, without changing any other brackets that may occur throughout the text.

    That’s cool though, I like it very much. Thanks.

  2. November 27th, 2007 • 2:55 pm

    Eugene, the GREP expression doesn’t work for parens inside of parens. As soon as it encounters the first closing paren following an opening one, it selects what’s in between. In the example below, only “Some text (more text” would be selected and italicized; the inner closing paren and “ inside parens” would be left alone:
    (Some text (more text) inside parens)
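A quick check of this behavior using Python’s re module (not InDesign, but the lazy .*? stops at the first closing parenthesis in the same way):

```python
import re

PAREN_CONTENT = re.compile(r"(?<=\().*?(?=\))")

# The lazy .*? stops at the first ")" it reaches, so with nested parentheses
# the match runs from just after the outer "(" to just before the inner ")".
print(PAREN_CONTENT.findall("(Some text (more text) inside parens)"))
# ['Some text (more text']
```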

    Perhaps there’s another way to write the expression to do what you need.

    The reverse engineering of a lookahead or lookbehind unfortunately doesn’t do what you want either. The example I wrote about uses “positive” lookarounds, because it finds/matches based on the surrounding characters. You can easily turn those into “negative” lookarounds by swapping the equals sign for an exclamation point in both of them, so (?<=\() becomes (?<!\() and (?=\)) becomes (?!\)). That makes InDesign find text not preceded/followed by the specified surrounding character.
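A small Python illustration of negative lookarounds, (?<!...) and (?!...). This sketch finds whole words that are not directly preceded by ( and not directly followed by ); the \b word boundaries and the sample sentence are additions for the example, not part of the original GREP.

```python
import re

# (?<!\()   not preceded by "("
# \b\w+\b   a whole word
# (?!\))    not followed by ")"
NOT_IN_PARENS = re.compile(r"(?<!\()\b\w+\b(?!\))")

print(NOT_IN_PARENS.findall("keep (this) but not that"))
# ['keep', 'but', 'not', 'that']
```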

    I think you could do what you want with two GREP searches: Find an opening bracket (followed by the phrase Emphasis Added) … the part in parens would be your positive lookahead. Then you could change just the formatting of that opening bracket. Repeat the GREP F/C with a positive lookbehind: (the phrase Emphasis Added) preceding a close bracket.
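The two searches described above can be combined into one pattern with |, at least in Python. In this sketch the hypothetical <r>...</r> markers stand in for “change to Roman”, and only the brackets around Emphasis Added are marked; other bracketed text is left alone.

```python
import re

# Two lookaround searches joined with "|":
#   \[(?=Emphasis Added\])     an opening bracket followed by the phrase
#   (?<=\[Emphasis Added)\]    a closing bracket preceded by the phrase
BRACKETS = re.compile(r"\[(?=Emphasis Added\])|(?<=\[Emphasis Added)\]")

def mark_roman(text: str) -> str:
    # <r>...</r> is a hypothetical stand-in for "change to Roman".
    return BRACKETS.sub(lambda m: f"<r>{m.group(0)}</r>", text)

print(mark_roman("See [Emphasis Added] but not [sic]."))
# See <r>[</r>Emphasis Added<r>]</r> but not [sic].
```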

    The $0, $1, $2 approach is great for swapping positions of found text, but InDesign doesn’t let you apply formatting to just one of those references. I’m pretty sure that the only way to apply formatting to just some of the found text is with the Look* sort of GREP searches.

  3. Nigel Chapman
    November 27th, 2007 • 3:40 pm

    You cannot match nested parentheses with any regular expression. This is a distinctive limitation on the class of strings that can be matched by such expressions. If you need to know why, you would have to go into the formal language theory underlying the notation.

    There have been proposals for ‘recursive’ regular expressions, which would allow them to be used for such nested constructs, and some programming languages support such things, I believe. However, InDesign doesn’t, so the only way to approach such tasks is with a script.
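Nigel’s point can be illustrated with a few lines of plain Python (not an InDesign script): a counter that tracks nesting depth handles arbitrarily nested parentheses, which a single regular expression can’t. The outer_paren_contents() name is hypothetical, and unmatched closing parentheses are simply ignored. Applied to Eugene’s two examples:

```python
def outer_paren_contents(text: str) -> list[str]:
    """Return the contents of each outermost pair of parentheses.

    A simple depth counter tracks nesting, which a single regular
    expression cannot do for arbitrary depth.
    """
    results = []
    depth = 0
    start = 0
    for i, ch in enumerate(text):
        if ch == "(":
            if depth == 0:
                start = i + 1  # content begins after the outer "("
            depth += 1
        elif ch == ")" and depth > 0:  # ignore a stray ")" with no opener
            depth -= 1
            if depth == 0:
                results.append(text[start:i])
    return results

print(outer_paren_contents("(This is my example (what would happen to it))"))
# ['This is my example (what would happen to it)']
print(outer_paren_contents("($max-depth*k) - (($depth+1)*$max-depth - 1)"))
# ['$max-depth*k', '($depth+1)*$max-depth - 1']
```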

  4. Eugene
    November 27th, 2007 • 4:28 pm

    Anne-Marie, thank you for your reply. I am already using a two-step-grep to get what I need, as I was doing with the parenthesis also. But you have a one-step-grep which is wonderful and obviously that is the goal.

    Thanks

  5. November 27th, 2007 • 10:23 pm

    This is both very useful and instructive, Anne-Marie — thank you! And you’ve even learned, finally, to give your visitors real, downloadable code — who knows, maybe even INX project files are next?! :-)

    Please post more practical GREP tricks like this one now and then.

  6. Nigel Chapman
    November 28th, 2007 • 1:33 pm

    I should have been more precise in what I wrote previously. You can’t match arbitrarily nested brackets with any regular expression, so you can’t, for example, parse arithmetic expressions that way. However, if you know how many brackets there are you can. For instance, it’s easy to adapt the original expression in the article to match two brackets before and after something.

    This isn’t too useful in the present case, though, because although you can use any regular expression as a lookahead, a lookbehind has to match a fixed length of text (in practice, usually constant text), since the engine needs to know exactly how far back to look. So in your example, you couldn’t put an expression to match a ( followed by any text then another (. You could only do it if you knew the exact text between them.
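Python’s re module has the same restriction, so it’s a convenient place to try both halves of this comment: a pattern for a known number of brackets (two on each side) works, while a lookbehind that tries to match “(”, any text, then “(” is rejected because it isn’t fixed-width. The sample strings are made up.

```python
import re

# A known number of brackets: match what sits inside "((" and "))".
DOUBLE = re.compile(r"(?<=\(\().*?(?=\)\))")
print(DOUBLE.findall("((double)) and (single)"))  # ['double']

# A variable-length lookbehind ("(", any text, then "(") is rejected.
try:
    re.compile(r"(?<=\(.*\().*?(?=\))")
    variable_lookbehind_ok = True
except re.error:
    variable_lookbehind_ok = False
print(variable_lookbehind_ok)  # False
```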

  7. Bernie
    November 29th, 2007 • 9:14 pm

    I just found out something similar today. I won’t be able to purchase CS3 for a while, so my grep work will have to take place outside of InDesign. And since I’m on a PC I can’t use either BBEdit or TextWrangler. On one of the InDesigner postings I saw that someone had suggested UltraEdit as a PC editor with powerful grep capabilities, so I downloaded it yesterday and worked with it a little then and today. After getting help on their forum I now have a pretty good template. The examples below are self-explanatory.

    sample text:

    NL03–1:30–2:30 pm
    John Smith History of Surgery Lecture:
    John Smith: First Generation American, First Surgeon General
    Lecturer: John Smith, MD, FACS, Chicago, IL
    Sponsored by the Advisory Council for Neurological Surgery

    search for:

    %^(NL*pm^p^)^(*:^p^)^(*^p^)^(Lecturer:*^p^)^(Sponsored*^p^)

    replace with:

    ^1^2^3^4^5

    end result:

    NL03–1:30–2:30 pm
    John Smith History of Surgery Lecture:
    John Smith: First Generation American, First Surgeon General
    Lecturer: John Smith, MD, FACS, Chicago, IL
    Sponsored by the Advisory Council for Neurological Surgery

    Now the only problem is (despite the fact that I turn on import options and choose ansi pc) when I import this into InDesign I see the tags as text. So that’s the next thing I have to figure out.

    Even though I haven’t got this exactly working yet I thought those with CS2 on PCs might find this interesting.

  8. Bernie
    November 29th, 2007 • 9:23 pm

    sorry the end result didn’t show up as it should.

    before line 1 is
    a greater than symbol pstyle:A less than symbol
    before line 2 is
    a greater than symbol pstyle:B less than symbol
    before line 3 is
    a greater than symbol pstyle:C less than symbol
    before line 4 is
    a greater than symbol pstyle:D less than symbol
    before line 5 is
    a greater than symbol pstyle:E less than symbol

    I wouldn’t actually name my styles with an Alpha sequence in a large document but it works to figure out the template

  9. Bernie
    November 29th, 2007 • 9:31 pm

    oh man sorry again the replace with didn’t show up correct either. That should be:

    less than symbol pstyle:A greater than symbol^1
    less than symbol pstyle:B greater than symbol^2
    less than symbol pstyle:C greater than symbol^3
    less than symbol pstyle:D greater than symbol^4
    less than symbol pstyle:E greater than symbol^5

  10. November 29th, 2007 • 9:32 pm

    Hmm, the end result looks the same as what you started with.

    Are you exporting styled text from ID as InDesign Tagged Text? Or are you working with a plain text file and importing that. (I can’t figure out what “tags” you say ID is reading when you import it.)

  11. Bernie
    November 29th, 2007 • 9:36 pm

    never mind i screwed up my explanation of the replace again

    it should be

    greater than symbol pstyle:A less than symbol
    greater than symbol pstyle:B less than symbol
    greater than symbol pstyle:C less than symbol
    greater than symbol pstyle:D less than symbol
    greater than symbol pstyle:E less than symbol

  12. Bernie
    November 29th, 2007 • 9:38 pm

    I’ve totally screwed up my posting Anne-Marie. None of the greater than or less than symbols showed up on your site.

  13. Bernie
    November 29th, 2007 • 9:40 pm

    i can post a link to the exact posting in the forum where i received help if that’s ok with you guys.

  14. Bernie
    November 29th, 2007 • 9:44 pm

    and to answer your questions I did a search and replace on some raw text in UltraEdit using grep and then imported it into InDesign. I managed to come up with text that indicates where paragraph styles should go but when I import the text the coding comes in as text.

  15. November 29th, 2007 • 9:55 pm

    Go ahead and post the link Bernie. After it’s up I’ll clear out your attempts to write it.

    I believe you need to use HTML entities to get less-than and greater-than symbols (angle brackets) in a comment (see the tip under Leave a Reply).

    If you want ID to recognize InDesign tagged text make sure it begins with the right preamble. Export a story from ID as tagged text and open that up to see the preamble, then copy/paste into your file. I think you only need the first few lines (you don’t need the definitions of the styles if the receiving ID doc already has the styles defined).

  16. Bernie
    November 29th, 2007 • 9:55 pm

    ok found the problem that I was having on importing

    I had to add the following as the first line of the tagged text I’m importing:

    a less than symbol, then ASCII-WIN, then a greater than symbol (that is, <ASCII-WIN>)

    now InDesign reads the tags
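For anyone without UltraEdit, here’s a rough Python sketch of the same idea, not a tested production tool. It assumes every record is exactly five non-blank lines in a fixed order, reuses the placeholder style names A to E from these comments, and writes the ASCII-WIN line first as described above. Line endings, encoding, and non-ASCII characters (like the en dashes in Bernie’s sample) are not handled, and the sample record is made up.

```python
STYLE_NAMES = ["A", "B", "C", "D", "E"]  # placeholder paragraph style names

def to_tagged_text(raw: str) -> str:
    """Prefix each line of five-line records with a <pstyle:...> tag.

    Assumes every record is exactly five non-blank lines, in order.
    """
    lines = [line for line in raw.splitlines() if line.strip()]
    tagged = ["<ASCII-WIN>"]  # first line, so InDesign reads the tags
    for i, line in enumerate(lines):
        style = STYLE_NAMES[i % len(STYLE_NAMES)]
        tagged.append(f"<pstyle:{style}>{line}")
    return "\n".join(tagged)

sample = """Session title line
Lecture name:
Subtitle line
Lecturer: Name, City
Sponsored by someone"""

print(to_tagged_text(sample))
```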

  17. Bernie
    November 29th, 2007 • 9:58 pm

    Take a look at this page; it’ll be clearer, if the URL posts correctly:

    http://www.ultraedit.com/index.php?name=Forums&file=viewtopic&t=5614

  18. Bernie
    November 29th, 2007 • 10:00 pm

    OK, that works. So the only thing a PC CS2 person needs to do is add that bit of coding I indicated in posting 16.

  19. Bernie
    November 30th, 2007 • 2:24 pm

    Looks like they took down my posting at the UltraEdit site. I’ll e-mail my grep template to you guys and you can decide how to best pass along the information. What I’m giving you is the PC version of what Michael Murphy posted regarding pattern searches using grep way back in episode 9 of the InDesigner.

  20. December 14th, 2007 • 4:03 pm

    This is coming close to being a solution for something I’m trying to automate. Perhaps you can help?
    I, like many other people, like to put a thin space before and after an em dash. Currently it’s awkward, requiring three steps. I wanted to find a way of either having InD CS3 do it automatically when I insert an em dash (perhaps with scripting? Not sure), or to do a find and replace on all em dashes WITHOUT a thin space before and/or after, and to replace all such instances with one that does have a thin space before and after.

    Can anyone here suggest how I can either use the find and replace feature (as outlined in this blog post) to achieve this, or to automate it from the start with a script or something to that effect?

    With thanks,

    Jonathan

  21. Aaron
    December 16th, 2007 • 10:13 pm

    Jonathan, here’s the GREP code you need:
    For ‘Find What’:
    ( )(~=)( )
    For ‘Change to’:
    ?~

  22. Aaron
    December 16th, 2007 • 10:14 pm

    Hey, that truncated my text. Let’s try again.
    For ‘Change to’:
    ~

  23. Aaron
    December 16th, 2007 • 10:15 pm

    One more time. Apparently HTML competes with GREP and wins. So let me just describe:
    In the drop-down menu for ‘Change to’ select White Space/Thin Space, then type $2 (2nd item found, which in your case is your en dash) and then again select White Space/Thin Space.
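Jonathan asked about em dashes, so here’s a Python sketch (outside InDesign) that normalizes every em dash (U+2014), with or without existing spaces on either side, to thin space + em dash + thin space (U+2009). Treating only ordinary spaces and thin spaces as existing spacing is an assumption; unlike a pattern that requires a space on each side, this one also catches dashes with no spaces at all.

```python
import re

EM_DASH = "\u2014"
THIN_SPACE = "\u2009"

# Any run of ordinary or thin spaces (including none) on either side of an
# em dash is replaced by exactly one thin space on each side.
DASH_WITH_SPACES = re.compile(f"[ {THIN_SPACE}]*{EM_DASH}[ {THIN_SPACE}]*")

def thin_space_em_dashes(text: str) -> str:
    return DASH_WITH_SPACES.sub(f"{THIN_SPACE}{EM_DASH}{THIN_SPACE}", text)

expected = f"word{THIN_SPACE}{EM_DASH}{THIN_SPACE}word"
print(thin_space_em_dashes(f"word{EM_DASH}word") == expected)    # True
print(thin_space_em_dashes(f"word {EM_DASH} word") == expected)  # True
print(thin_space_em_dashes(expected) == expected)                # True
```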

  24. mike j
    April 15th, 2008 • 3:35 pm

    hello, love the cast and magazine… i’m trying to get the example (?

  25. mike j
    April 15th, 2008 • 3:36 pm

    hello, love the cast and magazine… i’m trying to get the example to work to find text inside () and it says cannot find match… Mac CS3 have text in (), searching document, in the grep find panel… is there something i need to enable…

  26. mike j
    April 15th, 2008 • 3:51 pm

    weird, no grep finds worked, even “any digit” so i restarted ID and now it works - go figure

  27. April 15th, 2008 • 4:48 pm

    Hey Mike, glad you finally got it working! If only life let you restart like software programs or computers. ;-)

  28. mike j
    May 1st, 2008 • 7:12 pm

    So true, anne-marie…

    ok, ran into another issue… doing a non-profit yearbook (yikes), but datamerge is great… the issue i am having is changing the students’ 5-digit id numbers into “their five digits” + “.jpg”… should be easy… i can find them with:

    (\d\d\d\d\d\d).*?(?=\t)
    but i can’t figure out how to make the replace work…

    it always replaces the characters with whatever i have in the change to box, not the original digits…

    thanks in advance.

  29. mike j
    May 1st, 2008 • 7:20 pm

    ok, i figured it out by adding a $1 reference to the beginning of the change to string, based on Bernie’s post above…

    find: (\d\d\d\d\d\d).*?(?=\t)
    replace: $1.jpg
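The same capture-and-reuse idea in Python, as a sketch: Python writes the reference as \1 where InDesign uses $1. The comment describes five-digit IDs although the posted pattern repeats \d six times, so this sketch uses \d{5}; adjust the count to match your data. It also drops the .*?, so anything between the digits and the tab is left alone. The sample row is made up.

```python
import re

# (\d{5}) captures a five-digit ID; (?=\t) requires a tab right after it
# without consuming the tab. \1 in the replacement is Python's spelling of
# InDesign's $1 (the first captured group).
ID_BEFORE_TAB = re.compile(r"(\d{5})(?=\t)")

row = "12345\tJane Doe\tGrade 5"
print(repr(ID_BEFORE_TAB.sub(r"\1.jpg", row)))
# '12345.jpg\tJane Doe\tGrade 5'
```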

    think i’m going to now purchase that “GREP in InDesign CS3” book, as it is really useful. thanks. love your cast and magazine…

  30. S.C.Shaw
    July 12th, 2008 • 4:51 am

    How do you fix a typo when an uppercase character has been typed between two lowercase characters such as:

    HelLo

    to this:

    Hello

    I’ve tried all the variations in this blog archive and have been able to isolate the uppercase L, but I can’t figure out how to change it to a lowercase l.

    Note, I have a document where the typist is typing so fast this happens in many words, so I am not just looking for the word hello.

    Appreciate any answer.
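InDesign aside, here’s a Python sketch of one way to handle this: a lookbehind and lookahead isolate an uppercase letter between two lowercase letters, and a function passed as the replacement lowercases it. It only looks at the ASCII letters a to z, and it will also change legitimate mid-word capitals (as in McDonald), so the results would need reviewing.

```python
import re

# An uppercase letter with a lowercase letter on each side (ASCII only).
STRAY_CAP = re.compile(r"(?<=[a-z])[A-Z](?=[a-z])")

def fix_stray_caps(text: str) -> str:
    # A function as the replacement lets us lowercase each match.
    return STRAY_CAP.sub(lambda m: m.group(0).lower(), text)

print(fix_stray_caps("HelLo woRld"))  # Hello world
print(fix_stray_caps("McDonald"))     # Mcdonald (legitimate caps change too)
```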


Leave a Reply

You can use limited HTML tags, such as <em></em> for emphasis/italics and <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong> .

InDesignSecrets reserves the right to edit and/or remove posts and comments.