
GREP in InDesign Excerpt: Learning by Example

This is an excerpt from GREP in InDesign by Peter Kahrel.

GREP by Example

It may be easiest to see how GREP works with a few examples. But first a quick comparison with InDesign’s Text search, which is useful because the comparison reveals GREP’s strength.

InDesign’s text search is used mainly for searching literal text: when you search for cats, you find just that (disregarding settings such as case-sensitivity and whole-word only). But in the Text tab you can use some wildcards: ^9 finds any digit, ^$ stands for any letter, ^? matches any character, and ^w is used to find any whitespace. Thus with Figure^w^9 you search for the literal text Figure followed by any space, followed by a digit. When you use any of these wildcards, you’re no longer looking for literal text, but for a pattern. The four wildcards in the Text tab are useful, though rather limited – for instance, you can’t use them for replacements, only for searching.

In contrast, with GREP you mainly look for patterns. For example, you can look for series of digits rather than for a single digit. Figure \d+ matches the literal text Figure followed by any number (2, 34, 121, etc.): \d stands for digit, the plus sign means “at least one”. The GREP expression \u\l+ finds an upper-case letter \u followed by one or more lower-case letters \l+. GREP also deals with simple alternation. For example, to find both centre and center, search for cent(re|er); alternatives are separated by pipe symbols (|). Optionality adds more flexibility: to find both the singular and plural forms of these alternatives, search for cent(re|er)s?. The question mark says that the s should be matched if present, otherwise not. This simple GREP, then, finds centre, centres, center, and centers.
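For readers who want to try these patterns outside InDesign, here is a minimal sketch using Python's re module as a stand-in. It is not InDesign's GREP engine: the sample sentence is invented, and because Python's re does not support InDesign's \u and \l wildcards, the sketch assumes an ASCII-only approximation, [A-Z] and [a-z].

```python
import re

# Invented sample text, for illustration only.
text = "See Figure 2 and Figure 34; the centre has two centers."

# \d+ : one or more digits after the literal text "Figure ".
figures = re.findall(r"Figure \d+", text)

# (re|er) is alternation and s? makes the trailing s optional.
# (?:...) is a non-capturing group, so findall returns whole matches.
centres = re.findall(r"cent(?:re|er)s?", text)

# Assumption: [A-Z][a-z]+ approximates InDesign's \u\l+ for ASCII text only.
capitalised = re.findall(r"[A-Z][a-z]+", text)

print(figures)
print(centres)
print(capitalised)
```

The non-capturing group is a Python convenience; in InDesign's Find What field the plain cent(re|er)s? from the text works as written.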

More flexibility is offered by so-called character classes. For instance, p[aeiouy]t matches p followed by one vowel, followed by t, so you’ll find pat, pet, pit, pot, and put (though you’ll find pyt in python and pat in spat as well; we’ll see later how to do whole-word-only searches). This example also demonstrates how you can define your own wildcards: here we defined a wildcard “vowel” by enclosing all vowels in brackets: [aeiouy]. Other homemade wildcards could be “ascender letter” [bdfhkl] and “descender letter” [gjpqy].
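The homemade “vowel” wildcard can be tried out with a short sketch, again using Python's re module as a stand-in rather than InDesign itself. The word list and the name VOWEL are made up for the illustration.

```python
import re

# Hypothetical name for the homemade "vowel" wildcard from the text.
VOWEL = "[aeiouy]"

# Invented word list; "part" is included to show a non-match.
words = "pat pet pit pot put python spat part"

# p, then exactly one vowel, then t. There is no whole-word restriction,
# so matches inside longer words (python, spat) are found as well.
matches = re.findall("p" + VOWEL + "t", words)

print(matches)
```

The result includes pyt (from python) and a second pat (from spat), which is the whole-word issue the text mentions; part yields nothing because r is not in the class.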

GREP expressions can be used to style text patterns. For instance, to apply a character style “smallcaps” to any sequence of two or more capitals, enter \u\u+ in the Find What field, leave the Change To field empty, and specify the style in the Change Format field. Again, \u is the wildcard for upper-case letters, and the plus stands for “one or more”, so \u\u+ matches strings of at least two capital letters.
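Python cannot apply an InDesign character style, so the following is only a rough sketch of which text the pattern would pick up. It wraps each match in a made-up <sc> tag as a stand-in for the “smallcaps” style and assumes [A-Z] as an ASCII-only approximation of \u.

```python
import re

# Invented sample text, for illustration only.
text = "NASA and the EU met at A level."

# Assumption: [A-Z][A-Z]+ mirrors \u\u+ for ASCII capitals only.
# The <sc>...</sc> tag is hypothetical; it merely marks the text that
# would receive the character style in InDesign.
styled = re.sub(r"[A-Z][A-Z]+", lambda m: "<sc>" + m.group(0) + "</sc>", text)

print(styled)
```

The single capital A is left alone because the pattern requires at least two capitals in a row.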

To demonstrate replacement with wildcards, let’s return to the Figure example. To replace the word Figure with Map when it is used to refer to an illustration – that is, when it is followed by a digit – search for Figure (\d) and replace with Map $1. \d matches any digit, and the parentheses surrounding \d indicate that the contents of the parenthetical should be captured. The string $1 in the replacement string corresponds with what was captured in the search string, so that Figure 1 is replaced with Map 1, Figure 2 with Map 2, etc.
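The same capture-and-replace idea can be sketched in Python's re module, used here as a stand-in for InDesign. One difference to note: Python writes the back-reference as \1 where InDesign's Change To field uses $1. The sample sentence is invented.

```python
import re

# Invented sample text, for illustration only.
text = "Figure 1 shows the coast; Figure 2 shows the interior. Figure captions are short."

# (\d) captures the digit; \1 in the replacement puts it back
# (InDesign's Change To field would use $1 instead).
result = re.sub(r"Figure (\d)", r"Map \1", text)

print(result)
```

The last occurrence of Figure is untouched because it is not followed by a space and a digit.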

As a last example, and to show that simple expressions can achieve a great deal, we’ll take this seemingly difficult task: you have an address list that contains, among other things, an email address for each person. Your task is to add the word Email: before each email address. Let us assume for the moment that @ is used for nothing else, so that any line that contains the @ symbol is an email address.

What we need now is two expressions that combine to say “If a line contains an @, add Email: at the beginning”. The required expressions are shown in Figure 1. The expression used here to find the beginning of a paragraph that contains an @ is indeed as simple as ^(.+@). To insert Email: at the beginning, simply use it as the replacement text followed by $1, which stands for whatever was captured by the part of the search expression in parentheses, which is in each case the text from the paragraph start up to and including the @.

You can see this in the highlighted part in the document in Figure 1. The figure shows that we’ve done the first address and are about to change the second one. We’ll not go into the details right now; the rest of the text will make clear what happens here (briefly, ^ stands for “beginning of paragraph”, and .+@ says “one or more (+) of any character (.) up to and including an @”).

Figure 1. Adding text to a paragraph depending on its contents

As I said, the details will become clear later in the text; the point of the example is to show that short and simple expressions can achieve a lot.
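The email example can be approximated in Python's re module, as a sketch only. It assumes that each paragraph corresponds to one line of a string, so the MULTILINE flag is needed to make ^ match at every line start; the names and addresses are invented.

```python
import re

# Invented address list; each line stands in for an InDesign paragraph.
addresses = "Anna Smith\nanna@example.com\nBob Jones\nbob@example.org"

# ^ matches at each line start because of re.MULTILINE.
# .+@ is one or more of any character (except newline) up to and including an @.
# \1 re-inserts the captured text (InDesign would use $1).
result = re.sub(r"^(.+@)", r"Email: \1", addresses, flags=re.MULTILINE)

print(result)
```

Lines without an @ are left unchanged because the whole pattern fails to match on them.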

  • Holly says:

    Even though I’ve had to read through it a few times this was very helpful to me in understanding the basic uses of GREP.

    • Anne-Marie Concepcion says:

      Good to know, Holly! The feature comes with InDesign but the name doesn’t lend itself to understanding. ;-)

  • Vinny 38 says:

    “For instance, to apply a character style “smallcaps” to any sequence of two or more capitals, enter \u\u+ in the Find What field…” I’m afraid this example may be confusing, since it will not change the output. A “small-cap” capital will still be a capital letter, right? ^^ Possibly turning this sequence into bold or red would have made more sense here, wouldn’t it? That said, I did buy THE Bible last week, and don’t regret it. I’m not a GREP newbie, but it has already increased my knowledge tremendously! My best regards to Peter Kahrel :-)

  • Peter Kahrel says:

    Hi Vinny. Thank you for your kind words. As to the smallcaps thing, when you apply smallcaps to upper-case letters, they appear as smallcaps, which are not the same as caps. I used the smallcaps in the example (rather than bold or italics) because smallcapping is something that comes up in forums regularly.

    • Vinny 38 says:

      Hi Peter. Thanks for your answer. I am very sorry but I still don’t get it…
      Maybe my English is not good enough, or maybe it works differently in CC versions (I use CS6) but applying small caps to an uppercase character, such as A (U0041) doesn’t change its appearance. It still is an A(U0041), right? You need to change it to lowercase at first. I feel like I’m missing something really obvious here. [:-/]

      • Theunis De Jong says:

        Peter is wrong (!!) :)

        (Though it may just be because of his use of a character style.) Vinny, you are correct: the regular Small Caps attribute only changes lowercase characters, and uppercase characters remain unchanged. The feature you are looking for is All Small Caps – an OpenType feature, not an InDesign one, and only available when the font supports it.

        I think Peter’s ‘character style “smallcaps” to any sequence of two or more capitals’ is to create fake Small Capitals, with a scaling defined in the character style.

  • Peter Kahrel says:

    The character style I had in mind was set up to use All Small Caps, the OpenType feature. Not any scaling to fake smallcaps.

  • Peter Kahrel says:

    But clearly something to be clarified in the text :) Keep’m coming!

  • Robin Van Kuijk says:

    Great stuff those GREPs. I’ve been using them for some years now, so I thought buying the book GREP in InDesign by Peter Kahrel could be very useful. Alas, I ordered AND paid for the book on the 9th of November, and I’m still waiting for it. I emailed Indesignsecrets on the 18th of November, my complaint was even given a ticket number, and I’m STILL waiting.

    So perhaps there is someone here who might be able to send me the book? Would be very handy.

  • Rocadero says:

    So, what would be the GREP for text that includes the @ but is NOT an email address? It seems to me it would need to check for no spaces after the @. How would you write that?

    • Peter Kahrel says:

      I don’t think it’s possible to find non-email @ with a single expression (not sure how the absence of spaces after @ makes it a non-email @). You’d probably have to look for email addresses containing @, mark them (strikethrough, e.g.), then look for all @ that are not struckthrough.

      The example quoted above doesn’t pretend to be a universal e-mail finder, it’s just an example of a simple but useful GREP expression used in a well-defined context. We’re only on page 4!

      • Rocadero says:

        Hi Peter,
        “(not sure how the absence of spaces after @ makes it a non-email @)” Because the convention of e-mail addresses is that the server name follows the symbol with no spaces. I know that your example isn’t a universal e-mail finder, I was just curious as to how this might be expanded to be one. Thanks for the reply.

  • Peter Kahrel says:

    Oh, ok. To find a @ followed by non-space characters, look for

    @\S+

    \S (capital s) stands for non-space. To include preceding non-space characters, use

    \S+@\S+
