January 20 2006 • 4:09 PM

Grep Pattern Searching

In Episode 8, I mentioned how I used Grep Pattern Searching in BBEdit to search for patterns in my text, rather than searching for specific pieces of text. The value of this is that, even though the actual text varies throughout my text file, if the patterns are consistent, I can do very complex and powerful search-and-replace operations that keep that variable text intact, while changing elements of the pattern around it.

Here’s an example from real life: A monthly magazine column of new products. Each write-up starts with a company name on one line, followed by a paragraph starting with the words “What’s New” followed by a colon, then a sentence or two of descriptive text about the product. The next paragraph starts with “The Value” followed by a colon, then a few sentences describing the positive attributes of the product for prospective buyers. After that, there’s a line for the company’s web address and another line for their phone number. This is a consistent pattern. But the text itself is not consistent. Every company name is different. Everything following “What’s New” is different, as is everything following “The Value” and all of the URLs and phone numbers.

So how do you search this whole document? The thing to understand is that you don’t search for specific text. Rather, you search the pattern within which the text exists. What you’re searching for is any string of text (the company name), followed by a return, followed by the specific text “What’s New” and a colon, followed by any string of text (the product details), followed by a return, followed by the specific text “The Value” followed by a colon, followed by any string of text (the description of the product’s benefits), followed by a return, followed by any string of text (the web address) followed by a return, followed by any string of text (the phone number).

In BBEdit, that search instruction translates to this:

(.+)\rWHAT’S NEW:(.+)\rTHE VALUE:(.+)\r(.+)\r(.+)

The (.+) means one or more characters of any kind. The period matches any single character, and the plus sign extends that to one or more of them in a row. Because the period does not match a return, each (.+) stops at the end of its line. The parentheses around them make them a sub-pattern. In other words, the whole line above is the pattern, and the items in parentheses are sub-patterns within that pattern. In the above example, there are 5 sub-patterns, each representing variable text. The \r elements refer to returns in the original text.
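To see the five sub-patterns at work outside of BBEdit, here is a minimal sketch in Python. It is only an illustration, not part of the BBEdit workflow, and it rests on a few assumptions: the sample write-up is invented, a straight apostrophe stands in for the curly one, and \n replaces \r because Python's period stops at \n rather than at a return.

```python
import re

# Same shape as the BBEdit pattern, with "\n" standing in for "\r"
# (an assumption made so that "." stops at each line break in Python).
PATTERN = re.compile(
    r"(.+)\nWHAT'S NEW:(.+)\nTHE VALUE:(.+)\n(.+)\n(.+)"
)

# Hypothetical write-up; the company, product, URL and phone are made up.
sample = (
    "Acme Widgets\n"
    "WHAT'S NEW: A smaller widget.\n"
    "THE VALUE: It fits in a pocket.\n"
    "www.example.com\n"
    "555-0100"
)

match = PATTERN.search(sample)

# Each pair of parentheses becomes one numbered group, in order.
company, whats_new, value, url, phone = match.groups()

print(company)    # the variable text captured by sub-pattern 1
print(whats_new)  # sub-pattern 2 (includes the space after the colon)
print(phone)      # sub-pattern 5
```

Note that the fixed text ("WHAT'S NEW:" and "THE VALUE:") sits outside the parentheses, so it is matched but not captured.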

Now…let’s say I wanted to do a replace operation based on this pattern. I have style sheets in InDesign for each element: Company Name, What’s New paragraph, Value paragraph, URL and Phone. On top of that, I have a character from a dingbat font that I put before the URL and another that I put before the phone number to serve as little icons in the layout. All of this is handled expertly by InDesign’s nested style sheets, but I need to put the text characters (in this case, a lower case “u” for the URL icon and an ampersand for the phone icon) first for the nested style sheet in InDesign to work.

To replace this pattern so that all of my style sheet references are included in the right places and have my icon characters added, I use the following replace instruction:

\1\r WHAT’S NEW:\2\r THE VALUE:\3\r u \4\r & \5

The bracketed “pstyle” elements are tags that InDesign will use to format this text automatically when placed in a document with the corresponding styles, the names of which follow the colon in the bracketed tag. The combinations of backslashes and numbers (\1, \2, \3 and so on) refer to the sub-patterns in the original search pattern. They’re numbered by their order in the search instruction. The text of the What’s New Paragraph is \2 and the text of the Phone Number is \5. They’re the second and fifth sub-patterns in the search. By putting in these backslash-number combinations, you’re telling BBEdit to replace the original sub-pattern with itself. So every word of text that follows “What’s New:” until BBEdit finds a return will be replaced by…ITSELF. It remains exactly the same. Same with the other sub-patterns. The Company Name is replaced with itself, but now it has a tag around it. Similarly, the URL is replaced by itself, but now it is preceded by both the tag for its InDesign paragraph style, and the lower case “u” that will appear in a dingbat font in InDesign, thanks to nested style sheets.

In my magazine, I have pages and pages of these product write-ups, so I make sure our editors put two returns between each one when they’re writing in Word, so that I can have BBEdit search for the pattern in every write-up. It’s as simple as searching for the same pattern shown above, but with two returns (\r\r) added at the end, like so:

(.+)\rWHAT’S NEW:(.+)\rTHE VALUE:(.+)\r(.+)\r(.+)\r\r

Likewise, my replace pattern would also include those two returns. It would look like this:

\1\r WHAT’S NEW:\2\r THE VALUE:\3\r u \4\r & \5\r\r
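
The same search-and-replace pair can be sketched in Python to show how the numbered references carry the variable text across. This is a rough illustration under stated assumptions, not the BBEdit instruction itself: the style names inside the pstyle tags (Company Name, Whats New, Value, URL, Phone) are placeholders invented for the example, the two write-ups are made up, a straight apostrophe is used, and \n stands in for \r.

```python
import re

# Search pattern with two line breaks at the end, so each write-up
# (separated by a blank line) is matched as one unit.
SEARCH = re.compile(
    r"(.+)\nWHAT'S NEW:(.+)\nTHE VALUE:(.+)\n(.+)\n(.+)\n\n"
)

# Replacement template. \1 to \5 put each sub-pattern back unchanged;
# the pstyle names are hypothetical and must match your own styles.
REPLACE = (
    r"<pstyle:Company Name>\1\n"
    r"<pstyle:Whats New>WHAT'S NEW:\2\n"
    r"<pstyle:Value>THE VALUE:\3\n"
    r"<pstyle:URL>u \4\n"
    r"<pstyle:Phone>& \5\n\n"
)

# Two invented write-ups, each followed by two line breaks.
text = (
    "Acme Widgets\n"
    "WHAT'S NEW: A smaller widget.\n"
    "THE VALUE: It fits in a pocket.\n"
    "www.example.com\n"
    "555-0100\n\n"
    "Globex\n"
    "WHAT'S NEW: A louder horn.\n"
    "THE VALUE: Nobody will miss it.\n"
    "www.example.org\n"
    "555-0199\n\n"
)

# sub() replaces every match, so all write-ups are tagged in one pass.
tagged = SEARCH.sub(REPLACE, text)
print(tagged)
```

Every write-up that follows the pattern is tagged in a single pass, and none of the captured text is altered along the way.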

The only thing left is to add one little bit of information to the very first line of this text file:

This is all InDesign needs in addition to the bracketed “pstyle” tags to completely format an unlimited number of these product write-ups the instant they’re placed (using the Place function, not copy-and-paste) in an InDesign file. All of this text will come in using your established style sheets, 100% formatted without you having to select any text or apply any style sheets in InDesign.
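
If the tagging is scripted, the first line can be added in the same step. The Python sketch below assumes the header is <ASCII-MAC>, which is the first line of the tagged-text sample a reader posts in the discussion further down; the right value depends on the platform and encoding of your file, so treat it as an assumption. The function name add_header and the Company Name style are made up for the example.

```python
def add_header(tagged_body: str, header: str = "<ASCII-MAC>") -> str:
    """Return the tagged text with the header on its own first line.

    The default header is an assumption; pass a different one if your
    file's platform or encoding calls for it.
    """
    if tagged_body.startswith(header):
        # Already has the header; do not add it a second time.
        return tagged_body
    return header + "\n" + tagged_body


# Hypothetical tagged text, as produced by a search-and-replace pass.
tagged = "<pstyle:Company Name>Acme Widgets\n"
with_header = add_header(tagged)
print(with_header)
```

Checking for an existing header first means the function can be run again on a file that has already been processed without stacking up duplicate first lines.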

One more thing…if this sort of thing is something that you do over and over, like I do, for a magazine or other regular publication…you can save your search-and-replace patterns in BBEdit for future use.

Makes you want to go out and start finding patterns in all of your text, doesn’t it?

30 Responses discussing this post.

  1. The InDesigner
    January 28th, 2006 • 11:21 am

    A quick additional note on this
    topic, and to my mentioning in Episode 8 that I didn’t know of a
    Windows equivalent text editor that does pattern searching: I got an
    e-mail from a listener about an application called UltraEdit that
    does regular expression searches. I haven’t used it myself, but if
    you want more info on the product, you can read about it on the
    manufacturer’s web site at http://www.ultraedit.com. I’d like to thank Adam for
    that info.

  2. jandeman
    February 17th, 2006 • 12:47 pm

    Thanks for the info.
    Can you also use cstyle (character style or
    whatever the code is) in this working method?

  3. jandeman
    February 17th, 2006 • 12:52 pm

    Hello,
    A question: can you also integrate character styles while
    placing text?

  4. The InDesigner
    February 18th, 2006 • 10:14 am

    Yes, you can also integrate
    character styles into Grep Pattern searches. I didn’t mention it in
    the original post, because applying character styles manually
    doesn’t usually follow a specific pattern. If you have character
    styles that always fall in a specific place (for example, at the
    beginning of a paragraph or after the first sentence), it’s better
    to nest those character styles inside of the paragraph style by using
    the Nested Style Sheet settings (see Episode 11, which explains how
    Nested Style Sheets work). That way, you don’t even have to
    reference the character styles. They’ll be “built in” to your
    paragraph style.

    If you DO have a specific reason for applying a character style into
    your source text, the syntax is as follows:

    <cstyle:Your Style Name>your text here<cstyle:>

  5. jandeman
    February 22nd, 2006 • 4:42 pm

    Thank
    you for the info and sorry for posting the same question twice. My
    mistake.

  6. jandeman
    March 6th, 2006 • 5:57 am

    I’ve
    read your remarks about character styles and nested Styles. But I was
    thinking about applying styles to text within one paragraph without
    fixed following. For instance - some text within a paragraph has to
    be bold, or another font, some other text has to have a color, etc;
    but these applyings don’t come in the same following. I can say to
    my relation to place a certain code before and after these ‘text
    modifications’ so that I can change all of this with pattern
    searching.
    Or maybe this is not the right tool for this???!!!

  7. The InDesigner
    March 7th, 2006 • 6:27 pm

    Without seeing a sample of exactly
    what you’re working with, it’s hard for me to give a concrete
    answer to this question. I’ve sent an e-mail to you with some
    specifics that might help. But the best way for me to answer
    accurately is to see a sample of how your text needs to be
    formatted.

  8. Prlwytskofski
    July 2nd, 2006 • 5:18 pm

    Thank you for this explanation. Two years ago I tried to
    get this automated pattern thing going. I gave up for lack of
    information. Tried to export ‘tagged text’ from InDesign and go
    with that. Didn’t work. Today I learned why.

    As you explained to JanDeMan to use your text here, I used that to
    go with. Didn’t work… Looking over the text I noticed something. All
    sharp brackets were escaped. your text here looked like ??your text
    here??

    My text editor tried to be smart. As soon as I told it not to touch
    that, it worked. I hadn’t noticed before. As said, I worked with an
    export out of InDesign, and my editor changed the tags as soon as I
    edited the text. I didn’t know better or it should be /.

    I finally can use my scriptable text editor with grep patterns to
    change text between curved brackets into italic, capitalized
    abbreviations to small caps, certain names to bold. Again, thanks a
    bundle!

  9. Prlwytskofski
    July 2nd, 2006 • 5:20 pm

    Humm, all tags were removed in last post. I’m afraid you
    keep puzzled about what I tried to explain…

  10. The InDesigner
    July 2nd, 2006 • 9:55 pm

    I couldn’t quite tell if you were
    saying that the special characters in my post weren’t coming through
    properly, or if those in your post didn’t come through. Let me know
    if this isn’t working for you and I’ll try to clarify.

  11. Prlwytskofski
    July 3rd, 2006 • 2:13 pm

    Okay, I should have written ‘Humm, all tags were removed in MY
    last post. I’m afraid you keep puzzled about what I tried to
    explain’

    The angle brackets used in your post (like those around pstyle:)
    didn’t come through in my post. Maybe they also don’t come through
    in regular mail, because those are also used in HTML, and can be
    dangerous if used in posts. So I have to try to explain it without
    showing it, in a language that is not my native one.

    (English is, as said, not my native language, and therefore I did
    not understand the manual about tagging text. That’s why I started
    off with a tagged text exported by InDesign itself.) What happened
    to me: I used an editor which assumed the tagged text exported by
    InDesign was HTML or XML. Accordingly it immediately ‘corrected all
    the errors’, resulting in a faulty tagged text. That faulty text I
    used to do some experimenting with pattern search, and putting
    formatting on it. As I started off faulty, my hard work never paid
    off.

    Thanks to your explanation I suddenly (after two years of on and off
    trying) saw what was going wrong. And now -by a little hair out of
    an elephant’s tail- (literal translation of a Dutch saying) I have
    finished making a script I expect saves me about half the time this
    specific job takes (putting format to specific patterns of chars).
    Just by dropping a bunch of text files on that script they receive
    about 95% of their formatting.

    The text editor I use is Tex-Edit Plus. It treats grep a tiny bit
    differently. Instead of \1 \2 \3 it uses ^1 ^2 ^3 to get hold of
    subexpressions. If someone is interested, I can put my (Apple)script
    online.

  12. The InDesigner
    July 4th, 2006 • 10:58 am

    I’m glad to hear that this method
    has cut your work time in half. I’ve put grep to work everywhere I
    can for my job, and it has reduced hours of work to mere minutes.

  13. Prlwytzkofski
    August 5th, 2006 • 6:09 pm

    At first it seemed to go like a rocket. I must be doing something
    wrong though. Using Grep to make a tagged text seems to work.
    Importing it in InDesign, all ‘cstyle’ elements do as expected, but
    ‘pstyle’ however seems to get lost.

    The imported text got no paragraph style applied to it whatsoever.
    Not even ‘Basic Paragraph’. Below is a part of the tagged text.

    [code]

    INGEZONDEN MEDEDELING, van onze correspondent facilitaire zaken

    (Heer Bommel en de Hopsa’s, BV 147, 8079)
    [/code]

    Needless to say the InDesign document has a paragraph style called
    ‘Brood (artikel tekst)’ and the import generates no error.

    The reason it took me quite a bit to realize things did not go that
    smoothly: when importing the tagged text, the text got the char
    style applied which was selected in the palette at the time of
    import. More or less accidentally, that was my master char style for
    the paragraph style I imported.

    I noticed hardly any changes to the text, tweaking the settings of
    ‘Brood (artikel tekst)’. Just moments ago, I realized it was not
    ‘hardly any changes’ but no changes at all…

    Ed.

  14. Prlwytzkofski
    August 5th, 2006 • 6:15 pm

    Humm… Your blog does not allow me to post the InDesign tags.
    Probably because the comment system does not use BB code (if it did,
    all text between [code] and [/code] would show literally).

    So how can I (we, the users) show a sample in a comment post?

    Ed.

  15. The InDesigner
    August 6th, 2006 • 1:15 pm

    Ed –

    E-mail me the text file, and I’ll see if anything’s missing that
    might be causing your problem. The thing about showing code on the
    site is to “escape out” the special characters. For example: to
    display an opening angle bracket, you need to type
    ampersand-l-t-semicolon (the l and t stand for “less than”, which
    makes it easier to remember). To display a closing angle bracket,
    type ampersand-g-t-semicolon (greater than).

  16. Prlwytzkofski
    August 6th, 2006 • 3:05 pm

    [Babble on]
    I did not realize escaping &lt; would work. In an earlier post I
    tried to color-code my text with HTML, that failed to work. That’s
    why I tried BB code. The missing code was sent to info at your url
    [Babble off]

    Let’s try again in posting the tagged text (fingers crossed). It
    should appear between both [code]’s

    [code]
    <ASCII-MAC>
    <pstyle:Brood \(artikel tekst\)>INGEZONDEN MEDEDELING, van onze correspondent facilitaire zaken<pstyle:>
    <pstyle:Brood \(artikel tekst\)>(<cstyle:TussenHaakjes>Heer Bommel en de Hopsa’s, <cstyle:SmallCaps>BV<cstyle:> 147, 8079<cstyle:>)<pstyle:>
    [code]
    [code]
    <ASCII-MAC>
    <pstyle:Brood \(artikel tekst\)>INGEZONDEN MEDEDELING, van onze correspondent facilitaire zaken<pstyle:>
    <pstyle:Brood \(artikel tekst\)>(<cstyle:TussenHaakjes>Heer Bommel en de Hopsa’s, <cstyle:SmallCaps>BV<cstyle:> 147, 8079<cstyle:>)<pstyle:>
    [/code]

    Ed.

  17. Prlwytzkofski
    August 6th, 2006 • 3:08 pm

    Whoops, must have pressed the paste button twice. Sorry for
    wasting the environmentally friendly, but expensive recycled
    electrons used in this blog…

    Ed.

  18. The InDesigner
    August 6th, 2006 • 3:17 pm

    Ed –

    I think what’s
    breaking your tagging once it’s brought into InDesign is the
    backslashes before the parentheses around “artikel tekst”. Actually,
    I’m surprised you don’t get an error message when placing the text.
    My experience is that InDesign displays a warning that text can’t be
    imported when styles identified with tags do not correspond exactly
    to styles in the document. If your paragraph style is named “Brood
    (artikel tekst)”, your incoming text should not have the backslashes
    in it.

  19. Prlwytzkofski
    August 6th, 2006 • 3:57 pm

    Well Michael, those backslashes before the parentheses
    were put there by InDesign itself. My paragraph style is indeed named
    Brood (artikel tekst). On exporting it those (also escape?*)
    backslashes appeared.

    But I tried as you mentioned without those backslashes, using the
    exact spelling of the paragraph style in the application. Result is
    the same. The paragraph style seems not to get imported.

    A thought sprang up. What if those backslashes were not put there
    intentionally by InDesign’s export module, but it’s a misuse of
    functions StripSlashes and AdSlashes in the code… A bug, or
    undocumented feature so to speak…

    I’ll try renaming all my styles so they don’t use parentheses. You
    will hear the result. (Should I also not use spaces just to be
    safe?)

    *
    A backslash is used as an escape char in some programming languages.
    It works more or less like the -earlier in this thread- mentioned &
    for HTML.

    Ed.

  20. The InDesigner
    August 6th, 2006 • 4:03 pm

    Spaces are fine in style names.
    Parentheses can be dealt with, but if they’re causing trouble, try
    removing them and see if it works.

  21. scShaw
    April 14th, 2007 • 12:38 pm

    Great article for understanding how to also use this in CS3. Was wondering if there is a typo in the below paragraph of your example:

    \1\rWHAT’S NEW:\1\rTHE VALUE:\3\ru \4\r& \5

    The bracketed “pstyle” elements are tags that InDesign will use to format this text automatically when placed in a document with the corresponding styles, the names of which follow the colon in the bracketed tag. The combinations of backslashes and numbers — \1 \2 \3 and so on — refer to the sub-patterns in the original search pattern. They’re numbered by their order in the search instruction. The text of the What’s New Paragraph is \2 and the text of the Phone Number is \5.

    Where is the \2 sub-pattern in the example?

  22. Michael Murphy
    April 14th, 2007 • 1:40 pm

    Good catch! It was a typo, and now it’s fixed.

  23. Eugene Tyson
    May 17th, 2007 • 1:54 pm

    I don’t get GREP. I have numbered paragraphs whose number appears in the running head. I want to search out all the numbers in my document and replace them with a Char Style so they appear properly in the head. But I don’t know how to GREP it. My paragraphs consist of 9.4 (tab) Heading, so how do I just search for the 9.4, 9.4.1, 9.4.2 etc.? When I originally did it I searched for \t(.+)\r, but this found 9.4 + (tab). Then I had to search all the tabs and change the Char Style back to NONE so they didn’t show in the Header. Then not only did it find all the numbers but it also found (i) (tab) and (ii) (tab), so I had to change all of them back to Char Style None. I’m sure there is a simple solution?

    Have I gone about this backwards?

  24. Eugene Tyson
    May 17th, 2007 • 2:13 pm

    Sorry (.+)\t was my original GREP search for finding numbers as each number finishes with a (tab) after it.

  25. Eugene Tyson
    May 17th, 2007 • 2:51 pm

    It’s ok, I found a better way to do it. I just nested my Character Styles into my Paragraph Styles so that 9.1 was Character Style 1 and HEADING was Character Style 2. Now all my running heads are perfect, alternating left and right every page. Stupendous.

  26. Michael Murphy
    May 17th, 2007 • 2:59 pm

    Good for you, Eugene. This should be my new strategy…wait long enough and people solve their own problems. :)

    There would have been a way to do this with GREP, but the method you came up with is actually the better and more efficient one. In CS3, you could use the much-improved Numbered List features to accomplish the same thing, but have all of the numbers be generated automatically (even with the decimal formatting).

    If you were using text-based Find/Change in CS2, you would have searched for any digit, a period, and any digit, then put nothing in the replace field, but chosen the appropriate character style in the Change Format area.

  27. Eugene Tyson
    May 18th, 2007 • 7:57 am

    That’s my next task, to number them all manually. It’s such a pain, especially when you have 100 level 2 headings. Up to now I’ve been doing it manually; this is an update of last year’s publications so I’m working from CS2 files in CS3. I couldn’t believe it when it dawned on me how simple the solution actually was. I think it gives me more flexibility too. Now, I’m off to find another problem and solve that too. Oh I love CS3, can’t wait until I can use it fluently. Cheers guys.

  28. da bishop
    March 19th, 2008 • 4:40 am

    NEWS UPDATE:

    GREP is in CS3.

  29. Leona (LeeLee) Weiner
    May 13th, 2008 • 3:23 pm

    I just ordered Indesign CS3 for my MAC. How do I uninstall CS2 or should I?

  30. Michael Murphy
    May 13th, 2008 • 3:33 pm

    There’s no reason I can think of to uninstall CS2. All CS versions of InDesign (and the other suite apps) can co-exist on the same system. They install separately, not as updates to the previous version. This is great for backward compatibility. I personally keep all CS versions on my machine for training and demonstration purposes. If you’ve got the room, it can’t hurt. By default, the OS will open the newest version of the application when you double-click a “.indd” file.

    One argument for not uninstalling it is that, once you do, you can’t re-install an older version when a new version exists on the machine without first uninstalling all of the CS3 apps, then re-installing CS, CS2, and CS3 in that specific order. That’s very time-consuming.
