January 20 2006 • 4:09 PM

Grep Pattern Searching

In Episode 8, I mentioned how I used Grep Pattern Searching in BBEdit to search for patterns in my text, rather than searching for specific pieces of text. The value of this is that, even though the actual text varies throughout my text file, if the patterns are consistent, I can do very complex and powerful search-and-replace operations that keep that variable text intact, while changing elements of the pattern around it.

Here’s an example from real life: A monthly magazine column of new products. Each write-up starts with a company name on one line, followed by a paragraph starting with the words “What’s New” followed by a colon, then a sentence or two of descriptive text about the product. The next paragraph starts with “The Value” followed by a colon, then a few sentences describing the positive attributes of the product for prospective buyers. After that, there’s a line for the company’s web address and another line for their phone number. This is a consistent pattern. But the text itself is not consistent. Every company name is different. Everything following “What’s New” is different, as is everything following “The Value” and all of the URLs and phone numbers.

So how do you search this whole document? The thing to understand is that you don’t search for specific text. Rather, you search the pattern within which the text exists. What you’re searching for is any string of text (the company name), followed by a return, followed by the specific text “What’s New” and a colon, followed by any string of text (the product details), followed by a return, followed by the specific text “The Value” followed by a colon, followed by any string of text (the description of the product’s benefits), followed by a return, followed by any string of text (the web address) followed by a return, followed by any string of text (the phone number).

In BBEdit, that search instruction translates to this:

(.+)\rWHAT’S NEW:(.+)\rTHE VALUE:(.+)\r(.+)\r(.+)

The (.+) means any run of one or more characters. The period matches any single character, and the plus sign extends that to mean one or more of them. The parentheses around them make them a sub-pattern. In other words, the whole line above is the pattern, and the items in parentheses are sub-patterns within that pattern. In the above example, there are 5 sub-patterns, each representing variable text. The \r elements refer to returns in the original text.
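
For readers who want to try the idea outside BBEdit, here is a minimal sketch of the same five-sub-pattern search in Python's `re` module. The sample write-up is invented, and the sketch assumes returns are stored as `\n` rather than `\r`, because Python's period stops at `\n` but not at `\r`.

```python
import re

# Invented sample write-up; "\n" stands in for the returns ("\r") in the BBEdit pattern.
sample = (
    "Acme Widgets\n"
    "WHAT'S NEW: A self-sharpening pencil.\n"
    "THE VALUE: No more trips to the sharpener.\n"
    "www.example.com\n"
    "555-0100"
)

# Five parenthesized sub-patterns, one for each piece of variable text.
pattern = re.compile(r"(.+)\nWHAT'S NEW:(.+)\nTHE VALUE:(.+)\n(.+)\n(.+)")

match = pattern.search(sample)
company, whats_new, value, url, phone = match.groups()
print(company)  # the first sub-pattern: the company name
print(phone)    # the fifth sub-pattern: the phone number
```

Each sub-pattern keeps whatever text it matched, including the space that follows each colon.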

Now…let’s say I wanted to do a replace operation based on this pattern. I have style sheets in InDesign for each element: Company Name, What’s New paragraph, Value paragraph, URL and Phone. On top of that, I have a character from a dingbat font that I put before the URL and another that I put before the phone number to serve as little icons in the layout. All of this is handled expertly by InDesign’s nested style sheets, but I need to put the text characters (in this case, a lower case “u” for the URL icon and an ampersand for the phone icon) first for the nested style sheet in InDesign to work.

To replace this pattern so that all of my style sheet references are included in the right places and have my icon characters added, I use the following replace instruction:

\1\r WHAT’S NEW:\2\r THE VALUE:\3\ru \4\r & \5

The bracketed “pstyle” elements are tags that InDesign will use to format this text automatically when placed in a document with the corresponding styles, the names of which follow the colon in the bracketed tag. The combinations of backslashes and numbers (\1, \2, \3 and so on) refer to the sub-patterns in the original search pattern. They’re numbered by their order in the search instruction. The text of the What’s New Paragraph is \2 and the text of the Phone Number is \5. They’re the second and fifth sub-patterns in the search. By putting in these backslash-number combinations, you’re telling BBEdit to replace the original sub-pattern with itself. So every word of text that follows “What’s New:” until BBEdit finds a return will be replaced by…ITSELF. It remains exactly the same. Same with the other sub-patterns. The Company Name is replaced with itself, but now it has a tag around it. Similarly, the URL is replaced by itself, but now it is preceded by both the tag for its InDesign paragraph style, and the lower case “u” that will appear in a dingbat font in InDesign, thanks to nested style sheets.

In my magazine, I have pages and pages of these product write-ups, so I make sure our editors put two returns between each one when they’re writing in Word, so that I can have BBEdit search for the pattern in every write-up. It’s as simple as searching for the same pattern shown above, but with two returns (\r\r) added at the end, like so:

(.+)\rWHAT’S NEW:(.+)\rTHE VALUE:(.+)\r(.+)\r(.+)\r\r

Likewise, my replace pattern would also include those two returns. It would look like this:

\1\r WHAT’S NEW:\2\r THE VALUE:\3\r u \4\r & \5\r\r
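
The search-and-replace pair can be sketched in Python as well. This is an illustration under assumptions, not the exact BBEdit setup: the paragraph style names in the `<pstyle:…>` tags are hypothetical stand-ins for whatever styles exist in your InDesign document, the two write-ups are invented, and `\n` stands in for `\r`.

```python
import re

# Search pattern: five sub-patterns, with two returns closing each write-up.
search = re.compile(r"(.+)\nWHAT'S NEW:(.+)\nTHE VALUE:(.+)\n(.+)\n(.+)\n\n")

# Replace pattern: each \N puts a sub-pattern back unchanged.
# The style names after "pstyle:" are hypothetical.
replace = (
    r"<pstyle:Company Name>\1\n"
    r"<pstyle:Whats New>WHAT'S NEW:\2\n"
    r"<pstyle:Value>THE VALUE:\3\n"
    r"<pstyle:URL>u \4\n"
    r"<pstyle:Phone>& \5\n\n"
)

# Two invented write-ups, each followed by two returns.
text = (
    "Acme Widgets\nWHAT'S NEW: A self-sharpening pencil.\n"
    "THE VALUE: Fewer interruptions.\nwww.example.com\n555-0100\n\n"
    "Bolt & Co.\nWHAT'S NEW: A quieter stapler.\n"
    "THE VALUE: Calmer offices.\nwww.example.org\n555-0199\n\n"
)

tagged, count = search.subn(replace, text)
print(count)   # number of write-ups that were tagged
print(tagged)
```

Because the variable text is carried through by the numbered references, the same pair works no matter how many write-ups the file contains.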

The only thing left is to add one little bit of information to the very first line of this text file:

<ASCII-MAC>

This is all InDesign needs in addition to the bracketed “pstyle” tags to completely format an unlimited number of these product write-ups the instant they’re placed (using the Place function, not copy-and-paste) in an InDesign file. All of this text will come in using your established style sheets, 100% formatted without you having to select any text or apply any style sheets in InDesign.

One more thing…if this sort of thing is something that you do over and over, like I do, for a magazine or other regular publication…you can save your search-and-replace patterns in BBEdit for future use.

Makes you want to go out and start finding patterns in all of your text, doesn’t it?

53 Responses discussing this post. Add yours below.

  1. January 28th, 2006 • 11:21 am • Link

    A quick additional note on this
    topic, and to my mentioning in Episode 8 that I didn’t know of a
    Windows equivalent text editor that does pattern searching: I got an
    e-mail from a listener about an application called UltraEdit that
    does regular expression searches. I haven’t used it myself, but if
    you want more info on the product, you can read about it on the
    manufacturer’s web site at http://www.ultraedit.com. I’d like to thank Adam for
    that info.

  2. jandeman
    February 17th, 2006 • 12:47 pm • Link

    Thanks for the info.
    Can you also use cstyle (character style or whatever the code is) in
    this working method?

  3. jandeman
    February 17th, 2006 • 12:52 pm • Link

    Hello,
    A question: can you also integrate character styles while placing
    text?

  4. February 18th, 2006 • 10:14 am • Link

    Yes, you can also integrate
    character styles into Grep Pattern searches. I didn’t mention it in
    the original post, because applying character styles manually
    doesn’t usually follow a specific pattern. If you have character
    styles that always fall in a specific place (for example, at the
    beginning of a paragraph or after the first sentence), it’s better
    to nest those character styles inside of the paragraph style by using
    the Nested Style Sheet settings (see Episode 11, which explains how Nested Style Sheets
    work). That way, you don’t even have to reference the character
    styles. They’ll be “built in” to your paragraph style.

    If you DO have a specific reason for applying a character style into your
    source text, the syntax is as follows:

    <cstyle:Your Style Name>your text here<cstyle:>
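
As a rough illustration of how a pattern search could insert these character style tags, here is a short Python sketch. It assumes a hypothetical character style named "SmallCaps" and an invented rule (tag any run of two or more capital letters); neither comes from the post above.

```python
import re

text = "Shipped by UPS and tracked via GPS."

# Wrap every run of two or more capitals in an opening and closing cstyle tag.
# "SmallCaps" is a hypothetical character style name.
tagged = re.sub(r"\b([A-Z]{2,})\b", r"<cstyle:SmallCaps>\1<cstyle:>", text)
print(tagged)
```

The empty `<cstyle:>` tag ends the character style, just as in the syntax shown above.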

  5. jandeman
    February 22nd, 2006 • 4:42 pm • Link

    Thank
    you for the info and sorry for posting the same question twice. My
    mistake.

  6. jandeman
    March 6th, 2006 • 5:57 am • Link

    I’ve
    read your remarks about character styles and nested styles. But I was
    thinking about applying styles to text within one paragraph without
    a fixed order. For instance, some text within a paragraph has to
    be bold or in another font, some other text has to have a color, etc.,
    but these changes don’t come in the same order. I can ask
    my contact to place a certain code before and after these ‘text
    modifications’ so that I can change all of this with pattern
    searching.
    Or maybe this is not the right tool for this?

  7. March 7th, 2006 • 6:27 pm • Link

    Without seeing a sample of exactly
    what you’re working with, it’s hard for me to give a concrete
    answer to this question. I’ve sent an e-mail to you with some
    specifics that might help. But the best way for me to answer
    accurately is to see a sample of how your text needs to be
    formatted.

  8. Prlwytskofski
    July 2nd, 2006 • 5:18 pm • Link

    Thank you for this explanation. Two years ago I tried to
    get this automated pattern thing going. I gave up for lack of
    information. I tried to export ‘tagged text’ from InDesign and go
    with that. It didn’t work. Today I learned why.

    As you explained to JanDeMan to use your text here, I used that to
    go with. It didn’t work… Looking over the text I noticed something.
    All angle brackets were escaped. your text here looked like ??your
    text here??

    My text editor tried to be smart. As soon as I told it not to touch that,
    it worked. I hadn’t noticed before. As I said, I worked with an export
    out of InDesign, and my editor changed the tags as soon as I edited the
    text. I didn’t know better or it should be /.

    I finally can use my scriptable text editor with grep patterns to
    change text between curved brackets into italic, capitalized
    abbreviations to small caps, certain names to bold. Again thanks in a
    bundle!

  9. Prlwytskofski
    July 2nd, 2006 • 5:20 pm • Link

    Humm, all tags were removed in my last post. I’m afraid you’ll
    remain puzzled about what I tried to explain…

  10. July 2nd, 2006 • 9:55 pm • Link

    I couldn’t quite tell if you were
    saying that the special characters in my post weren’t coming through
    properly, or if those in your post didn’t come through. Let me know
    if this isn’t working for you and I’ll try to clarify.

  11. Prlwytskofski
    July 3rd, 2006 • 2:13 pm • Link

    Okay, I should have written ‘Humm, all tags were removed in MY
    last post. I’m afraid you’ll remain puzzled about what I tried to
    explain’.

    Those angle brackets (like the ones around pstyle: in your post)
    didn’t come through in my post. Maybe they also
    don’t come through in regular mail, because they are also used in
    HTML and can be dangerous if used in posts. So I have to try to
    explain it without showing it, in a language that is not my native
    one.

    (English is, as I said, not my native language, and
    therefore I did not understand the manual about tagging text. That’s
    why I started off with a tagged text exported by InDesign itself.)
    What happened to me: I used an editor which assumed the tagged text
    exported by InDesign was HTML or XML. Accordingly, it immediately
    ‘corrected all the errors’, resulting in a faulty tagged text. I used
    that faulty text to do some experimenting with pattern search and
    putting formatting on it. As I started off faulty, my hard work never
    paid off.

    Thanks to your explanation I suddenly (after two years of
    on-and-off trying) saw what was going wrong. And now, by a little
    hair out of an elephant’s tail (literal translation of a Dutch
    saying), I have finished making a script that I expect will save me about half
    the time this specific job takes (putting formatting on specific
    patterns of characters). Just by dropping a bunch of text files on that
    script, they receive about 95% of their formatting.

    The text editor I use is Tex-Edit Plus. It treats grep a tiny bit differently.
    Instead of \1 \2 \3 it uses ^1 ^2 ^3 to get hold of subexpressions.
    If someone is interested, I can put my (Apple)script online.

  12. July 4th, 2006 • 10:58 am • Link

    I’m glad to hear that this method
    has cut your work time in half. I’ve put grep to work everywhere I
    can for my job, and it has reduced hours of work to mere minutes.

  13. Prlwytzkofski
    August 5th, 2006 • 6:09 pm • Link

    At first it seemed to go like a rocket. I must be doing something
    wrong, though. Using grep to make a tagged text seems to work.
    When importing it into InDesign, all ‘cstyle’ elements do as
    expected, but ‘pstyle’ seems to get lost.

    The imported text gets no paragraph style applied to it whatsoever, not
    even ‘Basic Paragraph’. Below is a part of the tagged
    text.

    [code]


    INGEZONDEN MEDEDELING, van onze
    correspondent facilitaire zaken

    (Heer Bommel en de Hopsa's,
    BV 147, 8079)
    [/code]

    Needless to say, the
    InDesign document has a paragraph style called ‘Brood (artikel
    tekst)’ and the import generates no error.

    The reason it
    took me quite a while to realize things did not go that smoothly: when
    importing the tagged text, the text got the character style applied which
    was selected in the palette at the time of import. More or less
    accidentally, that was my master character style for the paragraph style I imported.

    I noticed hardly any changes to the text when tweaking the
    settings of ‘Brood (artikel tekst)’. Just moments ago, I realized
    it was not ‘hardly any changes’ but no changes at all…

    Ed.

  14. Prlwytzkofski
    August 5th, 2006 • 6:15 pm • Link

    Humm… Your blog does not allow me to post the
    InDesign tags. Probably because the comment system does not use
    BBCode (if it did, all text between [code] and [/code] would show
    literally).

    So how can I (we, the users) show a sample in a
    comment post?

    Ed.

  15. August 6th, 2006 • 1:15 pm • Link

    Ed –

    E-mail me the text
    file, and I’ll see if anything’s missing that might be causing your
    problem. The thing about showing code on the site is to “escape out”
    the special characters. For example: to display an opening angle
    bracket, you need to type ampersand-l-t-semicolon (the l and t stand
    for “less than”, which makes it easier to remember). To display a
    closing angle bracket, type ampersand-g-t-semicolon (greater than).
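
If you have a lot of tags to post, the escaping can be done by a script rather than by hand. A minimal sketch using Python's standard `html` module, with the tag from the earlier comment as sample input:

```python
import html

tag = "<cstyle:Your Style Name>"

# html.escape turns "<" into "&lt;" and ">" into "&gt;" (and "&" into "&amp;").
escaped = html.escape(tag)
print(escaped)

# html.unescape reverses the process.
print(html.unescape(escaped))
```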

  16. Prlwytzkofski
    August 6th, 2006 • 3:05 pm • Link

    [Babble on]
    I did not realize escaping &lt; would
    work. In an earlier post I tried to color-code my text with HTML, and
    that failed to work. That’s why I tried BBCode. The missing code
    was sent to info at your URL.
    [Babble off]

    Let’s try
    again at posting the tagged text (fingers crossed). It should appear
    between both [code]'s.

    [code]
    <ASCII-MAC>
    <pstyle:Brood \(artikel tekst\)>INGEZONDEN MEDEDELING, van onze correspondent facilitaire zaken<pstyle:>
    <pstyle:Brood \(artikel tekst\)>(<cstyle:TussenHaakjes>Heer Bommel en de Hopsa's, <cstyle:SmallCaps>BV<cstyle:> 147, 8079<cstyle:>)<pstyle:>
    [code]
    [code]
    <ASCII-MAC>
    <pstyle:Brood \(artikel tekst\)>INGEZONDEN MEDEDELING, van onze correspondent facilitaire zaken<pstyle:>
    <pstyle:Brood \(artikel tekst\)>(<cstyle:TussenHaakjes>Heer Bommel en de Hopsa's, <cstyle:SmallCaps>BV<cstyle:> 147, 8079<cstyle:>)<pstyle:>
    [/code]

    Ed.

  17. Prlwytzkofski
    August 6th, 2006 • 3:08 pm • Link

    Whoops, I must have pressed the paste button twice. Sorry for
    wasting the environmentally friendly but expensive recycled electrons
    used in this blog…

    Ed.

  18. August 6th, 2006 • 3:17 pm • Link

    Ed –

    I think what’s
    breaking your tagging once it’s brought into InDesign is the
    backslashes before the parentheses around “artikel tekst”. Actually,
    I’m surprised you don’t get an error message when placing the text.
    My experience is that InDesign displays a warning that text can’t be
    imported when styles identified with tags do not correspond exactly
    to styles in the document. If your paragraph style is named “Brood
    (artikel tekst)”, your incoming text should not have the backslashes
    in it.

  19. Prlwytzkofski
    August 6th, 2006 • 3:57 pm • Link

    Well Michael, those backslashes before the parentheses
    were put there by InDesign itself. My paragraph style is indeed named
    Brood (artikel tekst). On exporting it, those (also escape?*)
    backslashes appeared.

    But I tried it as you suggested, without
    those backslashes, using the exact spelling of the paragraph style in the
    application. The result is the same: the paragraph style does not seem to get
    imported.

    A thought sprang up. What if those backslashes were
    not put there intentionally by InDesign’s export module, but are
    a misuse of the functions StripSlashes and AddSlashes in the code… A
    bug, or undocumented feature, so to speak…

    I’ll try renaming
    all my styles so they don’t use parentheses. You will hear the result.
    (Should I also not use spaces, just to be safe?)

    *
    A backslash is used as an escape character in some programming languages.
    It works more or less like the & for HTML mentioned earlier in this
    thread.

    Ed.

  20. August 6th, 2006 • 4:03 pm • Link

    Spaces are fine in style names.
    Parentheses can be dealt with, but if they’re causing trouble, try
    removing them and see if it works.

  21. scShaw
    April 14th, 2007 • 12:38 pm • Link

    Great article for understanding how to also use this in CS3. Was wondering if there is a typo in the below paragraph of your example:

    \1\rWHAT’S NEW:\1\rTHE VALUE:\3\ru \4\r& \5

    The bracketed “pstyle” elements are tags that InDesign will use to format this text automatically when placed in a document with the corresponding styles, the names of which follow the colon in the bracketed tag. The combinations of backslashes and numbers — \1 \2 \3 and so on — refer to the sub-patterns in the original search pattern. They’re numbered by their order in the search instruction. The text of the What’s New Paragraph is \2 and the text of the Phone Number is \5.

    Where is the \2 sub-pattern in the example?

  22. April 14th, 2007 • 1:40 pm • Link

    Good catch! It was a typo, and now it’s fixed.

  23. May 17th, 2007 • 1:54 pm • Link

    I don’t get GREP. I have numbered paragraphs whose numbers appear in the running head. I want to search out all the numbers in my document and apply a Char Style to them so they appear properly in the head. But I don’t know how to GREP it. My paragraphs consist of 9.4 (tab) Heading; how do I search for just the 9.4, 9.4.1, 9.4.2 etc.? When I originally did it I searched for \t(.+)\r, but this found 9.4 + (tab). Then I had to search for all the tabs and change the Char Style back to NONE so they didn’t show in the header. Not only did it find all the numbers, it also found (i) (tab) and (ii) (tab), so I had to change all of them back to Char Style None. I’m sure there is a simple solution?

    Have I gone about this backwards?

  24. May 17th, 2007 • 2:13 pm • Link

    Sorry (.+)\t was my original GREP search for finding numbers as each number finishes with a (tab) after it.

  25. May 17th, 2007 • 2:51 pm • Link

    It’s ok, I found a better way to do it. I just nested my Character Styles into my Paragraph Styles so that 9.1 was Character Style 1 and HEADING was Character Style 2. Now all my running heads are perfect, alternating left and right every page. Stupendous.

  26. May 17th, 2007 • 2:59 pm • Link

    Good for you, Eugene. This should be my new strategy…wait long enough and people solve their own problems. :)

    There would have been a way to do this with GREP, but the method you came up with is actually the better and more efficient one. In CS3, you could use the much-improved Numbered List features to accomplish the same thing, but have all of the numbers be generated automatically (even with the decimal formatting).

    If you were using text-based Find/Change in CS2, you would have searched for any digit, a period, and any digit, then put nothing in the replace field, but chosen the appropriate character style in the Change Format area.
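
For the GREP route, one possible shape of the search is "digits, optionally followed by more period-and-digits groups, but only when a tab comes next." The Python sketch below illustrates that idea on invented headings; it is an assumption about what would work, not a tested InDesign query.

```python
import re

# Invented sample: numbered headings plus a roman-numeral list item.
text = (
    "9.4\tHeading\n"
    "9.4.1\tFirst subheading\n"
    "9.4.2\tSecond subheading\n"
    "(i)\tList item\n"
)

# Digits with optional ".digits" groups at the start of a line.
# The lookahead (?=\t) requires a tab to follow without including it in the match.
number = re.compile(r"^\d+(?:\.\d+)*(?=\t)", re.MULTILINE)

numbers = number.findall(text)
print(numbers)
```

Because the tab sits inside a lookahead, it is never part of the found text, and entries such as (i) are skipped because they don't start with a digit.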

  27. May 18th, 2007 • 7:57 am • Link

    That’s my next task, to number them all manually; it’s such a pain, especially when you have 100 level 2 headings. Up to now I’ve been doing it manually. This is an update of last year’s publications, so I’m working from CS2 files in CS3. I couldn’t believe it when it dawned on me how simple the solution actually was; I think it gives me more flexibility too. Now, I’m off to find another problem and solve that too. Oh I love CS3, can’t wait until I can use it fluently. Cheers guys.

  28. March 19th, 2008 • 4:40 am • Link

    NEWS UPDATE:

    GREP is in CS3.

  29. May 13th, 2008 • 3:23 pm • Link

    I just ordered Indesign CS3 for my MAC. How do I uninstall CS2 or should I?

  30. May 13th, 2008 • 3:33 pm • Link

    There’s no reason I can think of to uninstall CS2. All CS versions of InDesign (and the other suite apps) can co-exist on the same system. They install separately, not as updates to the previous version. This is great for backward compatibility. I personally keep all CS versions on my machine for training and demonstration purposes. If you’ve got the room, it can’t hurt. By default, the OS will open the newest version of the application when you double-click a “.indd” file.

    One argument for not uninstalling it is that, once you do, you can’t re-install an older version when a new version exists on the machine without first uninstalling all of the CS3 apps, then re-installing CS, CS2, and CS3 in that specific order. That’s very time-consuming.

  31. Walton Jones
    November 16th, 2008 • 5:29 pm • Link

    How could I find all digits that are not included in parentheses? Basically I want to format a bunch of Bible references by making the verse numbers superscript, but leaving the reference alone.

    Here is an example:

    63The Spirit is the one who gives life; human nature is of no help! The words that I have spoken to you are spirit and are life. 64But there are some of you who do not believe.” (For Jesus had already known from the beginning who those were who did not believe, and who it was who would betray him.) (Jn 6:63-64)

    See there are numbers in the text and numbers in the reference at the end. I want the numbers in the text to be superscripted, but not the numbers in the reference.

    Is this possible? I can’t figure out how to make exceptions with GREP. Please help!

  32. Walton Jones
    November 22nd, 2008 • 4:41 pm • Link

    I am still looking for an answer to the problem I described in my previous comment. It is a problem of searching for only the verse numbers in a quote from the Bible without touching other digits (such as in the reference). I can’t figure it out!

    Please help…

    I watched the lynda.com tutorials…and they were great, but they didn’t cover this topic of GREP in enough detail to solve this problem.

  33. David Blatner
    November 24th, 2008 • 6:45 am • Link

    @Walton: Ah, it took me a while, but I think this will work. In the GREP “find what” field, search for:
    (\d+?)(?=[\l\u])
    (that’s all one line, no spaces)
    and in the change to field, type:
    $0
    (which just means, “replace with what you found”)

    Now you can change the Change Format area to apply the character style or superscript, or whatever.

  34. November 24th, 2008 • 10:50 am • Link

    Sorry for the late response, Walton…and thank you, David, for jumping in.

    If I understand Walton’s question correctly, I think what he’s looking for is to superscript those numbers at the start of the verses, but not format any of the references to the verses in parentheses.

    This requires a little more complex search. Using your sample paragraph, Walton, the “idea” of what you’re looking for is this:

    Any one or more digits at the beginning of a word, but only if those digits are not preceded by an opening parenthesis, upper- or lower-case character, colon or dash and only if they’re not followed by a closing parenthesis, colon or dash.

    To do that, you need to use the Negative Lookbehind (only if it’s not preceded by…) and Negative Lookahead (only if it’s not followed by…) expressions.

    That search would look like this:

    \<(?<!\(|[\l\u]|:|-)\d+(?!\)|:|-)

    As far as the replace function, you actually don’t need to enter anything in the Change To field, as long as you’re definitely applying formatting in the change operation. When a formatting change is attached to a query, the blank Change To field no longer means “replace with nothing” it means “leave the text unchanged, just follow the formatting instructions.”

    However, there’s no harm at all using $0 as David suggests, or even $1 (Found Text and Found Text 1, respectively). The result will be the same.

    Let me know if it works for you. I tested it on my end and it worked like a charm.
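
The negative lookbehind and lookahead idea can be tried out in Python too. This sketch is an approximation: Python has no `\<`, `\l` or `\u`, so it assumes `\b` for the word start and `A-Za-z` for upper- and lower-case letters, and it uses a shortened version of the sample paragraph.

```python
import re

sample = (
    "63The Spirit is the one who gives life. "
    "64But there are some of you who do not believe. (Jn 6:63-64)"
)

# (?<!...) = only if NOT preceded by "(", a letter, ":" or "-"
# (?!...)  = only if NOT followed by ")", ":" or "-"
pattern = re.compile(r"\b(?<![(A-Za-z:\-])\d+(?![):\-])")

verse_numbers = pattern.findall(sample)
print(verse_numbers)
```

On this sample, only the two verse numbers are found; the digits inside the parenthesized reference are rejected by one lookaround or the other.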

  35. Walton Jones
    November 24th, 2008 • 2:24 pm • Link

    Thanks so much guys! You are getting me closer.
    I tried your version Michael and it is almost there. The only problem is that it is also finding the first digit in the reference when the chapter consists of two digits.

    If for example the reference were (Ex 24:12), it would find the first 2 in 24.

    It is also finding digits in the middle of the verses when they happen to have numbers in them. I am afraid those may be really hard to get rid of, but do you think it is possible to take advantage of the fact that there is never any white space between the verse numbers and the next character (usually a letter, quotation mark, or open parenthesis)?

  36. November 24th, 2008 • 3:26 pm • Link

    Try this one, Walton:

    \<(?<!\()\d+?(?=[\u\l"(])

    Basically, that’s doing this:

    Any one or more digits at the beginning of a word, only if it is not preceded by an opening parenthesis and only if it is followed by either an upper- or lower-case character, quotation mark or opening parenthesis.

    Those are the criteria you stated in your last post. I tested it on the one sample paragraph in your original post and it worked (but then again, so did my last test).

    Let us know how that works.

    IMPORTANT: I’m seeing a problem in how the blog is presenting these little bits of code. It’s not displaying some very crucial backslashes that need to be included before certain characters. I will e-mail you the GREP pattern since it can’t be represented accurately here.
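
Going by the description in this comment (digits at the start of a word, not preceded by an opening parenthesis, and followed by a letter, quotation mark or opening parenthesis), a Python approximation of both attempts might look like the sketch below. The sample sentence is invented, and `\b` and `A-Za-z` are assumed stand-ins for InDesign's `\<`, `\l` and `\u`.

```python
import re

# Invented sample with a verse number, a number mid-verse,
# and a reference with a two-digit chapter.
sample = "12Then he waited 40 days. (Ex 24:12)"

# First attempt: negative lookbehind plus negative lookahead.
first_attempt = re.compile(r"\b(?<![(A-Za-z:\-])\d+(?![):\-])")

# Refined attempt: the digits must be followed directly by a letter,
# a quotation mark or an opening parenthesis.
refined = re.compile(r'\b(?<!\()\d+?(?=[A-Za-z"(])')

print(first_attempt.findall(sample))  # also picks up "40" and the "2" of "24"
print(refined.findall(sample))        # only the verse number
```

The positive lookahead is what rules out both the mid-verse number and the first digit of a two-digit chapter, since neither is followed directly by a letter.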

  37. Walton Jones
    November 24th, 2008 • 3:55 pm • Link

    Wow! You guys rule!
    It works like a charm.

    Thanks so much. That will save me hours of work.

  38. David Blatner
    November 25th, 2008 • 12:25 pm • Link

    Yes, Michael, that was the problem! The backslashes were gone. I have gone back and edited my comment above. This code really does work, and it’s very simple because it just searches for one or more digits that are followed by an uppercase or lowercase letter.

    To type a backslash, you can type:
    & # 9 2 ; (with no spaces in between)

  39. January 22nd, 2009 • 2:32 pm • Link

    Speaking of Bible, I need to add a progressive number before the first letter in every phrase of a text.

    I.E. 1First phrase. 2Second phrase. 3So on.

    I can easily find the pattern where numbers have to be added, but I don’t know if there’s a way to make InDesign (or any text editor) insert progressive numbering.
    Any idea?

  40. Lemonshrew
    February 3rd, 2009 • 10:32 am • Link

    Giacomo, I have a similar problem.
    I have a set of Bible references that need the numbers added.

    Example:
    1. xx Luke 3:23 xx John 7:42
    2. xx Gen. 21:2 xx Gen. 25:26 xx Ruth 4:18

    needs to look like this:
    1. 1 Luke 3:23 2 John 7:42
    2. 1 Gen. 21:2 2 Gen. 25:26 3 Ruth 4:18

    I need to replace all the “xx”’s with numbers in order for each entry, then starting over with the next one.

    Anyone have any ideas?

  41. February 3rd, 2009 • 2:01 pm • Link

    Lemonshrew, short of a script or a miracle GREP function, I guess I would do it in a series of Find/Change passes.

    For the first pass:

    Find: ^(1.) xx (Luke 3:23) xx (John 7:42) xx etc.
    … using GREP to match those patterns within parens, but leave the parens themselves. First pass would find the longest instance in the doc (e.g. the one needing digits 1 through 9 inserted)
    Change: $1 1 $2 2 $3 3 $4 4 etc.
    ….Essentially it would re-insert the first found bit, replace the first xx with the number 1 (surrounded by spaces), replace the 2nd found bit, replace the second xx with the number 2, and so on.

    Have the first pass only find the *longest* instances, then shorten both Find and Change so the 2nd pass finds the second-longest instances, etc.

    Well that’s my guess, anyway. ;-)
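
If a script is an option, the numbering can be done in one pass per line. The following is a minimal Python sketch that works on plain text outside InDesign; the function name `number_placeholders` is made up for this illustration.

```python
import re
from itertools import count

def number_placeholders(line):
    """Replace each "xx" in one line with 1, 2, 3, ... starting over per line."""
    counter = count(1)
    # The lambda is called once per match and returns the next number as text.
    return re.sub(r"\bxx\b", lambda m: str(next(counter)), line)

lines = [
    "1. xx Luke 3:23 xx John 7:42",
    "2. xx Gen. 21:2 xx Gen. 25:26 xx Ruth 4:18",
]

numbered = [number_placeholders(line) for line in lines]
for line in numbered:
    print(line)
```

A fresh counter is created for every line, which is what makes the numbering start over with each entry.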

  42. Lemonshrew
    February 18th, 2009 • 8:30 am • Link

    Thanks Anne-Marie!

    I never thought of that. See, this is why they pay you the big bucks! :)

    LS

  43. Lemonshrew
    March 3rd, 2009 • 8:25 am • Link

    I have a large set of numbered references (67 pages worth). Some of these references have been split. I need to find them and put them back together.


    For example:

    Chapter 1
    1 (a)This is a reference. (b)This is a reference.
    2 (a)This is reference.
    2 (b)This is a reference.
    3 (a)This is reference. (b)This is a reference.


    Needs to be:

    Chapter 1
    1 (a)This is reference. (b)This is a reference.
    2 (a)This is reference. (b)This is a reference.
    3 (a)This is reference. (b)This is a reference.

    If the reference numbers only went up to 10 or 20, it wouldn’t be much trouble to just look for them individually. But some chapters have up to 176. I don’t mind fixing each one of these by hand if necessary, but finding them is my main problem. (Note: the split does not always occur between the (a) and (b) entries.)

    Any help would be appreciated!

  44. March 3rd, 2009 • 9:00 am • Link

    Lemonshrew — There’s a way to find any numbered reference (as you describe it in your example text) that’s followed by another line that starts with the same number, then “unite” those two lines and remove the duplicate number. It requires using a backreference in the GREP search. I was working with Anne-Marie to solve a similar problem, and I detail how backreferences work in a post on my blog, but here’s a specific solution to your problem.

    This will find any two lines that start with the same number and unite them onto one, removing the extraneous number, but it will only unite (a) and (b), not (a), (b), (c), (d), and so on. However, it will unite every (a) and (b) line in its first pass. You can run it again to unite the newly-united (a)(b) line with the (c) line, and so on. In other words, you’ll have to run the search as many times as you have references for a number. If you have references a through e, that means you may have to run the search five times. But that five-time process will handle as many numbered lines as you have, so even 176 numbered paragraphs will only require five searches.

    In fact, the pattern actually gets more efficient the more times you run it. Lines keep getting paired up on each pass, so you may not even have to run it five times. Just keep running it until no more matches are found.

    Here’s what you’d enter in the Find what field in the GREP area of the Find/Change dialog:

    (\d+) (\([a-z]\))(.+)\r\1 (\([a-z]\))(.+)

    and here’s what you’d enter in the Change to field:

    $1 $2$3 $4$5

    Give it a try (on a copy of your file, of course!) and let me know how it works. I tried it on a sample directly copied from what you posted here, and added references up to (f) and it worked like a charm.

    – Michael
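
The "keep running it until no more matches are found" routine can be sketched as a loop in Python. This is an approximation under stated assumptions: `\n` stands in for the hard return, and a `^` anchor is added (not part of the pattern described above) so the number can only match at the start of a line.

```python
import re

text = (
    "1 (a)This is a reference. (b)This is a reference.\n"
    "2 (a)This is reference.\n"
    "2 (b)This is a reference.\n"
    "2 (c)This is a reference.\n"
    "3 (a)This is reference. (b)This is a reference."
)

# \1 is the backreference: the second line must start with the same
# number that the first sub-pattern captured.
pattern = re.compile(r"^(\d+) (\([a-z]\))(.+)\n\1 (\([a-z]\))(.+)", re.MULTILINE)

passes = 0
while True:
    text, changed = pattern.subn(r"\1 \2\3 \4\5", text)
    if changed == 0:
        break  # nothing left to unite
    passes += 1

print(passes)
print(text)
```

In this sample, the three lines numbered 2 need two passes: the first unites (a) with (b), the second adds (c).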

  45. Lemonshrew
    March 3rd, 2009 • 10:07 am • Link

    You guys are the best!

    It took me a minute to figure out that the backslashes had dropped out of the post, but it works great!

  46. March 3rd, 2009 • 8:37 pm • Link

    Glad to hear that worked, Lemonshrew. Sorry about the backslash thing. When I posted the comment initially, they were there, but somehow the blog filtered them out when I refreshed the page. For those following this thread, the pattern for the Find what field in my last comment should have shown a backslash before the letter “r” (which forms the metacharacter for a hard return), and a backslash before the number 1 (which forms the backreference to the first subpattern defined in the search, (\d+)).

    David & Anne-Marie…do you think you can edit the comment on your end so that the backslashes appear as they should? That’ll help people make sense of all this when we’ve all long forgotten we ever posted it. :) Thanks!

  47. March 18th, 2009 • 4:58 am • Link

    Hi all, I’m trying to delete a single space between two capital letters, and I can’t! It’s driving me nuts; any help would be much appreciated. Here’s an example:

    Ms P M Halshaw

    should read

    Ms PM Halshaw

    So far I’m using [[:upper:]] [[:upper:]] in the find what box,
    but I don’t know the code to remove the single space between the capitals.

    Any ideas?

    Thanks

  48. Nadya Miloserdova
    March 18th, 2009 • 5:08 am • Link

    @Zoheb
    Find What field:
    (\b[[:upper:]]) (\b[[:upper:]])
    Change To field:
    $1$2

  49. March 18th, 2009 • 6:01 am • Link

    Brilliant! Thank you so much Nadya! I modified it slightly, i just had to add a space after the second upper.

    Thanks again!
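
For anyone testing this outside InDesign, here is a rough Python equivalent that includes the extra trailing space. Python's `re` module does not support `[[:upper:]]`, so the sketch assumes plain `[A-Z]` instead.

```python
import re

# Two single capitals separated by a space, followed by another space.
# The trailing space keeps a lone initial from merging with the surname.
pattern = re.compile(r"(\b[A-Z]) (\b[A-Z]) ")

fixed = pattern.sub(r"\1\2 ", "Ms P M Halshaw")
print(fixed)

# A name with only one initial is left alone.
print(pattern.sub(r"\1\2 ", "Mr P Halshaw"))
```

Like the backreference example earlier in this thread, a name with three or more initials would need the search run more than once.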

  50. mayanrose
    April 12th, 2009 • 2:54 pm • Link

    I need help! I can’t figure out how to do this in a Bible:

    Each page has a heading, for example: Genesis 2:3
    The Genesis is a section marker, but I don’t know how to insert the chapter and the verse automatically so I don’t have to do it manually.

    At the end of the page the text finishes like this:

    Capitulo 2
    Dios santifica…

    1 Asi fueron terminados….
    2 El septimo dia….
    3 Por eso Dios…

    How do I do that on every page? Is there a way?

  51. Nadya Miloserdova
    April 13th, 2009 • 12:30 am • Link

    @Mayanrose:
    Your solution could be Running headers/footers, Paragraph Numbering, or Text Variables.

    Unfortunately, it is not clear from your letter which particular part is constant and which is variable for each page.

    Let’s say you want to have such text:
    “Alice 13:4”, where Alice is the constant part, and the digits vary from page to page.
    Make special paragraph numbering.
    Open the ‘Bullets and Numbering’ dialog from the Paragraph menu.
    In the ‘Number’ box type: Alice 13:^#
    Watch the result.

    Then you may want to consider using multilevel numbering, as it can automate your variable “13” here as well.

    Good luck!

  52. mayanrose
    August 3rd, 2009 • 4:19 pm • Link

    ohh my so new in this!
    Thanks a lot for the other ways to solve it!!

  53. October 22nd, 2009 • 1:27 pm • Link

    For example, Cisco announced that they were canceling their global sales meeting in favor of a virtual event. ,
