Grep duplicate-select first instance

    • #97708
      Alley J
      Participant

      Hi,

      Hoping for some help: I have copied dates into InDesign & have this (Jan-Dec, but e.g.): January 22 29 January 12 24. I’ve used this string & it isolates/retains only the 2nd instance of each month & its dates: (\u\l+\t)(\d+\t)*\1 [the gaps in the sample are all tabs, written as \t in the Grep string]. But how would I then isolate/retain just the first instance of the dates?

    • #98073
      Graham Park
      Member

      I don’t have an elegant solution for you, but this will work: it removes the second number following the month, tab, number. It will not remove punctuation following the second date.
      There is no real rule for finding the months without also finding text in other places, so they need to be defined explicitly.

      GREP Find and replace
      FIND
      ((January|February|March|April|May|June|July|August|September|October|November|December)\t\d+)(\t\d+)

      REPLACE
      $1

      It works by capturing two groups (the month with its first number, then the tab and the second number) and adding back only the first.
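For anyone who wants to try the idea outside InDesign, here is a minimal Python sketch of the same find/replace. It assumes tab-separated input as described in the first post; Python writes the replacement as `\1` where InDesign uses `$1`, and the variable names are only illustrative.

```python
import re

MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")

# Same shape as the FIND string above:
# group 1 = month + tab + first number, group 3 = tab + second number.
pattern = re.compile(rf"(({MONTHS})\t\d+)(\t\d+)")

sample = "January\t22\t29"

# Put back only group 1, which drops the second number.
result = pattern.sub(r"\1", sample)
print(repr(result))
```

Note that only the second number is dropped; a third number on the same line would be left in place.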

    • #98081
      Alley J
      Participant

      Thanks for your reply, Graham. Upon re-reading, I realise I didn’t make it very clear (you know, I know what I mean!!). It is 2 years of dates that, when copied from the PDF, actually come in as Jan\r\d, \d, \d+\rJan\r\d, \d, \d+\r, etc. for the 2 years (so Jan 2017 followed by Jan 2018, followed by Feb 2017, then Feb 2018, etc.). I had changed the returns & comma-spaces to \t, but originally it was as above: a return after each date & then each set of numbers.

      I needed to extract the set of 2017 months from the 2018. I got the 2018 by keeping only the duplicate set (using .*? for the numbers). But in the end, as time was a-wasting, I just changed it as necessary to become a table, cut that column & converted it back to text. It was the simplest solution, but I’ve only picked up Grep in the last 2 or so years, so it was the (as someone else has put it) brain acrobatics to see if it would work, should it keep coming up again.

      In another similar situation, I applied numbering to each line, changed the numbers to text & used that to distinguish which lines I wanted where. It was a bit round-about, but I got there!

      But thank you for taking the time!

    • #98082
      Graham Park
      Member

      That is a bit hard to follow from the description. If you would like to post some sample text, I can have a shot at it.

    • #98083
      Alley J
      Participant

      January
      3, 8,10,15, 22, 24, 26, 29
      January
      5,10,12,17, 24, 26, 29, 31
      February
      5, 12,16, 19, 21, 26
      February
      7, 14, 19, 21, 23, 28
      March
      2, 5, 12,19, 26
      March
      5, 7, 14, 21, 28
      April
      2, 9, 16, 23, 30
      April
      4, 11,18, 25
      May
      7, 14, 21, 28
      May
      2, 9, 16, 23, 30
      June
      4, 11, 18, 25
      June
      6, 13, 20, 27
      July
      2, 9, 16, 23, 30
      July
      4, 11, 18, 25
      August
      6, 13, 20, 27
      August
      1, 8, 15, 22, 29
      September
      3,10,17,24
      September
      5,12,19,26
      October
      1, 8, 15, 22, 24, 29
      October
      3, 10, 17, 24, 26, 31
      November
      5, 12, 19, 26
      November
      7, 14, 21, 28
      December
      3, 10, 17, 19
      December
      5, 12, 19, 21

      • #98157

        Hi,

        Based on this list (and just don’t forget to have a carriage return at the end of the list):

        Find:
        ((January|February|March|April|May|June|July|August|September|October|November|December)\r)((\d+(,\h?)?)+\r)\1((?3))

        Replace1:
        $1$3
        to get the first date, not the second

        Replace2:
        $1$6
        to get the second date, not the first

        (^/)

        Note: “,\h?” takes into account typing errors in the days (no space after some commas)!

      • #98192
        Alley J
        Participant

        Thanks, Obi-Wan. What does the ‘((?3))’ do? I tried using alternatives to the ‘\1’ (‘\2’, ‘\3’, etc.) but they don’t work! Is it like that?
        (Ha, like the ‘^/’ – just realised what it was after seeing it on many posts!)

      • #98193

        (^/) ==> the cape, the hood and the light-saber: the signature of the Great Jedi-Masters, as on my avatar photo!

        MTFBWY!

        (^/)
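On the `((?3))` question: in regex flavours that support it (PCRE, and as far as I know the Boost engine behind InDesign's GREP), `(?3)` re-runs the *pattern* of group 3, whereas a backreference such as `\3` demands the same *text* that group 3 matched. That would explain why `\2`, `\3` and so on do not work here: the second date line has different numbers. Python's `re` module has no `(?3)`, so this sketch spells the sub-pattern out twice. It assumes `\n` in place of InDesign's `\r` and `[ \t]` in place of `\h`, and uses the first four lines of the posted list.

```python
import re

MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")

# The pattern that group 3 holds in the InDesign query: one line of dates.
DATES = r"(?:\d+(?:,[ \t]?)?)+\n"

# Here group 1 = month line, group 2 = first date line, group 3 = second date line.
# Repeating DATES plays the role of (?3): same pattern, different text allowed.
pair = re.compile(rf"((?:{MONTHS})\n)({DATES})\1({DATES})")

# A backreference instead of the repeated pattern requires identical date text.
same_text = re.compile(rf"((?:{MONTHS})\n)({DATES})\1\2")

sample = ("January\n3, 8,10,15, 22, 24, 26, 29\n"
          "January\n5,10,12,17, 24, 26, 29, 31\n")

first = pair.sub(r"\1\2", sample)    # like Replace1: $1$3
second = pair.sub(r"\1\3", sample)   # like Replace2: $1$6
print(first, second, same_text.search(sample))
```

The group numbers differ from the InDesign version only because the inner groups are made non-capturing here.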

    • #98084
      Graham Park
      Member

      Sorry, I can’t help with GREP here, as it works almost exclusively on one line, with a few exceptions.

      To achieve this I think you would need to write a script. That is not my area, so maybe someone else can help.

    • #98085
      Alley J
      Participant

      No worries, thanks for looking, anyway :)

    • #98088
      David Blatner
      Keymaster

      Your list is kind of like a list… I wonder if Peter Kahrel’s Update Index script would help? https://www.kahrel.plus.com/indesign/lists_indexes.html

    • #98100
      Alley J
      Participant

      Thanks, David – I tried the script (changing the page_span to 0) & it just seemed to delete the first line (January) but I’m not sure what I’m doing there, exactly, so I’ll have another good look but I did work out a solution, based on Graham’s suggested Find. I just need to run one Grep, then run a 2nd one on the original text again:

      Find what:
      (January|February|March|April|May|June|July|August|September|October|November|December)\r(.*?)\r(January|February|March|April|May|June|July|August|September|October|November|December)\r(.*?)$

      Change to:
      $1\t$2\r

      & then again with the Change to:
      $3\t$4\r

      Perfectly seps out 1 year at a time. (I’ve been unsure about the ‘or’ symbol – whether it needs square brackets around it or not but this shows me when I do & don’t, too.)

      Appreciate you both taking the time – it’s another ‘happily ever after’ in Grep land!
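A rough Python translation of the two-pass approach above, for testing outside InDesign. It assumes `\n` stands in for InDesign's `\r` and uses `re.MULTILINE` so that `$` matches at each line end; the sample is the first eight lines of the posted list.

```python
import re

MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")

# Groups 1/2 = first month and its dates, groups 3/4 = second month and its dates.
pair = re.compile(rf"({MONTHS})\n(.*?)\n({MONTHS})\n(.*?)$", re.MULTILINE)

sample = ("January\n3, 8,10,15, 22, 24, 26, 29\n"
          "January\n5,10,12,17, 24, 26, 29, 31\n"
          "February\n5, 12,16, 19, 21, 26\n"
          "February\n7, 14, 19, 21, 23, 28")

# Both passes run against the ORIGINAL text, as described in the post.
year_one = pair.sub(r"\1\t\2\n", sample)
year_two = pair.sub(r"\3\t\4\n", sample)
print(year_one)
print(year_two)
```

In this Python version the line end that followed each matched pair stays in place, so an empty line is left between months. Whether InDesign leaves matching empty paragraphs has not been checked here.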

    • #98103
      Graham Park
      Member

      Love a good brain teaser.
      I finally worked this one out using your GREP as a starting point.
      Now it takes two steps and it is done.

      FIND
      (January|February|March|April|May|June|July|August|September|October|November|December)\r(.*?)\r

      CHANGE TO
      $1\t$2\r

      Then

      FIND
      ((January|February|March|April|May|June|July|August|September|October|November|December)\t.+$)\r((January|February|March|April|May|June|July|August|September|October|November|December)\t.+$)

      CHANGE TO
      $1
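The two steps above can be sketched in Python as follows. This is an approximation, not the InDesign behaviour itself: it assumes `\n` in place of `\r`, `re.MULTILINE` for the `$` anchors, and a sample that ends with a return, which step 1 needs for the last line.

```python
import re

MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")

# Step 1: join each month name to its line of dates with a tab.
step1 = re.compile(rf"({MONTHS})\n(.*?)\n")

# Step 2: match two consecutive month lines; group 1 is the first, group 3 the second.
step2 = re.compile(rf"(({MONTHS})\t.+$)\n(({MONTHS})\t.+$)", re.MULTILINE)

sample = ("January\n3, 8,10,15, 22, 24, 26, 29\n"
          "January\n5,10,12,17, 24, 26, 29, 31\n"
          "February\n5, 12,16, 19, 21, 26\n"
          "February\n7, 14, 19, 21, 23, 28\n")

joined = step1.sub(r"\1\t\2\n", sample)
first_year = step2.sub(r"\1", joined)    # keep the first line of each pair
second_year = step2.sub(r"\3", joined)   # or keep the second line instead
print(first_year)
print(second_year)
```

Replacing with group 3 instead of group 1 gives the other year from the same second step.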

    • #98106
      Graham Park
      Member

      You could shorten the months to make the query easier to read.

      FIND
      (Ja.+|Fe.+|Mar.+|Ap.+|May|June|July|Au.+|Se.+|Oc.+|No.+|De.+)\r(.*?)\r
      Replace
      $1\t$2\r

      Then

      FIND
      ((Ja.+|Fe.+|Mar.+|Ap.+|May|June|July|Au.+|Se.+|Oc.+|No.+|De.+)\t.+$)\r((Ja.+|Fe.+|Mar.+|Ap.+|May|June|July|Au.+|Se.+|Oc.+|No.+|De.+)\t.+$)
      REPLACE
      $1

    • #98107
      Alley J
      Participant

      Before your last reply, I had shortened the 2nd query like this:
      ((January|February|March|April|May|June|July|August|September|October|November|December)\t.+?\r?){2}

      putting a ‘?’ after return to catch the last set of numbers at the ‘end of story’. But further shortening never hurts!

      I was initially able to extract the 2nd set of numbers (for 2018) using the ‘\1’ find duplicate query:

      Find:
      (\u\l+\r)(\d+.*\r)*\1

      Change to:
      $1

      Shorter, still :)

      I was just struggling to find a way to extract only the 1st set of year’s months (for 2017).

      But with all of these refinements, it’s finally worked so thanks so much for your help with that – the query with the ‘or’ months did the trick with just a few adjustments & abbreviations along the way & now I’m a ‘happy Grep-er’!
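For comparison, here is a small Python sketch of the `\1` duplicate query quoted above. It assumes `[A-Z][a-z]+` as a stand-in for InDesign's `\u\l+` and `\n` for `\r`. Because the replacement keeps only the repeated month name, what survives is the second set of dates.

```python
import re

# Group 1 = a capitalised word on its own line; \1 requires the same word again
# after any number of lines that start with a digit.
duplicate = re.compile(r"([A-Z][a-z]+\n)(\d+.*\n)*\1")

sample = ("January\n3, 8,10,15, 22, 24, 26, 29\n"
          "January\n5,10,12,17, 24, 26, 29, 31\n"
          "February\n5, 12,16, 19, 21, 26\n"
          "February\n7, 14, 19, 21, 23, 28\n")

# Replace "month / first dates / month" with just "month".
second_year = duplicate.sub(r"\1", sample)
print(second_year)
```

This pattern does not name the months, so it would also fire on any other capitalised one-word line that repeats in the same way.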

    • #98108
      Graham Park
      Member

      That will work but I think it will find every second paragraph.
      As such you will need to be more careful when you use it.
      The one I did specifies the first word of the line in the find so is a bit safer. Still use all with care.
