Grep duplicate-select first instance
Tagged: duplicate, first instance, GREP, second instance
- This topic has 15 replies, 4 voices, and was last updated 6 years, 6 months ago by Michel Allio for FRIdNGE.
-
September 3, 2017 at 8:24 pm #97708 Alley J, Participant
Hi,
Hoping for some help: I have copied dates into InDesign & have this for every month, Jan-Dec (e.g.): January 22 29 January 12 24. I’ve used this string & it isolates/retains only the 2nd instance of each month & its dates: (\u\l+\t)(\d+\t)*\1 [the gaps in the sample are all tabs, written as \t in the Grep string] but how would I then isolate/retain just the first instance of the dates?
-
September 13, 2017 at 7:21 pm #98073 Graham Park, Member
I don’t have an elegant solution for you, but this will work: it removes the second number in each month, tab, number, tab, number sequence. It will not remove punctuation following the second date.
There is no real rule for finding the months without also matching text in other places, so they need to be listed explicitly.
GREP Find and replace
FIND
((January|February|March|April|May|June|July|August|September|October|November|December)\t\d+)(\t\d+)
REPLACE
$1
It works by capturing two groups and then putting back only the first one found.
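For anyone who wants to try the idea outside InDesign, here is a minimal Python sketch of the same find/replace. It assumes Python's `re` module rather than InDesign's GREP engine, and the function name `drop_second_number` is made up for illustration.

```python
import re

MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")

# Group 1: month, tab, first number. Group 3: tab plus the second number.
PATTERN = re.compile(rf"(({MONTHS})\t\d+)(\t\d+)")

def drop_second_number(text: str) -> str:
    """Put back only group 1, so the number after the first one is dropped."""
    return PATTERN.sub(r"\1", text)

print(repr(drop_second_number("January\t22\t29")))
```

As in the original query, only one number is removed per match, so a third number on the same line would be left in place.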
-
September 14, 2017 at 1:05 am #98081 Alley J, Participant
Thanks for your reply, Graham. Upon re-reading, I realise I didn’t make it very clear (you know, I know what I mean!!). What it is, is 2 years of dates that, when copied from the PDF, actually come in as Jan\r\d, \d, \d+\rJan\r\d, \d, \d+\r, etc. for the 2 years (so Jan 2017 followed by Jan 2018, followed by Feb 2017, then Feb 2018, etc.). I had changed the returns & comma-spaces to \t, but originally it was as above: a return after each month & after each set of numbers.
I needed to extract the set of 2017 months from the 2018. I got the 2018 by keeping only the duplicate set (using .*? for the numbers). But in the end, as time was a-wasting, I just changed it as necessary to become a table, cut that column & converted it back to text. It was the simplest solution, but I’ve only picked up Grep in the last 2 or so years, so it was the (as someone else has put it) ‘brain acrobatics’ of seeing whether it would work, should it keep coming up again.
In another similar situation, I applied numbering to each line, changed the numbers to text & used that to distinguish which lines I wanted where. It was a bit round-about, but I got there!
But thank you for taking the time!
-
September 14, 2017 at 1:26 am #98082 Graham Park, Member
It’s a bit hard to tell from that. If you would like to post some sample text, I can have a shot at it.
-
September 14, 2017 at 1:33 am #98083 Alley J, Participant
January
3, 8,10,15, 22, 24, 26, 29
January
5,10,12,17, 24, 26, 29, 31
February
5, 12,16, 19, 21, 26
February
7, 14, 19, 21, 23, 28
March
2, 5, 12,19, 26
March
5, 7, 14, 21, 28
April
2, 9, 16, 23, 30
April
4, 11,18, 25
May
7, 14, 21, 28
May
2, 9, 16, 23, 30
June
4, 11, 18, 25
June
6, 13, 20, 27
July
2, 9, 16, 23, 30
July
4, 11, 18, 25
August
6, 13, 20, 27
August
1, 8, 15, 22, 29
September
3,10,17,24
September
5,12,19,26
October
1, 8, 15, 22, 24, 29
October
3, 10, 17, 24, 26, 31
November
5, 12, 19, 26
November
7, 14, 21, 28
December
3, 10, 17, 19
December
5, 12, 19, 21
-
September 18, 2017 at 3:11 am #98157 Michel Allio for FRIdNGE, Participant
Hi,
Based on this list (just don’t forget to have a carriage return at the end of the list):
Find:
((January|February|March|April|May|June|July|August|September|October|November|December)\r)((\d+(,\h?)?)+\r)\1((?3))
Replace 1:
$1$3
to get the first date, not the second
Replace 2:
$1$6
to get the second date, not the first
(^/)
Note: “,\h?” takes into account typing errors in the days (no space after some commas)!
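A rough Python sketch of the same query may help show what the two references are doing. As far as I understand it, `\1` must match the exact text that group 1 captured (the same month name again), whereas `(?3)` re-runs the pattern of group 3 against new text (a second, different line of days). Python's built-in `re` has no `(?3)` or `\h`, so this sketch writes the days pattern out twice and uses `[ \t]` instead; those substitutions, the `\n` line endings, and the function names are assumptions made for illustration.

```python
import re

MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")

# One line of days: digits, an optional comma, an optional space or tab.
DAYS = r"(?:\d+(?:,[ \t]?)?)+\n"

# \1 repeats the month text captured by group 1; the second DAYS is the
# same pattern written out again, which is the role (?3) plays in the query.
PAIR = re.compile(rf"((?:{MONTHS})\n)({DAYS})\1({DAYS})")

def keep_first(text: str) -> str:
    """Keep each month with its first line of days."""
    return PAIR.sub(r"\1\2", text)

def keep_second(text: str) -> str:
    """Keep each month with its second line of days."""
    return PAIR.sub(r"\1\3", text)

sample = "January\n3, 8,10\nJanuary\n5,10,12\nFebruary\n5, 12\nFebruary\n7, 14\n"
print(keep_first(sample))
print(keep_second(sample))
```

Because the groups are numbered differently here (the inner groups are non-capturing), the replacements use `\2` and `\3` where the InDesign query uses `$3` and `$6`.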
-
September 18, 2017 at 3:33 pm #98192 Alley J, Participant
Thanks, Obi-Wan. What does the ‘((?3))’ do? I tried using alternatives to the ‘\1’ (‘\2’, ‘\3’, etc.) but they don’t work! Is it like that?
(Ha, like the ‘^/’. Just realised what it was after seeing it on many posts!)
-
September 18, 2017 at 4:01 pm #98193 Michel Allio for FRIdNGE, Participant
(^/) ==> the cape, the hood and the light-saber: the signature of the Great Jedi-Masters, as on my avatar photo!
MTFBWY!
(^/)
-
-
September 14, 2017 at 2:39 am #98084 Graham Park, Member
Sorry, I can’t help with GREP here, as it works almost exclusively on one line, with a few exceptions.
To achieve this I think you would need to write a script. That is not my area, so maybe someone else can help.
-
September 14, 2017 at 3:21 am #98085 Alley J, Participant
No worries, thanks for looking, anyway :)
-
September 14, 2017 at 8:01 am #98088 David Blatner, Keymaster
Your list is kind of like a list… I wonder if Peter Kahrel’s Update Index script would help? https://www.kahrel.plus.com/indesign/lists_indexes.html
-
September 14, 2017 at 4:04 pm #98100 Alley J, Participant
Thanks, David. I tried the script (changing the page_span to 0) & it just seemed to delete the first line (January), but I’m not sure exactly what I’m doing there, so I’ll have another good look. I did work out a solution, though, based on Graham’s suggested Find. I just need to run one Grep, then run a 2nd one on the original text again:
Find what:
(January|February|March|April|May|June|July|August|September|October|November|December)\r(.*?)\r(January|February|March|April|May|June|July|August|September|October|November|December)\r(.*?)$
Change to:
$1\t$2\r
& then again with the Change to:
$3\t$4\r
Perfectly seps out 1 year at a time. (I’ve been unsure about the ‘or’ symbol, whether it needs square brackets around it or not, but this shows me when I do & don’t, too.)
Appreciate you both taking the time – it’s another ‘happily ever after’ in Grep land!
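The one-find, two-change approach can be sketched in Python roughly as follows. This is an approximation, not InDesign's behaviour: it assumes `\n` line endings in place of `\r`, and it ends the pattern with `(?:\n|\Z)` instead of `$` so that the closing line break is consumed as well. The name `split_years` is hypothetical.

```python
import re

MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")

# Groups 1-2: first month and its days. Groups 3-4: second month and its days.
# "." does not cross a newline here, so each (.*) stays on one line.
PAIR = re.compile(rf"({MONTHS})\n(.*)\n({MONTHS})\n(.*)(?:\n|\Z)")

def split_years(text: str) -> tuple[str, str]:
    """Run both replacements against the same original text."""
    first_year = PAIR.sub(r"\1\t\2\n", text)
    second_year = PAIR.sub(r"\3\t\4\n", text)
    return first_year, second_year

sample = "January\n3, 8\nJanuary\n5, 10\nFebruary\n5, 12\nFebruary\n7, 14"
first, second = split_years(sample)
print(first)
print(second)
```

Both passes start from the untouched text, which mirrors running the second Change to "on the original text again".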
-
September 14, 2017 at 6:35 pm #98103 Graham Park, Member
Love a good brain teaser.
I finally worked this one out using your GREP as a starting point.
Now it takes two steps and it is done.
FIND
(January|February|March|April|May|June|July|August|September|October|November|December)\r(.*?)\r
CHANGE TO
$1\t$2\r
Then
FIND
((January|February|March|April|May|June|July|August|September|October|November|December)\t.+$)\r((January|February|March|April|May|June|July|August|September|October|November|December)\t.+$)
CHANGE TO
$1
-
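A hedged Python sketch of the two-step version: step 1 joins each month to its line of days with a tab, and step 2 keeps the first of every two consecutive month lines. It assumes `\n` line endings and Python's `re` (with `re.MULTILINE` so that `$` matches at each line end); the function names are made up for illustration.

```python
import re

MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")

# Step 1: month, line break, days, line break -> month, tab, days, line break.
JOIN = re.compile(rf"({MONTHS})\n(.*?)\n")

# Step 2: two consecutive "month<tab>days" lines; group 1 is the first line.
TWO_LINES = re.compile(
    rf"((?:{MONTHS})\t.+$)\n((?:{MONTHS})\t.+$)", re.MULTILINE
)

def join_month_and_days(text: str) -> str:
    return JOIN.sub(r"\1\t\2\n", text)

def keep_first_of_each_pair(text: str) -> str:
    return TWO_LINES.sub(r"\1", text)

sample = "January\n3, 8\nJanuary\n5, 10\nFebruary\n5, 12\nFebruary\n7, 14\n"
print(keep_first_of_each_pair(join_month_and_days(sample)))
```

Swapping the replacement in step 2 to `\2` would keep the second line of each pair instead.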
September 14, 2017 at 7:27 pm #98106 Graham Park, Member
You could shorten the months to make the query easier to read.
FIND
(Ja.+|Fe.+|Mar.+|Ap.+|May|June|July|Au.+|Se.+|Oc.+|No.+|De.+)\r(.*?)\r
REPLACE
$1\t$2\r
Then
FIND
((Ja.+|Fe.+|Mar.+|Ap.+|May|June|July|Au.+|Se.+|Oc.+|No.+|De.+)\t.+$)\r((Ja.+|Fe.+|Mar.+|Ap.+|May|June|July|Au.+|Se.+|Oc.+|No.+|De.+)\t.+$)
REPLACE
$1
-
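One trade-off worth checking before relying on the abbreviated months: a pattern such as `No.+` accepts any line that starts with "No", not just "November". The small Python sketch below illustrates this with an invented test line; it assumes Python's `re` and `\n` line endings, and the helper name `starts_with_month` is hypothetical.

```python
import re

FULL = re.compile(
    r"(January|February|March|April|May|June|July|"
    r"August|September|October|November|December)\n"
)
SHORT = re.compile(
    r"(Ja.+|Fe.+|Mar.+|Ap.+|May|June|July|Au.+|Se.+|Oc.+|No.+|De.+)\n"
)

def starts_with_month(pattern: re.Pattern, line: str) -> bool:
    """True if the pattern matches at the start of the line."""
    return pattern.match(line) is not None

line = "Notes for the printer\n"  # invented example, not from the sample list
print(starts_with_month(FULL, line), starts_with_month(SHORT, line))
```

On a clean list like the sample above this makes no difference, so the shorter query is fine there; it only matters when other text shares those opening letters.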
September 14, 2017 at 7:58 pm #98107 Alley J, Participant
Before your last reply, I had shortened the 2nd query like this:
((January|February|March|April|May|June|July|August|September|October|November|December)\t.+?\r?){2}
putting a ‘?’ after the return to catch the last set of numbers at the ‘end of story’. But further shortening never hurts!
I was initially able to extract the 2nd set of numbers (for 2018) using the ‘\1’ find duplicate query:
Find:
(\u\l+\r)(\d+.*\r)*\1
Change to:
$1
Shorter, still :)
I was just struggling to find a way to extract only the 1st set of year’s months (for 2017).
But with all of these refinements, it’s finally worked so thanks so much for your help with that – the query with the ‘or’ months did the trick with just a few adjustments & abbreviations along the way & now I’m a ‘happy Grep-er’!
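The ‘\1’ find-duplicate query can be sketched in Python like this. It assumes `[A-Z][a-z]+` as a stand-in for InDesign's `\u\l+` (so ASCII letters only) and `\n` in place of `\r`; the function name is made up for illustration.

```python
import re

# Group 1: a capitalised word on its own line. Then any number of lines that
# start with digits, followed by the same word again (the \1 backreference).
DUPLICATE = re.compile(r"([A-Z][a-z]+\n)(\d+.*\n)*\1")

def keep_second_set(text: str) -> str:
    """Replace 'month, first days, month' with just the month."""
    return DUPLICATE.sub(r"\1", text)

sample = "January\n3, 8\nJanuary\n5, 10\nFebruary\n5, 12\nFebruary\n7, 14\n"
print(keep_second_set(sample))
```

The match swallows the first set of days along with the repeated month name, which is why this query on its own only ever leaves the second set behind.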
-
September 15, 2017 at 1:16 am #98108 Graham Park, Member
That will work, but I think it will find every second paragraph, so you will need to be more careful when you use it.
The one I did specifies the first word of the line in the find, so it is a bit safer. Still, use them all with care.
-