Build a biblical reference index with GREP
November 26, 2009 at 1:38 am #53998
I don't know how many of you have worked in publishing academic (or non-academic) biblical literature. If you have, then you have had to work with biblical references aplenty. You may also have been tasked with building not only a subject index and perhaps a name index, but (cringe) a biblical reference index as well. Normally this might be built by the author or an indexer, but if your author can't, you can't hire a good indexer, and there's no intern to abuse, then you've got to do it. When I started learning about GREP, I jumped for joy, because I saw the potential. All biblical references are patterns. However, they can be irregular in nature. For example, you can have:
Romans 3:15 or Ro. 3:15 or Rom. 3:15 or Ro. 3.15 or Rom. 3.1-5, 6,7, 18, etc. You get the point. Pair that with any combination of references separated by semicolons and life gets more complicated.
Without GREP, there are ways to tackle this. You can go through and apply a character style to every single reference individually, and then use a script like indexmatic (find it here: http://indesignsecrets.com/free) to populate an index. But that is a pain. You could also use the index feature of InDesign, but you have to use a workaround. You probably have to use a workaround anyway if you are already building more than one index (subject and author, for example). But there is a faster way with GREP. After hours of painful trial and error (thank you RegExr for providing an absolutely beautiful environment in which to write and dynamically test my expressions! Check out http://www.gskinner.com/RegExr), here is the expression I wrote:
It breaks down into the following four parts (shown here separated by spaces):
(\d+\s)? (\w+?\.?\s\d+[[:punct:]]\d+) (.\d+)? ([,\s\d]+(.\d+)?)?
Part 1 accounts for books that begin with numbers (e.g. 2 Timothy), but not all will.
Part 2 is the main expression. It accounts for the name of the book, a period (if there is one; some books are spelled out completely and others, like Job, are too short to merit an abbreviation), a space, the chapter number, punctuation (either a period or a colon), and then the verse number.
Part 3 takes care of a range of verses, so not Rom. 2:3 (covered in parts 1 and 2), but Rom. 2:3-8.
Part 4 is written as flexibly as possible to account for anything that comes after the pattern covered by parts 1-3. For example, you could have a verse or range of verses followed by a comma, a space (or maybe no space), and another verse, another range of verses, or even another chapter and verses. For example: Rom. 2:3-8, 15 or Rom. 2:3-8, 3:6.
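For anyone who wants to test the four parts outside InDesign, here is a minimal Python sketch of the same idea. It is an approximation, not the InDesign expression itself: Python's re module does not support the POSIX class [[:punct:]], so I have assumed [.:] in its place, and the sample sentence is made up.

```python
import re

# Approximation of the four parts. Python's re has no [[:punct:]],
# so [.:] (period or colon) is assumed as the chapter/verse separator.
BIB_REF = re.compile(
    r"(\d+\s)?"               # Part 1: optional leading book number, e.g. "2 "
    r"(\w+?\.?\s\d+[.:]\d+)"  # Part 2: book, optional period, chapter, separator, verse
    r"(.\d+)?"                # Part 3: optional range end, e.g. "-8"
    r"([,\s\d]+(.\d+)?)?"     # Part 4: optional trailing verses or chapter:verse
)

# Made-up sample text for illustration only.
sample = "See Rom. 2:3-8, 15 and 2 Tim. 3:16; compare Job 1.21."

# Part 4 can swallow trailing whitespace, so strip each match.
refs = [m.group(0).strip() for m in BIB_REF.finditer(sample)]
print(refs)
```

Note that Part 4 is greedy about digits, commas, and spaces, so a book number that directly follows a comma (as in "Rom. 2:3, 2 Tim. 3:16") would be pulled into the preceding reference.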
With this, I still run a Find/Change operation and apply a character style with no formatting (e.g. “bib style”) to all of my biblical references. I just get to do it to all of them at the same time, as opposed to hunting them down one by one. I haven't tried it, but I suspect that I could simply create a GREP style in the paragraph style and have it applied to everything automatically. Then, with a great script called indexmatic, I pull all those references into one place and begin laying out the index.
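To picture the "pull all those references into one place" step, here is a small Python sketch. It is not how indexmatic works internally; the (reference, page) pairs and the build_index helper are hypothetical, standing in for whatever the script harvests from the styled text.

```python
from collections import defaultdict

# Hypothetical (reference, page) pairs, as if harvested from styled text.
found = [
    ("Rom. 3:15", 12),
    ("Job 1.21", 4),
    ("Rom. 3:15", 40),
    ("2 Tim. 3:16", 12),
]

def build_index(pairs):
    """Map each reference string to a sorted list of unique page numbers."""
    pages_by_ref = defaultdict(set)
    for ref, page in pairs:
        pages_by_ref[ref].add(page)
    # Plain string sort; a real index would follow canonical book order.
    return {ref: sorted(pages_by_ref[ref]) for ref in sorted(pages_by_ref)}

index = build_index(found)
for ref, pages in index.items():
    print(f"{ref}  {', '.join(map(str, pages))}")
```

The sort here is alphabetical by string, so "2 Tim." lands before "Job"; ordering by canonical book order would need a lookup table of book names and abbreviations.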
My guess is that someone could hone this expression, or has even come up with a better way of making an index. This is the way I do it on a regular basis. I hope that someone can benefit from it, or at least modify the expression to fit their particular need. Thanks for letting me share.
November 26, 2009 at 7:35 am #54005
This is a very good regex, and I am sure I will need it one day (but for French biblical references, sometimes with Roman numerals: I Cor., XV, 14-17 ; II Rois, XVIII, 2 [second Roman numerals in small caps]).
Best and thanks
November 26, 2009 at 1:53 pm #54010
I feel silly. I didn't even take Roman numerals into account. It shouldn't be too hard to add. And in English, if I were working with older texts, it would be an issue. And thank you for the Chain_GREP_queries script. I was unaware of it, but I am going to try it out the next time. As we say here in Costa Rica, pura vida.
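As a rough idea of what a Roman-numeral variant might look like, here is a Python sketch built only around the two French examples quoted above. The pattern is my assumption, not a tested InDesign expression, and it knows nothing about small caps, which is formatting rather than text.

```python
import re

# Assumed pattern for references shaped like "I Cor., XV, 14-17".
ROMAN_REF = re.compile(
    r"(?:[IV]+\s)?"    # optional book number in Roman numerals, e.g. "II "
    r"\w+\.?,\s"       # book name, optional period, comma, space
    r"[IVXLC]+,\s"     # chapter in Roman numerals, comma, space
    r"\d+(?:-\d+)?"    # verse or verse range in Arabic numerals
)

text = "I Cor., XV, 14-17 ; II Rois, XVIII, 2"
roman_refs = ROMAN_REF.findall(text)
print(roman_refs)
```

Because the pattern has no capturing groups, findall returns the full text of each match.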
November 26, 2009 at 2:10 pm #54011
Your last part will match “, 12.3” but stop after the first comma (and the number after that), because “?” means zero or one time.
Change the final “?” to “*” to accommodate a verse list of any length; “*” stands for zero or more times.
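The difference between “?” and “*” on a trailing group can be seen in a short Python sketch. The tail group here is deliberately simplified (one comma, one number, and an optional separator plus number) rather than a copy of the expression from the first post, and [.:] is assumed in place of [[:punct:]].

```python
import re

# Simplified pieces for illustration; not the full expression from the thread.
base = r"\w+\.?\s\d+[.:]\d+"           # e.g. "Rom. 3.1"
tail = r"(?:,\s?\d+(?:[.:]\d+)?)"      # e.g. ", 12.3" or ", 18"

once = re.compile(base + tail + "?")   # "?" allows the tail zero or one time
many = re.compile(base + tail + "*")   # "*" allows the tail zero or more times

text = "Rom. 3.1, 12.3, 18, 21"
match_once = once.search(text).group(0)
match_many = many.search(text).group(0)
print(match_once)
print(match_many)
```

With “?” the match ends after “, 12.3”; with “*” it runs to the end of the verse list.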
November 26, 2009 at 4:46 pm #54018
Jongware, thank you. That worked great.