
Auto Format Superscript and Subscript Numbers using GREP Styles


How should you format H20 or MC2? Generally, you’d want to subscript the 2 in the first instance and superscript it in the second. There are a number of ways to do this, but I’m working on a manuscript right now that has a lot of superscripted numbers such as “109” (except that the 9 is supposed to pop up a bit, to mean “ten to the ninth” or “one billion” in the modern, short-scale nomenclature).

I really don’t want to manually format all those numbers, and fortunately I don’t have to. I’m using GREP styles instead. In my case, the manuscript uses the typical ASCII format for exponential numbers, such as 10^9. The caret means “make the following number a superscript.”

You can immediately see the pattern, right? “Look for one or more digits after a caret and make them superscript.” As soon as you have a pattern, you can grep-it — create a GREP style that applies formatting to it.

To do the same for subscript characters, we need to come up with a standard to give our authors. In my case, I’m saying, “use sub2” to mean “make this number a subscript.” I honestly don’t know if there’s a standard way to type this. Let me know if there is. But “sub” works well. So “Hsub2O” means “make the 2 subscript.”

But of course, I also don’t want that caret or the word “sub” to appear in the final document! No problem: Just make a character style that makes the text disappear. To do that, make a character style that sets the fill and stroke to None, the size .1 pt, and the horizontal and vertical scale to 1%. That’s “disappeared” enough!

Here, then, is the set of GREP styles I’ve created:

grepSuper.png

In English, that means:

  1. Make any caret that falls after a number disappear.
  2. Make one or more digits that fall after a caret superscript.
  3. Make any instance of the text “sub” that immediately precedes a number disappear.
  4. Subscript one or more digits that come immediately after the text “sub”.

Nothing magic about it; it’s just seeing the pattern then using the positive lookbehind and positive lookahead features in grep.
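
For anyone who wants to test the logic outside InDesign, here is a minimal Python sketch of the four rules. The patterns are reconstructed from the plain-English list above, not copied from the screenshot, so treat them as assumptions. The `render` function and the `<sup>`/`<sub>` tags are hypothetical stand-ins for the character styles.

```python
import re
from itertools import groupby

# Patterns reconstructed from the four plain-English rules (assumptions,
# not copied from the original GREP style dialog).
RULES = [
    (r"(?<=\d)\^", "hidden"),   # 1. caret that falls after a digit
    (r"(?<=\^)\d+", "sup"),     # 2. digits that fall after a caret
    (r"sub(?=\d)", "hidden"),   # 3. "sub" immediately before a digit
    (r"(?<=sub)\d+", "sub"),    # 4. digits immediately after "sub"
]

def render(text):
    """Apply each rule to the text, roughly the way GREP styles layer formatting."""
    styles = [None] * len(text)
    for pattern, style in RULES:
        for match in re.finditer(pattern, text):
            for i in range(match.start(), match.end()):
                styles[i] = style
    out = []
    # Group neighbouring characters that ended up with the same style.
    for style, group in groupby(zip(text, styles), key=lambda pair: pair[1]):
        chunk = "".join(ch for ch, _ in group)
        if style == "hidden":
            continue  # stands in for the "disappear" character style
        if style in ("sup", "sub"):
            out.append(f"<{style}>{chunk}</{style}>")
        else:
            out.append(chunk)
    return "".join(out)

print(render("4.5 x 10^9"))  # 4.5 x 10<sup>9</sup>
print(render("Hsub2O"))      # H<sub>2</sub>O
```

A word such as “submarine” passes through unchanged, because the third rule only fires when “sub” is followed by a digit.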

The result? Here’s the same text before and after applying the paragraph style that contains these GREP styles:

grepSuper2.png

In case you want to try it yourself without all that GREP typing, here’s an InDesign snippet you can download and File > Place in your InDesign document.

[Update, Feb 12] Here’s another snippet that you might find useful. It uses some of the same techniques as this blog post, but it has different “smarts” about whether a number should be superscript or subscript. For example, it knows that a “2” after “MC” should be superscript, but a “2” after just “C” should be subscript.

David Blatner is the co-founder of the Creative Publishing Network, InDesign Magazine, CreativePro Magazine, and the author or co-author of 15 books, including Real World InDesign. His InDesign videos at LinkedIn Learning (Lynda.com) are among the most watched InDesign training in the world.
You can find more about David at 63p.com

  • Eugene Tyson says:

    Hm, that’s really clever. It does mean having to type in “sub” before each number though? Which could be time consuming in itself. And a slip of the finger could mean typing “subb” or “bus” or “usb” and that could end up in the text – no?

    Don’t want to step on a GREP guru’s toes or anything – but just wondering if this would work equally well?

    For: C8H10N4O2 = 4.5 x 10^9 = 4.5 billion bits of joy

    (?<=[[:alpha:]])[[:digit:]]+
    Seems to find numbers in between a string of letters

    and

    (?<=\^)[[:digit:]]
    And this seems to find a digit that comes right after a "^"

    As this is a GREP style it will be part of a paragraph style – so you'd only apply that paragraph style to the text with formulas.

  • Eugene Tyson says:

    PS – I love the disappearing text setting, very very clever.

  • Jongware says:

    The TeX way of denoting subscripts is ‘_’: CO_2, H_2O.

  • Moiz says:

    I was wondering what “C8H10N4O2” stands for and found this: C8H10N4O2.

  • Great stuff David!

    Thanks!

  • @Eugene: You are absolutely right that you have to type the pattern (whatever you choose) the same every time. I like the TeX way of using an underscore (thanks, Theun); that would be simpler.

    It is true that you could generalize with your code (basically “any digits after any letter”) but I get nervous with that level of generalization. If someone starts typing about P2P businesses or “see you l8r!” it would break. It all depends on the project, though. :)

    @Moiz: Yes, I live near Seattle, where C8H10N4O2 is an important ingredient in our lifestyle.
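
The worry about over-generalizing is easy to check outside InDesign. Below is a small Python sketch, assuming `[A-Za-z]` and `[0-9]` as rough substitutes for `[[:alpha:]]` and `[[:digit:]]`, since Python's `re` module has no POSIX classes. The `digits_after_letter` helper is a hypothetical name used only for this illustration.

```python
import re

# Approximation of (?<=[[:alpha:]])[[:digit:]]+ using ASCII ranges only.
DIGITS_AFTER_LETTER = re.compile(r"(?<=[A-Za-z])[0-9]+")

def digits_after_letter(text):
    """Return every run of digits that directly follows a letter."""
    return DIGITS_AFTER_LETTER.findall(text)

print(digits_after_letter("C8H10N4O2"))       # ['8', '10', '4', '2']
print(digits_after_letter("P2P businesses"))  # ['2']
print(digits_after_letter("see you l8r!"))    # ['8']
print(digits_after_letter("4.5 x 10^9"))      # []
```

The formula matches as intended, but so do “P2P” and “l8r”, which is why an explicit marker such as “sub” or “_” is the safer choice for running text.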

  • Eugene Tyson says:

    Ah yes, I already knew there would be a caveat in my GREP – as always :)

    I can see where this might be a problem in Find/Replace – but seeing as this is applied to a paragraph, well, I suppose it could be within a paragraph of text too. So there would be some issues in applying a GREP style to an entire paragraph of text using my method.

    I suppose you could put catches in the GREP to not pick up on things like P2P or B2B and LOL and l8r et al. But I guess it might be easier to use a prefix before the numbers to ensure that you don’t pick up on extra things that don’t need the formatting.

    I suppose we may have to wait for GREP Character Styles, where you could search for strings like that and apply them at a character level rather than a paragraph level?

  • F vd Geest says:

    I always use:

    (?<=CO)2

    and

    (?<=H)2(?=O)

  • Jongware says:

    F., that’s only good if you have just those two :) The “general” case would be something akin to

    (?<=\u)\d+(?=\u|\b)

    but this can only apply either subscript or superscript.
    David’s will work for either (depending on the _ or ^), except … his first line specifies 20 atoms of H instead of the intended water.

  • F vd Geest says:

    Hi Jongware,

    yes I know it is specific. But combined with the third GREP for acronyms (see below) that’s all I need in 90% of the cases in official government documents. The acronym GREP leaves CO2 alone so it works nice. But I did understand that the GREP here was far more ‘general’, no worry ;-)

    Acronym that leaves CO2 alone:

    (?<=\W)\u\u+(?=\W)

    You see the theme here - positive/negative lookbehind/lookahead, love them ;-)

  • Hello,
    For fun, I tried to write a GREP style that correctly formats all chemical elements. I used the periodic table found on Wikipedia.
    I have 4 regexes. The first one is for all elements except 6 elements (Uut, Uuq, etc.), which are covered by the second regex. The third is for elements with just one capital letter. These regexes apply a Subscript character style. The last one leaves the digit unformatted if the element is wrong (Regular character style). Perhaps someone will find problems. Jongware? I would like to know. Thx

    1) (?<=Yb|A[cgmulst]|B[aehkir]|C[flademnorsu]|D[bsy]|E[urs]|F[erm]|G[dae]|H[fgso]|I[rn]|L[ai]|M[gnotd]|N[abdipo]|Os|P[admrtubo]|R[abefghu]|S[cgmrbein]|T[abchielm]|Z[nr])\d+

    2) (?<=Uu[hpqst])\d+

    3) (?<=[ABCFHIKNOPSUVWY])\d+

    4) (?<=([ABCFHIKNOPSUVWY]))\d+(?=\1\b)

  • Jongware says:

    Nice, Laurent!
    I suppose you didn’t know you can combine lookbehinds of different lengths by ORing them in sequence:

    ((?<=[ABCFHI])|(?<=Uu[hpqst]))\d+

    (although this does lead to a GREP of semi-monstrous length — over 200 characters in length…)

    And where are Xenon, Neon, Argon, Krypton, Radon, and (less important) Ununoctium?
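
The same ORing trick carries over to other regex engines. As a hedged illustration, Python's `re` module rejects a single lookbehind whose alternatives have different lengths, but accepts separate lookbehinds joined with `|`. The element lists here are shortened exactly as in the comment above, so this is a sketch, not a full periodic table.

```python
import re

# Two fixed-width lookbehinds (1 and 3 characters) ORed in sequence.
SUBSCRIPT_DIGITS = re.compile(r"(?:(?<=[ABCFHI])|(?<=Uu[hpqst]))\d+")

def subscript_candidates(text):
    """Return digit runs that directly follow one of the listed element symbols."""
    return SUBSCRIPT_DIGITS.findall(text)

print(subscript_candidates("H2O"))   # ['2']
print(subscript_candidates("Uuq4"))  # ['4']
print(subscript_candidates("x2"))    # []

# In Python, the single-lookbehind form is refused because the
# alternatives differ in width.
try:
    re.compile(r"(?<=[ABCFHI]|Uu[hpqst])\d+")
except re.error as err:
    print("rejected:", err)
```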

  • :-) Thanks Jongware.
    Here is a regex with the other elements (Argon, Helium, Xenon, Neon, Krypton, Radon, Lutetium, Lawrencium and Ununoctium)
    1) (?<=Yb|A[cgmulrst]|B[aehkir]|C[flademnorsu]|
    D[bsy]|E[urs]|F[erm]|G[dae]|H[efgso]|I[rn]|Kr|
    L[airu]|M[gnotd]|N[abdeipo]|Os|P[admrtubo]|
    R[abefghnu]|S[cgmrbein]|T[abchielm]|Z[nr])\d+

    2) (?<=Uu[hopqst])\d+
    I’m keeping this second regex as it is; I think it is more explicit than the alternation in this case!

  • loic says:

    What about the ability to place comments and spaces, as Martinho told us recently? It could help identify which elements are concerned.
    Loic

  • Jongware says:

    Loic: https://www.kahrel.plus.com/indesign/grep_editor.html shows how to enter comments. Peter doesn’t show the very first command (?x) (probably because his little editor inserts it by itself).

    Note that he suggests \x{0020} to insert a “searchable” space. However, this\ is\ sufficient ;-)

  • Bonjour Loïc
    You can insert comments with (?#) (cf. p. 83 ;-)) but I don’t think it will be efficient in our example:
    (?# Actinium, Silver, Americium, Gold,
    Aluminium, Arsenic, Argon, Astatine)A[cgmulrst]|

  • Sandee Cohen says:

    David,

    I’m confused. Why do you want to keep the “sub” in the text? Isn’t the subscript applied to the digit enough of a marker for future work?

  • @Sandee: I’m not sure what you mean. The subscript isn’t actually applied… it’s just a grep style. However, if you were doing this with find/change, then yes, you could remove the “sub” and just apply the character style, I suppose!

  • Sandee Cohen says:

    D’uh-uh! I thought this was for a Find/Change. That’s why I didn’t understand why you want to hide the “sub”. Now it all makes sense.

    That’s actually a very cool tip on its own. GREP styles will Find and Change but not Find and Delete.

    Your trick accomplishes that.

  • Terry Clifford says:

    Dave, I am not familiar with using GREP styles etc. and have a question. I copied the formula below from an article you had posted in 2009 that I just read.

    (?

    My question is, where exactly do I put this formula to get the desired outcome. The article mentioned using it in the Find/Change box, however, where does the formula go? In the Find What field or the Change to Field? This formula would come in very handy for my line of work, however I just don’t know how to use it. You also talk about putting it into a style, but I am not familiar with GREP styles. I use paragraph and character styles all the time, I just haven’t ever used a GREP style.

    Terry

  • Terry Clifford says:

    (?)

    The above formula did not post in my previous comment, although I am not sure why.

  • Jongware says:

    Terry: the forum software objects to < and (sometimes) > characters; you’ll have to enter them as &lt;

    As for your question, regardless of what the expression actually is, the only thing GREP does is searching. So you’d always have to enter it into the Find What field.
    The Change To field is related to, but quite different from, GREP searches — i.e., you can search for an “any character” but it has no meaning to replace something with ‘any character’.

  • Peter Kahrel says:

    > However, this\ is\ sufficient

    Good one! Thanks.

  • Igor Freiberger says:

    I’m developing a font with many OT features. One of its stylistic sets is planned to do this automatic formatting of scientific notation (mathematics, chemistry, physics, etc.) exactly as discussed here.

    I have not yet developed the whole OT code, but your tips about RegEx solutions are pretty useful. I hope I can achieve a complete set of OT substitutions to make this really automatic and useful for scientific texts.

    Other features already have OT solutions triggered through stylistic sets:

    1. Roman numbers: you type any number up to 9999 and get the proper Roman numeral.

    2. Language-based auto substitutions. For example: |SS| in German to be replaced by Eszett; |5o| or |3a| to become 5º and 3ª in Portuguese and Spanish; and so on.

    3. Alternative glyphs to reduce serif conflicts. Pairs like |vu| or |gu| get slightly smaller serifs where the glyphs tend to touch each other, increasing legibility.

    4. Chess notation to become automatic (substitution of letters and signs to international chess figurines).

    The work is still in progress, but the font will probably use all 20 stylistic sets available to OT nowadays. I must thank you all for your RegEx input, as it will help this project.

    (BTW, the font is a somewhat transitional serif family with 14 weights and support for all Latin-based languages plus IPA. Estimated release time: April 2011).

  • Alfred Mosskin says:

    I am a big fan of GREP-styles and love this type of logical puzzling. These types of discussions are also excellent for learning new tricks!

    But I would like to attack the problem from a different angle. As you say, you have a long manuscript with text snippets like 109 or H20 or MC2. This doesn’t look right in InDesign. But the problem is that it doesn’t look right where the text was originally written either.

    The text author, copywriter or some other text generator will in 99% of all cases use Microsoft Word (or other software that can do RTF). If I asked the author to write ^ or sub to indicate superscript/subscript, this would get it to work with GREP styles in InDesign. But it would still look funny in the text editor. Instead I would suggest that the author use character styles, already in Microsoft Word: one for superscript and one for subscript. They would of course also change the appearance of the text in Word to super-/subscript. If the author is uncomfortable with editing styles in Word, I could easily make a dot/doc/rtf template with the styles included and send it over. This would take less than 60 seconds.

    The style mapping between Word and InDesign is really an excellent feature (crucial for many clients I meet). When importing or linking the text to InDesign, the character styles would immediately be in place and I can format them in InDesign any way I like.

    There are (at least) two major benefits of using Character styles when composing the text in Word:
    1. The author can decide for herself/himself where it should be superscript/subscript and where not.
    2. It looks right, when writing the text! So there is much less chance of a mistake. Which leads to less reviewing.

  • ThompsonText says:

    Just to clarify Igor’s point 2 above for any non-German-speakers,
    |SS| in German to be replaced by Eszett
    Not every instance of ss or SS in German should necessarily be replaced by ß automatically.
