January 28 2010 • 3:14 AM

Massaging Text with InDesign (Getting the Text You Want from the Text You Have)

Sometimes the text I need is hidden inside a bunch of other text. For example, here’s a web page that lists, among many other things, a bunch of email addresses I need:

The blue column in the middle is the list of emails. I’m sure there are web utilities that let you grab just one column, and there’s that cool trick of Option/Alt-dragging in MS Word to get a column… but I want to show you how you can use InDesign to pull text out of a bunch of data like this.

I copied and pasted the whole page into InDesign, and as you can see, it’s a mess:

But I also notice patterns! In this case, we need to remove all the lines that have just the user name followed by a bunch of tabs. So I whip up this quick grep:

That looks for a word followed by four tabs, followed by a return, followed by the same (duplicate) word, and replaces the whole match with a single copy of that word. Obviously, your grep would be different, depending on what kind of pattern you find.
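The actual query appears only in the screenshot, so here is a minimal sketch of the same idea using Python's `re` module rather than InDesign's GREP engine. It assumes the user name is a run of word characters (`\w`, so names containing dots would need a wider class), that there are exactly four tabs, and that paragraphs end in `\r` (InDesign's paragraph return) or `\n`. The sample text and the function name `drop_name_only_lines` are hypothetical.

```python
import re

# Hypothetical sample: a user-name line with four trailing tabs, followed by
# the data row that starts with the same user name. Names are invented.
pasted = (
    "jsmith\t\t\t\t\r"
    "jsmith\tjsmith@example.com\tEditor\r"
    "mdoe\t\t\t\t\r"
    "mdoe\tmdoe@example.com\tDesigner\r"
)

# \b(\w+)        capture the user name (group 1)
# \t{4}          exactly four tabs
# (?:\r\n|\r|\n) one paragraph or line break
# \1\b           the same user name again, as a whole word
DUPLICATE_NAME_LINE = re.compile(r"\b(\w+)\t{4}(?:\r\n|\r|\n)\1\b")

def drop_name_only_lines(text: str) -> str:
    """Replace 'name<four tabs><return>name' with a single 'name'."""
    return DUPLICATE_NAME_LINE.sub(r"\1", text)

cleaned = drop_name_only_lines(pasted)
print(cleaned.replace("\r", "\n"), end="")
```

The backreference `\1` is what makes this safe: a name-only line is removed only when the very next line starts with the same name.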

I click Change All, and end up with a nice clean list, with one email on each line:

In order to grab just the email addresses, I need to put them in a single column. So I select all the text and choose Table > Convert Text to Table:

The result is a nice, orderly table:

Now it’s easy to select all the columns I don’t need and delete them, leaving me with a single-column table. Then I choose Table > Convert Table to Text, and I end up with what I wanted all along:
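The table round trip (convert text to table, delete the other columns, convert back) amounts to splitting each paragraph into tab-separated cells and keeping one cell per row. Here is a rough Python sketch of that idea, assuming tab and paragraph as the separators (the defaults in the Convert Text to Table dialog). The sample text, the function name `extract_column`, and the assumption that the addresses sit in column 1 (zero-based) are all hypothetical.

```python
import re
from typing import List

# Hypothetical cleaned-up text after the grep step: one row per paragraph,
# cells separated by tabs. Names and addresses are invented.
cleaned = (
    "jsmith\tjsmith@example.com\tEditor\r"
    "mdoe\tmdoe@example.com\tDesigner\r"
)

def extract_column(text: str, index: int) -> List[str]:
    """Keep only one column, roughly like Convert Text to Table,
    deleting the other columns, then Convert Table to Text.

    Rows are split on paragraph/line breaks and cells on tabs.
    Empty rows and rows that lack the requested column are skipped.
    """
    rows = [row for row in re.split(r"\r\n|\r|\n", text) if row]
    return [cells[index]
            for cells in (row.split("\t") for row in rows)
            if len(cells) > index]

# In this made-up sample, column 1 holds the e-mail addresses.
emails = extract_column(cleaned, 1)
print("\n".join(emails))
```

Skipping short rows mirrors what you would do by eye in the table: a stray line with no second cell simply contributes nothing.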

It sounds like a lot of work, but the point of this post is that InDesign can do all kinds of text massaging and processing that many people leave to text editors or word processors. InDesign is where I’m fastest and most comfortable, so that’s where I do it!

8 Responses discussing this post.

  1. January 28th, 2010 • 3:39 am

    Why are the emails blurred in the web screenshot if they are visible in the text frame? ;-)

  2. January 28th, 2010 • 6:37 am

    Good catch, Branislav. It is because I could not edit the text on the web page, but I did change all the personal information in the other screen shots to protect people.

  3. January 28th, 2010 • 8:40 am

    Love it. I do this sort of thing all the time with HTML code to help build large tables of data, or even to build XML files for InDesign or online purposes. InDesign + GREP = <3.

  4. January 30th, 2010 • 3:36 am

    Hello,

    May I propose another method? With Peter Kahrel’s script chain_grep_queries, it is easy to collect the text you want. As you know, in Test mode “the script collects all instances matched by the Find What expressions in all selected queries and lists these matches in a new document.” In your example, you would only have had to write a regex that finds e-mail addresses.

  5. January 30th, 2010 • 5:55 am

    In my opinion, a very interesting plugin you’ve described here

  6. February 1st, 2010 • 1:07 pm

    I agree fully: InDesign is marvellous for many things. Please allow me a little joke: I like InDesign also because its frames have comfortable handles! ;)

    Widen the frame in the 2nd picture until every paragraph fits on one line.
    Double-click into the frame (Cursor becomes active).

    Search Text for ^t^t^t^t^p
    Change Text to ^t
    Edit > Select all.
    Table > Convert Text to Table
    Select the rows you don’t need and delete them.

    Because, after all, this text was not such a mess: it had a nice pattern from the very beginning!

  7. Anton
    February 5th, 2010 • 2:57 am

    Hi, guys :)

    A little bit off-topic, but since we have once again come to the mighty powers of The GREP :) , what GREP-related (ID-specific) learning resources would you recommend for a novice?

  8. February 5th, 2010 • 5:57 am

    @Anton: Check out the resources here on our grep page:
    http://www.indesignsecrets.com/grep
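Comment 4 suggests skipping the cleanup entirely and just collecting every match of an e-mail regex. That idea can be sketched outside InDesign with Python's `re.findall`. This is not Peter Kahrel's chain_grep_queries script; the pattern, the sample text, and the name `collect_emails` are assumptions for illustration, and the pattern intentionally covers ordinary addresses rather than everything RFC 5322 allows.

```python
import re
from typing import List

# Deliberately simple e-mail pattern (an assumption): local part of word
# characters, dots, plus signs, or hyphens; then @; then a dotted domain.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

def collect_emails(text: str) -> List[str]:
    """Return every e-mail-like match in order of appearance."""
    return EMAIL.findall(text)

# Invented sample resembling the messy pasted page.
messy = (
    "jsmith\t\t\t\t\r"
    "jsmith\tjsmith@example.com\tEditor\r"
    "mdoe\t\t\t\t\r"
    "mdoe\tmdoe@example.com\tDesigner\r"
)
print("\n".join(collect_emails(messy)))
```

Because the pattern only needs to recognize addresses, it does not care how the surrounding tabs and returns are arranged; the trade-off is that a too-simple pattern can miss unusual addresses.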


InDesignSecrets reserves the right to edit and/or remove posts and comments.