The Great Mail Merge of 2008

So, like a lot of computer people, I have the odd klepto-esque habit of saving all of my email.  Now, this wouldn’t be anything newsworthy if I had done a decent job of it, and just kept some nice little archive folder somewhere, or fed it all into GMail and been done with it.

Unfortunately, what I actually kept over the years is a mess of “I’m about to reformat this machine, copy all the mail off and I’ll deal with it later” backups.  In fact, I have no fewer than 123 mbox files from past Thunderbird installs, 4 more mboxes from an Evolution backup, 4 Outlook PSTs, and for good measure two Outlook Express profile folders and a maildir from… well, I actually have no idea where that’s from… maybe KMail once upon a time?

So, upwards of 132 independent message sources.  Nice work, Colin.

First off, some interesting stats about this pile of mail:

Earliest Date: March 15, 2002
Latest Date: June 21, 2007
Total Emails Archived: 15493
Number of Duplicate Copies: 12567
Percent of Messages With ≥1 Duplicate: 27.87%
Average Number of Duplicates (of those with ≥1): 2.910
Maximum Number of Duplicates: 14
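
For anyone who wants to produce numbers like these without clicking through a mail client, here is a minimal Python sketch using the standard library's mailbox module. It is not how the stats above were generated. It assumes two messages are duplicates when their Message-ID headers match, which may be looser or stricter than what the Thunderbird plugin compares, and the function name duplicate_stats and the result keys are made up for illustration.

```python
import mailbox
from collections import Counter


def duplicate_stats(mbox_paths):
    """Count copies of each message across several mbox files.

    Assumption: a message is identified by its Message-ID header alone.
    Messages without a Message-ID are skipped.
    """
    copies = Counter()
    for path in mbox_paths:
        box = mailbox.mbox(path, create=False)
        try:
            for msg in box:
                msg_id = msg.get("Message-ID")
                if msg_id:
                    copies[msg_id.strip()] += 1
        finally:
            box.close()

    unique = len(copies)
    # For each message seen more than once, the number of extra copies.
    extras = [n - 1 for n in copies.values() if n > 1]
    return {
        "unique_messages": unique,
        "duplicate_copies": sum(extras),
        "percent_with_duplicates": 100.0 * len(extras) / unique if unique else 0.0,
        "average_duplicates": sum(extras) / len(extras) if extras else 0.0,
        "max_duplicates": max(extras, default=0),
    }
```

Calling duplicate_stats(["a.mbox", "b.mbox"]) on your own list of files returns a dict with one entry per row of the table above (minus the dates).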

And for posterity’s sake (aka, the next time I have to do this…) here are some tips on how to clean up the mess:

  • Use Thunderbird + the Remove Duplicate Messages (Alternate) Plugin
    I really can’t say enough about the “Remove Duplicate Messages (Alternate)” plugin.  I highly recommend it over the non-Alternate version.  Here’s the basic idea.  Install the plugin.  Right-click a Thunderbird folder and select “Set Original message folder(s) for next duplicate search.”  Then, right-click some other folder and select “Remove Duplicates…”.  Up pops a window (after a few brief seconds of churn) with a list showing all duplicate (or triplicate or more) messages, side by side to make it abundantly clear that they are true duplicates.  Hit [OK] and they’re gone.  Perfect.  Clean, simple, and effective.
  • How to Import mail from Outlook PSTs
    The one key point to make here is that the only program I trust to read Outlook’s PST format is Outlook. I’ve seen a few open source / third party tools, such as LibPST, but mostly they’re shareware “recovery” apps, and they just scare me :).  Besides, if you have Outlook to make the PST, just use it to read it.  Or ask a friend.  Whatever.
    The magic to getting your messages out of Outlook is: Thunderbird! Just install it on the same machine as Outlook, have Outlook running with your PST opened (File->Open->Outlook data file…), and use Thunderbird’s Tools->Import… feature to suck in all the messages from Outlook.  Remove those you weren’t interested in and you’re done.  The rest are now present in Thunderbird.
  • How to Import mail from Maildirs
    The magic here is a neat little shell script by Joerg Reinhardt, which I found on linuxquestions.org.  The drill is, run it like:
        sh md2mb.sh <maildir>
    and you’ll get an mbox out named maildir.mbox
  • How to Import mail from Outlook Express
    Yeah, I know.  Outlook Express is old, not geeky, etc. but back in the day (these messages are dated from 2002) I was young and naive, so here we are.  How to deal?  Well, the simplest way I found is just to copy my dbx files back over top of a blank identity in Outlook Express on an XP box.  Use a VM or an old machine, either way.  Then install Thunderbird alongside, and import just as you would to extract messages from PSTs.  Notes: I was not able to get readdbx from libdbx working, nor was I able to open the dbx’s in Outlook 2003 by trying to import them using the Import/Export tools.  Sad face.
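
If the shell script for the maildir step is not at hand, the same conversion can be sketched in Python with the standard library's mailbox module. This is a swapped-in alternative, not Joerg Reinhardt's md2mb.sh, and the function name maildir_to_mbox is hypothetical. It assumes a plain maildir (cur/new/tmp) and only copies the top-level folder, not any subfolders.

```python
import mailbox


def maildir_to_mbox(maildir_path, mbox_path):
    """Append every message in a maildir to an mbox file.

    Returns the number of messages copied. Only the top-level maildir
    folder is read; subfolders are ignored in this sketch.
    """
    src = mailbox.Maildir(maildir_path, create=False)
    dst = mailbox.mbox(mbox_path)  # created if it does not exist yet
    count = 0
    try:
        for msg in src:
            # Converting to mboxMessage carries over maildir flags
            # (seen, replied, ...) as mbox Status headers.
            dst.add(mailbox.mboxMessage(msg))
            count += 1
        dst.flush()
    finally:
        dst.close()
        src.close()
    return count
```

Something like maildir_to_mbox("Maildir", "maildir.mbox") would then give an mbox that Thunderbird can pick up like any of the others.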

And there you have it: how to build your very own email archive Frankenstein, bootstrapped up from over a hundred pieces and jolted into life with a dash of Thunderbird.  (And yes, Jason, I know you could write me a VBA app in 5 minutes to do this whole mess in Outlook… but you’re not here :-P)

2 thoughts on “The Great Mail Merge of 2008”

  1. Jon E

    Ok first why would anyone ever use Outlook for e-mail? It’s so bad (the calendar is nice for work). Back in the day Outlook Express didn’t suck that much. At one point it was better than the Netscape alternative.

    Also what made you decide to merge e-mails from 2002? I just have them sensibly floating around my harddrive somewhere taking up space. I used to just move my preferences folder every time I updated my computer up until I started using the WI imap server for everything and my old pop ISP e-mail account died.

    Now I need to go sort through the billion “Application Data” and “Desktop” backup folders all over my various harddrives.

  2. Colin M Post author

    Well, Outlook is really not that bad for email… my client of choice at the moment is GMail/Google Apps for My Domain, but Outlook pretty much did all the same stuff. The only scary part is your PST, which one small corruption can destroy…

    The motivation for merging all this stuff back to 2002 is to not have all those emails floating around my various harddrives and backup locations. This way they’re all in one place, and (now) all uploaded to GMail. Next step is to find a nice incremental IMAP Archiver utility, and presto: all my email appropriately backed up in one spot, for good. Plus, I get to delete the 100+ mboxes littering my drive 😀

    Have fun with your AppData folders… I’m all done now! 😛
