I keep my email in Maildir folders. On the whole this works well for everyday email, but it doesn’t work so well for large email archives (mainly because Unix systems don’t tend to cope well with folders containing a very large number of files). My system of archiving had been to simply copy messages older than a given number of days to a different Maildir folder that I use for my archives.
The problem was mainly backups. The backup tool I use (Tarsnap – which is brilliant by the way!) was taking ages to crawl over the archive folders. In addition, the folders were taking up a lot of space on disk and compressing many small files isn’t easy without making a tar file, or similar.
So I decided the best plan was to archive the messages to Mbox files. They’d compress well (in the end I just used a compressed ZFS filesystem), be backup friendly (because they’d rarely, if ever, change), and be quick to read from disk (it’s easier to read a large file than many little ones).
It can’t be hard, right? Isn’t an Mbox file approximately this?
cat Maildir/cur/* > mboxfile
Well, it turned out to be more effort than that. First you need to create the "From " separator line, which requires the sender and delivery date. These can be found by parsing the headers, but it’s surprising how many broken emails there were in my archives.
Next you need to decide what Mbox format to use. I thought there was only one! You can either escape "From " lines in the body, or you can add a Content-Length header, or do both.
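To make those two steps concrete, here is a minimal sketch in Python (not the Perl that Maildirarc is written in, and not taken from its source). It assumes messages use LF line endings, uses the Date header as a stand-in for the delivery date, and tries the sender headers in the order Return-Path, From, Sender, Reply-To. The names from_line, escape_from_lines, mbox_entry and fallback_epoch are made up for illustration.

```python
import email
import re
import time
from email.utils import parseaddr, parsedate_to_datetime

# Assumed lookup order for the envelope sender (hypothetical choice).
SENDER_HEADERS = ("Return-Path", "From", "Sender", "Reply-To")

# Matches body lines that start with "From ", optionally already quoted with ">".
FROM_RE = re.compile(rb"^(>*From )", re.MULTILINE)


def from_line(raw, fallback_epoch=0):
    """Build the "From " separator line for one raw message (bytes)."""
    msg = email.message_from_bytes(raw)

    sender = ""
    for name in SENDER_HEADERS:
        _, addr = parseaddr(str(msg.get(name, "")))
        if "@" in addr:
            sender = addr
            break
    if not sender:
        # Broken messages end up here; the caller decides what to do with them.
        raise ValueError("can't find or parse a sender header")

    try:
        when = parsedate_to_datetime(str(msg.get("Date", "")))
        stamp = time.asctime(when.utctimetuple())
    except (TypeError, ValueError):
        # Unparseable Date header: fall back to a caller-supplied timestamp.
        stamp = time.asctime(time.gmtime(fallback_epoch))
    return "From %s %s" % (sender, stamp)


def escape_from_lines(body):
    """Quote "From " lines in the body by prefixing them with ">"."""
    return FROM_RE.sub(rb">\1", body)


def mbox_entry(raw, content_length=False):
    """Return one message as it would be appended to an mbox file."""
    head, _, body = raw.partition(b"\n\n")
    if not body.endswith(b"\n"):
        body += b"\n"
    if content_length:
        # Leave the body untouched and record its length instead.
        head += b"\nContent-Length: %d" % len(body)
    else:
        body = escape_from_lines(body)
    separator = from_line(raw).encode("ascii", "replace")
    # A blank line follows each message so the next "From " line stands alone.
    return separator + b"\n" + head + b"\n\n" + body + b"\n"
```

Calling mbox_entry(raw) gives the escaped variant and mbox_entry(raw, content_length=True) the Content-Length variant. Doing both would mean escaping the body first and then measuring it.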
After far more effort than I originally intended, I came up with Maildirarc. It’s an extended version of my original shell script that just copied messages from one Maildir folder to another. I wrote it in Perl and decided to have a play with Git and GitHub for version control. You can see the results here:
The end result turned out to be slightly more than just an archival tool. It can also be used to do Maildir to Mbox conversions, which might be useful to other people.
If you decide to give it a go, please feel free to let me know how you get on by posting a comment below. If you have any ideas or changes, you can fork it on GitHub and send me a pull request.
Yes, that’s the intended behaviour. To add a message to an mbox file it requires the sender’s email address. That error is saying it can’t find one because the message is oddly formatted so it can’t put it in the mbox file.
If it’s happening a lot there might be something odd going on with Maildirarc. But if it’s just a one-off then it could be a spam message that’s messed up – I had a few of them when I did my test runs.
I was trying to use your script to do a Maildir to mbox conversion, and I found that when you get the message
Can’t find or parse Return-path, From, Sender or Reply-To headers
the mail may not be copied to the mbox file properly. Have you noticed that?