This is the OLD page for this script. Please refer to: http://batleth.sapienti-sat.org/projects/mb2md/ for the latest operational version. Maintenance of mb2md and its descendants has very kindly been taken over by Juri Haberland. This is the sort of program you probably only want once in your life, unless you are a consultant, so I salute Juri for making his additions and offering to maintain it on an ongoing basis!
Be sure to read right down to the bottom. The most capable version of this script is not my latest version, but the one developed from this by Juri Haberland. Juri's version copes with a problem which my versions do not handle: timezone data in the "From " line in the mbox mailboxes - and he has added other valuable features.
Robin Whittle Last update below here 13 September 2002 rw@firstpr.com.au
Back to the Web-mail directory - where you will find more about Maildir, which is a faster and cleaner type of mailbox for an IMAP server.
There is another perl script to do Mbox to Maildir conversions, by Ragnar Kurm at: http://home.uninet.ee/~ragnar/2md/ .
For a discussion of other programs for converting between Mbox and Maildir formats, see this part of another page here: ../RH71-Postfix-Courier-Maildrop-IMAP/index.html#mb2md
and the User Contributed Maildir Support section of the qmail home page, which is on the mirror sites reachable at http://www.qmail.org .
(This is the basic version of the script. You may well find that
mb2md-2 is more suitable for your needs - so read this section and
then the mb2md-2 section below. mb2md-2 has better comments too.)
The script is fully documented by its own comments, so take a look:
mb2md.txt . It runs from command line arguments only, and is quite flexible. It is smart enough not to transfer a dummy message such as the one UW IMAPD puts at the start of Mbox mailboxes - and you could add your own search terms into the script to make it ignore other forms of dummy first message.
You can also grab it gzipped: mb2md.gz .
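The splitting step can be sketched in Python. mb2md itself is Perl, and this is not its code. The sketch assumes that a new message starts at every line beginning with "From ", and that the UW IMAP dummy message can be recognised by an "X-IMAP:" header or by its "DON'T DELETE THIS MESSAGE -- FOLDER INTERNAL DATA" subject. Both markers are assumptions to check against your own mailboxes, and the function names are hypothetical.

```python
# Subject UW IMAP is assumed to use for its folder-internal pseudo-message.
UW_DUMMY_SUBJECT = "DON'T DELETE THIS MESSAGE -- FOLDER INTERNAL DATA"


def split_mbox(text):
    """Split Mbox text into messages, each still starting with its "From " line."""
    messages = []
    current = []
    for line in text.splitlines(keepends=True):
        # A line beginning with "From " marks the start of the next message.
        if line.startswith("From ") and current:
            messages.append("".join(current))
            current = []
        current.append(line)
    if current:
        messages.append("".join(current))
    return messages


def is_uw_dummy(message):
    """Return True if the message headers look like the UW IMAP dummy message."""
    headers = message.split("\n\n", 1)[0]
    return "\nX-IMAP:" in headers or UW_DUMMY_SUBJECT in headers


def real_messages(text):
    """Split the mailbox and drop a dummy first message, if there is one."""
    messages = split_mbox(text)
    # Only the first message is checked, since that is where UW IMAP puts it.
    if messages and is_uw_dummy(messages[0]):
        messages = messages[1:]
    return messages
```

Other kinds of dummy first message could be caught by adding further tests to is_uw_dummy, which corresponds to adding your own search terms to the script.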
mb2md in its current form only works within the user's directory, though I think with appropriate arguments prefixed with ../../ it could read and write files anywhere. I understand that there are "virtual user" situations in which the user's mail directory is not at /home/{uid}/Maildir/ but somewhere like: /var/qmail/maildirs/{uid}/Maildir/ . A small change to the script should adapt to this situation.
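To show why a ../../ prefix lets a path escape the home directory, here is a small Python sketch. It assumes the script simply joins its arguments onto the home directory path, and resolve_from_home is a hypothetical helper. posixpath.normpath works purely on the text of the path and does not resolve symbolic links.

```python
import posixpath


def resolve_from_home(home, relative):
    """Join an argument onto the home directory and normalise the result.

    Enough leading "../" components climb out of the home directory, and an
    argument that starts with "/" replaces the home directory entirely.
    """
    return posixpath.normpath(posixpath.join(home, relative))
```

For example, resolve_from_home("/home/blah", "../../var/qmail/maildirs/blah/Maildir") gives "/var/qmail/maildirs/blah/Maildir".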
Please let me know if this script is useful and if you develop any modifications of it, or other scripts to drive it and so achieve things like:
File dates
The original mb2md script produces files for each message with the date of the files set to the time the script runs. In normal operation of a mail server, each message would have the date and time of when it was written to the maildir - such as when the message was received. Some clients, such as Microsoft's Outlook Express (which has long been so full of serious security holes that you would be mad to run it unless you like your computer being infected with viruses) apparently display the message date and/or sort on the date, using the file date of the message's file. It seems that the IMAP protocol can report this "physical" message date which for a Maildir system is the file date and for an Mbox system is the date in the "From " line which is added at the start of each message. These dates are typically the local date-time when the message was received. I guess that in Mbox and Maildir systems, moving the message to another Mbox or Maildir retains that date/time.
Netscape and Mozilla ignore this physical date. When sorting on date, they sort and display the time of each message according to the date-time in the "Date: " header which came with the message - when the message was sent. (This is not always what you want, when someone sends from a computer with its date set 17 years into the future, as mine once was!)
After running the original version of mb2md, all the file dates-times are the time of conversion. This screws up people using clients such as Outlook Express. So it would be nice to have a modification of mb2md which made the date of each message file follow that in the "From " line of the message in the mbox. Simon Hampton sent me a patched version of mb2md which I thought did this - but in fact it works from the "Date: " header, which is the time the message was sent. Also, his patch does not correctly set the date of the last message file created. That patched version (which includes his email address) is here: mb2md-date-sh.pl.txt . But see below for mb2md-2, which works properly and uses the "From " line.
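A minimal Python sketch of the idea, for illustration only. It reads the date from the "From " line, treats it as local time (assuming the line carries no timezone field), and sets the file's access and modification times with os.utime instead of calling GNU touch as mb2md-2 does. The regular expression and function names are hypothetical.

```python
import os
import re
import time

# "From " line: the sender, then a ctime()-style date, for example
# "From dduck@test.org Wed Nov 24 11:05:35 1999".
FROM_LINE = re.compile(r"^From \S+\s+(\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4})")


def from_line_timestamp(from_line):
    """Return the "From " line date as Unix seconds, treating it as local time."""
    match = FROM_LINE.match(from_line)
    if match is None:
        raise ValueError("unrecognised From line: %r" % from_line)
    # Collapse the double space ctime() puts before single-digit days.
    date_text = " ".join(match.group(1).split())
    parsed = time.strptime(date_text, "%a %b %d %H:%M:%S %Y")
    return time.mktime(parsed)


def set_file_date(path, from_line):
    """Give the message file the date-time from its "From " line."""
    stamp = from_line_timestamp(from_line)
    os.utime(path, (stamp, stamp))
```

Setting the date from each message's own "From " line, immediately after that message's file is written, also avoids the problem of the last file being missed.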
On 30 August 2001, Michael Bartlett sent me a short perl script which
he created to convert an Mbox mailbox at /var/mail/xxx to a Maildir
mailbox at /var/mail.old/xxxx . See here for that script:
box2dir.pl.txt .
Here is the usage documentation from mb2md:
# Run this as the user of the mailboxes, not as root.
#
# mb2md MBROOT MBDIR [DEST]
#
# MBROOT  Directory, relative to the user's home directory,
#         which is where the MBDIR directory is located.
#
# MBDIR   Directory, relative to MBROOT, where the Mbox files
#         are. There are two special cases:
#
#           1 - "None"
#           2 - "Inbox"
#
#         If it is set to "None" then mailboxes in the MBROOT
#         directory will be converted and placed in the
#         DEST directory. (Typically the Inbox directory,
#         which in this instance is also functioning as a
#         folder for other mailboxes.)
#
#         If this is set to "Inbox" then the source will
#         be the single mailbox at /var/spool/mail/blah for
#         user blah and the destination mailbox will be the
#         DEST mailbox itself.
#
#         Except in the "None" and "Inbox" cases, the MBDIR
#         directory name will be encoded into the new
#         mailboxes' names. See the examples below.
#
#         This script will not work with mailbox files which
#         contain spaces in their names.
#
#         Expect trouble if any files in the MBDIR directory
#         are not proper Mbox mailbox files.
#
#         This does not save a UW IMAP dummy message file
#         at the start of the Mbox file. Small changes
#         in the code could adapt it for looking for
#         other distinctive patterns of dummy messages too.
#
#         Don't let the source directory you give as MBDIR
#         contain any "."s in its name, unless you want to
#         create subfolders from the IMAP user's point of
#         view. See the example below.
#
# DEST    Directory relative to user's home directory where the
#         Maildir format directories will be created.
#         If not given, then the destination will be ~/Maildir .
#         Typically, this is what the IMAP server sees as the
#         Inbox and the folder for all user mailboxes.
#
# Example
# =======
#
# We have a bunch of directories of Mbox mailboxes located at
# /home/blah/oldmail/
#
#   /home/blah/oldmail/fffff
#   /home/blah/oldmail/ggggg
#   /home/blah/oldmail/xxx/aaaa
#   /home/blah/oldmail/xxx/bbbb
#   /home/blah/oldmail/xxx/cccc
#   /home/blah/oldmail/xxx/dddd
#   /home/blah/oldmail/yyyy/huey
#   /home/blah/oldmail/yyyy/duey
#   /home/blah/oldmail/yyyy/louie
#
# With the UW IMAP server, fffff and ggggg would have appeared in the root
# of this mail server, along with the Inbox. aaaa, bbbb etc. would have
# appeared in a folder called xxx from that root, and xxx was just a folder,
# not a mailbox for storing messages.
#
# We also have the mailspool Inbox at:
#
#   /var/spool/mail/blah
#
# To convert these, as user blah, we give the first command:
#
#   mb2md xyz Inbox
#
# In this case, the first argument is irrelevant - "xyz" is ignored.
#
# The main Maildir directory will be created if it does not exist.
# (This is true of any argument options, not just MBDIR = "Inbox".)
#
#   /home/blah/Maildir/
#
# It has the following subdirectories:
#
#   /home/blah/Maildir/tmp/
#   /home/blah/Maildir/new/
#   /home/blah/Maildir/cur/
#
# Then the /var/spool/mail/blah file is read, split into individual files
# and written into /home/blah/Maildir/new/ .
#
# Now we give the second command:
#
#   mb2md oldmail None
#
# This reads the fffff and ggggg Mbox mailboxes and creates:
#
#   /home/blah/Maildir/.fffff/
#   /home/blah/Maildir/.ggggg/
#
# Now we give the third command:
#
#   mb2md oldmail xxx
#
# Then all the mailboxes:
#
#   /home/blah/oldmail/xxx/aaaa
#   /home/blah/oldmail/xxx/bbbb
#   /home/blah/oldmail/xxx/cccc
#   /home/blah/oldmail/xxx/dddd
#
# are converted into Maildir format mailboxes in the following
# directories:
#
#   /home/blah/Maildir/.xxx.aaaa/
#   /home/blah/Maildir/.xxx.bbbb/
#   /home/blah/Maildir/.xxx.cccc/
#   /home/blah/Maildir/.xxx.dddd/
#
# This suits Courier IMAP fine, and these will appear to the IMAP
# client as four mailboxes in the folder "xxx" within the Inbox
# folder.
#
# The final command:
#
#   mb2md oldmail yyyy
#
# does the rest. The result, from the IMAP client's point of view, is:
#
#   Inbox -----------------
#   |
#   | fffff -----------
#   | ggggg -----------
#   |
#   - xxx
#   |   | aaaa --------
#   |   | bbbb --------
#   |   | cccc --------
#   |   | dddd --------
#   |
#   - yyyy
#       | huey -------
#       | duey -------
#       | louie ------
#
# Note that although ~/Maildir/.xxx/ and ~/Maildir/.yyyy/ may appear
# as folders to the IMAP client, the above commands do not generate
# any Maildir folders of these names. These are simply elements
# of the names of other Maildir directories.
#
# With a separate run of this script, using the MBDIR = "None"
# approach, it would be possible to create mailboxes which
# appear at the same location as far as the IMAP client is
# concerned. By having Mbox mailboxes in some directory
# ~/oldmail/nn/ of the form:
#
#   /home/blah/oldmail/nn/xxx
#   /home/blah/oldmail/nn/yyyy
#
# then the command:
#
#   mb2md oldmail/nn None
#
# will create two new Maildirs:
#
#   /home/blah/Maildir/.xxx/
#   /home/blah/Maildir/.yyyy/
#
# Then what used to be the xxx and yyyy folders now function as
# mailboxes too. Netscape 4.77 needed to be put to sleep and given ECT
# to recognise this - deleting the contents of (Win2k example):
#
#   C:\Program Files\Netscape\Users\uu\ImapMail\aaa.bbb.ccc\
#
# where "uu" is the user and "aaa.bbb.ccc" is the IMAP server.
#
# I often find that deleting all this directory's contents, except
# "rules.dat", forces Netscape back to reality after its IMAP innards
# have become twisted. Then maybe use File > Subscribe - but this
# seems incapable of subscribing to folders.
#
# For Outlook Express, select the mail server, then click the
# "IMAP Folders" button and use "Reset list". In the "All"
# window, select the mailboxes you want to see in normal
# usage.
#
# This script does not recurse subdirectories or delete old mailboxes.
#
# Be sure not to be accessing the Mbox mailboxes while running this
# script. It does not attempt to lock them. Likewise, don't run two
# copies of this script either.
#
# Trickier usage . . .
# ====================
#
# If you have a bunch of mailboxes in a directory ~/oldmail/doors/
# and you want them to appear in folders such as:
#
#   ~/Maildir/.music.bands.doors.Jim
#   ~/Maildir/.music.bands.doors.John
#
# etc. so they appear in an IMAP folder:
#
#   Inbox -----------------
#   | music
#       | bands
#           | doors
#               | Jim
#               | John
#               | Robbie
#               | Ray
#
# Then you should rename the source directory to:
#
#   ~/oldmail/music.bands.doors/
#
# then you can use:
#
#   mb2md oldmail music.bands.doors
#
#------------------------------------------------------------------------------
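The naming rules in the usage notes above can be summarised in a short Python sketch. It illustrates the documented behaviour and is not code taken from mb2md. maildir_name and make_maildir are hypothetical helpers, and the sketch assumes MBDIR contains no "/".

```python
import os


def maildir_name(mbdir, mailbox):
    """Return the Maildir directory name, relative to DEST, for one Mbox mailbox.

    "Inbox": the spool mailbox goes into DEST itself, so the name is "".
    "None":  mailbox "fffff" becomes ".fffff".
    other:   mailbox "aaaa" in MBDIR "xxx" becomes ".xxx.aaaa"; any "."s in
             MBDIR become further levels of IMAP folder.
    """
    if mbdir == "Inbox":
        return ""
    if mbdir == "None":
        return "." + mailbox
    return "." + mbdir + "." + mailbox


def make_maildir(path):
    """Create path/tmp, path/new and path/cur if they do not already exist."""
    for sub in ("tmp", "new", "cur"):
        os.makedirs(os.path.join(path, sub), exist_ok=True)
```

For instance, maildir_name("music.bands.doors", "Jim") gives ".music.bands.doors.Jim", matching the "Trickier usage" example.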
Note, it took me hours to sort through the various versions of mb2md to find out firstly exactly which lines were changed, and secondly exactly what the changes were intended to achieve - quite apart from figuring out whether the changes did what was intended. The reason my program works and is usable is that I document things at length, even if imperfectly. This means it is easy for some poor sod (maybe you, but probably me in the near future) to figure out what I was trying to do, and therefore where my mistakes are.
Those who do quick and dirty changes at the usual low documentation standard of too many computer programmers (as I am sometimes tempted to do as well) make it hard for anyone - themselves and me - to get reliable results from their work. Simon Hampton did, however, document his changes with interspersed comments and with extra comments which helped me understand some code I copied. Mark Lai sent me a highlighted Word file to show me which lines were changed.
As noted above, Simon Hampton's patched version of mb2md reads the date in the "Date: " line, which is the message header created by the sending client program, with the date-time and timezone for when the message was sent. It then uses this to set the date-time of the message's file in the Maildir, using GNU touch, which is very flexible about the date formats it will handle. A bug in his patch of mb2md meant that the last message processed did not have its date set correctly.
This "Date: " date-time is not what I think we want. We want to use the time which is at the start of each message in the Mbox mailbox.
Here is an example of the headers and first few lines of a message from a UW-IMAP Mbox mailbox (I changed addresses to stop them being found by spammers). Simon's code used the date-time in the "Date: " header, but I want to use the date-time in the first line, the "From " line. This "From " line is not part of the message itself; it is a line added by the IMAP server so it can find the start of each message in the Mbox mailbox, and so it can know the time it was received.
From dduck@test.org Wed Nov 24 11:05:35 1999
Return-Path: <dduck@test.org>
Received: from prussian-caravan.cloud9.net (prussian-caravan.cloud9.net [168.100.1.4])
by localhost.localdomain (8.9.3/8.9.3) with ESMTP id LAA03277
for <rw@firstpr.com.au>; Wed, 24 Nov 1999 11:05:33 +1100
Received: by prussian-caravan.cloud9.net (Postfix, from userid 54)
id CBEC0763A6; Tue, 23 Nov 1999 19:06:46 -0500 (EST)
To: rw@firstpr.com.au
From: dduck@test.org
Subject: Confirmation for subscribe postfix-announce
Reply-To: dduck@test.org
Message-Id: <19991124000646.CBEC0763A6@prussian-caravan.cloud9.net>
Date: Tue, 23 Nov 1999 19:06:46 -0500 (EST)
Sender: dduck@test.org
Status: RO
X-Status: A
X-Keywords:
--
Someone (possibly you) has requested that your email address be added
to or deleted from the mailing list . . . . .
Typically the date found in the "From " line is the date-time when the previous IMAP server (the one which generated the Mbox mailboxes we are converting) received the message. This may have nothing at all to do with the date the message was sent, according (as best the sending email client can know) to the "Date: " header which came as part of the message.
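To see the two dates side by side, Python's standard email package can pull both out of a raw Mbox message. Its parser keeps a leading "From " line as the "unixfrom" line rather than treating it as a header. This is a sketch: two_dates is a hypothetical name, and the test sample is trimmed from the example message shown earlier.

```python
import email


def two_dates(raw_message):
    """Return (date from the "From " line, value of the "Date: " header)."""
    msg = email.message_from_string(raw_message)
    unixfrom = msg.get_unixfrom() or ""
    # The "From " line is "From <sender> <date>"; keep everything after the sender.
    parts = unixfrom.split(None, 2)
    from_date = parts[2] if len(parts) == 3 else None
    return from_date, msg["Date"]
```

For the example message this gives "Wed Nov 24 11:05:35 1999" (local time at the receiving server, no timezone) and "Tue, 23 Nov 1999 19:06:46 -0500 (EST)" (the sender's clock and timezone).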
Based on Simon Hampton's patched version, they added a feature which they told me works with the clients "Microsoft outlook 98, express 5 & 2000", when migrating from UW IMAP to (I assume) Courier IMAP. The feature is to determine whether the message had been read by the user and to convey this to the message in the Maildir mailbox. The way it worked was to read the headers of the message looking for a line:
Status: RO
and if so, add the following at the end of the name of the message in the Maildir:
:2,S
The ":2," means the start of IMAP server flags (at least for Courier) and the flag "S" means that it has been read, or rather seen. But looking at the code, I am not convinced this patch works properly. A flag for whether a message has been seen is set to 0 before the processing loop, and may be set to 1 when a message has been "seen" (which should then cause the ":2,S" to be added to the filename), but there is no mechanism to set it back to 0. So I expect this patch would flag every message after the first "seen" one as being read.
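The fix is to decide the "seen" status afresh for each message instead of carrying a flag across the loop. Here is a Python sketch of that, assuming the headers of each message have already been separated out. seen_suffix and with_seen_flags are hypothetical names, and the sketch treats an "R" anywhere in the Status value as "seen", which is slightly broader than matching "Status: RO" exactly.

```python
def seen_suffix(headers):
    """Return ":2,S" if this message's own Status: line contains R, else ""."""
    for line in headers.splitlines():
        # "X-Status:" lines do not start with "Status:", so they are skipped here.
        if line.startswith("Status:") and "R" in line[len("Status:"):]:
            return ":2,S"
    return ""


def with_seen_flags(base_names, header_blocks):
    """Append the seen flag to each name, judged separately for each message."""
    return [name + seen_suffix(headers)
            for name, headers in zip(base_names, header_blocks)]
```

Because nothing is remembered between messages, an unread message after a read one keeps a plain filename.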
Since I am not running UW IMAP at present and am not actively using Outlook or Outlook Express I do not have an easy way of testing this patch. However I looked at my old Netscape-UW-IMAP Mbox mailboxes and found they used the "Status: RO" line - as shown in the message fragment above.
What is the "O" in the above Status line?
David's patched version of mb2md uses the HTTP::Date module by Gisle Aas: http://search.cpan.org/doc/GAAS/libwww-perl-5.64/lib/HTTP/Date.pm (Search http://search.cpan.org/ for the module "HTTP::Date") to read the "Date: " header in the message to set both the filename and date of the message in the Maildir. This, I think, reads the SMTP "Date: " header, taking into account its timezone offsets in the wide variety of formats the HTTP::Date module supports. This is the date the message was sent (as much as the sending client program knew - it could have all sorts of inaccuracies about the time and its timezone).
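For comparison, Python's email.utils.parsedate_to_datetime does a similar job to HTTP::Date for RFC 822 style dates, though it is a different library and the set of formats it accepts is not the same. A sketch, with date_header_utc as a hypothetical helper:

```python
from datetime import timezone
from email.utils import parsedate_to_datetime


def date_header_utc(value):
    """Parse a "Date: " header value and convert it to UTC using its offset."""
    return parsedate_to_datetime(value).astimezone(timezone.utc)
```

With the example message, date_header_utc("Tue, 23 Nov 1999 19:06:46 -0500 (EST)") gives 1999-11-24 00:06:46 UTC; the trailing "(EST)" comment is ignored and the -0500 offset is applied.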
It is my understanding that, to be compatible with the way Courier IMAP and presumably others work, the file date-time should be the date-time when the message was received. So this achieves the same functionality as Simon Hampton's patch, and in my view this is not the way to proceed.
Michael Best sent me a version of mb2md which integrated the guts of a separate script perfect_maildir which was posted to the Mutt mailing list (Mutt is a Unix mail client - http://www.mutt.org ) on 25 December 2001: http://www.mail-archive.com/mutt-users@mutt.org/msg21872.html . I haven't run it and I don't have a proper copy, due to his email client wrapping lines. If you want a copy, please contact him at <mbest (get rid of this at) emergence.com > . Note that Michael Best's patch of mb2md, like the Philip Mak original code, put the messages in the /cur/ subdirectory of the Maildir, rather than the /new/ directory, which is where I think new additions to a Maildir should go.
The first new functionality was to build the text of the Subject: line into the message file name - which is intriguing, but not something I would do for practical purposes, since Courier IMAP doesn't use this or create messages in this form and since I generally do not manually look in the Maildir directories.
The second part of the new functionality was extensive and cleanly done (though I haven't tested it). It looks for the following header lines and creates flags on the end of the resulting message file name to convey the same meaning. Presumably Mutt in combination with UW-IMAP would use some or all of these flags in the left column - and presumably Courier IMAP uses those in the second column in a way Mutt understands.
Flag found in header lines of the form "Status: N" or "X-Status: N" | Converted to flag at end of message filename, of the form ":2,N" | Meaning and notes
F | F | "Flagged".
A | R | "Replied". Netscape - UW-IMAP uses this too.
R | S | "Read" = "Seen".
D | T | "Deleted" = "Trashed"(?) = "Tagged for deletion at the next IMAP Expunge".
On 10 March 2002 I wanted to create a new version of mb2md which does the file date properly, based on the date-time in the "From " line, to indicate when the message was received. I also wanted to add functionality to retain other information during the translation - whether it has been:
- "Read" or "seen".
- Replied to.
- Flagged. (A user facility for any use they choose, I think.)
- Tagged for Deletion. Though I would imagine that people would generally physically delete these messages with IMAP Expunge before converting the Mbox mailboxes to Maildirs.
Note that there are evidently two ways in an Mbox mailbox in which a message could be indicated as having been "read", or "seen":
- There is a new line in the headers:
Status: RO
- In either of the two lines, with potentially other characters present as well:
Status: R
X-Status: R
Looking at backup files of my old Netscape - UW-IMAP Mbox mailboxes I find that that system used the Status: RO approach, not the Status: R approach. There were "X-Status: " lines, often empty. The only ones I found were for X-Status: A for those which I had replied to. I do not feel like firing up UW-IMAP to test how it generates the others.
While I understand the Michael Best / Philip Mak code was written for and tested with the Mutt client, I think that it should work fine with UW-IMAP Mbox mailboxes when Netscape and probably other programs have been used as the client.
Here is the new script, as of 12 March 2002, as text and as a gzip file. The usage is the same as the original version.
Please see below for patches to this version which may well be superior to my version. Although I have not tested Juri Haberland's version, I am greatly impressed by what he has done.
The new features are:
- File date-time is now based on the "From " line in the Mbox mailbox.
- Integrate the flag changes as written by Philip Mak and Michael Best.
- Tidy the file up in terms of comments.
- I also changed the filename of each message to be of a regular length:
7654321.000123.mbox:2,xxx
Where "7654321" is the Unix time in seconds when the script was run and "000123" is the message number, zero-padded to six digits, counting messages as they are converted from the Mbox file. "xxx" represents zero or more of the flags F, R, S or T.
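A Python sketch of that naming scheme, for illustration. The helper names are hypothetical, and numbering from 1 is an assumption rather than something stated here. All messages from one run share the same timestamp, so the six-digit counter is what keeps the names unique.

```python
import time


def message_filename(run_time, number, flags=""):
    """Return a name such as 7654321.000123.mbox:2,RS."""
    return "%d.%06d.mbox:2,%s" % (run_time, number, flags)


def names_for_run(flag_list, run_time=None):
    """Name each converted message, numbering from 1 (an assumption)."""
    if run_time is None:
        run_time = int(time.time())  # one timestamp for the whole run
    return [message_filename(run_time, n, flags)
            for n, flags in enumerate(flag_list, start=1)]
```

With run_time=7654321, names_for_run(["", "S"]) gives "7654321.000001.mbox:2," and "7654321.000002.mbox:2,S".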
- Introduce version numbers - 2.01 etc.
Text: mb2md-2-01.pl.txt
Gzipped: mb2md-2-01.perl.gz
Note that the new version requires a copy of GNU "touch" on the system - or some other version of touch which can handle the date-time formats as found in the "From " line, such as: "Wed Nov 24 11:05:35 1999". The script expects this to be at /bin/touch so you should alter one line in it if your version of touch is somewhere else.
Patched versions from other people
- http://people.spoiled.org/jha/mb2md.html Juri Haberland has extensively modified my version 2.01 to:
- Fix a bug with zero-sized mailboxes.
- Access source Mbox mailboxes in directories other than the home directory - for instance somewhere else on a Samba or NFS mount - by using a / at the start of the MBROOT parameter (now the "-s" parameter in Juri's version).
- Add an option to strip the filename extension of an existing Mbox. For instance, to strip the ".mbx" off "main.mbx" - and so avoid "mbx" becoming part of the resulting Maildir mailbox name.
- Make the parameters to the script follow "-m" and "-s" tags etc., and provide a usage response if run without parameters.
- Cope with timezones in the "From " line, which were causing trouble for other people. (In my test files, there were no such timezone fields.)
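The text does not say which timezone formats caused the trouble, so the following Python sketch simply guesses at two plausible shapes: a zone token such as "EST" just before the year, or a numeric offset such as "+1100" after it. It separates the zone from a plain ctime-style date so the date can be parsed as before. This is not Juri Haberland's code; split_from_date and the accepted formats are assumptions.

```python
import re

# Assumed shapes: "... 11:05:35 1999", "... 11:05:35 1999 +1100", "... 19:06:46 EST 1999".
FROM_DATE = re.compile(
    r"^From \S+\s+"
    r"(?P<date>\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2})"
    r"(?: (?P<zone1>[+-]\d{4}|[A-Z]{3,5}))?"
    r" (?P<year>\d{4})"
    r"(?: (?P<zone2>[+-]\d{4}|[A-Z]{3,5}))?\s*$"
)


def split_from_date(line):
    """Return (ctime-style date, zone or None) from a "From " line, or None."""
    match = FROM_DATE.match(line)
    if match is None:
        return None
    # Rebuild the date without the zone, collapsing any double spaces.
    date = " ".join(match.group("date").split()) + " " + match.group("year")
    return date, match.group("zone1") or match.group("zone2")
```

What to do with the zone once found (apply it, or ignore it and use local time) is a separate decision that this sketch leaves open.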
Juri also observes that my original script does handle spaces in mailbox file names, provided the name is given in quotes. I haven't tested any of this, but it looks like he knows what he is doing!
Also of potential interest is his FAQ on the ext3 journaled file system and how it closely relates to ext2. http://batleth.sapienti-sat.org/projects/FAQs/ext3-faq.html Salute!
I will continue to maintain this script, although I haven't got any practical use for it since I changed to Courier IMAP in July 2001.
Please let me know if this script works OK or otherwise, and send me any scripts you develop to drive this one to do mass migrations of many users and their multiple mailboxes.
I don't promise to add any new features - but will accept suggestions and patches from others if they look useful.
If you send me a new version of this script, I insist you:
- Send it as a .gzip file, or via reference to a web site, or as an attachment - since text can be wrapped in emails.
- Clearly state what new functionality you have added and what sort of situation this is intended to help with.
- Clearly show with easily searched for comments (such as your initials at the start and end of the modified sections) in your file which lines you have deleted, changed or added.
- Clearly and fully comment your code with English sentences, with proper capitalisation and punctuation. This is the only way I can write code which works - or at least it is vastly easier than trying to do it without proper comments - so be clear and make it easy on me and yourself.
Update history.