Maildir
Revision as of 14:30, 2 April 2009
Maildir is a popular alternative to the almost ubiquitous mbox format (formats - see below). However, I have some issues with it...
First though, let's look at mbox
mbox
Where each folder is a file, and each message is a segment within that file
Pro
- Easy to grep from the commandline (grep STRING file)
- Easy to search within folder using 'less' or similar
- Faster to open as only one fopen required
Con
- Requires escaping of "From " lines in message bodies. Multiple incompatible methods!!!
- Requires file locking for multiple access handling (esp nfs?)
- Slower to write changes as whole mbox needs re-writing
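To make the "multiple incompatible methods" point concrete, here is a minimal sketch of two commonly described quoting conventions, usually called mboxo and mboxrd. The function names and the sample body are illustrative only, and this is not a full mbox writer.

```python
import re

def quote_mboxo(body: str) -> str:
    """mboxo-style: prefix '>' to body lines that start with 'From '."""
    return re.sub(r"^From ", ">From ", body, flags=re.MULTILINE)

def quote_mboxrd(body: str) -> str:
    """mboxrd-style: prefix '>' to lines that are zero or more '>' then 'From '."""
    return re.sub(r"^(>*From )", r">\1", body, flags=re.MULTILINE)

# Illustrative body: one plain "From " line and one that already starts ">From "
body = "Hi\nFrom here on it gets odd\n>From was already quoted\n"

print(quote_mboxo(body))   # both lines end up starting with ">From "
print(quote_mboxrd(body))  # the second line gains an extra '>' and stays distinguishable
```

With the first convention a reader cannot tell which ">From " lines were quoted by the writer and which were in the original message, which is one reason the variants do not interoperate cleanly.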
Maildir
Where each folder is a directory, and each message is a file in that directory structure
Pro
- Multiple simultaneous access
- No need for escaping anything
- Changing status on an individual message is VERY fast (file rename only)
- Can also be done from the commandline easily
- Each file is a self-contained valid RFC822 message datastream
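As a rough illustration of the "file rename only" point, the sketch below adds a status flag to a message by renaming it. It assumes the usual ":2," info suffix with flags kept in ASCII order and a POSIX filesystem; the helper name add_flag and the sample filename are made up for the example.

```python
import os
import tempfile

def add_flag(path: str, flag: str) -> str:
    """Rename a Maildir message file so its ':2,' suffix includes `flag`."""
    directory, name = os.path.split(path)
    base, _sep, flags = name.partition(":2,")
    # Keep flags unique and in ASCII order
    new_flags = "".join(sorted(set(flags) | {flag}))
    new_path = os.path.join(directory, f"{base}:2,{new_flags}")
    os.rename(path, new_path)
    return new_path

# Demo in a throwaway directory standing in for a folder's cur/ directory
with tempfile.TemporaryDirectory() as cur:
    msg = os.path.join(cur, "1238682600.M1P2.host:2,")  # hypothetical message name
    open(msg, "w").close()
    msg = add_flag(msg, "S")  # mark as seen
    msg = add_flag(msg, "F")  # mark as flagged
    print(os.path.basename(msg))  # 1238682600.M1P2.host:2,FS
```

The message contents are never touched, which is why this is so much cheaper than rewriting an mbox file.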
Con
- Not as easy to search from the commandline when a match spans multiple messages ('rgrep STRING maildir/' may return MULTIPLE files, which are then harder to search through with 'less', etc)
- Filename uniqueness is arbitrarily unfriendly to human readability
Nemo sez...
Arbitrary unfriendliness is anathema to me. Especially, speaking here as a human, arbitrary unfriendliness to a human!
So...
How to fix Maildir's woes?
Information in Filename
The standard says that the name shouldn't be of care to the MUA, so let's see if we can make it useful to humans after all. How about:
- YYYYMMDDHHMMSS.SubjectLineUpTo40CharsLong.md5:2,
This provides a human-readable date, and messages are then easily sortable by date from the commandline. Date plus "." = 15 chars. Then let's have up to 40 characters of subject line - this should allow for reasonable hinting as to message contents from the filename. That's 55 characters used so far. The first 4 characters of an md5sum of the message itself should be a final uniqueness check (if it is identical to all the others then we have a problem!). That brings us to 60 characters (the separating period making up the shortfall, for those keeping count). And at the end let's say that status chars (including ":2,") could grow to 10 chars. That's 70 chars, which means a basic 'ls' stays comfortably inside a standard 80-column terminal.
Pro
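A minimal sketch of how such a name could be built, just to check the character budget. The function name proposed_name is hypothetical, and replacing every character outside A-Z, a-z, 0-9, "_" and "-" with "_" is only one possible answer to the subject-encoding question, not a settled choice.

```python
import hashlib
import re
from datetime import datetime

def proposed_name(when: datetime, subject: str, raw_message: bytes, flags: str = "") -> str:
    """Build YYYYMMDDHHMMSS.Subject(<=40).md5(4):2,FLAGS for a message."""
    stamp = when.strftime("%Y%m%d%H%M%S")  # 14 chars
    # Assumption: squash anything not filesystem-friendly into "_"
    safe = re.sub(r"[^A-Za-z0-9_-]+", "_", subject).strip("_")[:40]
    digest = hashlib.md5(raw_message).hexdigest()[:4]  # first 4 hex chars
    return f"{stamp}.{safe}.{digest}:2,{flags}"

name = proposed_name(
    datetime(2009, 4, 2, 14, 30, 0),
    "Re: Maildir naming?",
    b"Subject: Re: Maildir naming?\n\nbody\n",
    flags="S",
)
print(name, len(name))
```

With a full 40-character subject and no flags this comes to 14 + 1 + 40 + 1 + 4 + 3 = 63 characters, leaving 7 for status letters inside the 70-character budget.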
- Sorting/moving/copying of messages from the commandline is easily possible (for common date and subject line sorting)
Con
- No Maildir tools support this at this time, so whilst it _shouldn't_ break any MUAs, new files in a Maildir won't be named in accordance.
Issues and unknowns remaining
- What is the date based on? Envelope date or message date? In mbox, envelope date is encoded in the From_ line. In Maildir, is it the datestamp of the file? Or is it forgotten?
- We lose the hostname of the machine that delivered the message (Maildir used to put this in). Is this even relevant, since it can end up as almost anything? (eg: converting an mbox to Maildir with mutt will set the host to the name of whatever machine mutt is running on! BFW!)
- How do we encode the subject line (for non-fs compatible characters)?
- Could the sorting/moving/copying niceness be expanded by including the sender email address as well?
- YYYYMMDDHHMMSS.sender@email20chars.subjectline20chars.md58char:2,
- separators at char 15(.), 36(.), 57(.), 66(:) - this makes the filename 68 chars minimum, with room for status
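The separator positions in the sender variant can be checked with a small sketch. Padding short fields with "_" to a fixed 20 characters is an assumption made here so that the columns line up; the function name and sample values are illustrative.

```python
def fixed_width_name(stamp: str, sender: str, subject: str, digest: str) -> str:
    """Build stamp(14).sender(20).subject(20).md5(8):2, with fixed-width fields."""
    # Assumption: truncate long fields and pad short ones with "_"
    sender_f = sender[:20].ljust(20, "_")
    subject_f = subject[:20].ljust(20, "_")
    return f"{stamp}.{sender_f}.{subject_f}.{digest[:8]}:2,"

name = fixed_width_name("20090402143000", "nemo@example.org", "Maildir_naming", "0123abcd")

# 1-based columns 15, 36, 57 and 66 are indexes 14, 35, 56 and 65
print(name)
print(name[14], name[35], name[56], name[65], len(name))
```

Because a sender address normally contains "." itself, splitting such a name on "." would be ambiguous; fixed columns allow slicing by position instead.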
Links
- http://wiki.dovecot.org/MailboxFormat/Maildir - Dovecot issues with Maildir
- http://www.inter7.com/courierimap/README.maildirquota.html - Courier's Maildir++ implementation
- http://qmail.org/qmail-manual-html/man5/mbox.html - qmail's doc on the mbox format
- http://homepages.tesco.net./~J.deBoynePollard/FGA/mail-mbox-formats.html - "mbox" is a family of several mutually incompatible mailbox formats.
- http://www.ii.com/internet/robots/procmail/qs/#mailboxFormats - a good reference for mbox/Maildir as they apply to procmail