Maildir
Latest revision as of 09:28, 13 May 2021
Maildir is a popular alternative to the almost ubiquitous mbox format (in fact, formats plural - see below). However, I have some issues with it...
First, though, let's look at mbox.
mbox
Where each folder is a file, and each message is a segment within that file.
Pro
- Easy to grep from the commandline (grep STRING file)
- Easy to search within folder using 'less' or similar
- Faster to open, as only one fopen() is required
Con
- Requires From_ escaping (body lines beginning with "From " must be quoted somehow). Multiple incompatible methods!!!
- Requires file locking to handle multiple simultaneous access (especially over NFS?)
- Slower to write changes, as the whole mbox needs rewriting
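To make the escaping problem concrete, here is a minimal Python sketch of just one of those methods, the "mboxrd" rule: on write, any body line matching "^>*From " gets one extra '>', and on read exactly one '>' is stripped again, so the round trip is lossless. (The older "mboxo" rule quotes only "^From ", which cannot be undone unambiguously.) The function names are made up for illustration.

```python
import re

# mboxrd: a body line matching ^>*From  gets one more '>' on write...
_NEEDS_QUOTE = re.compile(r"^(>*From )", re.MULTILINE)
# ...and a line matching ^>+From  loses exactly one '>' on read.
_QUOTED = re.compile(r"^>(>*From )", re.MULTILINE)


def mboxrd_quote(body: str) -> str:
    """Escape a message body before appending it to an mboxrd file."""
    return _NEEDS_QUOTE.sub(r">\1", body)


def mboxrd_unquote(body: str) -> str:
    """Undo mboxrd_quote when reading a message back out."""
    return _QUOTED.sub(r"\1", body)


if __name__ == "__main__":
    original = "Hi,\nFrom here on...\n>From a quoted reply\n"
    stored = mboxrd_quote(original)
    print(repr(stored))  # 'Hi,\n>From here on...\n>>From a quoted reply\n'
    assert mboxrd_unquote(stored) == original
```

A reader still has to know which variant wrote the file, since nothing in the mbox itself says so.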
Maildir
Where each folder is a directory, and each message is a file in that directory structure
Pro
- Multiple simultaneous access
- No need for escaping anything
- Changing status on an individual message is VERY fast (a file rename only), as is moving messages between folders with 'mv <files> <newfolder>' style tricks.
  - Both can also be done easily from the commandline
- Each file is a self-contained valid RFC822 message datastream
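A minimal Python sketch of why status changes are this cheap: the flags live in the ":2,FLAGS" info suffix of the filename (kept in ASCII order, as the Maildir spec asks), so marking a message read or replied is a single rename and the message body is never touched. set_flags and move_to are hypothetical helper names, and the sketch assumes the message already sits in a folder's cur/.

```python
import os
import tempfile


def set_flags(path: str, add: str = "", remove: str = "") -> str:
    """Add/remove Maildir flags on one message by renaming its file.

    Assumes a 'uniquename:2,FLAGS' name in cur/; flags are kept sorted.
    """
    directory, name = os.path.split(path)
    base, _, flags = name.partition(":2,")  # no suffix yet -> flags == ""
    new_flags = "".join(sorted((set(flags) | set(add)) - set(remove)))
    new_path = os.path.join(directory, f"{base}:2,{new_flags}")
    os.rename(path, new_path)  # a single rename; the body is untouched
    return new_path


def move_to(path: str, folder_cur: str) -> str:
    """The 'mv <file> <newfolder>' trick: same name, different directory."""
    new_path = os.path.join(folder_cur, os.path.basename(path))
    os.rename(path, new_path)  # only works (atomically) within one filesystem
    return new_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        # stand-ins for <folder>/cur directories
        inbox, archive = os.path.join(root, "inbox"), os.path.join(root, "archive")
        os.makedirs(inbox)
        os.makedirs(archive)
        msg = os.path.join(inbox, "1620898080.M1P2.host:2,S")
        open(msg, "w").close()
        msg = set_flags(msg, add="R")  # now ...:2,RS
        msg = move_to(msg, archive)
        print(os.listdir(archive))  # ['1620898080.M1P2.host:2,RS']
```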
Con
- Not as easy to search from the commandline ('rgrep STRING maildir/' may return MULTIPLE files, which are then harder to search through with 'less', etc.)
- Filename uniqueness is arbitrarily unfriendly to human readability.
  - Thus, while moving a message between folders is trivial, discovering anything ABOUT a message from its filename alone is largely impossible, which makes 'mv <filenameregex> <newfolder>' a crippled trick.
- Mail folders cannot be organised into subfolders.
Nemo sez...
Arbitrary unfriendliness is anathema to me. Especially, speaking here as a human, arbitrary unfriendliness to a human!
So...
How to fix Maildir's woes?
Information in Filename
The standard says the MUA shouldn't care about the filename, so let's see if we can make it useful to humans after all. How about:
- YYYYMMDDHHMMSS.SubjectLineUpTo40CharsLong.md5:2,
This provides a human-readable date, so messages should then be easily sortable by date from the commandline. The date plus "." is 15 chars. Then let's have up to 40 characters of subject line, which should give a reasonable hint of the message's contents from the filename. That's 55 characters so far. A separating period plus the first 4 characters of an md5sum of the message itself act as a final uniqueness check (if two messages match on date, subject and hash, then we have a problem!). That brings us to 60 characters. Finally, let's say the status chars (including ":2,") could grow to 10 chars. That's 70 chars, which means a basic 'ls' stays comfortably inside a standard 80-column terminal.
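A minimal Python sketch of that 70-character scheme. The subject encoding here is only a placeholder (spaces become '_', and '/' and NUL, the two bytes POSIX forbids in a filename, are dropped), since how to encode the subject is still an open question on this page; formatting the timestamp in UTC is also an assumption. human_name is a hypothetical name.

```python
import hashlib
import time


def human_name(raw: bytes, subject: str, received: float, flags: str = "") -> str:
    """Build 'YYYYMMDDHHMMSS.SubjectLineUpTo40CharsLong.md5:2,FLAGS'."""
    # 14-char timestamp (+ '.' = 15); UTC is an assumption, not part of the proposal
    date = time.strftime("%Y%m%d%H%M%S", time.gmtime(received))
    # Placeholder encoding: spaces -> '_', drop '/' and NUL, cap at 40 chars
    subj = subject.replace(" ", "_").replace("/", "").replace("\0", "")[:40]
    # First 4 hex digits of the md5 of the whole message: a last uniqueness check
    digest = hashlib.md5(raw).hexdigest()[:4]
    return f"{date}.{subj}.{digest}:2,{flags}"


if __name__ == "__main__":
    name = human_name(b"Subject: Lunch on Friday?\n\nNoon?\n", "Lunch on Friday?", 0.0, "S")
    print(name)
    # 15 (date + '.') + 40 (subject) + 5 ('.' + md5) + up to 10 (':2,' + flags) = 70 max
    worst = human_name(b"x", "x" * 80, 0.0, "DFPRST")
    assert len(worst) <= 70
```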
Pro
- Sorting/moving/copying of messages from the commandline becomes easy (for the common cases of sorting by date and subject line)
- Should be partially or wholly compatible with other Maildir extensions (e.g. Maildir++) that provide additional capabilities such as nested folders and quotas
Con
- No Maildir tools support this at this time, so whilst it _shouldn't_ break any MUAs, new files delivered into a Maildir won't be named this way.
- To write this filename, the MDA needs to parse the message, something it otherwise need not do.
Issues, thoughts and unknowns remaining
- The envelope date in mbox is encoded in the From_ line. In Maildir it is the datestamp of the file. The envelope sender is not saved in Maildir.
- We lose the hostname of the machine that delivered the message (Maildir used to put this in). Is this even relevant, since it can end up being almost anything? (e.g. converting an mbox to Maildir with mutt sets the host to the name of whatever machine mutt is running on! BFW!)
- How do we encode the subject line, handling both spaces and a maximally filesystem-compatible character set?
  - Restrict it to a-zA-Z0-9-_@, and remove all other characters. This compactness packs maximum information into the sender and subject line fragments.
- Could the sorting/moving/copying niceness be expanded by including the sender email address as well?
  - YYYYMMDDHHMMSS.sender@email25chars______.subjectline25chars_______.md5_:2,
    - Separators at char 15 (.), 41 (.), 67 (.) and 72 (:); this makes the filename 74 chars, with status flags following.
    - Truncate or pad email and subject as required. This keeps columns aligned for maximally nice (monospaced) file listings.
    - I ran some brief stats over some mail to find common From and Subject line lengths, and 25 is a good size for both.
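A minimal Python sketch of the aligned 74-character variant, assuming the a-zA-Z0-9-_@ rule above, '_' as the padding character (as in the template), and that the bare sender address has already been extracted from the From header. Stripping '.' turns fred@example.org into fred@exampleorg, and '_' padding is indistinguishable from a real trailing underscore; both are lossy, which only matters if anything ever tries to parse the fields back out. field and aligned_name are hypothetical names, and the UTC timestamp is again an assumption.

```python
import hashlib
import re
import time

_DISALLOWED = re.compile(r"[^A-Za-z0-9_@-]")  # keep only a-zA-Z0-9-_@


def field(text: str, width: int = 25) -> str:
    """Drop disallowed characters, then truncate or '_'-pad to a fixed width."""
    return _DISALLOWED.sub("", text)[:width].ljust(width, "_")


def aligned_name(raw: bytes, sender: str, subject: str, received: float, flags: str = "") -> str:
    """YYYYMMDDHHMMSS.<sender 25>.<subject 25>.<md5 4>:2,FLAGS"""
    date = time.strftime("%Y%m%d%H%M%S", time.gmtime(received))  # UTC: an assumption
    digest = hashlib.md5(raw).hexdigest()[:4]
    return f"{date}.{field(sender)}.{field(subject)}.{digest}:2,{flags}"


if __name__ == "__main__":
    name = aligned_name(b"...", "fred@example.org", "Re: lunch?", 0.0)
    print(name)
    assert len(name) == 74
    # 1-based separator positions 15, 41, 67, 72 are 0-based 14, 40, 66, 71
    assert [name[i] for i in (14, 40, 66, 71)] == [".", ".", ".", ":"]
```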
The other big picture
Is there really any reason why Maildir needs three subdirectories?
- tmp isn't needed, as Dovecot demonstrates: http://wiki.dovecot.org/MailboxFormat/Maildir#Mail_delivery (byebye 'tmp')
  - However, my team leader suggests that it IS needed over NFS mounts...
- new and cur simply distinguish the 'new' vs 'old' flag on a message.
  - Since all other flags are part of the filename, why not this one too? This removes another directory. (byebye 'new')
  - Note that this loses significant efficiency when checking for new messages compared to Maildir/new ... unless clever atime tricks (like the ones mutt uses with mbox files) work.
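A minimal Python sketch of that cost, assuming a flat folder whose filenames carry a ';2,' info suffix (the ';' separator proposed below) and a hypothetical 'N' flag for 'new', since the page doesn't pick a letter. Finding new mail now means reading every entry in the folder instead of just listing Maildir/new. new_messages is a hypothetical name.

```python
import os
import tempfile

NEW_FLAG = "N"  # hypothetical: the proposal doesn't say which letter means 'new'
INFO_SEP = ";2,"  # ';' instead of ':' for Windows/NTFS compatibility


def new_messages(folder: str) -> list:
    """Names of messages flagged new in a flat Maildir2-style folder.

    Unlike listing Maildir/new, this must look at every entry in the folder.
    """
    found = []
    with os.scandir(folder) as entries:
        for entry in entries:
            # skip dotfile metadata and subfolders; only plain files are messages
            if entry.name.startswith(".") or not entry.is_file(follow_symlinks=False):
                continue
            _, _, flags = entry.name.partition(INFO_SEP)
            if NEW_FLAG in flags:
                found.append(entry.name)
    return sorted(found)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for name in ("20210513092800.a.1f2e;2,N", "20210513092900.b.3c4d;2,S", ".meta"):
            open(os.path.join(d, name), "w").close()
        os.mkdir(os.path.join(d, "sub-folder"))
        print(new_messages(d))  # ['20210513092800.a.1f2e;2,N']
```

One cheap guard on most POSIX filesystems: a directory's mtime changes whenever an entry is added, removed or renamed, so a client could skip the scan entirely if that mtime hasn't moved since its last check.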
Now that we've shown that only one subfolder (cur) is needed, it turns out that really none are needed. (Why have Maildir/cur/* when you can have Maildir/*?)
This also simplifies subfolders immensely. An on-disk subdirectory == a mail subfolder. No need for tricks like naming mail folders with a delimiting "." in the directory name.
Finally, I'd use ";" instead of ":" in the filename, so that filenames are Windows/NTFS compatible.
Thus, by example:
- ~/Maildir2/
  - a mail folder
- ~/Maildir2/<all normal files>
  - one message per file within Maildir2
- ~/Maildir2/sub-folder/
  - a mail subfolder, which is also an on-disk subfolder. It would be populated with mail files too.
- ~/Maildir2/.<dotfiles>
  - any metadata about the Maildir2 that clients may want to write can happily go here, yeah?
...I haven't defined what a .dotsubdirectory should mean. Is it a mail subfolder? Or is it metadata? I don't think it's important to define offhand...
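A minimal Python sketch of reading that layout: plain files are messages, plain subdirectories are mail subfolders (read recursively), and anything starting with '.' is set aside as metadata, since its meaning is deliberately left open. Folder and load are hypothetical names, and ignoring symlinks is an assumption of this sketch.

```python
import os
import tempfile
from dataclasses import dataclass, field


@dataclass
class Folder:
    path: str
    messages: list = field(default_factory=list)    # one file per message
    subfolders: list = field(default_factory=list)  # on-disk subdir == mail subfolder
    metadata: list = field(default_factory=list)    # dotfiles/dotdirs, meaning undefined


def load(path: str) -> Folder:
    """Recursively read a Maildir2-style tree (no tmp/new/cur)."""
    folder = Folder(path)
    with os.scandir(path) as entries:
        for entry in sorted(entries, key=lambda e: e.name):
            if entry.name.startswith("."):
                folder.metadata.append(entry.name)
            elif entry.is_dir(follow_symlinks=False):
                folder.subfolders.append(load(entry.path))
            elif entry.is_file(follow_symlinks=False):
                folder.messages.append(entry.name)
            # anything else (symlinks, sockets, ...) is ignored in this sketch
    return folder


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        open(os.path.join(root, "20210513092800.hello.1f2e;2,S"), "w").close()
        open(os.path.join(root, ".client-state"), "w").close()
        os.mkdir(os.path.join(root, "sub-folder"))
        open(os.path.join(root, "sub-folder", "20210513093000.re.9a8b;2,"), "w").close()
        tree = load(root)
        print(tree.messages, tree.metadata, [os.path.basename(s.path) for s in tree.subfolders])
```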
Implementations
I have had an internal implementation as a simple shell script since 2013, and use it for my "nntp newsspool to Maildir" setup. It might be on GitHub one day.
Links
- http://wiki.dovecot.org/MailboxFormat/Maildir - Dovecot's notes on issues with Maildir
- http://www.inter7.com/courierimap/README.maildirquota.html - Courier's Maildir++ implementation
- http://qmail.org/qmail-manual-html/man5/mbox.html - qmail's doc on the mbox format
- http://homepage.ntlworld.com./jonathan.deboynepollard/FGA/mail-mbox-formats.html - "mbox" is a family of several mutually incompatible mailbox formats.
- http://www.ii.com/internet/robots/procmail/qs/#mailboxFormats - a good reference for mbox/Maildir as they apply to procmail