A Quick Introduction to Email Formats
I originally wrote the following for my employer's wiki, but I stole it and translated it for nemwiki, too -- st
An email message, in raw form, has two parts: the headers and the body.
The headers are separated from the body by exactly one blank line.
Headers look like this:
Header-Name: header value
Header names are case-insensitive.
Headers can be wrapped onto more than one line by prefixing the second and following lines with whitespace:
Header-Name: a very long value which
    must be wrapped over
    several lines
Headers whose names begin with X- or x- are non-standard; they are known simply as X-Headers.
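As a rough illustration of the header rules above, here is a minimal sketch using Python's standard email package. The message text and the X-Example header are made up for the example; only the parsing calls come from the standard library.

```python
from email import message_from_string

# A hand-written message: one wrapped header, one X-header,
# a single blank line, then the body.
raw = (
    "Header-Name: a very long value which\n"
    "    must be wrapped over\n"
    "    several lines\n"
    "X-Example: a made-up non-standard header\n"
    "\n"
    "This is the body.\n"
)

msg = message_from_string(raw)

# Header names are case-insensitive, so both lookups find the same header.
same_header = msg["header-name"] == msg["HEADER-NAME"]

# Collapse the wrapped lines back into one logical value.
unfolded = " ".join(msg["Header-Name"].split())

# Pick out the non-standard X-headers by name.
x_headers = [name for name in msg.keys() if name.lower().startswith("x-")]

# Everything after the first blank line is the body.
body = msg.get_payload()
```

Collapsing the whitespace by hand keeps the sketch independent of how a particular parser chooses to store wrapped values.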
The interpretation of the message body depends on certain headers being present. By default, the body is assumed to be 7-bit ASCII text. If the message has a MIME-Version: 1.0 header, then the body is treated as a MIME-encoded message. MIME stands for "Multipurpose Internet Mail Extensions", and it is described in RFC 2045 (http://www.faqs.org/rfcs/rfc2045.html).
A MIME-encoded message should have the following headers:
- MIME-Version - The current version of the MIME specification.
- Content-Type - Describes the format of the data in the message body. The raw content-type (text/plain, for example) can have parameters separated by semicolons:
Content-Type: text/plain; charset="ISO-8859-1"
- Content-Transfer-Encoding - Some data formats won't fit into 7-bit ASCII: for example, any text that has accented characters, anything written in an East Asian script, or binary files. MIME-based mail composers can encode the data into a sequence of 7-bit printable ASCII characters, and MIME-based mail readers will automatically decode the sequence before trying to do anything with it (such as saving it to disk or displaying it).
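The three headers above can be seen working together in a small sketch using Python's standard email, quopri and base64 modules. The sample message is invented for illustration, and quoted-printable and base64 are assumed here as the encodings, being the two that RFC 2045 defines for this purpose.

```python
import base64
import quopri
from email import message_from_string

# A made-up message whose body is "café" in ISO-8859-1,
# encoded as quoted-printable so it fits into 7-bit ASCII.
raw = (
    "MIME-Version: 1.0\n"
    'Content-Type: text/plain; charset="ISO-8859-1"\n'
    "Content-Transfer-Encoding: quoted-printable\n"
    "\n"
    "caf=E9\n"
)

msg = message_from_string(raw)

content_type = msg.get_content_type()   # the raw content-type, without parameters
charset = msg.get_content_charset()     # the charset parameter, lower-cased

# decode=True undoes the Content-Transfer-Encoding and returns bytes.
decoded_text = msg.get_payload(decode=True).decode(charset)

# Going the other way: what a mail composer does before sending.
eight_bit = "caf\u00e9".encode("iso-8859-1")
as_quoted_printable = quopri.encodestring(eight_bit)
as_base64 = base64.b64encode(eight_bit)
```

Quoted-printable leaves mostly-ASCII text readable, while base64 is the more compact choice for binary data.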
Along with all the usual values for Content-Type (text/html, text/plain, image/png, audio/mp3, application/octet-stream and so forth) there are two special values that apply to email messages: multipart/alternative and multipart/related. Messages with these content types contain a number of other messages, each with its own headers and body. These sub-messages are assumed to be MIME-format (so they don't need a MIME-Version header).
For a multipart/alternative message, the mail reader will display the 'best' of the alternatives available, and ignore the others. For example, you might have a text/html message and a text/plain message. If your mail reader can render HTML it should display the text/html message; otherwise it should display the text/plain message.
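The 'best alternative' choice described above might be sketched like this with Python's standard email package. The best_part helper and its can_render_html flag are hypothetical names introduced for the example; set_content, add_alternative and get_body are the library's own methods.

```python
from email.message import EmailMessage

# Build a message with a plain-text body and an HTML alternative.
msg = EmailMessage()
msg.set_content("Hello, world!")
msg.add_alternative(
    "<html><body><h1>Hello, world!</h1></body></html>", subtype="html"
)


def best_part(message, can_render_html):
    """Return the alternative a reader with the given ability would show.

    Hypothetical helper: a real mail reader has its own preference rules.
    """
    preferences = ("html", "plain") if can_render_html else ("plain",)
    return message.get_body(preferencelist=preferences)


html_choice = best_part(msg, can_render_html=True).get_content_type()
text_choice = best_part(msg, can_render_html=False).get_content_type()
```

Adding the alternative converts the message to multipart/alternative automatically, so the boundary never has to be written by hand.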
For a multipart/related message, all the messages inside should be, well, related. For example, an HTML page that includes images should be grouped with the images in a multipart/related section.
Both multipart/alternative and multipart/related content-types should have a boundary parameter. The boundary, prefixed by --, appears on a line of its own to mark the beginning of each message part, and once more with -- on both ends to mark the end of the last part. Obviously, it should be a line that does not occur anywhere else in the message.
Here's a simple MIME message:
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="flubblegidget"

Text before the first boundary marker is hidden by MIME mail readers.
--flubblegidget
Content-Type: text/plain

Hello, world!
--flubblegidget
Content-Type: text/html

<html><body><h1>Hello, world!</h1></body></html>
--flubblegidget--
Note: If you copy and paste the above into a file to try it out with, say, Outlook Express, be aware that the wiki puts spaces on all the blank lines. You'll have to make sure that all the blank lines really are empty.
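Another way to try the example message out is to feed it to a parser. The following is a minimal sketch using Python's standard email package, with the message typed in as a list of lines so that the blank lines are guaranteed to be empty.

```python
from email import message_from_string

# The simple MIME message, one list entry per line.
raw = "\n".join([
    "MIME-Version: 1.0",
    'Content-Type: multipart/alternative; boundary="flubblegidget"',
    "",
    "Text before the first boundary marker is hidden by MIME mail readers.",
    "--flubblegidget",
    "Content-Type: text/plain",
    "",
    "Hello, world!",
    "--flubblegidget",
    "Content-Type: text/html",
    "",
    "<html><body><h1>Hello, world!</h1></body></html>",
    "--flubblegidget--",
    "",
])

msg = message_from_string(raw)

boundary = msg.get_boundary()   # the boundary parameter of Content-Type
hidden_text = msg.preamble      # text before the first boundary marker

# For a multipart message, get_payload() returns the list of sub-messages.
parts = [
    (part.get_content_type(), part.get_payload().strip())
    for part in msg.get_payload()
]

for content_type, text in parts:
    print(content_type, "->", text)
```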