Thinking out loud about 2nd-gen Email

Note: This is just me, thinking out loud; you absolutely do not need to think that I have carefully thought this through, or that this is a good idea. With expectations set as low as possible, let’s continue.

There are many old pieces of tech still in use, but there’s one that grinds my gears every time I try to use it: Email.

For users, email works pretty well. Sometimes it sends too many emails to Junk, but Email is old, reliable, easy to understand, and relatively easy to search. It’s a good system, and I’m not eager to replace it with Slack anytime soon.

However… the backend for email is a mess. In escalating order (and “we” is used in a very imprecise, broad hand-waving sense for technologists):

My gut reaction to the above is that we’ve got a lousy spec, with decades of cruft and unofficial spec, and we aren’t that great at securing it, or making sure messages are authentic. So… could we do better?

Thus the hypothetical: 2nd-gen email.

Your initial reaction might be: That would be pointless, because not everyone would opt into it, and it would break compatibility all over the place. My thought is… that’s not necessarily a given. Imagine this:

  • We create a new DNS record, called MX2. Most email services, then, would have an MX2 and MX record. Older services only have MX.
  • If an ancient, 20-year-old email client tries to send a message, it finds the MX record and sends the message just like normal. A modern client sees the MX2 and sends the message there if it exists; otherwise, it falls back to MX.
  • From there, the email services which implement MX2 would publish a public date on which all messages sent to them via the old MX record will be automatically sent to Junk. If Microsoft and Google alone agreed on such a date, that would be 40% of global email traffic.
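To make the fallback concrete, here is a minimal sketch of the routing and cutoff logic described in the bullets above. Everything in it is an assumption for illustration: `MailRoute`, `choose_route`, and `classify_incoming` are made-up names, and the DNS answers are passed in as a plain dictionary rather than looked up for real.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass(frozen=True)
class MailRoute:
    record_type: str  # "MX2" or "MX"
    host: str


def choose_route(dns_records: Dict[str, List[str]], supports_mx2: bool) -> Optional[MailRoute]:
    """Pick a delivery target: MX2 if the sender understands it and it exists, else MX."""
    if supports_mx2 and dns_records.get("MX2"):
        return MailRoute("MX2", dns_records["MX2"][0])
    if dns_records.get("MX"):
        # Old clients always land here; modern clients fall back to it.
        return MailRoute("MX", dns_records["MX"][0])
    return None  # the domain does not accept mail at all


def classify_incoming(record_type: str, today: date, cutoff: date) -> str:
    """Receiver-side rule: on or after the published date, legacy MX mail goes to Junk."""
    if record_type == "MX" and today >= cutoff:
        return "junk"
    return "inbox"
```

Note that nothing is ever refused here: an old client still delivers, it just lands in Junk once the date has passed.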

If the above looks slightly familiar, it’s because this strategy already worked, in a sense, with the transition from HTTP to HTTPS. We threw away a multi-decade-old protocol, for a new and more secure one. We set browsers to automatically upgrade the connection wherever possible, and now warn users about insecure connections when accessing HTTP (especially on login pages). Nevertheless – users can still visit HTTP pages, ancient browsers still work on HTTP, but most websites have gotten the memo and upgraded to HTTPS anyway.

The incentive to upgrade to MX2 would be simple: if you don’t, your messages would still arrive, but they would go to Junk automatically past the publicly posted date. No business wants that, even if users are already trained to expect, and act, like that can happen. Thus, there is an incentive to upgrade without truly breaking any day-to-day compatibility.

Personally, I think that such a transition could go even faster than the HTTP to HTTPS transition. Self-hosted email is not very popular in part because of the complexity of the current email system, so between Microsoft, Google, Amazon, Zoho, GoDaddy, Gandi, Wix, Squarespace, MailChimp, SparkPost, and SendGrid – you have most of the email market covered for the US; anyone not in the above list would quickly fold. The relative centralization of email, ironically, makes a mass upgrade to email much more achievable.

What would a 2nd-gen email prioritize then? Everyone has different priorities, but I’d personally suggest the following which would hopefully win a broad enough consensus if this idea goes anywhere (though experts, of which I am not one, would have plenty of their own ideas):

  1. A standardized HTML specification for email, complete with a test suite for conformance. Or, maybe we just declare a version of the HTML5 spec to be officially binding and that’s the end of it.

  2. Headers for email chain preferences, or other email-specific preferences (e.g. is this email chain a top-reply chain, or a bottom-reply chain? The client shouldn’t need to guess, or worse, ignore it.)

  3. If an email has a rich HTML view, it should be required to come with a text-only, non-HTML copy of the body as well, for accessibility, compatibility, and privacy reasons.

  4. All MX2 records must have a public key embedded in the record. To send an email from the domain:

    – A hash of the email content, and all headers, is created.
    – This hash is then encrypted (that is, signed) with the private key corresponding to the record’s public key.
    – The encrypted hash is then added to the email as a header, the only permitted untrusted header.
    – When an email is received, the header containing the hash is decrypted with the DNS public key, and the rest of the email is checked against the hash for integrity and authenticity.

  5. Point #4 is a lot like DKIM and DMARC right now, except:

    – There would always be an automatic reject policy (p=reject). Currently, only 19.6% of email services which even have DKIM are this stringent.
    – If headers do need to be added to an email, the spec can carefully define carve-outs for where untrusted data can go (i.e. if the spam filter wanted to add a header).
    – There also could be standardized carve-outs for, say, appending untrusted data from the receiving server to a message body (e.g. your business could add data to the body’s top or bottom indicating that the message is from an external sender and you have legal obligations, but your email client can also clearly show that this was not part of the original message and is not signed).
    – As such, the signing would not need to work around email compatibility to such an extent as DKIM, reducing the likelihood of critical flaws.

  6. By simplifying the stack to the above, eliminating SPF, DKIM, and DMARC (and their respective configuration options), and standardizing on one record (MX2) for the future, running your own self-hosted email stack would become much easier. Additionally, the extra authenticity verification would hopefully allow spam filters to be significantly less aggressive by authenticating against domains instead of IPs.

  7. Point #6 is the biggest change – we’re no longer authenticating, or caring, about the IP address that’s sending the email. Every email can and always would be verified against the domain using MX2 records and the public keys in them. Send a fake spam email? It doesn’t have a signature, so it gets tossed without any heuristics. Send a real spam email? Block that domain when there are complaints. Go after the registrar (or treat domains belonging to that registrar as suspicious) if needed. This would mostly eliminate the need for IP reputation by replacing it with domain reputation – which, at least to me, is a far superior standard with more understandable and controllable outcomes(1).

  8. Clients which implement MX2 can, optionally, have an updated encryption scheme to replace OpenPGP. Something like Apple’s Contact Key Verification. Hopefully there would be forward secrecy this time.
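The signing flow from point #4 can be sketched roughly like this. It is a toy, not a proposal: it uses textbook RSA with a deliberately tiny key built from two Mersenne primes so that it runs on the standard library alone, and a real design would use a vetted signature scheme and a carefully specified canonical form. The header name `MX2-Signature` and the "sorted, lower-cased headers plus body" canonicalization are assumptions made up for this example.

```python
import hashlib

SIG_HEADER = "MX2-Signature"  # hypothetical header name, not part of any spec

# Toy textbook-RSA key pair (far too small for real use; illustration only).
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q                              # public modulus (would live in the MX2 record)
E = 65537                              # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))      # private exponent (needs Python 3.8+)


def digest(headers: dict, body: str, modulus: int) -> int:
    """Hash all headers plus the body, reduced to fit the toy modulus."""
    canonical = "".join(f"{name.lower()}:{value}\n" for name, value in sorted(headers.items()))
    canonical += "\n" + body
    raw = hashlib.sha256(canonical.encode("utf-8")).digest()
    return int.from_bytes(raw, "big") % modulus


def sign(headers: dict, body: str) -> dict:
    """Sender side: encrypt the hash with the private key and attach it as a header."""
    signed = dict(headers)
    signed[SIG_HEADER] = format(pow(digest(headers, body, N), D, N), "x")
    return signed


def verify(headers: dict, body: str, public_key=(N, E)) -> bool:
    """Receiver side: decrypt the header with the public key and compare hashes."""
    modulus, exponent = public_key
    signature_hex = headers.get(SIG_HEADER)
    if signature_hex is None:
        return False  # unsigned mail is always rejected (the p=reject-style rule)
    try:
        signature = int(signature_hex, 16)
    except ValueError:
        return False
    signed_part = {k: v for k, v in headers.items() if k != SIG_HEADER}
    return pow(signature, exponent, modulus) == digest(signed_part, body, modulus)
```

Changing the body or any signed header after `sign` makes `verify` return `False`, which is the "tossed without any heuristics" behavior from point #7.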

If you have got great counterarguments, let me hear them.

(1) This would, perhaps, be the one and only “new feature” we could advertise to users. Not getting emails? You can just type in the name of the website, and always receive the emails.

Edit 1, for clarification: For bulk senders, there would be multiple MX2 records on the domain, each containing a public key for every authorized sender. One of those records would have a marker indicating it as suitable for incoming mail.
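As a rough illustration of that clarification, a domain’s MX2 records could be modeled like this. The `MX2Record` fields and the `accepts_incoming` marker are assumptions of mine, and the actual signature check is passed in as a callback so the sketch stays independent of any particular crypto scheme.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class MX2Record:
    host: str
    public_key: str                 # key material as published in DNS (opaque here)
    accepts_incoming: bool = False  # hypothetical marker for the inbound server


def incoming_host(records: List[MX2Record]) -> Optional[str]:
    """Return the host marked as suitable for incoming mail, if any."""
    for record in records:
        if record.accepts_incoming:
            return record.host
    return None


def is_authorized(records: List[MX2Record], signature_ok: Callable[[str], bool]) -> bool:
    """A message is authentic if its signature verifies under any published key."""
    return any(signature_ok(record.public_key) for record in records)
```

A bulk sender’s key would simply be one more record without the incoming marker, so authorizing or revoking a sender is a DNS change.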


Edit 2: This article has had a very large discussion on Hacker News. While the discussion winds down, I have some additional thoughts from there:

  • If there is an MX2 (ever), a sane way to share large files (like hundreds of megabytes, or even gigabytes) would be great. Designing the protocol wouldn’t be easy especially due to spam concerns, but if I had a nickel for how many links are shared to avoid email size limits… this is a real-world problem.
  • MX2 will literally never happen if Google and Microsoft don’t join in. They would also, of course, have considerable control on the outcome. However, if even open-source communities and developers adopted MX2 because it was easy to implement and open source… you never know what grassroots can do.
  • Part of me wonders what would happen if MX2 threw out SMTP for HTTP with a standardized REST API and JSON bodies. Sure – it would add a mountain of HTTP overhead and be more complex. However, it would sure as heck make implementing MX2 into a project quite easy in most programming languages, as it would just be a web server running on a custom port answering endpoints. REST APIs are also, despite their complexity, a well-documented system including for preventing spam (it’s not like Stripe or S3 lets people spam their APIs with garbage). I don’t know enough about SMTP to know if that’s a good idea – but I do know that SMTP is sub-optimal enough that Microsoft and Google don’t use it when exchanging messages with each other.
  • There has been interesting commentary about MX2 being a pull protocol instead of a push protocol (i.e. instead of sending a message from X to Y, X sends Y a tiny standardized note saying to pick up a message from X). The most popular proposal of this was DJB’s Internet Mail 2000.
  • The idea of plain-text alternatives to HTML is probably impossible to enforce.
  • As some commenters have pointed out, if the public key is always on the DNS, and every MX2 implementation is required to have that public key, a sending-server-to-receiving-server email encryption becomes possible.
  • The idea of using HTML, at all, is controversial. Email was originally never designed for HTML, and the security risks of processing it are quite large. Using a superset of Markdown with style directives, or a customized XML schema, or even a new simple markup language altogether (Modern Mail Markup Language – M3L, claiming it now) might be an interesting thought experiment.
  • A consistent point that came up was that standards drift – people don’t always implement the spec right, mistakes are made. I answer that, being a new standard, this is a chance to rigidly enforce the rules from the beginning. For example, we could put it in spec that any incoming message that’s not signed, despite that being immediately and easily verifiable by the sender, causes a 1 hour IP ban for laziness. Just an example.
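The “1 hour IP ban” example from the last bullet could look something like the sketch below. The class name, the in-memory dictionary, and the three return values are all made up for illustration; a real server would need persistence and some care around shared IPs.

```python
from datetime import datetime, timedelta
from typing import Dict

BAN_DURATION = timedelta(hours=1)


class UnsignedSenderBans:
    """Tracks sending IPs that were banned for submitting unsigned mail."""

    def __init__(self) -> None:
        self._banned_until: Dict[str, datetime] = {}

    def is_banned(self, ip: str, now: datetime) -> bool:
        until = self._banned_until.get(ip)
        return until is not None and now < until

    def handle(self, ip: str, has_valid_signature: bool, now: datetime) -> str:
        """Return 'refused' (still banned), 'rejected' (ban starts), or 'accepted'."""
        if self.is_banned(ip, now):
            return "refused"
        if not has_valid_signature:
            self._banned_until[ip] = now + BAN_DURATION
            return "rejected"
        return "accepted"
```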

Published by Gabriel Sieben

Gabriel Sieben is a software developer from St. Paul, MN, who enjoys experimenting with computers and loves to share his various technology-related projects. He owns and runs this blog, and is a traditional Catholic. In his free time (when not messing with computers), he enjoys hiking, fishing, and board games.

Join the Conversation

10 Comments

  1. I think point 6 is at odds with your goal of having the big email providers support this standard. The existing complexity is their moat.

  2. From my perspective as an academic user with emails from multiple orgs (as is common for many academics), email is becoming borderline unusable. Orgs have become so paranoid about phishing that on top of the SPF/DKIM/DMARC pile, they’re putting in all kinds of poorly-thought-out-and-implemented homegrown restrictions. There’s basically no way anymore to be confident that you’re receiving all the email you should be, or that others are receiving emails that you send. Somehow, we need to put up a wall to keep out the dregs of the open web, while ideally maintaining some degree of discoverability and serendipity. Something like what you propose (signing and domain reputation) seems like it would solve that.

  3. Any mail server has the duty to deliver, in confidence, and not classify as spam, any legitimate email that is sent. ‘Legitimate’ means that it is sent in good faith and acceptable to the recipient. DKIM, SPF, HTML, MX2 and other schemas do not address this requirement. The primary problem with the existing email system is the failure to deliver valid, legitimate email. The MX2 proposal seems to aggravate this problem, by suggesting that even more legitimate email would not be properly delivered.

  4. Many moons ago, I experimented with sender-pays email using hashcash, and I think if you are going to build a second version of email, sender-pays should become a core part of the new environment.

    After thinking about the problem for several years, I believe a sender-pays system has three essential components.

    The first one is who stores the message. Current email fails because spammers push the cost of storage on the receiver. Forcing the sender to hold the messages until the receiving server fetches them makes it more expensive for spammers/marketers to send large volumes of emails. The receiver only gets a notice of the message and the initial headers to help distinguish who the message is from and what it’s about.

    The second component is a mechanism for limiting the rate at which a server can send email. All the techniques used today put that cost on the receiver. Using a hashcash token puts the cost on the sender.

    The problem with the first version of hashcash tokens for email is that it was a fixed cost: there was no way to change the token size based on the sender’s reputation or the amount of available compute to solve the hashcash problem. The third component fixes that problem by allowing the receiver to specify the cost of sending a message based on the sender’s reputation. This allows the receiver to reduce the cost to zero if the individual or the server has a good reputation.
    Reputation management is not complex. The more messages are received without objection, the further the cost of sending slowly drops. Declaring a message spam, however, jacks up the price quickly.

    The hashcash tokens provide another benefit: if a site is blocked inappropriately, it can still send messages to clear up the problem, because if you’re willing to pay the computing cost, you can still deliver a message. Blacklists, in contrast, prevent you from ever communicating again.

    As for fixing SPF/DKIM, I think we might be better off using a variant of WireGuard to encrypt the mail message stream and letting the mail server put a series of public keys in their DNS record.

    1. Thanks for the first point, that the party which holds the message until it is received might be overwhelmed and needs compensation. I should think about it.

      I’ve been thinking about the “sender-pays” approach, but in a more general sense, as a universal method to eliminate any unwanted spam, for instance unwanted phone calls and push notifications. The reasoning is that the receiver’s attention is a limited and valuable resource and should be compensated. If the receiver gets a large enough payment each time he is bothered, then he is not at a loss.

      But such an approach needs a substantial change in our culture, and a medium closer to real money than abstract hashes.

  5. I have a few quick thoughts – adding them here for discussion:

    1. Why not use a JSON payload for sending email? Instead of custom headers, let everything be in the payload. Thus body becomes a string, and bodyType can indicate whether it’s text or HTML. Similarly, attachments can be added as a Base64 blob to the body.

    2. Secondly, why not allow clients to send an HTTP POST request to an IP for emails? It might have HTTP overhead, but it would simplify building clients.

    1. I used to hold this exact perspective on email, so I totally get where it’s coming from. It’s a simple implementation and simple is great! Howeeeever, the reality isn’t actually that simple.

      JSON is simple, but when payloads can be large it can be prohibitively inefficient. Most JSON implementations require holding both the serialized text and the deserialized value in RAM at the same time, and the format itself is typically unfriendly to streaming; NDJSON helps when you have many small objects, but not one large one. And when attachments are base64-encoded, that inflates the payloads even more, because base64-encoding a string makes it ~33% larger (4 output bytes for every 3 input bytes). So if you support 10MB attachments, you have to allow for roughly 24MB of RAM just to process that one attachment. If you’re running a garbage-collected language runtime you have to assume you’re holding onto that object in memory for longer than you need it, since it may not be GCed immediately. If you have to process thousands of attachments concurrently, you’re looking at hundreds of GB of RAM used just to decode attachments in memory. Not the entire JSON payload, just the attachments.

      Nearly every system that needs to operate at unbounded scale uses streamable formats instead, like `multipart/form-data`. This lets you process data without loading all of it into RAM at once, reducing the maximum RAM required. You need sufficient RAM to accommodate everyone sending the maximum possible payloads concurrently (if you run even a moderately successful email service, this *will* happen), otherwise the app crashes. The way emails are typically sent via SMTP uses a streamable serialization format that is a lot like `multipart/form-data` so that the server can stream attachments directly to durable storage without loading them in RAM (other than a buffer whose size is measured in KB), and supporting both HTML and text parts. This is something that the current email ecosystem gets very much correct.

  6. You’re forgetting all the emails that are not created by humans, but by legit services: monitoring emails, legit lists, etc. Of course, one solution is to create subdomains for those. But it may be a difficult migration for some companies, that are sending emails from different platforms, like using cloud services for customer support (with automatic email sending) or who knows what. Of course, you can try to export-import the private key in all the services sending emails for the same domain, but it may be a little difficult.

    That is something that DKIM solves by allowing multiple DKIM keys and specifying in the email header which key you’re using. I suppose that you’re thinking about the possibility of having multiple MX2 entries, one for each sending server with a different certificate. But as the MX is used to deliver emails to a domain (SMTP server), it may cause a conflict when you have multiple senders outside of that email server (you won’t be able to deliver emails to all the client IPs). I still see the need to separate the certificates used by authorized email senders from the identification of the email receiver.

  7. I wish MX2 would support plain-text only. It could save a significant amount of traffic, electricity, and money. Fancy decorated documents can be sent as attachments.
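For readers curious about the sender-pays idea raised in comment 4, here is a minimal hashcash-style sketch with a reputation-dependent cost. It is only an illustration: it uses SHA-256 rather than the SHA-1 of the original hashcash, and the `required_bits` formula (base cost, a slow discount per accepted message, a sharp penalty per spam report) is an invented example, not anything the commenter specified.

```python
import hashlib
from itertools import count


def required_bits(accepted: int, spam_reports: int, base_bits: int = 16, max_bits: int = 24) -> int:
    """Receiver-chosen cost: drops slowly with good history, jumps on spam reports."""
    bits = base_bits - accepted // 50 + 4 * spam_reports
    return max(0, min(max_bits, bits))


def leading_zero_bits(data: bytes) -> int:
    """Count the zero bits at the start of a digest."""
    return len(data) * 8 - int.from_bytes(data, "big").bit_length()


def check(challenge: str, nonce: int, bits: int) -> bool:
    """Cheap for the receiver: one hash to confirm the sender did the work."""
    found = hashlib.sha256(f"{challenge}:{nonce}".encode("utf-8")).digest()
    return leading_zero_bits(found) >= bits


def mint(challenge: str, bits: int) -> int:
    """Expensive for the sender: brute-force a nonce, about 2**bits hashes on average."""
    for nonce in count():
        if check(challenge, nonce, bits):
            return nonce
```

A sender with a long clean history ends up with a cost of zero bits, while a blocked sender can still get one message through by paying the maximum cost.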
