And this is why Java APIs drive me nuts
jducoeur wrote in querki_project
Today's main project has been battling The Spam Problem.

Ever since I started sending emails from Querki (about a month ago), I've had problems with them getting spam-foldered. It's been erratic, but it hasn't been getting any better, so today's project wound up being to look into that.

Some of the problems are IT-level, so Aaron is having to deal with those (e.g., setting up reverse DNS). But the biggest problem turns out to be at my end. I've been sending the email as HTML, and relying on modern mailers to down-display that as text if necessary. Which they do -- but it turns out that they don't *like* to do it. And for reasons I don't even slightly understand, it is apparently an indicator of spam.

Off I went to figure out how to send the emails as proper MIME Multipart. Which I did, but *man* -- the experience was like pulling teeth. We're using JavaMail, of course -- Querki is based on Scala, and Scala is based on Java -- and the API documentation is a total mess. Nowhere in *any* of the examples do they correctly show you how to send the same message as both HTML and text, despite the fact that that's probably the single most common thing to want to do nowadays.

So for posterity's sake, and to hopefully help somebody else avoid my mistakes, here is how that code snippet came out. Mind, this is Scala code using a Java API, which is why it's idiotically over-complex:
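import javax.activation.DataHandler
import javax.mail.internet.{MimeBodyPart, MimeMultipart}
import javax.mail.util.ByteArrayDataSource

// t, emailBody, QLParser and personContext come from Querki itself;
// msg is the javax.mail MimeMessage being sent.
val bodyQL = t.getProp(emailBody).first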

// Attach the HTML...
val bodyParser = new QLParser(bodyQL, personContext)
val bodyWikitext = bodyParser.process
val bodyHtml = bodyWikitext.display
val bodyPartHtml = new MimeBodyPart()
bodyPartHtml.setDataHandler(new DataHandler(new ByteArrayDataSource(bodyHtml, "text/html")))

// ... and the plaintext...
val bodyPlain = bodyWikitext.plaintext
val bodyPartPlain = new MimeBodyPart()
bodyPartPlain.setDataHandler(new DataHandler(new ByteArrayDataSource(bodyPlain, "text/plain")))

// ... and set the body to the multipart:
val multipart = new MimeMultipart("alternative")
// IMPORTANT: these are in increasing order of importance, and Gmail will display the *last*
// one by preference:
multipart.addBodyPart(bodyPartPlain)
multipart.addBodyPart(bodyPartHtml)
msg.setContent(multipart)
The code is wordy to begin with, but more importantly:
  • Nowhere in the MimeMessage documentation did it indicate that standard practice is to build a MimeMultipart and assign that as the content; indeed, only one of the example programs even shows you how to do so. Most of the examples tell you to just assign the content directly to the message, so that's what I did -- and that turns out to piss off ISPs.

  • The "alternative" part is absolutely essential, and is completely undocumented as far as I could tell. (If you don't have it, MimeMultipart defaults to "mixed", and Gmail displays *both* parts, which had me scratching my head for a while.)

  • As the note says, the order of the MIME parts is critical -- and again, that's not clear from the documentation.
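Here is what the JavaMail calls above actually put on the wire. The following minimal, hand-rolled Scala sketch builds a `multipart/alternative` body using only the standard library. It is an illustration, not a replacement for JavaMail. It skips transfer encodings and header folding, and its `b-<uuid>` boundary format is an arbitrary choice. The sketch also shows why both gotchas bite. RFC 2046 specifies that the parts of a `multipart/alternative` are ordered by increasing faithfulness, and that the client should show the last part it can render. Meanwhile, `new MimeMultipart()` with no argument produces `multipart/mixed`, which tells the client to show every part.

```scala
import java.util.UUID

// Hand-rolled illustration of a multipart/alternative body.
// NOT a replacement for JavaMail: no transfer encoding, no header folding,
// and the "b-<uuid>" boundary format is an arbitrary choice.
object AlternativeSketch {
  /** Emits the parts in the order given, so pass the plainest part first
    * and the richest (the one clients should prefer) last.
    */
  def build(parts: Seq[(String, String)], boundary: String): String = {
    val sb = new StringBuilder
    parts.foreach { case (mimeType, content) =>
      sb.append("--").append(boundary).append("\r\n")
      sb.append("Content-Type: ").append(mimeType).append("; charset=UTF-8\r\n")
      sb.append("\r\n") // blank line separates part headers from part body
      sb.append(content).append("\r\n")
    }
    sb.append("--").append(boundary).append("--\r\n") // closing delimiter
    sb.toString
  }

  /** Top-level header; the "alternative" subtype tells clients to pick one part. */
  def contentTypeHeader(boundary: String): String =
    "Content-Type: multipart/alternative; boundary=\"" + boundary + "\""

  def main(args: Array[String]): Unit = {
    val boundary = "b-" + UUID.randomUUID().toString
    println(contentTypeHeader(boundary))
    println()
    print(build(Seq("text/plain" -> "Hello", "text/html" -> "<p>Hello</p>"), boundary))
  }
}
```

If you swap the two `Seq` entries in `main`, HTML comes first, and a compliant client would then prefer the plain text.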
Grouse, grouse, grouse. I really have to look around at some point and see if somebody's built a better-designed Scala wrapper around this mess. If not, I may have to do it myself, just for my sanity.

Oh, and while I'm on the topic of email, two more things.

First -- if you do have a plaintext email reader, you'll find (as indicated by the above code) that the email is being sent as basically the literal pseudo-Markdown wikitext. That's actually not a bad answer, but we're not yet stripping out some of the formatting details, so the plaintext will have, e.g., {{cssStyle:...}} junk in it. My apologies; sooner or later, we'll have to write a Markdown-to-plaintext renderer.
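Until a proper renderer exists, a crude cleanup pass could knock out the worst of the junk. This is a minimal, regex-based Scala sketch. It assumes the styling markup has the shape `{{name: text}}` and that the wrapped text is worth keeping. That is a guess about Querki's wikitext, not its actual grammar. The sketch handles only style markup, bold, links and headings.

```scala
// Crude wikitext-to-plaintext cleanup, a stopgap rather than a real renderer.
// ASSUMPTION: style markup has the shape {{name: text}} and the wrapped text
// should be kept; Querki's actual wikitext grammar may differ.
object PlaintextSketch {
  private val styleOpen = """\{\{[\w-]+:\s*""".r            // e.g. "{{cssStyle: "
  private val bold      = """\*\*(.+?)\*\*""".r             // **bold** -> bold
  private val link      = """\[([^\]]+)\]\(([^)\s]+)\)""".r // [text](url) -> text (url)
  private val heading   = """(?m)^#{1,6}\s*""".r            // leading '#' markers

  def render(wikitext: String): String = {
    // Drops style openers, then every "}}" (crude: it doesn't check pairing).
    val noStyles = styleOpen.replaceAllIn(wikitext, "").replace("}}", "")
    val noBold   = bold.replaceAllIn(noStyles, "$1")
    val noLinks  = link.replaceAllIn(noBold, "$1 ($2)")
    heading.replaceAllIn(noLinks, "").trim
  }
}
```

For example, `PlaintextSketch.render("{{note: Hello **world**}}")` returns `"Hello world"`. A real Markdown-to-plaintext renderer would parse the text rather than pattern-match it.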

Second, and more importantly -- while this should help with a lot of ISPs, Google is pretty explicit that they prefer to see whitelisting. So if you're on Gmail, and you think you're likely to get involved in this project at some point, I recommend simply adding "userEmails@querki.net" (the address that most Querki mail comes from) to your Contacts list now -- that will allow you to receive invitations without them getting spam-foldered. (Frankly, I'm hoping that enough people putting that address into Contacts will make Google's algorithms take notice more generally, but that's just a guess. Mysterious are the Ways of Google.)

And yes, I'll be adding proper "Unsubscribe" links to those emails soon. Current theory is that each email will have three links at the bottom:
  • Stop receiving emails from this Space. (I'm tired of this topic.)

  • Don't send any more invitations from this sender. (I don't want to hear from this person.)

  • Don't send any more invitations from Querki, period. (No, really, just go away.)
I think that will comply nicely with both the letter and spirit of what companies like Google want, and gives folks an appropriate level of granularity for shutting things up...
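The three links map naturally onto three scopes. Here is a Scala sketch of generating them. The base URL, the query-parameter names and the identifier formats are all hypothetical. A real version would probably also need protection against forged links, which this sketch omits.

```scala
import java.net.URLEncoder
import java.nio.charset.StandardCharsets

// Builds the three unsubscribe links described in the post.
// HYPOTHETICAL: the base URL, query-parameter names and id formats are
// invented for illustration and are not Querki's real routes.
object UnsubLinks {
  sealed trait Scope { def key: String }
  case object ThisSpace  extends Scope { val key = "space" }  // "I'm tired of this topic"
  case object ThisSender extends Scope { val key = "sender" } // "I don't want to hear from this person"
  case object AllQuerki  extends Scope { val key = "all" }    // "just go away"

  private def enc(s: String): String = URLEncoder.encode(s, StandardCharsets.UTF_8.name())

  def link(base: String, scope: Scope, recipient: String, spaceId: String, senderId: String): String = {
    // Which object the opt-out applies to; "all" needs no target.
    val target: Option[String] = scope match {
      case ThisSpace  => Some(spaceId)
      case ThisSender => Some(senderId)
      case AllQuerki  => None
    }
    val params = Seq("scope" -> scope.key, "to" -> recipient) ++ target.toSeq.map(t => "target" -> t)
    base + "?" + params.map { case (k, v) => enc(k) + "=" + enc(v) }.mkString("&")
  }

  /** The three footer links, narrowest scope first. */
  def footer(base: String, recipient: String, spaceId: String, senderId: String): Seq[String] =
    Seq[Scope](ThisSpace, ThisSender, AllQuerki).map(link(base, _, recipient, spaceId, senderId))
}
```

A sealed trait makes the compiler flag any `match` that forgets one of the three scopes.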
I think the reason that HTML in email is taken as a spam indicator is that one way spammers (and especially phishers) obfuscate their material is by hiding it in HTML -- especially HTML coded to look like plaintext while containing hidden redirects that come into play if you click on it.

HTML is a strike towards being spam, and so is any email with a multitude of links, especially email with links or graphics that don't share the email sender's domain. Having an Unsubscribe link at the bottom actually goes a good way towards -decreasing- spam scores in some systems.

Edited at 2013-06-19 12:38 pm (UTC)
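To make those heuristics concrete, here is a toy Scala scorer. The weights are arbitrary and purely illustrative. Real filters use far larger rule sets with their own tuning. The heavy penalty for HTML without a text/plain alternative reflects what the reply below reports.

```scala
import java.net.URI
import scala.util.Try

// Toy spam scorer mirroring the heuristics discussed in this thread.
// The weights are ARBITRARY and illustrative only; real filters use
// much larger rule sets with their own tuning.
object ToySpamScore {
  final case class Email(
      senderDomain: String,         // e.g. "querki.net", lower-case
      hasHtml: Boolean,
      hasPlainAlternative: Boolean, // proper multipart/alternative with text/plain
      links: Seq[String],
      hasUnsubscribe: Boolean)

  private def host(url: String): Option[String] =
    Try(new URI(url).getHost).toOption.flatMap(Option(_)).map(_.toLowerCase)

  /** Higher means more spam-like. */
  def score(e: Email): Double = {
    // Links whose host is not the sender's domain (or a subdomain of it);
    // unparseable links count as off-domain.
    val offDomain = e.links.count { l =>
      !host(l).exists(h => h == e.senderDomain || h.endsWith("." + e.senderDomain))
    }
    Seq(
      if (e.hasHtml) 0.5 else 0.0,                            // any HTML: a small strike
      if (e.hasHtml && !e.hasPlainAlternative) 3.0 else 0.0,  // HTML-only: a big strike
      0.2 * math.max(0, e.links.size - 3),                    // "a multitude of links"
      1.0 * offDomain,                                        // off-domain links/graphics
      if (e.hasUnsubscribe) -1.0 else 0.0                     // unsubscribe link helps
    ).sum
  }
}
```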

> HTML is a strike towards being spam, and so is any email with a multitude of links, especially email with links or graphics that don't share the email sender's domain.

Yeah, this has all been an education in spam algorithms. Fortunately, Anne volunteered to look at what Querki's emails looked like in her systems (and she's relatively experienced in the topic), and Alexx showed me the headers that came out of his ISP, so I got a couple of hard data points showing where we were getting seriously dinged. Having HTML at all is a small strike; not having it structured as proper MIME multipart, with a text/plain alternative, turns out to be a *huge* one in both systems.

Good reminder, though: I should probably scrub any external links from invitation mail, as another part of spam prevention. Thanks; it's now on the ever-growing to-do list.
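Scrubbing external links could look something like this Scala sketch. It keeps URLs on an allowed domain (and its subdomains) and replaces everything else with a placeholder. The URL regex is deliberately simple, so it will, for example, swallow a trailing period. The placeholder text is also an arbitrary choice.

```scala
import java.net.URI
import scala.util.Try
import scala.util.matching.Regex

// Removes links that point outside an allowed domain, e.g. before sending
// an invitation. ASSUMPTIONS: URLs are found with a deliberately simple
// regex, and off-domain links are replaced by a fixed placeholder.
object LinkScrubber {
  private val url = """https?://[^\s"'<>)]+""".r

  /** True if the URL's host is `domain` or one of its subdomains. */
  private def isAllowed(u: String, domain: String): Boolean =
    Try(Option(new URI(u).getHost)).toOption.flatten.exists { h =>
      val host = h.toLowerCase
      host == domain || host.endsWith("." + domain)
    }

  def scrub(text: String, domain: String, placeholder: String = "[link removed]"): String =
    url.replaceAllIn(text, (m: Regex.Match) =>
      // quoteReplacement stops '$' or '\' in the URL being read as group references
      Regex.quoteReplacement(if (isAllowed(m.matched, domain)) m.matched else placeholder))
}
```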

> Having an Unsubscribe link at the bottom actually goes a good way towards -decreasing- spam scores in some systems.

Yeah, I've been assuming that, which is why it's fairly high on the to-do list. The only reason I haven't done it yet is that I'm not yet fully prepared to honor that promise. (And one of my rules is "don't lie to the user" -- that's why the placebo "Done" button rankled with me.) I need to get the User/Identity framework a little more mature, so that I can record people's unsubscribe requests correctly.

Once that's all in place (hopefully within the next month), I'll add the necessary flags to the database, and make those Unsubscribe links a standard part of every invitation email...
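The per-recipient flags could be modeled roughly as follows. This is an immutable, in-memory Scala sketch that stands in for the database flags mentioned above. The type names and plain-string identifiers are hypothetical. The key property is that the check runs before every send, so a request at any of the three levels is honored.

```scala
// Records unsubscribe requests and checks them before sending.
// HYPOTHETICAL: an in-memory Set stands in for the database flags the post
// mentions; identifiers are plain strings for illustration.
object Unsubscribes {
  sealed trait OptOut
  final case class FromSpace(recipient: String, spaceId: String) extends OptOut
  final case class FromSender(recipient: String, senderId: String) extends OptOut
  final case class FromQuerki(recipient: String) extends OptOut

  final case class Registry(optOuts: Set[OptOut]) {
    def record(o: OptOut): Registry = copy(optOuts = optOuts + o)

    /** True unless the recipient has opted out at any of the three levels. */
    def maySend(recipient: String, spaceId: String, senderId: String): Boolean =
      !Seq[OptOut](FromQuerki(recipient), FromSender(recipient, senderId), FromSpace(recipient, spaceId))
        .exists(optOuts.contains)
  }

  val empty: Registry = Registry(Set.empty)
}
```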
