Trevor Burnham

Sure, it works in practice…

Beautiful Spam Filtering

March 21st, 2010

This is the fifth in a series of posts about Paul Graham’s book Hackers & Painters.

I am a bit perplexed at “A Plan for Spam” (Chapter 8): In it, Graham describes Bayesian filtering, and his surprise at how effective it is compared to the more time-consuming alternative of building a complicated set of blacklists and whitelists. Because this method of spam protection requires a large corpus (ideally an individualized one for each user) of spam and non-spam messages, he writes, “each user should have two delete buttons, ordinary delete and delete-as-spam.” This is, of course, a feature familiar to anyone who’s used Gmail, Apple Mail or virtually any other recent e-mail client. I find it hard to imagine a world in which Bayesian spam filtering isn’t used, and I now receive very little spam in part because of it. So, this essay is completely right, but it feels like a relic from a bygone era when e-mail spam was a serious concern rather than a joke. Today, phishing is the much greater concern, and Bayesian filters are necessarily limited in their ability to stop e-mail designed to emulate missives from your bank or ISP. While “A Plan for Spam” may have been influential, it feels out of place in a book aimed at less transient concerns.
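
For readers who have never seen the idea spelled out, here is a minimal sketch of Bayesian filtering in Python. It is a plain naive Bayes classifier, not the exact formula from Graham's essay, which combines only the most telling tokens in a message. The class name, the tokenizer, and the add-one smoothing are all assumptions made for illustration. The two `train` labels stand in for the two delete buttons: "spam" for delete-as-spam, "ham" for mail the user treats as legitimate.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word-like tokens."""
    return re.findall(r"[a-z0-9$'-]+", text.lower())


class NaiveBayesFilter:
    """Hypothetical per-user filter; a sketch, not Graham's implementation."""

    def __init__(self):
        # Token counts and message counts for each class.
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Record one message the user has classified as 'spam' or 'ham'."""
        self.counts[label].update(tokenize(text))
        self.messages[label] += 1

    def spam_probability(self, text):
        """Return the posterior probability that the message is spam."""
        vocab = set(self.counts["spam"]) | set(self.counts["ham"])
        total = sum(self.messages.values())
        log_score = {}
        for label in ("spam", "ham"):
            # Prior: share of messages with this label (add-one smoothed).
            log_p = math.log((self.messages[label] + 1) / (total + 2))
            n_tokens = sum(self.counts[label].values())
            for tok in tokenize(text):
                # Likelihood of each token given the label (add-one smoothed,
                # so unseen tokens never produce a zero probability).
                log_p += math.log(
                    (self.counts[label][tok] + 1) / (n_tokens + len(vocab) + 1)
                )
            log_score[label] = log_p
        # Normalize the two log scores into a probability.
        top = max(log_score.values())
        weight = {k: math.exp(v - top) for k, v in log_score.items()}
        return weight["spam"] / (weight["spam"] + weight["ham"])


mail_filter = NaiveBayesFilter()
mail_filter.train("cheap pills buy now", "spam")   # delete-as-spam
mail_filter.train("win money now", "spam")         # delete-as-spam
mail_filter.train("meeting agenda for monday", "ham")
mail_filter.train("lunch on monday?", "ham")
print(mail_filter.spam_probability("buy cheap pills"))
print(mail_filter.spam_probability("monday meeting"))
```

The point of the sketch is how little there is to it: the filter keeps word counts for each pile of mail and applies Bayes' rule. Everything that makes it effective comes from the user's own corpus, which is why the per-user training data matters so much.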

Yet taken as a complement to the following chapter, “Taste for Makers,” the spam essay conveys a subtle and important point: People tend to neglect a simple, elegant solution that requires a lot of thought up front (in this case, an understanding of Bayesian probability updating) when a more complicated, tedious, and less effective approach is available that requires little intellectual exertion (the blacklist-whitelist approach). The properties of the former solution are part of what Graham calls good design: “Good design is simple… Good design solves the right problem… Good design is hard… Good design looks easy… Good design resembles nature… Good design is often strange.” This last is interesting: “I’m not sure why,” Graham admits. “It may just be my own stupidity. A can opener must seem miraculous to a dog. Maybe if I were smart enough, it would seem the most natural thing in the world that e^(iπ) = -1. It is after all necessarily true.”

The best advice is at the end of the chapter. “Intolerance for ugliness is not in itself enough,” Graham warns. “You have to understand a field well before you develop a good nose for what needs fixing. You have to do your homework. But as you become expert in a field, you’ll start to hear little voices saying, What a hack! There must be a better way. Don’t ignore those voices. Cultivate them.”

[Update: After writing this, I came across “Filters that Fight Back,” an essay Graham wrote in 2003 yet chose not to include in H&P. In it, he claims that spam could be fought more effectively if our e-mail clients automatically crawled suspect links, thereby imposing harsh bandwidth costs on spammers. It is a scheme that seems just crazy enough to work, and I’m a bit surprised that I haven’t heard of it being used in practice.]
