Homophonous linguists and spam

At the most recent LSA I found out that there is more than one set of “homophonous linguists”: a pair of scholars in the field who share the same name. Turns out that not only are there two Stefan Frisches, there are also two Matthew Gordons. And one of the Stefans has a way of combatting spam that I thought I’d share here, since spam has had a direct effect on the structure of phonoloblog.

There are numerous ways of fighting spam. My server, for one, scans the content of incoming messages, runs a statistical content analysis, and tags messages that are likely to be spam with [spam] in the subject line. The content analysis doesn’t need to do much more than look for some co-occurrence of words like “congratulations”, “the late Mr.”, “South Africa”, “Nigeria”, “lottery”, and so on. The current system sometimes fails to catch spam, but it never labels something “spam” that isn’t. This is a somewhat passive approach – let the spam come, and chuck it. There is no negative impact on the spammer; from their point of view, it’s no different from your simply deleting their message.
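As a rough illustration of the kind of co-occurrence check described above, here is a minimal Python sketch. It is not the filter my server actually runs: the phrase list is taken from the examples above, while the function names and the threshold of two distinct hits are assumptions made purely for illustration. A real statistical filter would weight words by how often they turn up in spam versus ordinary mail.

```python
import re

# Phrases from the examples above; a real filter would learn its own list.
SPAM_PHRASES = ["congratulations", "the late mr.", "south africa", "nigeria", "lottery"]

def spam_hits(body, phrases=SPAM_PHRASES):
    """Return the listed phrases that occur in the message body."""
    # Lowercase and collapse whitespace so line breaks do not hide a phrase.
    text = re.sub(r"\s+", " ", body.lower())
    return [p for p in phrases if p in text]

def tag_subject(subject, body, threshold=2):
    """Prefix the subject with [spam] when enough phrases co-occur.

    The threshold is a hypothetical setting: one hit alone
    ("congratulations") is too common in ordinary mail to count.
    """
    if len(spam_hits(body)) >= threshold:
        return "[spam] " + subject
    return subject
```

Requiring more than one hit is what keeps a friendly “congratulations on your talk” from being tagged, at the cost of letting some spam through.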

A more defensive approach is to use a blocker that only admits mail from known addresses. The first time you email someone who uses one of these, their blocker sends an automated message back to you asking you to identify a word embedded in a gif image. If you correctly identify the word, then your message is allowed through, and your email address is permanently added to the recipient’s list of ‘cool’ senders. It’s kind of like getting wanded at the airport security gate: a nuisance, but you know why it has to be done. It also keeps out messages about last-minute deals from Travelocity and so on, which you might be OK with. (Then there’s the matter of the dumb, numb feeling that sets in if you’ve ever got the word wrong.)
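The logic of such a blocker can be sketched in a few lines of Python. This is a toy model under stated assumptions, not any real product’s behavior: the class name `ChallengeBlocker`, its methods, and the single fixed challenge word are all hypothetical, and the step of rendering the word into a gif and emailing it is left out.

```python
class ChallengeBlocker:
    """Toy model of a challenge-response blocker (all names hypothetical)."""

    def __init__(self, challenge_word):
        # The word that would be embedded in the gif sent to new senders.
        self.challenge_word = challenge_word.lower()
        self.known_senders = set()   # the list of 'cool' senders
        self.held = {}               # sender -> messages awaiting an answer
        self.inbox = []

    def receive(self, sender, message):
        """Deliver mail from known senders; hold everything else."""
        if sender in self.known_senders:
            self.inbox.append(message)
            return "delivered"
        self.held.setdefault(sender, []).append(message)
        return "challenge sent"

    def answer(self, sender, word):
        """Check a sender's reply to the challenge."""
        if sender not in self.held or word.strip().lower() != self.challenge_word:
            return False
        # Correct answer: remember the sender and release the held mail.
        self.known_senders.add(sender)
        self.inbox.extend(self.held.pop(sender))
        return True
```

A sender who answers correctly once never sees the challenge again, which is the whole appeal; a spambot never answers at all, so its mail simply stays held.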

Other approaches range over degrees of counter-offensiveness. The most extreme would be to respond to a spammer in an antagonistic manner – probably not worth your time. But there are other ways to create more of a nuisance for them, if you consider the fact that they depend on harvesting email addresses from web pages and online databases. For example, Geoff Pullum has a link from his site to a page that automatically generates a new set of false email addresses every time you load it. Hypothetically, the spambots end up sending spam to these false addresses. (I guess the goal is that if spammers eventually realize that most of their spam goes out to false addresses, they will awaken to the futility of the exercise and move on.)
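I don’t know how that page is actually built, but the idea of generating a fresh batch of false addresses on every load is easy to sketch in Python. Everything here is an assumption for illustration: the function name, the random lowercase strings, and the use of the reserved `.example` top-level domain, chosen so that none of the generated addresses can belong to a real mail server.

```python
import random
import string

def fake_addresses(count, rng=None):
    """Return `count` made-up email addresses (a hypothetical helper)."""
    rng = rng or random.Random()

    def word():
        # A random run of 5 to 10 lowercase letters.
        length = rng.randint(5, 10)
        return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))

    # ".example" is reserved for documentation, so nothing real is targeted.
    return [f"{word()}@{word()}.example" for _ in range(count)]
```

Calling `fake_addresses(20)` each time the page is requested would give a harvester twenty new dead ends per visit.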

Which brings us back to the Florida Stefan Frisch: he posts actual spam-sender addresses on his own web page. So rather than harvest false addresses, spambots harvest the addresses of their brethren, and end up spamming each other. I like this idea so much that I have started to paste all my incoming spam addresses into a single file, and will soon post it (as soon as Google’s crawler actually decides to visit my page).
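Pasting addresses by hand gets old quickly, so here is a minimal Python sketch of collecting them automatically. It assumes the spam is available as raw message text with an ordinary From header; the function name is hypothetical, and it says nothing about how Stefan maintains his own list. It uses only the standard library’s `email` package.

```python
from email import message_from_string
from email.utils import parseaddr

def sender_addresses(raw_messages):
    """Collect the distinct From addresses of raw messages, in order seen."""
    seen = []
    for raw in raw_messages:
        msg = message_from_string(raw)
        # parseaddr splits "Display Name <user@host>" into (name, address).
        _, addr = parseaddr(msg.get("From", ""))
        addr = addr.lower()
        if addr and addr not in seen:
            seen.append(addr)
    return seen
```

The resulting list could then be written out one address per line to the file that gets posted.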

As for homophony of the linguist, I was surprised to find out that it hadn’t instead occurred with some of the more frequent last names in our field, like Smith or Kim. Mind you, Gordon is ranked 143rd among the most common last names in the US, not far below Stevens, Hicks, and Kennedy, and not far ahead of Shaw, Rice, Rose, and Stone. And Frisch might be comparatively rare (in the US at least, tied at 7501 with Woosley, Armitage, and Chouinard), but the fact that it co-occurs twice with Stefan (rather than with, like, Seamus) shouldn’t be too much of a surprise. Not that it’s a problem either – as with other homophonous pairs, we can distinguish these in context (e.g. in both pairs, one’s a phonologist and the other is a syntactician).