Tips for Avoiding SPAM (Part 2)
In Part 1, I gave tips and info for everyone. This part contains some techniques that web developers and system administrators can use to reduce the impact of SPAM on their websites.
Tips for Web Developers
As a web developer, you have to be careful when adding things like guestbooks and discussion boards to your site. If you ask users to enter their email addresses when they contribute, the end result may be a web page with dozens or hundreds of email addresses: just what a SPAMMER looks for. Any email addresses for people at your company or website may also get harvested if you publish them on your pages. Companies that harvest addresses normally do it by unleashing a “robot” on your site that scours your pages for anything that looks like an email address. Luckily, these robots usually aren’t very smart, so you may be able to fool them using some of the tips below.
- To avoid putting the real email addresses of site administrators, customer support, etc. on your site, you may want to create a “Contact Us” page. Instead of giving the user the email address of the party they’re trying to contact, the page just contains a TEXTAREA where they can type their message and a SELECT list of recipients’ names or titles. When they click the Submit button on this form, a CGI script on your server sends the message to the appropriate address. If you don’t want to write your own CGI script, Matt’s Formmail script will work just fine. The only problem with Formmail is that it requires you to put a hidden field in your HTML form containing the email address of the recipient. That kind of defeats the whole purpose, since a robot will probably find that hidden field in the HTML! This isn’t a big problem if you know even a tiny bit of Perl and are willing to modify the Formmail script. Basically, you can alter it so that the recipients’ email addresses live in the formmail.pl script rather than in the HTML, and the form only submits a name or title that the script looks up. If you need help, let me know.
- Another option to avoid putting the actual email addresses of your website contacts on your site might be to create an email “autoresponder”. This address doesn’t actually have an inbox on the mail server and no human ever looks at the messages sent to it. When someone emails the autoresponder, it would reply with the email addresses of your website contacts. SPAMMERs probably wouldn’t ever see this reply since they usually send from fake email addresses anyway.
- Do NOT use software that feeds robots a ton of fake email addresses. Trust me, this software causes more problems than it solves. Most of the ones that I’ve seen use “fake” addresses that are too close to real ones. While it may be funny to send a robot the address “stupid@friggindummy.com”, any message sent to it is probably still going to a real domain (friggindummy.com), a real mail server, and maybe a real human postmaster. Multiply that by the thousands of addresses that these programs hand out and it turns into a headache. The approach also assumes that the robot is somehow going to get overloaded, tired, or bored after receiving all these “fake” addresses and will stop trolling your site for more, which probably isn’t true.
- Do you really need to require users’ email addresses for them to sign the guestbook? When I added the Reader Comments stuff to rahji.com I decided to just have people use their first name as an identifier. We’ll see if anyone complains. Consider a webmaster that I know of. He’s a nice guy and he felt rotten when the hundreds of users of his guestbook and classifieds got repeatedly SPAMMED by some crappy company. Maybe he could’ve just used the users’ names instead of their email addresses? Maybe he could’ve used a variation of my first tip above and created a page that lets users contact each other without ever seeing their email address?
- I think a decent option to solve the problem I just described is to encode any “mailto:” link into hex. I don’t know if it really fools many email-grabbing robots, but I suspect it might, since most of this kind of software is written by crappy programmers. To convert an email address to its hex equivalent, apply this Perl substitution to it: s/(.)/sprintf('%%%02x',ord($1))/ge (note the straight quotes, and %02x so every character comes out as two hex digits). This would make an address like jjx3@rahji.com look like %6a%6a%78%33%40%72%61%68%6a%69%2e%63%6f%6d, which works perfectly fine as part of a “mailto:” link in any browser I’ve seen. Strange but true. If you find a browser that this doesn’t work with, let me know.
- Consider this: some email-collecting robots may be so stupid that they only collect email addresses from “mailto:” links. An easy thing to do might be to just write the addresses as plain text, without a “mailto:” link around them. It might help a little.
- With very little Perl knowledge you could write a script that converts email addresses on your page to the human-readable equivalents I described way at the top of this page: jjx3 at rahji dot com. Start with these two substitutions in Perl: s/\./ dot /g; s/@/ at /g;
- When thinking of these tricky ways to disguise email addresses from robots, make sure you consider disabled users and other people with browsers that may not work like IE and Netscape. For instance, one way to hide addresses might be to create a GIF image with the address in it, but that does no good for someone with a text-only browser or a screen reader. Also, I’ve seen people use JavaScript’s “document.write” function to output email addresses. It’s pretty clever, since it can be done so that the address doesn’t show up in the HTML but does appear in the browser window. But what a mess it makes for people without JavaScript or those who turn it off in their browsers. Another unfortunate side effect of not putting the actual email address on the page (as in my ‘let me know’ link a few paragraphs back) is that someone who prints out the page will not have the email address. Putting the URL of your site or the URL of the actual page somewhere in the document might help people who are looking at a printout find the online version again someday.
- Another SPAM-related problem that webmasters face is unsolicited commercial messages in guestbooks. Since most guestbooks don’t require any kind of sign-up, you’ll often see them full of advertisements. If you are good with Perl or some other server-side scripting language, you should be able to develop some sort of approval process so that messages don’t appear on the website unless you’ve okayed them. A cheesier way of making sure that messages don’t appear in the guestbook before you see them might be to have the CGI script that saves the users’ entries actually save them to an alternate guestbook. You could then replace the public guestbook with the alternate one periodically. I don’t think I would ever do it that way, but it’s an idea. Another idea is to require users to go through a registration process before they’re allowed to write in guestbooks or message boards. Users may not appreciate the extra work, though. One more idea would be to have the guestbook script email the person a confirmation message that they must reply to. By checking the email address that they submitted in the guestbook form against the one that they reply from, you’ll be able to verify that the address is real. Most SPAMMERs and troublemakers will use a fake email address, so their guestbook entries will never be confirmed and will be rejected. I like that idea.
- One last thing I’ll mention that’s sort of related: if you have a bulletin board, guestbook, or some other form on your website that allows users to submit personal information, you could be breaking the law. One of the privacy initiatives of the FTC involves a law (the Children’s Online Privacy Protection Act, or COPPA) that is supposed to protect children from putting personal information on the web and, I guess, getting stalked or something afterward. Look at coppa.org to see how this might affect your website and what you can do to comply with the law.
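To make the “Contact Us” tip above a little more concrete, here is a minimal sketch of the server-side lookup, written in Python rather than as a change to formmail.pl. The recipient keys, the example.com addresses, and the function names are all made up for illustration. The idea is only that the HTML form submits a short key, and the real address never leaves the server.

```python
# Hypothetical lookup table: the form's SELECT list only contains the keys,
# so the real addresses never appear in the HTML.
RECIPIENTS = {
    "webmaster": "webmaster@example.com",
    "support": "support@example.com",
}


def resolve_recipient(key):
    """Return the real address for a form key, or None if the key is unknown."""
    return RECIPIENTS.get(key.strip().lower())


def build_message(form):
    """Turn submitted form fields into (to_address, body).

    Raises ValueError for an unknown recipient key or an empty message, so a
    robot cannot use the form to mail arbitrary addresses.
    """
    to_address = resolve_recipient(form.get("recipient", ""))
    if to_address is None:
        raise ValueError("unknown recipient")
    body = form.get("message", "").strip()
    if not body:
        raise ValueError("empty message")
    return to_address, body
```

A real script would hand the returned address and body to whatever mail-sending code you already use. That part is left out here on purpose.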
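The two address-disguising tips above (hex-encoding a “mailto:” link and spelling out “at” and “dot”) can also be sketched in Python for anyone who doesn’t use Perl. The function names are my own invention. The hex version is meant to mirror the Perl substitution, not to be a guaranteed defense against any particular robot.

```python
import html


def hex_encode(address):
    """Percent-encode every character of the address as two hex digits."""
    return "".join("%{:02x}".format(byte) for byte in address.encode("utf-8"))


def spell_out(address):
    """Replace '.' and '@' with ' dot ' and ' at ' for human-readable text."""
    return address.replace(".", " dot ").replace("@", " at ")


def mailto_link(address, label):
    """Build an HTML link whose href carries only the hex-encoded address."""
    return '<a href="mailto:{}">{}</a>'.format(hex_encode(address), html.escape(label))


print(hex_encode("jjx3@rahji.com"))
# %6a%6a%78%33%40%72%61%68%6a%69%2e%63%6f%6d
print(spell_out("jjx3@rahji.com"))
# jjx3 at rahji dot com
```

Passing a label such as “let me know” to mailto_link keeps the plain address out of both the link and the visible text.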
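Finally, the confirm-by-reply idea for guestbooks from a couple of tips back might look roughly like this. It is only a sketch: the Guestbook class and its method names are hypothetical, the actual sending and receiving of mail is left out, and it assumes that comparing addresses case-insensitively is good enough for a guestbook. Keep in mind that the From address on a reply can be forged, so this filters out lazy SPAMMERs rather than determined ones.

```python
def normalize(address):
    """Trim whitespace and lowercase the address so comparisons are forgiving."""
    return address.strip().lower()


class Guestbook:
    def __init__(self):
        self.pending = {}     # normalized address -> entries awaiting confirmation
        self.published = []   # entries visible on the public page

    def submit(self, address, text):
        """Hold the entry back; a real script would now email a confirmation."""
        self.pending.setdefault(normalize(address), []).append(text)

    def confirm(self, reply_from):
        """Publish entries whose submitted address matches the reply's sender.

        Returns the number of entries that were published.
        """
        entries = self.pending.pop(normalize(reply_from), [])
        self.published.extend(entries)
        return len(entries)
```

Entries submitted with a fake address simply sit in pending forever, so you would probably want to clear out old ones periodically.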
This work is licensed under a
Creative Commons Attribution-NoDerivs 3.0 License.
Posted: December 16th, 2005 under internet.