Fighting Spam

Background

I am sick of spam, worms and viruses. I don't distinguish much between them, but worms are generally more annoying since they cost me a lot more money (they waste significantly more bandwidth and disk space).

In the past I've dealt with this trash using various filtering techniques.

On the morning of September 19th, 2003 I woke up to a rather unpleasant sight. In the 7 hours since I had gone to bed, I had received 47 copies of a Windows executable. Some quick research revealed that it was a new worm known as W32/Gibe.F. The virus is known by several other names, most commonly Swen.

I was receiving about 200 of these emails every day for the first few days after the outbreak. Tiring of that, I took Henry Spencer's suggestion and configured my email server (Postfix) to refuse messages larger than 140,000 bytes. I kept this in place for almost a month. I knew it wasn't a practical long-term fix, but I was happy that my bandwidth was being saved (the virus appears to speak ESMTP, so presumably the oversized message was refused before its body was transferred).
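To illustrate why ESMTP matters here: under the SIZE extension (RFC 1870), a server advertises its limit in the EHLO reply, a client may declare the message size in its MAIL FROM command, and a server with a fixed limit can refuse at that point with a 552 reply instead of after receiving the whole body. The Python sketch below models only that server-side decision; the function name and reply wording are illustrative assumptions, not Postfix internals.

```python
import re

# Hypothetical server-side limit, matching the 140,000-byte cap described above.
SIZE_LIMIT = 140_000

# Matches the optional RFC 1870 SIZE=<bytes> parameter on a MAIL FROM command.
SIZE_PARAM = re.compile(r"\bSIZE=(\d+)", re.IGNORECASE)


def check_mail_from(command: str, limit: int = SIZE_LIMIT) -> str:
    """Return the SMTP reply a size-limited server might give to MAIL FROM."""
    match = SIZE_PARAM.search(command)
    if match and int(match.group(1)) > limit:
        # The client declared an oversized message: refuse before DATA is sent.
        return "552 Message size exceeds fixed limit"
    # No SIZE parameter (or a small one): the body will be transferred.
    return "250 Ok"


print(check_mail_from("MAIL FROM:<worm@example.com> SIZE=150000"))
# prints: 552 Message size exceeds fixed limit
```

A client that omits the SIZE parameter still transmits the full body, and the server can only reject it afterwards, which is why the bandwidth saving depends on the sender speaking ESMTP.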

Finally, one day in early October, an acquaintance tried to send me an email with a number of attachments. The email exceeded 140,000 bytes, so I decided it was time to set my size limit back to a saner (perhaps?) value. As soon as I made that change, Gibe.F was back with a vengeance. As of October 19, 2003 I was receiving up to 330 viruses per day.

I began checking where the viruses were coming from and emailing the affected Internet providers, asking them to investigate their clients and get the problems resolved. This task was time-consuming and tedious; it had to be automated.

First Attempt at Automating the Fight

Reading the headers of the Gibe.F-infected emails made it apparent that the virus does not spoof headers. This makes it relatively easy to determine an email's true source: the last Received header gives the IP address from which the email originated. Looking up that address in DNS or, failing that, with whois leads us to appropriate email addresses for lodging a complaint.
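A minimal Python sketch of that first step follows (the author's version is a Perl script using Mail::Audit). It assumes "last Received header" means the bottom-most one in the message, which is the header added by the first server that accepted the mail, and it returns the first valid IPv4 address in that header. The function name is hypothetical.

```python
import email
import re
from typing import Optional

# Dotted-quad candidates; octet ranges are checked separately below.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def originating_ip(raw_message: str) -> Optional[str]:
    """Return the first valid IPv4 address in the bottom-most Received header."""
    msg = email.message_from_string(raw_message)
    received = msg.get_all("Received") or []
    if not received:
        return None
    # Each relay prepends its Received header, so the last one in file order
    # was added by the first server that accepted the message.
    for candidate in IPV4.findall(received[-1]):
        if all(0 <= int(octet) <= 255 for octet in candidate.split(".")):
            return candidate
    return None
```

This trusts the bottom-most header only because the worm is known not to forge Received lines; for spam with forged headers the same logic would point at the wrong host.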

Now, I'm sure someone else has done this before, but that's not nearly as much fun, so I wrote some scripts to do the job for me. I started with a simple Perl script that finds where the email came from (an IP address) and maps that to a domain name. The exact technique I use is as follows:

  1. Find an IP address in the last Received header. In the rare case where the header contains multiple IP addresses, my script grabs the first one (the most likely to be correct).
  2. Perform a DNS lookup of the IP address.
  3. Determine whether the hostname found in DNS accepts email by doing another DNS lookup, this time for a mail exchanger (MX) record.
  4. If I cannot find a mail exchanger for the hostname, I go up one level in the DNS hierarchy(1) and look for a mail exchanger there, repeating until one is found or the names run out.
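Steps 2 to 4 can be sketched in Python as follows. Because the standard library has no MX lookup, the MX check is passed in as a function (`has_mx`), and the PTR lookup defaults to `socket.gethostbyaddr`. These names and this structure are assumptions for illustration, not the author's Perl code.

```python
import socket
from typing import Callable, List, Optional


def candidate_domains(hostname: str) -> List[str]:
    """ppp123.wehave.net -> ['ppp123.wehave.net', 'wehave.net', 'net']"""
    labels = hostname.rstrip(".").split(".")
    return [".".join(labels[i:]) for i in range(len(labels))]


def reverse_lookup(ip: str) -> Optional[str]:
    """Step 2: PTR lookup through the system resolver; None if there is none."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return None


def find_sender_domain(
    ip: str,
    has_mx: Callable[[str], bool],
    ptr: Callable[[str], Optional[str]] = reverse_lookup,
) -> Optional[str]:
    """Steps 2-4: map an IP to the closest enclosing domain that has an MX record."""
    hostname = ptr(ip)
    if hostname is None:
        return None  # no PTR record: give up on this message
    for domain in candidate_domains(hostname):
        if has_mx(domain):  # step 3, then step 4 one level up each time
            return domain
    return None
```

With the third-party dnspython package, `has_mx` could wrap `dns.resolver.resolve(domain, "MX")` and return False on `dns.resolver.NXDOMAIN` or `dns.resolver.NoAnswer`. Injecting the lookups also lets the hierarchy walk be tested without network access.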

Feel free to download my script, which I call find-sender-domain. Here is the source code:


    

Using this script is simple. Feed it an email message on standard input, or pass it the name of a file containing an email. If the script cannot find an appropriate domain name, it exits with an error (exit status 1); if it does find one, it prints that domain name on standard output. The appropriate place to send complaints is abuse@domainname; I also CC postmaster@domainname for good measure.
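Addressing the complaint can be sketched with Python's `email.message.EmailMessage`. The form-letter wording, the `build_complaint` name and the choice to paste the offending message's headers into the body are hypothetical, for illustration only.

```python
from email.message import EmailMessage

# Hypothetical form-letter text; adapt to taste.
TEMPLATE = (
    "A computer on your network appears to be infected with the W32/Gibe.F worm\n"
    "and is sending copies of it by email. Please investigate.\n\n"
    "Headers of one infected message follow:\n\n"
)


def build_complaint(domain: str, sender: str, evidence_headers: str) -> EmailMessage:
    """Address a complaint to abuse@domain, with postmaster@domain on CC."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = f"abuse@{domain}"
    msg["Cc"] = f"postmaster@{domain}"
    msg["Subject"] = "Worm-infected host on your network"
    msg.set_content(TEMPLATE + evidence_headers)
    return msg
```

Assuming a local mail server, `smtplib.SMTP("localhost").send_message(msg)` would deliver the message to both the To and Cc recipients, since `send_message` reads them from those headers.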

Now that we have an easy way of finding out who sent us the worm we need to identify all copies of the worm and send appropriate emails.

I have a procmail rule that puts all HTML email into an HTML folder. This simple procmail rule catches almost all of my spam and, as luck would have it, most copies of the Gibe.F worm.
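The procmail rule itself isn't shown, so as a stand-in here is a Python sketch of one plausible test: a message counts as HTML if it, or any of its MIME parts, is `text/html`. Whether the original rule matched exactly this is an assumption.

```python
import email


def is_html_mail(raw_message: str) -> bool:
    """True if the message or any of its MIME parts is text/html."""
    msg = email.message_from_string(raw_message)
    # walk() yields the message itself first, then every nested part.
    return any(part.get_content_type() == "text/html" for part in msg.walk())
```

Checking every part rather than only the top-level Content-Type also catches multipart/alternative messages that carry an HTML version alongside plain text.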

I use Clam AntiVirus to scan my HTML email folder and tell me which emails contain the worm. Since my folders are in maildir format, each email is in its own file. Having a single email per file makes it easy to tell which email contains a virus: every infected file the scanner reports is one infected email that can be fed directly to the find-sender-domain script.
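Turning the scanner's report into a list of infected files can be sketched in Python by parsing clamscan's `path: SignatureName FOUND` lines. The maildir paths and file names in any example input are invented for illustration.

```python
import re
from typing import Dict

# clamscan reports each hit as "<path>: <signature> FOUND". The greedy path
# group copes with maildir filenames, which themselves contain a colon.
FOUND = re.compile(r"^(?P<path>.+): (?P<sig>\S+) FOUND$")


def parse_clamscan(output: str) -> Dict[str, str]:
    """Map each infected file reported by clamscan to its signature name."""
    hits = {}
    for line in output.splitlines():
        match = FOUND.match(line.strip())
        if match:
            hits[match.group("path")] = match.group("sig")
    return hits
```

The output could come from `subprocess.run(["clamscan", "-r", "-i", "--no-summary", folder], capture_output=True, text=True).stdout`. Note that clamscan exits with status 1 when it finds a virus, so `check=True` would raise in exactly the interesting case.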

This could all be hooked into procmail and done automatically, but I was lazy and wrote a few separate scripts that run every 5 minutes from a cron job. It's pretty atrocious, but it does work:


    

If you understand shell scripting the above script should make some sense to you. Here is an explanation of what the script does:

  1. Scan the HTML folder for viruses
  2. For every copy of Gibe.F found, try to find the domain of the sender.
  3. If a sender domain is found, move the virus into a "Complained" folder and send a form letter to the suspected network admins. The Complained folder holds viruses that I received and have sent a complaint about; I keep them as proof and to satisfy requests for additional information (should that be needed).
  4. If I cannot determine a sender domain for the virus, move it to a Gibe.F folder. This folder contains definite viruses that I haven't successfully lodged a complaint about. If I get ambitious later I may go back to these.
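The branching in the last two steps can be sketched as a small Python triage routine. It assumes each folder is a plain maildir directory directly under the maildir root, named Complained and Gibe.F as in the text; the `triage` name and the `find_domain`/`send_complaint` callbacks are hypothetical placeholders for find-sender-domain and the form-letter step.

```python
import shutil
from pathlib import Path
from typing import Callable, Optional


def triage(
    infected: Path,
    maildir_root: Path,
    find_domain: Callable[[Path], Optional[str]],
    send_complaint: Callable[[str, Path], None],
) -> Path:
    """File one infected message as 'Complained' or 'Gibe.F'; return its new path."""
    domain = find_domain(infected)
    if domain is not None:
        # Complain first, so a failed send leaves the message where it was.
        send_complaint(domain, infected)
        folder = maildir_root / "Complained"
    else:
        folder = maildir_root / "Gibe.F"
    # A maildir folder needs cur/, new/ and tmp/ subdirectories.
    for sub in ("cur", "new", "tmp"):
        (folder / sub).mkdir(parents=True, exist_ok=True)
    destination = folder / "cur" / infected.name
    shutil.move(str(infected), str(destination))
    return destination
```

Mail clients that follow the Maildir++ convention expect a leading dot on folder names, so a real setup would adjust the folder paths accordingly.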

Besides calling the find-sender-domain script, the shell script also calls a get-headers script. Since I'd used Mail::Audit in the first Perl script, I decided to stick with it for this one. Below is a copy of the get-headers script:


    

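A Python counterpart to the get-headers idea (the author's version uses Perl's Mail::Audit) can simply cut the message at the first empty line, which is where the header section ends in an Internet message. This sketch assumes the goal is to quote the worm's headers in a complaint without its payload; the function name is hypothetical.

```python
import re

# The header section ends at the first empty line (LF or CRLF line endings).
BLANK_LINE = re.compile(r"\r?\n\r?\n")


def get_headers(raw_message: str) -> str:
    """Return only the header block of a raw message, dropping the body."""
    return BLANK_LINE.split(raw_message, maxsplit=1)[0] + "\n"
```

Working on the raw text rather than a parsed message keeps folded header lines exactly as received, which is what an abuse desk will want to see.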
Clamscan detects a lot of the Gibe.F worms as "Exploit.IFrame.Gen" rather than as Gibe.F. This is because some Gibe.F worms are wrapped in an exploit that causes Outlook to execute the attachment automatically. Since I can't be 100% sure that these are Gibe.F without investigating further, I have written a slightly modified script for the worms wrapped in Exploit.IFrame.Gen.
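The split between definite and suspected Gibe.F can be expressed as a tiny routing function in Python. The labels it returns are hypothetical, and so is the assumption that the scanner's name for Gibe.F contains the string "Gibe"; only Exploit.IFrame.Gen comes from the text.

```python
from typing import Optional


def route_detection(signature: str) -> Optional[str]:
    """Decide how a clamscan hit is handled (labels are hypothetical)."""
    if "Gibe" in signature:
        return "complain"     # definite Gibe.F: complain automatically
    if signature == "Exploit.IFrame.Gen":
        return "investigate"  # possibly wrapped Gibe.F: use the modified script
    return None               # some other detection: leave it alone
```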

Results to Date

I wrote these scripts on October 19, 2003. After about 24 hours of use I had emailed 1,242 complaints (almost a week's backlog). I received numerous automated replies from abuse departments, as I expected, but also quite a few personal replies, even thank-you letters. Here are a few responses:

In the first 2 days about 6 people lost their Internet connectivity as a direct result of my emails. Many more now understand that keeping their computer updated is important. Unfortunately, most will now believe that anti-virus software is very important; unfortunate because Linux or Mac OS would be a much better choice for them than sticking with Microsoft Windows.

Ten days after implementing this, the results have been excellent. Here are my statistics on the number of viruses received:

I have no proof of whether people just got their act together or whether my complaints to ISPs really made a difference; I like to think the latter.

Problems/Possible Enhancements

There are a number of issues that this script simply does not handle.

  1. Many IP addresses do not have hostnames assigned to them (no PTR records); we ignore email from these IPs.
  2. Many domains do not have mail exchanger (MX) records and rely on their A record to receive email. While their email may work, we have no way to confirm that it does, so we do not email these domains.
  3. Again on the topic of poorly set up email, many systems on the Internet do not have an abuse or postmaster account. For now I manually complain to these domains (if time permits) and ask them to consider setting up appropriate aliases.

Some improvements that I believe can be made:

  1. Since the script doesn't check for spoofed headers, it is only useful for viruses and worms that we know leave correct headers. If I add spoofed-header detection, and that detection can be made reliable, then this script could automate all manner of email complaints (spam, worms, viruses, ???).
  2. In cases where an MX record cannot be found, we could look up the IP address using whois instead(2).

If you have any comments about this webpage or better methods of fighting back please email me.

Footnotes

  1. Given a hostname of ppp123.wehave.net, I check whether an MX record exists for ppp123.wehave.net. Failing that, I check for an MX record for wehave.net. Failing that, I check for an MX record for net.
  2. I have not implemented whois lookups yet because the whois records for IP addresses served by ARIN, APNIC, LACNIC and RIPE are not consistent. It may prove difficult to drill down into the whois databases and automatically find an email address appropriate for abuse reports. If anyone knows of existing scripts that do this work please let me know.
  3. As of early 2004, I gave up on spam and virus fighting. I now use whitelists to pre-approve all email. I use TMDA, which largely automates this: it keeps the bad guys away while allowing legitimate senders (even those unknown to me) to communicate.