SuYash Security Zone: Filtering E-Mail with Postfix and Procmail

Referred: http://www.securityfocus.com/infocus/1593
Brian Hatch 2002-06-17
Note: This article is reproduced for study and reference purposes only. No commercial benefit is involved.

Introduction: An Overview of Server Solutions for Spam Reduction

Most folks dislike spam in their e-mail. Spam takes up our network, disk, and CPU resources. It requires that we weed through unwanted messages to find the ones that we requested. (I'm not going to try to convince you that spam is not good; you can check out some of the anti-spam resources listed in the relevant links section below if you're interested.) Fortunately, there are several points along the network at which we can implement spam filtering, particularly:

SMTP daemon;

local mail delivery agent; and,

the e-mail client

SMTP Daemon

As the e-mail message is received by our SMTP daemon (Postfix, Qmail, Sendmail, etc) we have our earliest chance to reject the offending message. If it is possible to determine that the message is spam at this stage, we are able to keep the message off our systems with the least amount of our own resources. For example if we know that the user 'yet-more-spy-software@example.com' is a spammer, we can reject the message as soon as the 'mail from:' SMTP command is sent. This has the benefit of being able to send back an immediate bounce message, and also means that we do not waste our bandwidth receiving a mail we would just be deleting anyway.

Local Mail Delivery Agent

Once the SMTP server determines that a mail is destined for a local user, it hands off the e-mail to a mail delivery agent to stick it in that user's mailbox. Sometimes this program is the SMTP daemon itself, but often it is not. It is common to use Procmail as a local delivery agent with Postfix, for example, even though Postfix has a delivery agent built in.

The local delivery agent can use its own spam determinations to plop the offending messages into dedicated spam folders or just dump them to /dev/null. Many local delivery agents are able to call external commands to filter messages directly or help make additional friend-or-foe determinations.

The E-Mail Client

The last place you can guard against spam is in your e-mail client. Some mailers can scan the message headers or content to route spam to separate boxes, much like a local delivery agent could. However, this usually requires that the e-mail client retrieve the mail itself, for example via POP or IMAP, rather than operating on a local spool.

Filtering Spam with Postfix and Procmail

This article is the first of a three-part series that will help systems administrators to implement the first two methods to filter out unwanted e-mails before they arrive in the end-user's in-box. Server configurations are more powerful and can filter for all users on the system at once, while e-mail client configurations can only protect an individual's right to spam-free e-mail. As systems administrators, most SecurityFocus readers will prefer that the unwanted e-mail doesn't even reach the client at all. Although mail filtering is an option for many SMTP servers and delivery agents, this series will only focus on two: Postfix and Procmail, respectively.

An Overview of Postfix

The Postfix system is broken into many different programs, each of which runs with minimal permissions. The master process camps on port 25 as root and hands over connections to the smtpd daemon, which runs without root privileges. This program deposits e-mails in a queue directory where they are processed by another Postfix program and, eventually, delivered to a local mailbox. For a more detailed view of Postfix's inner workings, see Postfix-the big picture.

The entire Postfix configuration is contained in the /etc/postfix directory. The main configuration file is main.cf, which controls the variables used by all the pieces. The master.cf file controls which processes are run. (For example, you turn off inbound SMTP simply by commenting out the 'smtpd' line in master.cf.) After any change, you simply run 'postfix reload' to reload the configuration. As you might anticipate, 'postfix stop' and 'postfix start' stop and start the whole Postfix system.

Rejecting Spam with Postfix

The Postfix smtpd server has simplistic but effective spam detection abilities. It has a small set of data that it can look at to make a friend-or-foe determination, including:

Remote machine ip address/hostname

SMTP session based options (HELO/EHLO/etc)

Envelope sender

Envelope recipient

Message Header/Body

When Postfix determines that a message is spam, it will pause the SMTP conversation for a bit and then send an error. The delay is to slow down the sending system, partially in an effort to annoy dedicated spamming systems.

If Postfix can determine that a message is spam, it will respond to the sending machine with an error code, which consists of a three-digit number followed by informational text. There are two main kinds of error codes: 4xx codes indicate a temporary failure, which means that the sending machine should try the mail again, while 5xx codes indicate a permanent failure. The most commonly used codes are shown in the following table:

Code	Meaning
554	Transaction failed (This is the default for messages using Postfix' REJECT syntax.)
550	Requested action not taken: mailbox unavailable. (Good for sending permanent errors.)
450	Requested mail action not taken: mailbox unavailable. (Good for sending temporary errors.)

For a full list of error codes, see RFC-821. We'll see how we can select error codes in the next section.
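The 4xx/5xx distinction above comes down to the first digit of the reply code. As a minimal sketch (the function name classify_reply is hypothetical and is not part of Postfix), a sending client could sort replies like this:

```python
def classify_reply(reply_line):
    """Classify an SMTP reply line by the range of its three-digit code."""
    code = int(reply_line[:3])  # raises ValueError if the line has no numeric code
    if 200 <= code < 400:
        return "success or intermediate"
    if 400 <= code < 500:
        return "temporary failure"   # the sender should queue the mail and retry later
    if 500 <= code < 600:
        return "permanent failure"   # the sender should give up and bounce the mail
    raise ValueError(f"not an SMTP reply code: {code}")


for line in ("250 Ok", "450 Requested mail action not taken", "554 Transaction failed"):
    print(line[:3], "->", classify_reply(line))
```

This is why a 450 is the safer choice while testing new rules: a well-behaved sender keeps the message and tries again.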

When debugging your spam rejection rules it may be convenient to supply 4xx error codes so that the client retries messages again later. This will keep you from losing legitimate e-mail as you tweak your spam blocking rules.

Postfix Map Files

Many of the Postfix configurations we will discuss in this series can employ external map files. These files contain e-mail addresses, host names, or other data that you may wish to accept or reject.

As an example, we'll take a look at a map that could be used by the various smtpd restriction commands:

    # OK means to accept the message.  Useful to make sure that certain mail
    # that would satisfy later restrictions gets accepted anyway.  Make sure
    # you list these rules first.
    #
    trusted_friend@example.com         OK
    grandpa_george@some_domain.org     OK

    # REJECT drops the message with a default error message.
    #
    hostile_domain.com                 REJECT

    # Any 5xx code means a fatal error, and the mail client should not try again.
    # Here we'll make a specific rejection message to send as part of the
    # bounce message, rather than the default REJECT version.
    #
    ex-girlfriend@some_host.net        554 Get a life and move on.

    # Code 4xx means that the client should retry later, it's a non-fatal error.
    #
    busy-mailinglist@example.org       450 Sorry, we're performing an upgrade of
      our mailinglist software, be back on Thursday.

The first argument on the line lists some data that will be used by a spam-determination rule, usually a hostname or e-mail address. (For regular expression maps, the first argument is everything between a set of slash (/) characters. Those are covered later.) The remainder of the line is the SMTP result that will be returned if the first part matches. Each Postfix restriction (based on message sender, HELO response, etc) can have different return values. However, they all include the "OK", and "REJECT" options, which mean to accept or reject permanently the e-mail, respectively.
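To make the key/result split concrete, here is a rough sketch of how one line of such a flat file breaks apart. The function parse_access_line is hypothetical, and the sketch is a simplification: it handles comments and blank lines but ignores continuation lines (lines that begin with whitespace), which the real map format allows.

```python
def parse_access_line(line):
    """Split one map line into (key, result); return None for comments and blank lines."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    # The key is the first whitespace-delimited token; everything after it is the result.
    parts = stripped.split(None, 1)
    key = parts[0]
    result = parts[1] if len(parts) > 1 else ""
    return key, result


print(parse_access_line("hostile_domain.com\t\t REJECT"))
print(parse_access_line("ex-girlfriend@some_host.net      554 Get a life and move on."))
```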

These flat files need to be converted to a Postfix lookup table, generally a hash, dbm, or btree file. Lookup tables are indexed versions of flat files, which means finding entries is extremely fast. Hash and btree files are named with a .db extension, while dbm files are actually two files, one with a .pag and one with a .dir extension. Unless you specifically prefer one form over another, you can stick with your system's default, which you can learn by running the following command:

    $ postconf | grep database_type
    default_database_type = hash

In the remainder of the article I'll explicitly use hash files, just to be specific - the choice is up to you. Thus, if I were to include the 'check_client_access' restriction (which we'll get to later) into some rule, I'd be using:

    check_client_access hash:/etc/postfix/access

In the case shown above, our Postfix installation prefers hash (.db) files. In our /etc/postfix/main.cf we will be pointing to such indexes, and will need to name both the file name and the indexing method. Thus, if I were to tell the 'check_client_access' restriction (which we'll get to later) that it should perform lookups in the access file, I'd say the following in /etc/postfix/main.cf:

    some_configuration_rule = check_client_access      hash:/etc/postfix/access

The check_client_access determination will look at the /etc/postfix/access file hash (technically access.db) to lookup keys.

Any time you make a change to files that are being used in this way, you'll want to rerun the postmap command, and restart Postfix. Rather than manually run postmap for each file, you can save some time by creating a Makefile in /etc/postfix, such as this one:

    $ cat /etc/postfix/Makefile
    MAPS: map1.db map2.db map3.db

    %.db: %
            postmap $<

List all the maps you need (access.db, virtual.db, etc) in the MAPS section. The bit at the end just tells make to run postmap on the plain text file to create the db (or dbm, etc) file. So each time you make a change to a map, simply do the following:

    # cd /etc/postfix
    # make
    # postfix reload

Most Postfix installations chroot the various daemon programs, which means they cannot see that you've changed the files in /etc/postfix. So remember to reload the Postfix system each time you make a change to these files.

Blocking E-Mail Based on the SMTP Client

Postfix can block e-mail based on the machine that is attempting to send the e-mail. All Postfix has available is the machine's IP address and host name (derived from a reverse DNS lookup). There are three main methods available: rejecting machines with no reverse DNS entries, explicit entries in map files, and DNS blackhole lists.

Blocking Machines With No Reverse DNS

Most administrators make sure that there is a valid reverse DNS entry for all their machines. This is useful in many respects, such as allowing the machine to access systems protected by TCP Wrappers. You can have Postfix deny access from machines that do not have reverse DNS entries by creating the following entry in /etc/postfix/main.cf:

    smtpd_client_restrictions = reject_unknown_client

Every IP address can have a reverse DNS entry (PTR record in DNS-speak) that maps that IP address to a host name.
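For illustration, the name under which a PTR record lives is simply the address with its octets reversed, under in-addr.arpa. The short sketch below uses Python's standard ipaddress module to show that name without performing any DNS query; the helper name ptr_name is hypothetical.

```python
import ipaddress


def ptr_name(ip):
    """Return the DNS name that would hold the PTR record for this address."""
    return ipaddress.ip_address(ip).reverse_pointer


print(ptr_name("10.0.10.1"))
```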

Some host-based authorization services use reverse DNS to determine if a machine is allowed to access certain resources. TCP Wrappers, for example, can be used to allow only machines in some_domain.com to access the SSH server. For this reason, most hosts configured by competent administrators will have a valid reverse DNS entry.

Spammers, on the other hand, often use machines with no reverse DNS entry, because otherwise we'd be able to block e-mail based on the host name. However, we can still block such hosts precisely because they lack a reverse DNS entry, by using smtpd_client_restrictions = reject_unknown_client.

Standard practice encourages PTR records for all IP addresses. However, there is no rule or RFC that says that this is required. There are many machines out there that do not have PTR records, so using this rule will block legitimate e-mail. If you do enable this restriction, be prepared to manually maintain a map file that lists IP addresses of machines that are allowed to send e-mail to you.

SMTP Client Map Restrictions

You can create a map file that lists any combination of valid or invalid machines, such as this one:

    # Whoops, we need to talk to these machines
    # but they have no reverse DNS set up:
    10.0.10.1           OK
    10.0.10.5           OK

    # Reject these guys, they keep sending us junk mail
    # and won't take us off their lists
    spam_central.com    REJECT

You tell Postfix to look at this file (let's say it's a hash named /etc/postfix/client_restrictions) by adding the following line to /etc/postfix/main.cf:

    smtpd_client_restrictions = check_client_access hash:/etc/postfix/client_restrictions

DNS Blackhole lists

Many Internet users have created lists of machines (IP address) that regularly send spam. Originally, these lists were published and those who wished to block those hosts would import the data into their spam blocking mechanisms. Unfortunately, this wasn't easily automated, and you needed to perform the updates periodically. Paul Vixie came up with an ingenious solution to this problem. He created the Real Time Blackhole List (RBL), which is available via DNS rather than flat files. The RBL is now part of the MAPS (Mail Abuse Prevention System) at mail-abuse.org.

There are now many different sites that offer RBL-style lookups, often referred to as DNSBL (DNS-based Blackhole Lists). You simply configure the DNSBL domains you wish Postfix to handle, and it will check the client IP address against these lists. If the IP address is registered at any of the DNSBL domains, then the message will be rejected.
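Under the hood, a DNSBL check is an ordinary DNS lookup: the client's IPv4 octets are reversed and prepended to the list's domain, and an address record in the answer means the client is listed. The sketch below only builds the query name and does no network I/O; dnsbl_query_name is a hypothetical helper, not a Postfix function.

```python
import ipaddress


def dnsbl_query_name(client_ip, dnsbl_domain):
    """Build the DNS name to look up for an IPv4 client on a given DNSBL."""
    addr = ipaddress.IPv4Address(client_ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{dnsbl_domain}"


print(dnsbl_query_name("192.0.2.99", "bl.spamcop.net"))
```

Because each configured list costs one lookup per connecting client, adding many DNSBL domains adds latency to every SMTP session.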

If you wanted to use the bl.spamcop.net list you would add the following entry to your main.cf:

    smtpd_client_restrictions = reject_maps_rbl
    maps_rbl_domains = bl.spamcop.net

You can list as many DNSBL domains as you wish and all will be queried. To use the MAPS RBL+ as well, your entry would read:

    smtpd_client_restrictions = reject_maps_rbl
    maps_rbl_domains = rbl-plus.mail-abuse.org
                       bl.spamcop.net

The most difficult part of implementing DNSBL blocking is determining which DNSBL you wish to use. You will need to read about the procedures used by the DNSBL you're considering in order to determine if you believe that their tests and motives are correct - both for adding and removing IP addresses from their databases. This is left as an exercise for the reader.

Personally, I use the DNSBL (which is currently free for 'hobbyists' although you must register) and bl.spamcop.net (which is free, but you should really donate to help support it.) Together these block about thirty e-mails to my server a day.

Bundling Postfix Restriction Options

We've seen three separate smtpd_client_restrictions we could implement. As you might expect, we can enable all of them by listing them together as:

    smtpd_client_restrictions = check_client_access hash:/etc/postfix/client_restrictions,
         reject_unknown_client, reject_maps_rbl

The order is important. If you want to be able to allow mail from a domain that is listed in a DNSBL database, then you need to make sure your map file (which contains the whitelisted machine with an OK) is checked before you check the DNSBL listings.

Blocking Spam Based on SMTP Compliance

The next method to block spam involves checking the remote system's compliance with the SMTP protocol, and verifying the data it provides. The rule of the Internet is to 'be strict about what you send and liberal about what you accept'; however, we will break that rule by dropping connections that aren't up to snuff.

SMTP HELO

Every SMTP session is supposed to begin with the client declaring who it is by sending a HELO or EHLO (extended helo) command like this:

    $ nc mailserver 25
    220 mailserver.example.com ESMTP ReegenMail
    EHLO my.host.name.some_where.com
    250-mailserver.example.com
    250-PIPELINING
    250-SIZE 10240000
    250-ETRN
    250-XVERP

This article is the second of three articles that will help systems administrators configure SMTP daemons and local mail delivery agents to filter out unwanted e-mails before they arrive in the end-users' in-box. In the first installment, we offered a brief overview of Postfix, and began a discussion of rejecting spam with Postfix by blocking e-mail based on the SMTP client, blocking machines with no reverse DNS, SMTP client map restrictions, DNS blackhole lists, bundling Postfix restriction options, and blocking spam based on SMTP Compliance. In this part, we will look at sender/recipient restrictions, restriction ordering, and map file naming conventions before moving on to Procmail in the final article.

Postfix Sender/Recipient Restrictions

Postfix can easily make spam determinations based on the sender or recipient of the e-mail. It is important to understand that there are potentially two different addresses for the sender and recipient of each e-mail. The first is the envelope address, which is presented during the SMTP conversation. The second is the actual "From:" or "To:" address in the message header.

To illustrate, let's look at an SMTP session, again using our friend netcat:

    # nc smtp.example.com 25
    220 smtp.example.com ESMTP ReegenMail
    helo some.host.dom
    250 smtp.example.com
    mail from: spammer@no_such_domain.com
    250 Ok
    rcpt to: innocent_bystander@example.com
    250 Ok
    data
    354 End data with <CR><LF>.<CR><LF>
    From: FooBar@some_address.net
    To: My_Friend@some_other_address.net
    Subject: You've just got to see this!

    (blahblahblah)
    .
    250 Ok: queued as 1F75BD8
    quit
    #

The above message has an envelope sender of "spammer@no_such_domain.com". If this mail bounces, this is the address to which an error message will be sent. SMTP servers don't look at the actual contents of the e-mail to determine where the e-mail should go. Thus, although the "From:" line indicates that the message was sent from "FooBar@some_address.net", this is easily forgeable. Similarly, the envelope recipient here is "innocent_bystander@example.com" while it is listed as "My_Friend@some_other_address.net" in the mail headers themselves. The restrictions immediately available to Postfix are the envelope addresses (MAIL FROM/RCPT TO), not the ones in the mail headers (DATA).
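The split between envelope and headers can be seen by parsing the DATA portion of the session above on its own. In this sketch the envelope values are plain variables (in a real server they come from the MAIL FROM and RCPT TO commands), and Python's standard email module reads the headers; the variable names are illustrative only.

```python
from email import message_from_string

# Envelope addresses, as given in the SMTP dialog (MAIL FROM / RCPT TO).
ENVELOPE_SENDER = "spammer@no_such_domain.com"
ENVELOPE_RECIPIENT = "innocent_bystander@example.com"

# Everything the client sent after DATA, up to the terminating ".".
RAW_MESSAGE = (
    "From: FooBar@some_address.net\n"
    "To: My_Friend@some_other_address.net\n"
    "Subject: You've just got to see this!\n"
    "\n"
    "(blahblahblah)\n"
)

msg = message_from_string(RAW_MESSAGE)
header_from = msg["From"]
header_to = msg["To"]

# The two pairs are independent; nothing forces them to agree.
print("envelope sender:   ", ENVELOPE_SENDER, "| header From:", header_from)
print("envelope recipient:", ENVELOPE_RECIPIENT, "| header To:  ", header_to)
```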

Sender Restrictions

Spammers frequently use bogus sender addresses to avoid dealing with the inevitable bounces and flames. Postfix can try to determine if an address is invalid and, if so, assume the message is spam. Unfortunately, Postfix cannot actually verify that the e-mail address is valid without attempting to send mail to it. However, it can check to see that it could be possible to send e-mail to that address by verifying that there is an MX or A record for the domain itself. To enable this, use the following restriction:

    smtpd_sender_restrictions = reject_unknown_sender_domain

If the sender domain is unknown, Postfix responds with a 450 (temporary) error, which means that the client should retry the message later. This is important, as you don't want a temporary DNS problem to result in the loss of legitimate e-mail. However, if there's no way for you to possibly reply to this message, it is most likely spam.

"MAIL FROM" addresses should be fully qualified ("user@mailserver.example.com", instead of "user@mailserver" for example) if you want to be sure you can send return e-mail. You can drop mail that is not fully qualified by using:

    smtpd_sender_restrictions = reject_non_fqdn_sender
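As a rough illustration of what "fully qualified" means here, the following sketch accepts an address only if its domain part has at least two non-empty labels. It is an assumption-laden approximation with a hypothetical function name, not the test Postfix itself performs.

```python
def is_fully_qualified(address):
    """Rough check: the address has a local part and a domain with at least one dot."""
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        return False
    labels = domain.rstrip(".").split(".")
    # Require two or more labels, none of them empty (rejects "host" and "a..b").
    return len(labels) >= 2 and all(labels)


for candidate in ("bree@mailserver.example.com", "bree@mailserver", "bree"):
    print(candidate, "->", is_fully_qualified(candidate))
```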

Lastly, you can create a map file listing valid or invalid senders, such as:

    # allow larry@example.com     larry@example.com          OK       # reject anything else coming from @example.com     example.com		       REJECT

and integrate it with your spam determination via:

    smtpd_sender_restrictions = check_sender_access maptype:mapname

If you wanted to include all these restrictions, the most logical method would be to specify:

    smtpd_sender_restrictions = check_sender_access maptype:mapname,
        reject_non_fqdn_sender, reject_unknown_sender_domain

Recipient Restrictions

Our last restriction is based on the message recipient (again, the address listed in the 'RCPT TO' SMTP dialog, not the 'To:' address). The recipient restrictions are similar to the sender restrictions, namely reject_unknown_recipient_domain, reject_non_fqdn_recipient, and check_recipient_access, and they work in the same manner. Thus, you could include the following options in addition to any you already have:

    smtpd_recipient_restrictions = (other restrictions here)
        check_recipient_access maptype:mapname,
        reject_non_fqdn_recipient, reject_unknown_recipient_domain

Your Postfix configuration should already have some smtpd_recipient_restrictions value set. In order to prevent your mail machine from acting as an open relay, Postfix requires that you have some rules (using the "check_relay_domains", "reject_unauth_destination", or "reject" options) to dictate which machines are allowed to relay e-mail through your server. The correct settings for these are specific to your environment. A common entry is "permit_mynetworks, check_relay_domains", so to add spam detection rules, you might change this to:

    smtpd_recipient_restrictions = permit_mynetworks,
        check_relay_domains, check_recipient_access maptype:mapname,
        reject_non_fqdn_recipient, reject_unknown_recipient_domain

Just as a reminder, make sure that you don't eliminate your current anti-relaying settings while trying to add spam filtering, or your machine will reject all incoming e-mail!

Postfix Restriction Ordering

Postfix processes the various restrictions in the following order:

    smtpd_client_restrictions
    smtpd_helo_restrictions
    smtpd_sender_restrictions
    smtpd_recipient_restrictions

This is, unsurprisingly, the order in which the SMTP session occurs. These restrictions may (and often do) contain multiple checks. Each one of them will result in one of three values:

OK - The mail is good, let it through;

DUNNO - Inconclusive results from this restriction. Check the remaining restrictions to see what to do; and,

REJECT - The mail should be rejected. Don't check any later restrictions.

If a restriction returns "DUNNO", then the later restrictions will be checked. If it returns either "OK" or "REJECT", then the later restrictions are NOT checked. As soon as one of the restrictions determines that the message definitely is or definitely is not spam, no further checks are performed.
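The first-decisive-answer rule described above can be modelled in a few lines. This is a toy model of the behaviour as the article describes it, not Postfix code; the check functions and the message dictionary are hypothetical stand-ins for a map lookup and a DNSBL lookup.

```python
def evaluate(restrictions, message):
    """Run checks in order; the first result other than DUNNO decides the outcome."""
    for check in restrictions:
        result = check(message)
        if result != "DUNNO":
            return result
    return "DUNNO"


def check_sender_access(message):
    # Stand-in for a map lookup: one whitelisted sender, everyone else undecided.
    return "OK" if message["sender"] == "trusted_friend@example.com" else "DUNNO"


def reject_listed_client(message):
    # Stand-in for a DNSBL lookup on the client address.
    return "REJECT" if message["client_listed"] else "DUNNO"


mail = {"sender": "trusted_friend@example.com", "client_listed": True}

whitelist_first = evaluate([check_sender_access, reject_listed_client], mail)
blacklist_first = evaluate([reject_listed_client, check_sender_access], mail)
print(whitelist_first, blacklist_first)
```

The same message is accepted or rejected depending solely on which check runs first, which is why the ordering of checks deserves care.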

The problem arises if you want to have later restrictions override a previous restriction. For example, you may have an e-mail address 'spam-troll@example.com' that exists solely to receive spam and send it to a spam-reporting site. You would want all mail to this account to be delivered, even if earlier restrictions (say a HELO check) would normally block the e-mail.

The easiest method is to put all your restrictions into the smtpd_recipient_restrictions option. At the point that the recipient has been sent by the client, Postfix has everything but the actual e-mail itself to help make a friend-or-foe determination.

This may sound surprising - thus far I've been implying that the various options belong to a specific restriction. In actuality, I've been showing you the default place to put each check. A check can be performed in its default restriction or in any later one. A DNSBL check, for example, is a restriction based on the client IP address, and thus would most "logically" belong as:

    smtpd_client_restrictions = reject_maps_rbl

However, it could be put in any of the later restriction clauses as well, such as

    smtpd_recipient_restrictions = check_recipient_access
        hash:/etc/postfix/recipients, reject_maps_rbl

This would allow our spam-troll@example.com mail (presumably listed as "OK" in the /etc/postfix/recipients map file) to be delivered, even if the system sending the e-mail is listed in the DNSBL domains.

As you can see, it can be a bit tricky to tweak your Postfix restrictions just right, as you have multiple restrictions you can play with, and multiple checks you can do in each restriction. I suggest that you test and retest any changes you make to your configuration.

For a full list of the checks available for each of the restrictions, you should check the Postfix man pages, and also the Postfix UCE Web page. If you need to do advanced restrictions, in cases where sticking them all into smtpd_recipient_restrictions isn't sufficient, check out Meng Weng Wong's Postfix UCE guide, which provides helpful hints about pitfalls you're likely to encounter.

Header and Body Restrictions

All the restrictions listed thus far are performed during the initial SMTP conversation, before the actual e-mail is accepted. If these restrictions are met, then the client may issue a DATA command, and Postfix will begin slurping up the message data itself. However, Postfix has one more chance to reject the e-mail. After the client sends the "." to signify the end of the message data, Postfix would return a 250 response code if all is well. However, if you wish to reject e-mail based on the message contents, Postfix can respond with an error, and will not accept the message.

This method is useful because the client machine will think your address is invalid, which may get you taken off their spam list. It also means that you haven't sent the message along to your local delivery agent. However, you are receiving the e-mail over the network, so your bandwidth has not been spared.

There are two message content-checking algorithms. The first is header_checks, which compares lines against the message headers (everything from the beginning of the message up until the first blank line.) The other is body_checks, which checks only the body of the message. Using header_checks is faster than body_checks, because message headers are typically 1-2K, while the message body is much larger. Both use regular expression maps instead of the kinds of maps we used in the smtpd_*_restrictions. A regular expression map file looks like this:

    # comment
    /regular expression1/    ACTION
    /regular expression2/    ACTION

    # You can also specify two patterns, in which
    # case the line must match the first but not
    # the second
    /must_match_me/!/must_not_match_me/   ACTION

The valid actions are:

REJECT - Rejects the message with a 554 error code, and the generic error message "Message content rejected."

REJECT text - Rejects the message with a 554 error code, using the supplied text as the error message. (Not all versions of Postfix support this form)

WARN - logs the header via syslog, but does not actually reject the message. Good for testing. (Not all versions of Postfix support this.)

IGNORE - Causes the matching line to be deleted from the message. Yes, the term IGNORE is a bit of a misnomer.

OK - Valid, but useless.

The header_checks and body_checks look through the message until a REJECT, WARN, or the end of the message is reached. For this reason, "OK" doesn't really do anything except waste CPU cycles. Using "OK" will NOT short-circuit the logic and accept the message: Postfix will continue to look through the rest and may well find a REJECT line that matches.
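The point about "OK" is easy to miss, so here is a small model of the scan as described above. It is a simplified sketch with hypothetical rules and names, not Postfix's implementation: each header line is tried against the rules in order, a REJECT ends the scan, and an OK merely stops matching that one line.

```python
import re

# Hypothetical rule table: (compiled pattern, action), matched case-insensitively.
RULES = [
    (re.compile(r"^From: .*@example\.com", re.IGNORECASE), "OK"),
    (re.compile(r"^Subject: ADV ", re.IGNORECASE), "REJECT"),
]


def scan_headers(header_lines, rules):
    """Return "REJECT" as soon as any line hits a REJECT rule, else "ACCEPT"."""
    for line in header_lines:
        for pattern, action in rules:
            if pattern.search(line):
                if action == "REJECT":
                    return "REJECT"
                break  # OK: done with this line, but keep scanning the remaining lines
    return "ACCEPT"


spammy = ["From: larry@example.com", "Subject: ADV buy now"]
friendly = ["From: larry@example.com", "Subject: lunch?"]
print(scan_headers(spammy, RULES), scan_headers(friendly, RULES))
```

In the first message the From: line matches an OK rule, yet the message is still rejected because of its Subject: line.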

There are two kinds of regular expression maps that Postfix can support: regexp, which are standard regular expressions used by grep and friends, and pcre, which are perl-compatible regular expressions. Use whichever one has the functionality you need and you are most comfortable with. Thus your main.cf could include lines like this:

    header_checks = pcre:/etc/postfix/header_checks
    body_checks = pcre:/etc/postfix/body_checks

    # Or, if we prefer regexp instead:
    #   header_checks = regexp:/etc/postfix/header_checks
    #   body_checks = regexp:/etc/postfix/body_checks

Then simply create your header_checks and/or body_checks files. For example:

    # Header checks file
    /^Subject: Internet Sic Codes/  REJECT
    /^Subject: ADV /                REJECT

    # Encourage good Netiquette
    /^X-Mailer: Microsoft Outlook/  REJECT Sorry, we don't accept mail from LookOut!

    # Body checks file
    /you signed up to receive/      REJECT
    /this is not spam/              REJECT Liar.
    /rent is overdue/               REJECT Sorry, your message cannot be delivered successfully.

The regular expressions are case insensitive by default. If you want to force case sensitivity, include an "i" after the closing slash (yes, this is backwards from the norm). You can also extract sections of the matched pattern using parens, which you can then include in your error message using ${1}, ${2}, and so on. Read the sample regexp and pcre files included in the Postfix distribution for more information and examples.

How to Avoid Bouncing Legitimate E-Mail During Configuration

When you're starting to implement your spam prevention configuration, you're quite likely to make errors that turn your mail server into an anti-social mail-dropping hunk of silicon. You can add the "warn_if_reject" option before a restriction to tell Postfix to merely log that the message would have been rejected, without actually rejecting the e-mail. You then need to check your system logs for instances of "reject_warning" and make sure that the rules are acting correctly. Once you feel confident of your changes, remove the "warn_if_reject" option and your spam prevention is in place for real. Similarly, when tweaking header_checks and body_checks, use the WARN option instead of REJECT.

Map File Naming Conventions

The default Postfix configuration and sample files use the same file for all its maps: /etc/postfix/access. This means it would use this file for all its restrictions, which may be more restrictive than you wish. Say I had all my maps in /etc/postfix/access, like:

    smtpd_something_restrictions = ... ,
         check_recipient_access hash:/etc/postfix/access,
         check_sender_access hash:/etc/postfix/access,
         check_client_access hash:/etc/postfix/access

and the access file looked like this:

    # reject mail from bad_isp.com
    bad_isp.com   REJECT

Then I would not be able to receive e-mail from @bad_isp.com, any machine that had a reverse DNS entry inside bad_isp.com, and I couldn't send any mail to someaddress@bad_isp.com.

If all I meant to do was keep people from sending mail from addresses of @bad_isp.com, but I do exchange e-mail with people who have their network connectivity through them, then using one file will not work. Similarly, an "OK" restriction could be overly permissive, resulting in spams sneaking through. I prefer to use separate maps for each restriction, à la:

    smtpd_something_restrictions = ... ,
        check_recipient_access hash:/etc/postfix/access.recipient,
        check_sender_access hash:/etc/postfix/access.sender,
        check_client_access hash:/etc/postfix/access.client

This way I can specifically tailor the restrictions I need.

This is the third installment in a four-part series on filtering e-mail with Postfix and Procmail. The first two parts of this series focused on how you can stop receiving spam by configuring Postfix for spam prevention. This segment will introduce you to the methods of stopping spam with Procmail.

Stopping Spam with Procmail

Procmail is a local mail delivery agent, meaning that its only job is to deliver mail to a local user mailbox. Procmail can deliver mail to local mailboxes that are in mbox files, MH folders, or maildir folders, as well as sending messages to other e-mail addresses or local programs. Unless you have a specific mailbox format preference, you should probably stick with the default on your system, which is most likely mbox format. For the purposes of this article, I'll assume mbox folders are used.

Postfix uses its own internal local mail delivery agent by default. However, you can easily configure it to use Procmail instead, either globally or on a per-user basis.

Using Procmail Globally

The easiest way to use procmail as your delivery agent is to add the following line to your /etc/postfix/main.cf:

	mailbox_command = /usr/bin/procmail

You should also verify that mail for the root user is forwarded to a user account. For example, to forward all root mail to "bree" you'd use:

	# echo 'root: bree' >> /etc/aliases

Finally, rebuild the aliases database and reload postfix:

	# newaliases
	# postfix reload

That's it. Now procmail is in use for all local users.

Using Procmail for Specific User Accounts

If you prefer that only certain users get Procmail for their local delivery agent, or if you are not the Postfix administrator, then you can set up .forward files to launch Procmail for specific users. All you need to do is create a file called .forward in your home directory that looks like this:

	"|IFS=' ' && exec /usr/bin/procmail -f- || exit 75 #username"

Substitute "username" with your actual username and you're all set. And note that the quotes do belong in the file, they are not just there for this Web page.

If you are the administrator and want to enable Procmail for a specific user, you can create their forward file for them. Just make sure to chown the file appropriately when done.

What to do with Your Mail

Procmail's default purpose is to deliver your e-mail to your mailbox, usually /var/mail/USERNAME or /var/spool/mail/USERNAME. However, there are several other options that we have for filtering our e-mail.

Dedicated Spam Box

If you have a mailer that supports multiple inboxes, such as Mutt, Pine, or any IMAP-based mail client, then you can have mail delivered to multiple places. This is perfect for having a mailbox dedicated to each list to which you subscribe. This makes it easier to read your e-mail because you are in the appropriate mindset - you won't have messages from Great Grandma George in the same mailbox as your buffer overflow discussions for example. I prefer to create a mailbox that will be the repository for all the mails that are most likely spam. That way I can check this mailbox once a day when I'm bored and make sure nothing important ended up there by mistake, and can delete the rest. No need to look at any of the message content in general - the "From" and "Subject" lines are usually sufficient to distinguish them.

Many folks (myself included) prefer to name all mailboxes that can receive e-mail (rather than just being places you manually move e-mail) with a prefix such as 'IN.'. How you set up your mailer to check these alternate mailboxes is client specific. In Mutt, for example, you can add the following to your ~/.muttrc

	mailboxes =IN.fanmail =IN.family =IN.vpns =IN.hle \
	          =IN.bugtraq =IN.flames =IN.deaththreats

Or if you want to slurp all mailboxes in alphabetically:

	mailboxes = `echo $HOME/Mail/IN.* | sed -e "s#$HOME/Mail/#=#g"`

This is good to make sure you don't accidentally forget to check an inbound mailbox that you set up via Procmail later. You can even merge the two in such a way that you have a specific order of mailboxes to check, followed by all others not specifically named:

	mailboxes =IN.fanmail =IN.family =IN.vpns =IN.hle \
	          =IN.bugtraq =IN.flames =IN.deaththreats \
	          `echo $HOME/Mail/IN.* | sed -e "s#$HOME/Mail/#=#g"`

In Mutt you can check the alternate mailboxes by using the "c" command. Mutt will tell you when new mail has been delivered to these periodically too. If you're using a different mail program, you're on your own.

Adding Mail Headers

Another option would be to insert new mail headers for any message that is likely to be spam. You can then use this header for additional Procmail rules (to shuttle the offending mail to a dedicated spam box, for example) or use it within your mail client. Many mail clients are able to look for specific mail headers and then flag the messages differently.

Mutt, for example, can use mail headers as part of its scoring algorithms to put spam messages at the bottom of your mailbox where you don't need to look at them. Other programs can color code mails with this header, letting you identify and delete them faster.

/dev/null

Instead of a dedicated spam mailbox, you can send mails that are determined to be spam directly to the bit bucket. If you do this then the message will never reach your e-mail spool whatsoever, and you will have no trace of what the message was. The sender will have no idea that you did not get the mail.

I generally don't like this option because if a legitimate message was dropped to /dev/null then I will never know what it said, and the sender won't know that I didn't get it since they get no error message. I strongly recommend against using this.

Exiting with an Error

If Procmail exits with an error, then the mail server (Postfix) will assume there is some problem delivering the message, and it will either retry later, or give up and send back an error to the envelope sender. This is somewhat better than using /dev/null because the sender will get an error message indicating that their e-mail was not properly received. In the case of a legitimate e-mail, the sender will know to retry. In the case of a spam, the spammer may even take your address off their list, although this is unlikely.

Procmail Recipes

Procmail reads global configuration from /etc/procmailrc, and then user configuration from $HOME/.procmailrc. These files contain "recipes" that tell the program what to do. These recipes could tell Procmail to deposit the mail somewhere, forward it to another e-mail address, or filter or pipe it through an external process. The procmailrc has both and/or style logic available, as well as grouping with {} for control flow. You can create procmailrc files that do exactly what you need that are almost as unreadable as perl code.

I'll be keeping the Procmail commands pretty light here. If you wish to hurt your brain, then investigate the full power of Procmail by reading the procmailrc and procmailex manual pages. You can (and should) comment your .procmailrc files in the standard shell way, by using lines beginning with "#".

The most important thing to understand is that there are two kinds of recipes. A delivering recipe is one that will save, forward, or pipe the e-mail somewhere. At that point procmail considers its work done, and it stops processing the e-mail. (Unless you specifically tell it to continue with the "c" flag, which is discussed below.) Anything else is a non-delivering recipe - including things like message rewriting via the filtering mechanism I discuss later - and the processing of the mail continues until a delivering recipe is found or the end of the .procmailrc is reached.
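To make the delivering versus non-delivering distinction concrete, here is a rough model of the control flow in Python. It is only a sketch of the behaviour described above, not how Procmail is implemented; the `Recipe` class and `process` function are names I made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Recipe:
    matches: Callable[[str], bool]  # stands in for the "*" condition lines
    target: str                     # mailbox name (or a label for a filter)
    delivering: bool = True         # False models a filter-style recipe
    copy: bool = False              # True models the "c" flag


def process(message: str, recipes: List[Recipe], default: str = "DEFAULT") -> List[str]:
    """Return every destination the message was delivered to, in order."""
    delivered = []
    for recipe in recipes:
        if not recipe.matches(message):
            continue
        if not recipe.delivering:
            continue  # a real filter would rewrite the message here
        delivered.append(recipe.target)
        if not recipe.copy:
            return delivered  # delivering recipe without "c": processing stops
    delivered.append(default)  # reached the end of the rc file
    return delivered


recipes = [
    Recipe(lambda m: "security alert" in m, "pager", copy=True),
    Recipe(lambda m: "Viagra" in m, "IN.spam"),
]
print(process("Subject: security alert", recipes))
print(process("Subject: cheap Viagra", recipes))
```

The first message is copied to the pager and then falls through to the default mailbox; the second stops at the first delivering recipe that matches.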

So, let's see what these recipes look like.

Recipe Heading

All recipes start with a line that looks like this:

 	:0 [flags] [ : [lockfilename] ]

There are a bunch of flags available to you. The ones you'll use most often are:

Condition-modifying flags, such as:

H - egrep the header for the conditions listed below. This is the default unless other flags are specified.

B - egrep the body for the conditions listed below

D - tell egrep to be case sensitive. (By default it is case insensitive, which is usually what you want.)

Action-related flags, such as:

h - pipe the header to an external program

b - pipe the body to an external program

f - filter the message through an external program. Procmail discards the existing message and replaces it with the output of the program. Excellent for adding or removing headers, often with the formail program that is included with Procmail. This is not a delivering recipe as it only modifies the message.

W - wait for the external program to finish. (Including it is a good idea when using h, b, or f.)

Flow control, such as:

a - execute this recipe if the previous one matched and completed successfully - basic "AND" logic.

E - execute this recipe if the previous one was not executed at all - basic 'ELSE' logic.

c - Generate a new copy of the message. This is used if you want to have a message delivered somewhere (say to a special mailbox or bounced to another e-mail address) and have Procmail continue processing the procmailrc file to deliver it somewhere else too.

There are others, and I even hand-waved away some of the additional features of the flags I listed above. See the procmailrc man page if you're interested.

All the flags you list are "and"-ed together. For example:

	:0 BDfh

would mean to egrep the body of the message case sensitively for the patterns found on the following lines, and if the patterns match then filter the header of the message through some program.

If you put a colon after the flags then Procmail will create a lockfile before performing your actions. This is important on any recipe that will end up saving the e-mail somewhere, as you don't want two copies of Procmail writing to the same file at the same time. You can specify a name for the lockfile if you wish, such as:

	:0 Bf:busy_lock

If you don't specify a filename, then Procmail will generate one by using "filename.lock", where "filename" is the name of the target file. Because of this, there's no need to specify a lock file name in general.

Recipe Conditions

The most common conditions are simple egrep pattern match strings. You tell Procmail to look at a portion of the message (header or body) and see if the regular expressions match. Each regular expression (you can specify more than one) is listed on its own line with "*" at the beginning, such as:

	* regular_expression

For example, the following recipe will match e-mails with those two Subject and X-Mailer headers. (The default is to egrep headers, unless flags are specified.)

	:0
	* ^Subject: make money fast
	* ^X-Mailer: AOL 6.0

Since frequently you'll want to be able to determine if a message is to a specific destination, Procmail includes two handy shortcuts:

	^TO	Match a destination (To/Cc/Bcc/etc) containing
		the specified word.

	^TO_	Match a destination (To/Cc/Bcc/etc) containing
		the specified address.

Thus, the following is a good condition to add to match only messages addressed to you specifically:

	* ^TO_myaddress@my_isp.net

You also have the ability to negate any regular expression using the ! character, thus the following would match messages not specifically addressed to you:

	* !^TO_myaddress@my_isp.net

When you include multiple regular expressions, each must be matched for the action to be taken. However, Procmail also has the ability to use multiple regular expressions with scoring, enabling each to have different weight. You include a score and exponent (used to affect the score when found multiple times) before the actual regular expression. Procmail tallies the score and if the result is positive, it considers it a match:

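	:0
	* -200^0 ^Content-Type: text/html
	* -300^0 ^Subject:.*(ADV|Sex|Viagra|Enlargement)
	* -300^0 ^From:.*(hotmail|aol).com
	*  100^0 ^Subject:.*Re:
	*  300^0 ^User-Agent: Mutt
	*  501^0 ^TO_myaddress@my_isp.net
	$DEFAULT

The previous recipe gives negative weights to various common spam characteristics and positive weights for indications of legitimate e-mail. A positive total therefore means the message looks legitimate, so the recipe delivers it to $DEFAULT, your default mailbox; anything else falls through to the recipes that follow.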

Procmail's scoring is rather powerful, but I won't go into it further because we'll see later that most of the spam-related scoring algorithms are better performed in external products like SpamAssassin, which we will call from Procmail. However, if you are interested in using scoring algorithms, consult the procmailsc man page.
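If the tallying is hard to picture, the following Python sketch mimics it for the weights used above. It is an approximation under a few assumptions: every exponent is 0, so each pattern counts at most once; the `^TO_` macro is replaced by a simplified `To:` pattern; and the dot in `.com` is escaped. The names `RULES`, `score`, and `is_match` are mine, not Procmail's.

```python
import re

# (weight, pattern) pairs modelled on the scoring recipe; with an exponent
# of 0 a pattern adds its weight once, however many times it matches.
RULES = [
    (-200, r"^Content-Type: text/html"),
    (-300, r"^Subject:.*(ADV|Sex|Viagra|Enlargement)"),
    (-300, r"^From:.*(hotmail|aol)\.com"),
    (100, r"^Subject:.*Re:"),
    (300, r"^User-Agent: Mutt"),
    (501, r"^To:.*myaddress@my_isp\.net"),  # simplified stand-in for ^TO_
]


def score(headers: str) -> int:
    """Sum the weights of every rule that matches the header block."""
    total = 0
    for weight, pattern in RULES:
        # Header matching is case insensitive, like Procmail's default.
        if re.search(pattern, headers, re.MULTILINE | re.IGNORECASE):
            total += weight
    return total


def is_match(headers: str) -> bool:
    """The recipe matches only when the final score is positive."""
    return score(headers) > 0


spammy = (
    "From: deals@hotmail.com\n"
    "To: myaddress@my_isp.net\n"
    "Subject: Viagra ADV\n"
    "Content-Type: text/html\n"
)
friendly = (
    "From: pal@example.org\n"
    "To: myaddress@my_isp.net\n"
    "Subject: Re: lunch\n"
    "User-Agent: Mutt/1.4\n"
)
print(score(spammy), is_match(spammy))
print(score(friendly), is_match(friendly))
```

Even though the spammy message is addressed to me directly (+501), its three negative weights pull the total below zero, so the recipe does not match.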

Recipe Actions

The last part of each recipe is the action to take. There are four actions you can have, which I will explain in detail below.

1. Delivery to local file

Deliver the e-mail to a local file. If the action line begins with anything other than a "!", "{", or "|" (which are the remaining possibilities described below) then the action is simply the filename to which the mail should be delivered. An example would be:

  	:0:
  	* ^TO_bri@hackinglinuxexposed.com
  	IN.hle

In the example above, mail to my HLE address is automatically dropped into the ~/Mail/IN.hle mailbox. The type of mail spool used depends on the suffix you use for the filename.

  	filename	standard mbox file format
  	filename/	maildir folder
  	filename/.	MH folder

Again, unless you prefer one form or another, stick with the default for your system, which is likely mbox format. Procmail will create the file if it doesn't already exist.

One thing to note: if the target is a directory (as opposed to a proper maildir or MH folder directory structure), then the messages will be stored in that directory as separate files named "msg.XXXXX" (for varying values of XXXXX.) So make sure you type your filenames correctly.
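The suffix convention is easy to get wrong, so here it is restated as a tiny Python helper. This is purely illustrative (the function name `folder_format` is hypothetical) and only encodes the three suffix rules listed above.

```python
def folder_format(target: str) -> str:
    """Return the mailbox format implied by a delivery target's suffix."""
    if target.endswith("/."):
        return "MH"       # filename/.  -> MH folder
    if target.endswith("/"):
        return "maildir"  # filename/   -> maildir folder
    return "mbox"         # filename    -> standard mbox file


for name in ("IN.hle", "IN.hle/", "IN.hle/."):
    print(name, "->", folder_format(name))
```

Note that the "/." test has to come first, since a bare trailing "/" check would otherwise never be reached for MH folders in the wrong order.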

2. Begin nesting block

It's often helpful to take multiple actions based on some conditions. You create a nesting block using curlies, inside of which are more recipes that are only executed if the conditions are matched.

For example you may want to group all your logic that is based on the presence of a list header into a block, rather than listing each combination set separately. An example would be:

	:0 h
	* ^Subject: security alert
	{
		:0 c
		$DEFAULT

		:0
		! my-pager-address@example.com
	}

Since the "Subject:" line matched, the two recipes inside the curlies are executed. True, this logic could have been written as:

	:0 hc
	* ^Subject: security alert
	$DEFAULT

	:0 h
	* ^Subject: security alert
	! my-pager-address@example.com

But by grouping them together it was more readable. Advanced rules may require (and will definitely be more readable with) curlies, especially when using and/else flags.

($DEFAULT is your default mail destination, by the way. I'll cover some handy variables in a minute.)

3. Forward the message

You can e-mail a copy of the message simply by listing the e-mail address or addresses after a "!", such as:

	:0 h
	* ^TO_helpdesk@our_company.net
	* ^Subject:.* Windows(95|98|ME|NT|2000|XP| )
	! bill_gates@micro_soft.com

If you're exclusively a Unix administrator, you may find that this recipe correctly routes help desk e-mails to the true offending party. Since this is a delivering recipe, you would never see these mails at all.

4. Pipe the message into an external program

You can pipe a message into a separate program in two different ways. The first is to use the program as a final delivery, such as this:

	:0 bhW
	* ^Subject: Re-sync DNS
	| /opt/bin/resync-dns

In that case, the resync-dns command will receive the e-mail and do something with it. Procmail's job is finished, and no other copy of the e-mail goes anywhere. Some places have used SMTP as a "reliable" messaging service to trigger tasks remotely just like this example.

The other option is to use the external program as a filter, which will be used to re-process the mail for Procmail, such as:

  	:0 bfW
  	* Greeting card
  	| /opt/bin/snag-card-from-web

Assuming you have some program that understands the format of the e-mail sent by those on-line greeting cards your Grandfather just loves, it can extract the URL from it, run it through 'lynx --dump -force_html $URL' and you'll never need to actually click to get to the card at all. The output of your program will replace whatever was filtered through it (in this case the body, due to the "b" flag). A filtering recipe is not a delivering recipe. Thus your filter will replace the ad-laden greeting card info with the actual greeting card contents, and then Procmail will continue processing, eventually delivering that new e-mail somewhere. The previous example may have been better written as:

  	:0 bfW
  	* Greeting card
  	| /opt/bin/snag-card-from-web

  	:0 A:
  	$DEFAULT

This will filter the e-mail and deposit it instantly into the default mail spool. There's no need to continue looking through the remaining Procmail recipes, just deliver it.

One of the most convenient filters you can use is the Formail program, which comes with Procmail. You can use it to add or delete message headers, amongst many other features. For example you could use the following:

  	:0 Wf
  	* ^Subject:.*Internet SIC Codes
  	|formail -A "X-Spam: yes" -A "X-Spam-Identifier: formail rocks"

to add two "X-Spam" headers to the e-mail, which could be used by your mail client to filter or sort the e-mail as you see fit. See the formail man page to learn all it can do.

Procmail Variables

There are a number of variables that Procmail uses, and you can define any number of them you like for later. You define a variable simply by putting

	VARIABLENAME=value

at the top of your .procmailrc. The most important variables used by Procmail itself are:

	VERBOSE		If set to 'on' then you'll get lots of verbose logging.
			(Defaults to 'off'.)
	LOGFILE		Place to write procmail logs (/dev/null by default.)
	MAILDIR		Directory against which all relative pathnames are resolved.
	DEFAULT		Default mail spool (/var/mail/username, for example.)

When testing, it's best to enable logging by defining a LOGFILE and setting VERBOSE to "on". If you plan to filter e-mail to multiple folders within the same directory, it's easiest to set MAILDIR to an appropriate value in your home directory, such as:

	MAILDIR=$HOME/Mail

Then any filenames you use as delivery actions will reside in $HOME/Mail unless they begin with a "/". All the other Procmail variables have pretty sane defaults that you won't usually bother with. You can set your own variables as well. The one that I use the most is:

	SPAM=IN.spam

I use variables for all my frequently used mailbox names, both for convenience and to avoid my frequent typos. For example:

	# Catch commonly used spam-sending programs.
	# Thanks for acknowledging yourselves in the headers...
	:0 H:
	* ^(Cyber-Bomber|E-Broadcaster|Ellipse Bulk E-mailer|E-mailBlaster|MailKing)
	$SPAM

	# I'm fine, thank you.  No need.  Really.
	:0 H:
	* ^Subject.* Viagra .*
	$SPAM

	# Why am I in so many people's address books?
	:0 BD:
	* I send you this file in order to have your advice
	$VIRUS

Mail Filtering Methodology

Detecting and filtering spam at the mail delivery agent takes CPU time and will slow down your mail delivery. It's completely unavoidable. Because of this, you want to perform your filters and tests in the most CPU-efficient manner possible, without compromising your detection rules. I filter my mail through the following steps:

Take any messages that are from moderated lists I've subscribed to and put them in their own folders. Presumably moderated lists won't allow spam through, so let them through right away:

	# deposit bugtraq mails
	:0:
	* ^List-Id: <bugtraq.list-id.securityfocus.com>
	IN.bugtraq

Accept other messages that are typically not spam:

	:0:
	* ^From: .*@current_client.com
	IN.current_client

	:0:
	* ^From:.*great_grandpa@his_e-mail.com
	$DEFAULT

Identify spam natively with Procmail rules:

  	# Drop messages which are pure HTML - Probably spam
  	:0:
  	* ^Content-Type: text/html
  	$SPAM

  	# I've got a real one, thanks.
  	:0:
  	* ^Subject:.*(real|university|authentic).*DIPLOMAS
  	$SPAM

  	# No From header?  What are you hiding?
  	:0:
  	* !^From:
  	$SPAM

Perform more time-consuming but robust spam-detection. In this case I have Razor determine if the message is spam, and deposit it in the $SPAM mailbox if that's the case. (Razor is discussed in the next installment in this series.)

	:0 Wc
	| /usr/bin/razor-check

	:0 Wa
	$SPAM

Deposit mail from unmoderated mailing lists and other messages that don't go to $DEFAULT:

	:0:
	* ^X-List-Name: openssl-users
	IN.openssl

Drop all other messages to $DEFAULT

  	:0:
  	$DEFAULT

You don't actually need this rule, since it's the default action anyway.

This system gets messages that are very likely not spam to the proper destinations as quickly as possible, minimizing the CPU usage of the tests. I also exclusively use header-based rules early on, since matching against the headers will take much less time than checking the entire body of the mails, in general.

Interfacing Procmail with External Spam Detection Programs

Procmail's power is in its simplicity. It allows you to make sophisticated decisions based on message content with both static expressions and scoring decisions. However, implementing all your spam blocking inside Procmail itself means you must maintain thousands of rules to make your friend-or-foe decision. This is not only a hassle, but it also means that you'll spend more time tweaking your spam rules than you may save by simply hitting the delete key.

For this reason you may wish to call external programs designed specifically for spam detection. You tell Procmail to pipe the e-mail to the spam detection program, and this program either modifies the message or exits with an exit code indicating that the message is spam. Procmail then filters the message accordingly.

250 8BITMIME

Postfix doesn't normally require a HELO/EHLO to receive e-mail, but by including smtpd_helo_required = yes in /etc/postfix/main.cf, every client must provide a HELO before beginning the rest of the SMTP session. Some spam software does not send a HELO, and by using this option you may successfully block such bone-headed software.

This option alone may not do too much good. The host name sent by the client may be totally bogus, and this option alone doesn't care. However, we can place restrictions on the hostname provided, as seen here:

    smtpd_helo_restrictions = reject_invalid_hostname

There are several restrictions you may place, which I will list here briefly:

reject_invalid_hostname - reject unless the hostname has valid syntax.

reject_unknown_hostname - reject unless the host has a valid MX or A record in DNS.

reject_non_fqdn_hostname - reject unless the host is fully qualified.

permit_naked_ip_address - Postfix will allow dotted quads that are not wrapped in square brackets (à la [127.28.29.1]) even though it violates the RFC.

check_helo_access maptype:mapname - look up the hostname in the file mapname and reject or accept as appropriate.

These settings will block e-mail from hosts that provide invalid information intentionally (to hide the true source of the spam), but will also deny machines that are simply misconfigured. This is often the case with machines that are inside a firewall and have IP addresses or host names that cannot be resolved outside the firewall.
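To give a feel for what the syntax-level restrictions are testing, here is a small Python sketch of a hostname syntax check and a fully-qualified check. This is my own approximation for illustration, not Postfix's actual code; the function names are hypothetical, and the DNS lookup behind reject_unknown_hostname is deliberately left out.

```python
import re

# One DNS label: letters, digits and hyphens, not starting or ending with
# a hyphen, at most 63 characters long.
LABEL = re.compile(r"[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def has_valid_syntax(hostname: str) -> bool:
    """Rough analogue of the test behind reject_invalid_hostname."""
    if not hostname or len(hostname) > 255:
        return False
    return all(LABEL.fullmatch(label) for label in hostname.split("."))


def is_fully_qualified(hostname: str) -> bool:
    """Rough analogue of the test behind reject_non_fqdn_hostname."""
    return has_valid_syntax(hostname) and "." in hostname


for helo in ("mail.example.com", "localhost", "bad_host!.com", "a..b"):
    print(helo, has_valid_syntax(helo), is_fully_qualified(helo))
```

A name like "localhost" passes the syntax test but not the fully-qualified test, which is exactly the sort of misconfigured-but-innocent client the paragraph above warns about.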

Valid Envelope Format

After the SMTP HELO is sent, the client needs to tell Postfix who the e-mail is from (MAIL FROM) and where to send it to (RCPT TO). This communication is supposed to follow RFC-821. Some spam software isn't strict about its conformance, and we can block spam based on this fact. You'll need to add the following to /etc/postfix/main.cf:

    strict_rfc821_envelopes = yes

Since most SMTP servers are forgiving about RFC-821 compliance, lots of software doesn't follow the spec correctly. This means that enabling RFC-821 compliance may cause you to reject legitimate e-mail.

This is the fourth and final installment in a series on filtering e-mail with Postfix and Procmail. The first two parts of this series focused on how you can stop receiving spam by configuring Postfix for spam prevention. The third segment introduced methods of stopping spam with Procmail. This installment will discuss two tools that are available for use with Procmail: Razor, an automated spam tagging and filtering tool, and SpamAssassin, a mail filter that contains hundreds of different spam tests.

Distributed Spam Detection and Reporting with Razor

The whole theory behind spamming is to send huge numbers of messages to everyone the world over, using as few local resources as possible. The least amount of CPU and network usage occurs when you send out many identical messages. Any variation, such as custom From/To/Subject lines or message body, requires additional processing which spammers would prefer to avoid.

We can bank on the fact that unsophisticated spammers send the exact same message to everyone on the planet. It'd be nice if I could tag a message as spam, and tell all my friends not to bother reading it. However how do you do this? Obviously sending the spam itself doesn't solve anything, it just makes the matter worse.

Instead, if I were to calculate a digest of the message body, I'd have a short string that I could send you, and you could delete, ignore, or refile any messages that had the same digest. However that still isn't great, because I need to somehow communicate those digests to you.

Enter Vipul's Razor, an automated spam tagging and filtering tool. It allows users across the Internet to rely on each other to correctly identify spam messages. Installation is pretty painless. Simply download the razor-agents tarball from http://razor.sourceforge.net/ and install:

	$ tar xzvf razor-agents-VERSION.tar.gz
	$ cd razor-agents-VERSION
	$ perl Makefile.PL

There are a few perl modules that are required, namely Net::Ping, Net::DNS, Time::HiRes, Digest::SHA1, and Mail::Internet. You can install them manually, via CPAN like this:

	 # perl -MCPAN -e "install 'Net::Ping'"

or download the razor-agents-sdk tarball from the Razor page, which has them all bundled up for you.

Once installed, you have two main programs that you'll use: Razor-report and Razor-check.

Razor-report

When you get a spam message, you will use the razor-report program to compute a SHA Digest of the message body and report that digest to a Razor server. A SHA Digest is simply a small hex string that represents the data. SHA Digests are strong enough that it is extremely unlikely that two different messages will have the same digest.
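The idea of a body digest can be sketched in a few lines of Python. This is only an illustration of hashing a message body with SHA-1; I am assuming the body is simply everything after the first blank line, and I make no claim that Razor normalises or hashes messages in exactly this way. The function name `body_digest` is hypothetical.

```python
import hashlib


def body_digest(raw_message: str) -> str:
    """Return the SHA-1 hex digest of everything after the header block."""
    _headers, _separator, body = raw_message.partition("\n\n")
    return hashlib.sha1(body.encode("utf-8")).hexdigest()


body = "Make money fast!\nClick here.\n"
to_alice = "To: alice@example.com\nSubject: Hi\n\n" + body
to_bob = "To: bob@example.com\nSubject: Hello\n\n" + body

# Different headers, identical body: the digests agree.
print(body_digest(to_alice) == body_digest(to_bob))

# Changing a single character in the body gives a different digest.
print(body_digest(to_alice) == body_digest(to_alice.replace("fast", "fasT")))
```

Because only the body is hashed, the same spam sent to two different recipients produces the same short string, which is what makes sharing digests worthwhile.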

The razor-report program will connect to a Razor server and submit this digest to the database, where it can be used by other people. It simply takes the message on the command line or standard input. You can test it like this:

	$ razor-report -d -s definately_spam_file
	debug: Razor Agents 1.19, protocol version 2.
	debug: Discovering closest server in the razor-report.vipul.net zone
	debug: Sorted (closest first) list of available servers & RTTs:
	debug: 64.90.187.2 (0.0915) 194.109.217.74 (0.1510)
	debug: Wrote server list to .razor-report.lst
	debug: Closest server is 64.90.187.2
	FATAL: Razor Error 4: This is a simulation. Won't connect to 64.90.187.2.
	debug: Agent terminated

The '-d' option selects more debugging info, and '-s' tells it to simulate the report, such that it doesn't actually contact the server and add the digest to the database. It's important not to add spurious digests, because then legitimate mail that matches the digest may be rejected down the road.

Razor-report automatically determines which of the razor servers are closest to you to allow the best response times. The razor servers share the digests between them automatically.

Razor-check

Razor-check queries the Razor databases to determine if a message you've received is spam. It takes a mail message on the command line or standard input, computes the digest, and looks up the digest in the closest razor server:

	$ razor-check -d potential_spam_file
	debug: Razor Agents 1.19, protocol version 2.
	debug: 168968 seconds before closest server discovery
	debug: Closest server is 64.90.187.2
	debug: Connecting to 64.90.187.2...
	debug: Connection established.  Returning self
	debug: Signature: a8e0ade8d5037db329f464b2ec62cbccdab13612
	debug: Server version: 1.11, protocol version 2
	debug: Server response: Negative a8e0ade8d5037db329f464b2ec62cbccdab13612
	debug: Message 1 NOT found in the catalogue.
	debug: Agent terminated
	$ echo "Razor-check returned: $?"
	Razor-check returned: 1

In this case, the digest was not found in the database, and the razor-check program returned the value 1. If a message is found, then it will return 0.
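If you want to act on that exit status from your own scripts, the logic looks roughly like the Python sketch below. It assumes only what is stated above (0 means the digest was found, 1 means it was not). Since razor-check may not be installed where you try this, the two commands in the example are stand-ins that merely exit with those codes; `flagged_as_spam` is a name I made up.

```python
import subprocess
import sys


def flagged_as_spam(command, message: bytes) -> bool:
    """Feed the message to the checker on stdin; exit status 0 means spam."""
    result = subprocess.run(
        command,
        input=message,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


# Stand-ins for a checker such as razor-check: they read the message from
# stdin and exit with 0 (digest found) or 1 (digest not found).
FOUND = [sys.executable, "-c", "import sys; sys.stdin.read(); sys.exit(0)"]
NOT_FOUND = [sys.executable, "-c", "import sys; sys.stdin.read(); sys.exit(1)"]

message = b"Subject: test\n\nHello\n"
print(flagged_as_spam(FOUND, message))
print(flagged_as_spam(NOT_FOUND, message))
```

Treating any non-zero status as "not spam" is the cautious choice here: if the checker fails for some other reason, the mail is delivered normally rather than misfiled.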

Integrating Vipul's Razor with Procmail

Since razor-check returns 0 or 1 depending on if the message is spam or not, we have a simple way to have procmail tell the difference. Say we want to have all Razor-identified spam go to a special mailbox, we'd simply add the following to our .procmailrc:

	:0 Wc
	| razor-check

	:0 a
	IN.razor

This will make all Razor spam go to the IN.razor mailbox. Of course we could do other things with the mail, such as mangling the subject line instead:

	:0 Wc
	| razor-check

	:0 af
	| formail -i "Subject: SPAM (Identified by Vipul's Razor)"

or adding spam headers, which could be used by your mail client:

	:0 Wc
	| razor-check

	:0 af
	| formail -A "X-Spam-Identifier: Razor" -A "X-Spam-Probability: High"

Razor-check is all you need to tag your message as spam. However, if you want to give back to the community that is helping, you'll want to submit your own spam digests.

Reporting Spam to Vipul's Razor

All you need to do when you get a piece of spam is to run it through razor-report. However this still requires saving the mail and running razor-report, or piping it directly if your mail client supports it. To make it easier to report spam, and save us all from seeing it, you have a couple other options:

Setting up a dedicated razor-report mailbox

Set up a dummy user on your system with the following .procmailrc:

  	:0c
  	| razor-report

Then any time you want to report spam, simply bounce it to this e-mail account. If you want to be even more proactive, you could link to this e-mail address from the Web, post to newsgroups using it, and eventually spambots will learn it and send spam to it, automatically reporting it. For example, I set up razor@ifokr.org for this purpose. (Don't send mail there, you have been warned.)

Configuring your mailer to report spam

If you use a configurable mail client, you can probably map a key to bounce the message automatically. For example, in Mutt I mapped the F8 key by adding the following to my .muttrc:

  	macro pager <f8> 'b razor@ifokr.org^Myd'
  	macro index <f8> 'b razor@ifokr.org^Myd'

The ^M in the two strings above is a literal carriage return. To generate it, you'd type “Ctrl-V Ctrl-M” in vi, for example. This bounces the e-mail and then deletes it, and I have it mapped to both the index and pager.

I separate my Razor-identified spam to IN.razor. I direct other probable spam to IN.spam. I have the following mutt commands in my .muttrc to allow me to report and delete all the spam in IN.spam by hitting F8:

	# We don't want f8 in any folders except IN.spam,
	# so replace any f8 hook with 'refresh screen'
	# (ctrl-l) by default.
	folder-hook .           "macro index <f8> '^L' "

	# Enable f8 for IN.spam only
	folder-hook IN.spam     "macro index <f8> 'T.^M;b razor@ifokr.org^My;d' "

This macro will tag all messages, bounce them all to the reporting mailbox, and delete them. It's only enabled in the IN.spam mailbox for safety.

If you don't have a dedicated mailbox to report spam, you could instead just run razor-report locally, for example using the following macro bindings:

	macro index  "|razor-report"
	macro pager  "|razor-report"

Drawbacks of Vipul's Razor

As spam preventionists come up with new ways to detect spam, the spammers adapt their software to compensate. Spammers are now beginning to alter the body of their messages to prevent Vipul's Razor from functioning. Since even a one byte change in the body will result in a different SHA1 digest, none of the spam messages will be blocked. They can do this in a number of ways. They sometimes include your e-mail address as part of the message, such as

	"This message was intended for razor@ifokr.org, as part 	our supposed opt-in commitment to hawking our wares..."

Sometimes their “remove yourself from our database” link contains an ID associated with your e-mail address, or they address you by name at the top of the message. Any of these modifications will render Vipul's Razor useless. In fact, there's no point in reporting this spam, because all it will do is add a digest to the database for an e-mail that no one else will get.

One of the most transparent methods that a spammer can use to vary an e-mail to avoid detection by Vipul's Razor is to add a line of random text at the bottom of each e-mail. Since spams are frequently HTML, spammers can even add this random text after the end of the html and the user will never see it. This method requires minimal resources on the spammer to produce unique messages that are undetectable by Razor.

Some spammers go so far as to radically alter the messages. I've seen an upswing of HTML-formatted spam that litters the message with HTML comments, such as:

	We w<!--church-->ant to help you get lo<!--jesus-->wer
	HOUSE pa<!--dads trailer-->yments

In a browser, this simply reads “We want to help you get lower HOUSE payments,” but due to all the random HTML comments it remains unique from the point of view of a SHA digest. You could write Procmail rules to check for these tricks, but it'd be difficult. SpamAssassin, which I cover next, has checks that may thwart this form of message obfuscation. However, I have a very simple solution I've used for years: block HTML e-mail.

No one I care to talk to sends mail as HTML. At worst folks send it as both HTML and text. Thus I have the following procmail rule to direct HTML mail to my spambox:

	:0:
	* ^Content-Type: text/html
	IN.spam

I check my spam box around once a day for misfiled messages from people with bad e-mail netiquette. But I'd have to say that 49 out of 50 HTML messages are spam. Send 'em where they belong.
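If you would rather undo the comment-littering trick than block HTML outright, the normalization step might look something like the Python sketch below. It is only an illustration of the idea, not how Razor or SpamAssassin actually does it, and the function name strip_html_comments is made up for the example.

```python
import re

# Non-greedy match so each comment is removed separately; DOTALL lets a
# comment span more than one line.
HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)


def strip_html_comments(text: str) -> str:
    """Remove every HTML comment from the given text."""
    return HTML_COMMENT.sub("", text)


sample = (
    "We w<!--church-->ant to help you get lo<!--jesus-->wer "
    "HOUSE pa<!--dads trailer-->yments"
)
print(strip_html_comments(sample))
```

Two copies of the spam with different random comments collapse to the same text after this step, so a checksum taken afterwards would match again.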

Razor in the Future

At the time of this writing, Vipul's Razor v2 was just entering beta. This new version promises a plethora of exciting new features to better analyze and match spam, even when spammers employ tricks to defeat our checksumming algorithms, such as interspersing HTML comments, as seen previously. Among the new features are:

Text Preprocessors - New message processors, which can convert HTML to text and decode quoted-printable and Base64 encoded messages, will cleanse the message of many checksum-defeating tricks employed by the spammers.

Revocation - Razor v1 had no way to revoke a result from razor-report. With v2 you can revoke messages you accidentally reported, or messages that you believe others reported in error.

Fuzzy Signatures - Razor can use Nilsimsa signatures, a fuzzy algorithm which compares how close a message is to one in the database.

Truth Evaluation System - Users who wish to report spam to Razor must now register and authenticate. The system will be able to track the quality of the reports, and can use this to keep questionable checksums out of the database.

Performance Improvements - Razor v2 uses a better network protocol, can use a single pipelined connection instead of multiple individual connections, and includes other enhancements to make it faster and more efficient.

Message Submission - Razor servers will accept the entire spam message when submitted. The server can then perform its own checksums and other similar ephemeral signatures – hashes on periodically changing sections of the e-mail message.
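The fuzzy-signature idea in the list above is easier to picture with a small example. The Python sketch below is not Nilsimsa; it substitutes the standard library's difflib similarity ratio as a stand-in, purely to show what "how close is this message to a known one" means in contrast to an exact checksum. The sample strings are invented.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a closeness score between 0.0 (nothing shared) and 1.0 (identical)."""
    return SequenceMatcher(None, a, b).ratio()


known_spam = "We want to help you get lower HOUSE payments today"
variant = known_spam + " xk29fj2"  # same spam with a random trailer
unrelated = "Lunch is moved to noon on Thursday, see you there"

print(similarity(known_spam, variant))
print(similarity(known_spam, unrelated))
```

An exact digest treats the variant as a brand new message, while a closeness score keeps it near the original and well away from unrelated mail.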

SpamAssassin

I've shown you how you can create Procmail recipes that will help weed out spam, and how Razor can catch messages that other users have flagged as spam. Unfortunately, there are a lot of messages that still sneak through the cracks. Most spams contain similar formatting and phrases. The problem is that it's a lot of work for individuals to maintain their own Procmail recipes to match spam characteristics.

SpamAssassin is a mail filter that contains hundreds of different spam tests. It is available at http://spamassassin.org, as well as through CPAN, and is included in some Linux distributions. Installation instructions are contained on the Web site, but boil down to:

	$ tar xzvf Mail-SpamAssassin-VERSION.tar.gz
	$ cd Mail-SpamAssassin-VERSION
	$ perl Makefile.PL
	$ make
	$ make test
	# make install

This installs the SpamAssassin software itself. You should test out the installation by running the sample messages through SpamAssassin:

	$ spamassassin -t < sample-spam.txt    | less
	$ spamassassin -t < sample-nonspam.txt | less

The first message should be recognized as spam, and SpamAssassin will prepend '*****SPAM*****' to the Subject: line. The second message, though it contains some spam-like characteristics, does not exceed the spam threshold. Both messages will have new X-Spam headers inserted, and a report of all the spam tests that matched. In normal usage you will only get this report when a message is spam, but we get it even for legitimate e-mails due to the “-t” test mode.

SpamAssassin Features

SpamAssassin has hundreds of tests to help it determine if a message is spam or not. It weights these differently, depending on how reliably each one indicates spam. It even has tests that indicate non-spam messages (PGP signatures, output from “diff”) that can counteract spam-like characteristics in legitimate e-mail. For a terse list of the tests and their scores, see http://spamassassin.org/tests.html. SpamAssassin sums the results of all tests and computes a final score for the message. By default any message with a score of 5 or higher is considered spam, although you can change this value if you wish.
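The scoring model is simple enough to sketch in a few lines of Python. The test names and weights below are invented for illustration and are not SpamAssassin's real rules or scores; only the sum-and-compare logic and the default threshold of 5 come from the description above.

```python
DEFAULT_THRESHOLD = 5.0

# Hypothetical tests and weights. A negative weight marks a test that
# indicates legitimate mail and offsets spam-like hits.
EXAMPLE_SCORES = {
    "EXAMPLE_SHOUTING_SUBJECT": 2.5,
    "EXAMPLE_REMOVE_LINK": 1.5,
    "EXAMPLE_HTML_ONLY": 2.0,
    "EXAMPLE_PGP_SIGNED": -3.0,
}


def total_score(matched_tests, scores=EXAMPLE_SCORES):
    """Sum the weights of every test that matched the message."""
    return sum(scores[name] for name in matched_tests)


def is_spam(matched_tests, scores=EXAMPLE_SCORES, threshold=DEFAULT_THRESHOLD):
    """A message is spam when its total reaches the threshold."""
    return total_score(matched_tests, scores) >= threshold


spammy = ["EXAMPLE_SHOUTING_SUBJECT", "EXAMPLE_REMOVE_LINK", "EXAMPLE_HTML_ONLY"]
print(total_score(spammy), is_spam(spammy))
print(is_spam(spammy + ["EXAMPLE_PGP_SIGNED"]))
```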

SpamAssassin can perform DNSBL lookups based on the IP addresses and host names listed in the headers. It has whitelisting capabilities to allow legitimate but misflagged e-mail. It has an automated whitelisting feature, which keeps track of sender addresses and their spam vs non-spam history to automatically adjust scores. It can even report spam to Razor automatically if a threshold is reached. If you decide to use SpamAssassin, make sure you read about all the features that are available, because I don't have enough space to cover them all here.

X-Spam Headers

SpamAssassin is traditionally used as a Procmail filter and will edit the headers of the message to include the results of its spam tests. Simply add the following to your .procmailrc, or to /etc/procmailrc if you wish it to apply to all users:

	:0fw
	| spamassassin -P

SpamAssassin will analyze the e-mail and create several new headers:

X-Spam-Flag: YES or NO. Is the message spam (was the hit threshold reached)? This header is the easiest to use with Procmail.

X-Spam-Level: Includes one asterisk (*) for each hit. Useful for matching messages with X or greater hits by counting the asterisks in Procmail rules.

X-Spam-Status: Includes the actual number of hits for this message, as well as the individual tests that matched. Can allow more sensitive Procmail selections.

X-Spam-Checker-Version: SpamAssassin version.

X-Spam-Prev-Content-Type: If SpamAssassin is configured to 'defang mime' (the default), it replaces the existing MIME type of a spam message with text/plain and keeps the previous type in this header. This will stop things like JavaScript or HTML bugs from being effective.
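These headers are just as easy to read from a script as from Procmail. As a rough sketch, the Python below pulls the flag and the asterisk count out of a message using the standard email module. The sample message is made up, and spam_summary is a hypothetical helper, not part of SpamAssassin.

```python
from email import message_from_string

# Made-up message carrying the headers described above.
RAW = (
    "From: someone@example.com\n"
    "Subject: *****SPAM***** lower payments\n"
    "X-Spam-Flag: YES\n"
    "X-Spam-Level: ******\n"
    "\n"
    "body text\n"
)


def spam_summary(raw_message: str) -> dict:
    """Return whether the message was flagged and how many asterisks it carries."""
    msg = message_from_string(raw_message)
    flagged = msg.get("X-Spam-Flag", "NO").strip().upper() == "YES"
    # Count only the X-Spam-Level header, not asterisks in the Subject.
    level = msg.get("X-Spam-Level", "").count("*")
    return {"flagged": flagged, "level": level}


print(spam_summary(RAW))
```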

Redirecting Tagged Spam

Both the Subject (which is modified if a message is spam) and the new X-Spam headers can be used by subsequent Procmail rules. For instance:

	# Have SpamAssassin analyze the mail and insert headers
	:0fw
	| spamassassin -P

	# Redirect definitive spam
	:0:
	* ^X-Spam-Flag: YES
	$SPAM

	# Redirect things that we consider spam,
	# even if SpamAssassin doesn't agree by default
	:0:
	* ^X-Spam-Status:.*CTYPE_JUST_HTML
	$SPAM

	# Put messages with scores of 3 or more into probable spam
	# mailbox.  The three dots here match three or more asterisks.
	:0:
	* ^X-Spam-Level: ...
	$PROBABLY_SPAM
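Procmail delivers on the first recipe that matches, so the order of these rules matters. The Python sketch below mimics that first-match behavior with ordinary regular expressions, keyed on the X-Spam-Flag, X-Spam-Status and X-Spam-Level headers. It only illustrates the ordering: the folder names are placeholders, and Python's regex engine is not identical to Procmail's.

```python
import re

# Procmail conditions are case-insensitive by default, hence IGNORECASE.
FLAGS = re.IGNORECASE | re.MULTILINE

# (pattern, destination) pairs, checked in order like the recipes.
RULES = [
    (re.compile(r"^X-Spam-Flag: YES", FLAGS), "SPAM"),
    (re.compile(r"^X-Spam-Status:.*CTYPE_JUST_HTML", FLAGS), "SPAM"),
    # Three dots need at least three characters after the header name.
    (re.compile(r"^X-Spam-Level: ...", FLAGS), "PROBABLY_SPAM"),
]


def choose_folder(headers: str, default: str = "INBOX") -> str:
    """Return the destination of the first matching rule, else the default."""
    for pattern, folder in RULES:
        if pattern.search(headers):
            return folder
    return default


print(choose_folder("X-Spam-Flag: NO\nX-Spam-Level: ***\n"))
```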

Changing User Preferences

Every SpamAssassin user can create his or her own preference file, which tweaks how it functions. This can help you rid yourself of historical Procmail rules by integrating them into SpamAssassin natively. Preferences live in the .spamassassin/user_prefs file in each user's home directory. For example:

	$ cat ~/.spamassassin/user_prefs

	# The default is way too low.  Let's bump it up.
	score CTYPE_JUST_HTML	4.5

	# Grandpa gets flagged a lot due to his sticky shift key
	whitelist_from grandpa@his_e-mail.com

	# Lower the number of hits required:
	required_hits 3

	# Turn off re-write subject if you don't like *****SPAM****
	# put in your Subject: line.
	rewrite_subject 0

	# I get a lot of spam to this address, but haven't
	# used it in several years
	header   TO_RETIRED_E_MAIL 	To =~ /retired\@e-mail_address.org/
	describe TO_RETIRED_E_MAIL      Mail to an old retired e-mail address
	score    TO_RETIRED_E_MAIL	3.00

Here I tweaked a few parameters to lower the spam threshold, turned off subject rewriting (which is not really needed if you redirect spams to their own mailbox), created a whitelist address, increased the score for CTYPE_JUST_HTML (and can now remove the special case in the .procmailrc in the last section), and even created my own new TO_RETIRED_E_MAIL rule. These changes will only affect this particular user, allowing a high degree of customizability. Of course if you wanted to make global changes, you can add or change rules in the default SpamAssassin rules, which are typically in /usr/share/spamassassin/ or /etc/spamassassin/.

Using the SpamAssassin Daemon

SpamAssassin is a Perl script with many rules and, as such, takes a while to compile internally each time it is run. This leads to a performance hit of a few seconds before the message is even analyzed, which will slow down your message delivery.

SpamAssassin comes with a daemonized version, named “spamd”, and a small, efficient client program named “spamc”. Instead of running the SpamAssassin Perl script from Procmail, you'd start the spamd daemon and put the following into your procmailrc:

	:0fw
	| spamc

Spamc will pipe the mail into the SpamAssassin daemon, and will write the result back to stdout, filtering the e-mail just as if you had run SpamAssassin directly. However, since spamc has little startup overhead, the processing goes much quicker. In my tests, using spamc with spamd resulted in a fourfold performance improvement over running SpamAssassin alone.
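Because spamc is a plain stdin-to-stdout filter, you can drive it from your own scripts as well as from Procmail. Here is a minimal Python sketch of that pipe; filter_through is a hypothetical helper, and the PASSTHROUGH command is a stand-in that simply copies its input so the example runs on a machine without SpamAssassin installed.

```python
import subprocess
import sys


def filter_through(command, message: bytes) -> bytes:
    """Feed the message to the command's stdin and return what it writes to stdout."""
    result = subprocess.run(
        command, input=message, stdout=subprocess.PIPE, check=True
    )
    return result.stdout


# Stand-in filter that copies stdin to stdout unchanged.
PASSTHROUGH = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())",
]

raw = b"Subject: hi\n\nbody\n"
print(filter_through(PASSTHROUGH, raw))
```

With spamd running and spamc on your PATH, filter_through(["spamc"], raw) should return the same message with the X-Spam headers added.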

The spamd daemon still honors each user's user_prefs file, so you only need to run one daemon to service everyone on the machine. However, you must make sure that the user_prefs file is readable by the spamd process. If you run spamd as a dummy user instead of as root, as suggested, this means you need at minimum the following permissions for users with user_prefs files:

	$ chmod go+x ~user
	$ chmod go+x ~user/.spamassassin
	$ chmod go+r ~user/.spamassassin/user_prefs
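If you look after many accounts, it can help to check these permissions from a script instead of by eye. The Python sketch below is one way to do it, under a couple of assumptions: it looks only at the "other" permission bits (the chmod commands above set group bits too), it ignores directories above the home directory, and prefs_reachable_by_others is a name I made up. The demo runs against a throwaway directory rather than a real home.

```python
import os
import stat
import tempfile


def prefs_reachable_by_others(home: str) -> bool:
    """True if 'other' users can traverse both directories and read user_prefs."""
    sa_dir = os.path.join(home, ".spamassassin")
    prefs = os.path.join(sa_dir, "user_prefs")
    try:
        dirs_ok = all(os.stat(d).st_mode & stat.S_IXOTH for d in (home, sa_dir))
        file_ok = bool(os.stat(prefs).st_mode & stat.S_IROTH)
    except FileNotFoundError:
        return False
    return dirs_ok and file_ok


# Demo: build a private tree, check it, then open it up and check again.
with tempfile.TemporaryDirectory() as home:
    sa_dir = os.path.join(home, ".spamassassin")
    prefs = os.path.join(sa_dir, "user_prefs")
    os.mkdir(sa_dir)
    open(prefs, "w").close()
    os.chmod(home, 0o700)
    os.chmod(sa_dir, 0o700)
    os.chmod(prefs, 0o600)
    before = prefs_reachable_by_others(home)
    os.chmod(home, 0o711)    # same effect as chmod go+x on a 0700 directory
    os.chmod(sa_dir, 0o711)
    os.chmod(prefs, 0o644)   # same effect as chmod go+r on a 0600 file
    after = prefs_reachable_by_others(home)

print(before, after)
```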

If you make global changes to the SpamAssassin rules, you will need to restart spamd, but user preferences are read each time automatically.

Conclusion

In this series of articles I've shown you some straightforward steps you can take to keep spam off of your network and out of your mailbox. By implementing various spam protection mechanisms at the optimal points, you can have a spam-free experience without overburdening your systems.

Blocking spam does require a significant amount of education and configuration up front as you create your rules on the SMTP server (Postfix), local delivery agent (Procmail), and any external spam-detection software (Razor, SpamAssassin) that you decide to use. However, once you have the pieces in place, it is largely a system that you can sit back and forget about entirely.

There are many other spam-blocking software packages out there besides Razor and SpamAssassin. Below is a list of mature spam and filtering software that you may want to look at when picking the right tools for you.

Distributed Checksum Clearinghouse

Spam Bouncer

Junkfilter

DeSpam

Tagged Message Delivery Agent (TMDA)

Active Spam Killer (ASK)

There are doubtless many more, but those are some packages that I've played with and can recommend.

Happy filtering!