Sorting Email – Using SpamAssassin, MimeDefang and Procmail





Last Updated on 03/23/2018 by dboth

I have wanted to move to server-side mail sorting for some years but had never gotten around to it. This article describes how I got server-side mail sorting working using three pieces of common, freely available open source software: SpamAssassin, MimeDefang, and Procmail.

The problem

I like to sort incoming email into a couple of folders besides the inbox. Spam is always filed into the spam folder, and I leave it there for a couple of days so I can look at it later in case something I want to receive got marked as spam. Some of the incoming ham (good) email from a few other sources is also sorted into other folders. The rest is filed into the inbox by default.

First, a quick word about terminology. Sorting is the process of classifying email and storing it in an appropriate folder. Filters like SpamAssassin classify the email. MimeDefang uses that classification to mark spam by adding a text string to the subject line. That marking allows other software to file the email into the designated folders. It is this last piece of software that I was looking for: the one that does the filing.

I have several email filters set up in Thunderbird, my client of choice and the best one I have found for my personal needs. I have also set up some email filters for my wife on her computer. When we travel, or use our handheld devices, those filters don’t always work because Thunderbird – or any other email client with filters – must be running in order to perform its filtering tasks. If I have my laptop with me, I can set it up to do the filtering, but that means I have to maintain multiple sets of filters.

I have also run into a technical problem that I wanted to fix. Client-side email filtering relies on scanning messages after they are deposited in the inbox. For some unknown reason, the client does not always delete (expunge) the moved messages from the inbox. This may be an issue with Thunderbird, or it may be a problem with my configuration of Thunderbird. I have worked on this problem for years with no success, even through multiple complete reinstallations of Fedora and Thunderbird.

I have my own email server, and spam is a major problem for me. I use several email addresses, some of which I have had for a couple of decades, so they have become major spam magnets. I get at least 300 spam emails per day and usually between 1,200 and 1,500, and the numbers keep increasing. The record was just over 2,500 spam emails in a single day.

So I needed a method for filing emails, i.e., sorting them into appropriate folders, that is server-based rather than client-based. This will solve several issues. I will no longer need to leave an email client running on my home workstation just to perform filtering. It will eliminate the need to delete or expunge moved messages, especially the spam, from our inboxes. And it will require filter configuration in just one location, the server.

My email server

Having grown up with SendMail as the de facto email server in more than one of my jobs, I started using it for my own email server as soon as I switched permanently from OS/2 to Red Hat Linux 5 in about 1997. I have used it as my mail transfer agent (MTA) ever since, for both business and personal use.

Note: I am not sure why Wikipedia refers to it as a “message” transfer agent. All my other references use “mail” transfer agent. The Talk tab of the Wikipedia page has a bit of discussion about this which generated even more confusion for me.

I already use SpamAssassin and MimeDefang together to score and mark incoming emails as spam, placing a known string, “####SPAM####”, in the subject so that I can identify and sort spam both as a human and with software. I use UW IMAP for client access to email, but that is not a factor in server-side filtering and sorting.

Yes, I use a lot of old-school software for the server side of email, but it is well known, it works well, and I understand how to make it do the things I need it to do.

Project requirements

Having a well-defined set of requirements before starting a project is imperative, so, based on the description of the problem, I created five simple requirements for this project.

  1. Sort incoming spam emails into the spam folder on the server side using the identifying text that is already being added to the subject line.

  2. Sort other incoming emails into designated folders.

  3. Circumvent problems with moved messages not being deleted or expunged from the Inbox.

  4. Keep the existing SpamAssassin and MimeDefang software.

  5. Any new software would have to be easy to install and configure.

These requirements meant that I needed a sorting program that would integrate well with the parts I already have.

Procmail

After extensive research, I settled on the venerable Procmail. I know – more old stuff – and pretty much unsupported these days, too. But it does what I need it to do and is known to work well with the software I am already using. It is stable and has no known serious bugs. It can be configured at the system level as well as at the individual user level.

Red Hat and Red Hat-based distributions such as CentOS and Fedora use Procmail as the default mail delivery agent (MDA) for SendMail, so it does not even need to be installed. My server runs CentOS, so this is a real no-brainer: I will use Procmail.

In addition to delivering email, Procmail can filter and sort it. Procmail rules, known as recipes, can identify spam and either delete it or sort it into a designated mail folder. Other recipes can identify and sort other mail as well. Procmail can do many other things besides sorting email into designated folders, such as automated forwarding and duplication. Those other tasks are beyond the scope of this article, but understanding how to use Procmail for sorting will give you a better understanding of how to accomplish them.

How it works

A complete discussion of configuring SpamAssassin, MimeDefang, and Procmail is beyond the scope of this article, in part because there are so many ways of implementing anti-spam solutions using these three programs. I will confine this article to the configuration I used to integrate these three packages to implement my own solution.

Processing of incoming email begins with SendMail. I have added the line shown in Listing 1 to my sendmail.mc configuration file. This line calls MimeDefang as part of the email processing. After making any configuration changes to SendMail, be sure to run the make command to regenerate sendmail.cf from sendmail.mc, and then restart SendMail. Refer to Chapter 8 of the SpamAssassin book listed in the Resources section of this article for more information.

INPUT_MAIL_FILTER(`mimedefang', `S=unix:/var/spool/MIMEDefang/mimedefang.sock, T=S:5m;R:5m')dnl

Listing 1: This line in the sendmail.mc file enables MimeDefang.
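To make the fields in Listing 1 explicit, here is a small Python sketch that pulls them apart. It is purely illustrative: SendMail itself expands this line through m4 when sendmail.cf is rebuilt, and the function name parse_milter_filter is hypothetical. S= gives the socket type and path that MimeDefang listens on, and T= sets timeouts, here S (sending data to the filter) and R (reading the filter's reply), each five minutes.

```python
import re
from typing import Dict


def parse_milter_filter(line: str) -> Dict[str, object]:
    """Split an INPUT_MAIL_FILTER() line from sendmail.mc into its fields.

    Only the S= (socket) and T= (timeouts) equates used in Listing 1 are
    decoded; any other equates are kept as raw strings under "raw".
    """
    # m4 quotes strings as `like this', so capture the two quoted arguments.
    m = re.search(r"INPUT_MAIL_FILTER\(`([^']*)',\s*`([^']*)'\)", line)
    if m is None:
        raise ValueError("not an INPUT_MAIL_FILTER line")
    name, equates = m.group(1), m.group(2)

    raw: Dict[str, str] = {}
    for part in equates.split(","):
        key, _, value = part.strip().partition("=")
        raw[key] = value

    result: Dict[str, object] = {"name": name, "raw": raw}
    if "S" in raw:
        # S=unix:/path -> ("unix", "/path")
        family, _, address = raw["S"].partition(":")
        result["socket"] = (family, address)
    if "T" in raw:
        # T=S:5m;R:5m -> {"S": "5m", "R": "5m"}
        # C = connect, S = send to filter, R = read reply, E = end-of-message wait
        result["timeouts"] = dict(
            item.split(":", 1) for item in raw["T"].split(";") if item
        )
    return result


LISTING_1 = ("INPUT_MAIL_FILTER(`mimedefang', "
             "`S=unix:/var/spool/MIMEDefang/mimedefang.sock, T=S:5m;R:5m')dnl")
spec = parse_milter_filter(LISTING_1)
print(spec["socket"])    # ('unix', '/var/spool/MIMEDefang/mimedefang.sock')
print(spec["timeouts"])  # {'S': '5m', 'R': '5m'}
```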


SpamAssassin can be run as standalone software in some applications. In this environment, however, SpamAssassin is not run as a daemon; it is called by MimeDefang, and each email is first processed by SpamAssassin to generate its spam score.

SpamAssassin provides a default set of rules, but you can modify the scores for existing rules, add your own rules, and create whitelists and blacklists to adapt it to the needs of your own installation. The /etc/mail/spamassassin/local.cf file is used for all of this, and it can grow quite large; mine is just over 70KB at this writing, and still growing.

SpamAssassin uses its default set of rules and scores, as well as any located in the local.cf file, to generate a total score for each email. MimeDefang calls SpamAssassin as a subroutine and receives the spam score as a return value.
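The idea of an aggregate score can be shown with a deliberately simplified Python sketch. The rule names, scores, and override here are invented for illustration, and real SpamAssassin runs far more rules and does much more than add numbers from a table. The override dictionary plays the role of a score line in local.cf; the point to notice is that no single rule reaches the threshold on its own.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Invented rule names and scores, for illustration only.
DEFAULT_SCORES: Dict[str, float] = {
    "EXAMPLE_SUBJ_ALL_CAPS": 1.5,
    "EXAMPLE_URI_LISTED": 2.5,
    "EXAMPLE_BAYES_HIGH": 3.5,
    "EXAMPLE_FORGED_SENDER": 0.5,
}

# Plays the role of a "score EXAMPLE_FORGED_SENDER 1.5" line in local.cf.
LOCAL_OVERRIDES: Dict[str, float] = {"EXAMPLE_FORGED_SENDER": 1.5}


def score_message(fired: Iterable[str], required: float,
                  overrides: Optional[Dict[str, float]] = None
                  ) -> Tuple[float, bool, List[str]]:
    """Sum the scores of every rule that fired and compare with the threshold."""
    local = LOCAL_OVERRIDES if overrides is None else overrides
    scores = {**DEFAULT_SCORES, **local}  # local.cf values win over defaults
    names = sorted(set(fired))
    total = round(sum(scores.get(name, 0.0) for name in names), 2)
    return total, total >= required, names


fired = ["EXAMPLE_SUBJ_ALL_CAPS", "EXAMPLE_URI_LISTED", "EXAMPLE_FORGED_SENDER"]
print(score_message(fired, required=5.0))                # total 5.5, spam
print(score_message(fired, required=5.0, overrides={}))  # total 4.5, not spam
```

With only the default scores, the same three rules total 4.5 and fall short of a 5.0 threshold; the local override is what tips this particular message over.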

MimeDefang is written in Perl, so it is easy to hack. I have modified the last major portion of the code in /etc/mail/mimedefang-filter to provide a filtering breakdown with a little more granularity than the default. This section of the code now looks like Listing 2. Because I have made significant changes to this portion of the code, yours will probably not look much like this.

#####################################################################
# Determine how to handle the email based on its spam score and     #
# add an appropriate X-Spam-Status header and alter the subject.    #
#####################################################################
# Set required_hits in sa-mimedefang.cf to get value for $req #
#####################################################################
if ($hits >= $req) {
   action_add_header("X-Spam-Status", "Spam, score=$hits required=$req tests=$names");
   action_change_header("Subject", "####SPAM#### ($hits) $Subject");
   action_add_part($entity, "text/plain", "-suggest", "$report\n", "SpamAssassinReport.txt", "inline");
   # action_discard();
} elsif ($hits >= 8) {
   action_add_header("X-Spam-Status", "Probably, score=$hits required=$req tests=$names");
   action_change_header("Subject", "####Probably SPAM#### ($hits) $Subject");
   action_add_part($entity, "text/plain", "-suggest", "$report\n", "SpamAssassinReport.txt", "inline");
} elsif ($hits >= 5) {
   action_add_header("X-Spam-Status", "Possibly, score=$hits required=$req tests=$names");
   action_change_header("Subject", "####Possibly SPAM#### ($hits) $Subject");
   action_add_part($entity, "text/plain", "-suggest", "$report\n", "SpamAssassinReport.txt", "inline");
} elsif ($hits >= 0.00) {
   action_add_header("X-Spam-Status", "Probably not, score=$hits required=$req tests=$names");
   # action_add_part($entity, "text/plain", "-suggest", "$report\n", "SpamAssassinReport.txt", "inline");
} else {
   # Score (hits) is less than 0, i.e., negative
   action_add_header("X-Spam-Status", "No, score=$hits required=$req tests=$names");
   # action_add_part($entity, "text/plain", "-suggest", "$report\n", "SpamAssassinReport.txt", "inline");
}

Listing 2: The section of /etc/mail/mimedefang-filter that categorizes and marks email.

The action_change_header line in the first branch is the one that changes the subject line of the email. Strictly speaking, it calls another Perl subroutine to do that, passing the string I want to add as an argument, but the effect is the same. The subject line now contains the string “####SPAM####” (without the quotes), followed by the spam score, i.e., the variable $hits. Having this known string in the subject line makes further filtering easy.
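The branch structure of Listing 2 can be mirrored in Python to see which tier a given score lands in. This is a sketch for reasoning about the thresholds, not a replacement for the Perl filter. The function name classify is hypothetical, it returns only the verdict and the rewritten subject, and the $req values in the examples are made up. Because the tests run top to bottom, the first one that succeeds wins.

```python
from typing import Tuple


def classify(hits: float, req: float, subject: str) -> Tuple[str, str]:
    """Return (X-Spam-Status verdict, Subject) following Listing 2's branches.

    The tests run top to bottom and the first true one wins, as in Perl.
    """
    if hits >= req:
        return "Spam", f"####SPAM#### ({hits}) {subject}"
    elif hits >= 8:
        return "Probably", f"####Probably SPAM#### ({hits}) {subject}"
    elif hits >= 5:
        return "Possibly", f"####Possibly SPAM#### ({hits}) {subject}"
    elif hits >= 0.0:
        return "Probably not", subject  # subject left unchanged
    else:
        return "No", subject            # negative score, subject unchanged


print(classify(12.3, 10.0, "Cheap watches"))  # ('Spam', '####SPAM#### (12.3) Cheap watches')
print(classify(9.0, 10.0, "Cheap watches"))   # ('Probably', '####Probably SPAM#### (9.0) Cheap watches')
print(classify(9.0, 5.0, "Cheap watches"))    # ('Spam', '####SPAM#### (9.0) Cheap watches')
```

One consequence of that ordering: the intermediate tiers only appear when $req is above their fixed thresholds. With $req set to 5, for example, any score of 5 or more is already caught by the first branch, so the “Probably” and “Possibly” tiers can never fire.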

The modified email is returned to SendMail for further processing. The last thing that SendMail does is call Procmail to act as the MDA.

Procmail uses global and user-level configuration files. Neither the global /etc/procmailrc file nor the individual ~/.procmailrc files exist by default, so you must create the ones you need. The structure of the files is the same, but the global file operates on all incoming email, while the local files can be configured for each individual user. I do not use a global file, so all of the sorting is done at the user level. My .procmailrc file, shown in Listing 3, is simple.

# .procmailrc file for david@both.org
# Rules are run sequentially - first match wins

PATH=/usr/sbin:/usr/bin
MAILDIR=$HOME/mail #location of your mailboxes
DEFAULT=/var/spool/mail/david

# Send Spam to the spam mailbox
# This is my new style SPAM subject
# The trailing colon in ":0:" makes Procmail use a lockfile while it
# appends to the mailbox
:0:
* ^Subject:.*####SPAM####
$HOME/spam

# Political stuff goes here. Must be using my political email address
:0:
* ^To:.*political
$HOME/Political

# SysAdmin stuff goes here. Usually system log messages
:0:
* ^Subject:.*(Logwatch|rkhunter|Anacron|Cron|Fail2Ban)
$HOME/AdminStuff

# drops messages into the default box
:0:
* .*
$DEFAULT

Listing 3: My simple ~/.procmailrc recipe file.


Note that the .procmailrc file must be located in the home directory of my email account on the email server, not in my home directory on my workstation. Because most email accounts are not login accounts, they use the nologin program as the default shell, so the admin will need to create and maintain these files. The other option is to change to a login shell such as Bash and set passwords so that knowledgeable users can log in to their email accounts on the server and maintain their own .procmailrc files.

Each recipe starts with :0 (yes, that is a zero) on the first line, optionally followed by a second colon that tells Procmail to use a lockfile, and each of my recipes contains a total of three lines. The second line starts with * and contains a condition consisting of a regular expression (regex) that Procmail compares against the message header; by default, the body is not searched. If there is a match, Procmail sorts the email into the folder specified by the third line. The ^ symbol anchors the match at the beginning of a line.

The first recipe in my .procmailrc file sorts the spam that MimeDefang identified in the subject line into my spam folder. The second recipe sorts political email into its own folder. I give my “political” email address to the various political organizations I volunteer for, which makes it very easy to sort their messages into a folder of their own that I can ignore if I want.

I also get a huge number of system emails from the many computers I deal with. I sort that email into a mailbox for messages relating to my system administrator duties, which makes those emails very easy to find. Note the use of parentheses to enclose a list of strings to match. The strings are separated by vertical bars, aka pipes ( | ), which act as a logical “or”. So the condition line “* ^Subject:.*(Logwatch|rkhunter|Anacron|Cron|Fail2Ban)” reads, “if the Subject line contains Logwatch or rkhunter or … or Fail2Ban”. Procmail ignores case by default, so there is no need to create recipes that look for various combinations of upper and lower case.

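The last recipe drops all email that does not match another recipe into the default mailbox, usually the inbox. Strictly speaking, it is optional, because Procmail delivers any message that no recipe has delivered to $DEFAULT anyway.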
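To check what a set of recipes will do before trusting it with real mail, the first-match-wins logic can be simulated in Python. This sketch copies the three sorting conditions from Listing 3 and approximates Procmail's defaults: matching only against the header, ignoring case, and letting ^ anchor at the start of any header line. It ignores folded (continued) header lines and Procmail-specific regex features, and the function name deliver is hypothetical, so treat it as an approximation rather than a substitute for Procmail itself.

```python
import re
from typing import List, Tuple

# The three sorting recipes from Listing 3 as (condition, destination) pairs.
# $HOME and $DEFAULT are kept as literal text rather than expanded.
RECIPES: List[Tuple[str, str]] = [
    (r"^Subject:.*####SPAM####", "$HOME/spam"),
    (r"^To:.*political", "$HOME/Political"),
    (r"^Subject:.*(Logwatch|rkhunter|Anacron|Cron|Fail2Ban)", "$HOME/AdminStuff"),
]
DEFAULT = "$DEFAULT"


def header_of(message: str) -> str:
    """Return the header: everything before the first empty line."""
    return message.split("\n\n", 1)[0]


def deliver(message: str) -> str:
    """Return the destination of the first recipe whose condition matches."""
    header = header_of(message)
    for condition, destination in RECIPES:
        # IGNORECASE: Procmail ignores case by default.
        # MULTILINE: ^ matches at the start of every header line.
        if re.search(condition, header, re.IGNORECASE | re.MULTILINE):
            return destination  # first match wins; later recipes never run
    return DEFAULT  # nothing matched, so the mail goes to the default mailbox


print(deliver("To: david@both.org\nSubject: ####SPAM#### (12.3) Cheap watches\n\nBody"))  # $HOME/spam
print(deliver("To: political@example.org\nSubject: Volunteer meeting\n\nBody"))  # $HOME/Political
print(deliver("Subject: ####Probably SPAM#### (9.0) Hello\n\nBody"))  # $DEFAULT
print(deliver("Subject: Acronyms explained\n\nBody"))  # $HOME/AdminStuff
```

The last two examples show behavior worth knowing about: a subject marked “####Probably SPAM####” does not contain the exact string “####SPAM####”, so it stays in the inbox, and because matching is case-insensitive and unanchored, “Cron” also matches the “cron” inside a word such as “Acronyms”.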

Having the .procmailrc file in my home directory does not cause Procmail to filter my mail. I have to add one more file, the ~/.forward file, which tells procmail to filter all of my incoming email. The .forward file is shown in Listing 4.

# .forward file
# process all incoming mail through procmail - see .procmailrc for 
# the filter rules.
|/usr/bin/procmail

Listing 4: The ~/.forward file activates Procmail filtering.


It is not necessary to restart either SendMail or MimeDefang when creating or modifying the Procmail configuration files.

The SpamAssassin book and the Red Hat Procmail link in the Resources section below describe the configuration of Procmail and creation of recipes in more detail.

Some final thoughts

Note that MimeDefang must be started before SendMail so that it can create the socket that SendMail sends email to for processing. I have a short script – automate everything – that I use to stop and restart SendMail and MimeDefang in the correct order so that new or modified rules in the local.cf file take effect.
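The ordering constraint can be captured in a few lines of code. This is a hypothetical stand-in, not the script mentioned above: a minimal Python sketch that assumes systemd with units named sendmail and mimedefang, plus the socket path from Listing 1, all of which you should verify on your own system. It stops SendMail, restarts MimeDefang, waits for the socket to appear, and only then starts SendMail again. The command runner is passed in as a parameter so the sequence can be checked without touching a live mail server.

```python
import os
import stat
import subprocess
import time
from typing import Callable, List, Sequence

# Assumptions: systemd unit names and the socket path from Listing 1.
SOCKET_PATH = "/var/spool/MIMEDefang/mimedefang.sock"
BEFORE_SOCKET: List[List[str]] = [
    ["systemctl", "stop", "sendmail"],       # stop the MTA first
    ["systemctl", "restart", "mimedefang"],  # reload the filter and local.cf rules
]
AFTER_SOCKET: List[str] = ["systemctl", "start", "sendmail"]


def is_socket(path: str) -> bool:
    """True if path exists and is a Unix domain socket."""
    try:
        return stat.S_ISSOCK(os.stat(path).st_mode)
    except FileNotFoundError:
        return False


def wait_for_socket(path: str, timeout: float, interval: float = 0.25) -> bool:
    """Poll until the socket appears or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if is_socket(path):
            return True
        time.sleep(interval)
    return is_socket(path)


def restart_mail(run: Callable[[Sequence[str]], object],
                 socket_path: str = SOCKET_PATH,
                 timeout: float = 30.0) -> None:
    """Restart MimeDefang before SendMail so the milter socket exists first."""
    for cmd in BEFORE_SOCKET:
        run(cmd)
    if not wait_for_socket(socket_path, timeout):
        # Leave SendMail stopped rather than start it without its filter.
        raise RuntimeError(f"MimeDefang socket {socket_path} did not appear")
    run(AFTER_SOCKET)


def run_command(cmd: Sequence[str]) -> None:
    """Run one command for real, raising if it fails."""
    subprocess.run(list(cmd), check=True)


# On the mail server itself (as root): restart_mail(run_command)
```

One limitation: the check only confirms that a socket file exists at that path, so a stale socket left over from an earlier run could satisfy it before the new MimeDefang process is ready.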

I already have a large body of rules and score modifiers in my SpamAssassin local.cf file, so although I could have used Procmail by itself for spam filtering and sorting, converting all of those rules would have taken a lot of work. I also think SpamAssassin does a better job of scoring because it does not rely on a single rule to match, but rather on the aggregate score from all of the rules, as well as scores from Bayesian filtering.

Procmail works very well when matches can be made very explicit with known strings such as the ones that I have configured MimeDefang to place in the subject line. I think Procmail works better as a final sorting stage in the spam filtering process than as a complete solution all by itself. Of course, I know that many admins have made complete spam filtering solutions using nothing more than Procmail.

Now that I have server side filtering, I am somewhat less limited in my choice of email clients because I no longer need a client that performs filtering and sorting. Nor do I have a need to leave an email client running all the time to perform that filtering and sorting.

Reports of Procmail’s demise

While researching this article, I did a lot of Google searches and found a number of results dating from 2001 through about 2013 that declare Procmail to be dead. As evidence, they point to web pages that no longer work, missing source code, and a short Wikipedia article that does no more than declare Procmail dead and link to more recent replacements. However, all Red Hat, Fedora, and CentOS distributions install Procmail as the MDA for SendMail. The Red Hat, Fedora, and CentOS repositories all have source RPMs for Procmail, and the source code is also on GitHub.

Considering the continued use of Procmail by Red Hat, I have no problem with using this mature software that does its job silently and without fanfare.

Resources

  • SpamAssassin: A Practical Guide to Configuration, Customization, and Integration, Packt Publishing, ISBN 1-904811-12-4. This book also contains information about MimeDefang and Procmail.
  • Wikipedia – SpamAssassin
  • Wikipedia – MimeDefang
  • Red Hat – Procmail
  • Procmail FAQ