CentOS 5.1 with Postfix, Courier IMAPD, Courier Maildrop, Spamassassin 

Robin Whittle rw@firstpr.com.au 2008-04-29  (2009-09-09 Minor change to anomy.conf to allow vCard .vcf files.  2010-03-24 added "sa-update".)


Introduction

Configuring and running a mailserver is a demanding business.  This long page is a how-I-did-it for my home-office mail server.  This is not for massive numbers of users, with databases for authentication etc.  It might suit a small business or home user.  Here is a rough outline of the software I used:

Postfix
MTA (Message Transfer Agent).  A widely respected, fast, secure and easy to administer mailserver - a replacement for sendmail.

Postfix is configured here to deliver incoming messages using the Courier Maildrop program, which can deliver to mailboxes in the Maildir format.  In this format, each message is a separate file, whereas the traditional mbox format has the entire mailbox as a single file.  For any serious use, mbox sucks and maildir swings.
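To make the difference concrete, here is a minimal sketch which builds a throw-away Maildir and a throw-away mbox by hand and counts the messages in each.  The file names (1.msg, 2.msg) and the addresses are made up for illustration - real delivery agents generate their own unique file names.

```shell
# Sketch only: hand-built mailboxes in a temporary directory.
demo=$(mktemp -d)

# Maildir: three subdirectories, one file per message.
mkdir -p "$demo/Maildir/cur" "$demo/Maildir/new" "$demo/Maildir/tmp"
printf 'From: a@example.com\nSubject: one\n\nfirst message\n'  > "$demo/Maildir/new/1.msg"
printf 'From: b@example.com\nSubject: two\n\nsecond message\n' > "$demo/Maildir/new/2.msg"

# mbox: every message appended to one file, separated by "From " lines.
{
  printf 'From a@example.com Tue Apr 29 10:00:00 2008\nSubject: one\n\nfirst message\n\n'
  printf 'From b@example.com Tue Apr 29 10:05:00 2008\nSubject: two\n\nsecond message\n\n'
} > "$demo/mbox"

maildir_count=$(find "$demo/Maildir/new" -type f | wc -l | tr -d ' ')
mbox_count=$(grep -c '^From ' "$demo/mbox")
echo "Maildir: $maildir_count files, mbox: $mbox_count messages in 1 file"
```

Removing one message from the Maildir is a matter of removing one file, whereas removing one from the mbox means rewriting the single shared file.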

Later, I intend to configure Postfix to use the Sender Policy Framework (SPF) http://www.openspf.org to help reduce backscatter emails: http://en.wikipedia.org/wiki/Backscatter_(e-mail) .

Courier Maildrop

Courier Maildrop is called whenever Postfix needs to deliver a message (an email) to a local user's mailbox.  Here I use Maildrop to perform filtering on the email's headers and body, such as for copying mailing list messages to their own mailboxes.  I use a modified version of Maildrop (as described in another page), which can deliver messages to the Inbox tagged for deletion.  This is very useful with large volumes of mailing list messages.

The mailfilter file calls Spamassassin to do spam filtering and anomy-sanitizer to handle attachments with executable file names, other kinds of malware, dangerous HTML etc.  Anomy-sanitizer hasn't been updated since 2006, but I think it is still good in 2008. It doesn't depend on a feed of updates containing virus definitions, which makes it a lot simpler than running an anti-virus program. I run Avast! on my Windows PCs anyway. An alternative would be to scan for viruses etc. with ClamAV http://www.clamav.org . A recipe for integration with Postfix is at: http://www.postfixvirtual.net/postfixantivirus.html#clamav .

Courier IMAPD

The Courier IMAPD server accesses the Maildir mailboxes and makes them available to any email client program which uses IMAP to store its mailboxes on a server.   For instance, I use Thunderbird on local machines to do this.  In principle, Thunderbird or any other email client program could be running anywhere on the Net and still access these mailboxes.  However, I think Thunderbird tends to do a lot of not-necessarily wanted activity, scanning the contents of mailboxes automatically etc.  This could be a problem if operating over a slow or expensive link, such as when mobile or far from home.

Various web-mail programs

One or more web-mail programs can run on this server, sending outgoing messages to Postfix via SMTP and reading and writing mailboxes, including the Inbox, via IMAP.  I describe these on separate web-pages.  These programs have to be integrated with the Apache web server.  Some of them run with PHP code, so the Apache server needs to support PHP too.

Here is an overall diagram depicting how these things work together.

I am not using any fancy database-based authentication system.  Each user whose email is handled by this system has an ordinary user account on the server.



                                           Maildir format                        Web-mail email client
                                           mailboxes
      /-----------\      /----------\      ************       /---------\        /----------\
      |           |      |          |      *  Inbox   *       |         |        |          |
      |           | ===> | Maildrop | ===> *          * <===> | Courier |        | Postman  |
<===  |  Postfix  |      |          | =\   ************       |  IMAPD  |        |   with   |         /---------\
===>  |    MTA    |      \----------/  |                      |         |  IMAP  |  Apache  |         |         |
SMTP  |           |                    |   ************       |         | <----> |          | <-----> | User's  |
to &  | (Message  |       Filtering    |=> *   YYY    * <===> |         |        |          |  HTTP   |   web   |
from  | Transfer  |       rules in     |   ************       |         |        |          |   or    | browser |
other |  Agent)   |       .mailfilter  |   ************       |         |        |   SMTP   |  HTTPS  |         |
MTAs  |           | <--\     ***       \=> *   XXX    * <===> |         | <--\   |   Out    |         |         |
      \-----------/    |                   ************       \---------/    |   \----------/         \---------/
                       |                                                     |        |
                       |--<---------------- SMTP outgoing mail --------------|----<---/
                       |                                                     | IMAP
                       |                                                     |   /------------------------\
                       |                                                     \-->| Thunderbird or other   |
                       |                                                         | IMAP capable Email     |
                       |                                                         | client program on PC.  |
                       |                                                         |                        |
                       \--<-------------------------------------------------<--|< SMTP outgoing mail    |
                                                                                 \------------------------/
***  Spam and virus filtering

This is where extra programs can go for filtering viruses ("malware") and spam.  Another approach is to have Postfix detect and refuse to accept spam messages from remote MTAs (on the left side of the Postfix box in this diagram).  This would cut down on the volume of mail which needs to be delivered locally.  However, this approach is often unacceptable: some false positives are inevitable in spam detection, and when a non-spam email is refused, the intended recipient never learns that it was sent or rejected.  This *** location is actually two points in each user's .mailfilter file, which is executed by Courier Maildrop.  One point is where I call Anomy Sanitizer and the other is where I call SpamAssassin.

Postfix is a collection of programs, and one way of dealing with spam is to configure Postfix to reject SMTP sessions based on the IP address of the machine starting the session (such as by a real-time black-hole list lookup of the IP address with a remote service) and also on criteria such as the sender address and other things in the header.  This, however, rejects the message entirely, so if legitimate messages are rejected, the recipient never knows about it.  It has the advantage of reducing communications traffic by rejecting spam rather than accepting and filtering it, but it involves Postfix in more work, and in delays while it queries remote servers to check the IP address.

My approach to running Spam Assassin and Anomy Sanitizer is to use the xfilter command from within Maildrop's .mailfilter file.  I have a page on how I did this, which you can find from the parent directory.

Another way to run Spam Assassin and Anomy Sanitizer is described by Advosys in Ottawa:
http://advosys.ca/papers/postfix-filtering.html .

The Postfix configuration changes required for running the Advosys script for these two programs are entirely separate from how Postfix is configured to deliver messages with Maildrop.  However, this leads to them running on outgoing mail as well, not just mail to be delivered to local users' mailboxes.  (Actually, the script becomes part of Postfix's smtp program which also handles the messages arriving from local clients via the line entering the bottom right of the Postfix box above.)  Advosys solve this by running two instances of Postfix, with the first one doing the virus/spam filtering as usual and the second one responding to a second IP address purely for SMTP outgoing messages from local mail clients - therefore requiring all the clients be reconfigured.  For my own purposes, I think a better approach is to put Anomy Sanitizer and Spam Assassin in the local delivery section of Postfix, just before Maildrop.   See my separate page, from the parent directory for how I did this - running them from within Maildrop.  However, Advosys.ca have a lot of experience with large mail systems, for companies and educational institutions, and there are good reasons why they run two instances of Postfix, and do other things to improve efficiency.  For instance, their clients may want to filter for spam and viruses on all incoming email, but only filter for viruses on outgoing mail.  This is not because they are spammers, but because spam detection is an imperfect science.

Whether Advosys' or my approach is used, the aim is to label each suspected spam or virus message by adding distinctive headers and/or distinctive and human-readable material in the body of the message.  Anomy Sanitizer typically also changes suspect message bodies to "defang" suspect HTML tags, Javascript, executable files etc.  In the standard configuration this has the effect of greatly shortening long virus messages.  Then, it is easy in Maildrop's filtering rules to decide what to do with suspected spam or malware messages.   I use Maildrop with my "Delivered to Inbox Tagged for Deletion" system, with Cc:ing to separate mailboxes for suspected spam and malware, with the addition of a "[~~~SPAM]" or "[~~~VIRUS]" subject header for when they are delivered tagged for deletion in the Inbox.  This way, all messages are visible to the intended recipient - me - but I don't have to manually select and do something with the individual spams and viruses which are increasingly flooding my Inbox.

I think it is best to store these highly suspected spam and virus emails somewhere, rather than to delete them manually or automatically as they are first viewed or arrive at the server.  To delete this stuff manually or automatically risks accidentally deleting a valid email - because I can't be super cautious and 100% correct in my judgments whenever I do this, which could be any time of the day.   With largely automated detection and copying to spam and virus mailboxes, supplemented by manual steps for the same purpose with what gets past the filters, at least I can periodically look into those mailboxes to ensure I haven't accidentally turfed something good into these cesspits.

To ease the problem of searching the spam pit for false positives (non-spam messages which were automatically classified as spam) I now sort the messages into two mailboxes:

To sort manually through Spam-Marginal, I normally use Thunderbird's search function to look for my name "Robin" in the body of the message, since spammers typically only know my email address: rw@firstpr.com.au.  (This is a very good reason for having my account name "rw", rather than "robin".)  I also look for a few other words related to my business.  



In the following sections, I describe how I installed the IMAP server Courier IMAPD, its required Authentication Daemon, and Courier Maildrop.

Some alternatives to Courier IMAPD include the newer Dovecot IMAPD.  I tried it once, but I have always been happy with Courier IMAPD.  

The University of Washington server is not suitable because (last time I looked) it did not use Maildir mailboxes.  See this note for some history of disputes about mailbox formats and about difficulties with the IMAP standard itself: http://www.courier-mta.org/fud/ .

People running really large IMAP systems probably need to consider more intense solutions such as Cyrus.  

Courier IMAPD does not appear in the output of "yum list" using the repositories I chose during installation (see ../CentOS-5.1-RAID-1, and the page from the parent directory ../ concerning miscellaneous configuration items for notes on the yum system).  I wondered whether I could get a suitable RPM of Courier IMAPD via one of the other repositories listed at: http://wiki.centos.org/Repositories . First I tried RPMForge: https://rpmrepo.org/RPMforge/Finding but did not find any Courier RPMs there.  None of the others looked like they had it either. There's no sign of any CentOS packages at the Courier IMAPD site: http://www.courier-mta.org/imap/ .

I couldn't easily find any pre-built binary RPM, so I downloaded the source code:

http://www.courier-mta.org/download.php#imap  courier-imap-4.3.1.tar.bz2 (3.2M)

and tried compiling it.  I put it in /opt/courier-imapd/ .  The Midnight Commander F2 menu (User menu from the File menu) has a handy "Extract compressed tar file to subdirectory" function I like to use on .tar.bz2 etc. tarballs such as this.

Before I can install Courier IMAPD, I need to install the Courier Authentication Library, which provides a daemon to handle authentication matters.

Courier Authentication Library 0.60.2 - compile and install

As root, I unzip it and read the README and INSTALL doco.  

I was also reading the INSTALL of Courier IMAPD at the same time, and became confused.  The IMAPD INSTALL says that the ./configure process must be done by an ordinary user, not root.  In the following, I did the unzip, ./configure and the compilation (make) as an ordinary user, but then became root for "make install" and "make install-configure".  Maybe the whole thing could be done as root.

I did not install "expect" since I am not using the Courier webmail system, which needs "expect" to enable the user to change their password.

Since I am installing Courier for the first time, I create a user and group "courier":  "adduser courier".  I don't think I need any options for the configure script, so:

./configure

As mentioned in INSTALL, the configure script seems to go round and round.  

make

No problems . . .

make install

This failed when I did it as a non-root user, so I did it as root and it seemed to work fine.

make install-configure

Likewise, as root, this worked too.

Now to make sure the daemon starts at boot, and to start it now.

INSTALL directs me to read README_authlib.html about setting up the authentication modules.  I am using ordinary shadow password authentication, since each IMAP user has a local account on the Linux machine.  This looks like essential reading for anyone doing more elaborate authentication than I am - but there is nothing here which requires me to change anything.  Normally, 5 daemons are running, which is fine for me.  Busier systems would require more.  There is a note there on testing the authentication daemon, which I follow after I have made it run.

To make the daemon start at boot time, I copy the file courier-authlib.sysvinit (which is built in the source directory, as part of the compilation process) to  /etc/rc.d/init.d/courier-authlib with permissions 775.  Then I give the command:

chkconfig --add courier-authlib

Then, I can see that this script is part of the system:

chkconfig --list
. . .
conman          0:off   1:off   2:off   3:off   4:off   5:off   6:off
courier-authlib 0:off   1:off   2:on    3:on    4:on    5:on    6:off
cpuspeed        0:off   1:on    2:on    3:on    4:on    5:on    6:off

I start it now:

/etc/rc.d/init.d/courier-authlib start

Now I can test it on the command line, as described in README_authlib.html.

authtest robin

This looks good so far.


Courier IMAPD 4.3.1 - compile and install

I have not investigated using SSL with IMAP.  The INSTALL file details how to do it.

The INSTALL file has this which needs to be considered first:

You MUST run the configure script as normal user, not root. Did you
extract the tarball as root? It won't work. Remove the extracted source
code. Log in as a normal user. Extract the source code as a normal user,
then run configure. You will do everything as a normal user, except for
the final step of installing the compiled software.

I do not wish to include the Gamin or FAM system for real-time folder updates.

Either GDBM or the Berkeley DB library is needed.  Looking at the output of "yum list" I find these are installed:

gdbm.i386         1.8.0-26.2.1          
gdbm-devel.i386   1.8.0-26.2.1          

I need to check that neither inetd nor xinetd is listening on the IMAP port.  (I am only using this for IMAP, not POP3.)  xinetd is used on CentOS 5.1. IMAP is TCP port 143 (as specified in /etc/services).  Looking at the files in /etc/xinetd.d/ I see none of them have a line "service imap", so this looks OK.
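The same check can be sketched as a small shell function, rather than reading each file by eye.  The function name imap_services_in is my own invention, and the demonstration runs on a made-up directory rather than on the real /etc/xinetd.d/.

```shell
# Sketch: list xinetd service files which define an IMAP (or IMAPS) service.
# The directory is a parameter so the function can be tried on a test directory.
imap_services_in() {
  xinetd_dir=$1
  # -i: ignore case, -l: print only the names of matching files, -E: extended regex.
  grep -ilE '^[[:space:]]*service[[:space:]]+imaps?([[:space:]]|$)' "$xinetd_dir"/* 2>/dev/null
}

# Try it on a made-up directory holding one unrelated service.
demo=$(mktemp -d)
printf 'service rsync\n{\n\tdisable = yes\n}\n' > "$demo/rsync"
if [ -z "$(imap_services_in "$demo")" ]; then
  echo "no IMAP service defined for xinetd"
fi
```

On the real machine this would be called as imap_services_in /etc/xinetd.d - no output means xinetd has no IMAP service configured.  It says nothing about whether some other program is already listening on port 143.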

So maybe I am ready to build Courier IMAPD. . .

./configure

No problems.

make check

Now become root.

su root

make install

No problems!  Note: The above line should probably be "make install-strip" to create binaries without debugging stuff.  The INSTALL doco indicates this, but not very clearly.

make install-configure

No problems!!

There is a bunch of stuff at /usr/lib/courier-imap/ including the config file: /usr/lib/courier-imap/etc/imapd .

It is vital to edit the above file so IMAPD will actually run.  The commented lines below are the original lines for these settings.

# ADDRESS=0
ADDRESS=10.0.0.2

# MAXDAEMONS=40
MAXDAEMONS=500

#MAXPERIP=4
MAXPERIP=100

#IMAPDSTART=NO
IMAPDSTART=YES              <<< You will definitely be wanting this one.

** When I make this machine become gair, change the address line to include gair's LAN address and (if I choose to do so) the public IP address.

At the end of this file, the name of the mailbox directory is specified.  By default it is "Maildir", which is fine.  The variable IMAP_CHECK_ALL_FOLDERS might be worth looking at.

Previous experience indicates that Mozilla (now Thunderbird) was much happier with MAXDAEMONS and MAXPERIP being set to 100 and 20, rather than 40 and 4.  However, a quick search indicates some people are setting these to values such as 5000 and 100.  I chose 500 and 100 quite arbitrarily.
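I made these edits by hand, but here is a minimal sketch of making the same four changes with sed.  It works on a small sample file which only imitates the relevant lines of the real config file, and the values are simply the ones I chose above.

```shell
# Sketch: apply the four settings to a copy of the imapd config with sed.
# The sample file below only imitates the relevant lines of the real one.
demo=$(mktemp -d)
cat > "$demo/imapd" <<'EOF'
ADDRESS=0
MAXDAEMONS=40
MAXPERIP=4
IMAPDSTART=NO
EOF

cp "$demo/imapd" "$demo/imapd.orig"      # keep the original for comparison
sed -e 's/^ADDRESS=.*/ADDRESS=10.0.0.2/' \
    -e 's/^MAXDAEMONS=.*/MAXDAEMONS=500/' \
    -e 's/^MAXPERIP=.*/MAXPERIP=100/' \
    -e 's/^IMAPDSTART=.*/IMAPDSTART=YES/' \
    "$demo/imapd.orig" > "$demo/imapd"

diff "$demo/imapd.orig" "$demo/imapd" || true   # diff exits 1 when the files differ
```

Unlike my hand edit, this does not leave the original lines behind as comments, which is why the sketch keeps a .orig copy instead.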

To make the system start IMAPD at boot time, I copy the file courier-imap.sysvinit  from the source directory to /etc/rc.d/init.d/courier-imap with permissions 775.  Then I give the command:

chkconfig --add courier-imap

Then, I can see that this script is part of the system:

chkconfig --list
. . .

conman          0:off   1:off   2:off   3:off   4:off   5:off   6:off
courier-authlib 0:off   1:off   2:on    3:on    4:on    5:on    6:off
courier-imap    0:off   1:off   2:on    3:on    4:on    5:on    6:off
cpuspeed        0:off   1:on    2:on    3:on    4:on    5:on    6:off

To start it (but check first that /usr/lib/courier-imap/etc/imapd has IMAPDSTART=YES) :

/etc/rc.d/init.d/courier-imap start

Right now, there are no Maildir directories on this server to test it with.  I need to install Courier Maildrop and integrate it with Postfix to generate some Maildir format mailboxes to test the IMAP server with.



Courier Maildrop 2.0.4 - compile and install

This is just installing the standard version, without my modifications for delivery with the message marked for deletion.  See another section below #maildrop_mods for how I change two files and recompile maildrop with these new features.

I unpacked the tarball as a non-root user, but I see nothing in INSTALL requiring this, so I change to root for the rest of the compilation and installation.

There is a README.Postfix

INSTALL tells me some things I need to check:

I need the PCRE library: http://www.pcre.org - Perl Compatible Regular Expressions. "yum list" tells me that pcre.i386 6.6-2.el5_1.7 is installed. I guess I only need this, not some dev library for it.  No . . . running ./configure complains of a missing "pcre/pcre.h", which sounds like it is in the development library.  So I need to install pcre-devel.i386 (6.6-2.el5_1.7):

yum install pcre-devel

I am not sure why I would want Courier Maildrop to know about the Courier authentication library.  Its role is to accept an email from Postfix and deliver it to the local user's Inbox, with potential mailfiltering which may also cause the message to be processed by other programs and delivered to other mailboxes of the same user.

Some options I consider using:

--without-db I don't think I use anything in Maildrop which involves databases, so I could make the executable smaller by removing that stuff.  Maildrop is executed for every incoming email.  It is not much fuss on my server, but on a busier machine, it might be worthwhile making Maildrop as minimal as possible.

I decide to use no configure options.

In all the various systems Maildrop can be compiled to work in, there are various places each user's mailboxes could be stored.  The configure script tries to figure this out.  However, INSTALL states (implicitly after running ./configure) that:

you MUST always verify the output file, config.h, to make sure that the settings are correct. 

and that there is another file xconfig.h too which contains things which can't be automatically determined.

More on this below.

./configure

No problems, but I need to edit some things in the file it produces: /maildrop/config.h.

In this system, I want each user's mailbox to be in their home directory.  So for user robin, there is a directory:

/home/robin/Maildir/

In that directory, if there is only an Inbox, then there are three subdirectories:

/home/robin/Maildir/cur      Contains messages already seen by an
                             IMAP client.
     

/home/robin/Maildir/new      Contains messages delivered to this
                             Inbox but not yet seen by an IMAP
                             client.

/home/robin/Maildir/tmp      Used briefly while a message is being
                             written, before it is moved to new.
                             Not used for storing messages.


With Courier IMAP, all the other mailboxes for that user are stored within the Inbox's directory.  Each is a directory with the same structure of three sub-directories, and its name is the mailbox's name with a leading dot, such as ".Drafts" for the mailbox "Drafts".   Each such mailbox can have its own mailboxes, as far as the email client is concerned, but the Maildir (set of three directories like those above) for each one is still a directory directly inside the original /home/robin/Maildir/ directory above.  The nesting of mailboxes is done like this:

To the client they appear as a tree of directories and sub-directories, such as:

    Inbox
     |
     |--aaa
     |
     |--Trash
     |
     |--xxx
     |
     |--yyy
         |
         |--zzz
 

Physically in the file system, they are as follows, not counting each directory's three subdirectories and the two or three files which Courier IMAP puts in each one.

/home/blah/Maildir/.aaa
/home/blah/Maildir/.Trash
/home/blah/Maildir/.xxx
/home/blah/Maildir/.yyy
/home/blah/Maildir/.yyy.zzz
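
The naming rule above can be sketched as a small shell function.  This is a hypothetical helper of my own, not part of Courier, and it assumes the folder is written with "/" between nesting levels, as a client might display it below the Inbox.

```shell
# Sketch: map a folder path as the client shows it (below Inbox) to the
# directory Courier IMAP uses.  Hypothetical helper, not part of Courier.
folder_to_dir() {
  maildir=$1      # e.g. /home/blah/Maildir
  folder=$2       # e.g. yyy/zzz, with "/" between nesting levels
  # Each nesting level becomes a "." in a single directory name.
  printf '%s/.%s\n' "$maildir" "$(printf '%s' "$folder" | tr '/' '.')"
}

folder_to_dir /home/blah/Maildir Trash      # /home/blah/Maildir/.Trash
folder_to_dir /home/blah/Maildir yyy/zzz    # /home/blah/Maildir/.yyy.zzz
```

However deep the nesting looks in the client, the result is always one directory directly inside Maildir.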

Courier IMAPD and Maildrop are very fussy about the user, group and permissions of these directories, and of each file which contains an email.

In /maildrop/config.h is the line:

/* Default mail delivery instruction */
#define DEFAULT_DEF "/var/mail"

This is All Wrong.  It is not surprising - the configure script had no way of knowing I want the mailboxes to be in the user's directories.

I need to change this to:

#define DEFAULT_DEF "./Maildir"

This is in accordance with this part of INSTALL (and with my past experience with Maildrop):

To use maildrop with qmail, which normally delivers to $HOME/Mailbox, set DEFAULT_DEF to ./Mailbox.

Then I try my luck compiling it:

make
make install-strip
make install-man

No problems.  The output of make install-strip lists where things are installed.  For my system, in summary:

/usr/local/bin/maildrop   <<  The maildrop binary.
/usr/local/bin/mailbot
/usr/local/bin/reformail

/usr/local/bin/deliverquota
/usr/local/bin/lockmail
/usr/local/bin/maildirmake
/usr/local/bin/reformime
/usr/local/bin/makemime
/usr/local/bin/makedatprog
/usr/local/bin/makedat

/usr/local/man/man1/lockmail.1
/usr/local/man/man1/maildirmake.1
/usr/local/man/man1/maildrop.1
/usr/local/man/man1/mailbot.1
/usr/local/man/man1/makemime.1
/usr/local/man/man1/reformail.1
/usr/local/man/man1/reformime.1
/usr/local/man/man5/maildir.5
/usr/local/man/man7/maildirquota.7
/usr/local/man/man7/maildropex.7
/usr/local/man/man7/maildropfilter.7
/usr/local/man/man7/maildropgdbm.7
/usr/local/man/man8/deliverquota.8

/usr/local/share/maildrop/html/maildirquota.html
/usr/local/share/maildrop/html/deliverquota.html
/usr/local/share/maildrop/html/lockmail.html
/usr/local/share/maildrop/html/maildirmake.html
/usr/local/share/maildrop/html/maildropex.html
/usr/local/share/maildrop/html/maildir.html
/usr/local/share/maildrop/html/maildropfilter.html
/usr/local/share/maildrop/html/maildropgdbm.html
/usr/local/share/maildrop/html/maildrop.html
/usr/local/share/maildrop/html/mailbot.html
/usr/local/share/maildrop/html/makemime.html
/usr/local/share/maildrop/html/reformail.html
/usr/local/share/maildrop/html/reformime.html
/usr/local/share/maildrop/html/rfc822.html
/usr/local/share/maildrop/html/rfc2045.html
/usr/local/share/maildrop/html/makedat.html
/usr/local/share/maildrop/html/manpage.css

(The last time I compiled Maildrop in 2003 or so, it put its binary in /usr/bin.)

Now to create an initial Maildir format mailbox in a user directory, to configure Postfix to use Maildrop, and then to send some test emails.


Configure Postfix in general and to use Maildrop

This is basic stuff to get Postfix working with Maildrop.  Later, before turning this nair machine into gair I will do some other changes.

To create a Maildir format mailbox for user robin, at /home/robin I give the command (as user robin):

maildirmake ./Maildir

Before trying to configure Postfix,  I check it is working in the first place.

I add this machine's current name "nair.firstpr.com.au" to my DNS, with its address as 10.0.0.2 (I will remove it once nair switches over to be the new gair).  I send a test email to user robin at this machine.  

Nothing happened - there was nothing in /var/log/maillog, nor in /var/spool/mail/robin, which is where the message should be delivered without Maildrop.  Looking in the maillog of gair (my mailserver) I see "connect to nair.firstpr.com.au[10.0.0.2]: Connection refused (port 25)".  So Postfix is not accepting connections from other machines by default.

So I need to check Postfix's configuration and start it.  /etc/postfix/main.cf looks OK (it is not - see below), though I will change it before this machine becomes gair.  I tried to start Postfix:
 
/etc/rc.d/init.d/postfix start

but this failed.  A line in /var/log/maillog indicated it was already running.  I can "telnet localhost 25" and get a response from Postfix.  However, I can't telnet to 10.0.0.2 port 25 from another machine - "connection refused".

I change these lines in /etc/postfix/main.cf from:

#inet_interfaces = all
#inet_interfaces = $myhostname
#inet_interfaces = $myhostname, localhost
inet_interfaces = localhost

to

inet_interfaces = all
#inet_interfaces = $myhostname
#inet_interfaces = $myhostname, localhost
#inet_interfaces = localhost

and then restart it:

/etc/rc.d/init.d/postfix restart

Now I can telnet to port 25 on this machine from outside.
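
With several commented alternatives in the file, it is easy to misread which line is active.  Here is a minimal sketch of a function which prints the value of the last uncommented setting for a parameter.  The name main_cf_value is my own, it ignores Postfix's continuation lines, and it runs here on a sample file rather than the real main.cf.

```shell
# Sketch: print the value of the last uncommented "name = value" line.
main_cf_value() {
  # $1 = path to a main.cf style file, $2 = parameter name
  awk -v key="$2" '
    /^[ \t]*#/ { next }                    # skip commented lines
    index($0, "=") == 0 { next }           # skip lines with no "="
    {
      name = substr($0, 1, index($0, "=") - 1)
      gsub(/[ \t]/, "", name)
      if (name == key) {
        value = substr($0, index($0, "=") + 1)
        sub(/^[ \t]+/, "", value)
        sub(/[ \t]+$/, "", value)
      }
    }
    END { print value }
  ' "$1"
}

# Sample file imitating the edited lines.
demo=$(mktemp -d)
cat > "$demo/main.cf" <<'EOF'
inet_interfaces = all
#inet_interfaces = $myhostname
#inet_interfaces = $myhostname, localhost
#inet_interfaces = localhost
EOF
main_cf_value "$demo/main.cf" inet_interfaces    # all
```

On the server itself, "postconf inet_interfaces" asks Postfix directly and is the more trustworthy check.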

In gair, I cause the Postfix server there to try to send all the messages it has queued:

postfix flush

and I find the message has arrived in the mbox format mailbox, the Inbox for user robin: /var/spool/mail/robin .

So now I should be ready to configure Postfix to use Maildrop.

There is a README.postfix file in the Maildrop distribution, but it only concerns things I think I already know about or can ignore:

I already knew about local_destination_concurrency_limit=1.  There are some notes about adding a "u" flag somewhere.  This has already been done in the relevant section of /etc/postfix/master.cf which specifies how Maildrop is called:

# maildrop. See the Postfix MAILDROP_README file for details.
# Also specify in main.cf: maildrop_destination_recipient_limit=1
#
maildrop  unix  -       n       n       -       -       pipe
  flags=DRhu user=vmail argv=/usr/local/bin/maildrop -d ${recipient}

The two changes required to /etc/postfix/main.cf are to add these two lines.  Both config commands are already mentioned in the file, in a commented form.

mailbox_command = /usr/local/bin/maildrop
local_destination_concurrency_limit=1

The second line is required because Maildrop can only deliver one email at a time and so should not be asked to deliver to two or more local mailboxes at the same time.  Postfix would otherwise ask Maildrop to deliver a message to multiple users (if it was addressed to those users).

It is necessary to restart Postfix after these changes.  (I put a script to do this in the /etc/postfix directory so I don't have to remember /etc/rc.d/init.d/postfix restart .)

There is a note in main.cf that if an external program (Maildrop in this case) is used for local delivery, an alias must be created to deliver mail addressed to root to some other account:

# IF YOU USE THIS TO DELIVER MAIL SYSTEM-WIDE, YOU MUST SET UP AN
# ALIAS THAT FORWARDS MAIL FOR ROOT TO A REAL USER.

Postfix Aliases

The active lines in main.cf for aliases are:

alias_maps = hash:/etc/aliases
alias_database = hash:/etc/aliases

The relevant notes there are:

# ALIAS DATABASE
#
# The alias_maps parameter specifies the list of alias databases used
# by the local delivery agent. The default list is system dependent.
#
# On systems with NIS, the default is to search the local alias
# database, then the NIS alias database. See aliases(5) for syntax
# details.
#
# If you change the alias database, run "postalias /etc/aliases" (or
# wherever your system stores the mail alias file), or simply run
# "newaliases" to build the necessary DBM or DB file.
#
# It will take a minute or so before changes become visible.  Use
# "postfix reload" to eliminate the delay.
#
#alias_maps = dbm:/etc/aliases
alias_maps = hash:/etc/aliases
#alias_maps = hash:/etc/aliases, nis:mail.aliases
#alias_maps = netinfo:/aliases

# The alias_database parameter specifies the alias database(s) that
# are built with "newaliases" or "sendmail -bi".  This is a separate
# configuration parameter, because alias_maps (see above) may specify
# tables that are not necessarily all under control by Postfix.
#
#alias_database = dbm:/etc/aliases
#alias_database = dbm:/etc/mail/aliases
alias_database = hash:/etc/aliases
#alias_database = hash:/etc/aliases, hash:/opt/majordomo/aliases

At the end of /etc/aliases is:

# Person who should get root's mail
#root:          marc

I add a line below that:

root:           robin

(** Later, for gair, change this to rw.)

and some notes to myself on what to do when changing aliases:

# After changing the aliases, run:
#
#    newaliases
#    postfix reload

The newaliases command complains about any syntax errors.
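
Since mail for root would otherwise go astray, it may be worth checking that the alias really is active and not commented out.  A minimal sketch, using a sample file which imitates the end of /etc/aliases - the function name root_alias_target is my own.

```shell
# Sketch: print what the last uncommented "root:" line in an aliases file points to.
root_alias_target() {
  sed -n 's/^root:[[:space:]]*//p' "$1" | tail -n 1
}

demo=$(mktemp -d)
cat > "$demo/aliases" <<'EOF'
# Person who should get root's mail
#root:          marc
root:           robin
EOF
root_alias_target "$demo/aliases"    # robin
```

This only reads the text file.  It does not show whether newaliases has been run since the last edit.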

This means I don't need any Maildir format mailbox for root.


Now I can send a message to user robin or root from another machine and see the message appear as a file in /home/robin/Maildir/new/.

Some other changes to do now or later:

I recall that Maildrop will write the message to an mbox mailbox if there is no Maildir mailbox of the right name.  It will create that mbox file and keep adding to it, until it reaches 50Mbytes.  It is a good idea to check for any such mbox files, such as if a mailfilter command writes messages to some mailbox which doesn't exist as a Maildir - for instance due to some typo in the .mailfilter file.

Also, I configure Maildrop (via each user's .mailfilter file) to write to a log file: mailfilter-log.txt.  Maildrop will not write to that (or at least earlier versions wouldn't) if it grew beyond 50Mbytes.  So it is best to have some kind of logrotate process to chop these files back to 0 bytes regularly.
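
Here is a minimal sketch of such a chopping step, suitable for calling from cron.  The function name truncate_if_large and the size limit are my own choices for illustration, and the demonstration works on a temporary file rather than a real log.

```shell
# Sketch: empty a Maildrop log once it passes a size limit.
truncate_if_large() {
  log=$1
  limit_bytes=$2
  [ -f "$log" ] || return 0
  size=$(wc -c < "$log" | tr -d ' ')
  if [ "$size" -gt "$limit_bytes" ]; then
    : > "$log"          # chop the file back to 0 bytes, keeping owner and mode
  fi
}

# Demonstration on a temporary file with a tiny limit of 10 bytes.
demo=$(mktemp -d)
printf '%s\n' "some log lines" "more log lines" > "$demo/mailfilter-log.txt"
truncate_if_large "$demo/mailfilter-log.txt" 10
ls -l "$demo/mailfilter-log.txt"
```

In real use the limit would be something like 10485760 (10 Mbytes), well under the 50 Mbyte point mentioned above, and the path would be wherever each user's .mailfilter writes its log.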

It is highly desirable that whenever I make a new user account that there will be a Maildir mailbox directory in their home directory.  So, at /etc/skel I give the command:

maildirmake ./Maildir

Please search this page for other mentions of /etc/skel - there are other things which need to go in it to support Spamassassin.
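
/etc/skel only helps accounts created afterwards.  For an account which already exists, here is a minimal sketch of a function which adds a Maildir only if one is missing.  The name ensure_maildir is my own, the mkdir fallback is an assumption for machines where maildirmake is not installed, and it should be run as the user concerned so that the ownership is right.

```shell
# Sketch: give an existing home directory a Maildir if it lacks one.
ensure_maildir() {
  home=$1
  [ -d "$home/Maildir" ] && return 0
  if command -v maildirmake >/dev/null 2>&1; then
    maildirmake "$home/Maildir"
  else
    # Fallback without Maildrop installed: the same three
    # subdirectories, private to the owner.
    mkdir -p "$home/Maildir/cur" "$home/Maildir/new" "$home/Maildir/tmp"
    chmod -R 700 "$home/Maildir"
  fi
}

demo=$(mktemp -d)          # stands in for a home directory
ensure_maildir "$demo"
ls "$demo/Maildir"
```

Calling it a second time on the same directory does nothing, so it is safe to run over every home directory.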


Testing IMAPD and the Authentication Daemon


As noted above, Postfix and Maildrop are working, receiving messages and delivering them to /home/robin/Maildir/new/ .

The Authentication Daemon and the IMAP Daemon are both running.

I set up a new account in Thunderbird, on my Windows machine, for user robin on the computer nair.firstpr.com.au.  I don't use the IP address - I use "nair.firstpr.com.au" because I recall some difficulties with this in the past.  So nair needs to be in my DNS.

I disable "Check for new messages every 10 minutes".

When I delete a message, I want it to be "Mark as deleted", not "Moved to the Deleted folder".  In Composition and Addressing I turn off "Compose messages in HTML format". I disable the junk email stuff.

Then when I try to open the Inbox of this account, I am asked for my password and Thunderbird lists my messages.

I rebooted the machine and tested it all again.

Backup and Reporting Mailbox Size etc.

I run a nightly cron job to back up all my user directories and their mail to another Linux machine on the LAN.  When I was using Mbox mailboxes, which could be altered at any time with incoming mail, I found I could not simply tar-gzip the directories themselves, but had to copy them to a temporary directory and tar-gzip them there.   Probably, with Maildir mailboxes, I don't need to do this, but it is part of the backup script and I still use it.

As time goes on and mail accumulates, mailboxes grow and this poses some problems:
  1. Storage limits on the server's hard drive.
  2. Time taken to copy and tar-gzip for backup purposes.
  3. Total size of the entire backup file.
Therefore, mailboxes need to be emptied or deleted - but which ones to get rid of?   In order to decide, I wanted to know how much storage each mailbox was taking in the tar-gzip file.   Here is a shell script - my slight modification of an excellent script by Michael Carmak on the Courier Users list on 23 September 2002.   Thanks Michael!
maildir-report-3.sh.txt     maildir-report-3.sh.txt.gz 
An example of its output follows.  It does not report the final size of the entire backup file, since it does not create any such file.  It generates temporary tarballs of each mailbox, and reports on the size of that, amongst other things.   In the final backup file, the size would be different, but not by much.  So with this, I can see which mailboxes contribute most to the backup file size.

This script reports on one user only, and it  reports on all the mailboxes except for the Inbox.  To do that, I would need to modify it to look into /cur as well.  But I have a pretty good idea of what is in the Inbox since I am using it all the time.  What I want to know about is the mailing list mailboxes, the virus and spam pits, and also any mailboxes I had forgotten about which are growing fat, such as ones to keep messages prior to filtering, the Drafts mailbox etc. A sample output is:

Mailbox:                          .0-Inbox-Old.Inbox-02-03
Number of files:             1447
Total size of all files:          21.0592 MB
Disk space used:                                25M
Approx tar.gz size:                                          13M


Mailbox:                          .0-Inbox-Old.Post-Filter-Inbox
Number of files:             9888
Total size of all files:          294.157 MB
Disk space used:                                326M
Approx tar.gz size:                                          107M



Mailbox:                          .0-SPAM-etc.0SPAM-HTML-refresh
Number of files:                7
Total size of all files:          56.7617 kB
Disk space used:                                88k
Approx tar.gz size:                                          29k


Mailbox:                          .Drafts
Number of files:             1397
Total size of all files:          22.4293 MB
Disk space used:                                26M
Approx tar.gz size:                                          6.1M


Mailbox:                          .Lists.Mail-Courier-users
Number of files:             5697
Total size of all files:          21.7214 MB
Disk space used:                                30M
Approx tar.gz size:                                          3.8M


Mailbox:                          .Lists.Music-dsp
Number of files:            14324
Total size of all files:          43.9091 MB
Disk space used:                                67M
Approx tar.gz size:                                          10M
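
The idea behind the report can be sketched as follows.  This is not Michael's script or my modified version of it, only a minimal illustration of the same approach: count the files, ask du for the disk space, and measure a temporary tarball.  The function name report_mailbox and the output layout are hypothetical.

```shell
# Hypothetical helper: report on one mailbox directory, loosely following
# the fields in the sample output above.
report_mailbox() {
    mbox=$1
    files=$(find "$mbox" -type f | wc -l | tr -d ' ')
    disk=$(du -sk "$mbox" | cut -f1)
    # Stream a temporary tarball through wc rather than writing it to disk.
    targz=$(tar -czf - -C "$(dirname "$mbox")" "$(basename "$mbox")" | wc -c | tr -d ' ')
    printf 'Mailbox: %s\n' "$(basename "$mbox")"
    printf 'Number of files: %s\n' "$files"
    printf 'Disk space used: %sk\n' "$disk"
    printf 'Approx tar.gz size: %s bytes\n' "$targz"
}
```

Running it over each dot-named directory in /home/robin/Maildir would give one block of output per mailbox.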



Maildrop Mods - introducing the DELTAG feature

Please see a separate page ../Maildrop-mods-filtering where I provide new versions of two files which are part of the maildrop code:

/maildir/maildircreate.c
/maildrop/maildir.C

In the same build tree which resulted from the initial compilation, I copy the new files into these locations and go back to the main directory.

make

A bunch of new stuff is created, but AFAIK, only the maildrop binary is affected.  It can be found in:

/maildrop/maildrop

I get a bloated version: 764942 bytes.  This is mainly debugging information.  So from that directory:

strip maildrop

Now it is 164948 bytes.

Rename the original maildrop binary to something different:

/usr/local/bin/maildrop-orig

Copy the new version there:

/usr/local/bin/maildrop

and make sure the new version has the same permissions as the old.  For my system, that is:

User: root Group: mail Permissions 755

and perform these tests to show it is working.
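
The rename-and-copy steps above can be sketched as a small function.  This is only an illustration under the assumption that the stripped binary is in the build tree and the installed one is at /usr/local/bin/maildrop; the function name replace_binary is hypothetical.

```shell
# Hypothetical helper: keep the old binary as DEST-orig and put the newly
# built one in its place with permissions 755.
replace_binary() {
    new=$1
    dest=$2
    mv "$dest" "$dest-orig" || return 1
    install -m 755 "$new" "$dest" || return 1
    # On the real system, run as root and also set the ownership:
    #   chown root:mail "$dest"
}
```

From the top of the build tree, this would be called as replace_binary maildrop/maildrop /usr/local/bin/maildrop .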



Testing Maildrop's .mailfilter file, subjadd and the new DELTAG feature

Firstly, just send a message to an account and see it gets delivered to the Inbox.

Now create a .mailfilter file in the user's directory with the following contents:

# Sample Courier Maildrop file to demonstrate the DELTAG modifications

logfile "mailfilter-log.txt"

log "========"

if ( /^Subject: Test1/ )
{
    log "-------------------------------------------- Test1 found "
    to "Maildir/.blah"
}
               

if ( /^Subject: Test2/ )
{
    log "-------------------------------------------- Test2 found "
    cc "Maildir/.blah"
    to Maildir
}
               
if ( /^Subject: Test3/ )
{
    log "-------------------------------------------- Test3 found "
    cc "Maildir/.blah"
    DELTAG=1
    to Maildir

}

if ( /^Subject: Test4/ )
{
    log "-------------------------------------------- Test4 found "
    xfilter "subjadd [ABC]"
    to Maildir
}

    log "- - - - - - - - - - - - - - - - - - - - - -  No match"
  
                                                # Deliver to the Inbox.
    to "Maildir"
                                                   

Don't forget the dot at the start of ".mailfilter".  The above text is available as a file: mailfilter-test-1.txt  See the Spamassassin section for additional things to add to this file later.

The .mailfilter file must not be writable or even readable by anyone but the user.  Maildrop will refuse to use it if it is otherwise.  The user and group of this file should be that of the user whose account this is and the permissions should be 600.
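
A quick way to check this before sending a test message might look like the following sketch.  The function name mailfilter_perms_ok is hypothetical, and it only checks the permission bits as shown by ls, not the owner or group.

```shell
# Hypothetical helper: succeed only if FILE is a regular file with
# permissions rw------- (600).
mailfilter_perms_ok() {
    [ -f "$1" ] || return 1
    perms=$(ls -ld "$1" | cut -c1-10)
    [ "$perms" = "-rw-------" ]
}
```

If the check fails, "chmod 600 .mailfilter" in the user's home directory sets the permissions.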

Then create a mailbox (using the email client) called "blah".  This will appear as a maildir with /cur, /new and /tmp at:

/home/robin/Maildir/.blah

Now send four messages, with subjects:

Test1
Test2
Test3
Test4

The "to" command does something with the message and ends this run of Maildrop.  "cc" writes a copy of the message somewhere and keeps processing it.

The Test1 message should be copied to the mailbox "blah" and not arrive in the Inbox.  

The Test2 one should be copied to "blah" and arrive in the Inbox.  Likewise Test3, except that it will arrive in the Inbox tagged for deletion.  

The Test4 message should arrive in the Inbox, with a subject line:

[ABC] Test4

All messages with subject lines not starting with "Test1", "Test2", "Test3" or "Test4" should arrive in the Inbox as they otherwise would without the .mailfilter file.  Appropriate entries will be made in mailfilter-log.txt.  For debugging email mysteries, this is handy to scrutinize in combination with /var/log/maillog.

The xfilter command of Maildrop is clearly very powerful.  Anything at all could happen as a result of whatever script, binary or whatever is called.

If the command ("subjadd" in this case) can't be found, expect a syntax error and for the message not to be delivered.  Postfix will queue it ("defer") and try again from time-to-time.  Once the syntax error is fixed, giving the "postfix flush" command will cause the message to be presented to maildrop again within a second or so.

If I change the "subjadd" word above to something which Maildrop can't find, such as "subjaddxx", then the resulting message in /var/log/maillog looks like this:

Apr 28 20:25:32 nair postfix/local[29401]: 08A781755B5: to=<user@example.org>, orig_to=<user@example.org>, relay=local, delay=0.18, delays=0.08/0.03/0/0.07, dsn=4.3.0, status=deferred (temporary failure. Command output: bash: subjaddxx: command not found /usr/local/bin/maildrop: Unable to filter message. )


Always test the syntax of a newly created or modified .mailfilter file by sending a message to that account.  If it doesn't arrive as it should, then look into /var/log/maillog for a report from maildrop on what line the syntax problem was on.  Postfix writes this line.
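
To save scrolling through the whole log, something like this sketch can pull out the relevant lines.  It assumes the Postfix log lines contain "status=deferred" and mention maildrop, as in the example above; the function name recent_maildrop_deferrals is hypothetical.

```shell
# Hypothetical helper: show the most recent deferred deliveries in a Postfix
# log which mention maildrop.
recent_maildrop_deferrals() {
    logfile=$1
    count=${2:-5}
    grep 'status=deferred' "$logfile" | grep 'maildrop' | tail -n "$count"
}
```

For instance, recent_maildrop_deferrals /var/log/maillog 3 shows the last three such lines.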

It is very easy to forget brackets or to add extra blank lines in bracketed things which must not have them.

Maildrop's language is necessarily fiddly, but I have found it extremely useful since 2001 or so.


Spamassassin

In my final .mailfilter file, I test the messages with Spamassassin first, and then with Anomy Sanitizer.

Here I describe installing and configuring Spamassassin (http://spamassassin.apache.org), then likewise Anomy Sanitizer.  Then I describe how I call them from my .mailfilter file.

Update 2010-03-24:  I just realised that I should run (as root) "sa-update" periodically to (somehow, I haven't figured it out) update the Spamassassin installation according to fixes which have been made since this version was created.  For instance, there was a problem in 2010 with all messages scoring extra points, as described here: https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6269 .  Today after I ran "sa-update" the problem disappeared, and the text in the file:

/var/lib/spamassassin/3.002004/updates_spamassassin_org/72_active.cf

was:

##{ FH_DATE_PAST_20XX
header   FH_DATE_PAST_20XX      Date =~ /20[2-9][0-9]/ [if-unset: 2006]
describe FH_DATE_PAST_20XX      The date is grossly in the future.
##} FH_DATE_PAST_20XX

with the original "20[1-9][0-9]" replaced with "20[2-9][0-9]".  So this will only complain about dates 2020 and beyond - good for another decade!
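
The effect of the two patterns on a Date header can be tried out with grep.  This is only a sketch of the pattern matching, not of how Spamassassin evaluates the rule; the function name date_rule_fires is hypothetical.

```shell
# Hypothetical helper: succeed if DATE_HEADER matches the rule's PATTERN.
date_rule_fires() {
    pattern=$1
    date_header=$2
    printf '%s\n' "$date_header" | grep -Eq "$pattern"
}

# A date in 2010 matches the old pattern but not the updated one:
#   date_rule_fires '20[1-9][0-9]' 'Wed, 24 Mar 2010 10:00:00 +1100'   (succeeds)
#   date_rule_fires '20[2-9][0-9]' 'Wed, 24 Mar 2010 10:00:00 +1100'   (fails)
```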


I selected Spamassassin to be installed as part of the original CentOS 5.1 installation, which is documented at another page via ../.  "yum list" tells me I have Spamassassin 3.1.9-1 installed.  However, 3.1.9 is from 2007-06-11.  The latest version is 3.2.4 from 2008-01-05.  Its install file http://svn.apache.org/repos/asf/spamassassin/branches/3.2/INSTALL states that Perl 5.6.1 or later is required.  I see from the directory this CentOS machine has: /usr/lib/perl5/5.8.8 that I have Perl 5.8.8.

I decided to get rid of 3.1.9-1 and install the latest version myself.

yum remove spamassassin

OK.  I will install the new version independently of the yum and RPM system. The Download link from the home page leads me to:

http://spamassassin.apache.org/downloads.cgi?update=200801071704

Where there is a note:

Please create a local copy of the report_template text in a file named something like /etc/mail/spamassassin/10_local_report.cf, and modify it to provide your tech support desk's contact information, instead of the default. Otherwise your users will be confused, and some may ultimately contact the SpamAssassin development team, which is not appreciated;

I will do this later . . . but see below, it seems to happen as part of the installation process.

The easiest way to get the latest version seems to be via CPAN, using the cpan program.  I actually did this after using CPAN for the first time to download a module I needed for Anomy Sanitizer.  See the Anomy Sanitizer section below for my first use of cpan. As instructed in the Download page, I give the command:

cpan Mail::SpamAssassin

This results in a bunch of activity concerning  J/JM/JMASON/Mail-SpamAssassin-3.2.4.tar.gz . I give an email address for contact regarding spam problems.

A bunch of stuff happens which I don't understand.  Considering how much intelligent-looking stuff is happening and how few characters I typed, I would say this is the easy way to install Spamassassin.  This takes at least an hour on my 824MHz PIII Celeron.

Some of the text at the end of this process includes:

Installing /usr/lib/perl5/site_perl/5.8.8/spamassassin-run.pod
Installing /usr/lib/perl5/site_perl/5.8.8/Mail/SpamAssassin.pm
Installing /usr/lib/perl5/site_perl/5.8.8/Mail/SpamAssassin/DBBasedAddrList.pm
Installing /usr/lib/perl5/site_perl/5.8.8/Mail/SpamAssassin/SQLBasedAddrList.pm
Installing /usr/lib/perl5/site_perl/5.8.8/Mail/SpamAssassin/Bayes.pm
...
Installing /usr/share/man/man1/sa-update.1
Installing /usr/share/man/man1/spamc.1
Installing /usr/share/man/man1/spamassassin.1
Installing /usr/share/man/man1/spamassassin-run.1
Installing /usr/share/man/man1/spamd.1
Installing /usr/share/man/man1/sa-compile.1
Installing /usr/share/man/man1/sa-learn.1
Installing /usr/share/man/man3/Mail::SpamAssassin::Plugin::Shortcircuit.3pm
Installing /usr/share/man/man3/Mail::SpamAssassin::Timeout.3pm
Installing /usr/share/man/man3/Mail::SpamAssassin::Plugin::AWL.3pm
Installing /usr/share/man/man3/Mail::SpamAssassin::Plugin::AutoLearnThreshold.3pm
...
Installing /usr/bin/sa-compile
Installing /usr/bin/sa-update
Installing /usr/bin/spamc
Installing /usr/bin/spamd
Installing /usr/bin/spamassassin
Installing /usr/bin/sa-learn
Writing /usr/lib/perl5/site_perl/5.8.8/i386-linux-thread-multi/auto/Mail/SpamAssassin/.packlist
Appending installation info to /usr/lib/perl5/5.8.8/i386-linux-thread-multi/perllocal.pod
/usr/bin/perl "-MExtUtils::Command" -e mkpath /etc/mail/spamassassin
...
/usr/bin/perl -MFile::Copy -e "copy(q{rules/local.cf}, q{/etc/mail/spamassassin/local.cf}) unless -f q{/etc/mail/spamassassin/local.cf}"
/usr/bin/perl -MFile::Copy -e "copy(q{rules/init.pre}, q{/etc/mail/spamassassin/init.pre}) unless -f q{/etc/mail/spamassassin/init.pre}"
/usr/bin/perl -MFile::Copy -e "copy(q{rules/v310.pre}, q{/etc/mail/spamassassin/v310.pre}) unless -f q{/etc/mail/spamassassin/v310.pre}"
/usr/bin/perl -MFile::Copy -e "copy(q{rules/v312.pre}, q{/etc/mail/spamassassin/v312.pre}) unless -f q{/etc/mail/spamassassin/v312.pre}"
/usr/bin/perl -MFile::Copy -e "copy(q{rules/v320.pre}, q{/etc/mail/spamassassin/v320.pre}) unless -f q{/etc/mail/spamassassin/v320.pre}"
/usr/bin/perl "-MExtUtils::Command" -e mkpath /usr/share/spamassassin
/usr/bin/perl -e "map unlink, </usr/share/spamassassin/*>"
...
/usr/bin/perl build/preprocessor -Mvars -DVERSION="3.002004" -DPREFIX="/usr" -DDEF_RULES_DIR="/usr/share/spamassassin" -DLOCAL_RULES_DIR="/etc/mail/spamassassin" -DLOCAL_STATE_DIR="/var/lib/spamassassin" -DINSTALLSITELIB="/usr/lib/perl5/site_perl/5.8.8" -DCONTACT_ADDRESS="rw@firstpr.com.au" -m644 -Irules -O/usr/share/spamassassin
10_default_prefs.cf
20_advance_fee.cf
20_body_tests.cf
20_compensate.cf
20_dnsbl_tests.cf
20_drugs.cf
20_dynrdns.cf
20_fake_helo_tests.cf
20_head_tests.cf
20_html_tests.cf
20_imageinfo.cf
20_meta_tests.cf
20_net_tests.cf
20_phrases.cf
20_porn.cf
20_ratware.cf
20_uri_tests.cf
20_vbounce.cf
23_bayes.cf
25_accessdb.cf
25_antivirus.cf
25_asn.cf
25_dcc.cf
25_dkim.cf
25_domainkeys.cf
25_hashcash.cf
25_pyzor.cf
25_razor2.cf
25_replace.cf
25_spf.cf
25_textcat.cf
25_uribl.cf
30_text_de.cf
30_text_fr.cf
30_text_it.cf
30_text_nl.cf
30_text_pl.cf
30_text_pt_br.cf
50_scores.cf
60_awl.cf
60_shortcircuit.cf
60_whitelist.cf
60_whitelist_dk.cf
60_whitelist_dkim.cf
60_whitelist_spf.cf
60_whitelist_subject.cf
72_active.cf
user_prefs.template
languages
sa-update-pubkey.txt
chmod 755 /usr/share/spamassassin


The .cf files in this list are config files for various tests within Spamassassin.  They are written to /usr/share/spamassassin/.  A note in each says that they should not be changed manually, and to use this command for help regarding configuration:

perldoc Mail::SpamAssassin::Conf

Better to use the HTML version of this extensive documentation:

http://spamassassin.apache.org/full/3.2.x/doc/Mail_SpamAssassin_Conf.html

The complete documentation is here:

http://spamassassin.apache.org/full/3.2.x/doc/

23_bayes.cf is interesting.  This is where the raw output from the Bayesian classifier (which compares the whole message with a bunch of stuff generated from two sets of messages, one known to be spam and the other not) is converted into some more usable things.

I did not actually have a local.cf file copied to my /etc/mail/spamassassin directory, since I had already written a file of this name there, generated as noted below from a web page.  I see most of the stuff was installed in /usr/lib/perl5/site_perl/5.8.8/Mail/SpamAssassin.

I generate a config file with the web page: http://www.yrex.com/spam/spamconfig.php My notes on how I did this are:

Low Threshold (5.0, default)
Don't Rewrite Subjects (default)
Don't Use Attachments (0)
Use Bayes System (default)
Use Auto Learning (default)
Enable RBL Checks (default)

  Use Network Checksum Tests? Choose whether to use
  these services that compare message checksums to
  known spam: Vipul's Razor 2.x, DCC, and Pyzor.
  These will only work when the client software for
  each service is installed.

Use Razor 2 if available
Use DCC if available
Use Pyzor if available

  SpamAssassin 3.1 Note: Due to licensing issues,
  Razor2 and DCC are not enabled by default in
  version 3.1. Your administrator must enable their
  plugins in /etc/mail/spamassassin/v310.pre or
  the setting above will be ignored.

Use Language Testing:

  Analyzes body text rather than simply checking the
  character set. This is more effective, but slows
  down SpamAssassin slightly. If this option is
  disabled, only the boldface languages above will be
  detected. (ok_languages)
 
  SpamAssassin 3.1 Note: Language checking has been
  moved to a plugin in version 3.1. This setting will
  not work unless your administrator has enabled the
  TextCat plugin in /etc/mail/spamassassin/v310.pre.

and the resulting config file is:

# SpamAssassin config file for version 3.x
# NOTE: NOT COMPATIBLE WITH VERSIONS 2.5 or 2.6
# See http://www.yrex.com/spam/spamconfig25.php for earlier versions
# Generated by http://www.yrex.com/spam/spamconfig.php (version 1.50)

# How many hits before a message is considered spam.
required_score           5.0

# Encapsulate spam in an attachment (0=no, 1=yes, 2=safe)
report_safe             0

# Enable the Bayes system
use_bayes               1

# Enable Bayes auto-learning
bayes_auto_learn              1

# Enable or disable network checks
skip_rbl_checks         0
use_razor2              1
use_dcc                 1
use_pyzor               1

# Mail using languages used in these country codes will not be marked
# as being possibly spam in a foreign language.
ok_languages            all

# Mail using locales used in these country codes will not be marked
# as being possibly spam in a foreign language.
ok_locales              all
 
According to this page: "The usual location for this file is /etc/mail/spamassassin/local.cf for a system-wide configuration." so I wrote it there.  

There are 746 tests listed at: http://spamassassin.apache.org/tests_3_2_x.html .

According to the Top-Level README file: http://svn.apache.org/repos/asf/spamassassin/branches/3.2/README :

The distribution provides "spamassassin", a command line tool to
perform filtering, along with the "Mail::SpamAssassin" module set
which allows SpamAssassin to be used in spam-protection proxy SMTP or
POP/IMAP server, or a variety of different spam-blocking scenarios.

In addition, "spamd", a daemonized version of SpamAssassin which
runs persistently, is available. Using its counterpart, "spamc",
a lightweight client written in C, an MTA can process large volumes of
mail through SpamAssassin without having to fork/exec a perl interpreter
for each message.

In the past, from Maildrop's .mailfilter file, I called the /usr/bin/spamassassin program, which is a Perl script.  I will keep doing this, since this is not a high-volume mailserver.

Back to the doco http://svn.apache.org/repos/asf/spamassassin/branches/3.2/INSTALL I see I need to edit /etc/mail/spamassassin/v310.pre if I am not using Razor2 (I am not using it), but I don't see which line in v310.pre is its loadplugin line to comment out.

How do I get Spamassassin to use its Bayesian learning system, for each user, for two mailboxes full of spam and non-spam respectively?  Before trying this, I will integrate Spamassassin into my test .mailfilter file described above. I add this before the section "   log "- - - - - - - - - - - - - - - - - - - - - -  No match"".

  LMB="Maildir"

xfilter "/usr/bin/spamassassin -x"

                                # Watch out for header line added by
                                # Spamassassin.

                                # Don't allow any blank lines after the
                                # if statement!
if (  /^X-Spam-Flag: YES/  )
{
  log "----------------------------------- Spam general. "

  cc "Maildir/.-SPAM"           #  Make this "cc" for copy or "to" to not
                                #  send it to Inbox.
  DELTAG=1
  xfilter "subjadd ~~~[SPAM]"
  to "$LMB"
}
                                # The Anomy Sanitizer stuff goes in here
                                # when we are ready to test it too.

                                                   

The resulting file is available here as mailfilter-test-2.txt .

I send the account a test email to ensure it gets through OK.  If I have not specified the location of spamassassin correctly, or if it won't run, then an ordinary message will never get to the Inbox.

It doesn't . . . and /var/log/maillog has something from Maildrop (reported via Postfix, which called Maildrop):

/usr/local/bin/maildrop: Cannot have world/group permissions on the filter file - for your own good.

Hmmm - I had upset the permissions of the .mailfilter file when I added this text.  It must have permissions 600 and have the user and group of the user whose account this is.  Now it works fine for an ordinary message.

To test Spamassassin and this rudimentary spam filtering arrangement, I use Thunderbird to make a mailbox called "-SPAM".  However, beware of the Bayesian learning stuff - I don't want it learning from this test message what is spam and what is not.  So, I meant to turn off Bayesian learning for the while in the /etc/mail/spamassassin/local.cf file.   (I actually turned off the use of Bayes to analyze each message - "use_bayes 0".)

Then I send a message to this account with the following subject and body text:

Subject: BUY TEST SPAM NIGERIA VIAGRA SEX FREE

NIGERIA VIAGRA SEX ENLAGEMENT FREE REMOVE SALE PENIS $$$ BUY GUARANTEE
MASS EMAIL OPPORTUNITY HERBAL DIPLOMA WORK AT HOME STOCK AUCTION

Hmm - nothing arrives in any mailbox.  "top" indicates that spamassassin is chewing 20% of CPU cycles and 8% of RAM (256M x 0.08 = 20MBytes).

Beginning of fuss . . .

Scroll down to the next heading to find the solution.

This goes on for some minutes, but the RAM usage drops to 3.5%. Then it goes to 71% of CPU cycles!  I would have thought that Postfix or Maildrop would have timed out by now . . .  The system is sluggish . . . and so far I can't see a maillog message about this message.  I send a benign message, and it doesn't arrive either.  Minutes later the spamassassin process is down to 0.9% of CPU and 0.1% of RAM.  But IMAP access is slow and I decide to shut the machine down and reboot.  I found this in maillog:

Apr 29 17:00:40 nair postfix/local[4580]: 738BF175305: to=<blah@example.org>, relay=local, delay=386, delays=0.89/0.27/0/385, dsn=4.3.0, status=deferred (temporary failure. Command output: [4617] warn: config: path "/usr/share/spamassassin/languages" is inaccessible: Permission denied [4617] warn: config: path "/usr/share/spamassassin/languages" is inaccessible: Permission denied [4617] warn: config: path "/usr/share/spamassassin/languages" is inaccessible: Permission denied maildrop: Timeout quota exceeded. [4617] warn: spamassassin: killed by SIGPIPE )

There is no subdirectory "languages" in /usr/share/spamassassin/.  This wasn't a problem until I sent the spammy test message.  I sent another normal message after rebooting, and the same thing happened - so it was not the spammy message.  

Google only finds 12 mentions of "/usr/share/spamassassin/languages" and none relate to an error message like this.

A spamassassin process was behaving the same way, according to "top".  I used "kill" to kill it, via its process number.  However within a second or two another spamassassin process was running.  Thinking of the Sorcerer's Apprentice now . . .

Was it my change to /etc/mail/spamassassin/local.cf : "use_bayes               0"?  I changed it back to 1, killed the spamassassin process, did "postfix flush" and . . . I became confused and perplexed by lack of messages in the mailboxes and lack of informative messages in /var/log/maillog or mailfilter-log.txt.  I rebooted the machine again.  The monitor (previously doing X Windows) had no signal, the hard drives were still being used . . as before.  I hit the reset button and rebooted the machine.

Initially there is no sign of spamassassin or another sus-looking process "setroubleshootd", which after this fuss started was hogging about half of the CPU cycles.  I think this is a nasty interaction between a badly configured spamassassin and SELinux.  Googling "setroubleshootd" turns up only 2350 pages, one of which is: http://danwalsh.livejournal.com/7995.html from 2006.  So this is a pretty obscure daemon. This page caused me to look at /var/log/audit/audit.log where I found this daemon had been adding a verbose entry to the file every 12msec while this spamassassin trouble was occurring. There were 17M of logs, neatly bundled into multiple files of 5MB each.

The package doing this is "setroubleshoot" - I removed it. (Version .noarch 0:1.8.11-4.el5.)  I don't have time for this stuff.

"postfix flush" didn't lead to any messages appearing in mailboxes - I don't know what happened to them.

I sent an ordinary message and the process repeated, with masses of stuff being written to /var/log/audit/audit.log - 100KBytes a second!  spamassassin was hogging the CPU.  
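
In hindsight, counting the denial records in the audit log would have pointed at SELinux sooner.  The following is a sketch only: it assumes the usual audit log format in which each denial line contains "avc:", "denied" and comm="NAME", and that the command name recorded for the Perl script is "spamassassin".  The function name count_avc_denials is hypothetical.

```shell
# Hypothetical helper: count SELinux denial records in an audit log which
# were caused by the command COMM.
count_avc_denials() {
    auditlog=$1
    comm=$2
    grep 'avc:' "$auditlog" | grep 'denied' | grep -c "comm=\"$comm\""
}
```

For example: count_avc_denials /var/log/audit/audit.log spamassassin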

Solution to fuss: disable SELinux

While this was occurring, using X Windows, I got to System > Administration > Security Level and Firewall > SELinux and changed it from "Enforcing" to "Disabled".  The other option was "Permissive".  The writing to /var/log/audit/audit.log immediately stopped, and spamassassin disappeared from "top".  What's more, the ordinary message was delivered to the Inbox.  In its headers only one appeared to be from spamassassin:

X-Spam-Checker-Version: SpamAssassin 3.2.4 (2008-01-01) on nair.firstpr.com.au

I sent the spammy message again.  spamassassin appeared briefly in "top" (with 65% CPU . . .) and then was gone.  The message appeared in the Inbox, and I found this in its headers:

X-Spam-Checker-Version: SpamAssassin 3.2.4 (2008-01-01) on nair.firstpr.com.au
X-Spam-Level: ***
X-Spam-Status: No, score=3.5 required=5.0 tests=ALL_TRUSTED,DRUGS_ERECTILE,
DRUG_ED_CAPS,SUBJ_ALL_CAPS,SUBJ_BUY autolearn=no version=3.2.4

I don't understand how spamassassin scored 3.5 for this when a few years ago (2003) it scored it as 10.  Perhaps because its sender address now is my address.

I need a spammy message, so I grab one from the pit and find the original version of it, in my pre-SA-Anomy mailbox.  I edit it as new and address it to the test account.  This time it worked fine.  The score was above 5, so there was a header: "X-Spam-Flag: YES" and this was detected by my .mailfilter logic, resulting in the message being turfed into the spam pit and a copy of it sent to the Inbox, with something extra in the subject line - and tagged for deletion.  The message is like this, with some things turned into xxx.  The -6.6 AWL score is due to it being sent with a from address which is mine, rather than the original spam address.

Return-Path: <xx@xxxxxx.xxx.xx>
X-Spam-Flag: YES
X-Spam-Checker-Version: SpamAssassin 3.2.4 (2008-01-01) on xxxx.xxxxxxx.xxx.xx
X-Spam-Level: **********
X-Spam-Status: Yes, score=10.1 required=5.0 tests=ALL_TRUSTED,AWL,FS_REPLICA,
    REPLICA_WATCH,URIBL_AB_SURBL,URIBL_BLACK,URIBL_JP_SURBL,URIBL_OB_SURBL,
    URIBL_SBL,URIBL_SC_SURBL autolearn=spam version=3.2.4
X-Spam-Report:
    * -1.4 ALL_TRUSTED Passed through trusted hosts only via SMTP
    *  1.2 FS_REPLICA Subject says "replica"
    *  3.4 REPLICA_WATCH BODY: Message talks about a replica watch
    *  2.0 URIBL_BLACK Contains an URL listed in the URIBL blacklist
    *      [URIs: reppsrapill.com]
    *  1.6 URIBL_AB_SURBL Contains an URL listed in the AB SURBL blocklist
    *      [URIs: reppsrapill.com]
    *  2.9 URIBL_JP_SURBL Contains an URL listed in the JP SURBL blocklist
    *      [URIs: reppsrapill.com]
    *  2.1 URIBL_OB_SURBL Contains an URL listed in the OB SURBL blocklist
    *      [URIs: reppsrapill.com]
    *  2.5 URIBL_SC_SURBL Contains an URL listed in the SC SURBL blocklist
    *      [URIs: reppsrapill.com]
    *  2.5 URIBL_SBL Contains an URL listed in the SBL blocklist
    *      [URIs: reppsrapill.com]
    * -6.6 AWL AWL: From: address is in the auto white-list
X-Original-To: xxxxx@xxxx.xxxxxxx.xxx.xx
Delivered-To: xxxxx@xxxx.xxxxxxx.xxx.xx
Received: from gair.firstpr.com.au (gair.firstpr.com.au [10.0.0.1])
    by xxxx.xxxxxxx.xxx.xx (Postfix) with ESMTP id 59B46175579
    for <xxxxx@xxxx.xxxxxxx.xxx.xx>; Tue, 29 Apr 2008 19:05:55 +1000 (EST)
Received: from [10.0.0.6] (unknown [10.0.0.6])
    by gair.firstpr.com.au (Postfix) with ESMTP
    id E259459DA1; Tue, 29 Apr 2008 19:05:54 +1000 (EST)
Message-ID: <4816E500.8020706@firstpr.com.au>
Date: Tue, 29 Apr 2008 19:06:08 +1000
From: Robin Whittle <rw@firstpr.com.au>
Organization: First Principles
User-Agent: Thunderbird 2.0.0.12 (Windows/20080213)
MIME-Version: 1.0
To: xxxxx@xxxx.xxxxxxx.xxx.xx
Subject: Replica Rolex Swiss Watches
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
X-Antivirus: avast! (VPS 080429-0, 29/04/2008), Inbound message
X-Antivirus-Status: Clean

Buy the Patek Philippe watch and know everything about time!

What to look for when purchasing a replica watch

Startling Brietling watches at Replica Classics

http://reppsrapill.com/


While this is not my final arrangement for detecting spam, it shows that Spamassassin is working.
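
To see how much the AWL entry pulled the score down, the per-test scores in the X-Spam-Report header can be added up.  This is a small sketch, assuming each score line has the form "* SCORE TESTNAME ..." as in the header above; the function name sum_spam_report is hypothetical.

```shell
# Hypothetical helper: add up the per-test scores in an X-Spam-Report header
# read from standard input.
sum_spam_report() {
    awk '$1 == "*" && $2 ~ /^-?[0-9]+\.[0-9]+$/ { sum += $2 }
         END { printf "%.1f\n", sum }'
}
```

By hand, the scores listed above add up to 10.2, which is within rounding of the reported score=10.1.  Without the -6.6 from AWL the total would be 16.8.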

I don't see where Bayes was used.  Bayes autolearn was on, which is bad - I had meant to turn it off.  I don't want it learning from these test messages. Perhaps it was not used because it hasn't had a chance to learn enough yet.

A better filtering arrangement - for Spam-marginal

In my previous (2003 to April 2008) Spamassassin arrangement, I left Spamassassin's threshold at 5.0 - which controls whether it adds the "X-Spam-Flag: YES" header or not - and used separate tests for the actual spam score (previously known as "hits").

Anything below 3.0 went to my Inbox as usual.  Anything scoring between 3.0 and 7.99 went to the -SPAM-marginal mailbox, and not to the Inbox.  Anything scoring 8.0 or above went to the -SPAM mailbox and not to the Inbox.

Here is how I do this in the new system, but at present I don't know enough about the behaviour of the scoring system to decide what threshold I want to use.  This is a complete .mailfilter file:

# Sample Courier Maildrop file to demonstrate filtering messages
# according to how Spamassassin scores them.

logfile "mailfilter-log.txt"

log "========"

if ( /^Subject: Test1/ )
{
    log "-------------------------------------------- Test1 found "
    to "Maildir/.blah"
}
                
xfilter "/usr/bin/spamassassin -x"
 
                               # Look out for marginal scoring spam &
                               # put it somewhere I can scrutinise it
                               # easily, away from the worst stuff.
                               # If the threshold of 3.0 is too low
                               # there will be many false-positives in
                               # -SPAM-marginal, and I will need to
                               # fish them out manually.
                    
if (   ( /X-Spam-Status: No, score=3./  )    \
     ||( /X-Spam-Status: No, score=4./  )    \
     ||( /X-Spam-Status: Yes, score=5./ )    \
   )       
{
    log "-------------------------------- Spam marginal. "    
    to "Maildir/.-SPAM-marginal"         
}

                                # Watch out for header line added by
                                # Spamassassin, which it will be for
                                # anything scoring 5.0 or above.
                                #
                                # Don't allow any blank lines after the
                                # if statement!
if (  /^X-Spam-Flag: YES/  )
{
  log "----------------------------------- Spam general. "
  to "Maildir/.-SPAM"           
}
                                # The Anomy Sanitizer stuff goes in here
                                # when we are ready to test it too.

    log "- - - - - - - - - - - - - - - - - - - - - -  No match"
   
                                # Deliver to the Inbox.
    to "Maildir"

I made a mailbox -SPAM-marginal and tested the system with various messages.  The above is mailfilter-test-3.txt and is a complete little .mailfilter file which could be adapted for spam detection, mailing list filtering etc. after some reading of the documentation for the Maildrop filtering language.

In this arrangement, scores in these ranges result in:

Below 3.0:        Inbox
3.0 to 5.999:     -SPAM-marginal
6.00 and above:   -SPAM

Although Spamassassin adds the "X-Spam-Flag: YES" header for scores of 5.0 or above, in the above arrangement, messages scoring 5.0 to 5.999 never get to the second test, because they are caught and sent to -SPAM-marginal instead.
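The score ranges above can be checked outside Maildrop with a small shell sketch.  The helper names classify_score and score_from_header are hypothetical, and the sketch only mirrors the intended ranges (3.0 and 6.0); it is not how Maildrop itself evaluates the patterns.

```shell
# Hypothetical helper: map a Spamassassin score to the mailbox the
# .mailfilter above is intended to choose (boundaries 3.0 and 6.0).
classify_score() {
    awk -v s="$1" 'BEGIN {
        if (s + 0 < 3.0)       print "Inbox"
        else if (s + 0 < 6.0)  print "-SPAM-marginal"
        else                   print "-SPAM"
    }'
}

# Hypothetical helper: pull the score out of an X-Spam-Status header line.
score_from_header() {
    printf '%s\n' "$1" |
        sed -n 's/^X-Spam-Status: [A-Za-z]*, score=\(-\{0,1\}[0-9.]*\).*/\1/p'
}

# Example: a message Spamassassin flagged, but which still counts as marginal.
classify_score "$(score_from_header 'X-Spam-Status: Yes, score=5.4 required=5.0')"
```

Feeding it the X-Spam-Status lines of a few real messages is a quick way to see where a different boundary would have sent them.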

So as long as Spamassassin's threshold is within the range of what is considered marginal spam, then the range of scores for -SPAM-marginal can be fine-tuned by adding, deleting (or commenting out) and changing the contents of lines such as:

     ||( /X-Spam-Status: Yes, score=5./ )    \
 
I sent the two spammy test messages mentioned above, and an ordinary message, and the behavior was as expected.

Spamassassin takes a few seconds to crunch each message, including simple messages which are not very spammy.

In a section below I explain why and how I configured Spamassassin to give more weight to Bayesian scores.  Before that, the examples I give use the normal scoring arrangement.

Training the Bayesian function of Spamassassin - per user

In order to improve Spamassassin's assessments of messages, I want to give it two thousand or so non-spam messages, and a similar number of known spam messages, to train its Bayesian function.  The doco is here:

http://spamassassin.apache.org/full/3.2.x/doc/sa-learn.html

Last time I tried this in 2003, I found that in order to get sa-learn to work I had to create this directory and empty file in the user account:
~/.spamassassin/user_prefs
This needs to go in /etc/skel/ too.
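A minimal sketch of creating that directory and empty file.  The function name make_sa_prefs is hypothetical; the paths are the ones mentioned above.

```shell
# Create the per-user Spamassassin directory and an empty user_prefs
# under the given home directory (defaults to $HOME).
make_sa_prefs() {
    home_dir="${1:-$HOME}"
    mkdir -p "$home_dir/.spamassassin" &&
    touch "$home_dir/.spamassassin/user_prefs"
}

# For the current user:
#   make_sa_prefs
# For accounts created in future (run as root):
#   make_sa_prefs /etc/skel
```

Running it twice is harmless, since mkdir -p and touch leave an existing directory and file in place.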

I prepared two mailboxes of messages: SA-OK and SA-Spam.  I did this in my regular email account rw@firstpr.com.au using the old version of gair, which is still running while I build the new one, currently called nair.  I did this with Thunderbird, and then copied the contents of these mailboxes to mailboxes of the same name in Thunderbird's account for rw on nair.

By the way, a handy method of getting a bunch of messages in a Maildir into a single text file is to store them to a local mailbox of Thunderbird's.  There they are stored in Mbox format, all in one file, with any body line which starts with "From" changed to "> From".

In SA-OK I got 1675 ordinary messages of my own, carefully checked not to be spam, as described below.  ("Several thousand" is the recommended number in sa-learn.html.)

In SA-Spam, I put 1768 spams (7.8MBytes).  I got these by manually picking them from my post filter mailbox - a copy of all messages (in this case, in the last 30 days) not identified as mailing list messages and which were not classed as spam - and therefore which were sent to the Inbox.  This included dozens a day which were spam, but I had manually deleted them from the Inbox as they turned up.  So these are generally the spams which scored rather low in the old system.  In the past month or so, with the old system (Spamassassin 2.61 from 2003), I was getting approximately this number of spams a day:

Score below 3.0    ~30 a day  -> Inbox
3.0 to 7.99        ~528 a day -> -SPAM-marginal  
8.0 and above      ~522 a day -> -SPAM

This is pretty impressive - a 5 year old (hand-tweaked, Bayesian trained) Spamassassin setup was still catching 97% of spam, with very few false positives (threshold 3.0).  I think I was only getting a false positive every month or two, but it is possible I missed some.
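The 97% figure follows from the daily counts in the table: spams sent to -SPAM-marginal and -SPAM, divided by all spams.  A small sketch of the arithmetic, with a hypothetical function name catch_rate:

```shell
# Percentage of spam kept out of the Inbox, given the daily counts for
# Inbox, -SPAM-marginal and -SPAM (in that order).
catch_rate() {
    awk -v i="$1" -v m="$2" -v s="$3" 'BEGIN {
        printf "%.1f\n", 100 * (m + s) / (i + m + s)
    }'
}

# Counts from the table above: 30 + 528 + 522 = 1080 spams a day.
catch_rate 30 528 522
```

With these counts it prints 97.2.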

Ideally, I think, when the new system is running, only one or two spams will get through, and the false positive rate will still be as low as one a month or so.  Ideally, I will tune the -SPAM-marginal score range so about 10 to 20% of spam goes there, which will hopefully still contain all the false-positives, and so be easier and more reliable to sort through manually.

This level of spam is mainly addressed (outer envelope) to my main rw@firstpr.com.au address and others at firstpr.com.au, such as root, postmaster and some others I alias to rw.  Spam addressed to my other domains should generally be caught by separate mail filters, since I don't have any active email address in those domains.  For instance, I get about 40 a day for some address at astroneu.com.  However, I need to look more closely at how I handle, with Postfix's aliases file, messages addressed (in the headers or in the SMTP delivery "envelope") to addresses other than my main email address.

I deliberately excluded some spams from SA-Spam - those which closely resemble legitimate emails.  For instance I got a bunch of phishing spams which look very much like emails from Google Adwords.  But does Google Adwords send me emails?  Yes - they sometimes do.  I also tried to get rid of backscatter bounce emails, notification of messages not delivered etc.  Maybe I should think of them as spam too.  The trouble is, if Spamassassin did catch them, then I wouldn't get bounces for messages I actually sent.  I manually deleted the non-spam messages by scrutinizing them when sorted according to subject and then by sender. Finally I searched for my first name in the bodies of the messages. Messages which look like they are from eBay and PayPal are a problem too, since I do get legitimate emails from these companies.

For SA-OK, I used recent good messages from the Inbox, carefully checked in a similar way, without undeliverable messages.  This was 1675 messages, from the last 7 months - 51MBytes.  A few are from lists, announcement systems etc. which I don't handle with the filtering system.  Some are eBay emails, reminders that a domain is to expire, emails from ticket purchasing systems etc.  I removed backscatter emails from this.  So I get about 7.9 messages a day which I consider proper emails (not counting a few hundred a day on mailing lists) and about 1000 a day which are spam.

I copied the contents of these mailboxes with Midnight Commander via the NFS link from the new machine nair to the old one gair (where I had sorted them).  I had to copy them first from their home in /home/rw/Maildir to another directory outside /home, because the NFS system doesn't get past /home.   I could have copied them via the Windows Thunderbird, which has both the rw at gair account and the rw at nair accounts open at the same time.  The NFS copying is faster, but then I have to set all of them to rw's user and group in the new machine.  In the old machine, the mailboxes have been seen by Thunderbird, so the messages are all in /cur.  In the newly created mailboxes in the new machine (made with Thunderbird) I copy the messages into the /new directory, so when I open them with Thunderbird, it will see them as new messages.

Now, from the rw account in the new machine, as user rw, I use this script to teach Spamassassin what's what:

#!/bin/bash
sa-learn --ham  --showdots --dir ~/Maildir/.SA-OK/cur
sa-learn --spam --showdots --dir ~/Maildir/.SA-Spam/cur

As noted above, this requires the presence of a directory /home/rw/.spamassassin/ and I think a file there user_prefs.  This took about 24 minutes with sa-learn chewing virtually all the CPU of this 824MHz PIII Celeron.  The results are in /home/rw/.spamassassin/ as database files bayes_toks (5.2MB) and bayes_seen (331KB).
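Since the script above reads only the cur directories, it may be worth counting what is actually in cur and new before training: messages copied into new stay there until a mail client has looked at the mailbox.  A minimal sketch, with a hypothetical helper count_msgs, assuming find supports -maxdepth (GNU and BSD find do):

```shell
# Count the regular files directly inside one Maildir subdirectory.
# A missing directory counts as 0.
count_msgs() {
    find "$1" -maxdepth 1 -type f 2>/dev/null | wc -l | tr -d ' '
}

# Report both subdirectories of the two training mailboxes.
for box in SA-OK SA-Spam; do
    dir="${HOME:-}/Maildir/.$box"
    echo "$box: cur=$(count_msgs "$dir/cur") new=$(count_msgs "$dir/new")"
done
```

If the cur counts do not match the numbers of messages sorted (1675 and 1768 here), sa-learn will be learning from less than intended.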

More on my final mail filtering arrangement later in this page.


Anomy Sanitizer 1.76

The purpose of Anomy Sanitizer is detailed on the home page:

http://mailtools.anomy.net

It scans for files with names which would make them executable (for instance on Windows, there is a long list of such filename extensions) and changes their name so they are not directly executable. It also defangs HTML which could pose security risks.

I have been running Anomy Sanitizer 1.56 (2003-10-23) until now (2008-04-29) and am now installing version 1.76 (2006-01-03).  I have extensive notes on my old installation in the zip file of the old page, at its directory: ../Postfix-SA-Anomy-Maildrop/ .

The documentation is at: http://mailtools.anomy.net/sanitizer.html .

I downloaded the tarball and unzipped it to: /opt/anomy-sanitizer-1.76/.  I then copied the /anomy/ directory from this (which contains everything) to /usr/local/anomy/.  This means the location of the main Perl script is:

/usr/local/anomy/bin/sanitizer.pl

Anomy needs a Perl interpreter of 5.005_03 or later, and these Perl modules:
How do I test if these are present in my system? CPAN is the way to get Perl modules: http://en.wikipedia.org/wiki/Cpan  Based on the last time I tried this, I gave the command:

LANG=C perl -MCPAN -e shell

though I think just "cpan" might have done the trick.

It asked if I was ready for manual configuration.  Yes.

Cache directory /root/.cpan/ .  10M. atstart. yes. no. history filename = /root/.cpan/histfile. 100. ask. (A bunch of questions I just pressed Enter for). I selected a local mirror.  Then I got a prompt: "cpan>".

To this I gave commands to load modules:

install MIME::Base64

After some chewing away, it either downloaded this or checked it and told me this module is up-to-date.

(I know next to nothing about Perl modules and am not in the mood for learning - I just want to install mailserver software right now.)

install MIME::QuotedPrint 

In this case, the cpan program instantly told me it was up-to-date.
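One way to answer the earlier question of whether a module is already present, without going into the CPAN shell, is to ask Perl to load it.  This is a sketch; the function name have_perl_module is hypothetical, and the two module names are simply the ones installed above.

```shell
# Succeeds if Perl can load the named module, fails otherwise
# (including when perl itself is not installed).
have_perl_module() {
    perl -M"$1" -e 1 >/dev/null 2>&1
}

for m in MIME::Base64 MIME::QuotedPrint; do
    if have_perl_module "$m"; then
        echo "$m: present"
    else
        echo "$m: MISSING"
    fi
done
```

A module reported as MISSING can then be installed from the cpan> prompt as shown above.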


I ran the test cases by giving the command "./testall.sh" in /usr/local/anomy/.  This led to a warning message:

WARNING: Your default language setting is en_US.UTF-8, which may enable
UTF-8 (unicode) support in various programs, including Perl.  This
may cause the Anomy Sanitizer to malfunction.
Please read the file UNICODE.TXT for further information.
 
followed by successful completion of all tests except two which were skipped due to "F-Prot" not being installed.  I didn't do any more tests - I will test its actual invocation from my .mailfilter .

UNICODE.TXT tells me that I need to set two environment variables before calling Anomy Sanitizer in the mail filtering system:

LC_ALL=C
LANG=en_US
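For running the sanitizer by hand from a shell (rather than from the .mailfilter file, which sets these itself), the two variables can be set for just one command.  A minimal sketch with a hypothetical wrapper name with_anomy_locale:

```shell
# Run any command with the locale settings UNICODE.TXT asks for,
# without changing the locale of the calling shell.
with_anomy_locale() {
    env LC_ALL=C LANG=en_US "$@"
}

# Show what a command run this way sees.
with_anomy_locale sh -c 'echo "LC_ALL=$LC_ALL LANG=$LANG"'
```

The wrapper could then be put in front of a manual test run of sanitizer.pl.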
 

Configuration

There is no configuration file in the tarball. Here is how I created a config file, based on my 2003 work:

This is my file /usr/local/anomy/anomy.conf.  Note, the file can have any name, provided it is given to the program when it is run, as I do in the .mailfilter file, as noted below.  

This is a totally freshly written config file, but with the business end of it based on the work of Advosys.ca.  

Here, I have explicitly set every option documented at: http://mailtools.anomy.net/sanitizer.html on 31 May 2003, but that page does not mention all features.  It includes:  "WARNING: This document is outdated! Please refer to the CHANGELOG for up-to-date information on new features."  Indeed, in the CHANGELOG at: http://mailtools.anomy.net/CHANGELOG.sanitizer.txt there is a new feature: feat_mime_files, which is also explained in a mailing list message from Bjarni Einarsson on 29 May 2003.

I have also tried to document the meaning of different values for the various options, and indicate the default, in a way which is clearer than on this page.  However, I am a beginner with this program and there is lots I don't understand - so please do not consider my comments, or the values I have chosen, as being necessarily wise.

Looking at the changelog in April 2008, here are some new options added since I wrote this config file:

Added the following options to configure the HTML cleaner (all are off
by default):

feat_html_noexe Disallow links to executables
feat_html_unknown Allow unknown HTML tags
feat_html_paranoid Paranoid HTML Cleaner mode, bans all src= links
and enables feat_html_noexe paranoia as well.

Made the file-name/MIME-type sanity checks configurable (default on)
via. the feat_sane_names variable. Set to 0 to disable.

This file is available here as: anomy.conf.txt and anomy.conf.txt.gz .

# Configuration file for Anomy Sanitizer
#
# Based on a file from Advosys Consulting Inc., Ottawa
# http://advosys.ca/papers/postfix-filtering.html
#
# Works with Anomy Sanitizer revision 1.60
#
# Doctored by Robin Whittle   http://www.firstpr.com.au/web-mail/
#
# All config items are set explicitly, with their defaults marked "*",
# as per http://mailtools.anomy.net/sanitizer.html on 2003-05-31.


# Warn user about unscanned parts, etc.
#   0  Don't warn. 
# * 1  Warn.
#
feat_verbose = 1   


# Insert log in the message itself.
#   0 Off.
# * 1 Maybe.
#   2 Force.
#
feat_log_inline = 0


# Log to STDERR:
#   0  Don't log.
# * 1  Log.
#
feat_log_stderr = 1       


# XML format for logs.
# * 0  Off.
#   1  On.
#
feat_log_xml = 0


# Include trace info from logs.
# * 0  Off.
#   1  Include.
#
feat_log_trace = 0       


# Add scratch space to part headers.
# * 0  Off.
#   1  Add.
#
feat_log_after = 0


# Enable filename-based policy decisions.
#   0  Off.
# * 1  Enable. 
#
feat_files = 1


# Force all parts (except text/html parts) to have file names.
# * 0  Don't force.
#   1  Force.
#
feat_force_name = 0


# Replace all boundary strings with our own
# NOTE:  Always breaks PGP/MIME messages!
# * 0  Off.
#   1  Replace.
#
feat_boundaries = 0


# Protect against buffer overflows and null values.
#   0  Off.
# * 1  Protect.
#
feat_lengths = 1


# Defang incoming shell scripts.
#   0  Off.
# * 1  On.
#
feat_scripts = 1


# Defang active HTML content - Javascript and more.
#   0  Off.
# * 1  Defang.
#
feat_html = 1


# Allow Web-bugs.
# * 0  Allow.
#   1  Disallow.
#
feat_webbugs = 0


# Scan PGP signed message parts.  See custom message below.
# * 0  Don't scan.  
#   1  Scan
#
feat_trust_pgp = 0


# Sanitize inline uuencoded files.  Bjarni R. Einarsson wrote:
# This should always be set to 1, or people will be able to send you
# uuencoded viruses/attachments and they'll slip by the sanitizer.
# Also, if this is 0 then uuencoded attachments won't be detected as
# such, and will instead get treated as text or HTML - and will get
# corrupted by the HTML cleaner.
#
#   0  Don't sanitize.
# * 1  Sanitize.
#
feat_uuencoded = 1


# Sanitize forwarded messages.
#   0  Don't sanitize.
# * 1  Sanitize.
#
feat_forwards = 1


# This isn't a test-case configuration.
# * 0  Not testing.
#   1  Test case.
#
feat_testing = 0


# Fix invalid MIME, if possible.
#   0  Don't fix.
# * 1  Fix.
#
feat_fixmime = 1


# Paranoia about MIME headers etc.
# * 0  Don't be excessively paranoid.
#   1  ???
#
feat_paranoid = 0


# Scoring and exit status.
# Any message requiring this many modifications
# will cause the sanitizer to return a non-zero
# exit code after processing the entire message.
#
# eg. the default: score_bad = 100
#
# Here, this is disabled:
#
score_bad = 0  
       


# Depth of recursion when including config files.
# Default = 5. I left it alone.
#
#  max_conf_recursions = 5


# Temp file and quarantine directory.  This must exist
# and be writable by the user running the sanitizer.
# Temporary or saved files are created using this template.
#
#   file_name_tpl = /var/quarantine/att-$F-$T.$$$
#
# An attachment named "dude.txt" might be saved as
#
#    /var/quarantine/att-dude-txt.A9Y
#
file_name_tpl       = /var/spool/anomy/att-$F-$T.$$$


# Add two lines of informational headers to each message.
#
# header_info  = X-Sanitizer: Gotcha!
# header_info += \nX-Gotcha: Sanitizer!
#
# Here is my version:
#
header_info = X-Sanitizer: Spam Assassin and Anomy Sanitizer - see http://www.firstpr.com.au/web-mail/.


# Disable these built-in headers:
#      
header_url = 0
header_rev = 0


# Message to begin the log.
#
msg_log_prefix  = This message has been sanitized - it may have been altered \n
msg_log_prefix += to improve security, as described below. \n


# Define a new, more informative message, for when a file is
# dropped.
#
msg_file_drop  = \n*** Attached file dropped ***\n
msg_file_drop += An attachment named %FILENAME was deleted from this \n
msg_file_drop += message because it contained a Windows executable \n
msg_file_drop += or other potentially dangerous file type. \n
msg_file_drop += Contact the system administrator for more information. \n


# Message suitable for not scanning PGP messages.
#
msg_pgp_warning = PGP encrypted content follows and has not been sanitized. \n


# Default policy for attached files which do not match any policy.
# One of:
#     accept  = Leave the attachment in the message unchanged.
#  *  defang  = Accept the file but mangle name to make it less dangerous.
#     mangle  = Alter the file name completely.
#     save    = Remove from the message, but save in the file_name_tpl directory.
#     drop    = Remove from the message, but leave text there that this has been done.
#  
#
file_default_policy = defang


# See this entry in the CHANGELOG for version 1.60:
#
#    Made the filename checker check ALL possible file names against
#    each rule, instead of just checking the "default" one. If
#    feat_mime_files is set, then the default file-name for that mime
#    type will be checked as well. This is a major improvement to
#    security, but requires that filename rules are ordered so that
#    all DROP/DEFANG/MANGLE rules precede any ACCEPT rules.

# Beyond here no more items have defaults.


# Number of rulesets we are defining.
#
file_list_rules = 2

# Both the following rule sets do not use an external scanner
# program.

####  Rule 1  Drop (delete) probably nasty attachments.
####
####
#
#
# In practice with long virus executables, Anomy passes on to
# the output message a shortened and in some way changed
# version of the file with a different, non executable,
# extension.
#
# The (?i) prefix makes the regexp case insensitive.
#
# Note, starting with version 1.56, the following will still
# match file names with spaces such as:
#
#   name=CODE  .bat            
#
# which are invalid MIME, should not produce a file which
# has an executable extension, but nonetheless are sometimes
# created by the Bugbear virus.

file_list_1_scanner = 0
file_list_1_policy = drop

file_list_1 = (?i)(winmail.dat)|
file_list_1 += (\.(exe|com|vb[se]|dll|ocx|cmd|bat|pif|lnk|hlp|ms[ip]|reg|sct|inf
file_list_1 += |asd|cab|sh[sb]|scr|cpl|chm|ws[fhc]|hta|vcd|eml|nws))$

Note: in the above line I used to have vcf, but this is used for Vcard attachments.



####  Rule 2  Allow known "safe" file types and those that may be
####          scanned by the user's desktop virus scanner.
####

file_list_2_scanner = 0
file_list_2_policy = accept
file_list_2 = (?i)\.

#  Word processor and document formats:
file_list_2 += (doc|dot|txt|rtf|pdf|ps|htm|[sp]?html?

#  Spreadsheets:
file_list_2 += |xls|xlw|xlt|csv|wk[1-4]

#  Presentation applications:
file_list_2 += |ppt|pps|pot

#  Bitmap graphic files:
file_list_2 += |jpe?g|gif|png|tiff?|bmp|psd|pcx|jpg

#  Vector graphics and diagramming:
file_list_2 += |vsd|drw|cdr|swf

#  Multimedia:
file_list_2 += |mp3|avi|mpe?g|mov|ram?|mid|ogg|vcf

Note: in the above line I used not to have vcf, but this is used for Vcard attachments.

#  Archives:
file_list_2 += |zip|g?z|rar|tgz|bz2|tar

#  Source code:
file_list_2 += |[ch](pp|\+\+)?|s|inc|asm|patch|java|php\d?|jsp|bas)

# Any file type not listed above gets renamed to prevent
# MS Outlook from auto-executing it - because, above, we
# have already specified:
#
#     file_default_policy = defang

       


The above config file is for all users, and must be readable by each user.  Its location is specified in the command line which calls Anomy Sanitizer - as in my .mailfilter file.  The new lines to run Anomy Sanitizer come after the messages have been run through the mailing list sorting stuff and Spamassassin.  This means it only runs on messages which are not from recognised mailing lists, and of those, the ones I don't filter out to -SPAM or -SPAM-marginal according to Spamassassin's score.
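Since the config file must be readable, and the file_name_tpl directory writable, by each user who runs the sanitizer, a quick check run as that user can save some head-scratching.  This is only a sketch; check_anomy_paths is a hypothetical name, and the paths in the usage comment are the ones used on this page.

```shell
# Report whether the config file is readable and the spool directory
# is writable by the user running this check.  Returns 1 on any problem.
check_anomy_paths() {
    conf="$1"; spool="$2"; rc=0
    [ -r "$conf" ] || { echo "cannot read $conf"; rc=1; }
    [ -d "$spool" ] && [ -w "$spool" ] || { echo "cannot write to $spool"; rc=1; }
    return $rc
}

# Usage, as the mail user:
#   check_anomy_paths /usr/local/anomy/anomy.conf /var/spool/anomy
```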

I configure Anomy Sanitizer to drop any attached file which it identifies as being executable (Rule 1 above).  Attached files which match Rule 2 (benign filename extensions) are allowed to pass without changes.  All other files have their filename extension mangled to prevent it being executable if in fact the extension was of an executable type.  Messages with a dropped file were, in the past, simply sent to a virus mailbox, and not at all to the Inbox.  Now, I will send them to the Inbox, but with an additional piece of text at the start of the subject line: "~~~[Executable]".
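The drop / accept / defang order can be tried on a few file names with grep.  This is a rough sketch, not Anomy's own matching: the function name anomy_policy is hypothetical, the drop pattern is Rule 1 joined onto one line, the accept list is abbreviated to a handful of the Rule 2 extensions, and the trailing "$" on the accept pattern is my assumption (Rule 2 above has none).

```shell
# Say which policy a file name would get: Rule 1 (drop) is tried
# first, then an abbreviated Rule 2 (accept), else the default (defang).
anomy_policy() {
    name="$1"
    drop='(winmail.dat)|(\.(exe|com|vb[se]|dll|ocx|cmd|bat|pif|lnk|hlp|ms[ip]|reg|sct|inf|asd|cab|sh[sb]|scr|cpl|chm|ws[fhc]|hta|vcd|eml|nws))$'
    accept='\.(doc|txt|pdf|jpe?g|gif|png|zip|vcf)$'   # abbreviated list
    if printf '%s\n' "$name" | grep -E -i -q "$drop"; then
        echo drop
    elif printf '%s\n' "$name" | grep -E -i -q "$accept"; then
        echo accept
    else
        echo defang
    fi
}

for f in invoice.EXE photo.jpg contact.vcf notes.xyz; do
    echo "$f: $(anomy_policy "$f")"
done
```

Checking drop before accept follows the ordering requirement quoted from the 1.60 CHANGELOG in the config file.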


Maildrop filtering for Anomy Sanitizer output

Here is a complete .mailfilter file, with a real test for a mailing list. It is available as mailfilter-test-4.txt.  It uses several new mailboxes, but the lines which use these can easily be changed or deleted.

# Sample Courier Maildrop file to demonstrate filtering messages
# according to:
#
#   1 - Mailing list headers. 
#
#   2 - How Spamassassin scores them.
#
#   3 - Whether Anomy Sanitizer finds an executable attachment
#       and drops it.
#
# Real mailing list sorting may be trickier if there isn't a
# clearly identifiable list-related header line.

logfile "mailfilter-log.txt"

log "========"

if ( /^Subject: Test1/ )
{
    log "-------------------------------------------- Test1 found "
    to "Maildir/.blah"
}


if ( /^List-Id: IETF Discussion/ )
{
    log "-------------------------------------------- IETF "
    cc "Maildir/.Lists.IETF"
    xfilter "subjadd [IETF]"
    DELTAG=$DTAG
    to "$LMB"
}

                                 # Copy all messages which are not
                                 # caught by one of the above mailing
                                 # list filters to a special folder,
                                 # so I can find a message, if I want
                                 # to, in the state before Spamassassin
                                 # looked at it (and therefore wrote
                                 # something in the headers) and before
                                 # Anomy Sanitizer may have defanged
                                 # its HTML or dropped its attachment.
                                 #
                                 # In this example, "Pre-Filter-Inbox"
                                 # is a mailbox within the mailbox:
                                 # "0-Inbox-Old" - as is another
                                 # mailbox used at the end:
                                 # "Post-Filter-Inbox".
                                   
cc "Maildir/.0-Inbox-Old.Pre-Filter-Inbox"               

xfilter "/usr/bin/spamassassin -x"
 
                               # Look out for marginal scoring spam &
                               # put it somewhere I can scrutinise it
                               # easily, away from the worst stuff.
                               # If the threshold of 3.0 is too low
                               # there will be many false-positives in
                               # -SPAM-marginal, and I will need to
                               # fish them out manually.
                   
if (   ( /X-Spam-Status: No, score=3./  )    \
     ||( /X-Spam-Status: No, score=4./  )    \
     ||( /X-Spam-Status: Yes, score=5./ )    \
   )      
{
    log "-------------------------------- Spam marginal. "   
    to "Maildir/.-SPAM-marginal"        
}

                                # Watch out for header line added by
                                # Spamassassin, which it will be for
                                # anything scoring 5.0 or above.
                                #
                                # Don't allow any blank lines after
                                # the if statement!
if (  /^X-Spam-Flag: YES/  )
{
  log "----------------------------------- Spam general. "
  to "Maildir/.-SPAM"          
}
                                # If we are still processing the
                                # message, it is because its
                                # Spamassassin score was below 3.0.

                                # Some debugging lines and a place to
                                # save things when I am tweaking
                                # Anomy Sanitizer: a mailbox "Debug".
cc "Maildir/.Debug"
log "Send to Anomy"

                                # Set up the environment variable ANOMY
                                # to keep Anomy happy.
ANOMY=/usr/local/anomy/

                                # Set up two other environment
                                # variables, as required by UNICODE.TXT.
LC_ALL=C
LANG=en_US

                                # Filter the message via stdin to Anomy,
                                # with the config file specified,
                                # logging output being appended to the
                                # Maildrop log file and then the output
                                # being piped to cat so cat's stdout
                                # sends it back to Maildrop.  The use
                                # of "2>>" for appending stderr with
                                # the log material means we need the
                                # "| cat".
                                #
                                # If Anomy's conf file has:
                                #
                                #   feat_log_inline = 0
                                #   feat_log_stderr = 1
                                #
                                # Then a report of Anomy's progress in
                                # working on the message will be
                                # appended to the Maildrop log file. 
                                # This seems to work fine, so
                                # presumably each "log" line for
                                # Maildrop means it opens and closes
                                # the log file.  There could be
                                # multiple instances of Anomy
                                # running at the same time, each on
                                # a different message.
                                #
                                # Anomy Sanitizer can be extremely
                                # verbose in its log output.

xfilter "/usr/local/anomy/bin/sanitizer.pl /usr/local/anomy/anomy.conf 2>>~/mailfilter-log.txt | cat"

log "Anomy done."

                                # Watch out for text added to body
                                # by Anomy Sanitizer when it *drops* a
                                # file which is an attachment within
                                # the email - not just when it renames
                                # a file or defangs some HTML in the
                                # message.  This:
                                #
                                # *** Attached file dropped ***
                                #
                                # is a part of the drop message I specified
                                # in the Anomy Sanitizer config file.
                                # It always starts at the start of a
                                # line.
                                #
                                # The backslash escapes the asterisk.
                                # An ordinary asterisk is a special
                                # character in the PCRE pattern matching
                                # language, so use:
                                #
                                #   \*
                                #
                                # to match an asterisk.
                                #
                                # ":b" means look in the body, rather
                                # than the headers.


if (  /^\*\*\* Attached file dropped \*\*\*/:b )
{
    log "--------------------------------- Executable attachment! "
   
                                # Make this "cc" for copy or "to" to
                                # not send it to Inbox.
                                # Copy it to the "-Executable"
                                # mailbox.
    cc "Maildir/.-Executable"            
   
                                # Add something highly visible to the
                                # subject line and deliver the message
                                # to the Inbox.  The dropped attachment
                                # is replaced by a text file attachment
                                # with the warning text.
                                #
                                # Intrepid users who want to find a
                                # copy of the message with its original
                                # attachment will find it in the
                                # mailbox "Pre-Filter-Inbox", as
                                # described above.
                                      
    xfilter "subjadd ~~~[Executable]"
    to "Maildir"
}

                                # Copy all messages which are not
                                # caught by one of the above filters
                                # to a special folder, so I can find
                                # a message even if I accidentally
                                # delete it from the Inbox.  I don't
                                # intend to keep this mailbox's
                                # contents for long.
                                   
cc "Maildir/.0-Inbox-Old.Post-Filter-Inbox"


                               
    log "- - - - - - - - - - - - - - - - - - - - - -  No match"
  
                                # Deliver to the Inbox.
    to "Maildir"
                                                   



Watch /var/log/maillog if something is not working - Maildrop will complain if it can't run the Anomy Sanitizer line, such as due to a non-existent directory.  If it fails, then tweak the .mailfilter file and then issue the command, as root: postfix flush.  Another approach to seeing SpamAssassin and Anomy Sanitizer working is top -d 0.3.
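The redirection used in the Anomy xfilter line (stderr appended to a log file, stdout piped on through cat) can be tried outside Maildrop with a stand-in for the sanitizer.  In this sketch fake_filter is a hypothetical placeholder which passes the message through unchanged and writes one diagnostic line to stderr; it shows only the shell plumbing, not why Maildrop wants the "| cat".

```shell
# Stand-in for sanitizer.pl: message on stdout, diagnostics on stderr.
fake_filter() {
    cat
    echo "fake_filter: processed one message" >&2
}

log=$(mktemp)

# Same shape as the xfilter command: 2>>logfile, then "| cat".
out=$(printf 'Subject: hi\n\nbody\n' | { fake_filter 2>>"$log" | cat; })

echo "--- message as passed on:"
printf '%s\n' "$out"
echo "--- log file now contains:"
cat "$log"
```

The message comes out of the pipeline untouched, while the diagnostic line ends up only in the log file.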

Anomy Sanitizer 1.76 works!

I was able to see the "X-Sanitizer: ..." header line in each email.

I sent it ordinary emails, those with .jpg attachments and one with a .exe attachment.
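To check a delivered test message from the shell, it is enough to look for the X-Sanitizer header and for the drop text specified in the config file.  A minimal sketch with two hypothetical helper names; each takes the path of one message file in a Maildir.

```shell
# True if the message's headers (everything up to the first blank
# line) include an X-Sanitizer line.
has_sanitizer_header() {
    sed '/^$/q' "$1" | grep -q '^X-Sanitizer:'
}

# True if the message contains the drop text at the start of a line.
# The backslashes make grep treat the asterisks literally.
was_attachment_dropped() {
    grep -q '^\*\*\* Attached file dropped \*\*\*' "$1"
}

# Usage, for one message file:
#   has_sanitizer_header "$msg"   && echo "sanitized"
#   was_attachment_dropped "$msg" && echo "attachment dropped"
```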

All mails which Anomy Sanitizer is fed will lead to a detailed diagnostic report in the mailfilter-log.txt, due to the way I have constructed my .mailfilter file.  This can lead to rather large log files, so beware that Maildrop will spit the dummy if the log file ever gets to 50,000,000 bytes - log rotation is one approach to stopping this, but it's not bulletproof if there is a vast number of messages for some reason.  I suppose some shell script could be run at the start of the .mailfilter file to check if the log file was getting too long, and then rename it to a new name so a fresh one would be generated.
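A minimal sketch of such a script follows.  The function name rotate_if_big and the 40,000,000 byte limit in the usage comment are my own assumptions, and I have not tested calling it from a .mailfilter file; with several Maildrop instances running at once, two of them could try to rename the file at the same moment, so treat it as a starting point only.

```shell
# Rename the log file, with a timestamp suffix, once it has reached
# the given size in bytes.  A missing file is not an error.
rotate_if_big() {
    file="$1"; limit="$2"
    [ -f "$file" ] || return 0
    size=$(wc -c < "$file" | tr -d ' ')
    if [ "$size" -ge "$limit" ]; then
        mv "$file" "$file.$(date +%Y%m%d-%H%M%S)"
    fi
}

# Usage:
#   rotate_if_big "$HOME/mailfilter-log.txt" 40000000
```

The next "log" line in the .mailfilter file would then start a fresh log file.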


Advanced filtering

My filtering system is actually more complex than this.

Firstly, I have some tests for certain messages I want never to see, such as from a persistent troll on some DSP mailing lists who does not seem to be active anymore.

Then I have a bunch of tests for messages from mailing lists, and from other regularly generated sources such as cron reports in various servers.  These are generally copied to their own mailbox and then delivered to the Inbox (called "Maildir" in .mailfilter) tagged for deletion.  For some of them, I add a label at the start of the subject line, such as [IETF] for the main IETF mailing list which curiously has no such label.

Then I have some tests for messages from sources I definitely want to get messages from, but which might fall foul of Spamassassin.  They get copied to the Post-Filter-Inbox and sent to the Inbox, with a log-file line to the effect that the message was "Saved from spam filtering".

Then I use Spamassassin and Anomy Sanitizer exactly as shown above.  Any message still being processed by Maildrop after this is written first to the Post-Filter-Inbox and then to the Inbox.

There is much more which can be done with Maildrop, as the maildropfilter.html doco (linked to above) explains.  In combination with external programs, pretty much anything could be done.  

However, one probably has to be a bit of a geek to go this far . . .

Increasing the power of the Bayesian analysis to affect Spamassassin's score for each message

In 2003 I found I could get significantly better results if I increased the weight given to the output of the Bayesian analysis.  Perhaps this was because at the time the other tests, including the online tests (RBL etc. - real-time communication with servers) were not so well developed and/or because I went to more trouble than some users to train the Bayesian system with good samples of spam and ham.  Details of why I did this are in the zipped version of the old page at ../Postfix-SA-Anomy-Maildrop/ .

The details of how I do this now are a little different, but the principle is the same.  The Bayesian result is now quantized into 9 levels, whereas before (2003) it was quantized into 14 levels.

To see how the Bayesian result is used to affect the final score, see the big table at: http://spamassassin.apache.org/tests_3_2_x.html .  That page also explains how to override these defaults with lines in the config file.  The items of interest are below:





Area  Description                              Test name   L    N    B       BN
body  Bayesian spam probability is 0 to 1%     BAYES_00    0    0   -2.312  -2.599
body  Bayesian spam probability is 1 to 5%     BAYES_05    0    0   -1.110  -1.110
body  Bayesian spam probability is 5 to 20%    BAYES_20    0    0   -0.740  -0.740
body  Bayesian spam probability is 20 to 40%   BAYES_40    0    0   -0.185  -0.185
body  Bayesian spam probability is 40 to 60%   BAYES_50    0    0    0.001   0.001
body  Bayesian spam probability is 60 to 80%   BAYES_60    0    0    1.0     1.0
body  Bayesian spam probability is 80 to 95%   BAYES_80    0    0    2.0     2.0
body  Bayesian spam probability is 95 to 99%   BAYES_95    0    0    3.0     3.0
body  Bayesian spam probability is 99 to 100%  BAYES_99    0    0    3.5     3.5

The four columns of figures are for four situations:

Column:                  L       N      B      BN
                         Local   Net    Bayes  Bayes & Net
Network tests, such
as RBL etc.              No      Yes    No     Yes

Bayes tests              No      No     Yes    Yes  

If the Bayes result is 60%, 63%, 79% or anywhere from 60 to 80%, then 1.0 will be added to the score.

If the Bayes result is, for instance, 97%, then 3.0 will be added.
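
The quantization can be illustrated with a small shell function that maps a Bayes probability, given as a percentage, to the rule name and default score from the table above.  This is a sketch only: the function name bayes_rule is made up, the scores are taken from the BN (Bayes & Net) column, and treating each lower limit as inclusive is an assumption about how SpamAssassin handles values exactly on a boundary.

```shell
#!/bin/sh
# Hypothetical helper: map a Bayes spam probability (percent) to the
# SpamAssassin 3.2.x rule name and its default score (BN column).
# Treating each lower limit as inclusive is an assumption.
bayes_rule() {
    awk -v p="$1" 'BEGIN {
        if      (p < 1)  print "BAYES_00 -2.599"
        else if (p < 5)  print "BAYES_05 -1.110"
        else if (p < 20) print "BAYES_20 -0.740"
        else if (p < 40) print "BAYES_40 -0.185"
        else if (p < 60) print "BAYES_50 0.001"
        else if (p < 80) print "BAYES_60 1.0"
        else if (p < 95) print "BAYES_80 2.0"
        else if (p < 99) print "BAYES_95 3.0"
        else             print "BAYES_99 3.5"
    }'
}

bayes_rule 63    # prints: BAYES_60 1.0
bayes_rule 97    # prints: BAYES_95 3.0
```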

Based on my careful analysis in 2003 (which I don't have time to replicate in 2008) I decided to boost the scores for the higher ranges of Bayesian results.

A table of what I did then is as follows:

Bayes    Bayes  Name         Score          Score with
lower    upper               original       my new   
limit    limit                              config lines
                             B     BN        
  0        1    BAYES_00    -5.300 -5.200   <
  1       10    BAYES_01    -5.400 -5.400   <
 10       20    BAYES_10    -5.300 -4.701   <
 20       30    BAYES_20    -4.701 -2.601   <
 30       40    BAYES_30    -1.070 -0.927   <
 40       44    BAYES_40     0.001  0.001   <        << Ooops.
 44       50    BAYES_44     0.001  0.001  -0.500
 50       56    BAYES_50     0.001  0.001   0.500
 56       60    BAYES_56     0.001  0.001   2.000
 60       70    BAYES_60     1.997  1.101   3.500
 70       80    BAYES_70     2.593  2.310   5.000
 80       90    BAYES_80     5.300  2.862   6.000
 90       99    BAYES_90     4.027  3.002   6.000
 99      100    BAYES_99     5.200  3.008   6.000

Now there are fewer levels to which the Bayesian score is quantized.  Here is a table for the current version, with my new values, which are still just guesswork.  With a lot of time, one might be able to statistically analyze the results and come up with something more optimal.  But there are other things to do in life!

Bayes    Bayes  Name         Score          Score with
lower    upper               original       my new   
limit    limit                              config lines
                             B     BN        
  0        1    BAYES_00    -2.312 -2.599   <
  1        5    BAYES_05    -1.110 -1.110   <
  5       20    BAYES_20    -0.740 -0.740   < 
 20       40    BAYES_40    -0.185 -0.185   < 
 40       60    BAYES_50     0.001  0.001   < 
 60       80    BAYES_60     1.000  1.000   2.000
 80       95    BAYES_80     2.000  2.000   4.000
 95       99    BAYES_95     3.000  3.000   6.000
 99      100    BAYES_99     3.500  3.500   7.000

The lines I add to each user's Spamassassin config file (~/.spamassassin/user_prefs, that is /home/xx/.spamassassin/user_prefs) are:

score BAYES_60  2.0
score BAYES_80  4.0
score BAYES_95  6.0
score BAYES_99  7.0

It would be great to see a histogram of the scores produced by Spamassassin.  Ideally there would be a notch in the middle with the spam on the right and the non-spam on the left, with few in the middle - which is where we want to put our threshold.  It would be interesting to see the distribution of Bayesian scores with respect to the total of scores from other tests.
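
A rough text histogram of this kind could be produced from the headers of already-delivered messages.  The following is a sketch under assumptions, not something taken from this setup: the function name score_histogram is made up, and it relies on each message carrying the usual SpamAssassin header line of the form "X-Spam-Status: No, score=-2.6 required=5.0 ...".  A line like that quoted in a message body would be counted too.

```shell
#!/bin/sh
# Hypothetical helper: print a text histogram of SpamAssassin scores.
# Reads the named files, or standard input if none are given.
# Output: one line per whole-number bucket: bucket, count, bar.
score_histogram() {
    grep -h '^X-Spam-Status:' "$@" |
    sed -n 's/.*score=\(-\{0,1\}[0-9][0-9.]*\).*/\1/p' |
    awk '{
            b = int($1)          # int() truncates towards zero,
            if ($1 < b) b--      # so step down for negative fractions
            n[b]++
         }
         END {
            for (b in n) {
                bar = ""
                for (i = 0; i < n[b]; i++) bar = bar "#"
                printf "%d %d %s\n", b, n[b], bar
            }
         }' |
    sort -n
}
```

It could be run over a mailbox with something like score_histogram Maildir/cur/* (the path depends on which folder is of interest).  A score of 7.3 lands in bucket 7 and a score of -2.6 in bucket -3.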

I will fine-tune the threshold between emails deemed not to be spam and those tossed into Spam-Marginal according to my scrutiny of what false positives wind up in Spam-Marginal and what false negatives arrive in the Inbox.

Further fine-tuning of Spamassassin

There is a lot of material at http://spamassassin.apache.org and its FAQ and connected wiki.  Here are some which look promising to me:

http://wiki.apache.org/spamassassin/VBounceRuleset

Easily enabled rules for catching backscatter messages.  But how well could those caught messages be used to train the Bayes system?  They would superficially resemble legitimate bounce messages, I think, so the Bayes system would then be more likely to class legitimate bounces as spam.


http://taint.org/2007/05/30/164456a.html

Linked to from the VBounceRuleset page, a simple config change to help Postfix reject backscatter messages.