CentOS 5.1 with
Postfix, Courier IMAPD, Courier Maildrop, Spamassassin
Robin Whittle rw@firstpr.com.au 2008-04-29 (2009-09-09 Minor change to anomy.conf to allow Vcard vcf files. 2010-03-24 added "sa-update".)
Introduction
Configuring and running a mailserver is a
demanding business. This long page is a how-I-did-it for my
home-office mail server. This is not for massive numbers of
users, with databases for authentication etc. It might suit a
small business or home user. Here is a rough outline of the
software I used:
Postfix MTA (Message Transfer Agent)
Widely respected, fast, secure and easy to administer mailserver replacement
for sendmail.
Postfix is configured here to deliver incoming
messages using the Courier Maildrop program, which can deliver to
mailboxes in the Maildir format. In this format, each message is
a separate file, whereas the traditional mbox format has the entire
mailbox as a single file. For any serious use, mbox sucks and
maildir swings.
Later, I intend to configure Postfix to use
the Sender Policy Framework (SPF)
http://www.openspf.org
to help reduce backscatter emails:
http://en.wikipedia.org/wiki/Backscatter_(e-mail)
.
Courier Maildrop
Courier Maildrop is
called whenever Postfix needs to deliver a message (an email) to a
local user's mailbox. Here I use Maildrop to perform filtering on
the email's headers and body, such as for copying mailing list messages
to their own mailboxes. I use a modified version of Maildrop (as
described in another page), which can deliver messages to the Inbox
tagged for deletion. This is very useful with large volumes of
mailing list messages.
The mailfilter file calls
Spamassassin to do spam filtering
and
anomy-sanitizer to handle
attachments with executable file names, other kinds of malware,
dangerous HTML etc. Anomy-sanitizer hasn't been updated since
2006, but I think it is still good in 2008. It doesn't depend on a feed
of updates containing virus definitions, which makes it a lot simpler
than running an anti-virus program. I run Avast! on my Windows PCs
anyway. An alternative would be to scan for viruses etc. with ClamAV
http://www.clamav.org . A recipe for
integration with Postfix is at:
http://www.postfixvirtual.net/postfixantivirus.html#clamav
.
Courier IMAPD
The Courier IMAPD server
accesses the Maildir mailboxes and makes them available to any email
client program which uses IMAP to store its mailboxes on a server.
For instance, I use Thunderbird on local machines to do this.
In principle, Thunderbird or any other email client program could
be running anywhere on the Net and still access these mailboxes.
However, I think Thunderbird tends to do a lot of not-necessarily
wanted activity, scanning the contents of mailboxes automatically etc.
This could be a problem if operating over a slow or expensive
link, such as when mobile or far from home.
Various web-mail programs
One or more web-mail programs can run on
this server, sending outgoing messages to Postfix via SMTP and reading
and writing mailboxes, including the Inbox, via IMAP. I describe
these on separate web-pages. These programs have to be integrated
with the Apache web server. Some of them run with PHP code, so
the Apache server needs to support PHP too.
Here is
an overall diagram depicting how these things work together.
I am not using any fancy database-based authentication system.
Each user whose email is handled by this system has an ordinary
user account on the server.
                                            Maildir format                             Web-mail email client
                                            mailboxes
      /-----------\       /----------\        ***********             /---------\          /----------\
      |           |       |          |        *  Inbox  *             |         |          |          |
      |           | ====> | Maildrop | ===>   *         * <======>    | Courier |          | Postman  |
<===  |  Postfix  |       |          | =\     ***********             |  IMAPD  |          |   with   |           /---------\
===>  |    MTA    |       \----------/  |                             |         |  IMAP    |  Apache  |           |         |
SMTP  |           |                     |====> ************           |         |  <--->   |          |  <----->  | User's  |
to &  | (Message  |       Filtering     |      *   YYY    *           |         |          |          |   HTTP    |   web   |
from  | Transfer  |       rules in      \=>************   * <==>      |         |          |          |    or     | browser |
other |  Agent)   |       .mailfilter      *   XXX    * ***           |         |  IMAP    |   SMTP   |   HTTPS   |         |
MTAs  |           | <--\      ***          *          * <======>      |         | <-\      |   Out    |           |         |
      \-----------/     \                  ************               \---------/    \     \----------/           |         |
                         \                                                            \          V                \---------/
                          \---<-------------------------------------------------<---------------/
      SMTP outgoing mail   \                                                            \      /------------------------\
                            \                                                            \-->  | Thunderbird or other   |
                             \                                                                 | IMAP capable Email     |
                              \                                                                | client program on PC.  |
                               \                                                               |                        |
                                \------------------------------<-------------------------------|< SMTP outgoing mail    |
                                                                                               |                        |
                                                                                               \------------------------/
***
Spam and virus filtering
This is where extra programs can go for filtering
viruses ("malware") and spam. Another approach is to have Postfix
detect and refuse to accept spam messages from remote MTAs (on the
left-side of the Postfix box in this diagram). This would cut
down on the volume of mail which needs to be delivered locally.
However this approach is often unacceptable because some false
positives are inevitable in spam detection and the problems resulting
from refusing to accept non-spam email, without the intended recipient
ever knowing that it has been sent or rejected, cannot be tolerated.
This *** location is actually two points in my
.mailfilter file, for each user, which is executed by Courier Maildrop.
One point is where I call Anomy Sanitizer and the other is where
I call SpamAssassin. Postfix
is a collection of programs, and one way of dealing with spam is to
configure Postfix to reject SMTP sessions based on the IP address of
the machine starting the session (such as by a real-time black-hole
lookup of the IP address with a remote service for this) and also on
criteria such as sender address and other things in the header.
This, however, rejects the message entirely, so the recipient
never knows what has been rejected - meaning that if legitimate
messages are rejected, they don't know about it. It has the
advantage of reducing the communications traffic by rejecting spam,
rather than accepting and filtering it, but it involves Postfix in more
work and delays with looking up remote servers to check the IP address.
- Spam Assassin http://spamassassin.apache.org This
is a widely respected spam detection system, which works in various
ways, including by reference to remote blacklists etc. and Bayesian
recognition of spam based on the contents of the user's personal spam
pit.
- The Anomy Sanitizer http://mailtools.anomy.net
Not actually a virus scanner, in terms of identifying specific viruses
(which would require constant updates to its filtering rules) but a
program to eliminate problems in attachments such as Javascript in HTML
emails, and to rename or delete executable files and to do other things
which generally render them less harmful. (This is my rough
summary - see the web page - for instance it can call a virus scanner.)
I have it "defang" various suspect HTML elements, and "drop"
attached files which are executable. As of April 2008, this software was
last updated in 2006. I have found it useful, and I am not sure
what other programs do the same job.
My approach to running Spam Assassin
and Anomy Sanitizer is to use the xfilter command from within
Maildrop's .mailfilter file. I have a page on how I did
this, which you can find from the parent directory.
Another
way to run Spam Assassin and Anomy Sanitizer is described by Advosys in
Ottawa: http://advosys.ca/papers/postfix-filtering.html
.
The Postfix configuration changes required for running the
Advosys script for these two programs are entirely separate from how
Postfix is configured to deliver messages with Maildrop. However,
this leads to them running on outgoing mail as well, not just mail to
be delivered to local users' mailboxes. (Actually, the script
becomes part of Postfix's smtp program which also handles the messages
arriving from local clients via the line entering the bottom right of
the Postfix box above.) Advosys solve this by running two
instances of Postfix, with the first one doing the virus/spam filtering
as usual and the second one responding to a second IP address purely
for SMTP outgoing messages from local mail clients - therefore
requiring all the clients be reconfigured. For my own purposes, I
think a better approach is to put Anomy Sanitizer and Spam Assassin in
the local delivery section of Postfix, just before Maildrop. See
my separate page, from the parent directory for how I
did this - running them from within Maildrop. However, Advosys.ca
have a lot of experience with large mail systems, for companies and
educational institutions, and there are good reasons why they run two
instances of Postfix, and do other things to improve efficiency.
For instance, their clients may want to filter for spam and
viruses on all incoming email, but only filter for viruses on outgoing
mail. This is not because they are spammers, but because spam
detection is an imperfect science.
Whether Advosys'
or my approach is used, the aim is to label each suspected spam or
virus message by adding distinctive headers and/or distinctive and
human-readable material in the body of the message. Anomy
Sanitizer typically also changes suspect message bodies to "defang"
suspect HTML tags, Javascript, executable files etc. In the
standard configuration this has the effect of greatly shortening long
virus messages. Then, it is easy in Maildrop's filtering rules to
decide what to do with suspected spam or malware messages. I use
Maildrop with my "Delivered to Inbox Tagged for Deletion" system, with
Cc:ing to separate mailboxes for suspected spam and malware, with the
addition of a "[~~~SPAM]" or "[~~~VIRUS]" subject header for when they
are delivered tagged for deletion in the Inbox. This way, all
messages are visible to the intended recipient - me - but I don't have
to manually select and do something with the individual spams and
viruses which are increasingly flooding my Inbox.
I
think it is best to store these
highly suspected spam and virus
emails somewhere, rather than to delete them manually or automatically
as they are first viewed or arrive at the server. To delete this
stuff manually or automatically risks accidentally deleting a valid
email - because I can't be super cautious and 100% correct in my
judgments whenever I do this, which could be any time of the day.
With largely automated detection and copying to spam and virus
mailboxes, supplemented by manual steps for the same purpose with what
gets past the filters, at least I can periodically look into those
mailboxes to ensure I haven't accidentally turfed something good into
these cesspits.
To ease the problem of searching the spam pit
for false positives (non-spam messages which were automatically
classified as spam) I now sort the messages into two mailboxes:
- Spam-Marginal. Those with a spam score at or just above my
chosen threshold, but below a second, somewhat higher threshold.
- Spam.
Those scoring above the higher threshold. It is exceedingly
unlikely that there will be a false positive in this lot, so I
generally don't manually check the contents of this mailbox.
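The two-threshold sorting described above can be sketched as a small shell function. This is only an illustration: the function name classify_spam and the threshold values used in the example are hypothetical, not the settings in my .mailfilter file, and the real sorting is done by Maildrop rules rather than by a shell script.

```shell
#!/bin/sh
# Hypothetical helper: pick a mailbox name from a SpamAssassin score.
# $1 = score, $2 = lower threshold, $3 = higher threshold.
# The thresholds are assumptions supplied by the caller.
classify_spam() {
    # awk does the decimal comparison, which plain sh arithmetic cannot.
    awk -v s="$1" -v lo="$2" -v hi="$3" 'BEGIN {
        if (s + 0 >= hi + 0)      print "Spam"
        else if (s + 0 >= lo + 0) print "Spam-Marginal"
        else                      print "Inbox"
    }'
}

# Example: classify_spam 5.2 5 8
```

Only the Spam-Marginal result would then need a periodic manual check.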
To sort manually through Spam-Marginal, I normally use Thunderbird's
search function to look for my name "Robin" in the body of the message,
since spammers typically only know my email address: rw@firstpr.com.au.
(This is a very good reason for having my account name "rw",
rather than "robin".) I also look for a few other words related
to my business.
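The same kind of search can be roughly approximated on the server with grep. This is a sketch under assumptions: the function name messages_mentioning is hypothetical, and unlike Thunderbird's search it looks at the raw message file (headers included) and will not see text inside base64 or quoted-printable encoded bodies.

```shell
#!/bin/sh
# Hypothetical helper: list message files in one Maildir mailbox that
# mention a word anywhere in the raw file, ignoring case.
# $1 = mailbox directory (containing cur/ and new/), $2 = word.
messages_mentioning() {
    grep -rli -- "$2" "$1/cur" "$1/new" 2>/dev/null
}

# Example: messages_mentioning /home/robin/Maildir/.Spam-Marginal Robin
```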
In the following sections, I describe how
I installed the IMAP server Courier IMAPD, its required Authentication
Daemon, and Courier Maildrop.
Some alternatives to Courier
IMAPD include the newer Dovecot IMAPD. I tried it once, but I
have always been happy with Courier IMAPD.
The
University of Washington server is not suitable because (last time I
looked) it did not use Maildir mailboxes. See this note for some
history of disputes about mailbox formats and about difficulties with
the IMAP standard itself:
http://www.courier-mta.org/fud/
.
People running really large IMAP systems probably need to
consider more intense solutions such as Cyrus.
Courier
IMAPD does not appear in the output of "yum list" (for yum, see the
page from the parent directory
../ concerning miscellaneous configuration
items) when using the repositories I chose during installation
(see
../CentOS-5.1-RAID-1). I
wondered whether I could get a suitable RPM of Courier IMAPD via one of
the other repositories listed at:
http://wiki.centos.org/Repositories
. First I tried RPMForge:
https://rpmrepo.org/RPMforge/Finding
but did not find any Courier RPMs there. None of the others
looked like they had it either. There's no sign of any CentOS packages
at the Courier IMAPD site:
http://www.courier-mta.org/imap/
.
I couldn't easily find any pre-built binary RPM, so
I downloaded the source code
and
tried compiling it. I put it in /opt/courier-imapd/. The
Midnight Commander F2 menu (User menu from the File menu) has a handy
"Extract compressed tar file to subdirectory" function I like to use on
.tar.bz2 etc. tarballs such as this.
Before I can install
Courier IMAPD, I need to install the Courier Authentication Library,
which provides a daemon to handle authentication matters.
Courier Authentication Library 0.60.2 - compile and install
As root, I unzip it and read the
README and INSTALL doco.
I was also reading the INSTALL of Courier
IMAPD at the same time, and became confused. The IMAPD INSTALL
says that the ./configure process must be done by an ordinary user, not
root. In the following, I did the unzip, ./configure and the
compilation (make) - as an ordinary user, but then as root for "make
install" and "make install-configure". Maybe the whole thing
could be done as root.
I did not install "expect" since
I am not using the Courier webmail system, which needs "expect" to
enable the user to change their password.
Since I am
installing Courier for the first time, I create a user and group
"courier": "adduser courier". I don't think I need any
options, so:
./configure
As
mentioned in INSTALL, the configure script seems to go round and round.
make
No problems . . .
make install
This
failed when I did it as a non-root user, so I did it as root and it
seemed to work fine.
make install-configure
Likewise, as root, this worked too.
Now to make sure the
daemon starts at boot, and to start it now.
INSTALL directs
me to read README_authlib.html about setting up the authentication
modules. I am using ordinary shadow password authentication,
since each IMAP user has a local account on the Linux machine.
This looks like essential reading for anyone doing a more
elaborate authentication than I am - but there is nothing here which
requires me to change anything. Normally, 5 daemons are running,
which is fine for me. Busier systems would require more.
There is a note there on testing the authentication daemon, which
I follow after I have made it run.
To make the daemon start
at boot time, I copy the file courier-authlib.sysvinit (which is built
in the source directory, as part of the compilation process) to
/etc/rc.d/init.d/courier-authlib
with permissions 775. Then I give the command:
chkconfig --add courier-authlib
Then, I can see that this
script is part of the system:
chkconfig --list
. . .
conman          0:off 1:off 2:off 3:off 4:off 5:off 6:off
courier-authlib 0:off 1:off 2:on  3:on  4:on  5:on  6:off
cpuspeed        0:off 1:on  2:on  3:on  4:on  5:on  6:off
I start it
now:
/etc/rc.d/init.d/courier-authlib start
Now I can test
it on the command line, as described in README_authlib.html.
authtest
robin
This looks good so far.
Courier IMAPD 4.3.1 - compile and install
I have not investigated using SSL with
IMAP. The INSTALL file details how to do it.
The
INSTALL file has this which needs to be considered first:
You MUST run the configure script as
normal user, not root. Did you
extract the tarball as root? It
won't work. Remove the extracted source
code. Log in as a normal user. Extract the source
code as a normal user,
then run configure. You will do everything
as a normal user, except for
the final step of installing the compiled
software.
I do not wish to include the Gamin or FAM
system for real-time folder updates.
Either GDBM or the
Berkeley DB library is needed. Looking at the output of "yum
list" I find these are installed:
gdbm.i386
1.8.0-26.2.1
gdbm-devel.i386 1.8.0-26.2.1
I need to check that neither inetd nor xinetd is listening
on the IMAP port. (I am only using this for IMAP, not POP3.)
xinetd is used on CentOS 5.1. IMAP is TCP port 143 (as specified
in /etc/services). Looking at the files in /etc/xinetd.d/ I see
none of them have a line "service IMAP", so this looks OK.
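Rather than reading each xinetd file by eye, the check can be sketched with grep. The function name xinetd_imap_services is hypothetical, and the pattern assumes service definitions are written as a line starting with "service" followed by the service name, matched without regard to case.

```shell
#!/bin/sh
# Hypothetical helper: print the names of xinetd service files which
# define an imap or imaps service. Prints nothing if there are none.
# $1 = directory of xinetd service files (defaults to /etc/xinetd.d).
xinetd_imap_services() {
    dir=${1:-/etc/xinetd.d}
    grep -ilE '^[[:space:]]*service[[:space:]]+imaps?([[:space:]]|$)' "$dir"/* 2>/dev/null
}

# Example: xinetd_imap_services /etc/xinetd.d
```

No output suggests xinetd will not compete with Courier IMAPD for port 143, though it says nothing about other programs which might be listening there.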
So
maybe I am ready to build Courier IMAPD. . .
./configure
No problems.
make check
Now
become root.
su root
make install
No
problems! Note: The above line should probably be "make
install-strip" to create binaries without debugging stuff. The
INSTALL doco indicates this, but not very clearly.
make install-configure
No problems!!
There is a
bunch of stuff at /usr/lib/courier-imap/ including the config file:
/usr/lib/courier-imap/etc/imapd .
It is vital to edit
the above file so IMAPD will actually run. The commented
lines below are the original lines for these settings.
# ADDRESS=0
ADDRESS=10.0.0.2
# MAXDAEMONS=40
MAXDAEMONS=500
#MAXPERIP=4
MAXPERIP=100
#IMAPDSTART=NO
IMAPDSTART=YES <<< You will definitely be wanting this one.
** When I make this machine become gair,
change the address line to include gair's LAN address and (if I choose
to do so) the public IP address.
At the end of this file, the
name of the mailbox directory is specified. By default it is
"Maildir", which is fine. The variable
IMAP_CHECK_ALL_FOLDERS
might be worth looking at.
Previous experience indicates that
Mozilla (now Thunderbird) was much happier with MAXDAEMONS and MAXPERIP
being set to 100 and 20, rather than 40 and 4. However, a
quick
search indicates some people are setting these to values such as
5000 and 100. I chose 500 and 100 quite arbitrarily.
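After editing, it is easy to confirm which values are actually active, as opposed to commented out. A minimal sketch, assuming the settings sit one per line in NAME=value form as shown above; the function name show_imapd_settings is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the active (uncommented) values of the
# four settings discussed above.
# $1 = path to the imapd config file (defaults to the path used here).
show_imapd_settings() {
    conf=${1:-/usr/lib/courier-imap/etc/imapd}
    grep -E '^(ADDRESS|MAXDAEMONS|MAXPERIP|IMAPDSTART)=' "$conf"
}

# Example: show_imapd_settings
```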
To
make the system start IMAPD at boot time, I copy the file
courier-imap.sysvinit from the source directory to
/etc/rc.d/init.d/courier-imap with
permissions 775. Then I give the command:
chkconfig --add courier-imap
Then, I can see that this script
is part of the system:
chkconfig --list
. . .
conman          0:off 1:off 2:off 3:off 4:off 5:off 6:off
courier-authlib 0:off 1:off 2:on  3:on  4:on  5:on  6:off
courier-imap    0:off 1:off 2:on  3:on  4:on  5:on  6:off
cpuspeed        0:off 1:on  2:on  3:on  4:on  5:on  6:off
To start it (but check first that /usr/lib/courier-imap/etc/imapd
has IMAPDSTART=YES) :
/etc/rc.d/init.d/courier-imap start
Right now, there are no Maildir directories on this server to
test it with. I need to install Courier Maildrop and integrate it
with Postfix to generate some maildir format mailboxes to test out the
IMAP server with.
Courier Maildrop 2.0.4 - compile and install
This is just installing the standard version, without my modifications
for delivery with the message marked for deletion. See the
section below (#maildrop_mods) for how I
change two files and recompile maildrop with these new features.
I unpacked the tarball as a non-root user, but I see nothing in INSTALL
requiring this, so I change to root for the rest of the compilation and
installation.
There is a README.postfix file.
INSTALL tells me some things I need to check:
I need the PCRE
library:
http://www.pcre.org - Perl
Compatible Regular Expressions. "yum list" tells me that
pcre.i386 6.6-2.el5_1.7 is installed. I guess I only need this,
not some dev library for it. No . . . running ./configure
complains of a missing "pcre/pcre.h", which sounds like it is in the
development library. So I need
to install pcre-devel.i386 (6.6-2.el5_1.7):
yum install pcre-devel
I am not sure why I would want
Courier Maildrop to know about the Courier authentication library.
Its role is to accept an email from Postfix and deliver it to the
local user's Inbox, with potential mailfiltering which may also cause
the message to be processed by other programs and delivered to other
mailboxes of the same user.
Some options I consider using:
--without-db I don't think I use
anything in Maildrop which involves databases, so I could make the
executable smaller by removing that stuff. Maildrop is executed
for every incoming email. It is not much fuss on my server, but
on a busier machine, it might be worthwhile making Maildrop as minimal
as possible.
I decide to use no configure options.
In all the various systems Maildrop can be compiled to work in, there
are various places each user's mailboxes could be stored. The
configure script tries to figure this out. However, INSTALL
states (implicitly after running ./configure) that:
you MUST
always verify the output file, config.h, to make sure that the
settings are correct.
and that there is
another file xconfig.h too which contains things which can't be
automatically determined.
More on this below.
./configure
No problems, but I need to edit some things in the file it
produces:
/maildrop/config.h.
In this system, I want each user's mailbox to be in their home
directory. So for user robin, there is a directory:
/home/robin/Maildir/
In that directory, if there is only an Inbox, then there are
three subdirectories:
/home/robin/Maildir/cur
Contains messages already seen by an
IMAP client.
/home/robin/Maildir/new
Contains messages delivered to this
Inbox but not yet seen by an IMAP
client.
/home/robin/Maildir/tmp
Holds each message briefly while it is being written, before it is
moved into new. Nothing is stored here long-term.
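Because each message is a separate file, ordinary shell tools can inspect a Maildir. The sketch below is illustrative only: make_maildir is a hypothetical stand-in for maildirmake (the mode 700 is an assumption about suitable permissions), and count_new_messages is likewise a name invented for this example.

```shell
#!/bin/sh
# Hypothetical stand-in for maildirmake: create the three subdirectories.
# $1 = path of the Maildir to create.
make_maildir() {
    mkdir -p "$1/cur" "$1/new" "$1/tmp" && chmod -R 700 "$1"
}

# Hypothetical helper: count messages delivered but not yet seen by an
# IMAP client, that is, regular files in the new/ subdirectory.
count_new_messages() {
    find "$1/new" -type f | wc -l | tr -d ' '
}

# Example: count_new_messages /home/robin/Maildir
```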
With Courier IMAP, all the
other mailboxes for that user are within the Inbox. Each is a
directory with a similar structure of 3 sub-directories, and each has
a name such as ".Drafts", where the name of the mailbox
is "Drafts". Each such mailbox can have its own mailboxes, as
far as the email client is concerned, but the Maildir (set of three
directories like those above) for each one is still a subdirectory of the
original /home/robin/Maildir/ directory above. The nesting of
mailboxes is done like this:
To the client they appear as a
tree of directories and sub-directories, such as:
Inbox
 |
 |--aaa
 |
 |--Trash
 |
 |--xxx
 |
 |--yyy
     |
     zzz
Physically in the file system, they are as follows,
not counting each directory's three subdirectories and two or three
files which Courier IMAP puts
in each one.
/home/blah/Maildir/.aaa
/home/blah/Maildir/.Trash
/home/blah/Maildir/.yyy
/home/blah/Maildir/.yyy.zzz
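The naming rule can be sketched as a small shell function. The function name folder_to_maildir_path is hypothetical, and the sketch assumes plain folder names with no dots or non-ASCII characters, which the real server encodes differently.

```shell
#!/bin/sh
# Hypothetical helper: map a client-side folder path such as "yyy/zzz"
# to the directory used inside the user's Maildir.
# $1 = Maildir path, $2 = folder path with "/" between nesting levels.
folder_to_maildir_path() {
    # Each level of nesting becomes a dot, and the name gets a leading dot.
    printf '%s/.%s\n' "$1" "$(printf '%s' "$2" | tr '/' '.')"
}

# Example: folder_to_maildir_path /home/blah/Maildir yyy/zzz
```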
Courier IMAPD and Maildrop are very fussy about the user, group and
permissions of these directories, and of each file which contains an
email.
In /maildrop/config.h is the line:
/* Default mail delivery instruction */
#define DEFAULT_DEF "/var/mail"
This is All Wrong. It
is not surprising - the configure script had no way of knowing I want
the mailboxes to be in the user's directories.
I need to
change this to:
#define DEFAULT_DEF "./Maildir"
This is in accordance with this part of INSTALL (and with my
past experience with Maildrop):
To use maildrop with [5]qmail, which
normally delivers to $HOME/Mailbox, set DEFAULT_DEF to ./Mailbox.
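I made this change in a text editor, but it could also be scripted. A minimal sketch, assuming the DEFAULT_DEF line has the form shown above; the function name set_default_def is hypothetical, and the new value must not contain the characters |, & or \.

```shell
#!/bin/sh
# Hypothetical helper: rewrite the DEFAULT_DEF line in maildrop's config.h.
# $1 = path to config.h, $2 = new value, e.g. ./Maildir
set_default_def() {
    # A .bak copy is kept so the configure-generated file can be compared.
    sed -i.bak "s|^\(#define[[:space:]]\{1,\}DEFAULT_DEF[[:space:]]\{1,\}\)\".*\"|\1\"$2\"|" "$1"
}

# Example: set_default_def maildrop/config.h ./Maildir
```

It is still worth reading config.h afterwards, as INSTALL demands.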
Then I try my luck compiling it:
make
make install-strip
make install-man
No problems. The output of make install-strip lists
where things are installed. For my system, in summary:
/usr/local/bin/maildrop << The maildrop binary.
/usr/local/bin/mailbot
/usr/local/bin/reformail
/usr/local/bin/deliverquota
/usr/local/bin/lockmail
/usr/local/bin/maildirmake
/usr/local/bin/reformime
/usr/local/bin/makemime
/usr/local/bin/makedatprog
/usr/local/bin/makedat
/usr/local/man/man1/lockmail.1
/usr/local/man/man1/maildirmake.1
/usr/local/man/man1/maildrop.1
/usr/local/man/man1/mailbot.1
/usr/local/man/man1/makemime.1
/usr/local/man/man1/reformail.1
/usr/local/man/man1/reformime.1
/usr/local/man/man5/maildir.5
/usr/local/man/man7/maildirquota.7
/usr/local/man/man7/maildropex.7
/usr/local/man/man7/maildropfilter.7
/usr/local/man/man7/maildropgdbm.7
/usr/local/man/man7/maildirquota.7
/usr/local/man/man8/deliverquota.8
/usr/local/share/maildrop/html/maildirquota.html
/usr/local/share/maildrop/html/deliverquota.html
/usr/local/share/maildrop/html/lockmail.html
/usr/local/share/maildrop/html/maildirmake.html
/usr/local/share/maildrop/html/maildropex.html
/usr/local/share/maildrop/html/maildir.html
/usr/local/share/maildrop/html/maildropfilter.html
/usr/local/share/maildrop/html/maildropgdbm.html
/usr/local/share/maildrop/html/maildrop.html
/usr/local/share/maildrop/html/mailbot.html
/usr/local/share/maildrop/html/makemime.html
/usr/local/share/maildrop/html/reformail.html
/usr/local/share/maildrop/html/reformime.html
/usr/local/share/maildrop/html/rfc822.html
/usr/local/share/maildrop/html/rfc2045.html
/usr/local/share/maildrop/html/makedat.html
/usr/local/share/maildrop/html/manpage.css
(The last
time I compiled Maildrop in 2003 or so, it put its binary in
/usr/bin.)
Now to create an initial Maildir format mailbox in
a user directory, to configure Postfix to use Maildrop, and then to
send some test emails.
Configure Postfix in general and to use Maildrop
This is basic stuff to get Postfix working with Maildrop. Later,
before turning this nair machine into gair I will do some other changes.
To create a Maildir format mailbox for user robin, at /home/robin
I give the command (as user robin):
maildirmake ./Maildir
Before
trying to configure Postfix, I check it is working in the first
place.
I add this machine's current name
"nair.firstpr.com.au" to my DNS, with its address as 10.0.0.2 (I will
remove it once nair switches over to be the new gair). I send a
test email to user robin at this machine.
Nothing
happened - nothing in /var/log/maillog or where the message should be
delivered, without Maildrop: /var/spool/mail/robin/ . Looking in
the maillog of gair (my mailserver) I see "connect to
nair.firstpr.com.au[10.0.0.2]: Connection refused (port 25)". So
Postfix is not running by default.
So I need to check
Postfix's configuration and start it. /etc/postfix/main.cf looks
OK (it is not - see below), though I will change it before this machine
becomes gair. I tried to start Postfix:
/etc/rc.d/init.d/postfix start
but this failed. A line in
/var/log/maillog indicated it was already running. I can "telnet
localhost 25" and get a response from Postfix. However, I can't
telnet to 10.0.0.2 port 25 from another machine - "connection
refused".
I change these lines in /etc/postfix/main.cf from:
#inet_interfaces = all
#inet_interfaces = $myhostname
#inet_interfaces = $myhostname, localhost
inet_interfaces = localhost
to
inet_interfaces = all
#inet_interfaces = $myhostname
#inet_interfaces = $myhostname, localhost
#inet_interfaces = localhost
and then restart it:
/etc/rc.d/init.d/postfix restart
Now I can telnet to port 25 on this
machine from outside.
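With several commented-out alternatives in main.cf, it is easy to lose track of which line is live. A minimal sketch for checking this from the file itself; the function name main_cf_value is hypothetical, and it only handles simple one-line settings, not values continued onto following lines. On a running system, "postconf inet_interfaces" reports what Postfix actually uses.

```shell
#!/bin/sh
# Hypothetical helper: print the last uncommented value of a parameter
# in a Postfix main.cf file (later lines override earlier ones).
# $1 = path to main.cf, $2 = parameter name.
main_cf_value() {
    sed -n "s/^$2[[:space:]]*=[[:space:]]*//p" "$1" | tail -n 1
}

# Example: main_cf_value /etc/postfix/main.cf inet_interfaces
```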
In gair, I cause the Postfix server
there to try to send all the messages it has queued:
postfix flush
and I find the message has arrived in the mbox
format mailbox, the Inbox for user robin: /var/spool/mail/robin .
So now I should be ready to configure Postfix to use Maildrop.
There is a README.postfix file in the Maildrop distribution, but it
only concerns things I think I already know about or can ignore:
I already knew about
local_destination_concurrency_limit=1. There are some notes about
adding a "u" flag somewhere. This has already been done in the
relevant section of /etc/postfix/master.cf which specifies how Maildrop
is called:
# maildrop. See the Postfix MAILDROP_README file for details.
# Also specify in main.cf: maildrop_destination_recipient_limit=1
#
maildrop  unix  -       n       n       -       -       pipe
  flags=DRhu user=vmail argv=/usr/local/bin/maildrop -d ${recipient}
The two changes required to
/etc/postfix/main.cf
are to add these two lines. Both config commands are already
mentioned in the file, in a commented form.
mailbox_command = /usr/local/bin/maildrop
local_destination_concurrency_limit=1
The second line is required because Maildrop can only
deliver one email at a time and so should not be asked to deliver to
two or more local mailboxes at the same time. Postfix would
otherwise ask Maildrop to deliver a message to multiple users (if it
was addressed to those users).
It is necessary to restart
Postfix after these changes. (I put a script to do this in the
/etc/postfix directory so I don't have to remember
/etc/rc.d/init.d/postfix restart.)
There is a note in main.cf that if an external program
(Maildrop in this case) is used for local delivery, an alias must be
created to deliver mail addressed to root to some other account:
# IF YOU USE THIS TO DELIVER MAIL SYSTEM-WIDE, YOU MUST SET UP AN
# ALIAS THAT FORWARDS MAIL FOR ROOT TO A REAL USER.
Postfix Aliases
The active
lines in main.cf for aliases are:
alias_maps = hash:/etc/aliases
alias_database = hash:/etc/aliases
The relevant notes there are:
# ALIAS DATABASE
#
# The alias_maps parameter specifies the list of alias databases used
# by the local delivery agent. The default list is system dependent.
#
# On systems with NIS, the default is to search the local alias
# database, then the NIS alias database. See aliases(5) for syntax
# details.
#
# If you change the alias database, run "postalias /etc/aliases" (or
# wherever your system stores the mail alias file), or simply run
# "newaliases" to build the necessary DBM or DB file.
#
# It will take a minute or so before changes become visible. Use
# "postfix reload" to eliminate the delay.
#
#alias_maps = dbm:/etc/aliases
alias_maps = hash:/etc/aliases
#alias_maps = hash:/etc/aliases, nis:mail.aliases
#alias_maps = netinfo:/aliases
# The alias_database parameter specifies the alias database(s) that
# are built with "newaliases" or "sendmail -bi". This is a separate
# configuration parameter, because alias_maps (see above) may specify
# tables that are not necessarily all under control by Postfix.
#
#alias_database = dbm:/etc/aliases
#alias_database = dbm:/etc/mail/aliases
alias_database = hash:/etc/aliases
#alias_database = hash:/etc/aliases, hash:/opt/majordomo/aliases
At the end of
/etc/aliases is:
# Person who should get root's mail
#root: marc
I add a line below that:
root: robin
(** Later, for gair, change this to rw.)
and some notes to myself on what to do when changing aliases:
# After changing the aliases, run:
#
# newaliases
# postfix reload
The newaliases command
complains about any syntax errors.
This means I don't need
any Maildir format mailbox for root.
Now I can send a message
to user robin or root from another machine and see the message appear
as a file in /home/robin/Maildir/new/.
Some
other changes to do now or later:
I recall that Maildrop will write the
message to an mbox mailbox if there is no Maildir mailbox of the right
name. It will create that mbox file and keep adding to it, until
it reaches 50Mbytes. It is a good idea to check for any such mbox
files, such as if a mailfilter command writes messages to some mailbox
which doesn't exist as a Maildir - for instance due to some typo in the
.mailfilter file.
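One way to look for such accidental mbox files is sketched below. It assumes the mailfilter rules deliver to names of the form Maildir/.Something, so that a mistyped name shows up as a regular file with a leading dot in the top level of the Maildir, where only directories should have such names. The function name find_stray_mbox is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list regular files in the top level of a Maildir
# whose names begin with a dot. Real mailboxes are directories, so each
# file listed is probably an mbox file created by a mistyped name.
# $1 = path of the Maildir.
find_stray_mbox() {
    find "$1" -maxdepth 1 -type f -name '.*'
}

# Example: find_stray_mbox /home/robin/Maildir
```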
Also, I configure Maildrop (via each user's
.mailfilter file) to write to a log file: mailfilter-log.txt.
Maildrop will not write to that (or at least earlier versions
wouldn't) if it grew beyond 50Mbytes. So it is best to have some
kind of logrotate process to chop these files back to 0 bytes regularly.
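A simple alternative to a full logrotate configuration is a function like the one below, run from cron. This is a sketch only: the function name truncate_if_large and the size limit in the example are hypothetical, and the old log contents are discarded rather than archived.

```shell
#!/bin/sh
# Hypothetical helper: empty a log file once it exceeds a size limit.
# $1 = log file, $2 = limit in bytes.
truncate_if_large() {
    size=$(wc -c < "$1" | tr -d ' ')
    if [ "$size" -gt "$2" ]; then
        : > "$1"    # truncate in place, keeping owner and permissions
    fi
}

# Example: truncate_if_large /home/robin/mailfilter-log.txt 10000000
```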
It is highly desirable that whenever I make a new user account
that there will be a Maildir mailbox directory in their home directory.
So, in /etc/skel I give the command:
maildirmake ./Maildir
Please search this page for other mentions of /etc/skel - there are
other things which need to go in it to support Spamassassin.
Testing IMAPD and the Authentication Daemon
As noted above, Postfix and Maildrop are
working, receiving messages and delivering them to
/home/robin/Maildir/new/ .
The Authentication Daemon and the
IMAP Daemon are both running.
I set up a new account in
Thunderbird, on my Windows machine, for user robin on the computer
nair.firstpr.com.au. I don't use the IP address - I use
"nair.firstpr.com.au" because I recall some difficulties with this in
the past. So nair needs to be in my DNS.
I disable
"Check for new messages every 10 minutes".
When I delete a
message, I want it to be "Mark as deleted", not "Moved to the Deleted
folder". In Composition and Addressing I turn off "Compose
messages in HTML format". I disable the junk email stuff.
Then when I try to open the Inbox of this account, I am asked for my
password and Thunderbird lists my messages.
I rebooted the
machine and tested it all again.
Backup and Reporting Mailbox Size etc.
I run a nightly cron job to back up all my user directories and their mail to
another Linux machine on the LAN. When I was using Mbox
mailboxes, which could be altered at any time with incoming mail, I
found I could not simply tar-gzip the directories themselves, but had
to copy them to a temporary directory and tar-gzip them there.
Probably, with Maildir mailboxes, I don't need to do this, but it is
part of the backup script and I still use it.
As time goes on
and mail accumulates, mailboxes grow and this poses some problems:
- Storage limits on the server's hard drive.
- Time taken
to copy and tar-gzip for backup purposes.
- Total size of the
entire backup file.
Therefore, mailboxes need to be
emptied or deleted - but which ones to get rid of? In order to
decide, I wanted to know how much storage each mailbox was taking in
the tar-gzip file. Here is a shell script - my slight
modification of an excellent script by
Michael
Carmak on the Courier Users list on 23 September 2002.
Thanks Michael!
maildir-report-3.sh.txt
maildir-report-3.sh.txt.gz
An example of its output follows. It does not report
the final size of the entire backup file, since it does not create any
such file. It generates temporary tarballs of each mailbox, and
reports on the size of that, amongst other things. In the final
backup file, the size would be different, but not by much. So
with this, I can see which mailboxes contribute most to the backup file
size.
This script reports on one user only, and it
reports on all the mailboxes except for the Inbox. To do that, I
would need to modify it to look into
/cur as well. But
I have a pretty good idea of what is in the Inbox since I am using it
all the time. What I want to know about is the mailing list
mailboxes, the virus and spam pits, and also any mailboxes I had
forgotten about which are growing fat, such as ones to keep messages
prior to filtering, the Drafts mailbox etc. A sample output is:
Mailbox:                  .0-Inbox-Old.Inbox-02-03
Number of files:          1447
Total size of all files:  21.0592 MB
Disk space used:          25M
Approx tar.gz size:       13M

Mailbox:                  .0-Inbox-Old.Post-Filter-Inbox
Number of files:          9888
Total size of all files:  294.157 MB
Disk space used:          326M
Approx tar.gz size:       107M

Mailbox:                  .0-SPAM-etc.0SPAM-HTML-refresh
Number of files:          7
Total size of all files:  56.7617 kB
Disk space used:          88k
Approx tar.gz size:       29k

Mailbox:                  .Drafts
Number of files:          1397
Total size of all files:  22.4293 MB
Disk space used:          26M
Approx tar.gz size:       6.1M

Mailbox:                  .Lists.Mail-Courier-users
Number of files:          5697
Total size of all files:  21.7214 MB
Disk space used:          30M
Approx tar.gz size:       3.8M

Mailbox:                  .Lists.Music-dsp
Number of files:          14324
Total size of all files:  43.9091 MB
Disk space used:          67M
Approx tar.gz size:       10M
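The approach described above (count the files in each mailbox, measure the disk space, and tar-gzip each mailbox to see how big it would be in the backup) can be sketched as follows. This is not Michael's script or my modification of it. It is a minimal illustration, and the function name maildir_report and the output layout are my own assumptions. Like the real script, it looks only at the dot-directories and so skips the Inbox.

```shell
#!/bin/sh
# Hypothetical sketch: report on each sub-mailbox (dot-directory) of
# one Maildir. Usage: maildir_report /home/robin/Maildir
maildir_report() {
    maildir=$1
    for box in "$maildir"/.[!.]*; do
        # Only directories with a cur/ subdirectory are mailboxes.
        [ -d "$box/cur" ] || continue
        name=$(basename "$box")
        count=$(( $(find "$box" -type f | wc -l) ))
        used=$(du -sh "$box" | cut -f1)
        # Compress to a pipe and count the bytes, so that no
        # temporary tarball needs to be written or cleaned up.
        tgz=$(( $(tar -czf - -C "$maildir" "$name" | wc -c) ))
        printf 'Mailbox:             %s\n' "$name"
        printf 'Number of files:     %d\n' "$count"
        printf 'Disk space used:     %s\n' "$used"
        printf 'Approx tar.gz size:  %d bytes\n\n' "$tgz"
    done
}
```

Piping tar into wc instead of writing temporary tarballs is a design choice of this sketch. It is slower to read than "ls -l" on a real file, but leaves nothing behind if the run is interrupted.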
Maildrop Mods - introducing the DELTAG feature
Please see a separate page
../Maildrop-mods-filtering where
I provide new versions of two files which are part of the maildrop code:
/maildir/maildircreate.c
/maildrop/maildir.C
In the same build tree which resulted from the initial compilation, I copy the
new files into these locations, go back to the main directory and run:
make
A bunch of new stuff is created, but AFAIK, only the
maildrop binary is affected. It can be found in:
/maildrop/maildrop
I get a bloated version: 764942 bytes. This is mainly
debugging information. So from that directory:
strip maildrop
Now it is 164948 bytes.
Rename
the original maildrop binary to something different:
/usr/local/bin/maildrop-orig
Copy the new version there:
/usr/local/bin/maildrop
and make sure the new version has the same permissions as the old.
For my system, that is:
User: root Group: mail
Permissions 755
and perform these tests to show it
is working.
Testing Maildrop's .mailfilter file, subjadd and the new DELTAG feature
Firstly, just send a message to an account
and see it gets delivered to the Inbox.
Now create a
.mailfilter file in the user's directory with the following contents:
# Sample Courier Maildrop file to demonstrate the DELTAG modifications
logfile "mailfilter-log.txt"
log "========"
if ( /^Subject: Test1/ )
{
    log "-------------------------------------------- Test1 found "
    to "Maildir/.blah"
}
if ( /^Subject: Test2/ )
{
    log "-------------------------------------------- Test2 found "
    cc "Maildir/.blah"
    to Maildir
}
if ( /^Subject: Test3/ )
{
    log "-------------------------------------------- Test3 found "
    cc "Maildir/.blah"
    DELTAG=1
    to Maildir
}
if ( /^Subject: Test4/ )
{
    log "-------------------------------------------- Test4 found "
    xfilter "subjadd [ABC]"
    to Maildir
}
log "- - - - - - - - - - - - - - - - - - - - - - No match"
# Deliver to the Inbox.
to "Maildir"
Don't forget the dot at
the start of ".mailfilter". The above text is available as a
file:
mailfilter-test-1.txt
See the Spamassassin section for additional things to add to this
file later.
The .mailfilter file must not be writable or even
readable by anyone but the user. Maildrop will refuse to use it
if it is otherwise.
The user and
group of this file should be that of the user whose account this is and
the permissions should be 600. Then create a mailbox
(using the email client) called "blah". This will appear as a
maildir with /cur, /new and /tmp at:
/home/robin/Maildir/.blah
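The permission requirement for .mailfilter described above can be applied and checked from a shell. A minimal sketch, in which the function name secure_mailfilter and the example path are my own assumptions:

```shell
#!/bin/sh
# Hypothetical helper: give a .mailfilter file the permissions Maildrop
# insists on, then confirm the result.
secure_mailfilter() {
    file=$1
    chmod 600 "$file" || return 1
    # The first ten characters of "ls -l" are the type and mode bits.
    # Anything other than "-rw-------" means the file is still
    # readable or writable by group or others.
    mode=$(ls -l "$file" | cut -c1-10)
    [ "$mode" = "-rw-------" ]
}

# Example (assumed path; the owner and group must also be the user's
# own, which root can set with chown):
# secure_mailfilter /home/robin/.mailfilter && echo "permissions OK"
```

Editing the file as root, or copying it from elsewhere, is an easy way to upset these permissions, so it is worth running a check like this after every change.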
Now send four messages, with subjects:
Test1
Test2
Test3
Test4
The "to" command does something with the
message and ends this run of Maildrop. "cc" writes a copy of the
message somewhere and keeps processing it.
The Test1 message
should be copied to the mailbox "blah" and not arrive in the
Inbox.
The Test2 one should be copied to "blah" and
arrive in the Inbox. Likewise Test3, except that it will arrive
in the Inbox tagged for deletion.
The Test4 message
should arrive in the Inbox, with a subject line:
[ABC]
Test4
All messages with subject lines not
starting with "Test1", "Test2", "Test3" or "Test4" should arrive
in the Inbox as they otherwise would without the .mailfilter file.
Appropriate entries will be made in mailfilter-log.txt. For
debugging email mysteries, this is handy to scrutinize in combination
with /var/log/maillog.
The xfilter command of Maildrop is
clearly very powerful. Anything at all could happen as a result
of whatever script, binary or whatever is called.
If the
command ("subjadd" in this case) can't be found, expect a syntax error
and for the message not to be delivered. Postfix will queue it
("defer") and try again from time-to-time. Once the syntax error
is fixed, giving the "postfix flush" command will cause the message to
be presented to maildrop again within a second or so.
If I
change the "subjadd" word above to something which Maildrop can't find,
such as "subjaddxx", then the resulting message in /var/log/maillog
looks like this:
Apr 28
20:25:32 nair postfix/local[29401]: 08A781755B5:
to=<user@example.org>, orig_to=<user@example.org>,
relay=local, delay=0.18, delays=0.08/0.03/0/0.07, dsn=4.3.0,
status=deferred (temporary failure. Command output: bash: subjaddxx:
command not found /usr/local/bin/maildrop: Unable to filter message. )
Always test the syntax of a newly created or modified .mailfilter
file by sending a message to that account. If it doesn't arrive
as it should, then look into /var/log/maillog for a report from
maildrop on what line the syntax problem was on. Postfix writes
this line.
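Finding the relevant lines in the mail log can be sketched like this. The function name show_deferred is hypothetical, and the sketch assumes the log lines contain "status=deferred" and the word "maildrop", as in the example shown earlier.

```shell
#!/bin/sh
# Hypothetical helper: show the five most recent deferred deliveries
# which mention maildrop in a Postfix log file.
show_deferred() {
    logfile=${1:-/var/log/maillog}
    grep 'status=deferred' "$logfile" | grep 'maildrop' | tail -n 5
}

# Example (reading the log normally needs root):
# show_deferred /var/log/maillog
# After fixing the .mailfilter file, ask Postfix to retry the queue:
# postfix flush
```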
It is very easy to forget brackets or to add extra
blank lines in bracketed things which must not have them.
Maildrop's language is necessarily fiddly, but I have found it
extremely useful since 2001 or so.
Spamassassin
In my final .mailfilter file, I
test the messages with Spamassassin first, and then with Anomy
Sanitizer.
Here I describe installing and configuring
Spamassassin (http://spamassassin.apache.org),
then likewise Anomy Sanitizer. Then I describe how I call them
from my .mailfilter file.
Update 2010-03-24: I just realised that I should run (as root) "sa-update"
periodically to (somehow, I haven't figured it out) update the
Spamassassin installation according to fixes which have been made since
this version was created. For instance, there was a problem in
2010 with all messages scoring extra points, as described here:
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6269 . Today after I ran "sa-update" the problem disappeared, and the text in the file:
/var/lib/spamassassin/3.002004/updates_spamassassin_org/72_active.cf
was:
##{ FH_DATE_PAST_20XX
header FH_DATE_PAST_20XX Date =~ /20[2-9][0-9]/ [if-unset: 2006]
describe FH_DATE_PAST_20XX The date is grossly in the future.
##} FH_DATE_PAST_20XX
with the original "20[1-9][0-9]" replaced with "20[2-9][0-9]".
So this will only complain about dates 2020 and beyond - good for another decade!
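The effect of that one-character change can be checked with grep. This is only an illustration: Spamassassin uses Perl regular expressions, not grep, but for a pattern this simple the two should behave the same. The function name date_rule_hits and the sample Date text are my own.

```shell
#!/bin/sh
# Return success if the given Date header text matches the given
# year pattern (treated as an extended regular expression).
date_rule_hits() {
    pattern=$1
    date_text=$2
    printf '%s\n' "$date_text" | grep -Eq "$pattern"
}

old='20[1-9][0-9]'   # pattern before sa-update
new='20[2-9][0-9]'   # pattern after sa-update

for d in 'Wed, 24 Mar 2010 10:00:00 +1100' 'Tue, 24 Mar 2020 10:00:00 +1100'; do
    for p in "$old" "$new"; do
        if date_rule_hits "$p" "$d"; then r=hit; else r=miss; fi
        echo "$p on '$d': $r"
    done
done
```

The old pattern hits both dates, which is why every message dated 2010 scored extra points. The new pattern hits only the 2020 date.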
I selected Spamassassin to be
installed as part of the original CentOS 5.1 installation, which is
documented at another page via
../. "yum
--list" tells me I have Spamassassin 3.1.9-1 installed. However,
3.1.9 is from 2007-06-11. The latest version is 3.2.4 from
2008-01-05. Its install file
http://svn.apache.org/repos/asf/spamassassin/branches/3.2/INSTALL
states that Perl 5.6.1 or later is required. I see from the
directory this CentOS machine has: /usr/lib/perl5/5.8.8 that I have
Perl 5.8.8.
I decided to get rid of 3.1.9-1 and install the
latest version myself.
yum remove spamassassin
OK. I will install the new version independently of the yum and
RPM system. The Download link from the home page leads me to
a page where there is a note:
Please create a local copy of the
report_template text in a file named something like
/etc/mail/spamassassin/10_local_report.cf, and modify it to provide
your tech support desk's contact information, instead of the default.
Otherwise your users will be confused, and some may ultimately contact
the SpamAssassin development team, which is not appreciated;
I will do this later . . . but see below, it seems to happen as part of
the installation process.
The easiest way to get the latest
version seems to be via CPAN, using the cpan program. I actually
did this after using CPAN for the first time to download a module I
needed for Anomy Sanitizer. See the Anomy Sanitizer section below
for my first use of cpan. As instructed in the Download page, I give
the command:
cpan
Mail::SpamAssassin
This results in a bunch of
activity concerning J/JM/JMASON/Mail-SpamAssassin-3.2.4.tar.gz .
I give an email address for contact regarding spam problems.
A bunch of stuff happens which I don't understand. Considering
how much intelligent-looking stuff is happening and how few characters
I typed, I would say this is the easy way to install Spamassassin.
This takes at least an hour on my 824MHz PIII Celeron.
Some of the text at the end of this process includes:
Installing /usr/lib/perl5/site_perl/5.8.8/spamassassin-run.pod
Installing /usr/lib/perl5/site_perl/5.8.8/Mail/SpamAssassin.pm
Installing /usr/lib/perl5/site_perl/5.8.8/Mail/SpamAssassin/DBBasedAddrList.pm
Installing /usr/lib/perl5/site_perl/5.8.8/Mail/SpamAssassin/SQLBasedAddrList.pm
Installing /usr/lib/perl5/site_perl/5.8.8/Mail/SpamAssassin/Bayes.pm
...
Installing /usr/share/man/man1/sa-update.1
Installing /usr/share/man/man1/spamc.1
Installing /usr/share/man/man1/spamassassin.1
Installing /usr/share/man/man1/spamassassin-run.1
Installing /usr/share/man/man1/spamd.1
Installing /usr/share/man/man1/sa-compile.1
Installing /usr/share/man/man1/sa-learn.1
Installing /usr/share/man/man3/Mail::SpamAssassin::Plugin::Shortcircuit.3pm
Installing /usr/share/man/man3/Mail::SpamAssassin::Timeout.3pm
Installing /usr/share/man/man3/Mail::SpamAssassin::Plugin::AWL.3pm
Installing /usr/share/man/man3/Mail::SpamAssassin::Plugin::AutoLearnThreshold.3pm
...
Installing /usr/bin/sa-compile
Installing /usr/bin/sa-update
Installing /usr/bin/spamc
Installing /usr/bin/spamd
Installing /usr/bin/spamassassin
Installing /usr/bin/sa-learn
Writing /usr/lib/perl5/site_perl/5.8.8/i386-linux-thread-multi/auto/Mail/SpamAssassin/.packlist
Appending installation info to /usr/lib/perl5/5.8.8/i386-linux-thread-multi/perllocal.pod
/usr/bin/perl "-MExtUtils::Command" -e mkpath /etc/mail/spamassassin
...
/usr/bin/perl -MFile::Copy -e "copy(q{rules/local.cf}, q{/etc/mail/spamassassin/local.cf}) unless -f q{/etc/mail/spamassassin/local.cf}"
/usr/bin/perl -MFile::Copy -e "copy(q{rules/init.pre}, q{/etc/mail/spamassassin/init.pre}) unless -f q{/etc/mail/spamassassin/init.pre}"
/usr/bin/perl -MFile::Copy -e "copy(q{rules/v310.pre}, q{/etc/mail/spamassassin/v310.pre}) unless -f q{/etc/mail/spamassassin/v310.pre}"
/usr/bin/perl -MFile::Copy -e "copy(q{rules/v312.pre}, q{/etc/mail/spamassassin/v312.pre}) unless -f q{/etc/mail/spamassassin/v312.pre}"
/usr/bin/perl -MFile::Copy -e "copy(q{rules/v320.pre}, q{/etc/mail/spamassassin/v320.pre}) unless -f q{/etc/mail/spamassassin/v320.pre}"
/usr/bin/perl "-MExtUtils::Command" -e mkpath /usr/share/spamassassin
/usr/bin/perl -e "map unlink, </usr/share/spamassassin/*>"
...
/usr/bin/perl build/preprocessor -Mvars -DVERSION="3.002004"
-DPREFIX="/usr" -DDEF_RULES_DIR="/usr/share/spamassassin"
-DLOCAL_RULES_DIR="/etc/mail/spamassassin"
-DLOCAL_STATE_DIR="/var/lib/spamassassin"
-DINSTALLSITELIB="/usr/lib/perl5/site_perl/5.8.8"
-DCONTACT_ADDRESS="rw@firstpr.com.au" -m644 -Irules
-O/usr/share/spamassassin
10_default_prefs.cf
20_advance_fee.cf
20_body_tests.cf
20_compensate.cf
20_dnsbl_tests.cf
20_drugs.cf
20_dynrdns.cf
20_fake_helo_tests.cf
20_head_tests.cf
20_html_tests.cf
20_imageinfo.cf
20_meta_tests.cf
20_net_tests.cf
20_phrases.cf
20_porn.cf
20_ratware.cf
20_uri_tests.cf
20_vbounce.cf
23_bayes.cf
25_accessdb.cf
25_antivirus.cf
25_asn.cf
25_dcc.cf
25_dkim.cf
25_domainkeys.cf
25_hashcash.cf
25_pyzor.cf
25_razor2.cf
25_replace.cf
25_spf.cf
25_textcat.cf
25_uribl.cf
30_text_de.cf
30_text_fr.cf
30_text_it.cf
30_text_nl.cf
30_text_pl.cf
30_text_pt_br.cf
50_scores.cf
60_awl.cf
60_shortcircuit.cf
60_whitelist.cf
60_whitelist_dk.cf
60_whitelist_dkim.cf
60_whitelist_spf.cf
60_whitelist_subject.cf
72_active.cf user_prefs.template
languages sa-update-pubkey.txt
chmod 755 /usr/share/spamassassin
These .cf files are config files for the various
tests within Spamassassin, and they are written to
/usr/share/spamassassin/. A note in each says they should not be
changed manually, and to use this command for help regarding
configuration:
perldoc Mail::SpamAssassin::Conf
Better to use the HTML version of this extensive
documentation:
The complete documentation is here:
23_bayes.cf is interesting. This is where the raw output
from the Bayesian classifier (which compares the whole message with a bunch of
stuff generated from two sets of messages, one known to be spam and the
other not) is converted into some more usable things.
I
did not actually have a local.cf file copied to my
/etc/mail/spamassassin directory, since I had already written a file of
this name there, generated as noted below from a web page. I see
most of the stuff was installed in
/usr/lib/perl5/site_perl/5.8.8/Mail/SpamAssassin.
I generate a config file with the web page:
http://www.yrex.com/spam/spamconfig.php
My notes on how I did this are:
Low Threshold (5.0, default)
Don't Rewrite Subjects (default)
Don't Use Attachments (0)
Use Bayes System (default)
Use Auto Learning (default)
Enable RBL Checks (default)
Use Network Checksum Tests? Choose whether to use these services that compare message checksums to known spam: Vipul's Razor 2.x, DCC, and Pyzor. These will only work when the client software for each service is installed.
Use Razor 2 if available
Use DCC if available
Use Pyzor if available
SpamAssassin 3.1 Note: Due to licensing issues, Razor2 and DCC are not enabled by default in version 3.1. Your administrator must enable their plugins in /etc/mail/spamassassin/v310.pre or the setting above will be ignored.
Use Language Testing: Analyzes body text rather than simply checking the character set. This is more effective, but slows down SpamAssassin slightly. If this option is disabled, only the boldface languages above will be detected. (ok_languages)
SpamAssassin 3.1 Note: Language checking has been moved to a plugin in version 3.1. This setting will not work unless your administrator has enabled the TextCat plugin in /etc/mail/spamassassin/v310.pre.
and the resulting config file is:
# SpamAssassin config file for version 3.x
# NOTE: NOT COMPATIBLE WITH VERSIONS 2.5 or 2.6
# See http://www.yrex.com/spam/spamconfig25.php for earlier versions
# Generated by http://www.yrex.com/spam/spamconfig.php (version 1.50)

# How many hits before a message is considered spam.
required_score           5.0

# Encapsulate spam in an attachment (0=no, 1=yes, 2=safe)
report_safe              0

# Enable the Bayes system
use_bayes                1

# Enable Bayes auto-learning
bayes_auto_learn         1

# Enable or disable network checks
skip_rbl_checks          0
use_razor2               1
use_dcc                  1
use_pyzor                1

# Mail using languages used in these country codes will not be marked
# as being possibly spam in a foreign language.
ok_languages             all

# Mail using locales used in these country codes will not be marked
# as being possibly spam in a foreign language.
ok_locales               all
According to this page: "The usual
location for this file is /etc/mail/spamassassin/local.cf for a
system-wide configuration." so I wrote it there.
There
are 746 tests listed at:
http://spamassassin.apache.org/tests_3_2_x.html
.
According to the Top-Level README file:
http://svn.apache.org/repos/asf/spamassassin/branches/3.2/README
:
The distribution provides "spamassassin", a command line tool to
perform filtering, along with the "Mail::SpamAssassin" module set
which allows SpamAssassin to be used in spam-protection proxy SMTP or
POP/IMAP server, or a variety of different spam-blocking scenarios.
In addition, "spamd", a daemonized version of SpamAssassin which
runs persistently, is available. Using its counterpart, "spamc",
a lightweight client written in C, an MTA can process large volumes of
mail through SpamAssassin without having to fork/exec a perl interpreter
for each message.
In the past, from Maildrop's .mailfilter file, I called
the /usr/bin/spamassassin program, which is a Perl script. I
will keep doing this, since this is not a high-volume mailserver.
Back to the doco
http://svn.apache.org/repos/asf/spamassassin/branches/3.2/INSTALL
I see I need to edit /etc/mail/spamassassin/v310.pre if I am not using
Razor2 (I am not using it), but I don't see which line in v310.pre is
its loadplugin line to comment out.
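One way to hunt for the line is to list every loadplugin line in the file, whether commented out or not. A minimal sketch; the function name list_plugins is my own, and I am assuming that the Razor2 plugin's line mentions "Razor2" somewhere.

```shell
#!/bin/sh
# Hypothetical helper: list every loadplugin line in a Spamassassin
# .pre file, with line numbers, whether commented out or not.
list_plugins() {
    grep -n 'loadplugin' "$1"
}

# Examples (path from the install log above):
# list_plugins /etc/mail/spamassassin/v310.pre
# list_plugins /etc/mail/spamassassin/v310.pre | grep -i razor
```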
How do I get Spamassassin
to use its Bayesian learning system, for each user, for two mailboxes
full of spam and non-spam respectively? Before trying this, I
will integrate Spamassassin into my test .mailfilter file described
above. I add this before the line 'log "- - - - - - - - - - - - - - - - - - - - - - No match"':
LMB="Maildir"
xfilter "/usr/bin/spamassassin -x"
# Watch out for header line added by
# Spamassassin.
# Don't allow any blank lines after the
# if statement!
if ( /^X-Spam-Flag: YES/ )
{
    log "----------------------------------- Spam general. "
    # Make this "cc" for copy or "to" to not
    # send it to Inbox.
    cc "Maildir/.-SPAM"
    DELTAG=1
    xfilter "subjadd ~~~[SPAM]"
    to "$LMB"
}
# The Anomy Sanitizer stuff goes in here
# when we are ready to test it too.
The resulting file is
available here as
mailfilter-test-2.txt
.
I send the account a test email to ensure it gets through
OK. If I have not specified the location of spamassassin
correctly, or if it won't run, then an ordinary message will never get
to the Inbox.
It doesn't . . . and /var/log/maillog has
something from Maildrop (reported via Postfix, which called Maildrop):
/usr/local/bin/maildrop: Cannot
have world/group permissions on the filter file - for your own good.
Hmmm - I had upset the permissions of the .mailfilter file when I added
this text. It must have permissions 600 and have the user and
group of the user whose account this is. Now it works fine for an
ordinary message. To test Spamassassin and this rudimentary spam
filtering arrangement, I use Thunderbird to make a mailbox called
"-SPAM" and then send a message to this account with the following
subject and body text:
However, beware of the
Bayesian learning stuff - I don't want it learning from this test
message what is spam and what is not. So, I meant to turn off
Bayesian learning for the time being in the /etc/mail/spamassassin/local.cf
file. (I actually turned off the use of Bayes to analyze each
message - "use_bayes 0".)
Subject: BUY TEST SPAM NIGERIA VIAGRA SEX FREE
NIGERIA VIAGRA SEX ENLAGEMENT FREE REMOVE SALE PENIS $$$ BUY GUARANTEE
MASS EMAIL OPPORTUNITY HERBAL DIPLOMA WORK AT HOME STOCK AUCTION
Hmm - nothing arrives in any mailbox. "top" indicates that
spamassassin is chewing 20% of CPU cycles and 8% of RAM (256M x 0.08 =
20MBytes).
Beginning of
fuss . . . Scroll down to the next heading to find the
solution.
This
goes on for some minutes, but the RAM usage drops to 3.5%. Then it goes
to 71% of CPU cycles! I would have thought that Postfix or
Maildrop would have timed out by now . . . The system is sluggish
. . . and so far I can't see a maillog message about this message.
I send a benign message, and it doesn't arrive either.
Minutes later the spamassassin process is down to 0.9% of CPU and
0.1% of RAM. But IMAP access is slow and I decide to shut the
machine down and reboot. I found this in maillog:
Apr 29 17:00:40 nair
postfix/local[4580]: 738BF175305: to=<blah@example.org>,
relay=local, delay=386, delays=0.89/0.27/0/385, dsn=4.3.0,
status=deferred (temporary failure. Command output: [4617] warn:
config: path "/usr/share/spamassassin/languages" is inaccessible:
Permission denied [4617] warn: config: path
"/usr/share/spamassassin/languages" is inaccessible: Permission denied
[4617] warn: config: path "/usr/share/spamassassin/languages" is
inaccessible: Permission denied maildrop: Timeout quota exceeded.
[4617] warn: spamassassin: killed by SIGPIPE )
There is no subdirectory "languages"
in /usr/share/spamassassin/. This wasn't a problem until I sent
the spammy test message. I sent another normal message after
rebooting, and the same thing happened - so it was not the spammy
message.
Google only finds 12 mentions of
"/usr/share/spamassassin/languages" and none relate to an error message
like this.
A spamassassin process was behaving the same way,
according to "top". I used "kill" to kill it, via its process
number. However within a second or two another spamassassin
process was running. Thinking of the Sorcerer's Apprentice now .
. .
Was it my change to /etc/mail/spamassassin/local.cf:
"use_bayes 0"? I changed it back to 1, killed the spamassassin process, did
"postfix flush" and . . . I became confused and perplexed by lack of
messages in the mailboxes and lack of informative messages in
/var/log/maillog or mailfilter-log.txt. I rebooted the machine again.
The monitor (previously doing X Windows) had no signal, the hard
drives were still being used . . as before. I hit the reset
button and rebooted the machine.
Initially there is no sign
of spamassassin or another sus-looking process "setroubleshootd", which
after this fuss started was hogging about half of the CPU cycles.
I think this is a nasty interaction between a badly configured
spamassassin and SELinux. Googling "setroubleshootd" turns up
only 2350 pages, one of which is:
http://danwalsh.livejournal.com/7995.html
from 2006. So this is a pretty obscure daemon. This page caused
me to look at /var/log/audit/audit.log where I found this daemon had
been adding a verbose entry to the file every 12msec while this
spamassassin trouble was occurring. There were 17M of logs, neatly
bundled into multiple files of 5MB each.
The package doing
this is "setroubleshoot" - I removed it. (Version .noarch
0:1.8.11-4.el5.) I don't have time for this stuff.
"postfix flush" didn't lead to any messages appearing in mailboxes - I
don't know what happened to them.
I sent an ordinary message
and the process repeated, with masses of stuff being written
to /var/log/audit/audit.log - 100KBytes a second!
spamassassin was hogging the CPU.
Solution to fuss: disable SELinux
While this was occurring, using X Windows, I got to System >
Administration > Security Level and Firewall > SELinux and
changed it from "Enforcing" to "Disabled". The other option was
"Permissive". The writing to /var/log/audit/audit.log
immediately stopped, and spamassassin disappeared from "top".
What's more, the ordinary message was delivered to the Inbox.
In its headers only one appeared to be from spamassassin:
X-Spam-Checker-Version: SpamAssassin 3.2.4 (2008-01-01) on nair.firstpr.com.au
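The SELinux mode which I changed through the GUI can also be inspected from a shell. As far as I know, on CentOS the setting is stored as a SELINUX= line in /etc/selinux/config. The function name selinux_configured_mode below is my own, and the sketch only reads the file.

```shell
#!/bin/sh
# Hypothetical helper: print the SELINUX= setting from an SELinux
# config file (assumed to be /etc/selinux/config by default).
selinux_configured_mode() {
    conf=${1:-/etc/selinux/config}
    # Comment lines and the SELINUXTYPE= line do not match, because
    # the pattern requires "=" directly after "SELINUX" at line start.
    sed -n 's/^SELINUX=\([a-z]*\).*/\1/p' "$conf" | tail -n 1
}

# Related commands (they need the SELinux tools; setenforce needs root):
# getenforce       # prints Enforcing, Permissive or Disabled
# setenforce 0     # switch to Permissive until the next reboot
```

Permissive mode might have been enough to let spamassassin run while still logging what SELinux objected to, but I did not test that.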
I sent the spammy message again. spamassassin appeared
briefly in "top" (with 65% CPU . . .) and then was gone. The
message appeared in the Inbox, and I found this in its headers:
X-Spam-Checker-Version: SpamAssassin 3.2.4 (2008-01-01) on nair.firstpr.com.au
X-Spam-Level: ***
X-Spam-Status: No, score=3.5 required=5.0 tests=ALL_TRUSTED,DRUGS_ERECTILE,
DRUG_ED_CAPS,SUBJ_ALL_CAPS,SUBJ_BUY autolearn=no version=3.2.4
I don't understand how spamassassin scored 3.5 for this when a
few years ago (2003) it scored it as 10. Perhaps because its
sender address now is my address.
I need a spammy message, so
I grab one from the pit and find the original version of it, in my
pre-SA-Anomy mailbox. I edit it as new and address it to the test
account. This time it worked fine. The score was above 5,
so there was a header: "X-Spam-Flag: YES" and this was detected by my
.mailfilter logic, resulting in the message being turfed into the spam
pit and a copy of it sent to the Inbox, with something extra in the
subject line - and tagged for deletion. The message is like this,
with some things turned into xxx. The -6.6 AWL score is due to it being
sent with a from address which is mine, rather than the original spam
address.
Return-Path: <xx@xxxxxx.xxx.xx>
X-Spam-Flag: YES
X-Spam-Checker-Version: SpamAssassin 3.2.4 (2008-01-01) on xxxx.xxxxxxx.xxx.xx
X-Spam-Level: **********
X-Spam-Status: Yes, score=10.1 required=5.0 tests=ALL_TRUSTED,AWL,FS_REPLICA,
    REPLICA_WATCH,URIBL_AB_SURBL,URIBL_BLACK,URIBL_JP_SURBL,URIBL_OB_SURBL,
    URIBL_SBL,URIBL_SC_SURBL autolearn=spam version=3.2.4
X-Spam-Report:
    * -1.4 ALL_TRUSTED Passed through trusted hosts only via SMTP
    * 1.2 FS_REPLICA Subject says "replica"
    * 3.4 REPLICA_WATCH BODY: Message talks about a replica watch
    * 2.0 URIBL_BLACK Contains an URL listed in the URIBL blacklist
    * [URIs: reppsrapill.com]
    * 1.6 URIBL_AB_SURBL Contains an URL listed in the AB SURBL blocklist
    * [URIs: reppsrapill.com]
    * 2.9 URIBL_JP_SURBL Contains an URL listed in the JP SURBL blocklist
    * [URIs: reppsrapill.com]
    * 2.1 URIBL_OB_SURBL Contains an URL listed in the OB SURBL blocklist
    * [URIs: reppsrapill.com]
    * 2.5 URIBL_SC_SURBL Contains an URL listed in the SC SURBL blocklist
    * [URIs: reppsrapill.com]
    * 2.5 URIBL_SBL Contains an URL listed in the SBL blocklist
    * [URIs: reppsrapill.com]
    * -6.6 AWL AWL: From: address is in the auto white-list
X-Original-To: xxxxx@xxxx.xxxxxxx.xxx.xx
Delivered-To: xxxxx@xxxx.xxxxxxx.xxx.xx
Received: from gair.firstpr.com.au (gair.firstpr.com.au [10.0.0.1])
    by xxxx.xxxxxxx.xxx.xx (Postfix) with ESMTP id 59B46175579
    for <xxxxx@xxxx.xxxxxxx.xxx.xx>; Tue, 29 Apr 2008 19:05:55 +1000 (EST)
Received: from [10.0.0.6] (unknown [10.0.0.6])
    by gair.firstpr.com.au (Postfix) with ESMTP id E259459DA1;
    Tue, 29 Apr 2008 19:05:54 +1000 (EST)
Message-ID: <4816E500.8020706@firstpr.com.au>
Date: Tue, 29 Apr 2008 19:06:08 +1000
From: Robin Whittle <rw@firstpr.com.au>
Organization: First Principles
User-Agent: Thunderbird 2.0.0.12 (Windows/20080213)
MIME-Version: 1.0
To: xxxxx@xxxx.xxxxxxx.xxx.xx
Subject: Replica Rolex Swiss Watches
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
X-Antivirus: avast! (VPS 080429-0, 29/04/2008), Inbound message
X-Antivirus-Status: Clean

Buy the Patek Philippe watch and know everything about time!
What to look for when purchasing a replica watch
Startling Brietling watches at Replica Classics
http://reppsrapill.com/
While this is not my final arrangement for detecting
spam, it shows that Spamassassin is working.
I don't see
where Bayes was used. Bayes autolearn was on, which is bad - I
had meant to turn it off. I don't want it learning from these
test messages. Perhaps it was not used because it hasn't had a chance
to learn enough yet.
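To see how the individual tests add up to the total, the scores in an X-Spam-Report header like the one above can be summed with awk. This is a sketch with a hypothetical function name, sum_spam_report. It assumes each score line starts with "*" followed by the number, and it skips continuation lines such as the "[URIs: ...]" ones.

```shell
#!/bin/sh
# Hypothetical helper: add up the per-test scores in an X-Spam-Report
# header read from standard input. Score lines look like
# "* 3.4 REPLICA_WATCH ..."; other lines are ignored.
sum_spam_report() {
    awk '$1 == "*" && $2 ~ /^-?[0-9]+(\.[0-9]+)?$/ { total += $2 }
         END { printf "%.1f\n", total }'
}

# Example (assumed file name holding one saved message):
# sum_spam_report < message.txt
```

Adding the ten listed scores by hand gives 10.2, against the 10.1 in the X-Spam-Status header. I assume the difference is because each listed score has been rounded to one decimal place.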
A better filtering arrangement - for Spam-marginal
In my previous (2003 to April 2008) Spamassassin arrangement, I left
Spamassassin's threshold at 5.0 - which controls whether it adds the
"X-Spam-Flag: YES" header or not - and used separate tests for the
actual spam score (previously known as "hits").
Anything
below 3.0 went to my Inbox as usual. Anything scoring between 3.0
and 7.99 went to the -SPAM-marginal mailbox, and not to the Inbox.
Anything scoring 8.0 or above went to the -SPAM mailbox and not to the Inbox.
Here is how I do this in the new system, but at present I don't
know enough about the behaviour of the scoring system to decide what
threshold I want to use. This is a complete .mailfilter file:
# Sample Courier Maildrop file to demonstrate filtering messages
# according to how Spamassassin scores them.
logfile "mailfilter-log.txt"
log "========"
if ( /^Subject: Test1/ )
{
    log "-------------------------------------------- Test1 found "
    to "Maildir/.blah"
}
xfilter "/usr/bin/spamassassin -x"
# Look out for marginal scoring spam &
# put it somewhere I can scrutinise it
# easily, away from the worst stuff.
# If the threshold of 3.0 is too low
# there will be many false-positives in
# -SPAM-marginal, and I will need to
# fish them out manually.
if ( ( /X-Spam-Status: No, score=3./ ) \
   ||( /X-Spam-Status: No, score=4./ ) \
   ||( /X-Spam-Status: Yes, score=5./ ) \
   )
{
    log "-------------------------------- Spam marginal. "
    to "Maildir/.-SPAM-marginal"
}
# Watch out for header line added by
# Spamassassin, which it will be for
# anything scoring 5.0 or above.
#
# Don't allow any blank lines after the
# if statement!
if ( /^X-Spam-Flag: YES/ )
{
    log "----------------------------------- Spam general. "
    to "Maildir/.-SPAM"
}
# The Anomy Sanitizer stuff goes in here
# when we are ready to test it too.
log "- - - - - - - - - - - - - - - - - - - - - - No match"
# Deliver to the Inbox.
to "Maildir"
I made a mailbox -SPAM-marginal
and tested the system with various messages. The above is
mailfilter-test-3.txt and is a
complete little .mailfilter file which could be adapted for spam
detection, mailing list filtering etc. after some reading of the
documentation for the Maildrop filtering language.
In this
arrangement, scores in these ranges result in:
Below 3.0:       Inbox
3.0 to 5.999:    -SPAM-marginal
6.00 and above:  -SPAM
Although Spamassassin sets the
"X-Spam-Flag: YES" for scores of 5.0 or above, in the above
arrangement, messages scoring 5.0 to 5.999 never get to the second test,
because they are caught and sent to -SPAM-marginal instead.
So as long as Spamassassin's threshold is within the range of what is
considered marginal spam, then the range of scores for -SPAM-marginal
can be fine-tuned by adding, deleting (or commenting out) and changing
the contents of lines such as:
||( /X-Spam-Status: Yes, score=4./ ) \
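The decision the filter makes for each score range can be tried out away from Maildrop with a small shell sketch. The function name classify_status is hypothetical, and as a simplification it uses "X-Spam-Status: Yes" as a stand-in for the "X-Spam-Flag: YES" header. I have escaped the dot after the digit here. In a grep regular expression an unescaped "." matches any character, so "score=5." would also match a score such as 50.2, and I believe the same applies to Maildrop's patterns.

```shell
#!/bin/sh
# Hypothetical helper: read one X-Spam-Status header line and print
# the mailbox which the arrangement above would choose for it.
classify_status() {
    status=$1
    if printf '%s\n' "$status" |
        grep -Eq 'X-Spam-Status: (No, score=[34]\.|Yes, score=5\.)'; then
        echo "-SPAM-marginal"
    elif printf '%s\n' "$status" | grep -q 'X-Spam-Status: Yes,'; then
        echo "-SPAM"
    else
        echo "Inbox"
    fi
}

# Example:
# classify_status 'X-Spam-Status: Yes, score=10.1 required=5.0'
```

Widening or narrowing the marginal range is then a matter of changing the digits in the bracket, which corresponds to adding or removing lines in the .mailfilter test.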
I sent the two spammy test messages mentioned above,
and an ordinary message, and the behavior was as expected.
Spamassassin takes a few seconds to crunch each message, including
simple messages which are not very spammy.
In a section below
I explain why and how I configured Spamassassin to give more weight to
Bayesian scores. Before that, the examples I give use the normal
scoring arrangement.
Training the Bayesian function of Spamassassin - per user
In order to improve Spamassassin's
assessments of messages, I want to give it two thousand or so non-spam
messages, and likewise known spam messages, to train its Bayesian
function. The doco is here:
Last time I tried this in 2003, I found that in order to get
sa-learn to work I had to create this directory and empty file in the
user account:
~/.spamassassin/user_prefs
This needs to go in
/etc/skel/ too.
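A minimal sketch of creating that directory and empty file follows. The function name ensure_sa_prefs is hypothetical; it only wraps mkdir and touch, and takes the base directory as its argument so the same function can be pointed at a home directory or at /etc/skel.

```shell
#!/bin/bash
# Hypothetical helper: make sure DIR/.spamassassin/user_prefs exists.
# DIR would be a home directory such as /home/rw, or /etc/skel.
ensure_sa_prefs() {
    local base_dir="$1"
    mkdir -p "$base_dir/.spamassassin" || return 1
    # touch creates an empty file, or leaves the contents of an
    # existing one alone.
    touch "$base_dir/.spamassassin/user_prefs"
}
```

As the user, this would be run as: ensure_sa_prefs "$HOME" and, as root, as: ensure_sa_prefs /etc/skel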
I prepared two mailboxes of messages: SA-OK and SA-Spam. I did this
with Thunderbird, in my regular email account rw@firstpr.com.au on the
old version of gair, which is still running while I build the new one,
currently called nair. I then copied the contents of these mailboxes
to mailboxes of the same name in Thunderbird's account for rw on nair.
By the way, a handy method of getting a bunch of messages in a Maildir
into a single text file is to store them to one of Thunderbird's local
mailboxes. There they are stored in mbox format, all in one file, with
any body line which starts with "From " changed to ">From ".
In SA-OK I
got 1675 ordinary messages of my own, carefully checked not to be spam,
as described below. ("Several thousand" is the recommended number
in sa-learn.html.)
In SA-Spam, I put 1768 spams
(7.8MBytes). I got these by manually picking them from my post
filter mailbox - a copy of all messages (in this case, in the last
30 days) not identified as mailing list messages and which were not
classed as spam - and therefore which were sent to the Inbox.
This included dozens a day which were spam, but I had manually
deleted them from the Inbox as they turned up. So these are
generally the spams which scored rather low in the old system. In
the past month or so, with the old system (Spamassassin 2.61 from
2003), I was getting approximately this number of spams a day:
Score
below 3.0         ~30 a day    ->  Inbox
3.0 to 7.99       ~528 a day   ->  -SPAM-marginal
8.0 and above     ~522 a day   ->  -SPAM
This is pretty impressive - a 5 year old (hand-tweaked, Bayesian
trained) Spamassassin setup was still catching 97% of spam, with very
few false positives (threshold 3.0). I think I was only getting a
false positive every month or two, but it is possible I missed some.
Ideally, I think, when the new system is running, only one or two
spams will get through, and the false positive rate will still be as
low as one a month or so. Ideally, I will tune the -SPAM-marginal
score range so about 10 to 20% of spam goes there, which will hopefully
still contain all the false-positives, and so be easier and more
reliable to sort through manually.
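The 97% figure follows from the daily counts above: of roughly 1080 spams a day, all but about 30 were kept out of the Inbox. A small sketch of the arithmetic, with catch_rate as a made-up name and awk assumed for the division:

```shell
#!/bin/bash
# Illustrative arithmetic only; catch_rate is a made-up name.
# Arguments: spams per day reaching the Inbox, -SPAM-marginal and -SPAM.
catch_rate() {
    awk -v missed="$1" -v marginal="$2" -v spam="$3" 'BEGIN {
        total = missed + marginal + spam
        # Percentage of all spam which was kept out of the Inbox.
        printf "%.1f\n", 100 * (marginal + spam) / total
    }'
}
```

Running "catch_rate 30 528 522" prints 97.2.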
This level of spam is
mainly addressed (outer envelope) to my main rw@firstpr.com.au
address and others at firstpr.com.au, such as root, postmaster and some
others I alias to rw. Spam addressed to my other domains should
generally be caught by separate mail filters, since I don't have any
active email address in those domains. For instance, I get about
40 a day for some address at astroneu.com. However, I need to
look more closely at how I handle, with Postfix's aliases file,
messages addressed (in the headers or in the SMTP delivery
"envelope") to addresses other than my main email address.
I
deliberately excluded some spams from SA-Spam - those which closely
resemble legitimate emails. For instance I got a bunch of
phishing spams which look very much like emails from Google Adwords.
But does Google Adwords send me emails? Yes - they
sometimes do. I also tried to get rid of backscatter bounce
emails, notification of messages not delivered etc. Maybe I
should think of them as spam too. The trouble is, if Spamassassin
did catch them, then I wouldn't get bounces for messages I actually
sent. I manually deleted the non-spam messages by scrutinizing
them when sorted according to subject and then by sender. Finally I
searched for my first name in the bodies of the messages. Messages
which look like they are from eBay and PayPal are a problem too, since
I do get legitimate emails from these companies.
For SA-OK, I
used recent good messages from the Inbox, carefully checked in a
similar way, without undeliverable messages. This was 1675
messages, from the last 7 months - 51MBytes. A few are from
lists, announcement systems etc. which I don't handle with the
filtering system. Some are eBay emails, reminders that a domain
is to expire, emails from ticket purchasing systems etc. I
removed backscatter emails from this. So I get about 7.9 messages
a day which I consider proper emails (not counting a few hundred a day
on mailing lists) and about 1000 a day which are spam.
I
copied the contents of these mailboxes with Midnight Commander via the
NFS link from the new machine nair to the old one gair (where I had
sorted them). I had to copy them first from their home in
/home/rw/Maildir to another directory outside /home, because the NFS
system doesn't get past /home. I could have copied them via the
Windows Thunderbird, which has both the rw at gair account and the rw
at nair accounts open at the same time. The NFS copying is
faster, but then I have to set all of them to rw's user and group in
the new machine. In the old machine, the mailboxes have been seen
by Thunderbird, so the messages are all in /cur. In the newly
created mailboxes in the new machine (made with Thunderbird) I copy the
messages into the /new directory, so when I open them with Thunderbird,
it will see them as new messages.
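The copy into the /new directory could be scripted roughly as below. This is a sketch under assumptions: copy_into_new is a hypothetical name, SRC is wherever the old messages were copied to, and DEST is the Maildir folder on the new machine.

```shell
#!/bin/bash
# Hypothetical helper: copy every message file from SRC (for example a
# "cur" directory brought over via NFS) into the "new" directory of the
# Maildir folder DEST.
copy_into_new() {
    local src="$1" dest="$2" msg
    mkdir -p "$dest/new" || return 1
    for msg in "$src"/*; do
        # Skip anything which is not a regular file, including the
        # unexpanded pattern when SRC is empty.
        [ -f "$msg" ] || continue
        cp -p "$msg" "$dest/new/" || return 1
    done
}
```

If this is run as root, the copied files still need to be given to the mailbox owner afterwards with chown, for instance "chown -R rw:rw" on the folder, where the group name rw is an assumption.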
Now, from the rw account in
the new machine, as user rw, I use this script to teach Spamassassin
what's what:
#!/bin/bash
sa-learn --ham --showdots --dir ~/Maildir/.SA-OK/cur
sa-learn --spam --showdots --dir ~/Maildir/.SA-Spam/cur
As noted
above, this requires the presence of a directory
/home/rw/.spamassassin/ and I think a file there user_prefs. This
took about 24 minutes with sa-learn chewing virtually all the CPU of
this 824MHz PIII Celeron. The results are in
/home/rw/.spamassassin/ as database files bayes_toks (5.2MB) and
bayes_seen (331KB).
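One way to check what was learned is "sa-learn --dump magic", which prints the database's bookkeeping values, including the number of spam and ham messages learned. The sketch below pulls those two counts out of that output. The name bayes_counts is made up, and the line layout it relies on (the count in the third field, the label ending in nspam or nham) is an assumption based on Spamassassin 3.x output, so compare it with what your version prints.

```shell
#!/bin/bash
# Hypothetical helper: read the output of "sa-learn --dump magic" on
# stdin and print the number of spam and ham messages learned.
# Assumes lines of the form:
#   0.000          0       1768          0  non-token data: nspam
bayes_counts() {
    awk '/non-token data: nspam$/ { spam = $3 }
         /non-token data: nham$/  { ham = $3 }
         END { printf "spam=%d ham=%d\n", spam, ham }'
}
# Usage, as the user whose database is being checked:
#   sa-learn --dump magic | bayes_counts
```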
More on my final mail filtering
arrangement later in this page.
Anomy Sanitizer 1.76
The purpose of Anomy Sanitizer is detailed on its home page.
It scans for files with names which would make them
executable (for instance on Windows, there is a long list of such
filename extensions) and changes their names so they are not directly
executable. It also defangs HTML which could pose security risks.
I have been running Anomy Sanitizer 1.56 (2003-10-23) until now
(2008-04-29) and am now installing version 1.76 (2006-01-03). I
have extensive notes on my old installation in the zip file of the old
page, at its directory:
../Postfix-SA-Anomy-Maildrop/
The documentation is at:
http://mailtools.anomy.net/sanitizer.html
I downloaded the tarball and unpacked it to
/opt/anomy-sanitizer-1.76/. I then copied the contents of the /anomy/
directory from this (which contains everything) to /usr/local/anomy/.
This means the location of the main Perl script is:
/usr/local/anomy/bin/sanitizer.pl
Anomy needs a Perl interpreter of 5.005_03 or later, and these Perl
modules:
- MIME::Base64
- MIME::QuotedPrint
How do I test if these are present in my system? CPAN is the way
to get Perl modules:
http://en.wikipedia.org/wiki/Cpan
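A quick way to test for a module without going into the CPAN shell is to ask Perl to load it. The sketch below assumes perl is on the PATH. The function names are made up for this example, but "perl -MModule -e 1" is a standard Perl idiom: it loads the module and then runs a trivial program, failing if the module cannot be found.

```shell
#!/bin/bash
# Hypothetical helper: succeed if Perl can load the named module.
have_perl_module() {
    perl -M"$1" -e 1 2>/dev/null
}

# Report on the two modules Anomy Sanitizer needs.
check_anomy_modules() {
    local module
    for module in MIME::Base64 MIME::QuotedPrint; do
        if have_perl_module "$module"; then
            echo "$module: present"
        else
            echo "$module: missing"
        fi
    done
}
```

Running check_anomy_modules prints one line per module. Anything reported missing can then be installed from the cpan> prompt described next.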
Based on the last time I tried this, I gave the command:
LANG=C perl -MCPAN -e shell
though I think just "cpan" might have done the trick.
It asked if I was
ready for manual configuration. Yes.
Cache directory
/root/.cpan/ . 10M. atstart. yes. no. history filename =
/root/.cpan/histfile. 100. ask. (A bunch of questions I just pressed
Enter for). I selected a local mirror. Then I got a prompt:
"cpan>".
To this I gave commands to load modules:
install MIME::Base64
After some chewing away, it either downloaded this or
checked it and told me this module is up-to-date.
(I know
next to nothing about Perl modules and am not in the mood for learning
- I just want to install mailserver software right now.)
install MIME::QuotedPrint
In this case, the cpan program instantly told me it was up-to-date.
I ran the test cases by giving the command "./testall.sh" in
/usr/local/anomy/. This led to a warning message:
WARNING: Your default language setting is en_US.UTF-8, which may enable
UTF-8 (unicode) support in various programs, including Perl. This
may cause the Anomy Sanitizer to malfunction.
Please read the file UNICODE.TXT for further information.
followed by successful completion of all
tests except two which were skipped due to "F-Prot" not being
installed. I didn't do any more tests - I will test its actual
invocation from my .mailfilter .
UNICODE.TXT tells me that I
need to set two environment variables before calling Anomy Sanitizer in
the mail filtering system:
LC_ALL=C
LANG=en_US
Configuration
There is no configuration file in the
tarball. Here is how I created a config file, based on my 2003 work:
This is my file /usr/local/anomy/anomy.conf.
Note, the file can have any name, provided it is given to the
program when it is run, as I do in the .mailfilter file, as
noted below.
This is a totally freshly written config file, but with the business
end of it based on the work of Advosys.ca. Here, I have explicitly set
every option documented at http://mailtools.anomy.net/sanitizer.html
on 31 May 2003, but that page does not mention all features. It
includes: "WARNING: This document is outdated! Please refer to
the CHANGELOG for up-to-date information on new features." Indeed,
the CHANGELOG at http://mailtools.anomy.net/CHANGELOG.sanitizer.txt
lists a new feature, feat_mime_files, which is also
explained in a mailing list message from Bjarni Einarsson on 29 May
2003.
I have also tried to document the meaning of different
values for the various options, and indicate the default, in a way
which is clearer than on that page. However, I am a beginner with
this program and there is lots I don't understand - so please do not
consider my comments, or the values I have chosen, as being necessarily
wise.
Looking at the changelog in April 2008, here are some new
options added since I wrote this config file:
Added the following options to configure the HTML cleaner (all are off
by default):
feat_html_noexe      Disallow links to executables
feat_html_unknown    Allow unknown HTML tags
feat_html_paranoid   Paranoid HTML Cleaner mode, bans all src= links
                     and enables feat_html_noexe paranoia as well.
Made the file-name/MIME-type sanity checks configurable (default on)
via the feat_sane_names variable. Set to 0 to disable.
My config file is available here as: anomy.conf.txt
and anomy.conf.txt.gz.
# Configuration file for Anomy Sanitizer
#
# Based on a file from Advosys Consulting Inc., Ottawa
# http://advosys.ca/papers/postfix-filtering.html
#
# Works with Anomy Sanitizer revision 1.60
#
# Doctored by Robin Whittle http://www.firstpr.com.au/web-mail/
#
# All config items are set explicitly, with their defaults marked "*",
# as per http://mailtools.anomy.net/sanitizer.html on 2003-05-31.
# Warn user about unscanned parts, etc.
#   0  Don't warn.
# * 1  Warn.
#
feat_verbose = 1

# Insert log in the message itself.
#   0  Off.
# * 1  Maybe.
#   2  Force.
#
feat_log_inline = 0

# Log to STDERR:
#   0  Don't log.
# * 1  Log.
#
feat_log_stderr = 1

# XML format for logs.
# * 0  Off.
#   1  On.
#
feat_log_xml = 0

# Include trace info from logs.
# * 0  Off.
#   1  Include.
#
feat_log_trace = 0

# Add scratch space to part headers.
# * 0  Off.
#   1  Add.
#
feat_log_after = 0

# Enable filename-based policy decisions.
#   0  Off.
# * 1  Enable.
#
feat_files = 1

# Force all parts (except text/html parts) to have file names.
# * 0  Don't force.
#   1  Force.
#
feat_force_name = 0

# Replace all boundary strings with our own
# NOTE: Always breaks PGP/MIME messages!
# * 0  Off.
#   1  Replace.
#
feat_boundaries = 0

# Protect against buffer overflows and null values.
#   0  Off.
# * 1  Protect.
#
feat_lengths = 1

# Defang incoming shell scripts.
#   0  Off.
# * 1  On.
#
feat_scripts = 1

# Defang active HTML content - Javascript and more.
#   0  Off.
# * 1  Defang.
#
feat_html = 1

# Allow Web-bugs.
# * 0  Allow.
#   1  Disallow.
#
feat_webbugs = 0
# Scan PGP signed message parts. See custom message below.
# * 0  Don't scan.
#   1  Scan
#
feat_trust_pgp = 0

# Sanitize inline uuencoded files. Bjarni R. Einarsson wrote:
# This should always be set to 1, or people will be able to send you
# uuencoded viruses/attachments and they'll slip by the sanitizer.
# Also, if this is 0 then uuencoded attachments won't be detected as
# such, and will instead get treated as text or HTML - and will get
# corrupted by the HTML cleaner.
#
#   0  Don't sanitize.
# * 1  Sanitize.
#
feat_uuencoded = 1

# Sanitize forwarded messages.
#   0  Don't sanitize.
# * 1  Sanitize.
#
feat_forwards = 1

# This isn't a test-case configuration.
# * 0  Not testing.
#   1  Test case.
#
feat_testing = 0

# Fix invalid MIME, if possible.
#   0  Don't fix.
# * 1  Fix.
#
feat_fixmime = 1

# Paranoia about MIME headers etc.
# * 0  Don't be excessively paranoid.
#   1  ???
#
feat_paranoid = 0
# Scoring and exit status.
# Any message requiring this many modifications
# will cause the sanitizer to return a non-zero
# exit code after processing the entire message.
#
# eg. the default: score_bad = 100
#
# Here, this is disabled:
#
score_bad = 0

# Depth of recursion when including config files.
# Default = 5. I left it alone.
#
# max_conf_recursions = 5

# Temp file and quarantine directory. This must exist
# and be writable by the user running the sanitizer.
# Temporary or saved files are created using this template.
#
#   file_name_tpl = /var/quarantine/att-$F-$T.$$$
#
# An attachment named "dude.txt" might be saved as
#
#   /var/quarantine/att-dude-txt.A9Y
#
file_name_tpl = /var/spool/anomy/att-$F-$T.$$$

# Add two lines of informational headers to each message.
#
#   header_info = X-Sanitizer: Gotcha!
#   header_info += \nX-Gotcha: Sanitizer!
#
# Here is my version:
#
header_info = X-Sanitizer: Spam Assassin and Anomy Sanitizer - see http://www.firstpr.com.au/web-mail/.

# Disable these built-in headers:
#
header_url = 0
header_rev = 0
# Message to begin the log.
#
msg_log_prefix = This message has been sanitized - it may have been altered \n
msg_log_prefix += to improve security, as described below. \n

# Define a new, more informative message, for when a file is
# dropped.
#
msg_file_drop = \n*** Attached file dropped ***\n
msg_file_drop += An attachment named %FILENAME was deleted from this \n
msg_file_drop += message because it contained a Windows executable \n
msg_file_drop += or other potentially dangerous file type. \n
msg_file_drop += Contact the system administrator for more information. \n

# Message suitable for not scanning PGP messages.
#
msg_pgp_warning = PGP encrypted content follows and has not been sanitized. \n

# Default policy for attached files which do not match any policy.
# One of:
#
#   accept = Leave the attachment in the message unchanged.
# * defang = Accept the file but mangle name to make it less dangerous.
#   mangle = Alter the file name completely.
#   save   = Remove from the message, but save in the file_name_tpl directory.
#   drop   = Remove from the message, but leave text there that this has been done.
#
file_default_policy = defang

# See this entry in the CHANGELOG for version 1.60:
#
#   Made the filename checker check ALL possible file names against
#   each rule, instead of just checking the "default" one. If
#   feat_mime_files is set, then the default file-name for that mime
#   type will be checked as well. This is a major improvement to
#   security, but requires that filename rules are ordered so that
#   all DROP/DEFANG/MANGLE rules precede any ACCEPT rules.
# Beyond here no more items have defaults.

# Number of rulesets we are defining.
#
file_list_rules = 2

# Both the following rule sets do not use an external scanner
# program.

#### Rule 1   Drop (delete) probably nasty attachments.
####
#
# In practice with long virus executables, Anomy passes on to
# the output message a shortened and in some way changed
# version of the file with a different, non executable,
# extension.
#
# The (?i) prefix makes the regexp case insensitive.
#
# Note, starting with version 1.56, the following will still
# match file names with spaces such as:
#
#   name=CODE .bat
#
# which are invalid MIME, should not produce a file which
# has an executable extension, but nonetheless are sometimes
# created by the Bugbear virus.

file_list_1_scanner = 0
file_list_1_policy = drop
file_list_1 = (?i)(winmail.dat)|
file_list_1 += (\.(exe|com|vb[se]|dll|ocx|cmd|bat|pif|lnk|hlp|ms[ip]|reg|sct|inf
file_list_1 += |asd|cab|sh[sb]|scr|cpl|chm|ws[fhc]|hta|vcd|eml|nws))$
# Note: in the above line I used to have vcf, but this is used for Vcard attachments.

#### Rule 2   Allow known "safe" file types and those that may be
####          scanned by the user's desktop virus scanner.
####

file_list_2_scanner = 0
file_list_2_policy = accept
file_list_2 = (?i)\.
# Word processor and document formats:
file_list_2 += (doc|dot|txt|rtf|pdf|ps|htm|[sp]?html?
# Spreadsheets:
file_list_2 += |xls|xlw|xlt|csv|wk[1-4]
# Presentation applications:
file_list_2 += |ppt|pps|pot
# Bitmap graphic files:
file_list_2 += |jpe?g|gif|png|tiff?|bmp|psd|pcx|jpg
# Vector graphics and diagramming:
file_list_2 += |vsd|drw|cdr|swf
# Multimedia:
file_list_2 += |mp3|avi|mpe?g|mov|ram?|mid|ogg|vcf
# Note: in the above line I used not to have vcf, but this is used for Vcard attachments.
# Archives:
file_list_2 += |zip|g?z|rar|tgz|bz2|tar
# Source code:
file_list_2 += |[ch](pp|\+\+)?|s|inc|asm|patch|java|php\d?|jsp|bas)

# Any file type not listed above gets renamed to prevent
# MS Outlook from auto-executing it - because, above, we
# have already specified:
#
#   file_default_policy = defang
The above config file is for all users, and
must be readable by each user. Its location is specified in the
command line which calls Anomy Sanitizer - as in my
.mailfilter
file. The new lines to run Anomy Sanitizer come after the
messages have been run through the mailing list sorting stuff and
Spamassassin. This means it only runs on messages which are not
from recognised mailing lists, and of those, the ones I don't filter
out to -SPAM or -SPAM-marginal according to Spamassassin's score.
I configure Anomy Sanitizer to drop any attached file which it
identifies as being executable (Rule 1 above). Attached files
which match Rule 2 (benign filename extensions) are allowed to pass
without changes. All other files have their filename extension
mangled to prevent it being executable if in fact the extension was of
an executable type. Messages with a dropped file were, in the
past, simply sent to a virus mailbox, and not at all to the Inbox.
Now, I will send them to the Inbox, but with an additional piece
of text at the start of the subject line: "~~~[Executable]".
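The three outcomes (drop, accept, defang) can be tried out on file names with a rough shell approximation of the two rules. This is a sketch, not how Anomy Sanitizer itself works: anomy_policy_for is a made-up name, the extension lists are shortened, grep -E with -i stands in for the (?i) Perl regular expressions, and the accept pattern is anchored at the end of the name, which is an assumption about the intended behaviour.

```shell
#!/bin/bash
# Hypothetical helper, not part of Anomy Sanitizer: report which policy a
# file name would fall under, using shortened versions of the two rules.
anomy_policy_for() {
    local name="$1"
    # Rule 1 (drop): executable-style extensions, abbreviated list.
    if printf '%s\n' "$name" |
        grep -Eiq '(winmail\.dat)|(\.(exe|com|vb[se]|bat|pif|scr|hta))$'; then
        echo drop
    # Rule 2 (accept): benign extensions, abbreviated list.
    elif printf '%s\n' "$name" |
        grep -Eiq '\.(txt|pdf|jpe?g|gif|png|zip|vcf)$'; then
        echo accept
    else
        # Everything else falls to file_default_policy = defang.
        echo defang
    fi
}
```

For example, "anomy_policy_for virus.EXE" prints drop, "anomy_policy_for photo.jpeg" prints accept and "anomy_policy_for odd.xyz" prints defang. The drop rule is tested first, matching the requirement that DROP rules precede ACCEPT rules.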
Maildrop filtering for Anomy Sanitizer output
Here is a complete .mailfilter file, with
a real test for a mailing list. It is available as
mailfilter-test-4.txt. It uses
several new mailboxes, but the lines which use these can easily be
changed or deleted.
# Sample Courier Maildrop file to demonstrate filtering messages
# according to:
#
# 1 - Mailing list headers.
#
# 2 - How Spamassassin scores them.
#
# 3 - Whether Anomy Sanitizer finds an executable attachment
#     and drops it.
#
# Real mailing list sorting may be trickier if there isn't a
# clearly identifiable list-related header line.
logfile "mailfilter-log.txt"
log "========"
if ( /^Subject: Test1/ ) {
    log "-------------------------------------------- Test1 found "
    to "Maildir/.blah"
}
if ( /^List-Id: IETF Discussion/ ) {
    log "-------------------------------------------- IETF "
    cc "Maildir/.Lists.IETF"
    xfilter "subjadd [IETF]"
    DELTAG=$DTAG
    to "$LMB"
}
# Copy all messages which are not
# caught by one of the above mailing
# list filters to a special folder
# so I can find a message, if I want
# to, in the state before Spamassassin
# looked at it (and therefore wrote
# something in the headers) and before
# Anomy Sanitizer may have defanged
# its HTML or dropped its attachment.
#
# In this example, "Pre-Filter-Inbox"
# is a mailbox within the mailbox:
# "0-Inbox-Old" - as is another
# mailbox used at the end:
# "Post-Filter-Inbox".
cc "Maildir/.0-Inbox-Old.Pre-Filter-Inbox"
xfilter "/usr/bin/spamassassin -x"
# Look out for marginal scoring spam &
# put it somewhere I can scrutinise it
# easily, away from the worst stuff.
# If the threshold of 3.0 is too low
# there will be many false-positives in
# -SPAM-marginal, and I will need to
# fish them out manually.
if ( ( /X-Spam-Status: No, score=3./ ) \
   ||( /X-Spam-Status: No, score=4./ ) \
   ||( /X-Spam-Status: Yes, score=5./ ) \
   ) {
    log "-------------------------------- Spam marginal. "
    to "Maildir/.-SPAM-marginal"
}
# Watch out for header line added by
# Spamassassin, which it will be for
# anything scoring 5.0 or above.
#
# Don't allow any blank lines after
# the if statement!
if ( /^X-Spam-Flag: YES/ ) {
    log "----------------------------------- Spam general. "
    to "Maildir/.-SPAM"
}
# If we are still processing the
# message, it is because its
# Spamassassin score was below 3.0.
# Some debugging lines and a place to
# save things when I am tweaking
# Anomy Sanitizer: a mailbox "Debug".
cc "Maildir/.Debug"
log "Send to Anomy"
# Set up the environment variable ANOMY
# to keep Anomy happy.
ANOMY=/usr/local/anomy/
# Set up two other environment
# variables, as required by UNICODE.TXT.
LC_ALL=C
LANG=en_US
# Filter the message via stdin to Anomy,
# with the config file specified,
# logging output being appended to the
# Maildrop log file and then the output
# being piped to cat so cat's stdout
# sends it back to Maildrop. The use
# of "2>>" for appending stderr with
# the log material means we need the
# "| cat".
#
# If Anomy's conf file has:
#
#   feat_log_inline = 0
#   feat_log_stderr = 1
#
# Then a report of Anomy's progress in
# working on the message will be
# appended to the Maildrop log file.
# This seems to work fine, so
# presumably each "log" line for
# Maildrop means it opens and closes
# the log file. There could be
# multiple instances of Anomy
# running at the same time, each on
# a different message.
#
# Anomy Sanitizer can be extremely
# verbose in its log output.
xfilter "/usr/local/anomy/bin/sanitizer.pl /usr/local/anomy/anomy.conf 2>>~/mailfilter-log.txt | cat"
log "Anomy done."
# Watch out for text added to body
# by Anomy Sanitizer when it *drops* a
# file which is an attachment within
# the email - not just when it renames
# a file or defangs some HTML in the
# message. This:
#
#   *** Attached file dropped ***
#
# is a part of the drop message I specified
# in the Anomy Sanitizer config file.
# It always starts at the start of a
# line.
#
# The backslash escapes the asterisk.
# An ordinary asterisk is a special
# character in the PCRE pattern matching
# language, so use:
#
#   \*
#
# to match an asterisk.
#
# ":b" means look in the body, rather
# than the headers.
if ( /^\*\*\* Attached file dropped \*\*\*/:b ) {
    log "--------------------------------- Executable attachment! "
    # Make this "cc" for copy or "to" to
    # not send it to Inbox.
    # Copy it to the "-Executable"
    # mailbox.
    cc "Maildir/.-Executable"
    # Add something highly visible to the
    # subject line and deliver the message
    # to the Inbox. The dropped attachment
    # is replaced by a text file attachment
    # with the warning text.
    #
    # Intrepid users who want to find a
    # copy of the message with its original
    # attachment will find it in the
    # mailbox "Pre-Filter-Inbox", as
    # described above.
    xfilter "subjadd ~~~[Executable]"
    to "Maildir"
}
# Copy all messages which are not
# caught by one of the above filters
# to a special folder so I can find
# them, even if I accidentally delete
# them from the Inbox. I don't intend
# to keep this mailbox's contents for
# long.
cc "Maildir/.0-Inbox-Old.Post-Filter-Inbox"
log "- - - - - - - - - - - - - - - - - - - - - - No match"
# Deliver to the Inbox.
to "Maildir"
Watch /var/log/maillog if something is not working - Maildrop will
complain if it can't run the Anomy Sanitizer line, such as due to a
non-existent directory. If it fails, then tweak the .mailfilter file
and then issue the command, as root:
postfix flush
Another approach to seeing SpamAssassin and Anomy Sanitizer working is:
top -d 0.3
Anomy Sanitizer 1.76 works!
I was able to see the "X-Sanitizer: ..." header line in each email.
I sent it ordinary emails, those with .jpg attachments and one with a
.exe attachment.
All mails which Anomy Sanitizer is fed will lead to a detailed
diagnostic report in mailfilter-log.txt, due to the way I have
constructed my .mailfilter file.
This can lead to rather large log files, so beware that Maildrop
will spit the dummy if the log file ever gets to 50,000,000 bytes - log
rotation is one approach to stopping this, but it's not bulletproof if
there is a vast number of messages for some reason. I suppose
some shell script could be run at the start of the .mailfilter
file to check if the log file was getting too long, and then rename it
to a new name so a fresh one would be generated.
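A minimal sketch of such a script follows. The function name rotate_if_large and the timestamp suffix are choices made for this example, and the size limit is passed in rather than fixed.

```shell
#!/bin/bash
# Hypothetical helper: if FILE is larger than MAX_BYTES, rename it with
# a timestamp suffix so that a fresh log file is started.
rotate_if_large() {
    local file="$1" max_bytes="$2" size
    # Nothing to do if the log file does not exist yet.
    [ -f "$file" ] || return 0
    # Arithmetic expansion strips any padding wc puts around the number.
    size=$(( $(wc -c < "$file") ))
    if [ "$size" -gt "$max_bytes" ]; then
        mv "$file" "$file.$(date +%Y%m%d-%H%M%S)"
    fi
}
```

It could be called as: rotate_if_large "$HOME/mailfilter-log.txt" 40000000 which leaves a comfortable margin below the 50,000,000 byte limit. Maildrop's filtering language can run shell commands given in backticks, so a script built around this function could presumably be called near the top of .mailfilter, but that is untested here. Two deliveries running at the same moment could also both try to rename the file.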
Advanced filtering
My
filtering system is actually more complex than this.
Firstly,
I have some tests for certain messages I want never to see, such as
from a persistent troll on some DSP mailing lists who does not seem to
be active anymore.
Then I have a bunch of tests for messages
from mailing lists, and from other regularly generated sources such as
cron reports in various servers. These are generally copied to
their own mailbox and then delivered to the Inbox (called "Maildir" in
.mailfilter) tagged for deletion. For some of them, I add a label
at the start of the subject line, such as [IETF] for the main IETF
mailing list which curiously has no such label.
Then I have
some tests for messages from sources I definitely want to get messages
from, but which might fall foul of Spamassassin. They get copied
to the Post-Filter-Inbox and sent to the Inbox, with a log-file line to
the effect that the message was "Saved from spam filtering".
Then I use Spamassassin and Anomy Sanitizer exactly as shown above,
writing whatever messages are still being processed by Maildrop after
this first to the Post-Filter-Inbox and then to the Inbox.
There is much more which can be done with Maildrop, as the
maildropfilter.html doco (linked to above) explains. In
combination with external programs, pretty much anything could be done.
However, one probably has to be a bit of a geek to go
this far . . .
Increasing the power of the Bayesian analysis to affect Spamassassin's score for each message
In 2003 I found I could get significantly
better results if I increased the weight given to the output of the
Bayesian analysis. Perhaps this was because at the time the other
tests, including the online tests (RBL etc. - real-time communication
with servers) were not so well developed and/or because I went to more
trouble than some users to train the Bayesian system with good samples
of spam and ham. Details of why I did this are in the zipped
version of the old page at
../Postfix-SA-Anomy-Maildrop/
The details of how I do this now are a little different,
but the principle is the same. The Bayesian result is now
quantized into 9 levels, whereas before (2003) it was quantized into 14
levels.
To see how the Bayesian result is used to affect the
final score, see the big table at:
http://spamassassin.apache.org/tests_3_2_x.html
This page explains how to override these defaults with lines in
the config file. The items of interest are below:
                                                             L   N   B        BN
body   Bayesian spam probability is 0 to 1%       BAYES_00   0   0   -2.312   -2.599
body   Bayesian spam probability is 1 to 5%       BAYES_05   0   0   -1.110   -1.110
body   Bayesian spam probability is 5 to 20%      BAYES_20   0   0   -0.740   -0.740
body   Bayesian spam probability is 20 to 40%     BAYES_40   0   0   -0.185   -0.185
body   Bayesian spam probability is 40 to 60%     BAYES_50   0   0    0.001    0.001
body   Bayesian spam probability is 60 to 80%     BAYES_60   0   0    1.0      1.0
body   Bayesian spam probability is 80 to 95%     BAYES_80   0   0    2.0      2.0
body   Bayesian spam probability is 95 to 99%     BAYES_95   0   0    3.0      3.0
body   Bayesian spam probability is 99 to 100%    BAYES_99   0   0    3.5      3.5
The four columns of figures are for four situations:
Column:                            L       N      B       BN
                                   Local   Net    Bayes   Bayes & Net
Network tests, such as RBL etc.    No      Yes    No      Yes
Bayes tests                        No      No     Yes     Yes
If the Bayes result is 60%, 63%, 79% or anywhere from 60 to 80%, then
1.0 will be added to the score.
If the Bayes result is, for
instance, 97%, then 3.0 will be added.
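The lookup from Bayes result to rule and default score can be sketched as follows. This is illustrative only: bayes_rule_for is a made-up name, the scores are the defaults from the B column of the table above, and treating each range as including its lower limit is an assumption.

```shell
#!/bin/bash
# Illustrative only: bayes_rule_for is a made-up name. Given a Bayes spam
# probability in percent, print the rule name and its default score
# (B column). Each range is assumed to include its lower limit.
bayes_rule_for() {
    awk -v p="$1" 'BEGIN {
        p += 0
        if      (p < 1)  print "BAYES_00 -2.312"
        else if (p < 5)  print "BAYES_05 -1.110"
        else if (p < 20) print "BAYES_20 -0.740"
        else if (p < 40) print "BAYES_40 -0.185"
        else if (p < 60) print "BAYES_50 0.001"
        else if (p < 80) print "BAYES_60 1.0"
        else if (p < 95) print "BAYES_80 2.0"
        else if (p < 99) print "BAYES_95 3.0"
        else             print "BAYES_99 3.5"
    }'
}
```

So "bayes_rule_for 63" prints "BAYES_60 1.0" and "bayes_rule_for 97" prints "BAYES_95 3.0", matching the two examples just given.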
Based on my careful
analysis in 2003 (which I don't have time to replicate in 2008) I
decided to boost the scores for the higher ranges of Bayesian results.
A table of what I did then is as follows:
Bayes   Bayes   Name        Score original     Score with my new
lower   upper               B        BN        config lines
limit   limit
0       1       BAYES_00    -5.300   -5.200    <
1       10      BAYES_01    -5.400   -5.400    <
10      20      BAYES_10    -5.300   -4.701    <
20      30      BAYES_20    -4.701   -2.601    <
30      40      BAYES_30    -1.070   -0.927    <
40      44      BAYES_40     0.001    0.001    <   << Ooops.
44      50      BAYES_44     0.001    0.001    -0.500
50      56      BAYES_50     0.001    0.001     0.500
56      60      BAYES_56     0.001    0.001     2.000
60      70      BAYES_60     1.997    1.101     3.500
70      80      BAYES_70     2.593    2.310     5.000
80      90      BAYES_80     5.300    2.862     6.000
90      99      BAYES_90     4.027    3.002     6.000
99      100     BAYES_99     5.200    3.008     6.000
Now there are fewer levels to which the Bayesian score is quantized.
Here is a table for the current version, with my new values, which are
still just guesswork. With a lot of time, one might be able to
statistically analyze the results and come up with something more
optimal. But there are other things to do in life!
Bayes   Bayes   Name        Score original     Score with my new
lower   upper               B        BN        config lines
limit   limit
0       1       BAYES_00    -2.312   -2.599    <
1       5       BAYES_05    -1.110   -1.110    <
5       20      BAYES_20    -0.740   -0.740    <
20      40      BAYES_40    -0.185   -0.185    <
40      60      BAYES_50     0.001    0.001    <
60      80      BAYES_60     1.000    1.000    2.000
80      95      BAYES_80     2.000    2.000    4.000
95      99      BAYES_95     3.000    3.000    6.000
99      100     BAYES_99     3.500    3.500    7.000
The lines I add to the Spamassassin config file (of each
user: /home/xx/.spamassassin/user_prefs) are:
score BAYES_60 2.0
score BAYES_80 4.0
score BAYES_95 6.0
score BAYES_99 7.0
This could
go in each user's config file: ~/.spamassassin/user_prefs .
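Adding these lines to several accounts by hand is error-prone, so here is a sketch of a function which appends them only if they are not already there. The name add_bayes_scores is hypothetical. It only uses grep and echo, and does not check any other settings in the file.

```shell
#!/bin/bash
# Hypothetical helper: append the four score lines to a user_prefs file,
# skipping any rule which already has a score line there.
add_bayes_scores() {
    local prefs="$1" entry rule
    for entry in "BAYES_60 2.0" "BAYES_80 4.0" "BAYES_95 6.0" "BAYES_99 7.0"; do
        # The rule name is everything before the first space.
        rule=${entry%% *}
        if ! grep -q "^score[[:space:]][[:space:]]*$rule[[:space:]]" "$prefs" 2>/dev/null; then
            echo "score $entry" >> "$prefs"
        fi
    done
}
```

It would be run as: add_bayes_scores ~/.spamassassin/user_prefs and running it twice leaves only one copy of each line. Afterwards, "spamassassin --lint" run as that user will report any syntax problems in the config files.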
It would be great to see a histogram of the scores produced by
Spamassassin. Ideally there would be a notch in the middle with
the spam on the right and the non-spam on the left, with few in the
middle - which is where we want to put our threshold. It would be
interesting to see the distribution of Bayesian scores with respect to
the total of scores from other tests.
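A rough histogram can be had from the X-Spam-Status headers of messages already delivered. The sketch below counts messages per whole point of score. score_histogram is a made-up name, and it assumes the header carries "score=" followed by a decimal number, as in the header lines matched by the .mailfilter file above. It reads whole messages, so a quoted X-Spam-Status line at the start of a body line would be counted too.

```shell
#!/bin/bash
# Hypothetical helper: read mail headers (or whole messages) on stdin and
# print "bucket count" pairs, one bucket per whole point of score.
score_histogram() {
    awk '/^X-Spam-Status:/ {
        if (match($0, /score=-?[0-9]+(\.[0-9]+)?/)) {
            # Skip the 6 characters of "score=" to get the number.
            s = substr($0, RSTART + 6, RLENGTH - 6) + 0
            b = int(s)
            if (s < b) b--          # int() truncates; make it a floor
            count[b]++
        }
    }
    END { for (b in count) printf "%d %d\n", b, count[b] }' | sort -n
}
# Usage, for one mailbox:
#   cat ~/Maildir/cur/* | score_histogram
```

Each output line is the lower edge of a one-point bucket followed by the number of messages in it, so a line "4 2" means two messages scored from 4.0 up to but not including 5.0.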
I will fine-tune the
threshold between emails deemed not to be spam and those tossed into
-SPAM-marginal according to my scrutiny of what false positives wind up
in -SPAM-marginal and what false negatives arrive in the Inbox.
Further fine-tuning of Spamassassin
There is a lot of material at
http://spamassassin.apache.org
and its FAQ and connected wiki. Here are some which look
promising to me:
http://wiki.apache.org/spamassassin/VBounceRuleset
Easily enabled rules for
catching backscatter messages. But how well could those caught
messages be used to train the Bayes system? They would
superficially resemble legitimate bounce messages, I think, so the
Bayes system would then be more likely to class legitimate bounces as
spam.
http://taint.org/2007/05/30/164456a.html
Linked to from the
VBounceRuleset page, a simple config change to help Postfix reject
backscatter messages.