How it starts
We generally don't find out about these events until they are well underway, and members are complaining about the slowness of the mail server.
The tool for checking the mail queue is mailq. Run without arguments, it will list a little about all the email that is currently in the queue. You can count the number of emails roughly with mailq | wc -l. The gotcha here is that each email takes three lines, meaning the number you're looking for is a third of that value. Under normal conditions (as I have observed them) there should only be about 40-100 emails in the queue.
0 chavez:~/ticket6199# mailq |wc -l
125
When we looked into #6166, that number was around 985,000, meaning about 300,000 spam messages. #6199 was only about 30,000 messages.
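Dividing the line count by three is only a rough estimate, because the listing also contains a header line, a summary line, and sometimes extra lines per message (deferral reasons, additional recipients). The last line of mailq output also reports the total number of requests. As a minimal sketch of a more direct count, the hypothetical helper below (count_queue_entries is not an existing tool) counts only lines that start with a queue ID. It assumes the default short queue IDs, which are uppercase hex.

```shell
#!/bin/bash
# Count queue entries in mailq output read from stdin.
# Assumption: each entry starts at column 1 with an uppercase-hex queue ID,
# optionally followed by '*' (active) or '!' (on hold), then whitespace.
count_queue_entries() {
    grep -cE '^[0-9A-F]+[*!]?[[:space:]]'
}

# Usage on the mail server:
#   mailq | count_queue_entries
```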
Figuring out the culprit
Generally, the beginning of the mailq output has the highest concentration of backscatter (though during an incident the queue is mostly backscatter throughout). The first step is to find out which account is receiving all of it. To get an idea, have a look at the first 20 or so emails.
mailq |head -60 |more
Look for a recurring email address. With a backscatter problem, the mail is more likely being delivered to a user's mailbox on the server than forwarded elsewhere, so the string to look for is something like [email protected] (assuming chavez is the server that you're looking at).
For this example let's call the user spam-account, with an email of [email protected].
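Spotting the recurring address by eye works, but a tally makes it obvious. The sketch below is one way to do it, assuming a loose address pattern is good enough for this purpose. The function name top_queue_addresses is made up for illustration, and the regex is not a full address parser.

```shell
#!/bin/bash
# Tally the email addresses that appear in mailq output (stdin),
# most frequent first. Optional argument: how many to show (default 10).
top_queue_addresses() {
    local limit="${1:-10}"
    # -o prints each match on its own line; the pattern is deliberately loose.
    grep -oE '[[:alnum:]._%+-]+@[[:alnum:].-]+' \
        | sort | uniq -c | sort -rn | head -n "$limit"
}

# Usage on the mail server (first 600 lines, top 5 addresses):
#   mailq | head -600 | top_queue_addresses 5
```

The address at the top of the list with a count far above the rest is the likely culprit.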
You can also use a custom script to check the mail log and count how many times the various users have logged in:
mf-check-relay-mail-users
Figuring out where all the spam lives
Depending on the nature of the backscatter, the actual messages might be in several places within the spool.
Run this to see which dirs might be full.
SPOOL=/var/spool/postfix/; for dir in $(ls $SPOOL);do echo "$dir: $(find $SPOOL/$dir -type f |wc -l)"; done
It looks something like this when run:
0 chavez:/var/spool/postfix# SPOOL=/var/spool/postfix/; for dir in $(ls $SPOOL);do echo "$dir: $(find $SPOOL/$dir -type f |wc -l)"; done
active: 2
bounce: 0
corrupt: 0
defer: 22
deferred: 22
dev: 0
etc: 6
flush: 0
hold: 0
incoming: 0
lib: 6
maildrop: 0
pid: 17
private: 0
public: 0
saved: 0
trace: 0
usr: 0
0 chavez:/var/spool/postfix#
If the server were dealing with queue issues, some of those numbers would be in the thousands.
Make a note of which directories are full of spam.
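To avoid scanning the list by eye, the count can be filtered so that only the suspicious directories are printed. This is a sketch under assumptions: the function name full_spool_dirs and the default threshold of 1000 files are illustrative choices, not values from the tickets.

```shell
#!/bin/bash
# Print "name: count" for each top-level directory under a spool root
# whose file count exceeds a threshold.
# Arguments: spool root (default /var/spool/postfix), threshold (default 1000).
full_spool_dirs() {
    local spool="${1:-/var/spool/postfix}" threshold="${2:-1000}"
    local dir count
    for dir in "$spool"/*/; do
        [ -d "$dir" ] || continue
        # tr strips the padding some wc implementations add
        count=$(find "$dir" -type f | wc -l | tr -d ' ')
        if [ "$count" -gt "$threshold" ]; then
            printf '%s: %d\n' "$(basename "$dir")" "$count"
        fi
    done
}

# Usage on the mail server:
#   full_spool_dirs /var/spool/postfix 1000
```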
Getting the mail flow started again
Since our members seem to actually want their emails delivered in a timely fashion, we need to non-destructively clean out the queue.
Follow each of these steps!
Step 0: Disable the account
Log into the control panel, and find the spam-account user account and disable it. You must disable the user account to prevent more relaying from happening. Disabling the email address will not stop relaying from happening.
After disabling the account in the control panel, restart saslauthd on the server to flush the cache and prevent the user account from being used to relay more mail:
/etc/init.d/saslauthd restart
Step 1: Stop postfix
service postfix stop
Step 2: Create clean spool dirs
Now that our postfix spool directories aren't being written to, we can clean out directories. Since we don't want to destroy any real emails that are in the queue, we want to move the spool directories without deleting the mail itself.
Let's say the problem dirs in this case are incoming and active. First move them to a different name that postfix won't write to. The scripts that I've written use two suffixes: .spamfull and .name-collisions. I'll get to .name-collisions later; for now, we're just moving the dirs.
Note: Remember to change these if you're not dealing with incoming or active.
mv /var/spool/postfix/incoming /var/spool/postfix/incoming.spamfull
mv /var/spool/postfix/active /var/spool/postfix/active.spamfull
Now we need to recreate the original dirs.
mkdir /var/spool/postfix/incoming
mkdir /var/spool/postfix/active
And make sure the permissions are correct.
chmod 700 /var/spool/postfix/incoming
chmod 700 /var/spool/postfix/active
chown postfix:postfix /var/spool/postfix/incoming
chown postfix:postfix /var/spool/postfix/active
Now we have empty queue directories that postfix can write to. When it restarts all new mail should start getting handled as normal.
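The move, recreate, and permission steps can be wrapped in one function so that they are applied the same way to each problem directory. This is a minimal sketch, not a script that was used on the tickets. The function name set_aside_queue_dir is hypothetical, and the ownership step is skipped unless the function runs as root on a machine that has a postfix user.

```shell
#!/bin/bash
# Move a queue directory aside as <name>.spamfull and recreate it empty.
# Run only while postfix is stopped.
# Arguments: spool root, queue directory name (e.g. incoming).
set_aside_queue_dir() {
    local spool="$1" name="$2"
    local src="$spool/$name" aside="$spool/$name.spamfull"

    # Refuse to clobber an earlier set-aside copy.
    if [ -e "$aside" ]; then
        echo "refusing: $aside already exists" >&2
        return 1
    fi

    mv "$src" "$aside" || return 1
    mkdir "$src" || return 1
    chmod 700 "$src"

    # Only root can hand the directory to postfix; skip otherwise.
    if [ "$(id -u)" -eq 0 ] && id postfix >/dev/null 2>&1; then
        chown postfix:postfix "$src"
    fi
}

# Usage on the mail server:
#   set_aside_queue_dir /var/spool/postfix incoming
#   set_aside_queue_dir /var/spool/postfix active
```

The refusal check is there so that running the function twice cannot bury the first set-aside copy.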
Step 3: Start Postfix
service postfix start
Email should once again be flowing at reasonable speeds... and now we can worry about separating out backscatter spam from the ham.
Reinserting the good messages back into the mail queue
So now that we have the new mail getting delivered, we need to get the real email messages delivered to their recipients.
This step takes a little bit of art. What we're trying to do is come up with a couple of grep expressions that will hopefully match all of the spam messages. Thankfully, spammers usually have similar stuff in all of the emails they send out.
Have a look at the first few emails in the .spamfull directory, particularly looking for mails to [email protected]. That's your first grep expression. In #6166 the second pattern was that all the messages were pointing to various domains with a media.php file. In #6199 all of them had an email address in the body ([email protected]).
Figuring this out takes a little work because the emails themselves are in a binary file format, so we need to run them through strings. I tend to work on a subset of mails.
for foo in $(ls incoming.spamfull/ |head -20);do strings incoming.spamfull/$foo > ~/ticket6199/testmails/$foo-strings-spamful;done
That will give you twenty messages, hopefully containing both spam and ham. Be sure to delete this dir when you are done, since there are some actual emails there.
Test your greps. For example, let's say our two criteria are [email protected] and media.php. Any mail that contains both of those is spam.
This is an edited example of how to look for such things. Assuming that you are in the test mail dir
0 chavez:~/ticket6199/testmails# for foo in $(ls); do echo "$foo: $(grep -c -e [email protected] -e [email protected] $foo)"; done
000338CE68-strings-spamful: 3
0003A711C9-strings-spamful: 3
001127DED8-strings-spamful: 3
001387199E-strings-spamful: 3
0014B8C579-strings-spamful: 0
001FD8D839-strings-spamful: 2
002138C8FE-strings-spamful: 3
002278D8B0-strings-spamful: 2
0023D4EF87-strings-spamful: 3
00285C0EC-strings-spamful: 2
0029970F32-strings-spamful: 0
002A38C95D-strings-spamful: 3
002CA8DCD9-strings-spamful: 2
002D7713F4-strings-spamful: 3
002FF4E7D8-strings-spamful: 2
003568D3ED-strings-spamful: 0
003FF8CB54-strings-spamful: 2
0048B8DBD1-strings-spamful: 2
004F37DCD8-strings-spamful: 0
004F87D469-strings-spamful: 0
The lines with 0 at the end are ham, and everything else is spam. In this case, some results with a 1 would probably have counted as ham, since there could be legitimate emails to the spam-account address. However, since the account is disabled, it probably doesn't matter.
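The same check can be written as a small dry-run classifier that prints a verdict per file and moves nothing. The sketch follows the rule above that only a count of 0 is ham, so a match on any pattern means spam. Two things here are assumptions: the name classify_message is invented, and grep -a -F (fixed strings on a binary file) is swapped in for the strings pipeline. The patterns in the usage line are placeholders.

```shell
#!/bin/bash
# Classify one queue file as "spam" or "ham" without touching it.
# Usage: classify_message FILE PATTERN [PATTERN...]
# A file is spam if it contains ANY of the fixed-string patterns.
classify_message() {
    local file="$1"; shift
    local pattern
    for pattern in "$@"; do
        # -a: treat the binary queue file as text
        # -F: pattern is a fixed string, not a regex
        # -q: no output, exit status only
        if grep -a -q -F -e "$pattern" "$file"; then
            echo spam
            return 0
        fi
    done
    echo ham
}

# Usage (placeholder patterns), from /var/spool/postfix:
#   for f in incoming.spamfull/*; do
#       echo "$f: $(classify_message "$f" 'spam-account@example.org' 'media.php')"
#   done
```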
So assuming we're satisfied with our grep patterns, you now need to create a handful of scripts to do the actual move of ham back into the mail queue for delivery to their recipients.
Remember the .name-collisions directories. These are there out of an abundance of caution. None of us were sure how postfix assigns names to the email files, so the script checks that a file of the same name doesn't already exist before moving a message into the live queue. If there is a name collision, we move the message into the .name-collisions directory instead. With both tickets, this wasn't an issue.
Before running the script(s) we first need to create those dirs. As per our example:
mkdir /var/spool/postfix/incoming.name-collisions mkdir /var/spool/postfix/active.name-collisions
This script will need to be modified to match the criteria of your spam search. It will only work on dirs that don't have subdirectories (i.e. not defer or deferred).
#!/bin/bash
# simple script by nat to cleanup the spam backscatter in mailq that was killing the
# server on 9/18/12

SPAM_DIR=/var/spool/postfix/incoming.spamfull
HAM_DIR=/var/spool/postfix/incoming
COLLISION_DIR=/var/spool/postfix/incoming.name-collisions

for message in $(ls -U $SPAM_DIR/); do
    # no spam pattern found anywhere in the message: treat it as ham
    if [[ $(strings $SPAM_DIR/$message |grep -c -e [email protected] -e media.php) -eq 0 ]]; then
        if [[ ! -f $HAM_DIR/$message ]]; then
            mv -n $SPAM_DIR/$message $HAM_DIR/
        else
            # name already taken in the live queue: set the message aside
            mv -n $SPAM_DIR/$message $COLLISION_DIR/
        fi
    fi
done

exit 0
If you have to traverse subdirectories, something like this should work. Again, remember to change the grep patterns.
#!/bin/bash
# simple script by nat to cleanup the spam backscatter in mailq that was killing the
# server on 9/5/12

SPAM_DIR=/var/spool/postfix/defer.spamfull
HAM_DIR=/var/spool/postfix/defer
COLLISION_DIR=/var/spool/postfix/defer.name-collisions

for dir in $(ls $SPAM_DIR); do
    for message in $(ls -U $SPAM_DIR/$dir); do
        # ham unless BOTH patterns are present in the message
        if [[ $(strings $SPAM_DIR/$dir/$message |grep -c media.php) -eq 0 || $(strings $SPAM_DIR/$dir/$message |grep -c spam-account) -eq 0 ]]; then
            if [[ -d $HAM_DIR/$dir && ! -f $HAM_DIR/$dir/$message ]]; then
                mv -n $SPAM_DIR/$dir/$message $HAM_DIR/$dir/
            else
                # target subdirectory missing or name already taken: set the message aside
                if [[ ! -d $COLLISION_DIR/$dir ]]; then
                    mkdir $COLLISION_DIR/$dir
                fi
                mv -n $SPAM_DIR/$dir/$message $COLLISION_DIR/$dir/
            fi
        fi
    done
done

exit 0
Once these are done, all of the real emails should be delivered, and you now have a directory full of spam.
These scripts are non-destructive. They move things rather than deleting them. So if you got your greps wrong, you can always refine them and repeat.
I have been using one script per directory that I'm cleaning. Future versions could probably do more than one at a time. Future versions may also benefit from turning the grep patterns into variables as well.
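As a rough idea of what such a future version might look like, the sketch below keeps the patterns in a variable and accepts several queue names in one call. It only handles flat directories (not defer or deferred). It is an untested-in-production illustration: the names PATTERNS and restore_ham are hypothetical, the pattern values are placeholders, any single match counts as spam, and grep -a -F stands in for the strings pipeline.

```shell
#!/bin/bash
# Fixed strings that mark a message as spam. Placeholders: edit these.
PATTERNS=('spam-account@example.org' 'media.php')

# Move ham from <spool>/<queue>.spamfull back into <spool>/<queue>.
# Usage: restore_ham SPOOL QUEUE [QUEUE...]
restore_ham() {
    local spool="$1"; shift
    local queue spam_dir ham_dir collision_dir file name p
    local args=()
    for p in "${PATTERNS[@]}"; do args+=(-e "$p"); done

    for queue in "$@"; do
        spam_dir="$spool/$queue.spamfull"
        ham_dir="$spool/$queue"
        collision_dir="$spool/$queue.name-collisions"
        mkdir -p "$collision_dir"

        for file in "$spam_dir"/*; do
            [ -f "$file" ] || continue
            # Any pattern present: leave the message in the .spamfull dir.
            if grep -a -q -F "${args[@]}" "$file"; then
                continue
            fi
            name=$(basename "$file")
            if [ ! -e "$ham_dir/$name" ]; then
                mv -n "$file" "$ham_dir/"
            else
                # Name already taken in the live queue: set it aside.
                mv -n "$file" "$collision_dir/"
            fi
        done
    done
}

# Usage on the mail server:
#   restore_ham /var/spool/postfix incoming active
```

Like the scripts above, this only moves files, so a bad pattern can be corrected and the function run again.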
Shortcut for Bulk Servers
If you know a specific expression that can be deleted from the mailq (for example a server that should not be in the queue), you can do a quick one-line command for removal. Make sure you are certain that the matching mail should be removed. From /var/spool/postfix/deferred run:
for i in $(find . -type f); do b=$(grep "EXPRESSION_TO_REMOVE" "$i" | cut -f3 -d'/' | cut -f1 -d' '); if [ -n "$b" ]; then postsuper -d "$b"; fi; done
Where EXPRESSION_TO_REMOVE = the regular expression that should not be in the mailq.
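Because deletion cannot be undone, it can help to list the matching queue IDs first and delete only after reviewing them. The sketch below is one way to split the job into those two halves. The function name queue_ids_matching is hypothetical, and it assumes that the queue ID is the file name of each queue file.

```shell
#!/bin/bash
# Print the queue ID (the file name) of every file under DIR that
# contains PATTERN. Nothing is deleted; this is the dry-run half.
# Arguments: regular expression, directory (default: current directory).
queue_ids_matching() {
    local pattern="$1" dir="${2:-.}"
    local path
    # -r: recurse, -l: list matching file names only, -a: binary as text
    grep -r -l -a -e "$pattern" "$dir" | while IFS= read -r path; do
        basename "$path"
    done
}

# Review the list first:
#   queue_ids_matching 'EXPRESSION_TO_REMOVE' /var/spool/postfix/deferred
# Then delete; "postsuper -d -" reads queue IDs from standard input:
#   queue_ids_matching 'EXPRESSION_TO_REMOVE' /var/spool/postfix/deferred | postsuper -d -
```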